Stream: git-wasmtime

Topic: wasmtime / issue #6135 simple_gvn: recognize commutative ...


view this post on Zulip Wasmtime GitHub notifications bot (Apr 03 2023 at 17:11):

Kmeakin commented on issue #6135:

Didn't realize when I made this branch, but this fixes https://github.com/bytecodealliance/wasmtime/issues/6129!

view this post on Zulip Wasmtime GitHub notifications bot (Apr 03 2023 at 21:31):

Kmeakin commented on issue #6135:

Thanks. Are fmin_pseudo and fmax_pseudo commutative as well then?

view this post on Zulip Wasmtime GitHub notifications bot (Apr 03 2023 at 21:32):

Kmeakin commented on issue #6135:

We plan to remove all the non-egraph parts sometime soon, so it isn't really worth making this change here. Instead it should happen somewhere in the vicinity of insert_pure_enode in cranelift/codegen/src/egraph.rs. The most similar place to what you've done here would be NewOrExistingInst::get_inst_key.

I'll add an egraph implementation in another commit ASAP

view this post on Zulip Wasmtime GitHub notifications bot (Apr 03 2023 at 21:32):

cfallin commented on issue #6135:

  • fmin and fmax are indeed commutative, because we've specifically defined away the weird x86-specific behavior that privileges the first operand in some NaN cases. (This "canonicalization" is, incidentally, what some of the opcodes in the Relaxed-SIMD Wasm proposal are looking to, well, relax; but those will be different opcodes.)

Thanks. Are fmin_pseudo and fmax_pseudo commutative as well then?

No, unfortunately not (or at least, not on all CPUs).

view this post on Zulip Wasmtime GitHub notifications bot (Apr 03 2023 at 21:49):

jameysharp commented on issue #6135:

I was curious whether fmin_pseudo and fmax_pseudo are commutative as well. I think the answer is a well-defined "no": if either argument is NaN or both arguments are ±0, the result is defined to always be the first argument. So swapping the arguments can produce a different NaN or a differently-signed zero.

The WebAssembly spec for fmin/fmax nearly guarantees that the results will be the same if the arguments are swapped, except that if both arguments are NaN, it's not specified which one will be returned. But that unspecified case means we can choose whichever one we want while remaining conforming, so it's still okay to treat it as commutative. That is a little weird for our "no undefined behavior" policy in Cranelift though…

Also, today I learned that signed integer min/max produces the WebAssembly-specified result for fmin/fmax as long as neither argument is NaN. Neat.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 03 2023 at 21:50):

Kmeakin commented on issue #6135:

However, I think it's better to rewrite the actual instructions in the data-flow graph

Could you point me to the correct place to make this change? The egraph code is quite complex

view this post on Zulip Wasmtime GitHub notifications bot (Apr 03 2023 at 22:00):

jameysharp commented on issue #6135:

Yeah, there's a lot of subtlety to the egraph code!

I think the best option here is in insert_pure_enode. It's given a NewOrExistingInst which defines the two cases we care about here:

In both cases we want to do this before the first use of gvn_map in that function, so that it's normalized before we compute a hash key for it.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 03 2023 at 23:52):

jameysharp commented on issue #6135:

Nice idea with the reflexive icmp/fcmp normalization. That one shouldn't change the results of an egraph pass since we have rules that rewrite those cases to constants, but a hit in the GVN map means we can avoid calling simplify at all, so it's still worth doing.

I'm not entirely sold on the "Change formatting of normalize_in_place" commit—I think especially after adding the reflexive cases that it was more clear with a single match on IntCompare and a single match on FloatCompare—but I'm not inclined to hold up merging this PR for that.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 04 2023 at 18:56):

jameysharp commented on issue #6135:

I remembered only after merging this that it might cause a performance regression, because it breaks these rules for moving constants. I mentioned this in the "Challenges" section of #6129 but then forgot about it.

The pulldown-cmark and spidermonkey benchmarks in Sightglass show small regressions, when compared against the immediately preceding commit on main:

compilation :: cpu-cycles :: benchmarks/pulldown-cmark/benchmark.wasm

  Δ = 21305700.50 ± 9802702.99 (confidence = 99%)

  main-bf1aaba06.so is 1.01x to 1.04x faster than normalize-gvn-57e42d0c4.so!

  [798371867 803835104.60 841073014] main-bf1aaba06.so
  [814510194 825140805.10 848435012] normalize-gvn-57e42d0c4.so

compilation :: instructions-retired :: benchmarks/pulldown-cmark/benchmark.wasm

  Δ = 16370993.55 ± 111369.32 (confidence = 99%)

  main-bf1aaba06.so is 1.01x to 1.01x faster than normalize-gvn-57e42d0c4.so!

  [1566424391 1566654680.05 1566986371] main-bf1aaba06.so
  [1582847533 1583025673.60 1583356398] normalize-gvn-57e42d0c4.so

execution :: instructions-retired :: benchmarks/pulldown-cmark/benchmark.wasm

  Δ = 1289.05 ± 201.02 (confidence = 99%)

  main-bf1aaba06.so is 1.00x to 1.00x faster than normalize-gvn-57e42d0c4.so!

  [21058403 21058589.05 21059116] main-bf1aaba06.so
  [21059745 21059878.10 21060457] normalize-gvn-57e42d0c4.so
compilation :: cpu-cycles :: benchmarks/spidermonkey/benchmark.wasm

  Δ = 414681539.40 ± 180320641.50 (confidence = 99%)

  main-bf1aaba06.so is 1.01x to 1.03x faster than normalize-gvn-57e42d0c4.so!

  [19755236415 20095956851.10 20596830121] main-bf1aaba06.so
  [20147733881 20510638390.50 20966561627] normalize-gvn-57e42d0c4.so

compilation :: instructions-retired :: benchmarks/spidermonkey/benchmark.wasm

  Δ = 483524391.30 ± 2430599.58 (confidence = 99%)

  main-bf1aaba06.so is 1.01x to 1.01x faster than normalize-gvn-57e42d0c4.so!

  [37523394580 37531517182.20 37533623288] main-bf1aaba06.so
  [38006693589 38015041573.50 38016782123] normalize-gvn-57e42d0c4.so

execution :: instructions-retired :: benchmarks/spidermonkey/benchmark.wasm

  Δ = 4010130.85 ± 77289.63 (confidence = 99%)

  normalize-gvn-57e42d0c4.so is 1.00x to 1.00x faster than main-bf1aaba06.so!

  [2677225595 2677345556.30 2677547575] main-bf1aaba06.so
  [2673233921 2673335425.45 2673499424] normalize-gvn-57e42d0c4.so

The spidermonkey benchmark actually runs a tiny bit faster with this PR, showing that in this case merging more operations through GVN is a slightly bigger win than the loss due to the regression in constant-folding.

This PR is clearly a good idea but we should figure out how to fix constant-folding. We may decide to temporarily revert this PR while sorting that out.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 04 2023 at 19:05):

cfallin commented on issue #6135:

Thanks for catching the regression, @jameysharp (and sorry for not noticing this either). And thanks for the PR in any case @Kmeakin as I think it's still generally a good idea to do something about commutativity!

IMHO with compilation regressions up to 3-4% we should probably revert and see if we can find a way to do better: such a hit is acceptable if we also see gains in runtime performance (maybe, with a 1:1 rule-of-thumb, 3-4% speedups) but otherwise the numbers as-is are not quite there.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 04 2023 at 20:12):

jameysharp commented on issue #6135:

I'm confident we can sort out the regression. So we'll revert this PR temporarily but we should be able to merge it again.

It's possible that there's a compile-time cost to the normalization itself, but I think that's unlikely. We're just adding a couple compares and a conditional swap per CLIF instruction, and that time should be dwarfed by the hash-map lookup that immediately follows it.

It's also possible that higher hit-rates in the GVN map make compilation slower, but I can't imagine how.

So I'm assuming that this regression is only due to a conflict with the existing ISLE rules.

A straightforward but tedious solution would be to find all mid-end rules that match a constant on the right, and if the matched operator is commutative, add the corresponding rule with the constant on the left. The associativity constant-folding rules need similar treatment.

The more satisfying fix would be to deal with #6128 and then write rules with the appropriate commutative matches baked in.


Last updated: Oct 23 2024 at 20:03 UTC