cfallin commented on issue #5909:
I have a vague sense that it would be nice to come up with primitives at the ISLE level of some sort so we could write rules about
vconst
hoisting specifically, but I won't block anything here on that. This is a totally reasonable approach and heuristic for now, I think.I was worried about impact on SpiderMonkey.wasm in particular though, because constant rematerialization (of the integer variety) was important there and there is one giant loop in particular, the interpreter loop, with high register pressure. However I think we're fine because remat is a separate mechanism that overrides LICM, and the unchanged filetests in
licm.clif
seem to confirm this. Just to be extra-safe though, would you mind testing SM via Sightglass before we merge?
alexcrichton commented on issue #5909:
Oh I wonder if this works....
/bench_x64
jlb6740 commented on issue #5909:
compilation :: cycles :: benchmarks/bz2/benchmark.wasm
Δ = 20386224.96 ± 9458874.85 (confidence = 99%)
commit.so is 1.05x to 1.13x faster than main.so!
[206699528 221541225.76 248828442] commit.so
[225941572 241927450.72 280238918] main.soinstantiation :: cycles :: benchmarks/pulldown-cmark/benchmark.wasm
No difference in performance.
[189136 206701.52 235040] commit.so
[191910 213628.96 382480] main.soinstantiation :: cycles :: benchmarks/spidermonkey/benchmark.wasm
No difference in performance.
[490272 522666.00 655030] commit.so
[482422 508389.36 534288] main.soinstantiation :: cycles :: benchmarks/bz2/benchmark.wasm
No difference in performance.
[163288 183903.04 236106] commit.so
[165690 179144.64 191834] main.socompilation :: cycles :: benchmarks/pulldown-cmark/benchmark.wasm
No difference in performance.
[274605536 301715860.72 334694818] commit.so
[275865682 300141379.52 353410662] main.soexecution :: cycles :: benchmarks/bz2/benchmark.wasm
No difference in performance.
[112537148 115649290.16 119222514] commit.so
[112444412 116049765.20 117620050] main.soexecution :: cycles :: benchmarks/pulldown-cmark/benchmark.wasm
No difference in performance.
[8872564 9065157.84 9461668] commit.so
[8896812 9034417.84 9306780] main.socompilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm
No difference in performance.
[6765882046 6891988617.28 7001096270] commit.so
[6782510656 6910459005.44 6988000084] main.soexecution :: cycles :: benchmarks/spidermonkey/benchmark.wasm
No difference in performance.
[1036212126 1046099038.96 1067491184] commit.so
[1033616880 1045758186.64 1057583540] main.so
alexcrichton commented on issue #5909:
Locally the numbers I got were:
main.so is 1.06x to 1.08x faster than licm.so!
-compilation :: cycles :: benchmarks/pulldown-cmark/benchmark.wasm
main.so is 1.05x to 1.05x faster than licm.so!
-compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm
licm.so is 1.01x to 1.05x faster than main.so!
-execution :: cycles :: benchmarks/bz2/benchmark.wasm
with everything else at "no difference"
cfallin commented on issue #5909:
main.so is 1.06x to 1.08x faster than licm.so!
-compilation :: cycles :: benchmarks/pulldown-cmark/benchmark.wasm
main.so is 1.05x to 1.05x faster than licm.so!
-compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm
Hmm -- that's kind of unfortunate; I would've hoped for less impact. Probably from increased regalloc time due to higher register pressure? (
perf diff
could tell us?)If so, one way we could lessen the impact here is to apply the heuristic only to certain opcodes -- define a predicate called something like
Opcode::is_expensive_nullary()
true only forvconst
initially and use that to conditionalize your change to the hoist-level.(That raises the question too: what other opcodes are being hoisted?)
alexcrichton commented on issue #5909:
Hm ok as I remeasure this I think that I'm just showing noise on my machine. The exact same
*.cwasm
is produced before and after this change forspidermonkey.wasm
in the sightglass repository, and the input wasm module validates without the simd feature enabled so there's definitely novconst
anywhere in there.I know @jameysharp you mentioned awhile back that you went to great lengths to measure small changes in performance, do you know if it'd be easy-ish to use that setup to measure a change like this?
cfallin commented on issue #5909:
The exact same
*.cwasm
is produced before and after this change forspidermonkey.wasm
in the sightglass repository,Given that, I'm happy to see this merge then! "Exactly the same code produced" is about as rigorous an argument for "no perf impact" as one can make :-)
jameysharp commented on issue #5909:
For future reference, my notes on low-variance profiling are at https://github.com/bytecodealliance/sightglass/blob/main/docs/cpu-isolation.md if that helps.
Last updated: Nov 22 2024 at 17:03 UTC