cfallin commented on issue #4071:
I plan to measure perf impacts using Sightglass tomorrow.
github-actions[bot] commented on issue #4071:
Subscribe to Label Action
cc @cfallin, @fitzgen
<details>
This issue or pull request has been labeled: "cranelift", "cranelift:area:aarch64", "cranelift:area:machinst", "cranelift:area:x64", "isle"

Thus the following users have been cc'd because of the following labels:
- cfallin: isle
- fitzgen: isle
To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.
</details>
cfallin commented on issue #4071:
Sightglass results:
- compilation: spidermonkey compiles 1-5% faster (99% conf); no statistically-significant delta on other benchmarks
- runtime: meshoptimizer runs 0-2% faster (99% conf); no statistically-significant delta on other benchmarks

<details>
<summary>Raw output</summary>
compilation :: nanoseconds :: benchmarks-next/blake3-scalar/benchmark.wasm
No difference in performance.
[96952328 108855882.00 153663990] new.so
[97420583 126474185.60 236630061] old.so

compilation :: cycles :: benchmarks-next/blake3-scalar/benchmark.wasm
No difference in performance.
[290866158 326577946.00 461006510] new.so
[292270968 379434523.20 709912570] old.so

execution :: cycles :: benchmarks-next/blake3-scalar/benchmark.wasm
No difference in performance.
[329790 386819.00 492639] new.so
[359659 432876.40 542195] old.so

execution :: nanoseconds :: benchmarks-next/blake3-scalar/benchmark.wasm
No difference in performance.
[109926 128935.20 164207] new.so
[119882 144286.80 180725] old.so

compilation :: nanoseconds :: benchmarks-next/blake3-simd/benchmark.wasm
No difference in performance.
[25885394 40076919.40 125071043] new.so
[26219789 39964380.85 129711310] old.so

compilation :: cycles :: benchmarks-next/blake3-simd/benchmark.wasm
No difference in performance.
[77658329 120234399.75 375223492] new.so
[78661871 119896890.20 389146306] old.so

execution :: cycles :: benchmarks-next/blake3-simd/benchmark.wasm
No difference in performance.
[359827 511799.55 718133] new.so
[367750 552108.65 931043] old.so

execution :: nanoseconds :: benchmarks-next/blake3-simd/benchmark.wasm
No difference in performance.
[119938 170594.25 239371] new.so
[122579 184029.90 310338] old.so

compilation :: nanoseconds :: benchmarks-next/bz2/benchmark.wasm
No difference in performance.
[118172745 129572819.80 169109539] new.so
[115979079 148610225.20 264135389] old.so

compilation :: cycles :: benchmarks-next/bz2/benchmark.wasm
No difference in performance.
[354528901 388730155.20 507343881] new.so
[347947706 445844089.40 792430008] old.so

execution :: nanoseconds :: benchmarks-next/bz2/benchmark.wasm
No difference in performance.
[27040663 36381782.80 43965436] new.so
[27063345 34608139.00 47209747] old.so

execution :: cycles :: benchmarks-next/bz2/benchmark.wasm
No difference in performance.
[81124432 109148633.40 131900278] new.so
[81192478 103827542.20 141633504] old.so

compilation :: cycles :: benchmarks-next/meshoptimizer/benchmark.wasm
No difference in performance.
[1282376053 1326468067.00 1451645218] new.so
[1272432714 1357904857.20 1654942504] old.so

compilation :: nanoseconds :: benchmarks-next/meshoptimizer/benchmark.wasm
No difference in performance.
[427446460 442143377.80 483867902] new.so
[424132108 452622008.00 551631726] old.so

execution :: cycles :: benchmarks-next/meshoptimizer/benchmark.wasm
Δ = 147589246.80 ± 98683346.93 (confidence = 99%)
new.so is 1.00x to 1.02x faster than old.so!
old.so is 0.98x to 1.00x faster than new.so!
[15953718734 15986742228.20 16019689935] new.so
[16083185443 16134331475.00 16216644357] old.so

execution :: nanoseconds :: benchmarks-next/meshoptimizer/benchmark.wasm
Δ = 49195008.40 ± 32893508.70 (confidence = 99%)
new.so is 1.00x to 1.02x faster than old.so!
old.so is 0.98x to 1.00x faster than new.so!
[5317754171 5328761687.40 5339743942] new.so
[5360908506 5377956695.80 5405393539] old.so

compilation :: cycles :: benchmarks-next/pulldown-cmark/benchmark.wasm
No difference in performance.
[735805278 782317116.80 900108996] new.so
[731913754 837217882.00 1190772847] old.so

compilation :: nanoseconds :: benchmarks-next/pulldown-cmark/benchmark.wasm
No difference in performance.
[245261607 260765122.00 300027990] new.so
[243964468 279064868.20 396913247] old.so

execution :: cycles :: benchmarks-next/pulldown-cmark/benchmark.wasm
No difference in performance.
[6447417 15244423.00 23764193] new.so
[6230053 14582477.00 19118086] old.so

execution :: nanoseconds :: benchmarks-next/pulldown-cmark/benchmark.wasm
No difference in performance.
[2149079 5081332.60 7921177] new.so
[2076626 4860690.20 6372518] old.so

compilation :: nanoseconds :: benchmarks-next/spidermonkey/benchmark.wasm
Δ = 162675183.40 ± 113727657.28 (confidence = 99%)
new.so is 1.01x to 1.05x faster than old.so!
old.so is 0.95x to 0.99x faster than new.so!
[5118140606 5134329955.80 5158376785] new.so
[5253448682 5297005139.20 5386482615] old.so

compilation :: cycles :: benchmarks-next/spidermonkey/benchmark.wasm
Δ = 488040638.00 ± 341193520.58 (confidence = 99%)
new.so is 1.01x to 1.05x faster than old.so!
old.so is 0.95x to 0.99x faster than new.so!
[15354896528 15403466078.40 15475608797] new.so
[15760833306 15891506716.40 16159947443] old.so

execution :: cycles :: benchmarks-next/spidermonkey/benchmark.wasm
No difference in performance.
[9543126199 9582239412.20 9626118333] new.so
[9448596655 9495030913.20 9584393744] old.so

execution :: nanoseconds :: benchmarks-next/spidermonkey/benchmark.wasm
No difference in performance.
[3180943722 3193981056.20 3208606911] new.so
[3149434848 3164912455.40 3194699145] old.so
</details>

I suspect that with alias analysis and redundant-load elimination, this might get a little better still: in some cases the same heap base pointer is loaded multiple times, and these lowering patterns don't handle that (the second pointer is a new address, and the intervening load is a side-effect over which we can't move the first load).
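
To illustrate the kind of redundant-load elimination mentioned above, here is a minimal sketch over a toy straight-line IR. Everything in it (the `Inst` enum, the value numbering, the `eliminate_redundant_loads` function) is hypothetical and is not Cranelift's IR or API. It assumes SSA-like values that are defined once, and it does no alias analysis: any store conservatively invalidates every remembered load.

```rust
use std::collections::HashMap;

/// Toy straight-line IR. All names are hypothetical, not Cranelift's.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Inst {
    /// dst = load [addr]
    Load { dst: u32, addr: u32 },
    /// store [addr] = src
    Store { addr: u32, src: u32 },
    /// dst = a + b
    Add { dst: u32, a: u32, b: u32 },
    /// dst = src (left in place of a removed redundant load)
    Copy { dst: u32, src: u32 },
}

/// Replace a load with a copy when an earlier load from the same address
/// value is still valid. A store clears all remembered loads, because this
/// sketch cannot prove that two addresses do not alias.
fn eliminate_redundant_loads(block: &[Inst]) -> Vec<Inst> {
    // Maps an address value to the value produced by the earlier load.
    let mut available: HashMap<u32, u32> = HashMap::new();
    let mut out = Vec::with_capacity(block.len());
    for &inst in block {
        match inst {
            Inst::Load { dst, addr } => {
                if let Some(&prev) = available.get(&addr) {
                    out.push(Inst::Copy { dst, src: prev });
                } else {
                    available.insert(addr, dst);
                    out.push(inst);
                }
            }
            Inst::Store { .. } => {
                available.clear();
                out.push(inst);
            }
            _ => out.push(inst),
        }
    }
    out
}

fn main() {
    // v0 stands for the address of the heap-base slot; it is loaded twice.
    let block = vec![
        Inst::Load { dst: 1, addr: 0 },   // v1 = heap base
        Inst::Load { dst: 2, addr: 1 },   // v2 = load [v1] (intervening load)
        Inst::Load { dst: 3, addr: 0 },   // v3 = heap base again
        Inst::Add { dst: 4, a: 2, b: 3 }, // v4 = v2 + v3
    ];
    let optimized = eliminate_redundant_loads(&block);
    // The second heap-base load becomes a copy of the first.
    assert_eq!(optimized[2], Inst::Copy { dst: 3, src: 1 });
    println!("{:#?}", optimized);
}
```

Once the second heap-base load is known to yield the same value as the first, a later pattern match would see one address rather than two, which is the situation the comment above expects to help.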
fitzgen commented on issue #4071:
Out of curiosity, have you verified that this properly fuses our `externref` write barriers?
cfallin commented on issue #4071:
> Out of curiosity, have you verified that this properly fuses our `externref` write barriers?

I just took a look, and it seems that this (it looks like the inc/dec is here?) is actually an atomic operation, so this PR does not handle it. There are also direct-on-memory forms for atomic add/sub (`lock add [mem], reg/imm`) which we could do if this becomes important for performance.
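
For context on what an atomic inc/dec looks like at the source level, here is a minimal sketch of an atomic reference count in Rust. The `RefCount` type and its methods are hypothetical and are not Wasmtime's actual `externref` implementation. The increment discards the result of `fetch_add`, which is the shape that a direct-on-memory form such as `lock add [mem], imm` could in principle cover on x64; what any particular backend emits is not shown or claimed here.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical reference-count header, for illustration only.
struct RefCount {
    count: AtomicUsize,
}

impl RefCount {
    /// Start with a single owner.
    fn new() -> Self {
        Self { count: AtomicUsize::new(1) }
    }

    /// Atomic increment; the previous value is deliberately ignored.
    fn inc(&self) {
        self.count.fetch_add(1, Ordering::Relaxed);
    }

    /// Atomic decrement; returns true if this dropped the last reference.
    fn dec(&self) -> bool {
        self.count.fetch_sub(1, Ordering::AcqRel) == 1
    }

    /// Read the current count.
    fn get(&self) -> usize {
        self.count.load(Ordering::Acquire)
    }
}

fn main() {
    let rc = RefCount::new();
    rc.inc();
    assert_eq!(rc.get(), 2);
    assert!(!rc.dec()); // one reference still remains
    assert!(rc.dec()); // last reference dropped
}
```

Because these are atomic read-modify-write operations rather than a plain load, add, and store, they fall outside patterns that match only non-atomic load-op-store sequences, consistent with the comment above.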
fitzgen commented on issue #4071:
Ah that's right, I forgot that our ref counting became atomic.
cfallin commented on issue #4071:
Thanks! Added a runtest as well.
Last updated: Dec 23 2024 at 12:05 UTC