wasmtime / issue #4071 x64 backend: add lowerings with l... · git-wasmtime

Wasmtime GitHub notifications bot (Apr 26 2022 at 01:33):

I plan to measure perf impacts using Sightglass tomorrow.

Wasmtime GitHub notifications bot (Apr 26 2022 at 02:52):

github-actions[bot] commented on issue #4071:

Subscribe to Label Action

cc @cfallin, @fitzgen

<details>
This issue or pull request has been labeled: "cranelift", "cranelift:area:aarch64", "cranelift:area:machinst", "cranelift:area:x64", "isle"

Thus the following users have been cc'd because of the following labels:

cfallin: isle

fitzgen: isle

To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.

</details>

Wasmtime GitHub notifications bot (Apr 26 2022 at 16:50):

cfallin commented on issue #4071:

Sightglass results:

compilation: spidermonkey compiles 1-5% faster (99% conf); no statistically-significant delta on other benchmarks

runtime: meshoptimizer runs 0-2% faster (99% conf); no statistically-significant delta on other benchmarks

<details>
<summary>Raw output</summary>
compilation :: nanoseconds :: benchmarks-next/blake3-scalar/benchmark.wasm

No difference in performance.

[96952328 108855882.00 153663990] new.so
[97420583 126474185.60 236630061] old.so

compilation :: cycles :: benchmarks-next/blake3-scalar/benchmark.wasm

No difference in performance.

[290866158 326577946.00 461006510] new.so
[292270968 379434523.20 709912570] old.so

execution :: cycles :: benchmarks-next/blake3-scalar/benchmark.wasm

No difference in performance.

[329790 386819.00 492639] new.so
[359659 432876.40 542195] old.so

execution :: nanoseconds :: benchmarks-next/blake3-scalar/benchmark.wasm

No difference in performance.

[109926 128935.20 164207] new.so
[119882 144286.80 180725] old.so

compilation :: nanoseconds :: benchmarks-next/blake3-simd/benchmark.wasm

No difference in performance.

[25885394 40076919.40 125071043] new.so
[26219789 39964380.85 129711310] old.so

compilation :: cycles :: benchmarks-next/blake3-simd/benchmark.wasm

No difference in performance.

[77658329 120234399.75 375223492] new.so
[78661871 119896890.20 389146306] old.so

execution :: cycles :: benchmarks-next/blake3-simd/benchmark.wasm

No difference in performance.

[359827 511799.55 718133] new.so
[367750 552108.65 931043] old.so

execution :: nanoseconds :: benchmarks-next/blake3-simd/benchmark.wasm

No difference in performance.

[119938 170594.25 239371] new.so
[122579 184029.90 310338] old.so

compilation :: nanoseconds :: benchmarks-next/bz2/benchmark.wasm

No difference in performance.

[118172745 129572819.80 169109539] new.so
[115979079 148610225.20 264135389] old.so

compilation :: cycles :: benchmarks-next/bz2/benchmark.wasm

No difference in performance.

[354528901 388730155.20 507343881] new.so
[347947706 445844089.40 792430008] old.so

execution :: nanoseconds :: benchmarks-next/bz2/benchmark.wasm

No difference in performance.

[27040663 36381782.80 43965436] new.so
[27063345 34608139.00 47209747] old.so

execution :: cycles :: benchmarks-next/bz2/benchmark.wasm

No difference in performance.

[81124432 109148633.40 131900278] new.so
[81192478 103827542.20 141633504] old.so

compilation :: cycles :: benchmarks-next/meshoptimizer/benchmark.wasm

No difference in performance.

[1282376053 1326468067.00 1451645218] new.so
[1272432714 1357904857.20 1654942504] old.so

compilation :: nanoseconds :: benchmarks-next/meshoptimizer/benchmark.wasm

No difference in performance.

[427446460 442143377.80 483867902] new.so
[424132108 452622008.00 551631726] old.so

execution :: cycles :: benchmarks-next/meshoptimizer/benchmark.wasm

Δ = 147589246.80 ± 98683346.93 (confidence = 99%)

new.so is 1.00x to 1.02x faster than old.so!
old.so is 0.98x to 1.00x faster than new.so!

[15953718734 15986742228.20 16019689935] new.so
[16083185443 16134331475.00 16216644357] old.so

execution :: nanoseconds :: benchmarks-next/meshoptimizer/benchmark.wasm

Δ = 49195008.40 ± 32893508.70 (confidence = 99%)

new.so is 1.00x to 1.02x faster than old.so!
old.so is 0.98x to 1.00x faster than new.so!

[5317754171 5328761687.40 5339743942] new.so
[5360908506 5377956695.80 5405393539] old.so

compilation :: cycles :: benchmarks-next/pulldown-cmark/benchmark.wasm

No difference in performance.

[735805278 782317116.80 900108996] new.so
[731913754 837217882.00 1190772847] old.so

compilation :: nanoseconds :: benchmarks-next/pulldown-cmark/benchmark.wasm

No difference in performance.

[245261607 260765122.00 300027990] new.so
[243964468 279064868.20 396913247] old.so

execution :: cycles :: benchmarks-next/pulldown-cmark/benchmark.wasm

No difference in performance.

[6447417 15244423.00 23764193] new.so
[6230053 14582477.00 19118086] old.so

execution :: nanoseconds :: benchmarks-next/pulldown-cmark/benchmark.wasm

No difference in performance.

[2149079 5081332.60 7921177] new.so
[2076626 4860690.20 6372518] old.so

compilation :: nanoseconds :: benchmarks-next/spidermonkey/benchmark.wasm

Δ = 162675183.40 ± 113727657.28 (confidence = 99%)

new.so is 1.01x to 1.05x faster than old.so!
old.so is 0.95x to 0.99x faster than new.so!

[5118140606 5134329955.80 5158376785] new.so
[5253448682 5297005139.20 5386482615] old.so

compilation :: cycles :: benchmarks-next/spidermonkey/benchmark.wasm

Δ = 488040638.00 ± 341193520.58 (confidence = 99%)

new.so is 1.01x to 1.05x faster than old.so!
old.so is 0.95x to 0.99x faster than new.so!

[15354896528 15403466078.40 15475608797] new.so
[15760833306 15891506716.40 16159947443] old.so

execution :: cycles :: benchmarks-next/spidermonkey/benchmark.wasm

No difference in performance.

[9543126199 9582239412.20 9626118333] new.so
[9448596655 9495030913.20 9584393744] old.so

execution :: nanoseconds :: benchmarks-next/spidermonkey/benchmark.wasm

No difference in performance.

[3180943722 3193981056.20 3208606911] new.so
[3149434848 3164912455.40 3194699145] old.so
</details>
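
The "1.01x to 1.05x faster" ranges in the raw output can be related to the Δ and ± values with a little arithmetic. The following is a minimal sketch, not Sightglass's own code: the `Comparison` struct and the formula (bounds computed as 1 + (Δ ± half-width) / mean of new.so) are assumptions chosen because they reproduce the printed numbers. The means and half-width are copied from the spidermonkey compilation (nanoseconds) entry.

```rust
/// Summary of one comparison, with values copied from the raw output.
/// Hypothetical type for illustration; not part of Sightglass.
struct Comparison {
    mean_new: f64,
    mean_old: f64,
    /// The "±" term reported at 99% confidence.
    half_width: f64,
}

impl Comparison {
    /// Difference of means (old minus new); positive means new.so is faster.
    fn delta(&self) -> f64 {
        self.mean_old - self.mean_new
    }

    /// Assumed formula: speedup bounds relative to the mean of new.so.
    fn speedup_range(&self) -> (f64, f64) {
        let d = self.delta();
        (
            1.0 + (d - self.half_width) / self.mean_new,
            1.0 + (d + self.half_width) / self.mean_new,
        )
    }

    /// The delta is treated as significant when the interval excludes zero.
    fn is_significant(&self) -> bool {
        self.delta().abs() > self.half_width
    }
}

/// compilation :: nanoseconds :: spidermonkey, from the raw output.
fn spidermonkey_compile_ns() -> Comparison {
    Comparison {
        mean_new: 5134329955.80,
        mean_old: 5297005139.20,
        half_width: 113727657.28,
    }
}

fn main() {
    let c = spidermonkey_compile_ns();
    let (lo, hi) = c.speedup_range();
    println!("delta = {:.2}, significant = {}", c.delta(), c.is_significant());
    println!("new.so is {:.2}x to {:.2}x faster than old.so", lo, hi);
}
```

Under this assumed formula the spidermonkey entry works out to 1.01x to 1.05x, which matches the summary line "spidermonkey compiles 1-5% faster".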

I suspect that with alias analysis and redundant-load elimination, this might get a little better still: in some cases the same heap base pointer is loaded multiple times, and these lowering patterns don't handle that (the second pointer is a new address, and the intervening load is a side-effect over which we can't move the first load).
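
To make the redundant-load point concrete, here is a toy sketch of redundant-load elimination over a made-up three-instruction IR. It is not Cranelift's IR or its implementation: the `Inst` enum, the value numbering, and the function name are all hypothetical. The pass is also deliberately conservative and treats every store as clobbering every known load, whereas a real alias analysis could keep the heap-base load available across stores to unrelated memory.

```rust
use std::collections::HashMap;

/// Toy instruction set over numbered values; hypothetical, not Cranelift's IR.
#[derive(Debug, Clone, PartialEq)]
enum Inst {
    /// dst = load [addr]
    Load { dst: u32, addr: u32 },
    /// store [addr] = src
    Store { addr: u32, src: u32 },
    /// dst = a + b
    Add { dst: u32, a: u32, b: u32 },
}

/// Follow the rename map so uses of a removed load refer to the earlier one.
fn resolve(rename: &HashMap<u32, u32>, v: u32) -> u32 {
    *rename.get(&v).unwrap_or(&v)
}

/// Drop a load if the same address was already loaded with no store in between.
fn eliminate_redundant_loads(insts: &[Inst]) -> Vec<Inst> {
    // Maps an address value to the value that already holds its loaded data.
    let mut available: HashMap<u32, u32> = HashMap::new();
    let mut rename: HashMap<u32, u32> = HashMap::new();
    let mut out = Vec::new();
    for inst in insts {
        match *inst {
            Inst::Load { dst, addr } => {
                let addr = resolve(&rename, addr);
                if let Some(&prev) = available.get(&addr) {
                    // Redundant: reuse the earlier result instead of reloading.
                    rename.insert(dst, prev);
                } else {
                    available.insert(addr, dst);
                    out.push(Inst::Load { dst, addr });
                }
            }
            Inst::Store { addr, src } => {
                // Conservative assumption: a store may alias any earlier load.
                available.clear();
                out.push(Inst::Store {
                    addr: resolve(&rename, addr),
                    src: resolve(&rename, src),
                });
            }
            Inst::Add { dst, a, b } => {
                out.push(Inst::Add {
                    dst,
                    a: resolve(&rename, a),
                    b: resolve(&rename, b),
                });
            }
        }
    }
    out
}

fn main() {
    // v0 stands in for a context pointer; [v0] holds the heap base.
    let insts = vec![
        Inst::Load { dst: 1, addr: 0 },      // v1 = heap base
        Inst::Add { dst: 2, a: 1, b: 10 },   // v2 = base + offset
        Inst::Load { dst: 3, addr: 2 },      // v3 = heap load
        Inst::Load { dst: 4, addr: 0 },      // v4 = heap base again (redundant)
        Inst::Add { dst: 5, a: 4, b: 11 },   // v5 = base + other offset
        Inst::Load { dst: 6, addr: 5 },      // v6 = second heap load
    ];
    for inst in eliminate_redundant_loads(&insts) {
        println!("{:?}", inst);
    }
}
```

In the example, the second load of the heap base disappears and the second address computation reuses `v1`, so both heap loads now hang off a single base value.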

Wasmtime GitHub notifications bot (Apr 26 2022 at 16:50):

fitzgen commented on issue #4071:

Out of curiosity, have you verified that this properly fuses our externref write barriers?

Wasmtime GitHub notifications bot (Apr 26 2022 at 17:01):

cfallin commented on issue #4071:

> Out of curiosity, have you verified that this properly fuses our externref write barriers?

I just took a look, and it seems that the reference-count inc/dec in the write barrier is actually an atomic operation, so this PR does not handle it. There are also direct-on-memory forms for atomic add/sub (<code>lock add [mem], reg/imm</code>), which we could add lowerings for if this becomes important for performance.
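
The distinction between the two kinds of increment can be illustrated at the source level. This is only a sketch in ordinary Rust, compiled by rustc rather than by Cranelift, and the function names and the `SeqCst` ordering are assumptions for illustration. A plain read-modify-write on memory is the shape that load-op-store fusion targets, while an atomic increment whose result is discarded is the shape that a lock-prefixed direct-on-memory form would serve.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Non-atomic increment: a load, an add, and a store to the same address.
/// On x86-64 this shape can typically become a single `add [mem], imm`.
fn inc_plain(count: &mut usize) {
    *count += 1;
}

/// Atomic increment with the previous value discarded.
/// On x86-64 this shape can typically use a lock-prefixed add on memory.
fn inc_atomic(count: &AtomicUsize) {
    count.fetch_add(1, Ordering::SeqCst);
}

fn main() {
    let mut plain = 0usize;
    inc_plain(&mut plain);

    let atomic = AtomicUsize::new(0);
    inc_atomic(&atomic);

    println!("plain = {}, atomic = {}", plain, atomic.load(Ordering::SeqCst));
}
```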

Wasmtime GitHub notifications bot (Apr 26 2022 at 17:02):

fitzgen commented on issue #4071:

Ah that's right, I forgot that our ref counting became atomic.

Wasmtime GitHub notifications bot (Apr 26 2022 at 21:23):

cfallin commented on issue #4071:

Thanks! Added a runtest as well.

Last updated: Apr 09 2025 at 19:03 UTC