Stream: git-wasmtime

Topic: wasmtime / issue #3942 Transition to regalloc2


Wasmtime GitHub notifications bot (Mar 17 2022 at 18:31):

cfallin opened issue #3942:

This issue is meant to track the status of migrating Cranelift to use regalloc2, our new register allocator. We started this work a while ago, and as detailed in our 2022 roadmap, we plan to finish the migration this year.

The major tasks remaining are:

The last task has been under development for the past 2.5 weeks or so. I'll make my private branch public shortly, after a bit of cleanup. Its current status is that it is fully functional (passes tests, runs benchmarks) on x86-64. There is work to do to move the other two backends over (aarch64, s390x) and I will do this before we merge. (I might not be able to do this before Mon Mar 28; I'm out-of-office and offline all of next week unfortunately, but wanted to get these results out first!)

The nature of the changes to Cranelift is such that we have to do the transition atomically and remove regalloc.rs support at the same time; the whole MachInst infrastructure is basically built up around the regalloc abstractions, so swapping it out has a large effect. Fortunately, though, I think there is not too much of a downside (aside from the usual code-churn risk, which we mitigate with ongoing fuzzing and careful review) -- performance numbers look good.

Here is a current snapshot of some benchmark results:

Benchmark       Compilation (wallclock)     Execution (wallclock)
blake3-scalar   25% faster                  28% faster
blake3-simd     no diff                     no diff
meshoptimizer   19% faster                  17% faster
pulldown-cmark  17% faster                  no diff
bz2             15% faster                  no diff
SpiderMonkey,   21% faster                  2% faster
  fib(30)
clang.wasm      42% faster                  N/A

with full details here:

<details>
<summary>Benchmark methodology and raw output</summary>
As percentage improvement over baseline (old):

Benchmark       Compilation (wallclock)     Execution (wallclock)
blake3-scalar   25% faster                  28% faster
blake3-simd     no diff                     no diff
meshoptimizer   19% faster                  17% faster
pulldown-cmark  17% faster                  no diff
bz2             15% faster                  no diff
SpiderMonkey,   21% faster                  2% faster
  fib(30)
clang.wasm      42% faster                  N/A

As ratios (percent improvement above = 100% * (1 - 1/speedup_ratio))

Benchmark       Compilation (wallclock)     Execution (wallclock)
blake3-scalar   1.34x faster                1.38x faster
blake3-simd     no diff                     no diff
meshoptimizer   1.24x faster                1.21x faster
pulldown-cmark  1.21x faster                no diff
bz2             1.18x faster                no diff
SpiderMonkey,   1.26x faster                1.02x faster
  fib(30)
clang.wasm      1.71x faster                N/A
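
The conversion between the two tables can be checked with a few lines of Python. This is only a sketch of the formula quoted above (percent improvement = 100% * (1 - 1/speedup_ratio)); the function name `percent_improvement` and the dictionary layout are illustrative choices, not part of Sightglass or Wasmtime.

```python
def percent_improvement(speedup_ratio: float) -> float:
    """Convert an 'N x faster' ratio into the percentage of time saved."""
    if speedup_ratio <= 0:
        raise ValueError("speedup ratio must be positive")
    return 100.0 * (1.0 - 1.0 / speedup_ratio)


# Compilation-time ratios copied from the table above.
compile_ratios = {
    "blake3-scalar": 1.34,
    "meshoptimizer": 1.24,
    "pulldown-cmark": 1.21,
    "bz2": 1.18,
    "SpiderMonkey, fib(30)": 1.26,
    "clang.wasm": 1.71,
}

for name, ratio in compile_ratios.items():
    print(f"{name}: {ratio:.2f}x faster is about {percent_improvement(ratio):.0f}% faster")
```

Rounded to whole percents, these values line up with the percentage table.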

Methodology:

Comparing baseline of Wasmtime fdf063df98ad3839b0e0b78ea55b53b1a296abb0 (from
Mar 16) against my internal regalloc2 branch
9b89942cf62d262ee9ac3e7eab525ea8544a458b (from Mar 17) which last synced with
Wasmtime at eb1b71e31c035ff4250c5013ca0268deb931aa7c (from Feb 24).

Raw output of Sightglass below (instantiation excluded, not interesting).


compilation :: cycles :: benchmarks-next/blake3-scalar/benchmark.wasm

Δ = 121531866.00 ± 51042761.18 (confidence = 99%)

new.so is 1.14x to 1.34x faster than old.so!
old.so is 0.72x to 0.89x faster than new.so!

[478052996 501410277.40 591983000] new.so
[604955098 622942143.40 709527450] old.so

compilation :: nanoseconds :: benchmarks-next/blake3-scalar/benchmark.wasm

Δ = 31981472.00 ± 13432120.92 (confidence = 99%)

new.so is 1.14x to 1.34x faster than old.so!
old.so is 0.72x to 0.89x faster than new.so!

[125802142 131948268.40 155782325] new.so
[159196645 163929740.40 186715328] old.so

execution :: nanoseconds :: benchmarks-next/blake3-scalar/benchmark.wasm

Δ = 36931.50 ± 3272.72 (confidence = 99%)

new.so is 1.32x to 1.38x faster than old.so!
old.so is 0.72x to 0.77x faster than new.so!

[105358 106660.00 110728] new.so
[140608 143591.50 149787] old.so

execution :: cycles :: benchmarks-next/blake3-scalar/benchmark.wasm

Δ = 140341.60 ± 12437.21 (confidence = 99%)

new.so is 1.32x to 1.38x faster than old.so!
old.so is 0.72x to 0.77x faster than new.so!

[400368 405315.60 420774] new.so
[534318 545657.20 569202] old.so


compilation :: cycles :: benchmarks-next/blake3-simd/benchmark.wasm

No difference in performance.

[112727304 139448014.80 189082604] new.so
[123143218 156732493.40 233512432] old.so

compilation :: nanoseconds :: benchmarks-next/blake3-simd/benchmark.wasm

No difference in performance.

[29664800 36696541.20 49758219] new.so
[32405712 41244760.40 61449541] old.so

execution :: cycles :: benchmarks-next/blake3-simd/benchmark.wasm

No difference in performance.

[400672 739521.80 1042226] new.so
[498142 828791.40 1160786] old.so

execution :: nanoseconds :: benchmarks-next/blake3-simd/benchmark.wasm

No difference in performance.

[105439 194609.20 274267] new.so
[131088 218099.20 305464] old.so


compilation :: cycles :: benchmarks-next/meshoptimizer/benchmark.wasm

Δ = 483775336.20 ± 24646158.96 (confidence = 99%)

new.so is 1.22x to 1.24x faster than old.so!
old.so is 0.80x to 0.82x faster than new.so!

[2090515508 2113482784.00 2150210240] new.so
[2554359582 2597258120.20 2630111328] old.so

compilation :: nanoseconds :: benchmarks-next/meshoptimizer/benchmark.wasm

Δ = 127275628.40 ± 6480546.57 (confidence = 99%)

new.so is 1.22x to 1.24x faster than old.so!
old.so is 0.80x to 0.82x faster than new.so!

[550127669 556172437.60 565836581] new.so
[672188482 683448066.00 692063546] old.so

execution :: cycles :: benchmarks-next/meshoptimizer/benchmark.wasm

Δ = 3386913742.00 ± 454568778.61 (confidence = 99%)

new.so is 1.16x to 1.21x faster than old.so!
old.so is 0.82x to 0.86x faster than new.so!

[17786842514 17978520795.40 18352029814] new.so
[20863697992 21365434537.40 22139271504] old.so

execution :: nanoseconds :: benchmarks-next/meshoptimizer/benchmark.wasm

Δ = 891020039.40 ± 119694835.02 (confidence = 99%)

new.so is 1.16x to 1.21x faster than old.so!
old.so is 0.82x to 0.86x faster than new.so!

[4680694128 4731128047.40 4829411387] new.so
[5489883512 5622148086.80 5826025212] old.so


compilation :: cycles :: benchmarks-next/pulldown-cmark/benchmark.wasm

Δ = 213252595.20 ± 29303757.92 (confidence = 99%)

new.so is 1.16x to 1.21x faster than old.so!
old.so is 0.82x to 0.86x faster than new.so!

[1120180378 1148350389.80 1203069094] new.so
[1340768136 1361602985.00 1397014596] old.so

compilation :: nanoseconds :: benchmarks-next/pulldown-cmark/benchmark.wasm

Δ = 56118120.00 ± 7711578.76 (confidence = 99%)

new.so is 1.16x to 1.21x faster than old.so!
old.so is 0.82x to 0.86x faster than new.so!

[294780634 302193792.40 316593182] new.so
[352828441 358311912.40 367631343] old.so

execution :: cycles :: benchmarks-next/pulldown-cmark/benchmark.wasm

No difference in performance.

[8257780 8443755.80 8560944] new.so
[8455570 9495162.60 17648568] old.so

execution :: nanoseconds :: benchmarks-next/pulldown-cmark/benchmark.wasm

No difference in performance.

[2173072 2222013.50 2252853] new.so
[2225116 2498693.60 4644290] old.so


compilation :: cycles :: benchmarks-next/bz2/benchmark.wasm

Δ = 58684068.80 ± 36909440.37 (confidence = 99%)

new.so is 1.04x to 1.18x faster than old.so!
old.so is 0.84x to 0.96x faster than new.so!

[498967588 545831464.20 586460840] new.so
[540660276 604515533.00 635005118] old.so

compilation :: nanoseconds :: benchmarks-next/bz2/benchmark.wasm

Δ = 15436153.00 ± 9714229.01 (confidence = 99%)

new.so is 1.04x to 1.18x faster than old.so!
old.so is 0.84x to 0.96x faster than new.so!

[131305387 143637939.40 154329874] new.so
[142264400 159074092.40 167089438] old.so

execution :: nanoseconds :: benchmarks-next/bz2/benchmark.wasm

No difference in performance.

[25932760 35978222.50 53794238] new.so
[28960083 29737468.90 35137211] old.so

execution :: cycles :: benchmarks-next/bz2/benchmark.wasm

No difference in performance.

[98545894 136719075.20 204420658] new.so
[110059628 113008690.20 133522880] old.so
</details>

Wasmtime GitHub notifications bot (Mar 17 2022 at 18:32):

cfallin edited issue #3942:

This issue is meant to track the status of migrating Cranelift to use regalloc2, our new register allocator. We started this work a while ago, and as detailed in our 2022 roadmap, we plan to finish the migration this year.

The major tasks remaining are:

The last task has been under development for the past 2.5 weeks or so. I'll make my private branch public shortly, after a bit of cleanup. Its current status is that it is fully functional (passes tests, runs benchmarks) on x86-64. There is work to do to move the other two backends over (aarch64, s390x) and I will do this before we merge. (I might not be able to do this before Mon Mar 28; I'm out-of-office and offline all of next week unfortunately, but wanted to get these results out first!)

The nature of the changes to Cranelift is such that we have to do the transition atomically and remove regalloc.rs support at the same time; the whole MachInst infrastructure is basically built up around the regalloc abstractions, so swapping it out has a large effect. Fortunately, though, I think there is not too much of a downside (aside from the usual code-churn risk, which we mitigate with ongoing fuzzing and careful review) -- performance numbers look good.

Here is a current snapshot of some benchmark results:

Benchmark       Compilation (wallclock)     Execution (wallclock)
blake3-scalar   25% faster                  28% faster
blake3-simd     no diff                     no diff
meshoptimizer   19% faster                  17% faster
pulldown-cmark  17% faster                  no diff
bz2             15% faster                  no diff
SpiderMonkey,   21% faster                  2% faster
  fib(30)
clang.wasm      42% faster                  N/A

with full details here:

<details>
<summary>Benchmark methodology and raw output</summary>

As percentage improvement over baseline (old):

Benchmark       Compilation (wallclock)     Execution (wallclock)
blake3-scalar   25% faster                  28% faster
blake3-simd     no diff                     no diff
meshoptimizer   19% faster                  17% faster
pulldown-cmark  17% faster                  no diff
bz2             15% faster                  no diff
SpiderMonkey,   21% faster                  2% faster
  fib(30)
clang.wasm      42% faster                  N/A

As ratios (percent improvement above = 100% * (1 - 1/speedup_ratio))

Benchmark       Compilation (wallclock)     Execution (wallclock)
blake3-scalar   1.34x faster                1.38x faster
blake3-simd     no diff                     no diff
meshoptimizer   1.24x faster                1.21x faster
pulldown-cmark  1.21x faster                no diff
bz2             1.18x faster                no diff
SpiderMonkey,   1.26x faster                1.02x faster
  fib(30)
clang.wasm      1.71x faster                N/A

Methodology:

- Sightglass with --processes 2 --iterations-per-process 5.
- Last two benchmarks running commandline wasmtime
  - rm -r ~/.cache/wasmtime
  - run `wasmtime run` once to ensure compiled
  - measure runtime 5x, take best of five
  - measure compile time with `wasmtime compile` 5x, take best of five
  - clang.wasm doesn't have a test harness, so is compile-only
- Testing on 12-core / 24-thread Ryzen 3900X, Linux/x86-64

Comparing baseline of Wasmtime fdf063df98ad3839b0e0b78ea55b53b1a296abb0 (from
Mar 16) against my internal regalloc2 branch
9b89942cf62d262ee9ac3e7eab525ea8544a458b (from Mar 17) which last synced with
Wasmtime at eb1b71e31c035ff4250c5013ca0268deb931aa7c (from Feb 24).
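
The "measure 5x, take best of five" step for the command-line benchmarks could be scripted roughly as follows. This is a sketch under assumptions, not the script actually used: the helper name `best_of` is hypothetical, and the example command in the comment assumes a `wasmtime` binary on the PATH and a local `clang.wasm` file.

```python
import subprocess
import time


def best_of(cmd, runs=5):
    """Run `cmd` `runs` times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True makes a failing run raise instead of being silently timed.
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return min(timings)


# Hypothetical usage, assuming wasmtime is installed and clang.wasm exists:
#   print(best_of(["wasmtime", "compile", "clang.wasm"]))
```

Taking the minimum rather than the mean is a common way to reduce the influence of scheduling noise on a lightly loaded machine.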

Raw output of Sightglass below (instantiation excluded, not interesting).

----

compilation :: cycles :: benchmarks-next/blake3-scalar/benchmark.wasm

  Δ = 121531866.00 ± 51042761.18 (confidence = 99%)

  new.so is 1.14x to 1.34x faster than old.so!
  old.so is 0.72x to 0.89x faster than new.so!

  [478052996 501410277.40 591983000] new.so
  [604955098 622942143.40 709527450] old.so

compilation :: nanoseconds :: benchmarks-next/blake3-scalar/benchmark.wasm

  Δ = 31981472.00 ± 13432120.92 (confidence = 99%)

  new.so is 1.14x to 1.34x faster than old.so!
  old.so is 0.72x to 0.89x faster than new.so!

  [125802142 131948268.40 155782325] new.so
  [159196645 163929740.40 186715328] old.so

execution :: nanoseconds :: benchmarks-next/blake3-scalar/benchmark.wasm

  Δ = 36931.50 ± 3272.72 (confidence = 99%)

  new.so is 1.32x to 1.38x faster than old.so!
  old.so is 0.72x to 0.77x faster than new.so!

  [105358 106660.00 110728] new.so
  [140608 143591.50 149787] old.so

execution :: cycles :: benchmarks-next/blake3-scalar/benchmark.wasm

  Δ = 140341.60 ± 12437.21 (confidence = 99%)

  new.so is 1.32x to 1.38x faster than old.so!
  old.so is 0.72x to 0.77x faster than new.so!

  [400368 405315.60 420774] new.so
  [534318 545657.20 569202] old.so

----

compilation :: cycles :: benchmarks-next/blake3-simd/benchmark.wasm

  No difference in performance.

  [112727304 139448014.80 189082604] new.so
  [123143218 156732493.40 233512432] old.so

compilation :: nanoseconds :: benchmarks-next/blake3-simd/benchmark.wasm

  No difference in performance.

  [29664800 36696541.20 49758219] new.so
  [32405712 41244760.40 61449541] old.so

execution :: cycles :: benchmarks-next/blake3-simd/benchmark.wasm

  No difference in performance.

  [400672 739521.80 1042226] new.so
  [498142 828791.40 1160786] old.so

execution :: nanoseconds :: benchmarks-next/blake3-simd/benchmark.wasm

  No difference in performance.

  [105439 194609.20 274267] new.so
  [131088 218099.20 305464] old.so

----

compilation :: cycles :: benchmarks-next/meshoptimizer/benchmark.wasm

  Δ = 483775336.20 ± 24646158.96 (confidence = 99%)

  new.so is 1.22x to 1.24x faster than old.so!
  old.so is 0.80x to 0.82x faster than new.so!

  [2090515508 2113482784.00 2150210240] new.so
  [2554359582 2597258120.20 2630111328] old.so

compilation :: nanoseconds :: benchmarks-next/meshoptimizer/benchmark.wasm

  Δ = 127275628.40 ± 6480546.57 (confidence = 99%)

  new.so is 1.22x to 1.24x faster than old.so!
  old.so is 0.80x to 0.82x faster than new.so!

  [550127669 556172437.60 565836581] new.so
  [672188482 683448066.00 692063546] old.so

execution :: cycles :: benchmarks-next/meshoptimizer/benchmark.wasm

  Δ = 3386913742.00 ± 454568778.61 (confidence = 99%)

  new.so is 1.16x to 1.21x faster than old.so!
  old.so is 0.82x to 0.86x faster than new.so!

  [17786842514 17978520795.40 18352029814] new.so
  [20863697992 21365434537.40 22139271504] old.so

execution :: nanoseconds :: benchmarks-next/meshoptimizer/benchmark.wasm

  Δ = 891020039.40 ± 119694835.02 (confidence = 99%)

  new.so is 1.16x to 1.21x faster than old.so!
  old.so is 0.82x to 0.86x faster than new.so!

  [4680694128 4731128047.40 4829411387] new.so
  [5489883512 5622148086.80 5826025212] old.so

----

compilation :: cycles :: benchmarks-next/pulldown-cmark/benchmark.wasm

  Δ = 213252595.20 ± 29303757.92 (confidence = 99%)

  new.so is 1.16x to 1.21x faster than old.so!
  old.so is 0.82x to 0.86x faster than new.so!

  [1120180378 1148350389.80 1203069094] new.so
  [1340768136 1361602985.00 1397014596] old.so

compilation :: nanoseconds :: benchmarks-next/pulldown-cmark/benchmark.wasm

  Δ = 56118120.00 ± 7711578.76 (confidence = 99%)

  new.so is 1.16x to 1.21x faster than old.so!
  old.so is 0.82x to 0.86x faster than new.so!

  [294780634 302193792.40 316593182] new.so
  [352828441 358311912.40 367631343] old.so

execution :: cycles :: benchmarks-next/pulldown-cmark/benchmark.wasm

  No difference in performance.

  [8257780 8443755.80 8560944] new.so
  [8455570 9495162.60 17648568] old.so

execution :: nanoseconds :: benchmarks-next/pulldown-cmark/benchmark.wasm

  No difference in performance.

  [2173072 2222013.50 2252853] new.so
  [2225116 2498693.60 4644290] old.so

----

compilation :: cycles :: benchmarks-next/bz2/benchmark.wasm

  Δ = 58684068.80 ± 36909440.37 (confidence = 99%)

  new.so is 1.04x to 1.18x faster than old.so!
  old.so is 0.84x to 0.96x faster than new.so!

  [498967588 545831464.20 586460840] new.so
  [540660276 604515533.00 635005118] old.so

compilation :: nanoseconds :: benchmarks-next/bz2/benchmark.wasm

  Δ = 15436153.00 ± 9714229.01 (confidence = 99%)

  new.so is 1.04x to 1.18x faster than old.so!
  old.so is 0.84x to 0.96x faster than new.so!

  [131305387 143637939.40 154329874] new.so
  [142264400 159074092.40 167089438] old.so

execution :: nanoseconds :: benchmarks-next/bz2/benchmark.wasm

  No difference in performance.

  [25932760 35978222.50 53794238] new.so
  [28960083 29737468.90 35137211] old.so

execution :: cycles :: benchmarks-next/bz2/benchmark.wasm

  No difference in performance.

  [98545894 136719075.20 204420658] new.so
  [110059628 113008690.20 133522880] old.so

</details>
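
The figures in each raw Sightglass block above appear to be related in a simple way: Δ matches the difference of the two means (the middle number in each bracket), and the "1.14x to 1.34x faster" range matches 1 + (Δ ∓ interval) / mean(new.so). The sketch below reproduces the first block from those numbers. The relationship is inferred from the printed output, not taken from the Sightglass source, and `speedup_range` is a hypothetical helper name.

```python
def speedup_range(mean_new, mean_old, delta_ci):
    """Return (delta, low, high): the difference of means and the speedup bounds.

    Assumes the bounds are 1 + (delta -/+ delta_ci) / mean_new, which is
    consistent with the output above but is an inference, not documented behavior.
    """
    delta = mean_old - mean_new
    low = 1.0 + (delta - delta_ci) / mean_new
    high = 1.0 + (delta + delta_ci) / mean_new
    return delta, low, high


# compilation :: cycles :: blake3-scalar, means and interval copied from above.
delta, low, high = speedup_range(
    mean_new=501410277.40,
    mean_old=622942143.40,
    delta_ci=51042761.18,
)
print(f"delta = {delta:.2f}; new.so is {low:.2f}x to {high:.2f}x faster than old.so")
```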

Wasmtime GitHub notifications bot (Mar 17 2022 at 18:33):

cfallin edited issue #3942:

This issue is meant to track the status of migrating Cranelift to use regalloc2, our new register allocator. We started this work a while ago, and as detailed in our 2022 roadmap, we plan to finish the migration this year.

The major tasks remaining are:

The last task has been under development for the past 2.5 weeks or so. I'll make my private branch public shortly, after a bit of cleanup. Its current status is that it is fully functional (passes tests, runs benchmarks) on x86-64. There is work to do to move the other two backends over (aarch64, s390x) and I will do this before we merge. (I might not be able to do this before Mon Mar 28; I'm out-of-office and offline all of next week unfortunately, but wanted to get these results out first!)

The nature of the changes to Cranelift are such that we do have to do the transition atomically and remove regalloc.rs support at the same time; the whole MachInst infrastructure is basically built up around the regalloc abstractions, so swapping it out has a large effect. Fortunately though I think there is not too much of a downside (aside from the usual code-churn risk, which we mitigate with ongoing fuzzing and careful review) -- performance numbers look good.

Here is a current snapshot of some benchmark results:

Benchmark       Compilation (wallclock)     Execution (wallclock)
blake3-scalar   25% faster                  28% faster
blake3-simd     no diff                     no diff
meshoptimizer   19% faster                  17% faster
pulldown-cmark  17% faster                  no diff
bz2             15% faster                  no diff
SpiderMonkey,   21% faster                  2% faster
  fib(30)
clang.wasm      42% faster                  N/A

with full details here:

<details>
<summary>Benchmark methodology and raw output</summary>
<pre>
As percentage improvement over baseline (old):

Benchmark Compilation (wallclock) Execution (wallclock)
blake3-scalar 25% faster 28% faster
blake3-simd no diff no diff
meshoptimizer 19% faster 17% faster
pulldown-cmark 17% faster no diff
bz2 15% faster no diff
SpiderMonkey, 21% faster 2% faster
fib(30)
clang.wasm 42% faster N/A

As ratios (percent improvement above = 100% * (1 - 1/speedup_ratio))

Benchmark Compilation (wallclock) Execution (wallclock)
blake3-scalar 1.34x faster 1.38x faster
blake3-simd no diff no diff
meshoptimizer 1.24x faster 1.21x faster
pulldown-cmark 1.21x faster no diff
bz2 1.18x faster no diff
SpiderMonkey, 1.26x faster 1.02x faster
fib(30)
clang.wasm 1.71x faster N/A

Methodology:

Comparing baseline of Wasmtime fdf063df98ad3839b0e0b78ea55b53b1a296abb0 (from
Mar 16) against my internal regalloc2 branch
9b89942cf62d262ee9ac3e7eab525ea8544a458b (from Mar 17) which last synced with
Wasmtime at eb1b71e31c035ff4250c5013ca0268deb931aa7c (from Feb 24).

Raw output of Sightglass below (instantiation excluded, not interesting).


compilation :: cycles :: benchmarks-next/blake3-scalar/benchmark.wasm

Δ = 121531866.00 ± 51042761.18 (confidence = 99%)

new.so is 1.14x to 1.34x faster than old.so!
old.so is 0.72x to 0.89x faster than new.so!

[478052996 501410277.40 591983000] new.so
[604955098 622942143.40 709527450] old.so

compilation :: nanoseconds :: benchmarks-next/blake3-scalar/benchmark.wasm

Δ = 31981472.00 ± 13432120.92 (confidence = 99%)

new.so is 1.14x to 1.34x faster than old.so!
old.so is 0.72x to 0.89x faster than new.so!

[125802142 131948268.40 155782325] new.so
[159196645 163929740.40 186715328] old.so

execution :: nanoseconds :: benchmarks-next/blake3-scalar/benchmark.wasm

Δ = 36931.50 ± 3272.72 (confidence = 99%)

new.so is 1.32x to 1.38x faster than old.so!
old.so is 0.72x to 0.77x faster than new.so!

[105358 106660.00 110728] new.so
[140608 143591.50 149787] old.so

execution :: cycles :: benchmarks-next/blake3-scalar/benchmark.wasm

Δ = 140341.60 ± 12437.21 (confidence = 99%)

new.so is 1.32x to 1.38x faster than old.so!
old.so is 0.72x to 0.77x faster than new.so!

[400368 405315.60 420774] new.so
[534318 545657.20 569202] old.so


compilation :: cycles :: benchmarks-next/blake3-simd/benchmark.wasm

No difference in performance.

[112727304 139448014.80 189082604] new.so
[123143218 156732493.40 233512432] old.so

compilation :: nanoseconds :: benchmarks-next/blake3-simd/benchmark.wasm

No difference in performance.

[29664800 36696541.20 49758219] new.so
[32405712 41244760.40 61449541] old.so

execution :: cycles :: benchmarks-next/blake3-simd/benchmark.wasm

No difference in performance.

[400672 739521.80 1042226] new.so
[498142 828791.40 1160786] old.so

execution :: nanoseconds :: benchmarks-next/blake3-simd/benchmark.wasm

No difference in performance.

[105439 194609.20 274267] new.so
[131088 218099.20 305464] old.so


compilation :: cycles :: benchmarks-next/meshoptimizer/benchmark.wasm

Δ = 483775336.20 ± 24646158.96 (confidence = 99%)

new.so is 1.22x to 1.24x faster than old.so!
old.so is 0.80x to 0.82x faster than new.so!

[2090515508 2113482784.00 2150210240] new.so
[2554359582 2597258120.20 2630111328] old.so

compilation :: nanoseconds :: benchmarks-next/meshoptimizer/benchmark.wasm

Δ = 127275628.40 ± 6480546.57 (confidence = 99%)

new.so is 1.22x to 1.24x faster than old.so!
old.so is 0.80x to 0.82x faster than new.so!

[550127669 556172437.60 565836581] new.so
[672188482 683448066.00 692063546] old.so

execution :: cycles :: benchmarks-next/meshoptimizer/benchmark.wasm

Δ = 3386913742.00 ± 454568778.61 (confidence = 99%)

new.so is 1.16x to 1.21x faster than old.so!
old.so is 0.82x to 0.86x faster than new.so!

[17786842514 17978520795.40 18352029814] new.so
[20863697992 21365434537.40 22139271504] old.so

execution :: nanoseconds :: benchmarks-next/meshoptimizer/benchmark.wasm

Δ = 891020039.40 ± 119694835.02 (confidence = 99%)

new.so is 1.16x to 1.21x faster than old.so!
old.so is 0.82x to 0.86x faster than new.so!

[4680694128 4731128047.40 4829411387] new.so
[5489883512 5622148086.80 5826025212] old.so


compilation :: cycles :: benchmarks-next/pulldown-cmark/benchmark.wasm

Δ = 213252595.20 ± 29303757.92 (confidence = 99%)

new.so is 1.16x to 1.21x faster than old.so!
old.so is 0.82x to 0.86x faster than new.so!

[1120180378 1148350389.80 1203069094] new.so
[1340768136 1361602985.00 1397014596] old.so

compilation :: nanoseconds :: benchmarks-next/pulldown-cmark/benchmark.wasm

Δ = 56118120.00 ± 7711578.76 (confidence = 99%)

new.so is 1.16x to 1.21x faster than old.so!
old.so is 0.82x to 0.86x faster than new.so!

[294780634 302193792.40 316593182] new.so
[352828441 358311912.40 367631343] old.so

execution :: cycles :: benchmarks-next/pulldown-cmark/benchmark.wasm

No difference in performance.

[8257780 8443755.80 8560944] new.so
[8455570 9495162.60 17648568] old.so

execution :: nanoseconds :: benchmarks-next/pulldown-cmark/benchmark.wasm

No difference in performance.

[2173072 2222013.50 2252853] new.so
[2225116 2498693.60 4644290] old.so


compilation :: cycles :: benchmarks-next/bz2/benchmark.wasm

Δ = 58684068.80 ± 36909440.37 (confidence = 99%)

new.so is 1.04x to 1.18x faster than old.so!
old.so is 0.84x to 0.96x faster than new.so!

[498967588 545831464.20 586460840] new.so
[540660276 604515533.00 635005118] old.so

compilation :: nanoseconds :: benchmarks-next/bz2/benchmark.wasm

Δ = 15436153.00 ± 9714229.01 (confidence = 99%)

new.so is 1.04x to 1.18x faster than old.so!
old.so is 0.84x to 0.96x faster than new.so!

[131305387 143637939.40 154329874] new.so
[142264400 159074092.40 167089438] old.so

execution :: nanoseconds :: benchmarks-next/bz2/benchmark.wasm

No difference in performance.

[25932760 35978222.50 53794238] new.so
[28960083 29737468.90 35137211] old.so

execution :: cycles :: benchmarks-next/bz2/benchmark.wasm

No difference in performance.

[98545894 136719075.20 204420658] new.so
[110059628 113008690.20 133522880] old.so
</pre>
</details>
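
For anyone checking the arithmetic: the "1.22x to 1.24x faster" lines in the raw output appear consistent with dividing the confidence interval on Δ by the faster engine's mean, and the summary percentages follow the stated formula 100% * (1 - 1/speedup_ratio). Here is a minimal sketch that reproduces both for the meshoptimizer compilation (nanoseconds) result. The function names are made up for illustration, and this is one reading of the printed numbers, not Sightglass's actual implementation.

```python
def speedup_interval(delta, ci, faster_mean):
    """Return (low, high) speedup ratios of the faster engine over the slower one.

    delta       -- difference between the two means (slower minus faster)
    ci          -- half-width of the confidence interval on delta
    faster_mean -- mean measurement of the faster engine
    """
    low = 1.0 + (delta - ci) / faster_mean
    high = 1.0 + (delta + ci) / faster_mean
    return low, high


def percent_faster(ratio):
    """Convert a speedup ratio to a percentage improvement over baseline."""
    return 100.0 * (1.0 - 1.0 / ratio)


# Values copied from the meshoptimizer "compilation :: nanoseconds" entry.
low, high = speedup_interval(127275628.40, 6480546.57, 556172437.60)
print(f"{low:.2f}x to {high:.2f}x")            # 1.22x to 1.24x
print(f"{percent_faster(1.24):.0f}% faster")   # 19% faster
```

The summary tables use the upper end of each interval (1.24x here), which is where the 19% figure comes from.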

view this post on Zulip Wasmtime GitHub notifications bot (Mar 17 2022 at 20:42):

abrown commented on issue #3942:

(Can we add spidermonkey.wasm and clang.wasm to Sightglass?)

view this post on Zulip Wasmtime GitHub notifications bot (Mar 17 2022 at 22:14):

cfallin commented on issue #3942:

> (Can we add spidermonkey.wasm and clang.wasm to Sightglass?)

We could perhaps, yeah, with some hackery (building a toplevel harness mostly). In the SpiderMonkey case we need to add a WASI directory capability and feed in a JS file, and in the clang case we need a way to tell the infra that it's compile-only (I don't know how to run it). For now it's not too bad to run by hand though :-)

view this post on Zulip Wasmtime GitHub notifications bot (Mar 17 2022 at 22:49):

cfallin commented on issue #3942:

A little more benchmarking -- taking most of the modules from #911 and compiling with baseline and regalloc2:

Wasm (SHA256 of module) from #911         baseline compile (s)  regalloc2 compile (s)
0ddff0dac47311846e831cb25df5ec5fcb7c59a4  1.201                 0.262
256e0360aa2774d6ad1bb5589030b7a944a81c5d  0.680                 0.671
28276a409e576044bea8cdc46068426484bf7b06  0.035                 0.039
2e746b5b07c0a022415d6c1527815af44daae33e  0.006                 0.006
4286371e64c07f853a5d4de482d658f3c7f2c711  0.137                 0.365
6ccd889e8a97b9adb2697f9f60477e511ad50be4  0.721                 0.329
9850b3172ddb705be8caa06599cb92ead3cd251c  0.509                 0.645
bdb6099c0073360613f17cc9a7d2380d50f8eb9e  2.725                 0.061
bf8490f3bd1f3350a0d4a83670bb1d3d017cf8ef  0.074                 0.283
cb46921624763cf50eb826585d224bb3975a4234  0.693                 0.035
d31a6a6de65a08096dc855a17f49499114826a3e  0.057                 0.284
d51589b35a521c29420fc140b292383f2ca5fd70  3.180                 0.617
dfafaa30ecd41ab9bece126eec8129b42925a4dd  1.367                 1.011

In most cases things got faster, sometimes significantly so (3.18s -> 0.62s, 1.2s -> 0.26s, 2.7s -> 0.061s (!)). This tracks with my understanding of some of the bottlenecks I saw in profiling before, and with the efforts in regalloc2 to stay away from quadratic explosions and nonlinear behavior in general as far as possible. Some of the smaller modules see increases (0.137s -> 0.365s, 0.057s -> 0.284s); I haven't conclusively resolved what's going on in those, but it wouldn't surprise me if this comes from splitting heuristics being a little more aggressive. In any case nothing immediately jumps out in the profile.
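
To read the table as ratios rather than raw seconds, a small script along these lines can help. It is only a sketch: the timings are copied from the table above, the hashes are shortened to their first eight characters for readability, and `speedup` is a hypothetical helper name.

```python
# (hash prefix, baseline seconds, regalloc2 seconds), copied from the table.
ROWS = [
    ("0ddff0da", 1.201, 0.262),
    ("256e0360", 0.680, 0.671),
    ("28276a40", 0.035, 0.039),
    ("2e746b5b", 0.006, 0.006),
    ("4286371e", 0.137, 0.365),
    ("6ccd889e", 0.721, 0.329),
    ("9850b317", 0.509, 0.645),
    ("bdb6099c", 2.725, 0.061),
    ("bf8490f3", 0.074, 0.283),
    ("cb469216", 0.693, 0.035),
    ("d31a6a6d", 0.057, 0.284),
    ("d51589b3", 3.180, 0.617),
    ("dfafaa30", 1.367, 1.011),
]


def speedup(baseline_s, regalloc2_s):
    """Baseline time divided by regalloc2 time; above 1 means regalloc2 is faster."""
    return baseline_s / regalloc2_s


# Per-module ratios.
for prefix, base, ra2 in ROWS:
    print(f"{prefix}  {speedup(base, ra2):6.2f}x")

# Aggregate ratio over the whole set of modules.
total_base = sum(base for _, base, _ in ROWS)
total_ra2 = sum(ra2 for _, _, ra2 in ROWS)
print(f"total     {speedup(total_base, total_ra2):6.2f}x")
```

Ratios above 1 mean regalloc2 compiled the module faster; ratios below 1 mark the regressions discussed above.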

view this post on Zulip Wasmtime GitHub notifications bot (Mar 23 2022 at 20:11):

alexcrichton labeled issue #3942:


view this post on Zulip Wasmtime GitHub notifications bot (Mar 31 2022 at 23:21):

cfallin edited issue #3942:

This issue is meant to track the status of migrating Cranelift to use regalloc2, our new register allocator. We started this work a while ago, and as detailed in our 2022 roadmap, we plan to finish the migration this year.

The major tasks remaining are:

The last task has been under development for the past 2.5 weeks or so. I'll make my private branch public shortly, after a bit of cleanup. Its current status is that it is fully functional (passes tests, runs benchmarks) on x86-64. There is work to do to move the other two backends over (aarch64, s390x) and I will do this before we merge. (I might not be able to do this before Mon Mar 28; I'm out-of-office and offline all of next week unfortunately, but wanted to get these results out first!)

The nature of the changes to Cranelift are such that we do have to do the transition atomically and remove regalloc.rs support at the same time; the whole MachInst infrastructure is basically built up around the regalloc abstractions, so swapping it out has a large effect. Fortunately though I think there is not too much of a downside (aside from the usual code-churn risk, which we mitigate with ongoing fuzzing and careful review) -- performance numbers look good.

Here is a current snapshot of some benchmark results:

Benchmark       Compilation (wallclock)     Execution (wallclock)
blake3-scalar   25% faster                  28% faster
blake3-simd     no diff                     no diff
meshoptimizer   19% faster                  17% faster
pulldown-cmark  17% faster                  no diff
bz2             15% faster                  no diff
SpiderMonkey,   21% faster                  2% faster
  fib(30)
clang.wasm      42% faster                  N/A

with full details here:

<details>
<summary>Benchmark methodology and raw output</summary>
<pre>
As percentage improvement over baseline (old):

Benchmark       Compilation (wallclock)     Execution (wallclock)
blake3-scalar   25% faster                  28% faster
blake3-simd     no diff                     no diff
meshoptimizer   19% faster                  17% faster
pulldown-cmark  17% faster                  no diff
bz2             15% faster                  no diff
SpiderMonkey,   21% faster                  2% faster
  fib(30)
clang.wasm      42% faster                  N/A

As ratios (percent improvement above = 100% * (1 - 1/speedup_ratio))

Benchmark       Compilation (wallclock)     Execution (wallclock)
blake3-scalar   1.34x faster                1.38x faster
blake3-simd     no diff                     no diff
meshoptimizer   1.24x faster                1.21x faster
pulldown-cmark  1.21x faster                no diff
bz2             1.18x faster                no diff
SpiderMonkey,   1.26x faster                1.02x faster
  fib(30)
clang.wasm      1.71x faster                N/A

Methodology:

Comparing baseline of Wasmtime fdf063df98ad3839b0e0b78ea55b53b1a296abb0 (from
Mar 16) against my internal regalloc2 branch
9b89942cf62d262ee9ac3e7eab525ea8544a458b (from Mar 17) which last synced with
Wasmtime at eb1b71e31c035ff4250c5013ca0268deb931aa7c (from Feb 24).

Raw output of Sightglass below (instantiation excluded, not interesting).


compilation :: cycles :: benchmarks-next/blake3-scalar/benchmark.wasm

Δ = 121531866.00 ± 51042761.18 (confidence = 99%)

new.so is 1.14x to 1.34x faster than old.so!
old.so is 0.72x to 0.89x faster than new.so!

[478052996 501410277.40 591983000] new.so
[604955098 622942143.40 709527450] old.so

compilation :: nanoseconds :: benchmarks-next/blake3-scalar/benchmark.wasm

Δ = 31981472.00 ± 13432120.92 (confidence = 99%)

new.so is 1.14x to 1.34x faster than old.so!
old.so is 0.72x to 0.89x faster than new.so!

[125802142 131948268.40 155782325] new.so
[159196645 163929740.40 186715328] old.so

execution :: nanoseconds :: benchmarks-next/blake3-scalar/benchmark.wasm

Δ = 36931.50 ± 3272.72 (confidence = 99%)

new.so is 1.32x to 1.38x faster than old.so!
old.so is 0.72x to 0.77x faster than new.so!

[105358 106660.00 110728] new.so
[140608 143591.50 149787] old.so

execution :: cycles :: benchmarks-next/blake3-scalar/benchmark.wasm

Δ = 140341.60 ± 12437.21 (confidence = 99%)

new.so is 1.32x to 1.38x faster than old.so!
old.so is 0.72x to 0.77x faster than new.so!

[400368 405315.60 420774] new.so
[534318 545657.20 569202] old.so


compilation :: cycles :: benchmarks-next/blake3-simd/benchmark.wasm

No difference in performance.

[112727304 139448014.80 189082604] new.so
[123143218 156732493.40 233512432] old.so

compilation :: nanoseconds :: benchmarks-next/blake3-simd/benchmark.wasm

No difference in performance.

[29664800 36696541.20 49758219] new.so
[32405712 41244760.40 61449541] old.so

execution :: cycles :: benchmarks-next/blake3-simd/benchmark.wasm

No difference in performance.

[400672 739521.80 1042226] new.so
[498142 828791.40 1160786] old.so

execution :: nanoseconds :: benchmarks-next/blake3-simd/benchmark.wasm

No difference in performance.

[105439 194609.20 274267] new.so
[131088 218099.20 305464] old.so


compilation :: cycles :: benchmarks-next/meshoptimizer/benchmark.wasm

Δ = 483775336.20 ± 24646158.96 (confidence = 99%)

new.so is 1.22x to 1.24x faster than old.so!
old.so is 0.80x to 0.82x faster than new.so!

[2090515508 2113482784.00 2150210240] new.so
[2554359582 2597258120.20 2630111328] old.so

compilation :: nanoseconds :: benchmarks-next/meshoptimizer/benchmark.wasm

Δ = 127275628.40 ± 6480546.57 (confidence = 99%)

new.so is 1.22x to 1.24x faster than old.so!
old.so is 0.80x to 0.82x faster than new.so!

[550127669 556172437.60 565836581] new.so
[672188482 683448066.00 692063546] old.so

execution :: cycles :: benchmarks-next/meshoptimizer/benchmark.wasm

Δ = 3386913742.00 ± 454568778.61 (confidence = 99%)

new.so is 1.16x to 1.21x faster than old.so!
old.so is 0.82x to 0.86x faster than new.so!

[17786842514 17978520795.40 18352029814] new.so
[20863697992 21365434537.40 22139271504] old.so

execution :: nanoseconds :: benchmarks-next/meshoptimizer/benchmark.wasm

Δ = 891020039.40 ± 119694835.02 (confidence = 99%)

new.so is 1.16x to 1.21x faster than old.so!
old.so is 0.82x to 0.86x faster than new.so!

[4680694128 4731128047.40 4829411387] new.so
[5489883512 5622148086.80 5826025212] old.so


compilation :: cycles :: benchmarks-next/pulldown-cmark/benchmark.wasm

Δ = 213252595.20 ± 29303757.92 (confidence = 99%)

new.so is 1.16x to 1.21x faster than old.so!
old.so is 0.82x to 0.86x faster than new.so!

[1120180378 1148350389.80 1203069094] new.so
[1340768136 1361602985.00 1397014596] old.so

compilation :: nanoseconds :: benchmarks-next/pulldown-cmark/benchmark.wasm

Δ = 56118120.00 ± 7711578.76 (confidence = 99%)

new.so is 1.16x to 1.21x faster than old.so!
old.so is 0.82x to 0.86x faster than new.so!

[294780634 302193792.40 316593182] new.so
[352828441 358311912.40 367631343] old.so

execution :: cycles :: benchmarks-next/pulldown-cmark/benchmark.wasm

No difference in performance.

[8257780 8443755.80 8560944] new.so
[8455570 9495162.60 17648568] old.so

execution :: nanoseconds :: benchmarks-next/pulldown-cmark/benchmark.wasm

No difference in performance.

[2173072 2222013.50 2252853] new.so
[2225116 2498693.60 4644290] old.so


compilation :: cycles :: benchmarks-next/bz2/benchmark.wasm

Δ = 58684068.80 ± 36909440.37 (confidence = 99%)

new.so is 1.04x to 1.18x faster than old.so!
old.so is 0.84x to 0.96x faster than new.so!

[498967588 545831464.20 586460840] new.so
[540660276 604515533.00 635005118] old.so

compilation :: nanoseconds :: benchmarks-next/bz2/benchmark.wasm

Δ = 15436153.00 ± 9714229.01 (confidence = 99%)

new.so is 1.04x to 1.18x faster than old.so!
old.so is 0.84x to 0.96x faster than new.so!

[131305387 143637939.40 154329874] new.so
[142264400 159074092.40 167089438] old.so

execution :: nanoseconds :: benchmarks-next/bz2/benchmark.wasm

No difference in performance.

[25932760 35978222.50 53794238] new.so
[28960083 29737468.90 35137211] old.so

execution :: cycles :: benchmarks-next/bz2/benchmark.wasm

No difference in performance.

[98545894 136719075.20 204420658] new.so
[110059628 113008690.20 133522880] old.so
</pre>
</details>
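For anyone reading the raw output above, the "N.NNx to N.NNx faster" lines can be reproduced from the other numbers in each block. This is a minimal sketch, not Sightglass's own code: it assumes the middle value in each bracketed triple is the mean, that Δ is the difference of those means, and that the ± value is the half-width of the confidence interval. The function names are hypothetical.

```python
def speedup_interval(new_mean, old_mean, ci_half_width):
    """Bounds for 'new.so is Nx faster than old.so' (relative to new's mean)."""
    delta = old_mean - new_mean
    low = 1.0 + (delta - ci_half_width) / new_mean
    high = 1.0 + (delta + ci_half_width) / new_mean
    return low, high


def inverse_interval(new_mean, old_mean, ci_half_width):
    """Bounds for 'old.so is Nx faster than new.so' (relative to old's mean)."""
    delta = old_mean - new_mean
    low = 1.0 - (delta + ci_half_width) / old_mean
    high = 1.0 - (delta - ci_half_width) / old_mean
    return low, high


# Numbers taken from the blake3-scalar "compilation :: cycles" block.
new_mean = 501410277.40
old_mean = 622942143.40
ci = 51042761.18

lo, hi = speedup_interval(new_mean, old_mean, ci)
print(f"new.so is {lo:.2f}x to {hi:.2f}x faster than old.so")

lo, hi = inverse_interval(new_mean, old_mean, ci)
print(f"old.so is {lo:.2f}x to {hi:.2f}x faster than new.so")
```

Under these assumptions the sketch gives 1.14x to 1.34x and 0.72x to 0.89x, matching the reported lines for that block. The same holds for the meshoptimizer compilation numbers.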

view this post on Zulip Wasmtime GitHub notifications bot (Mar 31 2022 at 23:21):

cfallin edited issue #3942:

This issue is meant to track the status of migrating Cranelift to use regalloc2, our new register allocator. We started this work a while ago, and as detailed in our 2022 roadmap, we plan to finish the migration this year.

The major tasks remaining are:

The last task has been under development for the past 2.5 weeks or so. I'll make my private branch public shortly, after a bit of cleanup. Its current status is that it is fully functional (passes tests, runs benchmarks) on x86-64. There is work to do to move the other two backends over (aarch64, s390x) and I will do this before we merge. (I might not be able to do this before Mon Mar 28; I'm out-of-office and offline all of next week unfortunately, but wanted to get these results out first!)

The nature of the changes to Cranelift is such that we have to do the transition atomically and remove regalloc.rs support at the same time: the whole MachInst infrastructure is built around the regalloc abstractions, so swapping it out has a large effect. Fortunately, I think there is not much of a downside (aside from the usual code-churn risk, which we mitigate with ongoing fuzzing and careful review), and the performance numbers look good.

Here is a current snapshot of some benchmark results:

Benchmark       Compilation (wallclock)     Execution (wallclock)
blake3-scalar   25% faster                  28% faster
blake3-simd     no diff                     no diff
meshoptimizer   19% faster                  17% faster
pulldown-cmark  17% faster                  no diff
bz2             15% faster                  no diff
SpiderMonkey,   21% faster                  2% faster
  fib(30)
clang.wasm      42% faster                  N/A

with full details here:

<details>
<summary>Benchmark methodology and raw output</summary>
<pre>
As percentage improvement over baseline (old):

Benchmark       Compilation (wallclock)     Execution (wallclock)
blake3-scalar   25% faster                  28% faster
blake3-simd     no diff                     no diff
meshoptimizer   19% faster                  17% faster
pulldown-cmark  17% faster                  no diff
bz2             15% faster                  no diff
SpiderMonkey,   21% faster                  2% faster
  fib(30)
clang.wasm      42% faster                  N/A

As ratios (percent improvement above = 100% * (1 - 1/speedup_ratio))

Benchmark       Compilation (wallclock)     Execution (wallclock)
blake3-scalar   1.34x faster                1.38x faster
blake3-simd     no diff                     no diff
meshoptimizer   1.24x faster                1.21x faster
pulldown-cmark  1.21x faster                no diff
bz2             1.18x faster                no diff
SpiderMonkey,   1.26x faster                1.02x faster
  fib(30)
clang.wasm      1.71x faster                N/A

Methodology:

Comparing baseline of Wasmtime fdf063df98ad3839b0e0b78ea55b53b1a296abb0 (from
Mar 16) against my internal regalloc2 branch
9b89942cf62d262ee9ac3e7eab525ea8544a458b (from Mar 17) which last synced with
Wasmtime at eb1b71e31c035ff4250c5013ca0268deb931aa7c (from Feb 24).

Raw output of Sightglass below (instantiation excluded, not interesting).


compilation :: cycles :: benchmarks-next/blake3-scalar/benchmark.wasm

Δ = 121531866.00 ± 51042761.18 (confidence = 99%)

new.so is 1.14x to 1.34x faster than old.so!
old.so is 0.72x to 0.89x faster than new.so!

[478052996 501410277.40 591983000] new.so
[604955098 622942143.40 709527450] old.so

compilation :: nanoseconds :: benchmarks-next/blake3-scalar/benchmark.wasm

Δ = 31981472.00 ± 13432120.92 (confidence = 99%)

new.so is 1.14x to 1.34x faster than old.so!
old.so is 0.72x to 0.89x faster than new.so!

[125802142 131948268.40 155782325] new.so
[159196645 163929740.40 186715328] old.so

execution :: nanoseconds :: benchmarks-next/blake3-scalar/benchmark.wasm

Δ = 36931.50 ± 3272.72 (confidence = 99%)

new.so is 1.32x to 1.38x faster than old.so!
old.so is 0.72x to 0.77x faster than new.so!

[105358 106660.00 110728] new.so
[140608 143591.50 149787] old.so

execution :: cycles :: benchmarks-next/blake3-scalar/benchmark.wasm

Δ = 140341.60 ± 12437.21 (confidence = 99%)

new.so is 1.32x to 1.38x faster than old.so!
old.so is 0.72x to 0.77x faster than new.so!

[400368 405315.60 420774] new.so
[534318 545657.20 569202] old.so


compilation :: cycles :: benchmarks-next/blake3-simd/benchmark.wasm

No difference in performance.

[112727304 139448014.80 189082604] new.so
[123143218 156732493.40 233512432] old.so

compilation :: nanoseconds :: benchmarks-next/blake3-simd/benchmark.wasm

No difference in performance.

[29664800 36696541.20 49758219] new.so
[32405712 41244760.40 61449541] old.so

execution :: cycles :: benchmarks-next/blake3-simd/benchmark.wasm

No difference in performance.

[400672 739521.80 1042226] new.so
[498142 828791.40 1160786] old.so

execution :: nanoseconds :: benchmarks-next/blake3-simd/benchmark.wasm

No difference in performance.

[105439 194609.20 274267] new.so
[131088 218099.20 305464] old.so


compilation :: cycles :: benchmarks-next/meshoptimizer/benchmark.wasm

Δ = 483775336.20 ± 24646158.96 (confidence = 99%)

new.so is 1.22x to 1.24x faster than old.so!
old.so is 0.80x to 0.82x faster than new.so!

[2090515508 2113482784.00 2150210240] new.so
[2554359582 2597258120.20 2630111328] old.so

compilation :: nanoseconds :: benchmarks-next/meshoptimizer/benchmark.wasm

Δ = 127275628.40 ± 6480546.57 (confidence = 99%)

new.so is 1.22x to 1.24x faster than old.so!
old.so is 0.80x to 0.82x faster than new.so!

[550127669 556172437.60 565836581] new.so
[672188482 683448066.00 692063546] old.so

execution :: cycles :: benchmarks-next/meshoptimizer/benchmark.wasm

Δ = 3386913742.00 ± 454568778.61 (confidence = 99%)

new.so is 1.16x to 1.21x faster than old.so!
old.so is 0.82x to 0.86x faster than new.so!

[17786842514 17978520795.40 18352029814] new.so
[20863697992 21365434537.40 22139271504] old.so

execution :: nanoseconds :: benchmarks-next/meshoptimizer/benchmark.wasm

Δ = 891020039.40 ± 119694835.02 (confidence = 99%)

new.so is 1.16x to 1.21x faster than old.so!
old.so is 0.82x to 0.86x faster than new.so!

[4680694128 4731128047.40 4829411387] new.so
[5489883512 5622148086.80 5826025212] old.so


compilation :: cycles :: benchmarks-next/pulldown-cmark/benchmark.wasm

Δ = 213252595.20 ± 29303757.92 (confidence = 99%)

new.so is 1.16x to 1.21x faster than old.so!
old.so is 0.82x to 0.86x faster than new.so!

[1120180378 1148350389.80 1203069094] new.so
[1340768136 1361602985.00 1397014596] old.so

compilation :: nanoseconds :: benchmarks-next/pulldown-cmark/benchmark.wasm

Δ = 56118120.00 ± 7711578.76 (confidence = 99%)

new.so is 1.16x to 1.21x faster than old.so!
old.so is 0.82x to 0.86x faster than new.so!

[294780634 302193792.40 316593182] new.so
[352828441 358311912.40 367631343] old.so

execution :: cycles :: benchmarks-next/pulldown-cmark/benchmark.wasm

No difference in performance.

[8257780 8443755.80 8560944] new.so
[8455570 9495162.60 17648568] old.so

execution :: nanoseconds :: benchmarks-next/pulldown-cmark/benchmark.wasm

No difference in performance.

[2173072 2222013.50 2252853] new.so
[2225116 2498693.60 4644290] old.so


compilation :: cycles :: benchmarks-next/bz2/benchmark.wasm

Δ = 58684068.80 ± 36909440.37 (confidence = 99%)

new.so is 1.04x to 1.18x faster than old.so!
old.so is 0.84x to 0.96x faster than new.so!

[498967588 545831464.20 586460840] new.so
[540660276 604515533.00 635005118] old.so

compilation :: nanoseconds :: benchmarks-next/bz2/benchmark.wasm

Δ = 15436153.00 ± 9714229.01 (confidence = 99%)

new.so is 1.04x to 1.18x faster than old.so!
old.so is 0.84x to 0.96x faster than new.so!

[131305387 143637939.40 154329874] new.so
[142264400 159074092.40 167089438] old.so

execution :: nanoseconds :: benchmarks-next/bz2/benchmark.wasm

No difference in performance.

[25932760 35978222.50 53794238] new.so
[28960083 29737468.90 35137211] old.so

execution :: cycles :: benchmarks-next/bz2/benchmark.wasm

No difference in performance.

[98545894 136719075.20 204420658] new.so
[110059628 113008690.20 133522880] old.so
</pre>
</details>
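The two summary tables in the details above are related by the stated formula, percent improvement = 100% * (1 - 1/speedup_ratio). A minimal sketch of that conversion, using the ratios from the table (the helper name `percent_faster` is hypothetical):

```python
def percent_faster(speedup_ratio):
    """Convert an 'Nx faster' ratio into a percentage reduction in time."""
    return 100.0 * (1.0 - 1.0 / speedup_ratio)


# Compilation speedup ratios copied from the "As ratios" table.
compilation_ratios = {
    "blake3-scalar": 1.34,
    "meshoptimizer": 1.24,
    "pulldown-cmark": 1.21,
    "bz2": 1.18,
    "SpiderMonkey, fib(30)": 1.26,
    "clang.wasm": 1.71,
}

for name, ratio in compilation_ratios.items():
    print(f"{name}: {ratio:.2f}x faster = {percent_faster(ratio):.0f}% faster")
```

Rounded to whole percentages, these reproduce the compilation column of the percentage table (25%, 19%, 17%, 15%, 21%, 42%).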

view this post on Zulip Wasmtime GitHub notifications bot (Apr 14 2022 at 17:28):

cfallin closed issue #3942:

This issue is meant to track the status of migrating Cranelift to use regalloc2, our new register allocator. We started this work a while ago, and as detailed in our 2022 roadmap, we plan to finish the migration this year.

The major tasks remaining are:

The last task has been under development for the past 2.5 weeks or so. I'll make my private branch public shortly, after a bit of cleanup. Its current status is that it is fully functional (passes tests, runs benchmarks) on x86-64. There is work to do to move the other two backends over (aarch64, s390x) and I will do this before we merge. (I might not be able to do this before Mon Mar 28; I'm out-of-office and offline all of next week unfortunately, but wanted to get these results out first!)

The nature of the changes to Cranelift is such that we have to do the transition atomically and remove regalloc.rs support at the same time: the whole MachInst infrastructure is built around the regalloc abstractions, so swapping it out has a large effect. Fortunately, I think there is not much of a downside (aside from the usual code-churn risk, which we mitigate with ongoing fuzzing and careful review), and the performance numbers look good.

Here is a current snapshot of some benchmark results:

Benchmark       Compilation (wallclock)     Execution (wallclock)
blake3-scalar   25% faster                  28% faster
blake3-simd     no diff                     no diff
meshoptimizer   19% faster                  17% faster
pulldown-cmark  17% faster                  no diff
bz2             15% faster                  no diff
SpiderMonkey,   21% faster                  2% faster
  fib(30)
clang.wasm      42% faster                  N/A

with full details here:

<details>
<summary>Benchmark methodology and raw output</summary>
<pre>
As percentage improvement over baseline (old):

Benchmark       Compilation (wallclock)     Execution (wallclock)
blake3-scalar   25% faster                  28% faster
blake3-simd     no diff                     no diff
meshoptimizer   19% faster                  17% faster
pulldown-cmark  17% faster                  no diff
bz2             15% faster                  no diff
SpiderMonkey,   21% faster                  2% faster
  fib(30)
clang.wasm      42% faster                  N/A

As ratios (percent improvement above = 100% * (1 - 1/speedup_ratio))

Benchmark       Compilation (wallclock)     Execution (wallclock)
blake3-scalar   1.34x faster                1.38x faster
blake3-simd     no diff                     no diff
meshoptimizer   1.24x faster                1.21x faster
pulldown-cmark  1.21x faster                no diff
bz2             1.18x faster                no diff
SpiderMonkey,   1.26x faster                1.02x faster
  fib(30)
clang.wasm      1.71x faster                N/A

Methodology:

Comparing baseline of Wasmtime fdf063df98ad3839b0e0b78ea55b53b1a296abb0 (from
Mar 16) against my internal regalloc2 branch
9b89942cf62d262ee9ac3e7eab525ea8544a458b (from Mar 17) which last synced with
Wasmtime at eb1b71e31c035ff4250c5013ca0268deb931aa7c (from Feb 24).

Raw output of Sightglass below (instantiation excluded, not interesting).


compilation :: cycles :: benchmarks-next/blake3-scalar/benchmark.wasm

Δ = 121531866.00 ± 51042761.18 (confidence = 99%)

new.so is 1.14x to 1.34x faster than old.so!
old.so is 0.72x to 0.89x faster than new.so!

[478052996 501410277.40 591983000] new.so
[604955098 622942143.40 709527450] old.so

compilation :: nanoseconds :: benchmarks-next/blake3-scalar/benchmark.wasm

Δ = 31981472.00 ± 13432120.92 (confidence = 99%)

new.so is 1.14x to 1.34x faster than old.so!
old.so is 0.72x to 0.89x faster than new.so!

[125802142 131948268.40 155782325] new.so
[159196645 163929740.40 186715328] old.so

execution :: nanoseconds :: benchmarks-next/blake3-scalar/benchmark.wasm

Δ = 36931.50 ± 3272.72 (confidence = 99%)

new.so is 1.32x to 1.38x faster than old.so!
old.so is 0.72x to 0.77x faster than new.so!

[105358 106660.00 110728] new.so
[140608 143591.50 149787] old.so

execution :: cycles :: benchmarks-next/blake3-scalar/benchmark.wasm

Δ = 140341.60 ± 12437.21 (confidence = 99%)

new.so is 1.32x to 1.38x faster than old.so!
old.so is 0.72x to 0.77x faster than new.so!

[400368 405315.60 420774] new.so
[534318 545657.20 569202] old.so


compilation :: cycles :: benchmarks-next/blake3-simd/benchmark.wasm

No difference in performance.

[112727304 139448014.80 189082604] new.so
[123143218 156732493.40 233512432] old.so

compilation :: nanoseconds :: benchmarks-next/blake3-simd/benchmark.wasm

No difference in performance.

[29664800 36696541.20 49758219] new.so
[32405712 41244760.40 61449541] old.so

execution :: cycles :: benchmarks-next/blake3-simd/benchmark.wasm

No difference in performance.

[400672 739521.80 1042226] new.so
[498142 828791.40 1160786] old.so

execution :: nanoseconds :: benchmarks-next/blake3-simd/benchmark.wasm

No difference in performance.

[105439 194609.20 274267] new.so
[131088 218099.20 305464] old.so


compilation :: cycles :: benchmarks-next/meshoptimizer/benchmark.wasm

Δ = 483775336.20 ± 24646158.96 (confidence = 99%)

new.so is 1.22x to 1.24x faster than old.so!
old.so is 0.80x to 0.82x faster than new.so!

[2090515508 2113482784.00 2150210240] new.so
[2554359582 2597258120.20 2630111328] old.so

compilation :: nanoseconds :: benchmarks-next/meshoptimizer/benchmark.wasm

Δ = 127275628.40 ± 6480546.57 (confidence = 99%)

new.so is 1.22x to 1.24x faster than old.so!
old.so is 0.80x to 0.82x faster than new.so!

[550127669 556172437.60 565836581] new.so
[672188482 683448066.00 692063546] old.so

execution :: cycles :: benchmarks-next/meshoptimizer/benchmark.wasm

Δ = 3386913742.00 ± 454568778.61 (confidence = 99%)

new.so is 1.16x to 1.21x faster than old.so!
old.so is 0.82x to 0.86x faster than new.so!

[17786842514 17978520795.40 18352029814] new.so
[20863697992 21365434537.40 22139271504] old.so

execution :: nanoseconds :: benchmarks-next/meshoptimizer/benchmark.wasm

Δ = 891020039.40 ± 119694835.02 (confidence = 99%)

new.so is 1.16x to 1.21x faster than old.so!
old.so is 0.82x to 0.86x faster than new.so!

[4680694128 4731128047.40 4829411387] new.so
[5489883512 5622148086.80 5826025212] old.so


compilation :: cycles :: benchmarks-next/pulldown-cmark/benchmark.wasm

Δ = 213252595.20 ± 29303757.92 (confidence = 99%)

new.so is 1.16x to 1.21x faster than old.so!
old.so is 0.82x to 0.86x faster than new.so!

[1120180378 1148350389.80 1203069094] new.so
[1340768136 1361602985.00 1397014596] old.so

compilation :: nanoseconds :: benchmarks-next/pulldown-cmark/benchmark.wasm

Δ = 56118120.00 ± 7711578.76 (confidence = 99%)

new.so is 1.16x to 1.21x faster than old.so!
old.so is 0.82x to 0.86x faster than new.so!

[294780634 302193792.40 316593182] new.so
[352828441 358311912.40 367631343] old.so

execution :: cycles :: benchmarks-next/pulldown-cmark/benchmark.wasm

No difference in performance.

[8257780 8443755.80 8560944] new.so
[8455570 9495162.60 17648568] old.so

execution :: nanoseconds :: benchmarks-next/pulldown-cmark/benchmark.wasm

No difference in performance.

[2173072 2222013.50 2252853] new.so
[2225116 2498693.60 4644290] old.so


compilation :: cycles :: benchmarks-next/bz2/benchmark.wasm

Δ = 58684068.80 ± 36909440.37 (confidence = 99%)

new.so is 1.04x to 1.18x faster than old.so!
old.so is 0.84x to 0.96x faster than new.so!

[498967588 545831464.20 586460840] new.so
[540660276 604515533.00 635005118] old.so

compilation :: nanoseconds :: benchmarks-next/bz2/benchmark.wasm

Δ = 15436153.00 ± 9714229.01 (confidence = 99%)

new.so is 1.04x to 1.18x faster than old.so!
old.so is 0.84x to 0.96x faster than new.so!

[131305387 143637939.40 154329874] new.so
[142264400 159074092.40 167089438] old.so

execution :: nanoseconds :: benchmarks-next/bz2/benchmark.wasm

No difference in performance.

[25932760 35978222.50 53794238] new.so
[28960083 29737468.90 35137211] old.so

execution :: cycles :: benchmarks-next/bz2/benchmark.wasm

No difference in performance.

[98545894 136719075.20 204420658] new.so
[110059628 113008690.20 133522880] old.so
</pre>
</details>

view this post on Zulip Wasmtime GitHub notifications bot (Apr 14 2022 at 17:29):

cfallin edited issue #3942:

This issue is meant to track the status of migrating Cranelift to use regalloc2, our new register allocator. We started this work a while ago, and as detailed in our 2022 roadmap, we plan to finish the migration this year.

The major tasks remaining are:

The last task has been under development for the past 2.5 weeks or so. I'll make my private branch public shortly, after a bit of cleanup. Its current status is that it is fully functional (passes tests, runs benchmarks) on x86-64. There is work to do to move the other two backends over (aarch64, s390x) and I will do this before we merge. (I might not be able to do this before Mon Mar 28; I'm out-of-office and offline all of next week unfortunately, but wanted to get these results out first!)

The nature of the changes to Cranelift is such that we have to do the transition atomically and remove regalloc.rs support at the same time: the whole MachInst infrastructure is built around the regalloc abstractions, so swapping it out has a large effect. Fortunately, I think there is not much of a downside (aside from the usual code-churn risk, which we mitigate with ongoing fuzzing and careful review), and the performance numbers look good.

Here is a current snapshot of some benchmark results:

Benchmark       Compilation (wallclock)     Execution (wallclock)
blake3-scalar   25% faster                  28% faster
blake3-simd     no diff                     no diff
meshoptimizer   19% faster                  17% faster
pulldown-cmark  17% faster                  no diff
bz2             15% faster                  no diff
SpiderMonkey,   21% faster                  2% faster
  fib(30)
clang.wasm      42% faster                  N/A

with full details here:

<details>
<summary>Benchmark methodology and raw output</summary>
<pre>
As percentage improvement over baseline (old):

Benchmark       Compilation (wallclock)     Execution (wallclock)
blake3-scalar   25% faster                  28% faster
blake3-simd     no diff                     no diff
meshoptimizer   19% faster                  17% faster
pulldown-cmark  17% faster                  no diff
bz2             15% faster                  no diff
SpiderMonkey,   21% faster                  2% faster
  fib(30)
clang.wasm      42% faster                  N/A

As ratios (percent improvement above = 100% * (1 - 1/speedup_ratio))

Benchmark       Compilation (wallclock)     Execution (wallclock)
blake3-scalar   1.34x faster                1.38x faster
blake3-simd     no diff                     no diff
meshoptimizer   1.24x faster                1.21x faster
pulldown-cmark  1.21x faster                no diff
bz2             1.18x faster                no diff
SpiderMonkey,   1.26x faster                1.02x faster
  fib(30)
clang.wasm      1.71x faster                N/A

Methodology:

Comparing baseline of Wasmtime fdf063df98ad3839b0e0b78ea55b53b1a296abb0 (from
Mar 16) against my internal regalloc2 branch
9b89942cf62d262ee9ac3e7eab525ea8544a458b (from Mar 17) which last synced with
Wasmtime at eb1b71e31c035ff4250c5013ca0268deb931aa7c (from Feb 24).

Raw output of Sightglass below (instantiation excluded, not interesting).


compilation :: cycles :: benchmarks-next/blake3-scalar/benchmark.wasm

Δ = 121531866.00 ± 51042761.18 (confidence = 99%)

new.so is 1.14x to 1.34x faster than old.so!
old.so is 0.72x to 0.89x faster than new.so!

[478052996 501410277.40 591983000] new.so
[604955098 622942143.40 709527450] old.so

compilation :: nanoseconds :: benchmarks-next/blake3-scalar/benchmark.wasm

Δ = 31981472.00 ± 13432120.92 (confidence = 99%)

new.so is 1.14x to 1.34x faster than old.so!
old.so is 0.72x to 0.89x faster than new.so!

[125802142 131948268.40 155782325] new.so
[159196645 163929740.40 186715328] old.so

execution :: nanoseconds :: benchmarks-next/blake3-scalar/benchmark.wasm

Δ = 36931.50 ± 3272.72 (confidence = 99%)

new.so is 1.32x to 1.38x faster than old.so!
old.so is 0.72x to 0.77x faster than new.so!

[105358 106660.00 110728] new.so
[140608 143591.50 149787] old.so

execution :: cycles :: benchmarks-next/blake3-scalar/benchmark.wasm

Δ = 140341.60 ± 12437.21 (confidence = 99%)

new.so is 1.32x to 1.38x faster than old.so!
old.so is 0.72x to 0.77x faster than new.so!

[400368 405315.60 420774] new.so
[534318 545657.20 569202] old.so


compilation :: cycles :: benchmarks-next/blake3-simd/benchmark.wasm

No difference in performance.

[112727304 139448014.80 189082604] new.so
[123143218 156732493.40 233512432] old.so

compilation :: nanoseconds :: benchmarks-next/blake3-simd/benchmark.wasm

No difference in performance.

[29664800 36696541.20 49758219] new.so
[32405712 41244760.40 61449541] old.so

execution :: cycles :: benchmarks-next/blake3-simd/benchmark.wasm

No difference in performance.

[400672 739521.80 1042226] new.so
[498142 828791.40 1160786] old.so

execution :: nanoseconds :: benchmarks-next/blake3-simd/benchmark.wasm

No difference in performance.

[105439 194609.20 274267] new.so
[131088 218099.20 305464] old.so


compilation :: cycles :: benchmarks-next/meshoptimizer/benchmark.wasm

Δ = 483775336.20 ± 24646158.96 (confidence = 99%)

new.so is 1.22x to 1.24x faster than old.so!
old.so is 0.80x to 0.82x faster than new.so!

[2090515508 2113482784.00 2150210240] new.so
[2554359582 2597258120.20 2630111328] old.so

compilation :: nanoseconds :: benchmarks-next/meshoptimizer/benchmark.wasm

Δ = 127275628.40 ± 6480546.57 (confidence = 99%)

new.so is 1.22x to 1.24x faster than old.so!
old.so is 0.80x to 0.82x faster than new.so!

[550127669 556172437.60 565836581] new.so
[672188482 683448066.00 692063546] old.so

execution :: cycles :: benchmarks-next/meshoptimizer/benchmark.wasm

Δ = 3386913742.00 ± 454568778.61 (confidence = 99%)

new.so is 1.16x to 1.21x faster than old.so!
old.so is 0.82x to 0.86x faster than new.so!

[17786842514 17978520795.40 18352029814] new.so
[20863697992 21365434537.40 22139271504] old.so

execution :: nanoseconds :: benchmarks-next/meshoptimizer/benchmark.wasm

Δ = 891020039.40 ± 119694835.02 (confidence = 99%)

new.so is 1.16x to 1.21x faster than old.so!
old.so is 0.82x to 0.86x faster than new.so!

[4680694128 4731128047.40 4829411387] new.so
[5489883512 5622148086.80 5826025212] old.so


compilation :: cycles :: benchmarks-next/pulldown-cmark/benchmark.wasm

Δ = 213252595.20 ± 29303757.92 (confidence = 99%)

new.so is 1.16x to 1.21x faster than old.so!
old.so is 0.82x to 0.86x faster than new.so!

[1120180378 1148350389.80 1203069094] new.so
[1340768136 1361602985.00 1397014596] old.so

compilation :: nanoseconds :: benchmarks-next/pulldown-cmark/benchmark.wasm

Δ = 56118120.00 ± 7711578.76 (confidence = 99%)

new.so is 1.16x to 1.21x faster than old.so!
old.so is 0.82x to 0.86x faster than new.so!

[294780634 302193792.40 316593182] new.so
[352828441 358311912.40 367631343] old.so

execution :: cycles :: benchmarks-next/pulldown-cmark/benchmark.wasm

No difference in performance.

[8257780 8443755.80 8560944] new.so
[8455570 9495162.60 17648568] old.so

execution :: nanoseconds :: benchmarks-next/pulldown-cmark/benchmark.wasm

No difference in performance.

[2173072 2222013.50 2252853] new.so
[2225116 2498693.60 4644290] old.so


compilation :: cycles :: benchmarks-next/bz2/benchmark.wasm

Δ = 58684068.80 ± 36909440.37 (confidence = 99%)

new.so is 1.04x to 1.18x faster than old.so!
old.so is 0.84x to 0.96x faster than new.so!

[498967588 545831464.20 586460840] new.so
[540660276 604515533.00 635005118] old.so

compilation :: nanoseconds :: benchmarks-next/bz2/benchmark.wasm

Δ = 15436153.00 ± 9714229.01 (confidence = 99%)

new.so is 1.04x to 1.18x faster than old.so!
old.so is 0.84x to 0.96x faster than new.so!

[131305387 143637939.40 154329874] new.so
[142264400 159074092.40 167089438] old.so

execution :: nanoseconds :: benchmarks-next/bz2/benchmark.wasm

No difference in performance.

[25932760 35978222.50 53794238] new.so
[28960083 29737468.90 35137211] old.so

execution :: cycles :: benchmarks-next/bz2/benchmark.wasm

No difference in performance.

[98545894 136719075.20 204420658] new.so
[110059628 113008690.20 133522880] old.so
</pre>
</details>
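For readers checking the raw output against the summary table, here is a minimal sketch of how the "N.NNx to N.NNx faster" lines appear to relate to the other printed numbers. It assumes, based only on the figures above and not on Sightglass's source, that Δ is the difference of the two means (the middle value in each bracketed triple) and that the range is obtained by dividing Δ minus or plus its confidence half-width by the relevant mean. The function name `speedup_range` is hypothetical.

```python
def speedup_range(delta, half_width, new_mean, old_mean):
    """Reconstruct the two 'faster than' ranges from a reported effect size.

    Assumption (inferred from the printed numbers, not from Sightglass's code):
    the ranges are 1 + (delta -/+ half_width) / new_mean for new-vs-old and
    1 - (delta +/- half_width) / old_mean for old-vs-new.
    """
    lo = delta - half_width
    hi = delta + half_width
    new_vs_old = (1 + lo / new_mean, 1 + hi / new_mean)
    old_vs_new = (1 - hi / old_mean, 1 - lo / old_mean)
    return new_vs_old, old_vs_new


# Values copied from the blake3-scalar "compilation :: cycles" entry above.
new_vs_old, old_vs_new = speedup_range(
    delta=121531866.00,
    half_width=51042761.18,
    new_mean=501410277.40,
    old_mean=622942143.40,
)
print(f"new.so is {new_vs_old[0]:.2f}x to {new_vs_old[1]:.2f}x faster than old.so")
print(f"old.so is {old_vs_new[0]:.2f}x to {old_vs_new[1]:.2f}x faster than new.so")
```

Under this reading, an "old.so is 0.72x to 0.89x faster" line simply restates the same effect from the other side: a factor below 1.0 means old.so is slower.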


Last updated: Nov 22 2024 at 16:03 UTC