Stream: git-wasmtime

Topic: wasmtime / issue #12139 Cranelift: ISLE mid-end performan...


Wasmtime GitHub notifications bot (Dec 08 2025 at 13:33):

bongjunj opened issue #12139:

Hi,

this is a follow-up of https://github.com/bytecodealliance/wasmtime/issues/12106 .

Although we've removed two kinds of performance-regressing mid-end ISLE rules,
a significant performance degradation still remains, along with other suspected cases.
(There is, of course, a bright side: we see significant performance improvements in many cases!)

Performance Regression:

First, here is the data backing the performance regression:

Benchmark                   No Opt           Main  Main Speedup
blake3-scalar              317,727        317,719         0.00%
blake3-simd                313,115        306,232         2.25%
bz2                     87,201,400     86,337,330         1.00%
pulldown-cmark           6,580,174      6,905,992        -4.72%
regex                  209,743,816    210,183,175        -0.21%
shootout-ackermann       8,498,140      7,764,439         9.45%
shootout-base64        381,721,177    352,724,661         8.22%
shootout-ctype         830,813,398    796,486,698         4.31%
shootout-ed25519     9,583,747,723  9,395,321,203         2.01%
shootout-fib2        3,009,269,670  3,010,509,565        -0.04%
shootout-gimli           5,338,258      5,401,697        -1.17%
shootout-heapsort    2,382,073,831  2,375,914,107         0.26%
shootout-keccak         25,168,386     21,112,482        19.21%
shootout-matrix        538,696,036    544,739,691        -1.11%
shootout-memmove        36,156,621     36,115,998         0.11%
shootout-minicsv     1,481,713,625  1,291,534,227        14.73%
shootout-nestedloop            449            442         1.43%
shootout-random        630,328,205    439,691,474        43.36%
shootout-ratelimit      39,148,817     39,956,714        -2.02%
shootout-seqhash     8,869,585,125  8,639,110,150         2.67%
shootout-sieve         905,404,028    840,777,681         7.69%
shootout-switch        139,525,474    153,663,682        -9.20%
shootout-xblabla20       2,891,404      2,907,369        -0.55%
shootout-xchacha20       4,384,703      4,395,319        -0.24%
spidermonkey           636,104,785    631,998,404         0.65%
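
For readers who want to recompute the last column: the speedup figures appear consistent with (No Opt / Main - 1) x 100. That formula is inferred from the numbers in the table, not stated in the issue, and the unit of the raw measurements is not given either. A minimal Python sketch under that assumption (the function name is made up for illustration):

```python
def speedup_percent(no_opt: int, main: int) -> float:
    """Relative speedup of `main` over `no_opt`, in percent.

    Assumes lower raw numbers are better; a negative result is a regression.
    """
    return (no_opt / main - 1.0) * 100.0


# A few rows copied from the table above: (No Opt, Main).
rows = {
    "blake3-simd": (313_115, 306_232),
    "shootout-random": (630_328_205, 439_691_474),
    "shootout-switch": (139_525_474, 153_663_682),
}

for name, (no_opt, main) in rows.items():
    print(f"{name}: {speedup_percent(no_opt, main):+.2f}%")
```

Under this reading, shootout-switch is about 9.2% slower on main than with the mid-end optimizations disabled.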

Unlike the previous cases, the cause is not obvious.

19245 clif/v-no-opts/shootout-switch/wasm[0]--function[9]--__original_main.clif
19241 clif/v-main/shootout-switch/wasm[0]--function[9]--__original_main.clif

The number of instructions does not increase significantly from no-opt to main.
However, the applied optimizations make the program use a long-lived value:

--- /data/bongjun/clif/v-no-opts/shootout-switch/wasm[0]--function[9]--__original_main.clif 2025-12-08 12:43:58.406738645 +0000
+++ /data/bongjun/clif/v-main/shootout-switch/wasm[0]--function[9]--__original_main.clif    2025-12-08 12:49:01.961085326 +0000

-                                    v8572 = iconst.i32 1066
-                                    v8573 = iconst.i32 0
-@d20b                               v4324 = call fn1(v0, v0, v8572, v8573)  ; v8572 = 1066, v8573 = 0
-                                    v8574 = iadd.i64 v105, v106  ; v106 = 3584
-@d219                               v4333 = load.i32 little heap v8574
-                                    v8575 = iconst.i32 6
-                                    v8576 = icmp uge v4333, v8575  ; v8575 = 6
+                                    v8603 = iconst.i32 1066
+                                    v8604 = iconst.i32 0
+@d20b                               v4324 = call fn1(v0, v0, v8603, v8604)  ; v8603 = 1066, v8604 = 0
+                                    v8605 = iadd.i64 v11, v106  ; v106 = 3584
+@d219                               v4333 = load.i32 little heap v8605
+                                    v8606 = iconst.i32 6
+                                    v8607 = icmp uge v4333, v8606  ; v8606 = 6

See v8574 and v8605, which use v105 and v11, respectively.
v11 is defined at the beginning of the function, whereas v105 is defined later:

                                block0(v0: i64, v1: i64):
@01f0                               v5 = load.i32 notrap aligned table v0+256
@01f6                               v6 = iconst.i32 16
@01f8                               v7 = isub v5, v6  ; v6 = 16
@01fb                               store notrap aligned table v7, v0+256
@0203                               v9 = iconst.i32 0x2710
@0207                               v11 = load.i64 notrap aligned readonly can_move checked v0+56

...

@02d6                               v105 = iadd.i64 v11, v4439

This might increase register pressure, causing more spills, which can degrade performance.
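
To illustrate the mechanism only (this is a toy, not derived from the benchmark's CLIF and not how the register allocator actually measures pressure), here is a small Python sketch that counts how many values are live at once in a straight-line instruction list. The instruction lists and all names are hypothetical; the second list differs from the first only in that one instruction reads an early-defined value instead of a later-defined one.

```python
def live_ranges(instrs):
    """Map each value to (def_index, last_use_index).

    `instrs` is a straight-line list of (defined_value, [used_values]);
    every used value must have been defined by an earlier instruction.
    """
    ranges = {}
    for i, (dst, srcs) in enumerate(instrs):
        for s in srcs:
            d, _ = ranges[s]
            ranges[s] = (d, i)  # extend the range to this use
        ranges[dst] = (i, i)
    return ranges


def max_live(instrs):
    """Peak number of values live between consecutive instructions."""
    ranges = live_ranges(instrs)
    return max(
        sum(1 for d, u in ranges.values() if d <= i < u)
        for i in range(len(instrs))
    )


# Hypothetical "before": y reads addr, so base dies at instruction 2.
before = [
    ("base", []), ("off", []), ("addr", ["base", "off"]),
    ("k", []), ("y", ["addr", "k"]), ("z", ["addr", "y"]),
]
# Hypothetical "after": y reads base directly, keeping base live longer
# while addr is still needed by z.
after = [
    ("base", []), ("off", []), ("addr", ["base", "off"]),
    ("k", []), ("y", ["base", "k"]), ("z", ["addr", "y"]),
]

print(max_live(before), max_live(after))  # prints "2 3"
```

In this toy the instruction count is unchanged, yet the peak number of simultaneously live values goes from 2 to 3. Whether something similar explains the shootout-switch regression would still need to be confirmed, for example by comparing spill counts.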

Wasmtime GitHub notifications bot (Dec 08 2025 at 14:56):

alexcrichton added the cranelift label to Issue #12139.

Wasmtime GitHub notifications bot (Dec 08 2025 at 14:56):

alexcrichton added the cranelift:goal:optimize-speed label to Issue #12139.

Wasmtime GitHub notifications bot (Dec 08 2025 at 14:56):

alexcrichton added the performance label to Issue #12139.

Wasmtime GitHub notifications bot (Dec 12 2025 at 11:32):

bongjunj commented on issue #12139:

Investigating the generated CLIF for the shootout-switch case, I noticed there are over 2,500 unused constants in a block:
https://gist.github.com/bongjunj/08cf48d4e5827cf8ed270f26442e2604#file-main-clif-L216-L240 .

The constants defined in the block are never referenced anywhere in the function.
Is this intended behavior when translating switch?

To repro this, run wasmtime compile sightglass/benchmarks/shootout/shootout-switch.wasm --emit-clif <dir> and look up the shootout-switch/wasm\[0\]--function\[9\]--__original_main.clif file.

cc @cfallin

Wasmtime GitHub notifications bot (Dec 12 2025 at 11:32):

bongjunj edited a comment on issue #12139:

Investigating the generated CLIF for the shootout-switch case, I noticed there are over 2,500 unused constants in a block:
https://gist.github.com/bongjunj/08cf48d4e5827cf8ed270f26442e2604#file-main-clif-L216-L240 .
For example, search for v185 in the CLIF file: no uses are found.

The constants defined in the block are never referenced anywhere in the function.
Is this intended behavior when translating switch?

To repro this, run wasmtime compile sightglass/benchmarks/shootout/shootout-switch.wasm --emit-clif <dir> and look up the shootout-switch/wasm\[0\]--function\[9\]--__original_main.clif file.

cc @cfallin

Wasmtime GitHub notifications bot (Dec 12 2025 at 12:00):

bongjunj edited a comment on issue #12139:

Investigating the generated CLIF for the shootout-switch case, I noticed there are over 2,500 unused constants in a block:
https://gist.github.com/bongjunj/08cf48d4e5827cf8ed270f26442e2604#file-main-clif-L216-L240 .
For example, search for v185 in the CLIF file: no uses are found.

This happens not only in the specified block but also in another block:

                                block3:
                                    v4434 = iconst.i32 16
                                    v4435 = iadd.i32 v16, v4434  ; v4434 = 16
                                    v4436 = iconst.i32 0
@0236                               v31 = iconst.i32 7
@0231                               v28 = iconst.i32 12
@0243                               v38 = iconst.i32 6
@023e                               v36 = iconst.i32 8
@0250                               v45 = iconst.i32 5
@024b                               v43 = iconst.i32 4
@0267                               v57 = iconst.i32 3
@0262                               v55 = iconst.i32 -4
@0274                               v64 = iconst.i32 2
@026f                               v62 = iconst.i32 -8
@0281                               v71 = iconst.i32 1
@027c                               v69 = iconst.i32 -12
@0289                               v76 = iconst.i32 -16
                                    v4409 = iconst.i32 9992
@0293                               v81 = iconst.i32 32
@022d                               jump block4(v4435, v4436)  ; v4436 = 0

...

@028e                               v78 = uextend.i64 v4463
@028e                               v80 = iadd.i64 v11, v78
@028e                               store little heap v30, v80
                                    v4413 = icmp ne v30, v4409  ; v4409 = 9992

Surprisingly, none of the constants from v31 to v76 in the block are referenced anywhere in the function.
Is this intended behavior when translating switch?

Furthermore, the constant v4409, which is newly generated by optimization, is placed far from its use in v4413. I see no valid reason to materialize such a constant far from its only use.

To repro this, run wasmtime compile sightglass/benchmarks/shootout/shootout-switch.wasm --emit-clif <dir> and look up the shootout-switch/wasm\[0\]--function\[9\]--__original_main.clif file.
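
Rather than searching for each value by hand, a rough check for defined-but-unreferenced values can be scripted. The following Python sketch is an approximation that only handles the textual shapes visible in the excerpts above (optional @offset prefixes, trailing ; comments, block headers with parameters, and "vN = ..." definitions); it is not a real CLIF parser, and the function name is made up for illustration.

```python
import re

VALUE = re.compile(r"\bv\d+\b")               # SSA value names such as v4409
SRCLOC = re.compile(r"^@[0-9a-fA-F]+\s+")     # leading source offset, e.g. @0236
BLOCK_HEADER = re.compile(r"^block\d+(\(.*\))?:$")


def unused_values(clif_text):
    """Return values that are defined but never referenced, in numeric order."""
    defined, used = set(), set()
    for raw in clif_text.splitlines():
        # Drop trailing "; vN = const" annotations so they do not count as uses.
        line = SRCLOC.sub("", raw.split(";", 1)[0].strip())
        if not line:
            continue
        if BLOCK_HEADER.match(line):
            defined.update(VALUE.findall(line))   # block parameters
            continue
        lhs, sep, rhs = line.partition(" = ")
        if sep and VALUE.search(lhs):
            defined.update(VALUE.findall(lhs))    # instruction results
            used.update(VALUE.findall(rhs))       # instruction operands
        else:
            used.update(VALUE.findall(line))      # jump, store, etc.
    return sorted(defined - used, key=lambda v: int(v[1:]))


# Shortened excerpt of block3 from above.
sample = """\
block3:
    v4434 = iconst.i32 16
    v4435 = iadd.i32 v16, v4434  ; v4434 = 16
    v4436 = iconst.i32 0
@0236    v31 = iconst.i32 7
@0231    v28 = iconst.i32 12
    v4409 = iconst.i32 9992
@022d    jump block4(v4435, v4436)  ; v4436 = 0
@028e    store little heap v30, v80
    v4413 = icmp ne v30, v4409  ; v4409 = 9992
"""

# v4413 shows up only because the excerpt is cut off before its use.
print(unused_values(sample))  # ['v28', 'v31', 'v4413']
```

On the full file, the same function can be applied to the file's contents read as a string; values reported there would still need a manual look, since the sketch ignores constructs not shown in this thread.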

cc @cfallin


Last updated: Dec 13 2025 at 19:03 UTC