gaaraw opened issue #11330:
Phenomenon description
Hello, I have observed different execution-time behavior across runtime tools, depending on the compiler backend they use. The specific measurements are as follows:
| runtime | pr19005.wasm | pr19005_test.wasm | slowdown |
| --- | --- | --- | --- |
| wasmtime | 0.007931 | 0.250915 | 31.6x |
| wasmer_cranelift | 0.013363 | 0.262407 | 19.6x |
| wasmer_llvm | 0.014017 | 0.018488 | 1.3x |
| wasmedge_jit | 0.020752 | 0.027414 | 1.3x |
| wamr_llvm_jit | 0.016472 | 0.024946 | 1.5x |

Times are in seconds; each value is the average of ten executions.
The first two use `cranelift` as the backend, and the last three use `llvm` as the backend.
There are five files in `test_case.zip`. When you execute `pr19005.wasm` (compiled from the C source code) and `pr19005_test.wasm` (obtained after a simple modification), the results are as shown in the table above, with the slowdown factor in the last column.
<details>
<summary>Modification process:</summary>

# pr19005.wat Lines 17-26

```
17  (loop  ;; label = @3
18    (br_if 1 (;@2;)
19      (local.get 1))
20    (br_if 0 (;@3;)
21      (i32.ne
22        (local.tee 0
23          (i32.add
24            (local.get 0)
25            (i32.const 12)))
26        (i32.const 266))))
```

Copy the loop and paste it right next to it:
# pr19005_test.wat Lines 17-36

```
17  (loop  ;; label = @3
18    (br_if 1 (;@2;)
19      (local.get 1))
20    (br_if 0 (;@3;)
21      (i32.ne
22        (local.tee 0
23          (i32.add
24            (local.get 0)
25            (i32.const 12)))
26        (i32.const 266))))
27  (loop  ;; label = @3
28    (br_if 1 (;@2;)
29      (local.get 1))
30    (br_if 0 (;@3;)
31      (i32.ne
32        (local.tee 0
33          (i32.add
34            (local.get 0)
35            (i32.const 12)))
36        (i32.const 266))))
```
</details>
Test Case
Steps to Reproduce
```sh
# Compile C files
emcc pr19005.c -o pr19005.wasm -O2 -s WASM=1
wasm2wat -f pr19005.wasm -o pr19005.wat
wat2wasm pr19005_test.wat -o pr19005_test.wasm

# Execute the wasm file and collect data
perf stat -r 10 -e 'task-clock' /path/to/wasmtime pr19005.wasm
perf stat -r 10 -e 'task-clock' /path/to/wasmer run pr19005.wasm
perf stat -r 10 -e 'task-clock' /path/to/wasmer run --llvm pr19005.wasm
perf stat -r 10 -e 'task-clock' /path/to/wasmedge --enable-jit pr19005.wasm
perf stat -r 10 -e 'task-clock' /path/to/build_llvm_jit/iwasm pr19005.wasm
# pr19005_test.wasm is measured the same way
```
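Since `perf stat` times the whole process, the numbers above include module compilation as well as execution. A minimal sketch of how one could factor compilation out for Wasmtime, assuming a recent wasmtime CLI with the `compile` subcommand and `--allow-precompiled` flag (file names are illustrative):

```sh
# Pre-compile the module once so the measured runs contain (almost) no compile work
wasmtime compile pr19005_test.wasm -o pr19005_test.cwasm

# Time only the execution of the pre-compiled artifact
perf stat -r 10 -e 'task-clock' /path/to/wasmtime run --allow-precompiled pr19005_test.cwasm
```

For modules this small, compilation time should be negligible either way, so the slowdown reported above is most likely execution time rather than compile time.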
Analysis

The added second loop does no useful work in this test case. Before the second loop is entered, the value of `local 0` is already 266, so the second loop can only exit once `local 0` has wrapped around 2<sup>32</sup> and come back to 266; a C-level sketch of this pattern is shown below.
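For illustration only, here is a hand-written C analogue of the two loops. This is not the original pr19005.c (which is not included above); `x`, `done`, and the starting value 2 are assumptions standing in for `local 0`, `local 1`, and the real initial value.

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t x = 2;   /* assumed start value; the real initial value of local 0 is not shown */
    int done = 0;     /* stands in for local 1, which stays zero in this run */

    /* First loop (lines 17-26): add 12 until x reaches 266 -- a handful of iterations. */
    do {
        if (done) goto out;   /* br_if 1 (local.get 1) */
        x += 12;              /* local.tee 0 (i32.add (local.get 0) (i32.const 12)) */
    } while (x != 266);       /* br_if 0 (i32.ne ... (i32.const 266)) */

    /* Second, copied loop (lines 27-36): x is already 266 on entry, so `x += 12` must
     * wrap around 2^32 before `x != 266` becomes false again -- roughly 2^30 iterations.
     * Its only effect is on x, and x ends at the same value it started with, so a
     * backend that computes the trip count can drop the loop entirely; a lighter
     * backend executes it as written. */
    do {
        if (done) goto out;
        x += 12;
    } while (x != 266);

out:
    printf("%u\n", x);
    return 0;
}
```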
For this change, I observed that the execution time barely changes when `LLVM` is the backend, but increases significantly when `cranelift` is the backend. I suspect this reflects the two backends' different optimization strategies for this kind of code.
Versions and Environment
The runtime tools are all release builds and run in `JIT` mode.
- wasmtime: 35.0.0 (9c2e6f17c 2025-06-17)
- wasmer: 6.0.1
- wasmedge: 0.14.1
- WAMR: iwasm 2.4.0
- emcc: 4.0.10 (b7dc6e5747465580df5984e723b9d1f10d8e804b)
- wabt: 1.0.27
- llvm: 18.1.8
- Host OS: Ubuntu 22.04.5 LTS x64
- CPU: 12th Gen Intel® Core™ i7-12700 × 20
Extra Info
If you need any other relevant information, please let me know and I will do my best to provide it. Looking forward to your reply! Thank you!
alexcrichton closed issue #11330.
alexcrichton commented on issue #11330:
Thanks for the report, but Wasmtime and Cranelift, like peer runtimes on the web, are designed to efficiently run ahead-of-time optimized WebAssembly code. It's expected that you can craft modules on which natively-optimized code will perform much better. Cranelift does not attempt to be as optimizing a compiler as LLVM; instead we rely on LLVM to do those optimizations and Cranelift takes the code the "final mile" to machine code.
I'm going to close this because arbitrary modifications of WebAssembly files are known to produce un-optimized code. I understand that LLVM can see through these modifications and make things fast, but it's not a bug that Cranelift does not. Cranelift is intended for compiler-optimized code, and in this situation you're generating non-compiler-optimized code.
It's also worth noting that the reason Cranelift has this approach is that all modules in practice are optimized by LLVM. There is no use case we are aware of for intentionally not running an optimizing compiler like LLVM while also expecting the code to run fast in Wasmtime.
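One way to get back to "compiler-optimized code" for a hand-edited module is to re-run a producer-side optimizer over it before execution; a minimal sketch, assuming Binaryen's `wasm-opt` is installed (it is not LLVM, but it performs the same class of whole-module optimizations and may be able to collapse the copied, side-effect-free loop):

```sh
# Re-optimize the hand-modified module before handing it to the runtime
wasm-opt -O2 pr19005_test.wasm -o pr19005_test.opt.wasm
perf stat -r 10 -e 'task-clock' /path/to/wasmtime pr19005_test.opt.wasm
```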
gaaraw commented on issue #11330:
Thank you for your answer; it has deepened my understanding of cranelift.