gaaraw opened issue #11330:
Phenomenon description
Hello, I have observed different execution-time behavior across runtime tools, depending on the compiler backend they use. The specific measurements are as follows:
| runtime | pr19005.wasm | pr19005_test.wasm | slowdown |
| --- | --- | --- | --- |
| wasmtime | 0.007931 | 0.250915 | 31.6x |
| wasmer_cranelift | 0.013363 | 0.262407 | 19.6x |
| wasmer_llvm | 0.014017 | 0.018488 | 1.3x |
| wasmedge_jit | 0.020752 | 0.027414 | 1.3x |
| wamr_llvm_jit | 0.016472 | 0.024946 | 1.5x |

Times are in seconds; each value is the average of ten executions.
The first two use `cranelift` as the backend, and the last three use `llvm` as the backend.
There are five files in `test_case.zip`. When you execute `pr19005.wasm` (compiled from the C source code) and `pr19005_test.wasm` (obtained after a simple modification), the results are as shown in the table above, with the slowdown factor in the last column.
<details>
<summary>Modification process:</summary>

# pr19005.wat Lines 17-26

```
17  (loop  ;; label = @3
18    (br_if 1 (;@2;)
19      (local.get 1))
20    (br_if 0 (;@3;)
21      (i32.ne
22        (local.tee 0
23          (i32.add
24            (local.get 0)
25            (i32.const 12)))
26        (i32.const 266))))
```

Copy the loop and paste it right next to it:
# pr19005_test.wat Lines 17-36

```
17  (loop  ;; label = @3
18    (br_if 1 (;@2;)
19      (local.get 1))
20    (br_if 0 (;@3;)
21      (i32.ne
22        (local.tee 0
23          (i32.add
24            (local.get 0)
25            (i32.const 12)))
26        (i32.const 266))))
27  (loop  ;; label = @3
28    (br_if 1 (;@2;)
29      (local.get 1))
30    (br_if 0 (;@3;)
31      (i32.ne
32        (local.tee 0
33          (i32.add
34            (local.get 0)
35            (i32.const 12)))
36        (i32.const 266))))
```
</details>
Test Case
Steps to Reproduce
```sh
# Compile C files
emcc pr19005.c -o pr19005.wasm -O2 -s WASM=1
wasm2wat -f pr19005.wasm -o pr19005.wat
wat2wasm pr19005_test.wat -o pr19005_test.wasm

# Execute the wasm file and collect data
perf stat -r 10 -e 'task-clock' /path/to/wasmtime pr19005.wasm
perf stat -r 10 -e 'task-clock' /path/to/wasmer run pr19005.wasm
perf stat -r 10 -e 'task-clock' /path/to/wasmer run --llvm pr19005.wasm
perf stat -r 10 -e 'task-clock' /path/to/wasmedge --enable-jit pr19005.wasm
perf stat -r 10 -e 'task-clock' /path/to/build_llvm_jit/iwasm pr19005.wasm
# pr19005_test.wasm is measured the same way
```
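Since `perf stat` times the whole process, the numbers above include module compilation as well as execution. A minimal sketch of how one could factor compilation out for Wasmtime, assuming a recent wasmtime CLI with the `compile` subcommand and `--allow-precompiled` flag (file names are illustrative):

```sh
# Pre-compile the module once so the measured runs contain (almost) no compile work
wasmtime compile pr19005_test.wasm -o pr19005_test.cwasm

# Time only the execution of the pre-compiled artifact
perf stat -r 10 -e 'task-clock' /path/to/wasmtime run --allow-precompiled pr19005_test.cwasm
```

For modules this small, compilation time should be negligible either way, so the slowdown reported above is most likely execution time rather than compile time.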
Analysis

The added second loop does no useful work in this test case. Before the second loop is entered, the value of `local 0` is already 266, so the second loop can only exit once `local 0` has wrapped around 2<sup>32</sup> and come back to 266; a C-level sketch of this pattern is shown below.
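For illustration only, here is a hand-written C analogue of the two loops. This is not the original pr19005.c (which is not included above); `x`, `done`, and the starting value 2 are assumptions standing in for `local 0`, `local 1`, and the real initial value.

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t x = 2;   /* assumed start value; the real initial value of local 0 is not shown */
    int done = 0;     /* stands in for local 1, which stays zero in this run */

    /* First loop (lines 17-26): add 12 until x reaches 266 -- a handful of iterations. */
    do {
        if (done) goto out;   /* br_if 1 (local.get 1) */
        x += 12;              /* local.tee 0 (i32.add (local.get 0) (i32.const 12)) */
    } while (x != 266);       /* br_if 0 (i32.ne ... (i32.const 266)) */

    /* Second, copied loop (lines 27-36): x is already 266 on entry, so `x += 12` must
     * wrap around 2^32 before `x != 266` becomes false again -- roughly 2^30 iterations.
     * Its only effect is on x, and x ends at the same value it started with, so a
     * backend that computes the trip count can drop the loop entirely; a lighter
     * backend executes it as written. */
    do {
        if (done) goto out;
        x += 12;
    } while (x != 266);

out:
    printf("%u\n", x);
    return 0;
}
```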
For this change, I observed that the execution time barely changes when `LLVM` is the backend, but increases significantly when `cranelift` is the backend. I suspect this reflects the two backends' different optimization strategies for this kind of code.
Versions and Environment
The runtime tools are all release builds and run in `JIT` mode.
- wasmtime: 35.0.0 (9c2e6f17c 2025-06-17)
- wasmer: 6.0.1
- wasmedge: 0.14.1
- WAMR: iwasm 2.4.0
- emcc: 4.0.10 (b7dc6e5747465580df5984e723b9d1f10d8e804b)
- wabt: 1.0.27
- llvm: 18.1.8
- Host OS: Ubuntu 22.04.5 LTS x64
- CPU: 12th Gen Intel® Core™ i7-12700 × 20
Extra Info
If you need any other relevant information, please let me know and I will do my best to provide it. Looking forward to your reply! Thank you!
alexcrichton closed issue #11330.
alexcrichton commented on issue #11330:
Thanks for the report, but Wasmtime and Cranelift, like peer runtimes on the web, are designed to efficiently run ahead-of-time optimized WebAssembly code. It's expected that you can craft modules on which natively-optimized code will perform much better. Cranelift does not attempt to be as optimizing a compiler as LLVM; instead we rely on LLVM to do those optimizations and Cranelift takes the code the "final mile" to machine code.
I'm going to close this because arbitrary modifications of WebAssembly files are known to produce un-optimized code. I understand that LLVM can see through these modifications and make things fast, but it's not a bug that Cranelift does not. Cranelift is intended for compiler-optimized code, and in this situation you're generating non-compiler-optimized code.
It's also worth noting that the reason Cranelift has this approach is that all modules in practice are optimized by LLVM. There is no use case we are aware of for intentionally not running an optimizing compiler like LLVM while also expecting the code to run fast in Wasmtime.
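One way to get back to "compiler-optimized code" for a hand-edited module is to re-run a producer-side optimizer over it before execution; a minimal sketch, assuming Binaryen's `wasm-opt` is installed (it is not LLVM, but it performs the same class of whole-module optimizations and may be able to collapse the copied, side-effect-free loop):

```sh
# Re-optimize the hand-modified module before handing it to the runtime
wasm-opt -O2 pr19005_test.wasm -o pr19005_test.opt.wasm
perf stat -r 10 -e 'task-clock' /path/to/wasmtime pr19005_test.opt.wasm
```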
gaaraw commented on issue #11330:
Thank you for your answer; it has deepened my understanding of cranelift.