One of our students compared execution times across different Wasm runtimes, and the results surprised us.
We are currently trying to figure out how WAMR JIT and AOT can perform so much better than native, so we ran several new benchmarks.
Our own results show the same odd behavior, which we cannot explain: WAMR AOT and LLVM JIT perform 3-10x better than the native application for certain benchmarks.
Thinking we might have done something wrong with the native application or the Wasm binaries, we also ran the PolyBench benchmarks provided with WAMR (https://github.com/bytecodealliance/wasm-micro-runtime/tree/main/tests/benchmarks/polybench), with the same results:
These are the results for iwasm compiled with LLVM JIT:
ubuntu@benchmark:~/wasm-micro-runtime/tests/benchmarks/polybench$ cat report_jit_llvm_lazy.txt
                native  iwasm-jit-llvm-lazy
2mm              39.07    8.90
3mm              63.17    9.30
adi              44.85   25.21
atax              0.07    1.09
bicg              0.08    1.10
cholesky        247.97   42.25
correlation      48.77    6.10
covariance       48.86    5.89
deriche           1.79    1.06
doitgen           1.20    2.26
durbin            0.02    1.08
fdtd-2d           5.35    7.93
floyd-warshall   49.97   47.44
gemm              1.36    3.59
gemver            0.21    0.99
gesummv           0.06    0.98
gramschmidt      90.00   12.94
heat-3d           6.36   14.82
jacobi-1d         0.02    1.05
jacobi-2d         4.68   10.84
ludcmp          298.59   50.20
lu              300.00   49.60
mvt               0.22    1.10
nussinov         59.15    9.16
seidel-2d        31.64   32.34
symm             30.06    5.58
syr2k            37.46    5.61
syrk             13.12    2.97
trisolv           0.05    1.04
trmm             29.15    3.14
These are the results with AOT binaries:
ubuntu@benchmark:~/wasm-micro-runtime/tests/benchmarks/polybench$ cat report_aot.txt
                native  iwasm-aot  iwasm-aot-segue
2mm              39.88       4.72    1.28
3mm              62.53       8.32    1.26
adi              44.73      23.21    1.30
atax              0.08       0.11    1.29
bicg              0.10       0.11    1.27
cholesky        244.51      49.66    1.31
correlation      47.32       4.98    1.26
covariance       47.52       4.94    1.26
deriche           1.70       0.46    1.27
doitgen           1.19       1.42    1.28
durbin            0.02       0.03    1.28
fdtd-2d           5.28       7.14    1.29
floyd-warshall   47.67      46.35    1.29
gemm              1.34       2.87    1.26
gemver            0.22       0.13    1.26
gesummv           0.07       0.09    1.26
gramschmidt      87.81      12.08    1.25
heat-3d           6.27      14.41    1.28
jacobi-1d         0.02       0.03    1.26
jacobi-2d         5.74      10.50    1.26
ludcmp          295.11      46.20    1.28
lu              297.54      48.62    1.50
mvt               0.20       0.11    1.28
nussinov         55.83       8.54    1.27
seidel-2d        31.16      31.01    1.31
symm             29.39       4.53    1.26
syr2k            35.19       4.95    1.26
syrk             12.60       2.29    1.28
trisolv           0.05       0.07    1.26
trmm             27.30       2.43    1.27
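To make the "3-10x" comparison concrete, here is a minimal sketch that turns a report like the ones above into per-benchmark speedup ratios (native time divided by runtime time, so values above 1 mean the Wasm runtime was faster). It assumes the report layout shown above: one header line of column names, then whitespace-separated rows of a benchmark name followed by one time per column. The function names `parse_report` and `speedups` are hypothetical, and the sample rows are copied from the AOT report.

```python
def parse_report(text):
    """Parse a report: header line of column names, then '<benchmark> <time> ...' rows."""
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    table = {}
    for row in rows:
        name = row[0]
        values = [float(v) for v in row[1:]]
        # Pair each time with its column name, e.g. {"native": 39.88, ...}
        table[name] = dict(zip(header, values))
    return table


def speedups(table, baseline="native"):
    """Return baseline_time / runtime_time for every non-baseline column."""
    result = {}
    for name, times in table.items():
        base = times[baseline]
        result[name] = {rt: base / t for rt, t in times.items() if rt != baseline}
    return result


# Three rows copied from report_aot.txt above.
SAMPLE = """\
native iwasm-aot iwasm-aot-segue
2mm 39.88 4.72 1.28
atax 0.08 0.11 1.29
heat-3d 6.27 14.41 1.28
"""

if __name__ == "__main__":
    for bench, ratios in speedups(parse_report(SAMPLE)).items():
        print(bench, {rt: round(r, 2) for rt, r in ratios.items()})
```

Ratios make it easier to see which kernels are affected: in the sample, 2mm is several times faster under AOT while heat-3d is slower. Note also that the iwasm-aot-segue times stay between 1.25 and 1.50 for every benchmark, while native times range from 0.02 to almost 300, so that column may be worth checking separately.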
At first we thought this might have something to do with SIMD optimizations, but we can't quite figure it out. We also found a similar issue, which has no definitive answer: https://github.com/bytecodealliance/wasm-micro-runtime/issues/4292
We tried disabling SIMD when building iwasm and wamrc with the -DWAMR_BUILD_SIMD=0 flag, but it is not clear to us whether this disables native SIMD instructions or only support for the Wasm 128-bit SIMD proposal (https://github.com/WebAssembly/spec/blob/main/proposals/simd/SIMD.md). AOT binaries compiled with a wamrc that was built with -DWAMR_BUILD_SIMD=0 still contain native SIMD instructions when we inspect the object file with objdump.
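When reading the objdump output, it may help to separate packed (vectorized) instructions from scalar ones. On x86-64, scalar float and double arithmetic also uses the xmm registers (for example mulsd), so seeing xmm in the disassembly does not by itself show that a loop was vectorized. The sketch below is a rough heuristic under several assumptions: an x86-64 target, AT&T-syntax `objdump -d` output with tab-separated address, bytes, and instruction fields, and a mnemonic-suffix rule (ps/pd counted as packed, ss/sd as scalar). The names `classify` and `summarize` are hypothetical, and the sample disassembly is hand-written for illustration, not taken from a real WAMR binary.

```python
import re
from collections import Counter

# Any SSE/AVX/AVX-512 vector register name in the operand list.
VECTOR_REG = re.compile(r"%?[xyz]mm\d+")


def classify(line):
    """Return 'packed', 'scalar', 'other-vector', or None for one disassembly line."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) < 3:
        # Symbol labels, blank lines, and byte-only continuation lines.
        return None
    asm = parts[2].split()
    if not asm:
        return None
    mnemonic, operands = asm[0], " ".join(asm[1:])
    if not VECTOR_REG.search(operands):
        return None
    # Heuristic: floating-point mnemonics ending in ps/pd operate on packed
    # lanes, those ending in ss/sd operate on a single scalar lane.
    if mnemonic.endswith(("ps", "pd")):
        return "packed"
    if mnemonic.endswith(("ss", "sd")):
        return "scalar"
    return "other-vector"


def summarize(disassembly):
    """Count vector-register instructions by category."""
    counts = Counter()
    for line in disassembly.splitlines():
        kind = classify(line)
        if kind:
            counts[kind] += 1
    return counts


# Hand-written sample in objdump -d layout (illustrative only).
SAMPLE = (
    "0000000000001000 <kernel>:\n"
    "    1000:\tf2 0f 59 c1          \tmulsd  %xmm1,%xmm0\n"
    "    1004:\t66 0f 59 d3          \tmulpd  %xmm3,%xmm2\n"
    "    1008:\t66 0f fe c1          \tpaddd  %xmm1,%xmm0\n"
    "    100c:\t48 01 d8             \tadd    %rbx,%rax\n"
)

if __name__ == "__main__":
    print(dict(summarize(SAMPLE)))
```

To run it on real output, the text of `objdump -d` for the object file could be passed to `summarize` instead of `SAMPLE`. Packed moves and xor instructions are often used for plain register copies and zeroing, so a handful of packed hits is not conclusive; the counts are only a starting point for comparing the native build against the AOT output.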
Are we missing or misunderstanding something? All suggestions are welcome.
Once we figure this out, we will move to more realistic benchmarks.
This has similar vibes to some benchmarks that ran faster when compiled to native via wasm2c than when compiled to native directly from source, where the 32-bit pointer space was believed to give a boost: https://kripken.github.io/talks/2020/universal.html#/26/0/0
Last updated: Dec 06 2025 at 06:05 UTC