Stream: wamr

Topic: Unexpected PolyBench results


Michiel Van Kenhove (imec-UGent) (Jul 02 2025 at 15:08):

One of our students compared different Wasm runtime execution times and their results surprised us:

[Attached image: image.png]

We are currently trying to figure out how WAMR JIT and AOT can perform so much better than native, so we ran several new benchmarks.

Our own results show the same odd behavior, which we cannot explain: WAMR AOT and LLVM JIT perform 3-10x better than the native application on certain benchmarks.

Thinking we did something wrong with the native application or the Wasm binaries, we also used the provided WAMR benchmarks from https://github.com/bytecodealliance/wasm-micro-runtime/tree/main/tests/benchmarks/polybench, but with the same results:

These are the results for iwasm compiled with LLVM JIT

ubuntu@benchmark:~/wasm-micro-runtime/tests/benchmarks/polybench$ cat report_jit_llvm_lazy.txt
                        native  iwasm-jit-llvm-lazy
2mm                     39.07   8.90
3mm                     63.17   9.30
adi                     44.85   25.21
atax                    0.07    1.09
bicg                    0.08    1.10
cholesky                247.97  42.25
correlation             48.77   6.10
covariance              48.86   5.89
deriche                 1.79    1.06
doitgen                 1.20    2.26
durbin                  0.02    1.08
fdtd-2d                 5.35    7.93
floyd-warshall          49.97   47.44
gemm                    1.36    3.59
gemver                  0.21    0.99
gesummv                 0.06    0.98
gramschmidt             90.00   12.94
heat-3d                 6.36    14.82
jacobi-1d               0.02    1.05
jacobi-2d               4.68    10.84
ludcmp                  298.59  50.20
lu                      300.00  49.60
mvt                     0.22    1.10
nussinov                59.15   9.16
seidel-2d               31.64   32.34
symm                    30.06   5.58
syr2k                   37.46   5.61
syrk                    13.12   2.97
trisolv                 0.05    1.04
trmm                    29.15   3.14

These are the results with AOT binaries

ubuntu@benchmark:~/wasm-micro-runtime/tests/benchmarks/polybench$ cat report_aot.txt
                        native  iwasm-aot       iwasm-aot-segue
2mm                     39.88   4.72            1.28
3mm                     62.53   8.32            1.26
adi                     44.73   23.21           1.30
atax                    0.08    0.11            1.29
bicg                    0.10    0.11            1.27
cholesky                244.51  49.66           1.31
correlation             47.32   4.98            1.26
covariance              47.52   4.94            1.26
deriche                 1.70    0.46            1.27
doitgen                 1.19    1.42            1.28
durbin                  0.02    0.03            1.28
fdtd-2d                 5.28    7.14            1.29
floyd-warshall          47.67   46.35           1.29
gemm                    1.34    2.87            1.26
gemver                  0.22    0.13            1.26
gesummv                 0.07    0.09            1.26
gramschmidt             87.81   12.08           1.25
heat-3d                 6.27    14.41           1.28
jacobi-1d               0.02    0.03            1.26
jacobi-2d               5.74    10.50           1.26
ludcmp                  295.11  46.20           1.28
lu                      297.54  48.62           1.50
mvt                     0.20    0.11            1.28
nussinov                55.83   8.54            1.27
seidel-2d               31.16   31.01           1.31
symm                    29.39   4.53            1.26
syr2k                   35.19   4.95            1.26
syrk                    12.60   2.29            1.28
trisolv                 0.05    0.07            1.26
trmm                    27.30   2.43            1.27
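
To make the gaps easier to compare, the reports above can be reduced to per-benchmark ratios. The following is only a minimal sketch: the helper names (`parse_report`, `speedups`, `spread`) are hypothetical, the sample is three rows copied from the AOT report, and it assumes the whitespace-separated layout shown above with lower values meaning faster runs.

```python
def parse_report(text):
    """Parse a report: a header row of column names, then one row per benchmark."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    columns = lines[0].split()
    rows = {}
    for ln in lines[1:]:
        name, *values = ln.split()
        rows[name] = dict(zip(columns, map(float, values)))
    return columns, rows


def speedups(rows, baseline, target):
    """Return baseline / target per benchmark; a value above 1 means target was faster."""
    return {name: r[baseline] / r[target] for name, r in rows.items() if r[target] > 0}


def spread(rows, column):
    """Return max / min of one column, as a rough check that timings vary with the workload."""
    values = [r[column] for r in rows.values()]
    return max(values) / min(values)


# Three rows copied from report_aot.txt above (assumed layout: name, then one value per column).
SAMPLE = """\
                        native  iwasm-aot       iwasm-aot-segue
2mm                     39.88   4.72            1.28
atax                    0.08    0.11            1.29
lu                      297.54  48.62           1.50
"""

columns, rows = parse_report(SAMPLE)
for name, ratio in speedups(rows, "native", "iwasm-aot").items():
    print(f"{name:10s} native/aot = {ratio:.2f}")
for column in columns:
    print(f"{column:16s} max/min = {spread(rows, column):.1f}")
```

Run against the full AOT report, the `spread` check would show that the iwasm-aot-segue column stays between 1.25 and 1.50 for every benchmark while native ranges from 0.02 to 297.54, so it may be worth confirming that those runs executed the full workload.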

At first we thought this might have something to do with SIMD optimizations, but we can't quite figure it out. We also found a similar issue here, but it has no definitive answer: https://github.com/bytecodealliance/wasm-micro-runtime/issues/4292

We tried disabling SIMD when building iwasm and wamrc with the -DWAMR_BUILD_SIMD=0 flag, but it is not clear to us whether this disables native SIMD instructions or only the Wasm 128-bit SIMD proposal (https://github.com/WebAssembly/spec/blob/main/proposals/simd/SIMD.md). AOT binaries compiled with a wamrc that was built with -DWAMR_BUILD_SIMD=0 still contain native SIMD instructions when we inspect the object file with objdump.
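
One thing that may help when reading the objdump output: on x86-64, scalar floating-point code also uses the SSE registers, so seeing `xmm` operands does not by itself mean the loops were vectorized. The sketch below is a rough heuristic under stated assumptions, not a WAMR tool: it assumes AT&T-syntax `objdump -d` text with tab-separated fields, the function name `classify_fp_instructions` is hypothetical, the sample lines are made up for illustration, and it only looks at floating-point mnemonics (packed ones end in `ps`/`pd`, scalar ones in `ss`/`sd`), ignoring integer SIMD.

```python
import re

# Matches SSE/AVX register operands such as %xmm0, %ymm3, %zmm15.
VECTOR_REG = re.compile(r"\b[xyz]mm\d+\b")


def classify_fp_instructions(disassembly):
    """Count packed vs scalar floating-point instructions in objdump -d text."""
    counts = {"packed": 0, "scalar": 0}
    for line in disassembly.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # labels, blank lines, byte-only continuation lines
        parts = fields[2].split(None, 1)
        if len(parts) < 2 or not VECTOR_REG.search(parts[1]):
            continue  # no operands, or no vector register involved
        mnemonic = parts[0]
        if mnemonic.endswith(("ps", "pd")):
            counts["packed"] += 1
        elif mnemonic.endswith(("ss", "sd")):
            counts["scalar"] += 1
    return counts


# Illustrative lines only; these are not taken from a real wamrc-generated object.
SAMPLE = (
    "  401130:\tf2 0f 59 c1          \tmulsd  %xmm1,%xmm0\n"
    "  401134:\t66 0f 59 c1          \tmulpd  %xmm1,%xmm0\n"
    "  401138:\tc5 fd 58 c1          \tvaddpd %ymm1,%ymm0,%ymm0\n"
    "  40113c:\tc3                   \tret\n"
)

print(classify_fp_instructions(SAMPLE))
```

Feeding it the disassembly of both the native binary and the AOT object would show whether the hot loops use mostly packed or mostly scalar instructions in each. Moves such as `movaps` count as packed even when they only copy a scalar, so the numbers are a hint rather than proof.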

Are we missing or misunderstanding something? All suggestions are welcome.

Once we figure this out, we will move to more realistic benchmarks.


Milan (rajsite) (Jul 02 2025 at 18:08):

This has similar vibes to some benchmarks that ran faster when compiled to native via wasm2c than when compiled to native directly from source, where the 32-bit pointer space was believed to give a boost: https://kripken.github.io/talks/2020/universal.html#/26/0/0


Last updated: Dec 06 2025 at 06:05 UTC