gaaraw opened issue #13258:
Describe the bug
`table.init` appears to have a very expensive non-empty path in Wasmtime in a minimal repeated microbenchmark. I first found this in a generated differential benchmark, then reduced it to a much smaller testcase. The slowdown remains after removing loop-derived operand shaping and shrinking the table/element resources to the minimum needed.
The smallest clear reproducer I found is:
primary_reproducer_table_init_len1.wat

A close control with `len = 0` is:
supporting_control_table_init_len0.wat

Test Case
Primary reproducer loop body:

    i32.const 0
    i32.const 0
    i32.const 1
    table.init 0 0

Minimal resources:

    (table $tab0 1 funcref)
    (elem funcref (ref.null func))

Supporting controls:
- supporting_control_table_init_len0.wat (`len = 0`)
- supporting_len2_table_init.wat (`len = 2` with table/elem size 2)
- supporting_table_fill_len1.wat
- supporting_table_copy_len1.wat

Steps to Reproduce
- Build the primary testcase:
wat2wasm --enable-all primary_reproducer_table_init_len1.wat -o primary_reproducer_table_init_len1.wasm
- Warm up once:
wasmtime primary_reproducer_table_init_len1.wasm
- Measure runtime:
perf stat -r 3 -e 'task-clock' wasmtime primary_reproducer_table_init_len1.wasm
- For comparison, run the same flow on:
  - supporting_control_table_init_len0.wasm
  - supporting_len2_table_init.wasm
  - supporting_table_fill_len1.wasm
  - supporting_table_copy_len1.wasm

If helpful, I can also provide the exact commands I used for the other runtimes in the comparison table.
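The steps above can be scripted so that every testcase goes through the same build, warm-up, and measurement flow. The following is a minimal sketch, not the script used for this report: the helper names `commands_for` and `run_all` are hypothetical, and it assumes `wat2wasm`, `wasmtime`, and `perf` are on `PATH` and the `.wat` files are in the current directory. The command lines themselves are the ones listed in the steps.

```python
import subprocess

# File stems named in this report (without extension).
TESTCASES = [
    "primary_reproducer_table_init_len1",
    "supporting_control_table_init_len0",
    "supporting_len2_table_init",
    "supporting_table_fill_len1",
    "supporting_table_copy_len1",
]


def commands_for(stem):
    """Return the build, warm-up, and measurement commands for one testcase."""
    wat, wasm = f"{stem}.wat", f"{stem}.wasm"
    return [
        # Build the .wasm from the .wat source.
        ["wat2wasm", "--enable-all", wat, "-o", wasm],
        # Warm up once.
        ["wasmtime", wasm],
        # Measure task-clock over 3 repetitions.
        ["perf", "stat", "-r", "3", "-e", "task-clock", "wasmtime", wasm],
    ]


def run_all(testcases=TESTCASES):
    """Run the three commands for every testcase, stopping on the first failure."""
    for stem in testcases:
        for argv in commands_for(stem):
            subprocess.run(argv, check=True)
```

Calling `run_all()` runs the whole flow; `commands_for(stem)` can be used alone to print the commands without executing anything.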
Expected and actual Results
Primary reduced `table.init` results

| testcase | shape | wasmer_llvm (s) | wasmedge_jit (s) | wamr_llvm_jit (s) | wasmer_cranelift (s) | wasmtime (s) | wamr_fast_jit (s) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| const_len0 | dst=0, src=0, len=0, table=1, elem=1 | 13.2085 | 6.2617 | 2.8532 | 13.4362 | 59.9080 | 3.2286 |
| const_len1 | dst=0, src=0, len=1, table=1, elem=1 | 13.8520 | 9.0505 | 4.1151 | 13.9670 | 99.9186 | 4.6532 |
| const_len2 | dst=0, src=0, len=2, table=2, elem=2 | 14.6396 | 9.0903 | 4.41133 | 14.6610 | 132.7836 | 4.9468 |
| const_src1_len1 | dst=0, src=1, len=1, table=2, elem=2 | 13.7660 | 9.0285 | 4.1430 | 14.1467 | 99.7570 | 4.6662 |

Observed pattern:
- Wasmtime is already much slower than the comparison runtimes for `len = 0`.
- The cost rises sharply for `len = 1` and again for `len = 2`.
- Changing `src` from `0` to `1` does not materially change the result.

Target-removed control
A target-removed control with the same outer loop / stack shaping but no `table.init` is very fast:

| testcase | wasmer_llvm (s) | wasmedge_jit (s) | wamr_llvm_jit (s) | wasmer_cranelift (s) | wasmtime (s) | wamr_fast_jit (s) |
| --- | --- | --- | --- | --- | --- | --- |
| control_no_target | 0.011744 | 0.022739 | 0.015508 | 0.29056 | 0.28542 | 0.43075 |

So this does not look like a loop/scaffold artifact. The expensive part seems tied to `table.init` itself.

Related bulk-table instructions
I also compared matched `table.fill` / `table.copy` cases with `len = 1`:

| testcase | wasmer_llvm (s) | wasmedge_jit (s) | wamr_llvm_jit (s) | wasmer_cranelift (s) | wasmtime (s) | wamr_fast_jit (s) |
| --- | --- | --- | --- | --- | --- | --- |
| table.fill len=1 | 5.0919 | 4.89015 | 2.18633 | 5.36801 | 12.0544 | 2.6832 |
| table.copy len=1 | 6.32213 | 8.8099 | 4.8734 | 6.64548 | 18.5358 | 6.4398 |

Wasmtime is not the fastest there either, but the slowdown is much less dramatic than for `table.init`.

So the anomaly looks more specific to `table.init` than to all small bulk-table operations in general.

Versions and Environment
- Wasmtime version: wasmtime 41.0.0 (4898322a4 2025-12-18)
- wasmer: 6.1.0
- WAMR: iwasm 2.4.4
- wasmedge: 0.16.1-18-gc457fe30
- wabt: 1.0.39
- llvm: 21.1.5
- Host OS: Ubuntu 22.04.5 LTS x64
- CPU: 12th Gen Intel® Core™ i7-12700 × 20
If useful, I can also attach the generated CLIF for the reduced testcase.
Extra Info
For the reduced `const_len1` testcase, Wasmtime still keeps the hot loop alive and still lowers the operation through the `table.init` builtin/helper path.

I generated CLIF with:

    wasmtime compile -C cache=n --emit-clif out_dir primary_reproducer_table_init_len1.wasm

In the generated CLIF for the reduced case, the hot loop still contains a per-iteration call equivalent to:

    call fn0(vmctx, 0, 0, 0, 0, 1)

So this does not appear to be caused by dead-code elimination or by loop-derived operand shaping.
Based on the measurements, the strongest trigger condition I can currently support is:
- repeated `table.init 0 0`
- in-bounds
- minimal table / passive element segment
- especially the non-empty path (`len > 0`)

I have not confirmed the internal root cause, so I’m only reporting the measured trigger pattern here.
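To put numbers on the trigger pattern, the timings from the results tables can be turned into relative slowdowns. This is a small sketch for illustration only: the timings are copied from the tables in this report, while the function names `slowdown_vs_fastest_other` and `wasmtime_delta` are hypothetical helpers, not part of any tool used here.

```python
# Column order of the results tables in this report.
RUNTIMES = ["wasmer_llvm", "wasmedge_jit", "wamr_llvm_jit",
            "wasmer_cranelift", "wasmtime", "wamr_fast_jit"]

# Timings in seconds, copied from the results tables.
TIMINGS = {
    "const_len0":       [13.2085, 6.2617, 2.8532, 13.4362, 59.9080, 3.2286],
    "const_len1":       [13.8520, 9.0505, 4.1151, 13.9670, 99.9186, 4.6532],
    "const_len2":       [14.6396, 9.0903, 4.41133, 14.6610, 132.7836, 4.9468],
    "const_src1_len1":  [13.7660, 9.0285, 4.1430, 14.1467, 99.7570, 4.6662],
    "table.fill len=1": [5.0919, 4.89015, 2.18633, 5.36801, 12.0544, 2.6832],
    "table.copy len=1": [6.32213, 8.8099, 4.8734, 6.64548, 18.5358, 6.4398],
}


def slowdown_vs_fastest_other(row, target="wasmtime"):
    """Ratio of the target runtime's time to the fastest other runtime's time."""
    times = dict(zip(RUNTIMES, row))
    fastest_other = min(t for name, t in times.items() if name != target)
    return times[target] / fastest_other


def wasmtime_delta(case_a, case_b):
    """Extra Wasmtime seconds going from case_a to case_b."""
    idx = RUNTIMES.index("wasmtime")
    return TIMINGS[case_b][idx] - TIMINGS[case_a][idx]


def report():
    """One formatted line per testcase with Wasmtime's relative slowdown."""
    return [f"{case}: {slowdown_vs_fastest_other(row):.1f}x"
            for case, row in TIMINGS.items()]
```

With these figures, Wasmtime comes out at roughly 21x, 24x, and 30x the fastest other runtime for `len = 0`, `1`, and `2`, versus about 5.5x for `table.fill` and 3.8x for `table.copy`. `wasmtime_delta("const_len0", "const_len1")` gives the added cost of the first non-empty step, about 40 s over the whole run.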
gaaraw added the bug label to Issue #13258.
gaaraw added the fuzz-bug label to Issue #13258.
Last updated: May 03 2026 at 22:13 UTC