Stream: git-wasmtime

Topic: wasmtime / issue #13258 <Performance> fuzzbug: `table.ini...


Wasmtime GitHub notifications bot (May 03 2026 at 11:12):

gaaraw opened issue #13258:

Describe the bug

table.init appears to have a very expensive non-empty path in Wasmtime in a minimal repeated microbenchmark.

I first found this in a generated differential benchmark, then reduced it to a much smaller testcase. The slowdown remains after removing loop-derived operand shaping and shrinking the table/element resources to the minimum needed.

The smallest clear reproducer I found is the len = 1 loop body shown under "Primary reproducer loop body" below.

A close control is the same testcase with len = 0 (const_len0 in the results table). A sketch of the full reproducer module follows the resource listing.

Test Case

test_cases.zip

Primary reproducer loop body:

i32.const 0
i32.const 0
i32.const 1
table.init 0 0

Minimal resources:

(table $tab0 1 funcref)
(elem funcref (ref.null func))
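The full testcases are in test_cases.zip rather than inline; for readers without the attachment, here is a minimal sketch of what the complete len = 1 module plausibly looks like. The export name, the iteration count, and the loop scaffolding are assumptions, not taken from the attachment:

(module
  (table $tab0 1 funcref)
  ;; passive element segment: it is never dropped, so table.init can run
  ;; on every iteration without trapping
  (elem funcref (ref.null func))
  (func (export "_start")                  ;; export name is an assumption
    (local $i i32)
    (local.set $i (i32.const 100000000))   ;; iteration count is an assumption
    (block $done
      (loop $hot
        (br_if $done (i32.eqz (local.get $i)))
        ;; the reproducer body from above
        i32.const 0
        i32.const 0
        i32.const 1
        table.init 0 0
        (local.set $i (i32.sub (local.get $i) (i32.const 1)))
        (br $hot)))))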

Supporting controls (a target-removed loop and matched table.fill / table.copy cases) are covered in the results sections below.

Steps to Reproduce

  1. Build the primary testcase:
wat2wasm --enable-all primary_reproducer_table_init_len1.wat -o primary_reproducer_table_init_len1.wasm
  2. Warm up once:
wasmtime primary_reproducer_table_init_len1.wasm
  3. Measure runtime:
perf stat -r 3 -e 'task-clock' wasmtime primary_reproducer_table_init_len1.wasm
  4. For comparison, run the same flow on the control testcases (const_len0, const_len2, const_src1_len1), the target-removed control, and the table.fill / table.copy variants.

If helpful, I can also provide the exact commands I used for the other runtimes in the comparison table.

Expected and Actual Results

Primary reduced table.init results

| testcase | shape | wasmer_llvm (s) | wasmedge_jit (s) | wamr_llvm_jit (s) | wasmer_cranelift (s) | wasmtime (s) | wamr_fast_jit (s) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| const_len0 | dst=0, src=0, len=0, table=1, elem=1 | 13.2085 | 6.2617 | 2.8532 | 13.4362 | 59.9080 | 3.2286 |
| const_len1 | dst=0, src=0, len=1, table=1, elem=1 | 13.8520 | 9.0505 | 4.1151 | 13.9670 | 99.9186 | 4.6532 |
| const_len2 | dst=0, src=0, len=2, table=2, elem=2 | 14.6396 | 9.0903 | 4.41133 | 14.6610 | 132.7836 | 4.9468 |
| const_src1_len1 | dst=0, src=1, len=1, table=2, elem=2 | 13.7660 | 9.0285 | 4.1430 | 14.1467 | 99.7570 | 4.6662 |

Observed pattern: Wasmtime is by far the slowest runtime on every shape, its cost grows with len (59.9 s at len = 0, 99.9 s at len = 1, 132.8 s at len = 2), and even the len = 0 case is already several times slower than the next-slowest runtime. Shifting src (const_src1_len1) changes essentially nothing relative to const_len1.

Target-removed control

A target-removed control with the same outer loop / stack shaping but no table.init is very fast:

| testcase | wasmer_llvm (s) | wasmedge_jit (s) | wamr_llvm_jit (s) | wasmer_cranelift (s) | wasmtime (s) | wamr_fast_jit (s) |
| --- | --- | --- | --- | --- | --- | --- |
| control_no_target | 0.011744 | 0.022739 | 0.015508 | 0.29056 | 0.28542 | 0.43075 |

So this does not look like a loop/scaffold artifact. The expensive part seems tied to table.init itself.
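The control module itself is only in the attached zip; a minimal sketch of what a target-removed control with the same loop and stack shaping might look like follows (export name and iteration count are the same assumptions as in the reproducer sketch above):

(module
  (func (export "_start")                  ;; export name is an assumption
    (local $i i32)
    (local.set $i (i32.const 100000000))   ;; iteration count is an assumption
    (block $done
      (loop $hot
        (br_if $done (i32.eqz (local.get $i)))
        ;; same operand shaping as the reproducer, but the constants are
        ;; dropped instead of feeding table.init
        i32.const 0
        i32.const 0
        i32.const 1
        drop
        drop
        drop
        (local.set $i (i32.sub (local.get $i) (i32.const 1)))
        (br $hot)))))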

Related bulk-table instructions

I also compared matched table.fill / table.copy cases with len = 1:

| testcase | wasmer_llvm (s) | wasmedge_jit (s) | wamr_llvm_jit (s) | wasmer_cranelift (s) | wasmtime (s) | wamr_fast_jit (s) |
| --- | --- | --- | --- | --- | --- | --- |
| table.fill len=1 | 5.0919 | 4.89015 | 2.18633 | 5.36801 | 12.0544 | 2.6832 |
| table.copy len=1 | 6.32213 | 8.8099 | 4.8734 | 6.64548 | 18.5358 | 6.4398 |

Wasmtime is not the fastest there either, but the slowdown is much less dramatic than for table.init.

So the anomaly looks more specific to table.init than to all small bulk-table operations in general.
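For reference, matched len = 1 loop bodies for these two instructions would plausibly be as follows (a sketch; the exact operands in the attached testcases are not shown inline in the issue):

;; table.fill: dst=0, fill value, len=1
i32.const 0
ref.null func
i32.const 1
table.fill 0

;; table.copy: dst=0, src=0, len=1
i32.const 0
i32.const 0
i32.const 1
table.copy 0 0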

Versions and Environment

If useful, I can also attach the generated CLIF for the reduced testcase.

Extra Info

For the reduced const_len1 testcase, Wasmtime does not optimize the hot loop away, and it still lowers the operation through the table.init builtin/helper call path.

I generated CLIF with:

wasmtime compile -C cache=n --emit-clif out_dir primary_reproducer_table_init_len1.wasm

In the generated CLIF for the reduced case, the hot loop still contains a per-iteration call equivalent to:

call fn0(vmctx, 0, 0, 0, 0, 1)

So this does not appear to be caused by dead-code elimination or by loop-derived operand shaping.

Based on the measurements, the strongest trigger condition I can currently support is: executing table.init on a hot path at all, since the large overhead is present even at len = 0, grows with len, and does not appear to anywhere near the same degree for matched table.fill / table.copy cases.

I have not confirmed the internal root cause, so I’m only reporting the measured trigger pattern here.
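Since even the len = 0 row is far slower than the other runtimes, the trigger can be demonstrated with the reproducer body alone, changing only the length constant (same assumed scaffolding as the sketch above):

i32.const 0
i32.const 0
i32.const 0
table.init 0 0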

Wasmtime GitHub notifications bot (May 03 2026 at 11:12):

gaaraw added the bug label to Issue #13258.

Wasmtime GitHub notifications bot (May 03 2026 at 11:12):

gaaraw added the fuzz-bug label to Issue #13258.


Last updated: May 03 2026 at 22:13 UTC