Stream: git-wasmtime

Topic: wasmtime / issue #13295 <Performance> fuzzbug: Repeated `...


Wasmtime GitHub notifications bot (May 06 2026 at 04:10):

gaaraw opened issue #13295:

Describe the bug

Repeated ref.func in a tiny hot loop appears to be much slower in Wasmtime than in Wasmer Cranelift.

After reduction, I got a minimal reproducer that preserves essentially the same gap, plus two close controls where the gap disappears. The evidence points specifically to the per-iteration ref.func path, not to the loop scaffold or the reference sink.

test_cases.zip

Primary reproducer:

Supporting controls:

Test Case

Primary reproducer loop body:

ref.func $f0
global.set $g0
local.get $i
i64.const 1
i64.sub
local.tee $i
i64.const 0
i64.ne
br_if $body
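The loop body above is only a fragment; the full reproducer is in the attached archive and is not shown here. As a rough sketch of what a complete module around that loop might look like, the Python helper below emits WAT text. Everything outside the loop body is an assumption made for illustration: the empty function `$f0`, the mutable `funcref` global `$g0`, the `elem declare` segment, the `_start` export, and the `iterations` parameter are hypothetical and may differ from the attached testcase.

```python
def make_hotloop_module(iterations: int) -> str:
    """Return WAT text for a hypothetical module wrapping the reported loop body."""
    if iterations < 1:
        # The loop decrements before testing, so a start value of 0 would wrap around.
        raise ValueError("iterations must be at least 1")
    return f"""(module
  (func $f0)
  (global $g0 (mut funcref) (ref.null func))
  (elem declare func $f0)
  (func (export "_start")
    (local $i i64)
    i64.const {iterations}
    local.set $i
    loop $body
      ref.func $f0
      global.set $g0
      local.get $i
      i64.const 1
      i64.sub
      local.tee $i
      i64.const 0
      i64.ne
      br_if $body
    end))
"""


# Print a small instance; the real iteration count is not stated in the report.
print(make_hotloop_module(1000))
```

The output can be saved to a `.wat` file and built with `wat2wasm` as in the steps below. The iteration count used for the reported timings is not given, so the value here is a placeholder.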

The reduced reproducer uses:

Matched controls:

Steps to Reproduce

  1. Build the primary testcase:
     wat2wasm primary_reproducer_ref_func_hotloop.wat -o primary_reproducer_ref_func_hotloop.wasm
  2. Warm up once:
     wasmtime primary_reproducer_ref_func_hotloop.wasm
  3. Measure runtime:
     perf stat -r 3 -e 'task-clock' wasmtime primary_reproducer_ref_func_hotloop.wasm
  4. Run the same flow on the two supporting controls above.
  5. For comparison with Wasmer Cranelift:
     wasmer run primary_reproducer_ref_func_hotloop.wasm
     perf stat -r 3 -e 'task-clock' wasmer run primary_reproducer_ref_func_hotloop.wasm
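The steps above repeat the same commands for each testcase, so they can be generated rather than typed by hand. The sketch below only builds and prints the command lines; it does not run anything. The file stems for the two controls are taken from the results table, and the assumption that each has a matching `.wat` file is mine, not confirmed by the report. The helper name `perf_commands` is hypothetical.

```python
import shlex

# Testcase stems; the control file names are assumed to match the table labels.
TESTCASES = [
    "primary_reproducer_ref_func_hotloop",
    "supporting_control_ref_func_hoisted",
    "supporting_control_ref_null_hotloop",
]


def perf_commands(stem: str, runs: int = 3) -> dict:
    """Return the build, warm-up, and measurement commands for one testcase."""
    wasm = f"{stem}.wasm"
    perf = ["perf", "stat", "-r", str(runs), "-e", "task-clock"]
    return {
        "build": ["wat2wasm", f"{stem}.wat", "-o", wasm],
        "warmup_wasmtime": ["wasmtime", wasm],
        "measure_wasmtime": perf + ["wasmtime", wasm],
        "warmup_wasmer": ["wasmer", "run", wasm],
        "measure_wasmer": perf + ["wasmer", "run", wasm],
    }


for stem in TESTCASES:
    for step, argv in perf_commands(stem).items():
        # shlex.join produces a shell-safe command line for copy and paste.
        print(f"{step}: {shlex.join(argv)}")
```

To execute a step instead of printing it, each argument list could be passed to `subprocess.run(argv, check=True)` on a machine where the tools are installed.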

Expected and actual Results

Primary reproducer and close controls

testcase                             wasmer_cranelift (s)  wasmtime (s)  ratio
primary_reproducer_ref_func_hotloop  2.7668                38.2027       13.81x
supporting_control_ref_func_hoisted  0.5029                0.5266        1.05x
supporting_control_ref_null_hotloop  0.4083                0.4468        1.09x

Observed pattern:

This makes the trigger look very specifically tied to repeated hot-loop ref.func.

Family-level consistency

The original generated ref.func seeds showed the same shape:

testcase    wasmer_cranelift (s)  wasmtime (s)  ratio
ref_func_1  2.9594                38.5392       13.02x
ref_func_2  2.7539                38.8498       14.11x

A related mixed testcase from the ref.is_null family also showed the same gap only when the loop used ref.func to create the non-null input each iteration:

testcase                                  wasmer_cranelift (s)  wasmtime (s)  ratio
ref_is_null_2 (ref.func + ref.is_null)    2.9624                39.3673       13.29x
hoisted non-null control for ref.is_null  0.6419                0.6855        1.07x

So the ref.is_null outlier seems to be explained by the same repeated-ref.func trigger, rather than by ref.is_null itself.
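The ratio columns in the tables above are the Wasmtime time divided by the Wasmer Cranelift time. As a quick arithmetic check, the sketch below recomputes them from the reported seconds; the numbers are copied from the tables, and the variable and function names are mine.

```python
# (testcase, wasmer_cranelift seconds, wasmtime seconds), copied from the tables.
measurements = [
    ("primary_reproducer_ref_func_hotloop", 2.7668, 38.2027),
    ("supporting_control_ref_func_hoisted", 0.5029, 0.5266),
    ("supporting_control_ref_null_hotloop", 0.4083, 0.4468),
    ("ref_func_1", 2.9594, 38.5392),
    ("ref_func_2", 2.7539, 38.8498),
    ("ref_is_null_2", 2.9624, 39.3673),
    ("ref_is_null_hoisted_control", 0.6419, 0.6855),
]


def slowdown(wasmer_s: float, wasmtime_s: float) -> float:
    """Return how many times slower Wasmtime was than Wasmer Cranelift."""
    return wasmtime_s / wasmer_s


ratios = {name: slowdown(a, b) for name, a, b in measurements}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

The recomputed values agree with the reported ratios to two decimal places: the cases with `ref.func` inside the loop land between roughly 13x and 14x, while the controls stay below 1.1x.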

Versions and Environment

Extra Info

I also checked the CLIF that Wasmtime generates for the reduced reproducer to confirm that the hot loop is not optimized away.

The hot loop still performs a per-iteration builtin call:

v6 = call fn0(v0, v32)
store notrap aligned table v6, v0+96

where fn0 is wasmtime_builtin_ref_func.

That builtin still performs a deeper indirect runtime call with extra frame/return-address bookkeeping:

v3 = get_frame_pointer.i64
store notrap aligned v3, v2+40
v4 = get_return_address.i64
store notrap aligned v4, v2+48
v7 = call_indirect sig0, v6(v0, v1)

In contrast, the hoisted control's hot loop is just a load/store path without wasmtime_builtin_ref_func in the loop:

v5 = load.i64 notrap aligned table v0+96
store notrap aligned table v5, v0+112

I have not confirmed the internal root cause, so I’m only reporting the measured trigger pattern:

Wasmtime GitHub notifications bot (May 06 2026 at 04:10):

gaaraw added the bug label to Issue #13295.

Wasmtime GitHub notifications bot (May 06 2026 at 04:10):

gaaraw added the fuzz-bug label to Issue #13295.

Wasmtime GitHub notifications bot (May 06 2026 at 04:42):

cfallin commented on issue #13295:

Hi @gaaraw,

A few things:

In this case, what I think you're running into is that we have lazy initialization of funcrefs in tables, which we do in a libcall. We could optimize differently by not doing that, but the payoff is that our instantiation is extremely fast because the table contents need not be initialized eagerly. What I'm getting at here is that different engines may choose different implementation strategies that prioritize one dimension or another of performance, and so this "performance fuzzbug" may not really even be considered a bug.

So: it'd be useful to hear your philosophy and purpose behind this fuzzing campaign. Is it to find deltas and raise questions that may point to inefficiencies we can fix? That's fine if so -- but I would perhaps go about it a bit differently. First, don't call it a "fuzzbug" (that has a generally-accepted meaning that is pretty different than what you have here); second, don't bombard us with overly verbose descriptions; third, do some more analysis on the tradeoffs, and come into this with more curiosity about "why", then we can have an interesting discussion. Thanks!
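The tradeoff described in this comment can be illustrated with a toy model. This is a sketch only and not Wasmtime's implementation: the class `LazyFuncRefTable`, its method names, and the two counters are hypothetical. It assumes, as the CLIF in the report suggests, that every `ref.func` goes through an out-of-line runtime call, and that the entry itself is built only on first use.

```python
from typing import Callable, List, Optional


class LazyFuncRefTable:
    """Toy model of lazily initialized function references (hypothetical)."""

    def __init__(self, initializers: List[Callable[[], object]]) -> None:
        # "Instantiation" only records how to build each entry; no entry
        # is built here, which is what keeps setup cheap in this model.
        self._initializers = initializers
        self._slots: List[Optional[object]] = [None] * len(initializers)
        self.runtime_calls = 0    # times the out-of-line path was taken
        self.initializations = 0  # times an entry was actually built

    def ref_func(self, index: int) -> object:
        # Stand-in for an out-of-line runtime call: every lookup pays for
        # the call, even when the entry already exists.
        self.runtime_calls += 1
        if self._slots[index] is None:
            self._slots[index] = self._initializers[index]()
            self.initializations += 1
        return self._slots[index]


ITERATIONS = 1000

# Shape of the primary reproducer: ref_func inside the loop.
hot = LazyFuncRefTable([object])
sink = None
for _ in range(ITERATIONS):
    sink = hot.ref_func(0)

# Shape of the hoisted control: one ref_func, then plain copies in the loop.
hoisted = LazyFuncRefTable([object])
ref = hoisted.ref_func(0)
for _ in range(ITERATIONS):
    sink = ref

print(hot.runtime_calls, hot.initializations)          # 1000 1
print(hoisted.runtime_calls, hoisted.initializations)  # 1 1
```

In this model both variants build the entry exactly once, so the difference comes entirely from how often the runtime call is made. That matches the measured pattern in the report, though it says nothing about the actual per-call cost inside Wasmtime.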

Wasmtime GitHub notifications bot (May 06 2026 at 04:42):

cfallin edited a comment on issue #13295:

Wasmtime GitHub notifications bot (May 06 2026 at 09:22):

gaaraw commented on issue #13295:

Thanks — this is very helpful feedback.

You're right on both points: calling these reports "fuzzbugs" is not the right framing here, and my issue writeups have been too verbose.

Also, no — I am not assuming that all engines should converge to one canonical fast implementation, or that a cross-engine delta automatically means one engine is wrong.

What I am trying to do is use cross-runtime deltas as signals, reduce them to smaller execution shapes, and then figure out whether the result looks more like a missed optimization, an implementation tradeoff, or a benchmark artifact.

Your explanation here is useful exactly for that reason. If this ref.func behavior is tied to lazy funcref initialization through a libcall path, with the payoff being faster instantiation, then this case is better understood as a tradeoff-revealing performance anomaly than as a straightforward bug report.

So I think the lesson for me is to present these cases with more curiosity and less bug-like framing: tighter reports, clearer controls, and more discussion of plausible tradeoffs up front.

Thanks again — I appreciate the clarification and the candid guidance.

Wasmtime GitHub notifications bot (May 06 2026 at 14:49):

alexcrichton removed the bug label from Issue #13295.

Wasmtime GitHub notifications bot (May 06 2026 at 14:49):

alexcrichton removed the fuzz-bug label from Issue #13295.

Wasmtime GitHub notifications bot (May 06 2026 at 14:49):

alexcrichton added the performance label to Issue #13295.


Last updated: Jun 01 2026 at 09:49 UTC