Stream: git-wasmtime

Topic: wasmtime / issue #13461 Cranelift: GC reference reads bac...


view this post on Zulip Wasmtime GitHub notifications bot (May 23 2026 at 01:27):

gfx opened issue #13461:

Summary

A Wasm GC (ref null $t) value that is live across a call safepoint can read back as
null after the call, making a ref.as_non_null (or struct.get/array.get) on a
provably non-null value trap with wasm trap: null reference. The same module runs
correctly on 44.0.0.

git bisect (over v44.0.1..v45.0.0) points at:

24938c41630e2662ebcdb0a37137f02fd48c4d07"Handle alias values in SafepointSpiller::rewrite_use" (#13228)

That PR makes the safepoint spiller reload aliased needs-stack-map values from their
stack slot after safepoints (needed for the new moving collector). For the affected
value, the post-safepoint reload appears to read a slot that was never spilled, so the
use sees null.

Reproduction

A self-contained regression test (.wast) is on this branch:

https://github.com/wado-lang/wasmtime/blob/gfx/repro-gc-safepoint-reload-null/tests/misc_testsuite/gc/safepoint-reload-aliased-ref-null.wast

It is a standard tests/misc_testsuite/gc/ test — (assert_return (invoke "run")) on a
GC module. run returns without trapping on 44.0.0 and traps with null reference on
45.0.0 and main.

You can also run the module directly:

$ FEATURES="-Wgc=y -Wfunction-references=y -Wwide-arithmetic=y -Wsimd=y -Wthreads=y -Wreference-types=y"

# 44.0.0  OK (exit 0)
$ wasmtime run $FEATURES --invoke run repro.wasm

# 45.0.0 / main  traps
$ wasmtime run $FEATURES --invoke run repro.wasm
...
    wasm trap: null reference

The module was produced by a real frontend (the Wado
compiler): it JSON-deserializes a two-field struct in a loop. The trapping
ref.as_non_null reads a struct-access object that is built with struct.new (so it
cannot be null) and is held in a local that is live across the call fetching the next
field; on the second loop iteration the reloaded local reads back null. The module passes
wasm-tools validate --features all.

On minimization

I could not reduce this with wasm-tools shrink (0% — the module is already DCE'd and the
miscompile is register-allocation-sensitive, so any structural mutation hides it). Happy
to provide the source program or help with a CLIF-level reduction.

Environment

view this post on Zulip Wasmtime GitHub notifications bot (May 23 2026 at 01:34):

gfx edited issue #13461:

Summary

A Wasm GC (ref null $t) value that is live across a call safepoint can read back as
null after the call, making a ref.as_non_null (or struct.get/array.get) on a
provably non-null value trap with wasm trap: null reference. The same module runs
correctly on 44.0.0.

git bisect (over v44.0.1..v45.0.0) points at:

24938c41630e2662ebcdb0a37137f02fd48c4d07"Handle alias values in SafepointSpiller::rewrite_use" (#13228)

That PR makes the safepoint spiller reload aliased needs-stack-map values from their
stack slot after safepoints (needed for the new moving collector). For the affected
value, the post-safepoint reload appears to read a slot that was never spilled, so the
use sees null.

Reproduction

A self-contained regression test (.wast) is on this branch:

https://github.com/wado-lang/wasmtime/blob/gfx/repro-gc-safepoint-reload-null/tests/misc_testsuite/gc/safepoint-reload-aliased-ref-null.wast

It is a standard tests/misc_testsuite/gc/ test — (assert_return (invoke "run")) on a
GC module. run returns without trapping on 44.0.0 and traps with null reference on
45.0.0 and main.

You can also run the module directly:

$ FEATURES="-Wgc=y -Wfunction-references=y -Wwide-arithmetic=y -Wsimd=y -Wthreads=y -Wreference-types=y"

# 44.0.0  OK (exit 0)
$ wasmtime run $FEATURES --invoke run repro.wasm

# 45.0.0 / main  traps
$ wasmtime run $FEATURES --invoke run repro.wasm
...
    wasm trap: null reference

The module was produced by a real frontend (the Wado
compiler): it JSON-deserializes a two-field struct in a loop. The trapping
ref.as_non_null reads a struct-access object that is built with struct.new (so it
cannot be null) and is held in a local that is live across the call fetching the next
field; on the second loop iteration the reloaded local reads back null. The module passes
wasm-tools validate --features all.

On minimization

I could not reduce this with wasm-tools shrink (0% — the module is already DCE'd and the
miscompile is register-allocation-sensitive, so any structural mutation might hide it). Happy
to provide the source program or help with a CLIF-level reduction.

Environment

view this post on Zulip Wasmtime GitHub notifications bot (May 23 2026 at 15:35):

alexcrichton added the wasm-proposal:gc label to Issue #13461.

view this post on Zulip Wasmtime GitHub notifications bot (May 23 2026 at 15:35):

alexcrichton commented on issue #13461:

cc @fitzgen

view this post on Zulip Wasmtime GitHub notifications bot (May 23 2026 at 20:33):

vouillon commented on issue #13461:

Claude suggests to swap these two steps:

https://github.com/bytecodealliance/wasmtime/blob/fa50c6a150d4cb9aebaa82e95f739aa97b3fb76c/cranelift/frontend/src/frontend/safepoints.rs#L762-L776

The spiller's stack-slot allocator reuses slots via a free-list. The reuse invariant is: a slot freed at a def can only be handed back out to values processed strictly later in the pass (= strictly earlier in program order), so liveness guarantees no overlap. Per-instruction the pass currently runs rewrite_defrewrite_safepointrewrite_use.

That order breaks the invariant whenever a single instruction is both a safepoint and the def of a needs-stack-map value (e.g. v_def = call $alloc). rewrite_def frees v_def's slot into the pool, then rewrite_safepoint requests slots for everything live across the call; any live-across value that doesn't yet have a slot pops v_def's just-freed slot. The IR ends up storing v_def into a slot that another live-across value is also reading from, and in a loop the next iteration's reload returns v_def instead of the original value — manifesting as wasm trap: null reference on ref.as_non_null.

The bug pre-existed #13228; that commit only made it observable by rewriting aliased uses into stack_loads instead of leaving them as direct register reads.

Fix: do rewrite_safepoint before rewrite_def so live-across slots are reserved before v_def's slot is returned to the pool. The two steps commute (disjoint state: stack-map metadata vs. inserting a stack_store), and rewrite_def's freed slot then only becomes available to values processed later in the pass, restoring the invariant.

view this post on Zulip Wasmtime GitHub notifications bot (May 23 2026 at 20:45):

cfallin commented on issue #13461:

Claude suggests to swap these two steps:

Hi @vouillon -- without commenting on the rest of your message, please do not do this. Our AI tool usage policy requires a human in the loop, and copying/pasting something straight from your agent is explicitly against the spirit of our policy. We are talking to you, and you take responsibility for everything you say and all code that you post here. If you do not have enough of a solid grasp on your contributions to speak yourself about them, but have to transparently delegate to your agent, then that is a good sign that you should spend more time understanding it yourself first. (You are of course free to use an agent internally; but we interact with you only.) Doing otherwise is an "extractive contribution" per our policy: it says that you don't want to bother parsing and working through details but just want to throw that work at us. Thanks!

view this post on Zulip Wasmtime GitHub notifications bot (May 23 2026 at 22:08):

vouillon commented on issue #13461:

Claude suggests to swap these two steps:

Hi @vouillon -- without commenting on the rest of your message, _please do not do this_. Our AI tool usage policy requires a human in the loop, and copying/pasting something straight from your agent is _explicitly_ against the spirit of our policy. We are talking to _you_, and _you_ take responsibility for everything you say and all code that you post here. If you do not have enough of a solid grasp on your contributions to speak yourself about them, but have to transparently delegate to your agent, then that is a good sign that you should spend more time understanding it yourself first. (You are of course free to use an agent internally; but we interact with _you_ only.) Doing otherwise is an "extractive contribution" per our policy: it says that you don't want to bother parsing and working through details but just want to throw that work at us. Thanks!

Sorry, I should have phrased this differently, and should have been clearer. This change fixes the issue, and does not seem to break anything, including a quite large test suite with which I found two other issues (#13449 and #13450). I have also read the AI-generated explanation, and it makes a lot of sense to me. So, even though I have not taken the time to make a PR, I thought it would save you some time to have this information. ("Our golden rule is that a contribution should be worth more to the project than the time it takes to review it.")

view this post on Zulip Wasmtime GitHub notifications bot (May 23 2026 at 22:33):

cfallin commented on issue #13461:

Per our AI guidelines, such a note would be quite welcome -- if written by a human rather than "Claude seems to say: \<dump\>". I hope you can appreciate that we get increasingly large mountains of very verbose AI output, some of it right, some of it wrong, and some of it not even wrong in extraordinarily strange ways, and so the only realistic approach with our limited time is to consider all such text as un-reviewed by a human until proven otherwise. That's the essence behind our policy. A "I've tested this fix and it works, what do you think?" note would have led to a much better reception!

view this post on Zulip Wasmtime GitHub notifications bot (May 23 2026 at 22:34):

cfallin edited a comment on issue #13461:

Per our AI guidelines, such a note would be quite welcome -- if written by a human rather than "Claude suggests: \<dump\>". I hope you can appreciate that we get increasingly large mountains of very verbose AI output, some of it right, some of it wrong, and some of it not even wrong in extraordinarily strange ways, and so the only realistic approach with our limited time is to consider all such text as un-reviewed by a human until proven otherwise. That's the essence behind our policy. A "I've tested this fix and it works, what do you think?" note would have led to a much better reception!

view this post on Zulip Wasmtime GitHub notifications bot (May 23 2026 at 23:40):

vouillon edited a comment on issue #13461:

Swapping these two steps seems to fix the issue:

https://github.com/bytecodealliance/wasmtime/blob/fa50c6a150d4cb9aebaa82e95f739aa97b3fb76c/cranelift/frontend/src/frontend/safepoints.rs#L762-L776

The rational is that one should reserve slots for all the needs-stack-map values before releasing the slot for the value defined by the instruction. Otherwise, one can assign the same slot for both a value live across the instruction and the value just defined.

view this post on Zulip Wasmtime GitHub notifications bot (May 24 2026 at 14:52):

gfx commented on issue #13461:

We're hitting another miscompile from #13228 that #13469 does not fix — a second case on a non-moving (drc) collector, which is what Wado uses.

Bisected (clean, over [#13217..v45.0.0]) to #13228 (24938c4163). The same wado-generated wasm runs on wasmtime 44 and traps on 45; stock v45.0.0 traps, and so does our fork with #13469 already applied.

Reproduce (failing source + details in this gist: https://gist.github.com/gfx/50948a70258ab94e1dd52011894a4848):

git clone https://github.com/wado-lang/wado
cd wado
git checkout gfx/wasmtime-45
git submodule update --init vendor/wasmtime   # = wado-lang/wasmtime @ gfx/wasmtime-45 (v45.0.0 + #13469)
cargo build --release --bin wado
./target/release/wado test -O3 package-gale/src/codegen_lr_label_test.wado

Traps in a GC-reference-heavy codegen function:

error while executing at wasm backtrace:
    0: ...!./codegen.wado/emit_lr_alt_struct
    3: ...!./codegen.wado/generate
    4: ...!__test_0_labeled_left_recursive_alt_routes_through_gen_scan_lr_functions

The same source on wasmtime 44 (wado main) passes. It needs -O3 (aggressive inlining) and reproduces with cranelift opt level none (the test harness uses Config::cranelift_opt_level(OptLevel::None)).

I tried to reduce this to a standalone wasmtime run case but couldn't get one to fire: invoking the same .wasm through its component export does not trap — it only reproduces through the test harness's instantiate-and-call path, so it's regalloc-fragile like the original #13461 repro. Happy to dig further if a smaller case would help.

Since #13228's own commit message notes these reloads are unnecessary for non-moving collectors, and it has now produced two distinct miscompiles for them (this and #13461), I'd lean toward reverting #13228 rather than fixing it case by case.

view this post on Zulip Wasmtime GitHub notifications bot (May 24 2026 at 14:54):

gfx edited a comment on issue #13461:

We're hitting another miscompile from #13228 that #13469 does not fix — a second case on a non-moving (drc) collector, which is what Wado uses.

Bisected (clean, over [#13217..v45.0.0]) to #13228 (24938c4163). The same wado-generated wasm runs on wasmtime 44 and traps on 45; stock v45.0.0 traps, and so does our fork with #13469 already applied.

Reproduce (failing source + details in this gist: https://gist.github.com/gfx/50948a70258ab94e1dd52011894a4848):

git clone https://github.com/wado-lang/wado
cd wado
git checkout gfx/wasmtime-45
git submodule update --init vendor/wasmtime   # = wado-lang/wasmtime @ gfx/wasmtime-45 (v45.0.0 + #13469)
cargo build --release --bin wado
./target/release/wado test -O3 package-gale/src/codegen_lr_label_test.wado

Traps in a GC-reference-heavy codegen function:

error while executing at wasm backtrace:
    0: ...!./codegen.wado/emit_lr_alt_struct
    3: ...!./codegen.wado/generate
    4: ...!__test_0_labeled_left_recursive_alt_routes_through_gen_scan_lr_functions

The same source on wasmtime 44 (wado main) passes. It needs wado -O3 and reproduces with cranelift opt level none (the test harness uses Config::cranelift_opt_level(OptLevel::None)).

I tried to reduce this to a standalone wasmtime run case but couldn't get one to fire: invoking the same .wasm through its component export does not trap — it only reproduces through the test harness's instantiate-and-call path, so it's regalloc-fragile like the original #13461 repro. Happy to dig further if a smaller case would help.

Since #13228's own commit message notes these reloads are unnecessary for non-moving collectors, and it has now produced two distinct miscompiles for them (this and #13461), I'd lean toward reverting #13228 rather than fixing it case by case.

view this post on Zulip Wasmtime GitHub notifications bot (May 26 2026 at 10:45):

vouillon commented on issue #13461:

We're hitting another miscompile from #13228 that #13469 does not fix — a second case on a non-moving (drc) collector, which is what Wado uses.

This other issue should be fixed by #13480.

view this post on Zulip Wasmtime GitHub notifications bot (May 26 2026 at 10:54):

vouillon commented on issue #13461:

Per our AI guidelines, such a note would be quite welcome -- if written by a human rather than "Claude suggests: <dump>". I hope you can appreciate that we get increasingly large mountains of very verbose AI output, some of it right, some of it wrong, and some of it _not even wrong_ in extraordinarily strange ways, and so the only realistic approach with our limited time is to consider all such text as un-reviewed by a human until proven otherwise. That's the essence behind our policy. A "I've tested this fix and it works, what do you think?" note would have led to a _much_ better reception!

I totally understand. I apologize for my clumsy message. I'll be more careful going forward.

view this post on Zulip Wasmtime GitHub notifications bot (May 27 2026 at 20:37):

fitzgen closed issue #13461:

Summary

A Wasm GC (ref null $t) value that is live across a call safepoint can read back as
null after the call, making a ref.as_non_null (or struct.get/array.get) on a
provably non-null value trap with wasm trap: null reference. The same module runs
correctly on 44.0.0.

git bisect (over v44.0.1..v45.0.0) points at:

24938c41630e2662ebcdb0a37137f02fd48c4d07"Handle alias values in SafepointSpiller::rewrite_use" (#13228)

That PR makes the safepoint spiller reload aliased needs-stack-map values from their
stack slot after safepoints (needed for the new moving collector). For the affected
value, the post-safepoint reload appears to read a slot that was never spilled, so the
use sees null.

Reproduction

A self-contained regression test (.wast) is on this branch:

https://github.com/wado-lang/wasmtime/blob/gfx/repro-gc-safepoint-reload-null/tests/misc_testsuite/gc/safepoint-reload-aliased-ref-null.wast

It is a standard tests/misc_testsuite/gc/ test — (assert_return (invoke "run")) on a
GC module. run returns without trapping on 44.0.0 and traps with null reference on
45.0.0 and main.

You can also run the module directly:

$ FEATURES="-Wgc=y -Wfunction-references=y -Wwide-arithmetic=y -Wsimd=y -Wthreads=y -Wreference-types=y"

# 44.0.0  OK (exit 0)
$ wasmtime run $FEATURES --invoke run repro.wasm

# 45.0.0 / main  traps
$ wasmtime run $FEATURES --invoke run repro.wasm
...
    wasm trap: null reference

The module was produced by a real frontend (the Wado
compiler): it JSON-deserializes a two-field struct in a loop. The trapping
ref.as_non_null reads a struct-access object that is built with struct.new (so it
cannot be null) and is held in a local that is live across the call fetching the next
field; on the second loop iteration the reloaded local reads back null. The module passes
wasm-tools validate --features all.

On minimization

I could not reduce this with wasm-tools shrink (0% — the module is already DCE'd and the
miscompile is register-allocation-sensitive, so any structural mutation might hide it). Happy
to provide the source program or help with a CLIF-level reduction.

Environment


Last updated: Jun 01 2026 at 09:49 UTC