Stream: git-wasmtime

Topic: wasmtime / PR #13480 cranelift: Fix safepoint stack slot ...


view this post on Zulip Wasmtime GitHub notifications bot (May 26 2026 at 10:14):

vouillon opened PR #13480 from vouillon:stack-fix to bytecodealliance:main:

This fixes a bug in the safepoint spiller, reported in https://github.com/bytecodealliance/wasmtime/issues/13461#issuecomment-4529067695, where two overlapping needs-stack-map values could end up sharing the same stack slot, leading to memory and GC tracking corruption.

The Problem

The safepoint spiller walks instructions in reverse (post-order) and allocates stack slots lazily when a value is first encountered as an operand or a safepoint live-set entry.

Since loop invariants are defined outside the loop and not passed as block parameters, they are not referenced by the loop's back-edge branch instructions. This means that during the reverse walk of the block, the spiller is initially unaware of the loop-invariant value.

If a block-local variable defined via a non-safepoint instruction is processed first in the reverse walk, its slot is freed onto the free list. When the spiller continues backward and eventually hits the loop-invariant's use at a safepoint, the lazy allocation pops that recently-freed slot from the list. This leaves the two overlapping values sharing the same storage. When the block-local value is written, it clobbers the loop-invariant value, leading to stale data on the next loop iteration.

The Fix

To address this, the spiller now performs a pre-reservation pass before processing the instructions of each block. It identifies all values that propagate through the block (present in both live_in and live_out) and are live across at least one safepoint, and immediately allocates stack slots for them. This ensures their slots are already occupied and protected before any instruction definitions inside the block are processed and freed for reuse.

Verification

This fix has been verified by adding the regression test cond_loop_with_body_local_pre_reserves_loop_invariant in cranelift/frontend/src/frontend/safepoints.rs that reproduces the slot-sharing issue and verifies the fix.

view this post on Zulip Wasmtime GitHub notifications bot (May 26 2026 at 10:14):

vouillon requested cfallin for a review on PR #13480.

view this post on Zulip Wasmtime GitHub notifications bot (May 26 2026 at 10:14):

vouillon requested wasmtime-compiler-reviewers for a review on PR #13480.

view this post on Zulip Wasmtime GitHub notifications bot (May 26 2026 at 16:03):

github-actions[bot] added the label cranelift on PR #13480.

view this post on Zulip Wasmtime GitHub notifications bot (May 26 2026 at 23:29):

:memo: cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (May 26 2026 at 23:29):

:speech_balloon: cfallin created PR review comment:

This seems like it's papering over a deeper issue where our liveness computation is not sound -- I don't think that an ad-hoc conservative "reserve for all values" just for loops is the right answer. cc @fitzgen as you're more familiar with the design intent here?

view this post on Zulip Wasmtime GitHub notifications bot (May 27 2026 at 12:56):

:memo: vouillon submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (May 27 2026 at 12:56):

:speech_balloon: vouillon created PR review comment:

I have checked the change on a 273 KiB GC-heavy Wasm binary. This adds 13 slots (from 2154 slots), and 10 stack_addr instructions (from 3047), spread across 10 of 871 functions.

Note that we do not reserve for all values but only for the through-block values the instruction walk can't see. I don't see how to improve on this without significant changes. The current algorithm is a greedy linear scan, which is already not optimal, so I think a simple, obviously-sound reservation beats a more precise but heavier scheme.

view this post on Zulip Wasmtime GitHub notifications bot (May 27 2026 at 13:54):

:memo: fitzgen submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (May 27 2026 at 13:54):

:speech_balloon: fitzgen created PR review comment:

This seems like it's papering over a deeper issue where our liveness computation is not sound

Yeah, we should be seeing these values as live across the whole loop.

The current algorithm is a greedy linear scan

This is just the rewrite phase, which happens after we do a full fixpoint loop for the liveness analysis itself, which is what Chris is referring to in his comment. The intention is that the initial fixpoint loop inside the liveness analysis should recognize that these values are live across the whole loop.


I don't have anything else to say here yet except that the claimed behavior (I haven't verified anything yet) is surprising, perplexing, and a bit scary. Will dig in some more later today/tomorrow when I get time.

view this post on Zulip Wasmtime GitHub notifications bot (May 27 2026 at 15:50):

:memo: vouillon submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (May 27 2026 at 15:50):

:speech_balloon: vouillon created PR review comment:

I think the liveness analysis is sound, and the fix relies on the fact that loop-invariants are live across the whole loop.

The issue is in the rewrite phase: it reconstructs each value's live range from what it encounters during the backward walk (operand uses and safepoint live-sets), not from live_in/live_out. So it has no signal that a value is live across a back edge unless the value is mentioned there. Loop-carried values are fine, because their next-iteration value is passed as a branch argument on the back edge. But a loop-invariant value isn't passed as an argument, so if there is no safepoint between the back edge and the next slot-freeing def, and the backward walk has not yet seen any use of the value, nothing prevents the two values from sharing the same slot.

view this post on Zulip Wasmtime GitHub notifications bot (May 27 2026 at 18:54):

:cross_mark: fitzgen closed without merge PR #13480.

view this post on Zulip Wasmtime GitHub notifications bot (May 27 2026 at 18:54):

fitzgen commented on PR #13480:

I added a .wast test that would crash and a more-precise fix in https://github.com/bytecodealliance/wasmtime/pull/13498

Thanks @vouillon!


Last updated: Jun 01 2026 at 09:49 UTC