Stream: winch

Topic: portable / deterministic stack depth checks


Graydon Hoare (Dec 15 2025 at 19:04):

Hi! I'm curious whether there's been any discussion in the past, or whether there would be any appetite for, a mode of stack-depth checking that is more portable/deterministic than the current one (which relies on arch-specific byte counts -- I think via https://docs.wasmtime.dev/api/wasmtime/struct.Config.html#method.max_wasm_stack )
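For reference, the byte-based knob being discussed is `Config::max_wasm_stack`, which is a real wasmtime API; the surrounding engine setup here is just an illustrative sketch:

```rust
use wasmtime::{Config, Engine};

fn main() {
    // Cap the guest's native stack consumption in bytes. The same byte
    // budget is consumed at different rates by different architectures'
    // frame layouts, which is the source of the cross-host divergence
    // raised in this thread.
    let mut config = Config::new();
    config.max_wasm_stack(512 * 1024); // 512 KiB; the value is illustrative
    let engine = Engine::new(&config).expect("engine creation");
    let _ = engine;
}
```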

Graydon Hoare (Dec 15 2025 at 19:06):

crudely: I was thinking it might be possible to extend the https://docs.rs/wasmtime/latest/wasmtime/trait.ResourceLimiter.html interface to get increment/decrement calls on each frame entry/exit, or perhaps something more efficient like "wasm calls that interface when it's first entered to get a frame-limit number, and tracks the entry/exit counts against that number, and calls the interface _back_ before any host call with the current frame depth"
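A minimal sketch of the interface shape described above, in plain Rust; none of these names exist in wasmtime today, and the guest-side counting machinery is only implied:

```rust
// Hypothetical host-side interface: the guest fetches a frame budget when
// first entered, counts frame entries/exits against it itself, and reports
// the current depth back to the host before any host call.
trait FrameDepthLimiter {
    /// Called when the guest is first entered: the frame budget.
    fn frame_limit(&self) -> usize;
    /// Called before any host call with the guest's current frame depth,
    /// so the host can account for depth across nested VMs.
    fn before_host_call(&mut self, current_depth: usize);
}

/// A trivial limiter that just records the deepest depth it has seen.
struct RecordingLimiter {
    limit: usize,
    max_seen: usize,
}

impl FrameDepthLimiter for RecordingLimiter {
    fn frame_limit(&self) -> usize {
        self.limit
    }
    fn before_host_call(&mut self, current_depth: usize) {
        self.max_seen = self.max_seen.max(current_depth);
    }
}
```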

Graydon Hoare (Dec 15 2025 at 19:07):

I'm asking because, well, at the moment if we use the max_wasm_stack method the same code running on two different-arch hosts that need to be in consensus will potentially fail with stack-exhaustion on one host but not the other, leading to divergence.

Graydon Hoare (Dec 15 2025 at 19:08):

(I suppose this isn't strictly a winch issue so much as an "anything wasmtime" issue, but our use case is on winch and I'd be perfectly happy with a winch-only solution)

Chris Fallin (Dec 15 2025 at 19:29):

It's an interesting question. It seems more feasible to offer some guarantees here for Winch than for Cranelift: in Winch the stack is more-or-less 1-to-1 with the Wasm operand stack and locals, plus or minus some alignment concerns, plus a few words for the frame/return address. (I don't remember if callsites push extra or use the args already on the stack -- @Jeff Charles ?)

In Cranelift in contrast putting any sort of deterministic requirement on frame size would essentially require regalloc to use the same number of spillslots on every architecture, at every release, and there's no way to say that about the pile of heuristics we have and the different machine instruction shapes and dataflow patterns for each lowering (as I'm sure you're well aware!). One could maybe impose a worst-case cap, and then pad up to that cap so that failures are deterministic, but then there's the classical overprovisioning dilemma (same as with e.g. memory overcommit): one wastes a lot to ensure the average case is the same as the worst case.

Also, I guess implicit in my thinking above is continuing to count stack usage in bytes, not frames; otherwise one has the problem that the actual stack limit needs to be sized for the worst case. (What if we have 100k locals in every Wasm function? Then a 2MiB thread stack means we can only have ~2 frames)
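The arithmetic behind that "~2 frames" remark can be spelled out (assuming, for illustration, 8-byte local slots):

```rust
// Worst-case sizing: how many frames fit in a thread stack if every
// frame must be provisioned for the largest possible local count?
fn max_frames(thread_stack: usize, locals_per_frame: usize, bytes_per_local: usize) -> usize {
    thread_stack / (locals_per_frame * bytes_per_local)
}
```

With 100k locals at 8 bytes each (800 KB per frame), a 2 MiB thread stack holds only 2 such frames.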

Chris Fallin (Dec 15 2025 at 19:30):

So tl;dr: seems potentially feasible for Winch backend in particular to lock down the frame size as a guarantee but I'd defer to @Jeff Charles

Graydon Hoare (Dec 15 2025 at 19:31):

yeah I was thinking more of "counting frames" because we can restrict the number of locals through simpler static analysis of inputs.

the other thing about "counting bytes" is that a wasm -> native host function -> wasm -> native host function chain will build up arch-specific frames from the native host functions.

Chris Fallin (Dec 15 2025 at 19:32):

Oh, that's a great point -- and there's no telling what LLVM will do in the native frames

Chris Fallin (Dec 15 2025 at 19:34):

I don't see the harm in a "frame counting" approach; it would necessarily be implemented differently for the Winch and Cranelift backends, so if the need is specifically for Winch, and the Wasmtime involvement is mostly adding one field in vmctx to track this depth plus an API to set it, I'd defer to the Winch folks for any other thoughts :-)

Graydon Hoare (Dec 15 2025 at 19:38):

cool -- so from your perspective, adding such a field + a getter/setter API for the host wouldn't be the end of the world? (I'd be happy to at least poke around prototyping this)

Chris Fallin (Dec 15 2025 at 19:41):

Yep, I'd be happy to review that at least; Alex would probably want a say as well

Jeff Charles (Dec 15 2025 at 19:57):

Someone adding a "frame counting" approach seems reasonable to me as well

Saúl Cabrera (Dec 15 2025 at 23:14):

I'd defer to the Winch folks for any other thoughts :-)

Sounds reasonable to me as well. It doesn't sound like any of this is really Winch-specific, as noted in the OP (assuming I'm not missing any details). From Winch's perspective we have built all the infrastructure needed to handle operations involving the vmctx, so that should hopefully give you a reasonable starting point, e.g. https://github.com/bytecodealliance/wasmtime/blob/main/winch/codegen/src/codegen/mod.rs#L1273


fitzgen (he/him) (Dec 23 2025 at 16:59):

FWIW, this would be pretty easy to implement as a wasm-to-wasm transform, and that's what I'd suggest starting with. I don't think this should require any changes to wasmtime or winch, just append a new global and increment it at the start of every function, trap or call out to a host function if it gets beyond your limit N, and decrement it at every return (or branch to the return label index).
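The semantics of that injected instrumentation can be modeled in plain Rust; here the thread-local stands in for the appended wasm global, and `enter`/`leave` play the role of the fragments spliced into every prologue and return path (all names hypothetical):

```rust
use std::cell::Cell;

thread_local! {
    // Stand-in for the wasm global appended by the transform.
    static DEPTH: Cell<u32> = Cell::new(0);
}

const LIMIT: u32 = 8; // the limit N; value illustrative

// Injected prologue: bump the counter, trap past the limit.
fn enter() -> Result<(), &'static str> {
    DEPTH.with(|d| {
        if d.get() >= LIMIT {
            Err("max stack depth exceeded") // would be a wasm trap
        } else {
            d.set(d.get() + 1);
            Ok(())
        }
    })
}

// Injected at every return path: decrement the counter.
fn leave() {
    DEPTH.with(|d| d.set(d.get() - 1));
}

// A guest function as it behaves after instrumentation.
fn instrumented_recurse(n: u32) -> Result<u32, &'static str> {
    enter()?;
    let result = if n == 0 {
        Ok(0)
    } else {
        instrumented_recurse(n - 1).map(|v| v + 1)
    };
    leave();
    result
}
```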

Graydon Hoare (Jan 05 2026 at 20:23):

fitzgen (he/him) said:

FWIW, this would be pretty easy to implement as a wasm-to-wasm transform, and that's what I'd suggest starting with. I don't think this should require any changes to wasmtime or winch, just append a new global and increment it at the start of every function, trap or call out to a host function if it gets beyond your limit N, and decrement it at every return (or branch to the return label index).

we've certainly considered this. I've been hesitating for three reasons:

  1. it doesn't handle the issue of tracking stack depth globally across multiple VMs-that-call-VMs with host frames interspersed. In theory we could handle this by being conservative about the VM nesting depth and just picking a worst-case bound, but that might penalize guests quite a bit.
  2. more importantly, I'm concerned about my ability to write an instrumentation pass that guests can't subvert. like I guess between the fact that wasm has control flow integrity built in and globals .. can't be aliased? can they? .. I am just unsure that I can inject a counter and a check-and-trap code fragment into the guest's sandbox that the guest can't figure out a way to tinker with themselves. but maybe I am just too burned by past experience with clever guests breaking out of their sandboxes.
  3. I'm not super confident in my ability to write such a pass _at all_, like to identify all the places that need instrumenting and be sure I always emit the instrumentation. somehow it feels more plausible to me to modify the host. but I guess I can make a mistake in either, and it's heartening to hear you say you think it'd be easy. I assumed it'd be fairly hard.

fitzgen (he/him) (Jan 05 2026 at 21:53):

@Graydon Hoare

Regarding (1): are you saying you want to count host functions' frames as well or that re-entrancy through the host is problematic? The latter doesn't seem like it should be a problem. The former seems like something you'd have to handle on your own regardless of using a wasm transform vs something built into Wasmtime.

Regarding (2): Correct, nothing can alias globals. There is no way to enumerate globals and no globalref type. If the wasm is initially valid then it can never access any globals that you append to the globals index space after the fact.

Regarding (3): If you want it to be portable across architectures and such, and you're really just counting the number of frames in the stack, then you just need to instrument every function prologue, the family of return instructions, and the family of branch instructions iff the branch target is the outermost label (i.e. the branch is equivalent to a (potentially conditional) return).

Graydon Hoare (Jan 05 2026 at 22:31):

re #1: it's not reentrancy, it's one VM calling another VM calling another VM. They're all isolated from one another, so their globals won't know about one another, whereas if it's tracked in the host there's more of a chance of using a single counter across all VMs.

I guess I can try sketching this and see how bad it is.
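One possible shape for the host-side variant: a single shared counter that every VM's frame hooks point at, so a nested VM-calls-VM chain draws on one global budget (all names hypothetical):

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Per-VM handle onto a frame budget shared across all VMs in a chain.
struct SharedDepth {
    depth: Rc<Cell<usize>>,
    limit: usize,
}

impl SharedDepth {
    /// Frame entry: returns false when the shared budget is exhausted,
    /// which would translate to a deterministic trap.
    fn enter(&self) -> bool {
        let d = self.depth.get();
        if d >= self.limit {
            return false;
        }
        self.depth.set(d + 1);
        true
    }

    /// Frame exit: release one unit of the shared budget.
    fn exit(&self) {
        self.depth.set(self.depth.get() - 1);
    }
}
```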

Chris Fallin (Jan 06 2026 at 13:22):

Regarding in-wasm instrumentation: does your guest use Wasm exceptions? If so, that's a bit tricky (possible though: e.g. one could add a local to every func, save this func's depth in that local, and at any catch-points, reset the global to that. The latter requires splitting edges though because catch-points can also be ordinary labels (or, well, I guess you don't have to, it's just a redundant store))

I sort of wonder if an alternative approach that adds a new parameter depth to every function and threads it through as depth+1 at all callsites (except tail calls!) would be a little easier -- at least, it's more "functional" (no possibility of forgetting to update mutable state)
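The depth-parameter variant can be sketched like this (a model of the idea in plain Rust, not an implementation; the limit and names are illustrative):

```rust
const DEPTH_LIMIT: u32 = 8; // value illustrative

// Injected check at function entry: trap past the limit.
fn check(depth: u32) -> Result<(), &'static str> {
    if depth > DEPTH_LIMIT {
        Err("max stack depth exceeded") // would be a wasm trap
    } else {
        Ok(())
    }
}

// Every guest function gains an explicit depth parameter and passes
// depth + 1 at each callsite; a tail call would pass `depth` unchanged.
// No mutable global means no update to forget.
fn recurse(n: u32, depth: u32) -> Result<u32, &'static str> {
    check(depth)?;
    if n == 0 {
        return Ok(0);
    }
    Ok(recurse(n - 1, depth + 1)? + 1)
}
```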


Last updated: Jan 09 2026 at 13:15 UTC