Hi! I'm curious whether there's been any discussion in the past, or whether there would be any appetite for, a mode of stack-depth checking that is more portable/deterministic than the current one (which relies on arch-specific byte counts -- I think via https://docs.wasmtime.dev/api/wasmtime/struct.Config.html#method.max_wasm_stack ).
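(For reference, roughly how that knob is set today -- the 512 KiB figure here is arbitrary, just for illustration:)

```rust
use wasmtime::{Config, Engine};

fn main() -> anyhow::Result<()> {
    let mut config = Config::new();
    // The current limit is denominated in bytes of stack, so the number
    // of wasm frames it admits varies by architecture and compiler.
    config.max_wasm_stack(512 * 1024); // 512 KiB, an arbitrary example
    let _engine = Engine::new(&config)?;
    Ok(())
}
```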
crudely: I was thinking it might be possible to extend the https://docs.rs/wasmtime/latest/wasmtime/trait.ResourceLimiter.html interface to receive increment/decrement calls on each frame entry/exit, or perhaps something more efficient like: wasm calls that interface when it's first entered to get a frame-limit number, tracks the entry/exit counts against that number itself, and calls the interface _back_ before any host call with the current frame depth.
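Something shaped roughly like this -- to be clear, a hypothetical sketch, not anything that exists in wasmtime today:

```rust
/// Hypothetical sketch only: no such trait exists in wasmtime today.
/// It just illustrates the shape of the per-frame hooks described above.
pub trait FrameLimiter {
    /// Called when wasm is first entered, returning the frame budget
    /// that generated code should track entries/exits against.
    fn frame_limit(&mut self) -> usize;

    /// Called back before any host call with the current frame depth.
    fn before_host_call(&mut self, current_depth: usize);
}
```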
I'm asking because, well, at the moment if we use the max_wasm_stack method, the same code running on two different-arch hosts that need to be in consensus can fail with stack exhaustion on one host but not the other, leading to divergence.
(I suppose this isn't strictly a winch issue so much as an "anything wasmtime" issue, but our use case is on winch and I'd be perfectly happy with a winch-only solution)
It's an interesting question. It seems more feasible to offer some guarantees here for Winch than for Cranelift: in Winch the stack is more-or-less 1-to-1 with the Wasm operand stack and locals, plus or minus some alignment concerns, plus a few words for the frame/return address. (I don't remember if callsites push extra or use the args already on the stack -- @Jeff Charles ?)
In Cranelift, in contrast, putting any sort of deterministic requirement on frame size would essentially require regalloc to use the same number of spillslots on every architecture, at every release, and there's no way to say that about the pile of heuristics we have and the different machine-instruction shapes and dataflow patterns of each lowering (as I'm sure you're well aware!). One could maybe impose a worst-case cap, and then pad up to that cap so that failures are deterministic, but then there's the classic overprovisioning dilemma (same as with e.g. memory overcommit): one wastes a lot to ensure the average case is the same as the worst case.
Also, I guess implicit in my thinking above is continuing to count stack usage in bytes, not frames; otherwise one has the problem that the actual stack limit needs to be sized for the worst-case frame. (What if we have 100k locals in every Wasm function? At 8 bytes per local that's ~800 KiB per frame, so a 2MiB thread stack means we can only have ~2 frames.)
So, tl;dr: it seems potentially feasible for the Winch backend in particular to lock down the frame size as a guarantee, but I'd defer to @Jeff Charles
yeah, I was thinking more of "counting frames", because we can restrict the number of locals through simpler static analysis of the inputs.
the other thing about "counting bytes" is that a wasm -> native host function -> wasm -> native host function chain will build up arch-specific frames from the native host functions.
Oh, that's a great point -- and there's no telling what LLVM will do in the native frames
I don't see the harm in a "frame counting" approach; it would necessarily be implemented differently for the Winch and Cranelift backends, so if the need is specifically for Winch -- and Wasmtime involvement is mostly adding one field in vmctx to track this depth plus an API to set it -- I'd defer to the Winch folks for any other thoughts :-)
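Concretely, something shaped like this (purely hypothetical; neither the field nor the accessors exist in wasmtime today, all names are illustrative):

```rust
// Purely hypothetical sketch; none of these names exist in wasmtime today.
struct VmctxSketch {
    // ...existing vmctx state...
    frame_depth: u64, // bumped by generated prologues, dropped by epilogues
    frame_limit: u64, // generated code traps once frame_depth exceeds this
}

// plus embedder-facing accessors, e.g.:
//   store.set_frame_limit(10_000);
//   let depth = store.frame_depth();
```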
cool -- so from your perspective, adding such a field + a getter/setter API for the host wouldn't be the end of the world? (I'd be happy to at least poke around prototyping this)
Yep, I'd be happy to review that at least; Alex would probably want a say as well
Someone adding a "frame counting" approach seems reasonable to me as well
I'd defer to the Winch folks for any other thoughts :-)
Sounds reasonable to me as well. It doesn't sound like any of this is really Winch-specific as stated in the OP (assuming I'm not missing any details). From Winch's perspective we have built all the infrastructure needed to handle operations involving the vmctx, so that should hopefully give you a reasonable starting point, e.g. https://github.com/bytecodealliance/wasmtime/blob/main/winch/codegen/src/codegen/mod.rs#L1273
FWIW, this would be pretty easy to implement as a wasm-to-wasm transform, and that's what I'd suggest starting with. I don't think this should require any changes to wasmtime or winch, just append a new global and increment it at the start of every function, trap or call out to a host function if it gets beyond your limit N, and decrement it at every return (or branch to the return label index).
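For concreteness, here's a hand-written example of what such a transform could emit, runnable against wasmtime as-is (the limit of 1000 and all names are made up for illustration):

```rust
use wasmtime::{Engine, Instance, Module, Store};

fn main() -> anyhow::Result<()> {
    // Example of instrumented output: $depth is the appended global and
    // 1000 the limit N. A real transform would also add the decrement
    // before every return path, not just the fallthrough one.
    let wat = r#"
        (module
          (global $depth (mut i32) (i32.const 0))
          (func (export "f") (param i32) (result i32)
            ;; prologue: bump the counter and trap past the limit
            global.get $depth
            i32.const 1
            i32.add
            global.set $depth
            global.get $depth
            i32.const 1000
            i32.gt_u
            if
              unreachable   ;; or call out to a host function instead
            end
            ;; ...original body...
            local.get 0
            ;; epilogue: decrement before returning
            global.get $depth
            i32.const 1
            i32.sub
            global.set $depth))
    "#;
    let engine = Engine::default();
    let module = Module::new(&engine, wat)?;
    let mut store = Store::new(&engine, ());
    let instance = Instance::new(&mut store, &module, &[])?;
    let f = instance.get_typed_func::<i32, i32>(&mut store, "f")?;
    assert_eq!(f.call(&mut store, 7)?, 7);
    Ok(())
}
```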
fitzgen (he/him) said:
FWIW, this would be pretty easy to implement as a wasm-to-wasm transform, and that's what I'd suggest starting with. I don't think this should require any changes to wasmtime or winch, just append a new global and increment it at the start of every function, trap or call out to a host function if it gets beyond your limit N, and decrement it at every return (or branch to the return label index).
we've certainly considered this. I've been hesitating for three reasons:
@Graydon Hoare
Regarding (1): are you saying you want to count host functions' frames as well or that re-entrancy through the host is problematic? The latter doesn't seem like it should be a problem. The former seems like something you'd have to handle on your own regardless of using a wasm transform vs something built into Wasmtime.
Regarding (2): Correct, nothing can alias globals. There is no way to enumerate globals and no globalref type. If the wasm is initially valid then it can never access any globals that you append to the globals index space after the fact.
Regarding (3): If you want it to be portable across architectures and such, and you're really just counting the number of frames in the stack, then you just need to instrument every function prologue, the family of return instructions, and the family of branch instructions iff the branch target is the outermost label (i.e. the branch is equivalent to a (potentially conditional) return).
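For example, the outermost-label case looks like this (a sketch reusing the $depth global from above, prologue increment omitted for brevity; inside the if, label 1 is the implicit function-body label, so br 1 is return-equivalent and needs the decrement too):

```rust
// Sketch of instrumenting a return-equivalent branch; $depth as before.
const WAT: &str = r#"
    (module
      (global $depth (mut i32) (i32.const 0))
      (func (param i32)
        local.get 0
        if
          global.get $depth
          i32.const 1
          i32.sub
          global.set $depth
          br 1              ;; targets the function-body label = return
        end
        ;; ordinary fallthrough epilogue
        global.get $depth
        i32.const 1
        i32.sub
        global.set $depth))
"#;
```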
re #1: it's not reentrancy, it's one VM calling another VM calling another VM. they're all isolated from one another, so their globals won't know about one another; whereas if it's tracked in the host, there's more of a chance of using a single counter across all VMs.
I guess I can try sketching this and see how bad it is.
Regarding in-wasm instrumentation: does your guest use Wasm exceptions? If so, that's a bit tricky (possible, though: e.g. one could add a local to every func, save this func's depth in that local, and at any catch-points reset the global to that saved value. That requires splitting edges, though, because catch-points can also be ordinary labels (or, well, I guess you don't have to; it's just a redundant store))
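A minimal sketch of that save/restore pattern, assuming the final exception-handling proposal's try_table/catch_all text syntax (all names illustrative; $depth is the appended global, $saved the per-function local):

```rust
// Sketch of the catch-point reset described above; prologue increment
// omitted for brevity. Assumes the final exception-handling proposal.
const WAT: &str = r#"
    (module
      (tag $exn)
      (global $depth (mut i32) (i32.const 0))
      (func $may_throw (throw $exn))
      (func
        (local $saved i32)
        ;; save this frame's depth on entry
        global.get $depth
        local.set $saved
        block $join
          block $handler
            try_table (catch_all $handler)
              call $may_throw   ;; callees bump $depth before throwing
            end
            br $join            ;; normal completion: skip the handler
          end
          ;; catch point: intervening frames were popped without running
          ;; their epilogues, so reset the counter to the saved depth
          local.get $saved
          global.set $depth
        end))
"#;
```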
I sort of wonder if an alternative approach that adds a new depth parameter to every function and threads it through as depth+1 at all callsites (except tail calls!) would be a little easier -- at least, it's more "functional" (no possibility of forgetting to update mutable state)
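A sketch of that shape (names and the limit are illustrative):

```rust
// Threaded-depth alternative: every function gains a leading i32 depth
// parameter, checked on entry, and every non-tail callsite passes
// depth+1. No mutable global to keep in sync.
const WAT: &str = r#"
    (module
      (func $leaf (param $depth i32)
        ;; check the budget on entry
        local.get $depth
        i32.const 1000
        i32.gt_u
        if
          unreachable
        end)
      (func $caller (param $depth i32)
        ;; thread depth+1 through the callsite
        local.get $depth
        i32.const 1
        i32.add
        call $leaf))
"#;
```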