Stream: wasmtime

Topic: `Config::max_wasm_stack` determinism


Olexiy Kulchitskiy (Oct 25 2024 at 07:42):

Can I expect deterministic behavior from Wasmtime with the `max_wasm_stack` config set?

Initially, the doc says: "the number here is not super-precise, but rather wasm will take at most 'pretty close to this much' stack space", which raises some doubts about determinism in my head.

But then it says: "If a wasm call (or series of nested wasm calls) take more stack space than the size specified then a stack overflow trap will be raised.", which sounds like it provides deterministic behavior.

Lann Martin (Oct 25 2024 at 12:46):

There is a bit more detail in this comment, but as a private code comment you probably shouldn't treat it as any sort of guarantee of future behavior.

Lann Martin (Oct 25 2024 at 12:48):

Given that explanation I would expect possible (small) variations in the precise enforced limit between different Wasmtime builds.

Alex Crichton (Oct 25 2024 at 15:07):

While it depends a bit on what you mean by "deterministic", I believe the answer is "no": this option isn't deterministic. As Lann mentions, different versions of Wasmtime will have different optimizations in Cranelift, meaning that the same Wasm code won't use exactly the same amount of stack. We may also experiment in the future with optimizing stack checks, which may make it less deterministic as well. Finally, different platforms have different requirements, so exact stack space usage won't be the same across architectures.

Chris Fallin (Oct 25 2024 at 16:29):

+1 to all the above, and to make it a little more concrete: a function's stack frame size depends on how well the register allocator is able to fit all the values into registers. A better register allocator might pack things more tightly, reducing the number of spillslots we need in the stack frame; in that case, the exact same stack size in bytes could allow for deeper recursive calls. (Or the other way -- tweaking heuristics sometimes produces a few degradations in edge cases as well.) So in a very basic sense, this limit is not deterministic because it depends on implementation-defined behavior, and we do not want to disallow ourselves from making future optimizations.
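
To put rough numbers on the frame-size point, here is a minimal sketch. The helper name and the byte counts are made up for illustration, and it assumes every frame of the recursive function has the same fixed size, which real compiled code need not satisfy.

```rust
/// Hypothetical helper: how many nested calls of one function fit under a
/// stack budget, assuming every frame has the same fixed size in bytes.
fn max_recursion_depth(stack_limit_bytes: usize, frame_bytes: usize) -> usize {
    assert!(frame_bytes > 0, "frame size must be non-zero");
    // Integer division: a partially fitting frame does not count.
    stack_limit_bytes / frame_bytes
}
```

With an illustrative 512 KiB budget, a 64-byte frame allows 8192 nested calls, while a 48-byte frame (say, after a register allocator change removes two 8-byte spillslots) allows 10922. The configured value is identical in both cases, but the Wasm-level depth at which the trap fires is not.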

Chris Fallin (Oct 25 2024 at 16:32):

(To flesh out Alex's question on determinism: do you need the trap to happen in exactly the same place if overflow happens, for the same config value? Or do you only need determinism in terms of how much real memory is used?)

Olexiy Kulchitskiy (Oct 27 2024 at 12:14):

Thanks for your replies, guys.
@Chris Fallin @Alex Crichton Sorry for not being specific enough: I need the trap to happen at the same place in the Wasm bytecode when an overflow occurs.

Chris Fallin (Oct 27 2024 at 17:59):

Gotcha, thanks for that detail @Olexiy Kulchitskiy. In slightly more detail: do you require determinism with (Wasmtime/Cranelift version, ISA, exact hardware features, OS) fixed, or across all of these dimensions? The reason I ask is that each axis can introduce some changes: Cranelift version can change what opts we do and how we lay out the stack; a different ISA implies a different stack layout and a different number of registers, and therefore a different number of spills as well; exact hardware features within the ISA matter too (e.g. some ISA extension is not available, so we do some fallback sequence with temporary values, which increases register pressure and spilling); and finally the OS can affect the ABI, and also the exact details of the runtime's stack handling and limit checking.

Overall it seems like "exactly the same trap location" at Wasm level requires a solution at the Wasm level: perhaps you could instrument your Wasm modules before handing them to Wasmtime, such that they count stack usage (in a global that you add, for example) and deterministically trap at a limit.
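
The counting idea can be modeled outside of Wasm as a sketch. In the Rust below, `StackMeter`, `StackLimitTrap`, and `descend` are hypothetical names, not Wasmtime APIs. The counter stands in for the added global, `enter` for the check an instrumenter would insert at each function entry, and `leave` for the matching decrement on return.

```rust
/// Marker for the deterministic "stack limit exceeded" trap (hypothetical name).
#[derive(Debug, PartialEq)]
struct StackLimitTrap;

/// Models the global counter that instrumentation would add to a module.
struct StackMeter {
    used: u64,
    limit: u64,
}

impl StackMeter {
    fn new(limit: u64) -> Self {
        StackMeter { used: 0, limit }
    }

    /// Inserted at function entry: charge the frame cost, or trap if the
    /// charge would exceed the limit. The counter is unchanged on a trap.
    fn enter(&mut self, cost: u64) -> Result<(), StackLimitTrap> {
        let next = self.used.checked_add(cost).ok_or(StackLimitTrap)?;
        if next > self.limit {
            return Err(StackLimitTrap);
        }
        self.used = next;
        Ok(())
    }

    /// Inserted before every return: refund the frame cost.
    fn leave(&mut self, cost: u64) {
        self.used -= cost;
    }
}

/// Stand-in for an instrumented, unboundedly recursive guest function.
/// Returns the number of frames that were entered before the trap fired.
fn descend(meter: &mut StackMeter, cost: u64, depth: u64) -> u64 {
    match meter.enter(cost) {
        Err(StackLimitTrap) => depth,
        Ok(()) => {
            let reached = descend(meter, cost, depth + 1);
            meter.leave(cost);
            reached
        }
    }
}
```

Because the counter depends only on the module and the chosen per-function costs, the trap location is the same on every engine version and platform, as long as the real machine stack does not run out first.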

The tricky part there is ensuring your deterministic limit always fires before the backstop of "actually ran out of machine stack". The latter is, for all the reasons above, hard to predict. You might be able to approximate it (or a worst-case upper bound at least) by taking a cost per Wasm local (according to its size) per function frame: in other words, assume that the compilation puts every Wasm local in its own stack slot, plus some constant other amount, and this is a reasonable-ish upper bound. Not totally guaranteed, because we might introduce temporaries or other values, and regalloc is fully heuristic (we haven't proven upper bounds on stack frame size, but it would be surprising to exceed one spillslot for every SSA value).
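
A rough sketch of that per-frame estimate follows. The 8-byte slot alignment and the 64-byte constant overhead are placeholder assumptions, not numbers taken from Cranelift, and `ValType` here is a local stand-in rather than a Wasmtime type.

```rust
/// Local stand-in for Wasm value types (not the Wasmtime type of the same name).
#[allow(dead_code)]
#[derive(Clone, Copy)]
enum ValType {
    I32,
    I64,
    F32,
    F64,
    V128,
}

impl ValType {
    /// Size of a value of this type in bytes.
    fn size_bytes(self) -> u64 {
        match self {
            ValType::I32 | ValType::F32 => 4,
            ValType::I64 | ValType::F64 => 8,
            ValType::V128 => 16,
        }
    }
}

/// Assumption: each local gets its own slot, rounded up to 8 bytes.
const SLOT_ALIGN: u64 = 8;
/// Assumption: constant per-frame cost (return address, saved registers, ...).
const FRAME_OVERHEAD: u64 = 64;

/// Round `n` up to the next multiple of `align`.
fn round_up(n: u64, align: u64) -> u64 {
    (n + align - 1) / align * align
}

/// Estimated frame cost: one aligned slot per local plus the constant overhead.
fn estimated_frame_bytes(locals: &[ValType]) -> u64 {
    let slots: u64 = locals
        .iter()
        .map(|t| round_up(t.size_bytes(), SLOT_ALIGN))
        .sum();
    FRAME_OVERHEAD + slots
}
```

Under these placeholder constants, a function with locals `[I32, I64, V128]` is charged 64 + 8 + 8 + 16 = 96 bytes. A conservative setup would keep the sum of such charges at the instrumented limit well below the real stack size, since the estimate is not a proven bound.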


Last updated: Nov 22 2024 at 16:03 UTC