Stream: wasmtime

Topic: OOM handling in (parts of) Wasmtime


view this post on Zulip Joel Dice (Nov 20 2025 at 18:39):

@Chris Fallin @fitzgen (he/him) Regarding the OOM handling topic in today's meeting: I'm curious how you anticipate OOM errors will be handled. I imagine there are roughly three scenarios:

  1. Just propagate it to the caller like other errors, e.g. vec.try_push(foo)?
  2. Handle it internally in Wasmtime by taking a different code path which presumably requires less or no allocation (and/or freeing up some other allocation) but still allows the operation to succeed
  3. Propagate it to the embedder to deal with, presumably by dropping the store and returning a 500 error, or whatever is appropriate
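As a rough illustration of scenarios 1 and 3, here is a minimal sketch using only stable std. `Vec::try_reserve` is a real std method; the `Error` type, `try_push`, and `record_values` are hypothetical names for this example, not Wasmtime APIs.

```rust
use std::collections::TryReserveError;

/// Hypothetical error type; an embedder would match on `OutOfMemory` (scenario 3).
#[derive(Debug)]
pub enum Error {
    OutOfMemory(TryReserveError),
}

impl From<TryReserveError> for Error {
    fn from(e: TryReserveError) -> Self {
        Error::OutOfMemory(e)
    }
}

/// Hypothetical stand-in for `vec.try_push(foo)?`: reserve first, then push.
pub fn try_push<T>(vec: &mut Vec<T>, value: T) -> Result<(), Error> {
    // Returns `Err` on allocation failure instead of aborting the process.
    vec.try_reserve(1)?;
    // Capacity was reserved above, so this push does not reallocate.
    vec.push(value);
    Ok(())
}

/// Scenario 1: internal code simply propagates the failure with `?`.
pub fn record_values(out: &mut Vec<u32>, values: &[u32]) -> Result<(), Error> {
    for &v in values {
        try_push(out, v)?;
    }
    Ok(())
}
```

An embedder calling `record_values` sees an ordinary `Result` and can decide what to do with `Error::OutOfMemory`, such as dropping the store and returning a 500.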

I'm naively imagining #1 and #3 will be quite common and #2 will be rare. If that's true, then letting allocation failures panic, catching such panics at public API boundaries, and "poisoning" the store so it can't be used again could be a valid alternative approach. For the rare #2 cases, the same approach could be used at a finer-grained level: catch the panic, drop any half-formed state, and try the fallback code path.

I mentioned something like this last time it was brought up, but I don't recall if there were objections, and if so, what they were.
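The "catch at the API boundary and poison the store" idea could look roughly like the sketch below. `Store`, `CallError`, and `call` are hypothetical names, not Wasmtime's real types. It assumes a `panic=unwind` build and assumes allocation failure surfaces as a panic; by default on stable Rust a failed allocation aborts the process instead, so the usage below simulates the failure with an explicit `panic!`.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

/// Hypothetical stand-in for a store; not Wasmtime's `Store`.
pub struct Store {
    poisoned: bool,
    data: Vec<u32>,
}

#[derive(Debug, PartialEq)]
pub enum CallError {
    /// A panic escaped the internal code during this call.
    Panicked,
    /// An earlier call panicked; the store must not be used again.
    Poisoned,
}

impl Store {
    pub fn new() -> Self {
        Store { poisoned: false, data: Vec::new() }
    }

    /// Public API boundary: catch any panic and poison the store.
    pub fn call<R>(&mut self, f: impl FnOnce(&mut Vec<u32>) -> R) -> Result<R, CallError> {
        if self.poisoned {
            return Err(CallError::Poisoned);
        }
        let data = &mut self.data;
        // `AssertUnwindSafe` is justified only because the store is
        // poisoned below and its state is never observed again.
        match catch_unwind(AssertUnwindSafe(|| f(data))) {
            Ok(r) => Ok(r),
            Err(_payload) => {
                self.poisoned = true;
                Err(CallError::Panicked)
            }
        }
    }
}
```

Calling `store.call(|_| -> usize { panic!("simulated allocation failure") })` returns `Err(CallError::Panicked)`, and every later call on that store returns `Err(CallError::Poisoned)`.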

view this post on Zulip fitzgen (he/him) (Nov 20 2025 at 18:41):

Yes, mostly 1 and 3. But the problem with panics is that, I am pretty sure, they allocate internally, and they also run destructors; if those attempt to allocate, that leads to a panic within a panic, which aborts

view this post on Zulip fitzgen (he/him) (Nov 20 2025 at 18:41):

But also our no_std build doesn't support panic unwinding

view this post on Zulip Joel Dice (Nov 20 2025 at 18:41):

Presumably propagating errors with ? will also run destructors.

view this post on Zulip Joel Dice (Nov 20 2025 at 18:42):

but yeah, I see how it avoids the double panic

view this post on Zulip fitzgen (he/him) (Nov 20 2025 at 18:42):

True but if allocations return a result then you shouldn’t just unwrap that in drop

view this post on Zulip Joel Dice (Nov 20 2025 at 18:43):

yeah, either way you need to have well-behaved Drop implementations

view this post on Zulip fitzgen (he/him) (Nov 20 2025 at 18:43):

I think what you describe is possible but feels a bit hackier / doesn’t match my sense of style as much. Having a hard time putting it into words

view this post on Zulip fitzgen (he/him) (Nov 20 2025 at 18:44):

I guess I like that results don’t hide the error condition

view this post on Zulip fitzgen (he/him) (Nov 20 2025 at 18:44):

If we are going to actually handle it, then we should be explicit about that

view this post on Zulip Joel Dice (Nov 20 2025 at 18:44):

I'm not necessarily advocating for it: just feeling out the solution space, inspired a bit by Erlang-style supervision trees

view this post on Zulip fitzgen (he/him) (Nov 20 2025 at 18:45):

If vec push doesn’t return a result, then why not use it in drop? But that’s a foot gun.
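To make the footgun concrete, here is a small sketch of a `Drop` implementation that never unwraps an allocation result. `Tracked` and its shared log are hypothetical, invented for this example; `try_reserve` and `try_borrow_mut` are real std methods.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical resource that records its release in a shared log on drop.
pub struct Tracked {
    pub id: u32,
    pub log: Rc<RefCell<Vec<u32>>>,
}

impl Drop for Tracked {
    fn drop(&mut self) {
        // Avoid every panicking path in `drop`: a panic here while another
        // panic is already unwinding would abort the process.
        if let Ok(mut log) = self.log.try_borrow_mut() {
            // Fallible reservation: on allocation failure, skip the
            // bookkeeping instead of calling `unwrap()`.
            if log.try_reserve(1).is_ok() {
                log.push(self.id);
            }
        }
    }
}
```

With a fallible API the skipped-on-failure branch is visible in the code; with an infallible `push` the same allocation would be hidden inside `drop`.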

view this post on Zulip fitzgen (he/him) (Nov 20 2025 at 18:45):

I do like the idea of poisoning stores and creating well defined boundaries

view this post on Zulip Chris Fallin (Nov 20 2025 at 18:45):

One other thing that throws a wrench into "catch the panic" is that we build with panic=abort right now because we don't have libunwind in our no_std environment. So this whole branch is out, I think

view this post on Zulip Chris Fallin (Nov 20 2025 at 18:46):

in general I don't like DWARF being load-bearing here either

view this post on Zulip fitzgen (he/him) (Nov 20 2025 at 18:46):

Yeah we would also have to unwind over wasm

view this post on Zulip Chris Fallin (Nov 20 2025 at 18:47):

indeed

view this post on Zulip Chris Fallin (Nov 20 2025 at 18:47):

There's also the aspect that I don't know we've tested/audited for correct disentangling of state via destructors at all possible failure points -- maybe? but there's a reason that e.g. mutexes get poisoned on panics, and there is cross-store state too (e.g. global module registry and the like)
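For reference, the mutex poisoning mentioned here can be observed with std alone. In this sketch `poison_by_panic` and `recover` are hypothetical helper names; `Mutex::is_poisoned` and `PoisonError::into_inner` are real std APIs. It assumes a `panic=unwind` build.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Panics on a second thread while holding the lock, then reports
/// whether the mutex is poisoned.
pub fn poison_by_panic(shared: &Arc<Mutex<Vec<u32>>>) -> bool {
    let worker = Arc::clone(shared);
    let result: thread::Result<()> = thread::spawn(move || {
        let mut guard = worker.lock().unwrap();
        guard.push(1); // a half-finished update
        panic!("simulated failure mid-update");
    })
    .join();
    assert!(result.is_err());
    shared.is_poisoned()
}

/// Explicitly opts in to reading the possibly inconsistent state.
pub fn recover(shared: &Mutex<Vec<u32>>) -> Vec<u32> {
    let guard = shared.lock().unwrap_or_else(|poisoned| poisoned.into_inner());
    guard.clone()
}
```

After `poison_by_panic` returns `true`, a plain `lock()` yields `Err`, so any code that touches the shared state has to acknowledge that an update may have been interrupted.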

view this post on Zulip fitzgen (he/him) (Nov 20 2025 at 18:48):

We would sort of test that indirectly via the oom testing but yeah

view this post on Zulip Chris Fallin (Nov 20 2025 at 18:48):

Right, I guess I'm saying we need to eat the frog either way, so we might as well do it via "normal" control flow

view this post on Zulip Alex Crichton (Nov 20 2025 at 20:48):

While I agree that panicking in no_std is hard...

fitzgen (he/him) said:

Yeah we would also have to unwind over wasm

this, at least, you don't have to deal with: we catch all panics in Wasmtime at the wasm boundary, use a trap to unwind the wasm frames, and then resume the panic on the other end
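The catch-then-resume pattern described here can be sketched with std's `catch_unwind` and `resume_unwind`. This is not Wasmtime's actual code: `Outcome`, `host_call`, and `finish` are hypothetical names, and the trap that carries the payload across the wasm frames is reduced to an ordinary return value.

```rust
use std::any::Any;
use std::panic::{catch_unwind, resume_unwind, AssertUnwindSafe};

/// Hypothetical stand-in for what crosses the wasm frames: a value or a
/// captured panic payload.
pub enum Outcome<T> {
    Value(T),
    Panic(Box<dyn Any + Send>),
}

/// Host-function side: catch the panic rather than unwinding through wasm.
pub fn host_call<T>(f: impl FnOnce() -> T) -> Outcome<T> {
    match catch_unwind(AssertUnwindSafe(f)) {
        Ok(v) => Outcome::Value(v),
        Err(payload) => Outcome::Panic(payload),
    }
}

/// Entry side, once the wasm frames are gone: resume the original panic.
pub fn finish<T>(outcome: Outcome<T>) -> T {
    match outcome {
        Outcome::Value(v) => v,
        Outcome::Panic(payload) => resume_unwind(payload),
    }
}
```

The panic payload is preserved end to end, so the caller above the wasm frames sees the same panic the host function raised.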

view this post on Zulip bjorn3 (Nov 21 2025 at 09:27):

The unwinding crate can be used with no_std. Though you do need nightly to be able to set the personality function that libstd would otherwise set.

view this post on Zulip Till Schneidereit (Nov 21 2025 at 16:29):

we talked about deserialization of cwasm's yesterday, and I just remembered that Postcard might be useful for that

A no_std + serde compatible message library for Rust - jamesmunns/postcard

view this post on Zulip fitzgen (he/him) (Nov 21 2025 at 18:01):

Till Schneidereit said:

we talked about deserialization of cwasm's yesterday, and I just remembered that Postcard might be useful for that

we already use postcard today in fact: https://github.com/search?q=repo%3Abytecodealliance%2Fwasmtime%20postcard&type=code

but I have not figured out if it supports OOM-handling or not from a very quick glance at its docs. I do know that it reuses the serde traits, and I don't know if those interfaces are going to constrain us here


view this post on Zulip fitzgen (he/him) (Nov 21 2025 at 18:54):

fitzgen (he/him) said:

but I have not figured out if it supports OOM-handling or not from a very quick glance at its docs. I do know that it reuses the serde traits, and I don't know if those interfaces are going to constrain us here

so I looked into this a little more and I think we can make it work if we create new collection types (e.g. wasmtime_collections::Vec) for our OOM-handling collections rather than using extension traits on std/alloc types.[^0] This is because we need a separate implementation of serde::Deserialize that does OOM handling, but there can only be one implementation of serde::Deserialize per type.

[^0]: or alternatively use newtypes only within the things we want to deserialize with postcard/serde but if we are doing newtypes for this we might as well do it for all use sites
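A std-only sketch of why a distinct collection type helps: a new type can carry its own fallible trait implementation, while `std::vec::Vec` is stuck with whichever single implementation already exists. Here `Decode` is a hypothetical stand-in for `serde::Deserialize` (to avoid depending on serde in the example), and `OomVec` is a hypothetical type in the spirit of `wasmtime_collections::Vec`.

```rust
use std::collections::TryReserveError;

/// Hypothetical stand-in for `serde::Deserialize`.
pub trait Decode: Sized {
    fn decode(bytes: &[u8]) -> Result<Self, TryReserveError>;
}

/// Hypothetical OOM-handling vector wrapping the std `Vec`.
#[derive(Debug, PartialEq)]
pub struct OomVec<T>(Vec<T>);

impl<T> OomVec<T> {
    pub fn new() -> Self {
        OomVec(Vec::new())
    }

    /// Fallible push: reports allocation failure as an error.
    pub fn try_push(&mut self, value: T) -> Result<(), TryReserveError> {
        self.0.try_reserve(1)?;
        self.0.push(value);
        Ok(())
    }

    pub fn as_slice(&self) -> &[T] {
        &self.0
    }
}

// Because `OomVec` is its own type, this impl does not conflict with any
// existing impl of the trait for `Vec<u8>`.
impl Decode for OomVec<u8> {
    fn decode(bytes: &[u8]) -> Result<Self, TryReserveError> {
        let mut out = OomVec::new();
        // Reserve the whole buffer up front, fallibly.
        out.0.try_reserve_exact(bytes.len())?;
        for &b in bytes {
            out.try_push(b)?;
        }
        Ok(out)
    }
}
```

Whether serde's own error plumbing can carry an OOM error cleanly through a real `Deserialize` impl is not settled by this sketch; it only shows the one-impl-per-type constraint that motivates the new types.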


Last updated: Dec 06 2025 at 06:05 UTC