fitzgen opened issue #12069:
We have been discussing this in a couple recent Wasmtime meetings[^0] and on Zulip and I figured it was time to centralize discussion in a tracking issue.
[^0]: See https://github.com/bytecodealliance/meetings/blob/main/wasmtime/2025/wasmtime-10-23.md and https://github.com/bytecodealliance/meetings/blob/main/wasmtime/2025/wasmtime-11-20.md
What does handling OOM mean in this case? It means turning allocation failure into an `Err(...)` return and ultimately propagating that up to the Wasmtime embedder. It may even involve poisoning various data structures if necessary, maybe up to a whole store if necessary, but we haven't fleshed out the details completely yet. That will happen in discussions on this issue and various PRs during implementation.
Various, unordered sketches of things that will be involved:
- [ ] Replace `anyhow::Error` with a custom `wasmtime::Error`. At its most bare-bones, with all cargo features disabled, we will want this to basically be an `enum` without any data payloads in its variants. As we enable more cargo features, we can start adding support for formatting and error context and ultimately get to something like `anyhow::Error` with all features enabled.
- [ ] Create a `wasmtime-collections` crate that exposes fallible `Vec`, `HashMap`, etc... This is probably just going to be newtypes over the types we already use today, but `wasmtime_collections::Vec::push` will return a `Result` and be implemented via something like `self.0.try_reserve(1)?; self.0.push(item); Ok(())`, for example.
- [ ] We will need custom `serde::Deserialize` implementations that handle OOM failure for the `wasmtime-collections` types we use in our metadata that gets serialized into elf sections in our compiled code.
- [ ] We would ideally like to statically analyze our code and make sure that we aren't allocating infallibly in the relevant code paths. It seems like we can probably use clippy for this, or at least for a 95% solution to this that is Good Enough in practice.
- [ ] We need a way to dynamically test/fuzz our OOM handling to make sure we are actually getting it right in practice.
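As a rough illustration of the first two checklist items, here is a minimal sketch of a payload-free error `enum` and a fallible newtype over the standard `Vec`. The names `Error::OutOfMemory`, `FallibleVec`, `with_capacity`, and `as_slice` are assumptions made for this example and are not the actual `wasmtime::Error` or `wasmtime-collections` API; only the `try_reserve(1)?; push; Ok(())` shape comes from the description above.

```rust
use std::collections::TryReserveError;

/// Hypothetical stand-in for a bare-bones error type: an enum whose
/// variants carry no data payloads, so constructing it never allocates.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Error {
    OutOfMemory,
}

impl From<TryReserveError> for Error {
    fn from(_: TryReserveError) -> Self {
        Error::OutOfMemory
    }
}

/// Hypothetical fallible vector: a newtype over the standard `Vec`.
pub struct FallibleVec<T>(Vec<T>);

impl<T> FallibleVec<T> {
    /// Creating an empty vector does not allocate.
    pub fn new() -> Self {
        Self(Vec::new())
    }

    /// Reserve space up front, reporting failure instead of aborting.
    pub fn with_capacity(capacity: usize) -> Result<Self, Error> {
        let mut inner = Vec::new();
        inner.try_reserve(capacity)?;
        Ok(Self(inner))
    }

    /// Reserve room for one more element first, so that the inner
    /// `push` is guaranteed not to allocate.
    pub fn push(&mut self, item: T) -> Result<(), Error> {
        self.0.try_reserve(1)?;
        self.0.push(item);
        Ok(())
    }

    pub fn as_slice(&self) -> &[T] {
        &self.0
    }
}
```

With a wrapper like this, a caller writes `v.push(item)?` and the allocation failure travels up the call stack as an ordinary `Err`, which is the propagation behavior described at the top of this issue.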
We will initially focus on supporting the following code paths:
- Creating a `Config`
- Creating an `Engine`
- Creating a `Linker`
- Creating `InstancePre`s
- Deserializing pre-compiled `Module`s and `Component`s (not compiling new ones!)
- Creating `Store`s
- Creating `Instance`s
- Creating `Memory`s, `Table`s, `Global`s, etc...
- Running Wasm
Basically, everything that is supported in our no-std/pulley builds now: a basic runtime without the compiler, that can only run pre-compiled Wasm. We will not initially support async or the pooling allocator either, for example. I have vague ideas about how we might be able to refactor the pooling allocator for greater flexibility and enable its use in no-std / no-virtual-memory environments, but that is a bit orthogonal.
Eventually we will want to support async Wasm, yielding on out-of-fuel, ..., and the component model's async functionality. That is going to be a larger project on top of this already large project, so I'm going to delay talking about how we will cross that bridge until we get closer to it.
In practice, I expect that we will start with the OOM testing/fuzzing, create something very simple that fails immediately, and land that as "expected to fail". Then we can get that passing, which will be quite a bunch of work for this first iteration. Then we can remove the failure expectation. Then we can do a little bit more stuff inside the OOM testing/fuzzing and reveal new places we need to fix, and then we can fix those. We can continue this process until things are starting to look more and more complete. At some point we will add the clippy lints, initially to smaller modules and eventually to bigger regions of code. But the testing can be the forcing function for what area of code we add OOM handling to each step of the way.
The best way to dynamically test/fuzz OOM handling that I know of is the approach taken by SpiderMonkey's `oomTest()` helper: run a piece of code (potentially written by humans or generated by a fuzzer) with a special allocator that will return null on the first allocation made and check that the code didn't fail to handle the OOM, then run that code again but failing on the second allocation, then the third, etc... up to your time/compute budget. Starting by building this infrastructure is my rough plan. I've done a little bit of digging for other approaches to ensuring that your OOM-handling is correct, and I haven't really found anything, just people arguing about whether you should even check for null returns from `malloc` or not, which is not very helpful. That is to say, if anyone has any other ideas or knows of any other prior art here, I'd love to hear about it!
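The fail-the-Nth-allocation loop described above could be sketched in Rust roughly as follows. This is not the planned Wasmtime infrastructure: `FailNth`, `COUNTDOWN`, `oom_test`, and `build_rows` are hypothetical names invented for the example, the counter is thread-local so that only the thread under test sees injected failures, and the sketch fails exactly one allocation per run (whether later allocations should also fail is a design choice it does not settle).

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::cell::Cell;
use std::collections::TryReserveError;

thread_local! {
    // Allocations left before the injected failure on this thread.
    // `usize::MAX` means "disarmed".
    static COUNTDOWN: Cell<usize> = const { Cell::new(usize::MAX) };
}

/// Hypothetical allocator that returns null for one chosen allocation.
pub struct FailNth;

unsafe impl GlobalAlloc for FailNth {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let inject = COUNTDOWN
            .try_with(|c| match c.get() {
                usize::MAX => false, // disarmed
                0 => {
                    c.set(usize::MAX); // fail once, then disarm
                    true
                }
                n => {
                    c.set(n - 1);
                    false
                }
            })
            .unwrap_or(false);
        if inject {
            std::ptr::null_mut()
        } else {
            unsafe { System.alloc(layout) }
        }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static ALLOCATOR: FailNth = FailNth;

/// Run `f` repeatedly, failing allocation 0, then 1, then 2, ... up to
/// `budget` runs. Returns how many allocations a fully successful run
/// makes, or `None` if the budget ran out first.
pub fn oom_test<T>(
    budget: usize,
    mut f: impl FnMut() -> Result<T, TryReserveError>,
) -> Option<usize> {
    for n in 0..budget {
        COUNTDOWN.with(|c| c.set(n));
        let result = f();
        // Disarm before doing anything else that might allocate.
        let injected = COUNTDOWN.with(|c| c.replace(usize::MAX)) == usize::MAX;
        if !injected {
            assert!(result.is_ok(), "no failure was injected, yet the run failed");
            return Some(n);
        }
        assert!(result.is_err(), "allocation {n} failed but no error was reported");
    }
    None
}

/// Example code under test: every allocation goes through `try_reserve_exact`.
pub fn build_rows(len: usize) -> Result<Vec<Vec<u8>>, TryReserveError> {
    let mut rows = Vec::new();
    rows.try_reserve_exact(len)?;
    for i in 0..len {
        let mut row = Vec::new();
        row.try_reserve_exact(8)?;
        row.push(i as u8);
        rows.push(row);
    }
    Ok(rows)
}
```

Calling `oom_test(16, || build_rows(3))` exercises each of the four allocation sites in turn. Code that allocates infallibly would instead abort the process when the allocator returns null, which is itself a loud signal that a path still needs converting.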
fitzgen added the wasmtime:platform-support label to Issue #12069.
bjorn3 commented on issue #12069:
I suspect you are going to have a hard time ensuring that all dependencies can also handle OOM. For example for parsing ELF files you are using the object crate which allocates. Symbolizing backtraces allocates a debuginfo cache. async-trait allocates boxes for all futures. Serde allocates (and I don't think it is possible to support handling OOM in there without making an api breaking change)
The unstable `std::alloc::set_alloc_error_hook` would almost certainly be much easier to use. It is unstable pending a decision if unwinding out of it would be allowed though and if we should even allow running user code on `handle_alloc_error`. (the argument for not allowing it is that there may be unsafe code that is unsound under the extra reentrancy that this would allow)
bjorn3 edited a comment on issue #12069:
I suspect you are going to have a hard time ensuring that all dependencies can also handle OOM. For example for parsing ELF files you are using the object crate which allocates. Symbolizing backtraces allocates a debuginfo cache. async-trait allocates boxes for all futures. Serde allocates when deserializing common types like `Vec` (and I don't think it is possible to support handling OOM in there without making an api breaking change)
The unstable `std::alloc::set_alloc_error_hook` would almost certainly be much easier to use. It is unstable pending a decision if unwinding out of it would be allowed though and if we should even allow running user code on `handle_alloc_error`. (the argument for not allowing it is that there may be unsafe code that is unsound under the extra reentrancy that this would allow)
cfallin commented on issue #12069:
@bjorn3 thanks for the thoughts; we have thought pretty extensively about this and we think the unwinding-based approach is probably not viable in our no-std embedding (see Zulip thread that Nick linked for more). Nick covered serde above; we can turn off symbolization in our embedding; boxed futures are going to be a big hassle when we get to async but in principle there should be ways to define enough of our own version of dependencies to make this work.
alexcrichton commented on issue #12069:
Personally I view this direction for Wasmtime as a high-level decision point for the project. On one hand we have a proposal like @fitzgen has laid out where the pros are "highly portable" and the cons are "big changes, unknown impact on maintainability". On the other hand we could go the route of something like `set_alloc_error_hook` and/or unwinding which inverts the pros/cons here a bit -- e.g. few changes but much less portable.
In my opinion it's worthwhile to shoot for the more portable version of things. We can all agree it's going to be a big chunk of work, but the portability of such a solution is highly attractive I believe. I'm also somewhat confident that we can thread the needle on the maintainability and runtime cost of this solution so I'm less worried about that.
If, however, the strategy of "check for oom everywhere" doesn't work out then we can go back to the drawing board and think harder about unwinding perhaps, but in lieu of that I'd like to pull on the design thread @fitzgen is proposing here and see how far it gets us.
bjorn3 commented on issue #12069:
I'm afraid you will have to basically forego like half of all wasmtime dependencies and write replacements for them. (I wrote this before I did the inventory below. Turns out I was pretty much spot on: 20 allocating vs 18 non-allocating.)
With `cargo check --no-default-features --features "runtime async component-model component-model-async stack-switching"` I'm getting the following runtime dependencies (checked means likely unconditionally allocates) excluding Wasmtime controlled ones:
- [x] anyhow
- [ ] bitflags
- [x] bumpalo
- [ ] cfg_if
- [ ] cobs
- [ ] encoding_rs
- [ ] equivalent
- [ ] foldhash
- [ ] futures
- [x] futures_channel
- [ ] futures_core
- [ ] futures_io
- [ ] futures_sink
- [ ] futures_task
- [ ] futures_util
- [x] gimli
- [x] hashbrown
- [x] indexmap
- [ ] libc
- [ ] libm
- [ ] linux_raw_sys
- [ ] log
- [ ] memchr
- [x] memfd (through rustix)
- [x] object (unless limiting yourself to the few apis that don't)
- [ ] once_cell
- [ ] pin_project_lite
- [ ] pin_utils
- [x] postcard (if serde impls do)
- [x] rustix (allocates C string for filename)
- [x] semver
- [x] serde (unless deserializing into types that handle OOM)
- [x] serde_core (unless deserializing into types that handle OOM)
- [x] slab
- [x] smallvec
- [x] target_lexicon
- [x] thiserror
- [x] wasmparser (validator uses `Arc`, bunch of other things also allocate)
cfallin commented on issue #12069:
@bjorn3 thanks for this analysis. Most of these we have answers to already, I think.
- Nick covered anyhow above. (Replace our error types with non-allocating versions, and layer allocations for backtraces and error messages on top.)
- We control bumpalo (it's Nick's crate) but I suspect its uses fall in the same bucket as "data structures in general" for which we have the plan with the wasmtime-specific collection.
- Futures-related crates: yes, we'll need to build fallible-boxing versions of futures abstractions.
- gimli: we can forego all backtracing and native-debug support in the OOM-friendly build.
- hashbrown, indexmap: data structures, plan above.
- memfd: in our no-std environment we don't use memfd (or any Linux abstractions).
- object: yes, we'll need to ensure we have fallible allocation paths.
- postcard: Nick looked into this in the Zulip thread.
- rustix: N/A (no-std build)
- semver: I suspect our uses of this are relatively scoped (module version checks?) and we can probably either stub out or fix semver?
- serde, serde_core: see Zulip thread; custom containers
- slab, smallvec: custom data structures
- target_lexicon: we (BA) control this and can fix if needed
- thiserror: see story for error messages above
- wasmparser: compilation is not within scope
Basically: yes, it will be expensive. But in the context we (Nick and I at least) are employed to use Wasmtime, we need to be OOM-friendly to continue using Wasmtime at all, and we likely cannot make unwind work easily (and we would not want its complexity in the critical path of a possibly-frequent failure mode in high-traffic scenarios), so it will take more than "this might be a lot of work" to convince us otherwise.
Soto-J commented on issue #12069:
Hi! I’m interested in contributing to this effort and would love to help with the OOM-handling work.
If there are any recommended starting points or areas that would be most helpful for a new contributor, I’m happy to take them on.
Last updated: Dec 06 2025 at 06:05 UTC