fitzgen requested alexcrichton for a review on PR #12554.
fitzgen requested wasmtime-fuzz-reviewers for a review on PR #12554.
fitzgen opened PR #12554 from fitzgen:oom-test-knob-alloc-after-oom to bytecodealliance:main:
alexcrichton submitted PR review:
I know we discussed this briefly before, but looking back I'm still not sure what the best thing to do here is. It's easy to reason about the previous behavior, but with this current behavior what we technically need to do is re-run the "OOM at the finger" behavior after the original OOM.
For example, in theory we need to explore the tree-like behavior where, after an OOM, every future allocation can either OOM or succeed. This PR takes the everything-always-fails route, which might not explore as much code as something-failed-then-succeeded-then-failed-again or the like.
My initial gut was that allocations should succeed after the first OOM. My second gut was that we should go with your original hunch of strictly enforcing no allocations after OOM. My third gut then got confused the more I thought about this...
What do you think about this? We could probably hook something up to fuzzing eventually to pseudo-randomly make allocations fail, but that wouldn't work well in the current infrastructure since it's already exhaustively exploring most state spaces. I'm not sure it's worth exhaustively exploring this state space either.
Out of curiosity, what was the context this came up in? I'm curious to calibrate my thinking about this problem in general.
alexcrichton created PR review comment:
I think this is preexisting as well, but I believe that this, the existing tests using `alloc`, and the code below are all leaking memory.
github-actions[bot] added the label fuzzing on PR #12554.
fitzgen submitted PR review.
fitzgen created PR review comment:
D'oh :facepalm:
fitzgen commented on PR #12554:
> I know we discussed this briefly before, but looking back I'm still not sure what the best thing to do here is. It's easy to reason about the previous behavior, but with this current behavior what we technically need to do is re-run the "OOM at the finger" behavior after the original OOM.
>
> For example, in theory we need to explore the tree-like behavior where, after an OOM, every future allocation can either OOM or succeed. This PR takes the everything-always-fails route, which might not explore as much code as something-failed-then-succeeded-then-failed-again or the like.
>
> My initial gut was that allocations should succeed after the first OOM. My second gut was that we should go with your original hunch of strictly enforcing no allocations after OOM. My third gut then got confused the more I thought about this...
We can add another knob to control whether allocations succeed or fail after the first OOM, if we ever need to tweak the behavior.
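The shape of that knob can be sketched roughly as follows. This is a hypothetical model, not the actual `wasmtime-fuzzing` `OomTest` implementation (the type and field names here are made up), and it assumes that when post-OOM allocations are allowed they continue to fail, per the "everything-always-fails route" described above:

```rust
// Hypothetical model of an OOM-injecting allocation counter with a knob
// controlling whether allocations may be attempted after the injected OOM.
#[derive(Debug)]
struct OomInjector {
    remaining: usize,            // successful allocations before the injected OOM
    oom_hit: bool,               // set once the injected OOM has fired
    allow_alloc_after_oom: bool, // the knob under discussion
}

#[derive(Debug, PartialEq)]
enum AllocOutcome {
    Ok,
    Oom,
    // An allocation was attempted after OOM while the knob disallows it;
    // a real harness would flag this as a test failure.
    DisallowedAfterOom,
}

impl OomInjector {
    fn try_alloc(&mut self) -> AllocOutcome {
        if self.oom_hit {
            if self.allow_alloc_after_oom {
                // Everything-always-fails route: later allocations keep OOMing.
                return AllocOutcome::Oom;
            }
            return AllocOutcome::DisallowedAfterOom;
        }
        if self.remaining == 0 {
            self.oom_hit = true;
            return AllocOutcome::Oom;
        }
        self.remaining -= 1;
        AllocOutcome::Ok
    }
}
```

Exhaustively exploring the tree where each post-OOM allocation independently succeeds or fails would replace the single boolean with a per-allocation decision, which is the state-space blowup discussed above.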
> What do you think about this? We could probably hook something up to fuzzing eventually to pseudo-randomly make allocations fail, but that wouldn't work well in the current infrastructure since it's already exhaustively exploring most state spaces. I'm not sure it's worth exhaustively exploring this state space either.
Yeah, there is no way we can exhaustively explore the space as we start testing more and more complicated code paths.
We will want to start having the fuzzer drive OOM tests.
> Out of curiosity what was the context this came up in? I'm curious to calibrate my thinking about this problem in general
It is a pretty simple scenario where I'm not too concerned with actually exploring/exercising the post-OOM allocation behavior, so hopefully this will alleviate some of your concerns here (which are all valid, but also not a bridge we have to cross yet, IMO):
- When implementing custom serialization/deserialization for our OOM-handling collections, if we encounter OOM during deserialization, we return an error via the `serde::de::Error::custom(oom)` static trait method.
- Each serde format library (`serde_json`, `postcard`, etc...) will typically have its own concrete error type that implements that `serde::de::Error` trait. In the specific case I am hitting now, this means deserialization returns a `postcard::Error`, and specifically the `SerdeDeCustom` variant of that error. That error does not keep around the original OOM error.
- Therefore, our call into `postcard::deserialize` will return an error that we no longer know is an OOM, and `wasmtime::Error::from(postcard::Error)` will attempt to box the postcard error into its internal thin-pointer `dyn Error` trait object, which will trigger the alloc-after-OOM detector in our OOM test harness.

So this knob is being used just to say "yes, we attempt to make an allocation after OOM in this specific case, but it is ultimately nothing to worry about and the situation is out of our hands."
Aside: yes, it is too bad that we lose the original OOM error and its size-of-failed-alloc info here, but I don't think it is too bad. If, in the future, we really want to avoid losing it, we could perhaps make our own deserialization trait that provides a side-channel for OOM errors and somehow also adapts to `serde::de::Deserialize`. Would need to think a bit more to flesh this out. Also, I don't think we need it urgently at this time; more important is just getting stuff working in the face of OOM at all.
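The error-loss mechanism above can be illustrated with minimal stand-in types (these are NOT the real `serde`/`postcard` definitions): `serde::de::Error::custom` only receives the error as a `Display` value, so a format crate that maps it to a fieldless variant, as postcard's `SerdeDeCustom` does, cannot retain the original OOM error:

```rust
use std::fmt::Display;

// Stand-in for the `serde::de::Error::custom` static trait method: the
// custom error is passed only as something that can be displayed.
trait DeError {
    fn custom<T: Display>(msg: T) -> Self;
}

// Analogue of `postcard::Error`: the variant is fieldless, so neither the
// message nor the original OOM error (with its failed-alloc size) survives.
#[derive(Debug, PartialEq)]
enum FormatError {
    SerdeDeCustom,
}

impl DeError for FormatError {
    fn custom<T: Display>(_msg: T) -> Self {
        FormatError::SerdeDeCustom
    }
}
```

By the time this error reaches `wasmtime::Error::from`, all that is known is "deserialization failed with some custom error", hence the detector trips on the boxing allocation.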
alexcrichton submitted PR review:
Ok yeah, thanks for explaining, and that definitely makes sense. I'd agree there's not much we can do about `postcard` discarding the custom error we pass in, nor anything we can do about recovering the precise error afterwards. That's pretty minor and I'm not too worried about it though, so I think it's reasonable to land this as-is (perhaps modulo the memory leaks).
I also think there's no need to fully flesh out the fuzzing just yet; the fuzzing parts are going to be more interesting when there's a more full-featured thing to run which hits OOM occasionally, and that'll happen later.
Also, I was curious why ASAN in CI didn't catch the memory leak, because it should. Turns out we exclude `wasmtime-fuzzing` from `./ci/run-tests.py` specifically, and that's because `--all-features` would require OCaml, which we don't want to deal with. The dedicated job that runs these tests doesn't run with ASAN.
fitzgen updated PR #12554.
fitzgen has enabled auto merge for PR #12554.
fitzgen added PR #12554 Add a knob to OomTest to allow/disallow allocation after OOM injection to the merge queue.
github-merge-queue[bot] removed PR #12554 Add a knob to OomTest to allow/disallow allocation after OOM injection from the merge queue.
fitzgen added PR #12554 Add a knob to OomTest to allow/disallow allocation after OOM injection to the merge queue.
fitzgen commented on PR #12554:
Re-enqueuing, network ghosts.
fitzgen merged PR #12554.
fitzgen removed PR #12554 Add a knob to OomTest to allow/disallow allocation after OOM injection from the merge queue.
Last updated: Feb 24 2026 at 04:36 UTC