Stream: git-wasmtime

Topic: wasmtime / issue #11256 Tuning Wasmtime for linear memory...


Wasmtime GitHub notifications bot (Jul 16 2025 at 18:08):

fitzgen opened issue #11256:

This issue is a follow-up to https://github.com/bytecodealliance/wasmtime/pull/11230 and the discussion it spawned around Wasm guests like Rust and C++ that may want to use exceptions but will not use GC, and how we can allow Wasmtime to be tuned for these use cases when embedders know that they will not also be running Wasm guests that use GC.

I think it makes sense to open the discussion by enumerating our constraints and desired properties and then analyze potential options through that lens.

Constraints:

  1. We must implement the full Wasm exceptions standard. Separately, we can potentially have a (compile-time and/or runtime) configuration where exnrefs are disabled, for example, but Wasmtime must still provide an implementation of the full spec by default.

  2. We must be able to collect exception-and-gc-object cycles. When GC is enabled, an exception can contain a GC reference to (for example) a struct, and that struct can have a reference to the exception. If either edge is an owning/rooting reference, then the cycle can never be collected. (Note that exceptions by themselves cannot form cycles, since they are immutable.)

  3. We must only have one exceptions implementation. We do not have the engineering resources to maintain multiple, completely-disjoint implementations of the exceptions proposal. We can add knobs to turn certain subsets of functionality on or off, and tweak things here and there, but we can't realistically swap out the fundamental approach and representations.
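
To make constraint 2 concrete, here is a minimal sketch of the failure mode using plain Rust reference counting rather than anything from Wasmtime. The `GcStruct` and `Exception` types and both function names are hypothetical stand-ins invented for illustration. The exception is immutable and the struct has one mutable field, so the cycle can only be closed through the struct.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

/// Hypothetical stand-in for a mutable GC struct with one reference field.
struct GcStruct {
    field: RefCell<Option<Rc<Exception>>>,
}

/// Hypothetical stand-in for an exception: immutable once created, with a
/// payload that holds an owning reference to a GC struct.
struct Exception {
    // Held only to keep the struct alive; never read in this sketch.
    #[allow(dead_code)]
    payload: Rc<GcStruct>,
}

/// Builds struct -> exception -> struct with owning edges, drops every
/// external handle, and reports whether the exception is still allocated.
fn owning_cycle_leaks() -> bool {
    let s = Rc::new(GcStruct { field: RefCell::new(None) });
    let exn = Rc::new(Exception { payload: Rc::clone(&s) });
    // Close the cycle through the mutable struct field.
    *s.field.borrow_mut() = Some(Rc::clone(&exn));

    let observer: Weak<Exception> = Rc::downgrade(&exn);
    drop(exn);
    drop(s);
    // Nothing outside the cycle holds a strong reference any more, yet the
    // two objects keep each other alive.
    observer.upgrade().is_some()
}

/// Same shape without the back edge: dropping the handles frees everything.
fn acyclic_exception_is_freed() -> bool {
    let s = Rc::new(GcStruct { field: RefCell::new(None) });
    let exn = Rc::new(Exception { payload: Rc::clone(&s) });
    let observer: Weak<Exception> = Rc::downgrade(&exn);
    drop(exn);
    drop(s);
    observer.upgrade().is_none()
}
```

Both functions return `true`: the cyclic pair is never freed, while the acyclic exception is freed as soon as its last handle goes away. A tracing collector does not have this problem, which is the property the constraint asks for.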

Desired properties:

  1. Small memory overheads. Fundamentally, exceptions must be stored somewhere and that will require memory. Additionally, they are dynamically sized and can have any number of payload fields, which further complicates things. But we shouldn't have to allocate the equivalent of a second full 4GiB linear memory for every LIME-style guest that uses exceptions and throws a couple of times during its execution, if it even throws at all.

  2. Small code size. As always, all else being equal, smaller is better. Ideally we wouldn't have to include code in the binary for allocating structs, for example, in builds for embedders that will only ever run LIME+exceptions Wasm programs.

  3. Fast. As always, all else being equal, faster runtime execution is better.

  4. Simple. As always, all else being equal, simpler and easier to maintain is better.

With that out of the way, I'll open up the issue for any ideas that people have!

Wasmtime GitHub notifications bot (Jul 16 2025 at 18:08):

fitzgen added the performance label to Issue #11256.

Wasmtime GitHub notifications bot (Jul 16 2025 at 18:08):

fitzgen added the wasmtime:config label to Issue #11256.

Wasmtime GitHub notifications bot (Jul 16 2025 at 18:08):

fitzgen added the wasmtime:code-size label to Issue #11256.

Wasmtime GitHub notifications bot (Jul 16 2025 at 18:08):

fitzgen added the wasm-proposal:exceptions label to Issue #11256.

Wasmtime GitHub notifications bot (Jul 16 2025 at 18:33):

fitzgen commented on issue #11256:

Existing WIP Exceptions Implementation

So here is an analysis of the current WIP exceptions implementation through the lens of the above constraints and desired properties:

Wasmtime GitHub notifications bot (Jul 16 2025 at 18:33):

fitzgen edited issue #11256:

This issue is a follow-up to https://github.com/bytecodealliance/wasmtime/pull/11230 and the discussion it spawned around Wasm guests like Rust and C++ that may want to use exceptions but will not use GC, and how we can allow Wasmtime to be tuned for these use cases when embedders know that they will not also be running Wasm guests that use GC.

I think it makes sense to open the discussion by enumerating our constraints and desired properties and then analyze potential options through that lens.

Constraints:

  1. We must implement the full Wasm exceptions standard. Separately, we can potentially have a (compile-time and/or runtime) configuration where exnrefs are disabled, for example, but Wasmtime must still provide an implementation of the full spec by default.

  2. We must be able to collect exception-and-gc-object cycles. When GC is enabled, an exception can contain a GC reference to (for example) a struct, and that struct can have a reference to the exception. If either edge is an owning/rooting reference, then the cycle can never be collected. (Note that exceptions by themselves cannot form cycles, since they are immutable.)

  3. We must only have one exceptions implementation. We do not have the engineering resources to maintain multiple, completely-disjoint implementations of the exceptions proposal. We can add knobs to turn certain subsets of functionality on or off, and tweak things here and there, but we can't realistically swap out the fundamental approach and representations.

Desired properties:

  1. Small memory overheads. Fundamentally, exceptions must be stored somewhere and that will require memory. Additionally, they are dynamically sized and can have any number of payload fields, which further complicates things. But we shouldn't have to allocate the equivalent of a second full 4GiB linear memory for every LIME-style guest that uses exceptions and throws a couple of times during its execution, if it even throws at all.

  2. Small code size. As always, all else being equal, smaller is better. Ideally we wouldn't have to include code in the binary for allocating structs, for example, in builds for embedders that will only ever run LIME+exceptions Wasm programs.

  3. Fast. As always, all else being equal, faster runtime execution is better.

  4. Simple. As always, all else being equal, simpler and easier to maintain is better.

  5. Secure. Everything we add to Wasmtime must be free of known security vulnerabilities or fundamental flaws -- we would never accept an implementation with known use-after-free bugs or which fundamentally allows Wasm guests to escape the sandbox -- but, all else being equal, implementations with more security assurances are better than those with fewer assurances.

With that out of the way, I'll open up the issue for any ideas that people have!

Wasmtime GitHub notifications bot (Jul 16 2025 at 19:16):

fitzgen commented on issue #11256:

Reserve N bytes in the vmctx for a single exception object

This is an idea that has been floated in the past. We reserve space in the Store/vmctx for an exception, only allow one exception to be live at any time in the whole Store, and disallow exceptions that do not fit. This is very nice in terms of our desired properties, but because it is so limited, it doesn't actually satisfy our hard constraints.
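
As a rough sketch of what this idea could look like, the following models the reserved region as a fixed-size slot. Every name here (`ExceptionSlot`, `SLOT_BYTES`, `ThrowError`, `store_exception`, `take_exception`) and the 64-byte capacity are assumptions made for illustration, not Wasmtime code. The two error cases correspond to the two restrictions described above.

```rust
/// Hypothetical capacity of the reserved region, in bytes ("N" in the text).
const SLOT_BYTES: usize = 64;

#[derive(Debug, PartialEq)]
enum ThrowError {
    /// The payload does not fit in the reserved region.
    TooLarge,
    /// Another exception is already live in this store.
    SlotOccupied,
}

/// Hypothetical stand-in for the space reserved in the Store/vmctx.
struct ExceptionSlot {
    bytes: [u8; SLOT_BYTES],
    /// `None` means no exception is currently live.
    len: Option<usize>,
}

impl ExceptionSlot {
    fn new() -> Self {
        ExceptionSlot { bytes: [0; SLOT_BYTES], len: None }
    }

    /// Stores a payload, failing if it is too large or the slot is in use.
    fn store_exception(&mut self, payload: &[u8]) -> Result<(), ThrowError> {
        if payload.len() > SLOT_BYTES {
            return Err(ThrowError::TooLarge);
        }
        if self.len.is_some() {
            return Err(ThrowError::SlotOccupied);
        }
        self.bytes[..payload.len()].copy_from_slice(payload);
        self.len = Some(payload.len());
        Ok(())
    }

    /// Hands the payload to a handler and frees the slot for the next throw.
    fn take_exception(&mut self) -> Option<Vec<u8>> {
        let len = self.len.take()?;
        Some(self.bytes[..len].to_vec())
    }
}
```

There is no allocator and no collector in this sketch, which is why the approach scores well on the desired properties. The `TooLarge` and `SlotOccupied` cases are exactly the limits that keep it from implementing the full exceptions spec.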

Wasmtime GitHub notifications bot (Jul 16 2025 at 19:20):

fitzgen commented on issue #11256:

Some kind of simple, non-GC representation of exceptions

There are a couple approaches in this family, notably:

And these objects could either be refcounted or leaked until the Store is dropped.

Again, these approaches can rate pretty well in terms of some of our desired properties, depending on the exact details, but they cannot satisfy constraints 2 and 3 at the same time.

Wasmtime GitHub notifications bot (Jul 16 2025 at 19:34):

fitzgen commented on issue #11256:

Extension to current implementation: add config knobs for GC heap memories that are separate from linear memory config knobs

This alleviates one of the weaknesses of our current implementation by allowing the memories used for GC heaps to be configured separately from regular linear memories.

The pooling allocator would also need some small updates: right now it assumes that GC heap memories and linear memories are identical and uses the same pool for both. That would need to become an optimization or config option and when the two kinds of memories have different configurations the pooling allocator would need to have a separate pool for GC heap memories.
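
One way to picture the pooling-allocator change is sketched below. This is not Wasmtime's `Config` or pooling allocator API. `MemoryTuning`, `EngineTuning`, `PoolLayout`, `plan_pools`, and the two knob fields are hypothetical names chosen for illustration. The sketch keeps a shared pool when the GC heap has no distinct configuration and splits into two pools otherwise.

```rust
/// Hypothetical per-memory tuning knobs; not Wasmtime's actual `Config` API.
#[derive(Clone, Debug, PartialEq)]
struct MemoryTuning {
    reservation_bytes: u64,
    guard_bytes: u64,
}

/// Hypothetical engine configuration with an optional GC-heap override.
struct EngineTuning {
    linear_memory: MemoryTuning,
    /// `None` means "use the linear memory settings for GC heaps too".
    gc_heap: Option<MemoryTuning>,
}

#[derive(Debug, PartialEq)]
enum PoolLayout {
    /// One pool serves both kinds of memory.
    Shared(MemoryTuning),
    /// The configurations differ, so GC heaps get their own pool.
    Separate { linear: MemoryTuning, gc_heap: MemoryTuning },
}

/// Chooses the pool layout from the two configurations.
fn plan_pools(cfg: &EngineTuning) -> PoolLayout {
    match &cfg.gc_heap {
        Some(gc) if *gc != cfg.linear_memory => PoolLayout::Separate {
            linear: cfg.linear_memory.clone(),
            gc_heap: gc.clone(),
        },
        // No override, or an override identical to the linear settings:
        // sharing one pool remains possible as an optimization.
        _ => PoolLayout::Shared(cfg.linear_memory.clone()),
    }
}
```

With a layout like this, an embedder could keep large reservations for linear memories while giving GC heaps, and therefore exception objects, a much smaller one.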

Wasmtime GitHub notifications bot (Jul 16 2025 at 19:50):

fitzgen commented on issue #11256:

Extension to current implementation: #[cfg]-gate struct and array separately from the core GC runtime

Right now, #[cfg(feature = "gc")] controls whether a build includes both the core GC runtime and the struct and array specifics.

The current exceptions implementation only requires the former, and does not need the latter. We could tweak our cfg feature flags and our uses of them such that we could include the core GC runtime in a build but not any of the struct and array specifics.
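
A minimal sketch of that kind of gating follows. The feature name `gc-struct-array`, the `gc` module, `ObjectKind`, and `supported_kinds` are all made up for illustration and do not reflect Wasmtime's actual feature flags or types. In a real crate the feature would also need to be declared in `Cargo.toml`.

```rust
/// Hypothetical slice of a GC runtime. The exception kind is always present;
/// struct and array support sits behind a separate, made-up Cargo feature.
#[allow(unexpected_cfgs)]
mod gc {
    #[derive(Debug, PartialEq)]
    pub enum ObjectKind {
        Exception,
        #[cfg(feature = "gc-struct-array")]
        Struct,
        #[cfg(feature = "gc-struct-array")]
        Array,
    }

    /// Build with struct and array support compiled in.
    #[cfg(feature = "gc-struct-array")]
    pub fn supported_kinds() -> Vec<ObjectKind> {
        vec![ObjectKind::Exception, ObjectKind::Struct, ObjectKind::Array]
    }

    /// Exceptions-only build: the struct and array code is not compiled.
    #[cfg(not(feature = "gc-struct-array"))]
    pub fn supported_kinds() -> Vec<ObjectKind> {
        vec![ObjectKind::Exception]
    }
}
```

When the feature is off, the `Struct` and `Array` variants and any code that handles them are removed at compile time, so they cannot contribute to the binary.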

Wasmtime GitHub notifications bot (Jul 16 2025 at 19:52):

fitzgen commented on issue #11256:

Finally, it is also worth noting that the previous two extensions of the current implementation are compatible with each other, so if we consider them together, then we get the following rating, which is pretty good along all of our dimensions:

Wasmtime GitHub notifications bot (Jul 16 2025 at 20:24):

cfallin commented on issue #11256:

Thanks very much for writing out all of these tradeoffs explicitly! This indeed captures all of the reasoning behind #11230 and much more, and more clearly than I laid out there.

I'll add a personal note that I went through many of these options in sequence as I was working out a design and I am sympathetic to the desire to find "something simpler" -- it doesn't feel at first like GC should be so tightly coupled. Unfortunately, in an engine that has both, it does seem to be that way.

I think the only thing I would add is that, with respect to the last option ("#[cfg]-gate struct and array separately") it is absolutely worth measuring, but I suspect the savings here might not be as much as one would hope: almost all of the machinery is in shared bits (GC algorithm but also e.g. the engine type registry); exceptions are "just another enum arm" in most places; and the host API layer will mostly fall away via link-time per-function GC if not used by the embedder. That said, I share your hunch that a small GC algorithm shouldn't add that much code-size. And to the extent that it does, that mechanism is what one would need to build anyway (just as you say: "That implies roughly as much runtime code as Wasm GC does."!).

Wasmtime GitHub notifications bot (Jul 16 2025 at 21:13):

alexcrichton commented on issue #11256:

Agreed, thanks for writing this all up!

My intuition matches what y'all are reaching as well I think, which is that the most viable path forward is combining knobs for GC memory with more aggressive #[cfg]. What I might add to the mix here though is the concept of a new GC entirely (e.g. an implementation of GcRuntime). Everything in Wasmtime is already factored for multiple GcRuntime implementations so I wouldn't consider this too too large of a maintenance burden, and I think we could codify some simple rules for a GC that make it unsuitable for general-purpose usage but suitable for "low runtime cost and low runtime footprint":

This would make GC allocation and GC operations practically trivial and my guess is that this would reduce a lot of the hypothetical overhead of the full-blown DRC collector for all exnref objects. Naturally though I'd also say that we should measure the difference, for example, between the code footprint of the null collector and the DRC collector, since those are sort of the lower and upper bounds for this hypothetical new collector.

In the near-term though I continue to feel that the order in which things should happen, in decreasing priority, is (1) implement the full proposal with everything integrated (aka the current trajectory), (2) work on separate GC knobs to make the runtime impact as low as we can, basically so that we don't require a second linear memory just for setjmp/longjmp in practice, and (3) follow up with work to reduce the runtime footprint of the setjmp/longjmp use case.

Wasmtime GitHub notifications bot (Jul 16 2025 at 21:13):

alexcrichton edited a comment on issue #11256:

Agreed, thanks for writing this all up!

My intuition matches what y'all are reaching as well I think, which is that the most viable path forward is combining knobs for GC memory with more aggressive #[cfg]. What I might add to the mix here though is the concept of a new GC entirely (e.g. an implementation of GcRuntime). Everything in Wasmtime is already factored for multiple GcRuntime implementations so I wouldn't consider this too too large of a maintenance burden, and I think we could codify some simple rules for a GC that make it unsuitable for general-purpose usage but suitable for "low runtime cost and low runtime footprint":

This would make GC allocation and GC operations practically trivial and my guess is that this would reduce a lot of the hypothetical overhead of the full-blown DRC collector for all exnref objects. Naturally though I'd also say that we should measure the difference, for example, between the code footprint of the null collector and the DRC collector, since those are sort of the lower and upper bounds for this hypothetical new collector.

In the near-term though I continue to feel that the order in which things should happen, in decreasing priority, is (1) implement the full proposal with everything integrated (aka the current trajectory), (2) work on separate GC knobs to make the runtime impact as low as we can, basically so that we don't require a second linear memory just for setjmp/longjmp in practice, and (3) follow up with work to reduce the runtime footprint of the setjmp/longjmp use case.


Last updated: Dec 06 2025 at 07:03 UTC