fitzgen opened issue #11256:
This issue is a follow-up to https://github.com/bytecodealliance/wasmtime/pull/11230 and the discussion it spawned around Wasm guests like Rust and C++ that may want to use exceptions but will not use GC, and how we can allow Wasmtime to be tuned for these use cases when embedders know that they will not also be running Wasm guests that use GC.
I think it makes sense to open the discussion by enumerating our constraints and desired properties and then analyze potential options through that lens.
Constraints:
1. We must implement the full Wasm exceptions standard. Separately, we can potentially have a (compile-time and/or runtime) configuration where `exnref`s are disabled, for example, but Wasmtime must still provide an implementation of the full spec by default.
2. We must be able to collect exception-and-GC-object cycles. When GC is enabled, an exception can contain a GC reference to (for example) a `struct`, and that `struct` can have a reference to the exception. If either edge is an owning/rooting reference, then the cycle can never be collected. (Note that exceptions by themselves cannot form cycles, since they are immutable.)
3. We must only have one exceptions implementation. We do not have the engineering resources to maintain multiple, completely-disjoint implementations of the exceptions proposal. We can add knobs to turn certain subsets of functionality on or off, and tweak things here and there, but we can't realistically swap out the fundamental approach and representations.
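The cycle problem in constraint 2 can be reproduced with plain reference counting. In this sketch, `Rc` stands in for an owning/rooting reference, and `Exception` and `Struct` are hypothetical host-side models rather than Wasmtime types. Once every external handle is dropped, each object still keeps the other alive, so neither is ever freed.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

/// Hypothetical model of an exception whose payload holds a GC reference.
#[allow(dead_code)]
struct Exception {
    payload: Option<Rc<RefCell<Struct>>>,
}

/// Hypothetical model of a Wasm GC `struct` with a mutable field that can hold an exnref.
#[allow(dead_code)]
struct Struct {
    field: Option<Rc<Exception>>,
}

/// Builds the exception -> struct -> exception cycle, drops the local handles,
/// and returns weak handles so the caller can check whether anything was freed.
fn build_cycle() -> (Weak<Exception>, Weak<RefCell<Struct>>) {
    let s = Rc::new(RefCell::new(Struct { field: None }));
    // The exception is immutable, so it captures the struct when it is created.
    let exn = Rc::new(Exception { payload: Some(Rc::clone(&s)) });
    // The struct is mutable, so it can be pointed back at the exception afterwards.
    s.borrow_mut().field = Some(Rc::clone(&exn));
    (Rc::downgrade(&exn), Rc::downgrade(&s))
    // `exn` and `s` go out of scope here, but each object still owns the other.
}

fn main() {
    let (exn, s) = build_cycle();
    // Both objects are unreachable from the program, yet both are still alive.
    println!(
        "alive after dropping handles: exn={}, struct={}",
        exn.upgrade().is_some(),
        s.upgrade().is_some()
    );
}
```

A tracing collector sees that nothing outside the cycle reaches either object and reclaims both, which is why exceptions that live in the GC heap satisfy this constraint.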
Desired properties:
1. Small memory overheads. Fundamentally, exceptions must be stored somewhere and that will require memory. Additionally, they are dynamically sized and can have any number of payload fields, which further complicates things. But we shouldn't have to allocate the equivalent of a second full 4GiB linear memory for every LIME-style guest that uses exceptions and throws a couple of times during its execution, if it even throws at all.
2. Small code size. As always, all else being equal, smaller is better. Ideally we wouldn't have to include code in the binary for allocating `struct`s, for example, in builds for embedders that will only ever run LIME+exceptions Wasm programs.
3. Fast. As always, all else being equal, faster runtime execution is better.
4. Simple. As always, all else being equal, simpler and easier to maintain is better.
With that out of the way, I'll open up the issue for any ideas that people have!
fitzgen added the performance label to Issue #11256.
fitzgen added the wasmtime:config label to Issue #11256.
fitzgen added the wasmtime:code-size label to Issue #11256.
fitzgen added the wasm-proposal:exceptions label to Issue #11256.
fitzgen commented on issue #11256:
Existing WIP Exceptions Implementation
So here is an analysis of the current WIP exceptions implementation through the lens of the above constraints and desired properties:
- Constraint 1: 5/5. It is able to represent all potential exceptions and `exnref`s. We will be able to implement the full spec on top of it.
- Constraint 2: 5/5. Because exceptions are just GC objects, the collector can identify and collect cycles between exceptions and other GC objects.
- Constraint 3: 5/5. It is the only implementation.
- Desired property 1: 2/5. It requires allocating a second linear memory to be the GC heap. For what it is worth, this linear memory starts with zero capacity and is grown in an amortized, doubling manner, rather than immediately to the full maximum capacity. The GC heap memory has the same configuration as regular linear memories today, meaning that its minimum non-zero size will be 64KiB and it will require large virtual memory reservations and guard regions in practice.
- Desired property 2: 1/5. The common GC infrastructure, at least one collector, code to allocate/manipulate `struct`s and `array`s, and all of `wasmtime::StructRef` et al. must also be enabled when exceptions are enabled.
- Desired property 3: 5/5. Depending on the configured collector, we can use bump allocation to allocate exceptions. Doesn't really get faster than that.
- Desired property 4: 5/5. Reuses existing infrastructure we must maintain anyways.
- Desired property 5: 5/5. Sandboxes all accesses of exceptions, same as other accesses into the GC heap.
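To make the memory cost behind desired property 1 concrete, here is a small sketch of amortized doubling growth for a heap that starts at zero capacity. The exact policy (grow to at least one page, double until the request fits, round up to whole pages) and the name `grow_capacity` are assumptions for illustration, not Wasmtime's actual growth code.

```rust
/// Default Wasm page size, and so the minimum non-zero size of a default-configured heap.
const WASM_PAGE_SIZE: u64 = 64 * 1024;

/// Returns the capacity a heap grows to so that `needed` bytes fit, under an
/// assumed policy: start at zero, grow to at least one page, then keep doubling,
/// and round the result up to a whole number of pages.
fn grow_capacity(current: u64, needed: u64, page_size: u64) -> u64 {
    assert!(page_size > 0, "page size must be non-zero");
    if needed <= current {
        return current;
    }
    let mut cap = current.max(page_size);
    while cap < needed {
        cap *= 2;
    }
    // Round up to a multiple of the page size.
    (cap + page_size - 1) / page_size * page_size
}

fn main() {
    // One small exception in a default-configured heap commits a whole 64 KiB page.
    println!("64 KiB pages: {} bytes", grow_capacity(0, 48, WASM_PAGE_SIZE));
    // The same request with 1-byte pages needs only a few dozen bytes.
    println!("1-byte pages: {} bytes", grow_capacity(0, 48, 1));
}
```

Doubling keeps the amortized cost of growth low, but the page size sets the floor: under this policy a guest that throws once pays for a full page no matter how small the exception is.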
fitzgen edited issue #11256:
This issue is a follow-up to https://github.com/bytecodealliance/wasmtime/pull/11230 and the discussion it spawned around Wasm guests like Rust and C++ that may want to use exceptions but will not use GC, and how we can allow Wasmtime to be tuned for these use cases when embedders know that they will not also be running Wasm guests that use GC.
I think it makes sense to open the discussion by enumerating our constraints and desired properties and then analyze potential options through that lens.
Constraints:
1. We must implement the full Wasm exceptions standard. Separately, we can potentially have a (compile-time and/or runtime) configuration where `exnref`s are disabled, for example, but Wasmtime must still provide an implementation of the full spec by default.
2. We must be able to collect exception-and-GC-object cycles. When GC is enabled, an exception can contain a GC reference to (for example) a `struct`, and that `struct` can have a reference to the exception. If either edge is an owning/rooting reference, then the cycle can never be collected. (Note that exceptions by themselves cannot form cycles, since they are immutable.)
3. We must only have one exceptions implementation. We do not have the engineering resources to maintain multiple, completely-disjoint implementations of the exceptions proposal. We can add knobs to turn certain subsets of functionality on or off, and tweak things here and there, but we can't realistically swap out the fundamental approach and representations.
Desired properties:
1. Small memory overheads. Fundamentally, exceptions must be stored somewhere and that will require memory. Additionally, they are dynamically sized and can have any number of payload fields, which further complicates things. But we shouldn't have to allocate the equivalent of a second full 4GiB linear memory for every LIME-style guest that uses exceptions and throws a couple of times during its execution, if it even throws at all.
2. Small code size. As always, all else being equal, smaller is better. Ideally we wouldn't have to include code in the binary for allocating `struct`s, for example, in builds for embedders that will only ever run LIME+exceptions Wasm programs.
3. Fast. As always, all else being equal, faster runtime execution is better.
4. Simple. As always, all else being equal, simpler and easier to maintain is better.
5. Secure. Everything we add to Wasmtime must be free of known security vulnerabilities and fundamental flaws (we would never accept an implementation with known use-after-free bugs or which fundamentally allows Wasm guests to escape the sandbox), but, all else being equal, implementations with more security assurances are better than those with fewer.
With that out of the way, I'll open up the issue for any ideas that people have!
fitzgen commented on issue #11256:
Reserve N bytes in the `vmctx` for a single exception object
This is an idea that has been floated in the past. We reserve space in the `Store`/`vmctx` for an exception, only allow one exception to be live at any time in the whole `Store`, and disallow exceptions that do not fit. This is very nice in terms of our desired properties, but because it is so limited, it doesn't actually satisfy our hard constraints.
- Constraint 1: 1/5. We cannot implement the full exceptions standard with this approach, because we cannot have multiple `exnref`s live at the same time.
- Constraint 2: ?/5. It is unclear exactly what the memory/lifetime management would be here.
- Constraint 3: 1/5. Because this does not satisfy constraint 1, we would need another implementation that does.
- Desired property 1: 5/5. Reserves minimal additional space, and it is even part of the existing `vmctx` allocation, rather than a new allocation.
- Desired property 2: 5/5. Would need only minimal additional runtime infrastructure, and would not need the GC infrastructure, for example.
- Desired property 3: 5/5. Allocating an exception just writes the payload data into a known location.
- Desired property 4: 5/5. It's pretty much just `vmctx` accesses, something we must already do a lot of.
- Desired property 5: 5/5. Easy to have confidence in because of its simplicity.
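A minimal sketch of the single-slot idea shows both why it is so cheap and why it fails constraint 1. `StoreCtx`, `EXN_SLOT_BYTES`, `throw`, and `catch` are hypothetical names, and the 64-byte slot size is an arbitrary assumption; this is not Wasmtime's `vmctx` layout.

```rust
/// Hypothetical number of payload bytes reserved in the store context.
const EXN_SLOT_BYTES: usize = 64;

#[derive(Debug, PartialEq)]
enum ThrowError {
    /// The exception's payload does not fit in the reserved space.
    TooLarge,
    /// Another exception is already live in this store.
    SlotOccupied,
}

/// Models a store context with room for exactly one live exception.
struct StoreCtx {
    /// `Some(len)` while an exception is live, `None` otherwise.
    exn_len: Option<usize>,
    exn_slot: [u8; EXN_SLOT_BYTES],
}

impl StoreCtx {
    fn new() -> Self {
        StoreCtx { exn_len: None, exn_slot: [0; EXN_SLOT_BYTES] }
    }

    /// "Allocating" an exception is just a copy into a known location.
    fn throw(&mut self, payload: &[u8]) -> Result<(), ThrowError> {
        if payload.len() > EXN_SLOT_BYTES {
            return Err(ThrowError::TooLarge);
        }
        if self.exn_len.is_some() {
            return Err(ThrowError::SlotOccupied);
        }
        self.exn_slot[..payload.len()].copy_from_slice(payload);
        self.exn_len = Some(payload.len());
        Ok(())
    }

    /// Catching the exception, without keeping an `exnref` to it, frees the slot.
    fn catch(&mut self) -> Option<Vec<u8>> {
        let len = self.exn_len.take()?;
        Some(self.exn_slot[..len].to_vec())
    }
}

fn main() {
    let mut store = StoreCtx::new();
    store.throw(&[1, 2, 3]).unwrap();
    // A guest that keeps the first exnref alive and throws again cannot be
    // supported: the only slot is already taken.
    println!("second throw: {:?}", store.throw(&[4]));
    println!("caught: {:?}", store.catch());
}
```

The full proposal lets a guest hold an `exnref` in a local or table while throwing another exception, which is exactly the case this layout must reject.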
fitzgen commented on issue #11256:
Some kind of simple, non-GC representation of exceptions
There are a couple approaches in this family, notably:
- We could represent exceptions as indices into a side table (something like `wasmtime_slab::Slab<VMException>` or `Vec<VMException>`) hanging off the `Store`/`vmctx`.
- We could represent exceptions as `malloc`ed `*mut VMException`s.
And these objects could either be refcounted or leaked until the `Store` is dropped.
Again, these approaches can rate pretty well in terms of some of our desired properties, depending on the exact details, but they cannot satisfy constraints 2 and 3 at the same time.
- Constraint 1: 5/5. We could implement all of the exceptions proposal this way.
- Constraint 2: 1/5. If we leak these objects until the store is dropped, then we are also leaking any GC objects that they transitively reference and we definitely cannot collect cycles. If they are reference counted, then we cannot collect cycles between exceptions and the GC heap (at least not without adding another meta-collector to collect such cycles, like Firefox must do for Gecko DOM objects and JS GC objects, but this would add code size to builds with GC as well as complexity and duplicate collector implementation/maintenance).
- Constraint 3: 1/5. Because this cannot satisfy constraint 2, we would need another implementation that can.
- Desired property 1: 5/5. Depending on the exact details chosen, reserves minimal additional space. Potentially batches the allocation of many exceptions into a single allocation, if using a side table rather than individual `malloc`s. If the side table is fixed size, it could even be inlined into the `VMStoreContext` allocation.
- Desired property 2: 2/5 or 5/5. If exceptions are leaked until the `Store` is dropped, then we would need only minimal additional runtime infrastructure. If they are reference counted, however, then we would need to do deferred reference counting, just like our existing DRC collector, or else objects in frames that we skip over when trapping would be leaked. That implies roughly as much runtime code as Wasm GC does.
- Desired property 3: 3/5 or 5/5. Either some kind of free list operation or the equivalent of a `Vec::push`.
- Desired property 4: 1/5 or 3/5. In isolation, it seems fairly simple, especially if leaking exceptions until the `Store` is dropped, but either way this code is only additive and doesn't reuse existing infrastructure. Even worse if we end up re-creating all of the deferred reference counting machinery.
- Desired property 5: 1/5 or 5/5. If we sandbox the exceptions in a side table, bounds check all those accesses, and do not trust anything we read out of the table, then this is equivalent to our sandboxed GC heaps in terms of security assurances. If we don't do that, then this is a new case of Wasm manipulating raw native pointers, and we are at pretty big risk of miscompilations becoming disastrous.
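The sandboxed side-table variant can be sketched as a `Vec` of optional entries plus a free list, with Wasm holding only `u32` handles. `VMException`'s fields, `ExnTable`, and its methods are hypothetical illustrations, not the `wasmtime_slab` API. Every handle is treated as untrusted, so out-of-bounds and already-freed handles are rejected instead of dereferenced.

```rust
/// Hypothetical host-side exception record: a tag index plus raw payload words.
#[derive(Clone, Debug, PartialEq)]
struct VMException {
    tag: u32,
    payload: Vec<u64>,
}

/// Exceptions live in a side table; Wasm only ever holds `u32` handles into it.
#[derive(Default)]
struct ExnTable {
    entries: Vec<Option<VMException>>,
    free_list: Vec<u32>,
}

impl ExnTable {
    /// A free-list pop when possible, otherwise the equivalent of a `Vec::push`.
    fn alloc(&mut self, exn: VMException) -> u32 {
        if let Some(handle) = self.free_list.pop() {
            self.entries[handle as usize] = Some(exn);
            handle
        } else {
            self.entries.push(Some(exn));
            (self.entries.len() - 1) as u32
        }
    }

    /// Handles come from Wasm and are untrusted: bounds-check and liveness-check.
    fn get(&self, handle: u32) -> Option<&VMException> {
        self.entries.get(handle as usize)?.as_ref()
    }

    /// Releases an entry; returns `false` for out-of-bounds or already-freed handles.
    fn free(&mut self, handle: u32) -> bool {
        match self.entries.get_mut(handle as usize) {
            Some(slot) if slot.is_some() => {
                *slot = None;
                self.free_list.push(handle);
                true
            }
            _ => false,
        }
    }
}

fn main() {
    let mut table = ExnTable::default();
    let h = table.alloc(VMException { tag: 3, payload: vec![42] });
    if let Some(exn) = table.get(h) {
        println!("handle {h}: tag {} with {} payload words", exn.tag, exn.payload.len());
    }
    println!("forged handle resolves: {}", table.get(1000).is_some());
}
```

Note what the sketch leaves open: something still has to decide when to call `free`. Leaking until the `Store` is dropped never calls it, and reference counting needs the deferred machinery described above; neither reclaims a cycle that passes through the GC heap.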
fitzgen commented on issue #11256:
Extension to current implementation: add config knobs for GC heap memories that are separate from linear memory config knobs
This alleviates one of the weaknesses of our current implementation by allowing the memories used for GC heaps to be configured separately from regular linear memories.
The pooling allocator would also need some small updates: right now it assumes that GC heap memories and linear memories are identical and uses the same pool for both. That sharing would need to become an optimization or config option, and when the two kinds of memories have different configurations, the pooling allocator would need a separate pool for GC heap memories.
- Constraint 1: 5/5. ditto.
- Constraint 2: 5/5. ditto.
- Constraint 3: 5/5. ditto.
- Desired property 1: 5/5. Once GC heap memories and linear memories can be configured separately, GC heaps can be very small, even just (say) 128 bytes, via the use of "custom-page-sizes" for GC heaps' memories. No separate host allocations per exception object. Equivalent, in terms of memory overhead, to the side table approach.
- Desired property 2: 1/5. ditto.
- Desired property 3: 5/5. ditto.
- Desired property 4: 5/5. ditto.
- Desired property 5: 5/5. ditto.
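The pooling-allocator change can be pictured as a small decision over two independent memory configurations. `MemoryConfig`, `PoolLayout`, and `pool_layout` are hypothetical and much simpler than Wasmtime's real `Config` and pooling options. The numbers in the usage below are illustrative: a linear memory with a 4GiB maximum and a 2GiB guard region, and a 128-byte GC heap with 1-byte custom pages and no guard region, which assumes explicit bounds checks on every GC heap access.

```rust
/// Hypothetical, simplified per-memory configuration (not Wasmtime's `Config` API).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct MemoryConfig {
    /// log2 of the page size: 16 for 64 KiB Wasm pages, 0 for 1-byte custom pages.
    page_size_log2: u8,
    /// Maximum size of the memory, in bytes.
    max_bytes: u64,
    /// Guard region reserved after the memory, in bytes.
    guard_bytes: u64,
}

impl MemoryConfig {
    fn page_size(&self) -> u64 {
        1 << self.page_size_log2
    }

    /// Virtual address space that one pool slot must reserve for this memory.
    fn slot_reservation(&self) -> u64 {
        self.max_bytes + self.guard_bytes
    }
}

#[derive(Debug, PartialEq)]
enum PoolLayout {
    /// GC heaps and linear memories come from one pool (today's behavior).
    Shared,
    /// GC heaps get their own pool, sized by their own configuration.
    Separate,
}

/// Sharing one pool remains an optimization that applies only when the configs match.
fn pool_layout(linear: &MemoryConfig, gc_heap: &MemoryConfig) -> PoolLayout {
    if linear == gc_heap {
        PoolLayout::Shared
    } else {
        PoolLayout::Separate
    }
}

fn main() {
    let linear = MemoryConfig { page_size_log2: 16, max_bytes: 4 << 30, guard_bytes: 2 << 30 };
    let gc_heap = MemoryConfig { page_size_log2: 0, max_bytes: 128, guard_bytes: 0 };
    println!("layout: {:?}", pool_layout(&linear, &gc_heap));
    println!(
        "reservation per slot: linear = {} bytes, GC heap = {} bytes",
        linear.slot_reservation(),
        gc_heap.slot_reservation()
    );
}
```

Under these assumed settings, the per-slot reservation for a GC heap drops from gigabytes to bytes, which is what moves desired property 1 from 2/5 to 5/5.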
fitzgen commented on issue #11256:
Extension to current implementation: `#[cfg]`-gate `struct` and `array` separately from the core GC runtime
Right now, `#[cfg(feature = "gc")]` controls whether a build includes both
- the core GC runtime (like sandboxed GC heap accesses and whether a `Store` can have an associated GC heap; note that each collector already has its own feature flag), and
- the runtime functions for allocating and manipulating `struct` and `array` objects in particular, and their reflections in host APIs like `wasmtime::StructRef` and `wasmtime::ArrayRef`.
The current exceptions implementation only requires the former and does not need the latter. We could tweak our `cfg` feature flags and our uses of them such that we could include the core GC runtime in a build but not any of the `struct` and `array` specifics.
- Constraint 1: 5/5. ditto.
- Constraint 2: 5/5. ditto.
- Constraint 3: 5/5. ditto.
- Desired property 1: 2/5. ditto.
- Desired property 2: 2/5, 3/5, or 4/5? This should somewhat improve code size over the current implementation, but it isn't clear by exactly how much.
- Desired property 3: 5/5. ditto.
- Desired property 4: 5/5. ditto.
- Desired property 5: 5/5. ditto.
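A rough sketch of what that split could look like at the source level. The feature name `gc-struct-array` and the modules `gc_core`, `exceptions`, and `struct_array` are hypothetical; Wasmtime's real crate layout differs. In a real build `gc_core` would itself sit behind `#[cfg(feature = "gc")]`; it is left ungated here so the sketch compiles on its own.

```rust
/// Core GC runtime: a sandboxed heap addressed by offsets, never raw pointers.
mod gc_core {
    pub struct GcHeap {
        pub bytes: Vec<u8>,
    }

    impl GcHeap {
        /// Bump-allocates `size` zeroed bytes and returns the object's offset.
        pub fn alloc_raw(&mut self, size: usize) -> u32 {
            let offset = self.bytes.len();
            self.bytes.resize(offset + size, 0);
            offset as u32
        }
    }
}

/// Exceptions depend only on the core GC runtime.
mod exceptions {
    use super::gc_core::GcHeap;

    pub fn alloc_exn(heap: &mut GcHeap, payload: &[u8]) -> u32 {
        let offset = heap.alloc_raw(payload.len());
        heap.bytes[offset as usize..][..payload.len()].copy_from_slice(payload);
        offset
    }
}

/// `struct`/`array` allocation (and, in a real build, host reflections such as
/// `StructRef`/`ArrayRef`) compile only when the extra, hypothetical feature is on.
#[cfg(feature = "gc-struct-array")]
mod struct_array {
    use super::gc_core::GcHeap;

    pub fn alloc_struct(heap: &mut GcHeap, num_fields: usize) -> u32 {
        heap.alloc_raw(num_fields * 8)
    }
}

fn main() {
    let mut heap = gc_core::GcHeap { bytes: Vec::new() };
    let exn = exceptions::alloc_exn(&mut heap, &[1, 2, 3]);
    println!("exception at offset {exn}; heap is {} bytes", heap.bytes.len());
}
```

Built without the extra feature, the `struct_array` module is not compiled at all, while exception allocation still works through the shared heap code.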
fitzgen commented on issue #11256:
Finally, it is also worth noting that the previous two extensions of the current implementation are compatible with each other, so if we consider them together, then we get the following rating, which is pretty good along all of our dimensions:
- Constraint 1: 5/5.
- Constraint 2: 5/5.
- Constraint 3: 5/5.
- Desired property 1: 5/5.
- Desired property 2: 2/5, 3/5, or 4/5?
- Desired property 3: 5/5.
- Desired property 4: 5/5.
- Desired property 5: 5/5.
cfallin commented on issue #11256:
Thanks very much for writing out all of these tradeoffs explicitly! This indeed captures all of the reasoning behind #11230 and much more, and more clearly than I laid out there.
I'll add a personal note that I went through many of these options in sequence as I was working out a design and I am sympathetic to the desire to find "something simpler" -- it doesn't feel at first like GC should be so tightly coupled. Unfortunately, in an engine that has both, it does seem to be that way.
I think the only thing I would add is that, with respect to the last option ("`#[cfg]`-gate `struct` and `array` separately"), it is absolutely worth measuring, but I suspect the savings here might not be as much as one would hope: almost all of the machinery is in shared bits (the GC algorithm, but also e.g. the engine type registry); exceptions are "just another enum arm" in most places; and the host API layer will mostly fall away via link-time per-function GC if not used by the embedder. That said, I share your hunch that a small GC algorithm shouldn't add that much code size. And to the extent that it does, that mechanism is what one would need to build anyway (just as you say: "That implies roughly as much runtime code as Wasm GC does."!).
alexcrichton commented on issue #11256:
Agreed thanks for writing this all up!
My intuition matches what y'all are reaching as well, I think: the most viable path forward is combining knobs for GC memory with more aggressive `#[cfg]`. What I might add to the mix here, though, is the concept of an entirely new GC (e.g. an implementation of `GcRuntime`). Everything in Wasmtime is already factored for multiple `GcRuntime` implementations, so I wouldn't consider this too large of a maintenance burden, and I think we could codify some simple rules for a GC that make it unsuitable for general-purpose usage but suitable for "low runtime cost and low runtime footprint":
- This GC would basically be a DRC collector for exnref objects, but that's it.
- This DRC collector wouldn't support exnref-in-exnref (probably via some slice/subset of the exception-handling proposal). That means GC objects can't point to other GC objects.
- This DRC collector would only support allocating a fixed size object (also via a slice/subset of the exception-handling proposal), for example each exception is always 128 bytes or something like that.
This would make GC allocation and GC operations practically trivial, and my guess is that this would reduce a lot of the hypothetical overhead of using the full-blown DRC collector for all exnref objects. Naturally, though, I'd also say that we should measure the difference, for example, between the code footprint of the null collector and the DRC collector, since those are sort of the upper and lower bounds of this hypothetical new collector.
In the near term, though, I continue to feel that the order in which things should happen, in decreasing priority, is: (1) implement the full proposal with everything integrated (aka the current trajectory), (2) work on separate GC knobs to make the runtime impact as low as we can, or basically don't require a second linear memory just for setjmp/longjmp in practice, and (3) follow up with work to reduce the runtime footprint of the setjmp/longjmp use case.
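A minimal sketch of the restricted collector described above, under its stated rules: every exception occupies a fixed 128 bytes, and exceptions never contain `exnref`s. The names `FixedExnHeap`, `collect`, and `stack_roots` are hypothetical; this is not Wasmtime's `GcRuntime` trait or its DRC collector. References from Wasm frames are not counted (the "deferred" part); objects whose count may be zero are remembered as candidates and checked against a stack scan at collection time.

```rust
use std::collections::HashSet;

/// Every exception occupies exactly this many bytes (hypothetical fixed size).
const EXN_BYTES: usize = 128;

struct Slot {
    data: [u8; EXN_BYTES],
    /// References from counted locations (host roots, tables, globals).
    /// References held by Wasm frames are *not* counted.
    ref_count: u32,
    live: bool,
}

#[derive(Default)]
struct FixedExnHeap {
    slots: Vec<Slot>,
    free: Vec<u32>,
    /// Objects whose count may be zero while a frame still references them.
    candidates: Vec<u32>,
}

impl FixedExnHeap {
    fn alloc(&mut self, payload: &[u8]) -> Option<u32> {
        if payload.len() > EXN_BYTES {
            return None;
        }
        let mut data = [0u8; EXN_BYTES];
        data[..payload.len()].copy_from_slice(payload);
        let slot = Slot { data, ref_count: 0, live: true };
        let idx = match self.free.pop() {
            Some(idx) => {
                self.slots[idx as usize] = slot;
                idx
            }
            None => {
                self.slots.push(slot);
                (self.slots.len() - 1) as u32
            }
        };
        // A new exception starts out referenced only from the throwing frame.
        self.candidates.push(idx);
        Some(idx)
    }

    fn inc_ref(&mut self, idx: u32) {
        self.slots[idx as usize].ref_count += 1;
    }

    fn dec_ref(&mut self, idx: u32) {
        let slot = &mut self.slots[idx as usize];
        slot.ref_count -= 1;
        // Cannot free yet: a frame may still hold it. Since exceptions never
        // contain exnrefs, freeing never needs to decrement anything else.
        if slot.ref_count == 0 {
            self.candidates.push(idx);
        }
    }

    /// `stack_roots` are the exnrefs found by walking the Wasm frames.
    fn collect(&mut self, stack_roots: &[u32]) {
        let on_stack: HashSet<u32> = stack_roots.iter().copied().collect();
        let mut pending = Vec::new();
        for idx in std::mem::take(&mut self.candidates) {
            let slot = &mut self.slots[idx as usize];
            if !slot.live || slot.ref_count > 0 {
                continue;
            }
            if on_stack.contains(&idx) {
                pending.push(idx);
            } else {
                slot.live = false;
                self.free.push(idx);
            }
        }
        pending.sort_unstable();
        pending.dedup();
        self.candidates = pending;
    }

    fn is_live(&self, idx: u32) -> bool {
        self.slots[idx as usize].live
    }

    fn payload(&self, idx: u32) -> &[u8; EXN_BYTES] {
        &self.slots[idx as usize].data
    }
}

fn main() {
    let mut heap = FixedExnHeap::default();
    let a = heap.alloc(b"caught").unwrap();
    let b = heap.alloc(b"dropped").unwrap();
    heap.collect(&[a]); // only `a` is still referenced by a frame
    println!("a live: {}, b live: {}", heap.is_live(a), heap.is_live(b));
}
```

Allocation is a free-list pop or a push, and freeing is a flag flip plus a push; since objects in this heap never point at each other, there are no cycles for it to miss.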
alexcrichton edited a comment on issue #11256:
Agreed thanks for writing this all up!
My intuition matches what y'all are reaching as well, I think: the most viable path forward is combining knobs for GC memory with more aggressive `#[cfg]`. What I might add to the mix here, though, is the concept of an entirely new GC (e.g. an implementation of `GcRuntime`). Everything in Wasmtime is already factored for multiple `GcRuntime` implementations, so I wouldn't consider this too large of a maintenance burden, and I think we could codify some simple rules for a GC that make it unsuitable for general-purpose usage but suitable for "low runtime cost and low runtime footprint":
- This GC would basically be a DRC collector for exnref objects, but that's it.
- This DRC collector wouldn't support exnref-in-exnref (probably via some slice/subset of the exception-handling proposal). That means GC objects can't point to other GC objects.
- This DRC collector would only support allocating a fixed size object (also via a slice/subset of the exception-handling proposal), for example each exception is always 128 bytes or something like that.
This would make GC allocation and GC operations practically trivial, and my guess is that this would reduce a lot of the hypothetical overhead of using the full-blown DRC collector for all exnref objects. Naturally, though, I'd also say that we should measure the difference, for example, between the code footprint of the null collector and the DRC collector, since those are sort of the upper and lower bounds of this hypothetical new collector.
In the near term, though, I continue to feel that the order in which things should happen, in decreasing priority, is: (1) implement the full proposal with everything integrated (aka the current trajectory), (2) work on separate GC knobs to make the runtime impact as low as we can, or basically don't require a second linear memory just for setjmp/longjmp in practice, and (3) follow up with work to reduce the runtime footprint of the setjmp/longjmp use case.
Last updated: Dec 06 2025 at 07:03 UTC