Stream: git-wasmtime

Topic: wasmtime / issue #11445 Wasm GC host API: a slightly less...


view this post on Zulip Wasmtime GitHub notifications bot (Aug 16 2025 at 02:09):

cfallin opened issue #11445:

Upon pondering the various rooting options in #11326, and reading through the discussion of API tradeoffs here, it occurs to me that there may be a possible addition to the API, which offers more natural "owned root" semantics in Rust but is still pay-as-you-go, which I'd like to sketch here.

Background and Motivation

Currently, we have two types, Rooted<T> and ManuallyRooted<T>. The former has a LIFO discipline, implicitly attaching to the deepest RootScope in a stack of scopes on the store, and the latter is completely manual. Notably, the latter does not have a Drop impl that actually unroots, because doing so would require holding some reference to the Store -- implicitly, via Arcs -- and actual borrows of the Store would preclude any other operations.

The LIFO discipline of Rooted works well when it works, i.e., when the user is aware of their scopes, but in this comment I outline a few of the realizations I've had around ergonomics when it combines with other features -- in particular, Rust's ? operator, which pushes users to naturally "propagate errors upward without explicit thought". The usual expectation is that the E type in the Result somehow owns its info. So a dynamic failure (a panic, no less!) when that E crosses over a scope is quite surprising. Exceptional returns are the most obvious example to me right now, but I believe some of the surprise here may also occur with users who naively (but understandably) expect that "GC" means they don't have to worry about lifetimes. Said more succinctly: a type named Rooted, in a language like Rust where types often imply ownership, might imply that it keeps a root as long as it exists. I know I certainly had that expectation at first. (The docs are very good, but "least surprise" still applies here!)

We provide the "escape hatch", as the docs describe, of ManuallyRooted, and this can certainly work well if the user has extreme discipline -- unfortunately, the requirement of a manual unrooting step means that it is very easy to get wrong, again as the docs describe well.

The middle ground, of a type that somehow unregisters itself on Drop by keeping enough of a handle on the engine's internal state to do so, is described as impractical: it would require an Arc<Mutex<...>> to track internal handle state/registration, as the docs say. This would create synchronization overhead, and potentially pessimize common-case GC operations too. Is there a better way?

Idea: Pay-as-you-Go "Arc'd liveness flags"

Ideally, I want something that:

Note the sneaky (*) above: I want to avoid leaks, not to eagerly detect when an unregistration occurs. In other words, let's permit deferring some action from the Drop to, say, the next time a GC runs.

Then to avoid synchronization overhead and contention, rather than a large monolithic registry under a single mutex, let's have some small shared state per owned root.

The idea is: keep a "liveness flag" in an Arc<AtomicBool>. In steady state, when live, there are two references to this Arc: from the owned root type, and from an owned_roots: Vec<(Option<VMGcRef>, Arc<AtomicBool>)> in the GC roots list. When the owned root drops, it sets its atomic bool to false, and drops its Arc reference. When the GC scans roots, it reads out the liveness flags, and removes those roots that the owner has dropped. (E.g. via a retain on the Vec.)[^1]

[^1]: Slight variant: the tuple above could instead be a single Arc<OwnedRootData> with liveness and the VMGcRef, and maybe that's cleaner; I haven't thought too much about how this would live alongside ManuallyRooted and whether it would want to share a GC-ref root array somewhere else or not...

Two realizations make this more efficient than the Arc<Mutex<whole root list>> approach: (i) we do have a mutable borrow of the store when we create a root, so it's fine to have a normal Vec of roots registered -- only Drop is "remote", without the store; (ii) we have a separate bit of state per root, and it's just an atomic bool, which on common architectures (x86 and Apple Silicon's aarch64 at least) has atomic loads that are exactly as cheap as normal loads.

It's also fully safe Rust (Arc makes it so), and is pay-as-you-go: with no such roots existing, GC root scanning has one check of vec-is-empty and a never-taken branch; as close to zero-overhead as we can make it. There is no mutex contention anywhere, because the only "meeting point" is the Vec that's mutated under a &mut Store. In terms of memory allocation, it's certainly more expensive than a LIFO root (which is just an index into an array!), because there's the separate Arc allocation, but I suspect most uses of these roots are likely to be relatively high-level "entry points" or cases like exception returns where scopes don't map well to usage patterns; we can encourage use of LIFO scopes where possible.
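The scheme above can be sketched in a few dozen lines of safe Rust. This is a minimal illustration, not wasmtime's actual internals: `VmGcRef`, `OwnedRoot`, and `OwnedRoots` are hypothetical stand-ins, and the real store-side list would be reached through `&mut Store`.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

// Hypothetical stand-in for wasmtime's internal VMGcRef.
#[derive(Clone, Copy, Debug, PartialEq)]
struct VmGcRef(u32);

// Owner-side handle: flipping the flag on Drop needs no store access.
struct OwnedRoot {
    live: Arc<AtomicBool>,
}

impl Drop for OwnedRoot {
    fn drop(&mut self) {
        self.live.store(false, Ordering::Release);
    }
}

// Store-side root list; in the real design this is only mutated
// under a &mut Store, so no mutex is needed here.
#[derive(Default)]
struct OwnedRoots {
    roots: Vec<(Option<VmGcRef>, Arc<AtomicBool>)>,
}

impl OwnedRoots {
    fn root(&mut self, gc_ref: VmGcRef) -> OwnedRoot {
        let live = Arc::new(AtomicBool::new(true));
        self.roots.push((Some(gc_ref), Arc::clone(&live)));
        OwnedRoot { live }
    }

    // Called during GC root scanning: drop entries whose owner is gone.
    fn sweep(&mut self) {
        self.roots.retain(|(_, live)| live.load(Ordering::Acquire));
    }

    fn live_refs(&self) -> Vec<VmGcRef> {
        self.roots.iter().filter_map(|(r, _)| *r).collect()
    }
}

fn main() {
    let mut roots = OwnedRoots::default();
    let a = roots.root(VmGcRef(1));
    let b = roots.root(VmGcRef(2));
    drop(a); // flag flips; actual removal is deferred...
    roots.sweep(); // ...until the next GC scans roots
    assert_eq!(roots.live_refs(), vec![VmGcRef(2)]);
    drop(b);
    roots.sweep();
    assert!(roots.live_refs().is_empty());
}
```

Note how the dropped root's entry lingers in the Vec until `sweep` runs; that deferral is exactly the "(*)" relaxation above.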

Naming

I've called this an "owned root" and I'd gently suggest considering different names for our current Rooted, to more explicitly describe the difference, if we adopt this -- something like ScopedRooted vs. OwnedRooted?

view this post on Zulip Wasmtime GitHub notifications bot (Aug 16 2025 at 02:09):

cfallin edited issue #11445:

view this post on Zulip Wasmtime GitHub notifications bot (Aug 16 2025 at 02:11):

cfallin added the wasm-proposal:gc label to Issue #11445.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 16 2025 at 02:11):

cfallin added the enhancement label to Issue #11445.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 16 2025 at 02:11):

cfallin added the wasmtime:api label to Issue #11445.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 16 2025 at 02:12):

cfallin commented on issue #11445:

(cc @fitzgen @alexcrichton for thoughts -- this isn't on the critical path for exceptions in any way, but is an orthogonal thought that seemed worth writing up)

view this post on Zulip Wasmtime GitHub notifications bot (Aug 16 2025 at 02:13):

cfallin edited issue #11445:

view this post on Zulip Wasmtime GitHub notifications bot (Aug 18 2025 at 15:24):

alexcrichton commented on issue #11445:

Something like this seems reasonable to me yeah, but I'd defer to @fitzgen. We don't really have a great benchmark or application to weigh this against at this time, though, in the sense of evaluating "is the Arc overhead acceptable?" Without that we're kind of just shooting in the dark and predicting. For example, Arc is relatively expensive, with an allocation plus strong/weak counts, so there's synchronization not just on the Drop-sets-the-flag but also on dropping the Arc itself. We could perhaps get that down with a custom Arc of sorts, but then there's also the question of the allocation. (And so on and so forth about exact possible vectors along which we could tweak things; hard to weigh in my mind at least...)

view this post on Zulip Wasmtime GitHub notifications bot (Aug 18 2025 at 15:25):

alexcrichton commented on issue #11445:

Oh, sorry, I'm still waking up. Renaming Rooted I also think would be reasonable.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 18 2025 at 16:18):

cfallin commented on issue #11445:

That's all true; allocation of an Arc is noted above as the main cost here. I'd note that the synchronization on the Arc itself is also "root-local", i.e. of a different character than the synchronization overhead noted in docs to argue against registered roots.

High-level I suspect it'd be nice to provide some tool here, even with some cost when used (and zero cost when not), because right now there is no way in the public API to build something like this on top of it :-)

view this post on Zulip Wasmtime GitHub notifications bot (Aug 18 2025 at 17:34):

alexcrichton commented on issue #11445:

Good point yeah, and I'd agree with that. I'd prefer to avoid exposing any Arc-related details in the public API, and instead provide the constraint that "when you drop this root handle thing it'll eventually get GC'd if not rooted elsewhere". We could then refactor various details internally over time as necessary

view this post on Zulip Wasmtime GitHub notifications bot (Aug 18 2025 at 17:53):

cfallin commented on issue #11445:

Indeed, the above proposal doesn't expose any Arcs at all; the only public API is the proposed OwnedRoot. (I think you're agreeing with this but just for clarity to state it explicitly...)

view this post on Zulip Wasmtime GitHub notifications bot (Aug 18 2025 at 20:43):

fitzgen closed issue #11445:

view this post on Zulip Wasmtime GitHub notifications bot (Aug 18 2025 at 20:43):

fitzgen commented on issue #11445:

Yes, LIFO Rooteds are definitely a bit footgun-y with exceptions. I like the immediate solution for that (the StoreContextMut::throw(&mut self, &Rooted<ExnRef>) -> wasmtime::ThrownException API or whatever) discussed on the exceptions PR, so I will focus on the general rooting ideas here.

Also, I agree that this scheme would work to root GC objects and would ultimately provide users with a more-ergonomic API than what we have today.

The big challenge that springs to mind when considering this Arc<AtomicBool> scheme is management of the Vec<(Option<VMGcRef>, Arc<AtomicBool>)> inside the roots list. How big do we let that Vec get? Do we trigger GC if it reaches some limit?

The old VMGcRefActivationsTable had a bump region that did similar-ish things as proposed here, but in service of different ends, and managing its size/capacity proved to be difficult. We wanted to impose a size limit, but because cleanup happened on GC, this meant we needed to trigger full GCs even when it was unlikely we would have any productive work to do other than cleaning up that bump region. This was _very_ slow. We then adapted the limit to grow in an amortized fashion, and this sped things up some (still not as much as eventually removing the bump region completely did) but it also meant that memory overheads started getting bigger and bigger. We ideally want the pooling allocator and its config to control and pre-allocate ~all major allocations and things that are O(wasm module) or O(wasm runtime) or anything like that. If I say I want to give 128MiB to each GC heap in my pooling allocator, I don't want extra bookkeeping allocations outside of that, as much as possible.

The size of this Vec would be more-directly correlated with the host's behavior than with the guest's, but I could imagine ways to indirectly control it from the guest, like throwing and catching in a loop.


This is not a solution to the previous challenge, but miiiiight alleviate it a little: we could replace the Vec with HashMap<VMGcRef, Arc<AtomicBool>> so that we wouldn't ever create multiple Arc<AtomicBool> roots for the same object. In turn, it would make root-creation that much slower.


We could also have a free list of available Arc<bool>s to help lower the cost of (some of) the Arcs' mallocs. Although the malloc implementation's size classes (that it presumably has) would probably already do most of that.


Now that I think about it some more, I don't think we even need the AtomicBool and can just have Arc<()> and use the Arc's reference count to keep track of whether the OwnedRoot or whatever is still alive. We could either hold a weak reference in our Vec/HashMap or else we could just subtract one from the reference count to get the "real" reference count.
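That Arc<()> variant can be sketched as follows -- a hypothetical illustration, not wasmtime code: the store side holds a Weak<()>, and liveness is just whether the strong count is still nonzero.

```rust
use std::sync::{Arc, Weak};

// Hypothetical stand-in for wasmtime's internal VMGcRef.
#[derive(Clone, Copy, Debug, PartialEq)]
struct VmGcRef(u32);

// Owner-side handle: liveness is encoded in the Arc's strong count,
// so no AtomicBool is needed and Drop does nothing special.
struct OwnedRoot {
    _alive: Arc<()>,
}

#[derive(Default)]
struct OwnedRoots {
    roots: Vec<(VmGcRef, Weak<()>)>,
}

impl OwnedRoots {
    fn root(&mut self, gc_ref: VmGcRef) -> OwnedRoot {
        let alive = Arc::new(());
        self.roots.push((gc_ref, Arc::downgrade(&alive)));
        OwnedRoot { _alive: alive }
    }

    // GC-time sweep: an entry is dead once its owner dropped the Arc,
    // which the store observes as a strong count of zero.
    fn sweep(&mut self) {
        self.roots.retain(|(_, weak)| weak.strong_count() > 0);
    }
}

fn main() {
    let mut roots = OwnedRoots::default();
    let a = roots.root(VmGcRef(1));
    let b = roots.root(VmGcRef(2));
    drop(a);
    roots.sweep();
    assert_eq!(roots.roots.len(), 1);
    assert_eq!(roots.roots[0].0, VmGcRef(2));
    drop(b);
    roots.sweep();
    assert!(roots.roots.is_empty());
}
```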


Also, this is starting to look very much like what we do with registered types, except with waiting for GC to clean up the "dead" roots rather than expecting callers of decref to do it when the ref count reaches zero. (With the type registry, we put types behind an Arc so that they can be accessed and their ref counts incremented and decremented without taking the registry's RwLock, and the lock is only taken once the ref count reaches zero, for deregistration.)


Oh the other challenge that springs to mind now is updating rooted references after a moving GC. I think we could handle this with Arc<AtomicU32> where the AtomicU32 is the raw VMGcRef. We could maybe even have a helper that takes a &mut StoreOpaque and then does a relaxed load because the only thing that could update that reference is
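A sketch of that Arc<AtomicU32> idea, under the stated assumption that holding `&mut StoreOpaque` excludes any concurrent moving GC (all names here are hypothetical, not wasmtime's actual internals):

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;

// Hypothetical store-internal state.
struct StoreOpaque {
    owned_roots: Vec<Arc<AtomicU32>>,
}

struct OwnedRoot {
    // Shared slot holding the raw GC ref; the collector may rewrite it.
    slot: Arc<AtomicU32>,
}

impl OwnedRoot {
    // Taking &mut StoreOpaque means no GC can be moving objects
    // concurrently, so a relaxed load of the raw ref suffices.
    fn raw_gc_ref(&self, _store: &mut StoreOpaque) -> u32 {
        self.slot.load(Ordering::Relaxed)
    }
}

// After relocation, a moving collector rewrites the shared slots in place,
// so outstanding OwnedRoots observe the new address automatically.
fn update_after_gc(store: &mut StoreOpaque, old: u32, new: u32) {
    for slot in &store.owned_roots {
        if slot.load(Ordering::Relaxed) == old {
            slot.store(new, Ordering::Relaxed);
        }
    }
}

fn main() {
    let slot = Arc::new(AtomicU32::new(0x10));
    let root = OwnedRoot { slot: Arc::clone(&slot) };
    let mut store = StoreOpaque { owned_roots: vec![slot] };
    update_after_gc(&mut store, 0x10, 0x20); // GC moved the object
    assert_eq!(root.raw_gc_ref(&mut store), 0x20);
}
```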

view this post on Zulip Wasmtime GitHub notifications bot (Aug 18 2025 at 20:43):

fitzgen reopened issue #11445:

view this post on Zulip Wasmtime GitHub notifications bot (Aug 18 2025 at 20:43):

fitzgen commented on issue #11445:

(Sorry mis-click)

view this post on Zulip Wasmtime GitHub notifications bot (Aug 18 2025 at 20:46):

fitzgen edited a comment on issue #11445:

Yes, LIFO Rooteds are definitely a bit footgun-y with exceptions. I like the immediate solution for that (the StoreContextMut::throw(&mut self, &Rooted<ExnRef>) -> wasmtime::ThrownException API or whatever) discussed on the exceptions PR, so I will focus on the general rooting ideas here.

Also, I agree that this scheme would work to root GC objects and would ultimately provide users with a more-ergonomic API than what we have today.

The big challenge that springs to mind when considering this Arc<AtomicBool> scheme is management of the Vec<(Option<VMGcRef>, Arc<AtomicBool>)> inside the roots list. How big do we let that Vec get? Do we trigger GC if it reaches some limit?

The old VMGcRefActivationsTable had a bump region that did similar-ish things as proposed here, but in service of different ends, and managing its size/capacity proved to be difficult. We wanted to impose a size limit, but because cleanup happened on GC, this meant we needed to trigger full GCs even when it was unlikely we would have any productive work to do other than cleaning up that bump region. This was _very_ slow. We then adapted the limit to grow in an amortized fashion, and this sped things up some (still not as much as eventually removing the bump region completely did) but it also meant that memory overheads started getting bigger and bigger. We ideally want the pooling allocator and its config to control and pre-allocate ~all major allocations and things that are O(wasm module) or O(wasm runtime) or anything like that. If I say I want to give 128MiB to each GC heap in my pooling allocator, I don't want extra bookkeeping allocations outside of that, as much as possible.

The size of this Vec would be more-directly correlated with the host's behavior than with the guest's, but I could imagine ways to indirectly control it from the guest, like throwing and catching in a loop.


This is not a solution to the previous challenge, but miiiiight alleviate it a little: we could replace the Vec with HashMap<VMGcRef, Arc<AtomicBool>> so that we wouldn't ever create multiple Arc<AtomicBool> roots for the same object. In turn, it would make root-creation that much slower.


We could also have a free list of available Arc<bool>s to help lower the cost of (some of) the Arcs' mallocs. Although the malloc implementation's size classes (that it presumably has) would probably already do most of that.


Now that I think about it some more, I don't think we even need the AtomicBool and can just have Arc<()> and use the Arc's reference count to keep track of whether the OwnedRoot or whatever is still alive. We could either hold a weak reference in our Vec/HashMap or else we could just subtract one from the reference count to get the "real" reference count.
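That Arc<()> variant can be sketched with a Weak handle on the store side, so the strong count alone encodes liveness (names here are hypothetical, just to illustrate the trick):

```rust
use std::sync::{Arc, Weak};

// The host handle is just an Arc<()>: no flag, no payload.
struct OwnedRoot(Arc<()>);

// The store keeps only a Weak<()> per root.
struct StoreEntry {
    live: Weak<()>,
}

impl StoreEntry {
    fn is_live(&self) -> bool {
        // Once the OwnedRoot (the only strong reference) drops,
        // the strong count falls to zero.
        self.live.strong_count() > 0
    }
}

fn new_root() -> (OwnedRoot, StoreEntry) {
    let token = Arc::new(());
    let entry = StoreEntry {
        live: Arc::downgrade(&token),
    };
    (OwnedRoot(token), entry)
}
```

Holding a Weak rather than a second strong reference means the Drop of the handle needs no explicit store; the refcount itself is the liveness flag.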


Also, this is starting to look very much like what we do with registered types, except with waiting for GC to clean up the "dead" roots rather than expecting callers of decref to do it when the ref count reaches zero. (With the type registry, we put types behind an Arc so that they can be accessed and their ref counts incremented and decremented without taking the registry's RwLock, and the lock is only taken once the ref count reaches zero for deregistration. Taking the lock there is roughly equivalent to accessing the store here. Something to mull over.)


Oh the other challenge that springs to mind now is updating rooted references after a moving GC. I think we could handle this with Arc<AtomicU32> where the AtomicU32 is the raw VMGcRef. We could maybe even have a helper that takes a &mut StoreOpaque and then does a relaxed load because the only thing that could update that reference is a GC which takes a mutable store, so if you have a mutable store (and it is the correct store; probably we would need a StoreId somewhere too) then you know the reference can't move out from under you for the time being.
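A sketch of that Arc<AtomicU32> idea, with hypothetical types (a u32 stands in for the raw VMGcRef, and a comment stands in for the real `&mut StoreOpaque` / StoreId check that would make the relaxed load sound):

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;

// The shared slot holds the raw GC reference. A moving GC, which holds
// `&mut Store`, rewrites it; handles read it with a relaxed load.
struct OwnedRoot {
    slot: Arc<AtomicU32>,
}

impl OwnedRoot {
    // In real code this would take `&mut StoreOpaque` (and verify the
    // StoreId) to guarantee no GC can run concurrently with the load.
    fn raw(&self) -> u32 {
        self.slot.load(Ordering::Relaxed)
    }
}

struct Store {
    // The GC side keeps the other Arc reference to each slot.
    roots: Vec<Arc<AtomicU32>>,
}

impl Store {
    fn moving_gc(&mut self, relocate: impl Fn(u32) -> u32) {
        // `&mut self` means no handle can legally read while we rewrite.
        for slot in &self.roots {
            let old = slot.load(Ordering::Relaxed);
            slot.store(relocate(old), Ordering::Relaxed);
        }
    }
}
```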

view this post on Zulip Wasmtime GitHub notifications bot (Aug 18 2025 at 20:55):

cfallin commented on issue #11445:

It's possible I'm misunderstanding, or the thought is a follow-on from your proposed variants, but

Oh the other challenge that springs to mind now is updating rooted references after a moving GC. I think we could handle this with Arc<AtomicU32> where the AtomicU32 is the raw VMGcRef

isn't an issue in the scheme as proposed since the VMGcRefs are still in the store / GC roots list (indirected as with all other roots) -- only the liveness flag is accessible when one doesn't have a &mut Store.


Growth of the roots list is certainly a question; I suppose one could have a list trim step that is independent of a GC, essentially doing a pass with Vec::retain and unrooting the dead roots but doing no other work (and this would be in a context where we're creating a new root, so we have a &mut Store already).


That does, however, make me realize something I said above isn't true: one can actually build this abstraction on top of ManuallyRooted today; in essence one could have a StoreWithNicerRoots<T> that internally owns a Store<T> and carries a list of ManuallyRooted and the Arc<AtomicBool>; at any point it scans the root-list to clean it up, it has a &mut Store and can do a ManuallyRooted::unroot. So then it's a question of whether we want to absorb such a library into Store or not...
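The trim-on-registration idea could look roughly like this (hypothetical names; a plain `live: bool` stands in for checking the Arc<AtomicBool>). Pruning fires only once the list grows past a threshold, and the threshold is reset to double the survivor count, so the retain cost stays amortized O(1) per registration:

```rust
// Stand-in for (GcRootIndex, Arc<AtomicBool>) in the real scheme.
struct Entry {
    live: bool,
}

struct OwnedRoots {
    entries: Vec<Entry>,
    trim_at: usize,
}

impl OwnedRoots {
    fn new() -> Self {
        OwnedRoots { entries: Vec::new(), trim_at: 16 }
    }

    /// Called while registering a new root, so we already hold
    /// `&mut Store` and can unroot dead entries safely.
    fn maybe_trim(&mut self) {
        if self.entries.len() >= self.trim_at {
            self.entries.retain(|e| e.live);
            // Double relative to survivors: total pruning work is
            // proportional to total registrations, i.e. amortized O(1).
            self.trim_at = (self.entries.len() * 2).max(16);
        }
    }

    fn register(&mut self, live: bool) {
        self.maybe_trim();
        self.entries.push(Entry { live });
    }
}
```

This is independent of any GC: dead roots never force a collection, they just get swept out the next time the list crosses its threshold.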

view this post on Zulip Wasmtime GitHub notifications bot (Aug 18 2025 at 21:11):

fitzgen commented on issue #11445:

isn't an issue in the scheme as proposed since the VMGcRefs are still in the store / GC roots list (indirected as with all other roots) -- only the liveness flag is accessible when one doesn't have a &mut Store.

Ah okay I think I misunderstood.

Given an OwnedRoot and a store, how do Wasmtime internals get the raw VMGcRef? That is, what is inside the OwnedRoot other than an Arc handle that lets us get the VMGcRef in O(1) time? I'm assuming we don't want to do an O(n) linear scan through the Vec, comparing each entry's Arc pointer to the OwnedRoot's Arc pointer. But I also don't know what it would be:

[^0]: Or if we do shrink the Vec's size but use indices inside OwnedRoot, then we need a way to fixup all the existing OwnedRoot's indices after we shrink the Vec's size. Basically the same problem as a moving GC.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 18 2025 at 21:13):

fitzgen edited a comment on issue #11445:

isn't an issue in the scheme as proposed since the VMGcRefs are still in the store / GC roots list (indirected as with all other roots) -- only the liveness flag is accessible when one doesn't have a &mut Store.

Ah okay I think I misunderstood.

Given an OwnedRoot and a store, how do Wasmtime internals get the raw VMGcRef? That is, what is inside the OwnedRoot other than an Arc<AtomicBool> handle that lets us get the VMGcRef in O(1) time? I'm assuming we don't want to do an O(n) linear scan through the Vec, comparing each entry's Arc pointer to the OwnedRoot's Arc pointer. But I also don't know what it would be:

[^0]: Or if we do shrink the Vec's size but use indices inside OwnedRoot, then we need a way to fixup all the existing OwnedRoot's indices after we shrink the Vec's size. Basically the same problem as a moving GC.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 18 2025 at 21:28):

cfallin commented on issue #11445:

It would be the secret third option: a GcRootIndex, just like today's ManuallyRooted. As long as the OwnedRooted exists, it will not set its liveness flag to false, so the index remains valid (same argument as ManuallyRooted, which explicitly deallocates its index when unroot() is called). I suppose the part I missed in the description is that the Vec really holds the Arc along with the associated GcRootIndex (or the external library holds a ManuallyRooted), not the raw VMGcRef.
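So the handle shape would be roughly the following (hypothetical field names; a u32 stands in for VMGcRef and a plain Vec stands in for the store's root slots):

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

// Mirrors how ManuallyRooted carries an index today; hypothetical here.
#[derive(Clone, Copy)]
struct GcRootIndex(usize);

struct OwnedRooted {
    index: GcRootIndex,    // O(1) lookup into the store's root slots
    live: Arc<AtomicBool>, // flipped to false on Drop, slot freed later
}

impl Drop for OwnedRooted {
    fn drop(&mut self) {
        self.live.store(false, Ordering::Release);
    }
}

// Store-side slots. The index stays valid while the liveness flag is
// true, because only a trim pass (which holds `&mut Store`) frees
// dead slots.
struct RootSlots {
    slots: Vec<Option<u32>>, // u32 stands in for VMGcRef
}

impl RootSlots {
    fn gc_ref(&self, root: &OwnedRooted) -> Option<u32> {
        self.slots[root.index.0]
    }
}
```

No linear scan is needed: given the handle and the store, the raw GC ref is one indexed load away, exactly as with ManuallyRooted.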

view this post on Zulip Wasmtime GitHub notifications bot (Aug 19 2025 at 20:21):

fitzgen commented on issue #11445:

It would be the secret third option: a GcRootIndex, just like today's ManuallyRooted. As long as the OwnedRooted exists, it will not set its liveness flag to false, so the index remains valid (same argument as ManuallyRooted, which explicitly deallocates its index when unroot() is called). I suppose the part I missed in the description is that the Vec really holds the Arc along with the associated GcRootIndex (or the external library holds a ManuallyRooted), not the raw VMGcRef.

Ah okay, I've got it now, makes sense.

I suppose one could have a list trim step that is independent of a GC, essentially doing a pass with Vec::retain and unrooting the dead roots but doing no other work (and this would be in a context where we're creating a new root, so we have a &mut Store already).

This makes sense. Will be slightly tricky tuning this well and making sure it is amortized and all that.

Still slightly concerned about sizes but I think we have figured out enough of the details that it is worth putting up a PR for, where we can have more-concrete discussion.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 19 2025 at 20:23):

fitzgen commented on issue #11445:

And FWIW, I don't think this should be an external library thing. If it works well and provides better ergonomics than what we have, then we should just add it to Wasmtime and maybe replace our existing APIs.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 26 2025 at 19:34):

cfallin closed issue #11445:

Upon pondering the various rooting options in #11326, and reading through the discussion of API tradeoffs here, it occurs to me that there may be a possible addition to the API, which offers more natural "owned root" semantics in Rust but is still pay-as-you-go, which I'd like to sketch here.

Background and Motivation

Currently, we have two types, Rooted<T> and ManuallyRooted<T>. The former has a LIFO discipline, implicitly attaching to the deepest RootScope in a stack of scopes on the store, and the latter is completely manual. Notably, the latter does not have a Drop impl that actually unroots, because doing so would require holding a reference to the Store somehow -- implicitly with Arcs, because actual borrows of the Store would preclude any other operations.

The LIFO discipline of Rooted works well when it works, i.e., when the user is aware of their scopes, but in this comment I outline a few of the realizations I've had around ergonomics when it combines with other features -- in particular, Rust's ? operator, which pushes users to naturally "propagate errors upward without explicit thought". The usual expectation is that the E type in the Result somehow owns its info. So a dynamic failure (a panic, no less!) when that E crosses over a scope is quite surprising. Exceptional returns are the most obvious example to me right now, but I believe some of the surprise here may also occur with users who naively (but understandably) expect that "GC" means they don't have to worry about lifetimes. Said more succinctly, a type named Rooted in a language like Rust where types often imply ownership might imply that it keeps a root as long as it exists. I know I certainly had that expectation at first. (The docs are very good, but "least surprise" still applies here!)

We provide the "escape hatch", as the docs describe, of ManuallyRooted, and this can certainly work well if the user has extreme discipline -- unfortunately, the requirement of a manual unrooting step means that it is very easy to get wrong, again as the docs describe well.

The middle ground, of a type that somehow unregisters itself on Drop by keeping enough of a handle on the engine's internal state to do so, is described as impractical: it would require an Arc<Mutex<...>> to track internal handle state/registration, as the docs say. This would create synchronization overhead, and potentially pessimize common-case GC operations too. Is there a better way?

Idea: Pay-as-you-Go "Arc'd liveness flags"

Ideally, I want something that:

Note the sneaky (*) above: I want to avoid leaks, not to eagerly detect when an unregistration occurs. In other words, let's permit deferring some action from the Drop to, say, the next time a GC runs.

Then to avoid synchronization overhead and contention, rather than a large monolithic registry under a single mutex, let's have some small shared state per owned root.

The idea is: keep a "liveness flag" in an Arc<AtomicBool>. In steady state, when live, there are two references to this Arc: from the owned root type, and from an owned_roots: Vec<(Option<VMGcRef>, Arc<AtomicBool>)> in the GC roots list. When the owned root drops, it sets its atomic bool to false, and drops its Arc reference. When the GC scans roots, it reads out the liveness flags, and removes those roots that the owner has dropped. (E.g. via a retain on the Vec.)[^1]

[^1]: Slight variant: the tuple above could instead be a single Arc<OwnedRootData> with liveness and the VMGcRef, and maybe that's cleaner; I haven't thought too much about how this would live alongside ManuallyRooted and whether it would want to share a GC-ref root array somewhere else or not...

Two realizations make this more efficient than the Arc<Mutex<whole root list>> approach: (i) we do have a &mut borrow of the store when we create, so it's fine to have a normal Vec of roots registered -- only Drop is "remote" without the store; (ii) we have a separate bit of state per root, and it's just an atomic bool, which on common architectures (x86 and Apple Silicon's aarch64 at least) has atomic loads that are exactly as cheap as normal loads.

It's also fully safe Rust (Arc makes it so), and is pay-as-you-go: with no such roots existing, GC root scanning has one check of vec-is-empty and a never-taken branch; as close to zero-overhead as we can make it. There is no mutex contention anywhere, because the only "meeting point" is the Vec that's mutated under a &mut Store. In terms of memory allocation, it's certainly more expensive than a LIFO root (which is just an index into an array!), because there's the separate Arc allocation, but I suspect most uses of these roots are likely to be relatively high-level "entry points" or cases like exception returns where scopes don't map well to usage patterns; we can encourage use of LIFO scopes where possible.

Naming

I've called this an "owned root" and I'd gently suggest considering different names for our current Rooted, to more explicitly describe the difference, if we adopt this -- something like ScopedRooted vs. OwnedRooted?


Last updated: Dec 06 2025 at 07:03 UTC