cfallin opened issue #11445:
Upon pondering the various rooting options in #11326, and reading through the discussion of API tradeoffs here, it occurs to me that there may be a possible addition to the API, which offers more natural "owned root" semantics in Rust but is still pay-as-you-go, which I'd like to sketch here.
Background and Motivation
Currently, we have two types, `Rooted<T>` and `ManuallyRooted<T>`. The former has a LIFO discipline, implicitly attaching to the deepest `RootScope` in a stack of scopes on the store, and the latter is completely manual. Notably, the latter does not have a `Drop` impl that actually unroots, because doing so would require holding a reference to the `Store` somehow -- implicitly with `Arc`s -- because actual borrows of the `Store` would preclude any other operations.

The LIFO discipline of `Rooted` works well when it works, i.e., when the user is aware of their scopes, but in this comment I outline a few of the realizations I've had around ergonomics when it combines with other features -- in particular, Rust's `?` operator, which pushes users to naturally "propagate errors upward without explicit thought". The usual expectation is that the `E` type in the `Result` somehow owns its info. So a dynamic failure (a panic, no less!) when that `E` crosses over a scope is quite surprising. Exceptional returns are the most obvious example to me right now, but I believe some of the surprise here may also occur with users who naively (but understandably) expect that "GC" means they don't have to worry about lifetimes. Said more succinctly, a type named `Rooted`, in a language like Rust where types often imply ownership, might imply that it keeps a root as long as it exists. I know I certainly had that expectation at first. (The docs are very good, but "least surprise" still applies here!)

We provide the "escape hatch", as the docs describe, of `ManuallyRooted`, and this can certainly work well if the user has extreme discipline -- unfortunately, the requirement of a manual unrooting step means that it is very easy to get wrong, again as the docs describe well.

The middle ground, of a type that somehow unregisters itself on `Drop` by keeping enough of a handle on the engine's internal state to do so, is described as impractical: it would require an `Arc<Mutex<...>>` to track internal handle state/registration, as the docs say. This would create synchronization overhead, and potentially pessimize common-case GC operations too. Is there a better way?

Idea: Pay-as-you-Go "Arc'd liveness flags"
Ideally, I want something that:
- Acts as an "owned root", i.e., keeps the referent alive as long as the root type exists and the store exists
- Has a proper `Drop`, such that dropping the root type ensures the referent is not leaked (*)
- Imposes zero overhead on GC if these roots are not used, and only small overhead proportional to the number of such roots if they are

Note the sneaky (*) above: I want to avoid leaks, not to eagerly detect when an unregistration occurs. In other words, let's permit deferring some action from the `Drop` to, say, the next time a GC runs. Then, to avoid synchronization overhead and contention, rather than a large monolithic registry under a single mutex, let's have some small shared state per owned root.
The idea is: keep a "liveness flag" in an `Arc<AtomicBool>`. In steady state, when live, there are two references to this `Arc`: from the owned root type, and from an `owned_roots: Vec<(Option<VMGcRef>, Arc<AtomicBool>)>` in the GC roots list. When the owned root drops, it sets its atomic bool to false and drops its `Arc` reference. When the GC scans roots, it reads out the liveness flags and removes those roots that the owner has dropped (e.g. via a `retain` on the `Vec`).[^1]

[^1]: Slight variant: the tuple above could instead be a single `Arc<OwnedRootData>` with liveness and the `VMGcRef`, and maybe that's cleaner; I haven't thought too much about how this would live alongside `ManuallyRooted` and whether it would want to share a GC-ref root array somewhere else or not...

Two realizations make this more efficient than the `Arc<Mutex<whole root list>>` approach: (i) we do have a mut borrow of the store when we create, so it's fine to have a normal `Vec` of roots registered -- only `Drop` is "remote" without the store; (ii) we have a separate bit of state per root, and it's just an atomic bool, which on common architectures (x86 and Apple Silicon's aarch64 at least) has atomic loads that are exactly as cheap as normal loads.

It's also fully safe Rust (`Arc` makes it so), and is pay-as-you-go: with no such roots existing, GC root scanning has one check of vec-is-empty and a never-taken branch; as close to zero overhead as we can make it. There is no mutex contention anywhere, because the only "meeting point" is the `Vec` that's mutated under a `&mut Store`. In terms of memory allocation, it's certainly more expensive than a LIFO root (which is just an index into an array!), because there's the separate `Arc` allocation, but I suspect most uses of these roots are likely to be relatively high-level "entry points" or cases like exception returns where scopes don't map well to usage patterns; we can encourage use of LIFO scopes where possible.

Naming
I've called this an "owned root" and I'd gently suggest considering different names for our current `Rooted`, to more explicitly describe the difference, if we adopt this -- something like `ScopedRooted` vs. `OwnedRooted`?
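For concreteness, the liveness-flag scheme above might look roughly like this as standalone Rust. All names here are illustrative only: `VMGcRef` is a stand-in for the engine's internal GC reference, and `OwnedRoot`/`OwnedRoots` are a sketch, not proposed API.

```rust
use std::sync::{
    atomic::{AtomicBool, Ordering},
    Arc,
};

/// Stand-in for the engine-internal raw GC reference (hypothetical).
#[derive(Clone, Copy, Debug, PartialEq)]
struct VMGcRef(u32);

/// Host-side handle: dropping it only flips a flag; the store-side
/// entry is reclaimed lazily at the next GC.
struct OwnedRoot {
    live: Arc<AtomicBool>,
}

impl Drop for OwnedRoot {
    fn drop(&mut self) {
        // "Remote" unroot: no store access required here.
        self.live.store(false, Ordering::Release);
    }
}

/// Store-side list, only ever mutated under `&mut Store`.
#[derive(Default)]
struct OwnedRoots {
    entries: Vec<(Option<VMGcRef>, Arc<AtomicBool>)>,
}

impl OwnedRoots {
    /// Registration happens while the caller holds the store mutably,
    /// so a plain `Vec` (no mutex) suffices.
    fn root(&mut self, gc_ref: VMGcRef) -> OwnedRoot {
        let live = Arc::new(AtomicBool::new(true));
        self.entries.push((Some(gc_ref), Arc::clone(&live)));
        OwnedRoot { live }
    }

    /// GC root scan: visit still-live refs, and `retain` away entries
    /// whose owning handle has been dropped.
    fn trace(&mut self, mut visit: impl FnMut(VMGcRef)) {
        self.entries.retain(|(gc_ref, live)| {
            if live.load(Ordering::Acquire) {
                if let Some(r) = gc_ref {
                    visit(*r);
                }
                true
            } else {
                false
            }
        });
    }
}
```

With no owned roots, `trace` is a single empty-`Vec` check; dropping a handle costs one atomic store plus the `Arc` teardown.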
cfallin edited issue #11445.
cfallin added the wasm-proposal:gc label to Issue #11445.
cfallin added the enhancement label to Issue #11445.
cfallin added the wasmtime:api label to Issue #11445.
cfallin commented on issue #11445:
(cc @fitzgen @alexcrichton for thoughts -- this isn't on the critical path for exceptions in any way, but is an orthogonal thought that seemed worth writing up)
cfallin edited issue #11445.
alexcrichton commented on issue #11445:
Something like this seems reasonable to me, yeah, but I'd defer to @fitzgen. We don't really have a great benchmark or application of sorts to weigh this against at this time, though, in the sense of evaluating "is the `Arc` overhead acceptable?" Without that we're kind of just shooting in the dark and predicting. For example, `Arc` is relatively expensive, with an allocation plus strong/weak counts, so there's synchronization not just on the `Drop`-sets-the-flag but also on dropping the `Arc` itself. We could perhaps get that down with a custom `Arc` of sorts, but then there's also the question of the allocation. (And so on and so forth about exact possible vectors we could tweak things; hard to weigh in my mind at least...)
alexcrichton commented on issue #11445:
Oh, sorry, I'm still waking up. Renaming `Rooted` I also think would be reasonable.
cfallin commented on issue #11445:
That's all true; allocation of an `Arc` is noted above as the main cost here. I'd note that the synchronization on the `Arc` itself is also "root-local", i.e. of a different character than the synchronization overhead noted in the docs to argue against registered roots.

High-level, I suspect it'd be nice to provide some tool here, even with some cost when used (and zero cost when not), because right now there is no way in the public API to build something like this on top of it :-)
alexcrichton commented on issue #11445:
Good point, yeah, and I'd agree with that. I'd prefer to avoid exposing any `Arc`-related details in the public API, and instead provide the constraint that "when you drop this root handle thing it'll eventually get GC'd if not rooted elsewhere". We could then refactor various details internally over time as necessary.
cfallin commented on issue #11445:
Indeed, the above proposal doesn't expose any `Arc`s at all; the only public API is the proposed `OwnedRoot`. (I think you're agreeing with this, but just for clarity to state it explicitly...)
fitzgen closed issue #11445.
fitzgen commented on issue #11445:
Yes, LIFO `Rooted`s are definitely a bit footgun-y with exceptions. I like the immediate solution for that (the `StoreContextMut::throw(&mut self, &Rooted<ExnRef>) -> wasmtime::ThrownException` API or whatever) discussed on the exceptions PR, so I will focus on the general rooting ideas here.

Also, I agree that this scheme would work to root GC objects and would ultimately provide users with a more-ergonomic API than what we have today.
The big challenge that springs to mind when considering this `Arc<AtomicBool>` scheme is management of the `Vec<(Option<VMGcRef>, Arc<AtomicBool>)>` inside the roots list. How big do we let that `Vec` get? Do we trigger GC if it reaches some limit?

The old `VMGcRefActivationsTable` had a bump region that did similar-ish things as proposed here, but in service of different ends, and managing its size/capacity proved to be difficult. We wanted to impose a size limit, but because cleanup happened on GC, this meant we needed to trigger full GCs even when it was unlikely we would have any productive work to do other than cleaning up that bump region. This was _very_ slow. We then adapted the limit to grow in an amortized fashion, and this sped things up some (still not as much as eventually removing the bump region completely did) but it also meant that memory overheads started getting bigger and bigger. We ideally want the pooling allocator and its config to control and pre-allocate ~all major allocations and things that are `O(wasm module)` or `O(wasm runtime)` or anything like that. If I say I want to give 128MiB to each GC heap in my pooling allocator, I don't want extra bookkeeping allocations outside of that, as much as possible.

The size of this `Vec` would be more directly correlated with the host's behavior than with the guest's, but I could imagine ways to indirectly control it from the guest, like throwing and catching in a loop.
This is not a solution to the previous challenge, but miiiiight alleviate it a little: we could replace the `Vec` with a `HashMap<VMGcRef, Arc<AtomicBool>>` so that we wouldn't ever create multiple `Arc<AtomicBool>` roots for the same object. In turn, it would make root creation that much slower.
We could also have a free list of available `Arc<bool>`s to help lower the cost of (some of) the `Arc`s' `malloc`s. Although the `malloc` implementation's size classes (that it presumably has) would probably already do most of that.
Now that I think about it some more, I don't think we even need the `AtomicBool` and can just have `Arc<()>`, using the `Arc`'s reference count to keep track of whether the `OwnedRoot` or whatever is still alive. We could either hold a weak reference in our `Vec`/`HashMap`, or else we could just subtract one from the reference count to get the "real" reference count.
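A sketch of this `Arc<()>` variant, using `Weak::strong_count` as the liveness test (again, `VMGcRef` is a hypothetical stand-in and these types are illustrative, not proposed API):

```rust
use std::sync::{Arc, Weak};

/// Stand-in for the engine-internal raw GC reference (hypothetical).
#[derive(Clone, Copy, Debug, PartialEq)]
struct VMGcRef(u32);

/// The handle is just an `Arc<()>`; its strong count *is* the liveness flag.
struct OwnedRoot(#[allow(dead_code)] Arc<()>);

#[derive(Default)]
struct OwnedRoots {
    entries: Vec<(VMGcRef, Weak<()>)>,
}

impl OwnedRoots {
    fn root(&mut self, gc_ref: VMGcRef) -> OwnedRoot {
        let token = Arc::new(());
        self.entries.push((gc_ref, Arc::downgrade(&token)));
        OwnedRoot(token)
    }

    /// A `Weak` whose strong count has hit zero means the owning
    /// handle was dropped, so the entry can be retained away.
    fn trace(&mut self, mut visit: impl FnMut(VMGcRef)) {
        self.entries.retain(|(gc_ref, token)| {
            if token.strong_count() > 0 {
                visit(*gc_ref);
                true
            } else {
                false
            }
        });
    }
}
```

This drops the separate flag store on `Drop`; the `Arc`'s own reference-count decrement does the signaling.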
Also, this is starting to look very much like what we do with registered types, except with waiting for GC to clean up the "dead" roots rather than expecting callers of `decref` to do it when the ref count reaches zero. (With the type registry, we put types behind an `Arc` so that they can be accessed and their ref counts incremented and decremented without taking the registry's `RwLock`, and the lock is only taken once the ref count reaches zero, for deregistration.)
Oh, the other challenge that springs to mind now is updating rooted references after a moving GC. I think we could handle this with an `Arc<AtomicU32>` where the `AtomicU32` is the raw `VMGcRef`. We could maybe even have a helper that takes a `&mut StoreOpaque` and then does a relaxed load, because the only thing that could update that reference is a GC, which itself requires mutable access to the store.
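A sketch of that helper shape, with `StoreOpaque` mocked out as an empty struct (purely illustrative; the real type and the helper's name are not proposed API):

```rust
use std::sync::{
    atomic::{AtomicU32, Ordering},
    Arc,
};

/// Stand-in for wasmtime's internal store type (hypothetical mock).
struct StoreOpaque;

/// Shared cell holding the raw GC-reference bits.
struct OwnedRootCell(Arc<AtomicU32>);

impl OwnedRootCell {
    /// Collector side: rewrite the reference after moving the object.
    /// The GC runs with exclusive store access, so no reader races this.
    fn update_after_move(&self, new_raw: u32) {
        self.0.store(new_raw, Ordering::Relaxed);
    }

    /// Host side: requiring `&mut StoreOpaque` proves no GC is in
    /// progress, so a relaxed load suffices.
    fn get(&self, _store: &mut StoreOpaque) -> u32 {
        self.0.load(Ordering::Relaxed)
    }
}
```

The mutable-store parameter is doing the real work here: it turns "no concurrent GC" from a convention into something the borrow checker enforces.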
fitzgen reopened issue #11445:
Upon pondering the various rooting options in #11326, and reading through the discussion of API tradeoffs here, it occurs to me that there may be a possible addition to the API, which offers more natural "owned root" semantics in Rust but is still pay-as-you-go, which I'd like to sketch here.
Background and Motivation
Currently, we have two types,
Rooted<T>andManuallyRooted<T>. The former has a LIFO discipline, implicitly attaching to the deepestRootScopein a stack of scopes on the store, and the latter is completely manual. Notably, the latter does not have aDropimpl that actually unroots, because doing so would require holding a reference to theStoresomehow -- implicitly withArcs somehow, because actual borrows of theStorewould preclude any other operations.The LIFO discipline of
Rootedworks well when it works, i.e., when the user is aware of their scopes, but in this comment I outline a few of the realizations I've had around ergonomics when it combines with other features -- in particular, Rust's?operator, which pushes users to naturally "propagate errors upward without explicit thought". The usual expectation is that theEtype in theResultsomehow owns its info. So a dynamic failure (a panic, no less!) when thatEcrosses over a scope is quite surprising. Exceptional returns are the most obvious example to me right now, but I believe some of the surprise here may also occur with users who naively (but understandably) expect that "GC" means they don't have to worry about lifetimes. Said more succinctly, a type namedRootedin a language like Rust where types often imply ownership might imply that it keeps a root as long as it exists. I know I certainly had that expectation at first. (The docs are very good, but "least surprise" still applies here!)We provide the "escape hatch", as the docs describe, of
ManuallyRooted, and this can certainly work well if the user has extreme discipline -- unfortunately, the requirement of a manual unrooting step means that it is very easy to get wrong, again as the docs describe well.The middle ground, of a type that somehow unregisters itself on
Dropby keeping enough of a handle on the engine's internal state to do so, is described as impractical: it would require anArc<Mutex<...>>to track internal handle state/registration, as the docs say. This would create synchronization overhead, and potentially pessimize common-case GC operations too. Is there a better way?Idea: Pay-as-you-Go "Arc'd liveness flags"
Ideally, I want something that:
- Acts as an "owned root", i.e., keeps the referent alive as long as the root type exists and the store exists
- Has a proper
Dropsuch that dropping the root type ensures the referent is not leaked(*)- Imposes zero overhead on GC if these roots are not used, and imposes only small overhead proportional to the number of such roots if they are
Note the sneaky (*) above: I want to avoid leaks, not to eagerly detect when an unregistration occurs. In other words, let's permit deferring some action from the
`Drop` by keeping enough of a handle on the engine's internal state to do so, is described as impractical: it would require an `Arc<Mutex<...>>` to track internal handle state/registration, as the docs say. This would create synchronization overhead, and potentially pessimize common-case GC operations too. Is there a better way?

Idea: Pay-as-you-Go "Arc'd Liveness Flags"

Ideally, I want something that:
- Acts as an "owned root", i.e., keeps the referent alive as long as the root type exists and the store exists
- Has a proper `Drop` such that dropping the root type ensures the referent is not leaked(*)
- Imposes zero overhead on GC if these roots are not used, and imposes only small overhead proportional to the number of such roots if they are

Note the sneaky (*) above: I want to avoid leaks, not to eagerly detect when an unregistration occurs. In other words, let's permit deferring some action from the `Drop` to, say, the next time a GC runs.

Then, to avoid synchronization overhead and contention, rather than a large monolithic registry under a single mutex, let's have some small shared state per owned root. The idea is: keep a "liveness flag" in an `Arc<AtomicBool>`. In steady state, when live, there are two references to this `Arc`: one from the owned root type, and one from an `owned_roots: Vec<(Option<VMGcRef>, Arc<AtomicBool>)>` in the GC roots list. When the owned root drops, it sets its atomic bool to false and drops its `Arc` reference. When the GC scans roots, it reads out the liveness flags and removes those roots that the owner has dropped (e.g. via a `retain` on the `Vec`).[^1]

[^1]: Slight variant: the tuple above could instead be a single `Arc<OwnedRootData>` with liveness and the `VMGcRef`, and maybe that's cleaner; I haven't thought too much about how this would live alongside `ManuallyRooted` and whether it would want to share a GC-ref root array somewhere else or not...

Two realizations make this more efficient than the `Arc<Mutex<whole root list>>` approach: (i) we do have a mut borrow of the store when we create a root, so it's fine to have a normal `Vec` of registered roots -- only `Drop` is "remote" without the store; (ii) we have a separate bit of state per root, and it's just an atomic bool, which on common architectures (x86 and Apple Silicon's aarch64, at least) has atomic loads that are exactly as cheap as normal loads.

It's also fully safe Rust (`Arc` makes it so), and is pay-as-you-go: with no such roots existing, GC root scanning has one check of vec-is-empty and a never-taken branch; as close to zero overhead as we can make it. There is no mutex contention anywhere, because the only "meeting point" is the `Vec` that's mutated under a `&mut Store`. In terms of memory allocation, it's certainly more expensive than a LIFO root (which is just an index into an array!), because there's the separate `Arc` allocation, but I suspect most uses of these roots are likely to be relatively high-level "entry points" or cases like exception returns where scopes don't map well to usage patterns; we can encourage use of LIFO scopes where possible.

Naming

I've called this an "owned root" and I'd gently suggest considering different names for our current `Rooted`, to more explicitly describe the difference, if we adopt this -- something like `ScopedRooted` vs. `OwnedRooted`?
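As a concrete illustration, the liveness-flag scheme above can be sketched in standalone Rust. All names here (`OwnedRoot`, `OwnedRoots`, and the `u32` stand-in for `VMGcRef`) are hypothetical scaffolding, not Wasmtime's actual internals:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

// Hypothetical stand-in for Wasmtime's internal `VMGcRef`.
type VMGcRef = u32;

/// Host-side owned root: holds only the shared liveness flag.
struct OwnedRoot {
    live: Arc<AtomicBool>,
}

impl Drop for OwnedRoot {
    fn drop(&mut self) {
        // "Remote" unroot: no store access needed; the GC observes
        // the cleared flag on its next root scan.
        self.live.store(false, Ordering::Release);
    }
}

/// Store-side registry of owned roots (mutated only under `&mut Store`).
#[derive(Default)]
struct OwnedRoots {
    roots: Vec<(VMGcRef, Arc<AtomicBool>)>,
}

impl OwnedRoots {
    /// Registering takes `&mut self`, mirroring the `&mut Store` borrow.
    fn root(&mut self, gc_ref: VMGcRef) -> OwnedRoot {
        let live = Arc::new(AtomicBool::new(true));
        self.roots.push((gc_ref, live.clone()));
        OwnedRoot { live }
    }

    /// During a GC root scan: drop entries whose owner has gone away,
    /// and visit (trace) the rest.
    fn trace(&mut self, mut visit: impl FnMut(VMGcRef)) {
        self.roots.retain(|(gc_ref, live)| {
            if live.load(Ordering::Acquire) {
                visit(*gc_ref);
                true
            } else {
                false
            }
        });
    }
}

fn main() {
    let mut roots = OwnedRoots::default();
    let keep = roots.root(1);
    drop(roots.root(2)); // owner drops; entry is cleaned on next scan

    let mut traced = Vec::new();
    roots.trace(|r| traced.push(r));
    assert_eq!(traced, vec![1]); // only the still-owned root is traced
    assert_eq!(roots.roots.len(), 1);
    drop(keep);
}
```

Note that dropping the root is the only "remote" operation; everything that touches the `Vec` happens under `&mut self`, which is what keeps the sketch in fully safe, contention-free Rust.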
fitzgen commented on issue #11445:
(Sorry mis-click)
fitzgen edited a comment on issue #11445:

Yes, LIFO `Rooted`s are definitely a bit footgun-y with exceptions. I like the immediate solution for that (the `StoreContextMut::throw(&mut self, &Rooted<ExnRef>) -> wasmtime::ThrownException` API or whatever) discussed on the exceptions PR, so I will focus on the general rooting ideas here.

Also, I agree that this scheme would work to root GC objects and would ultimately provide users with a more-ergonomic API than what we have today.

The big challenge that springs to mind when considering this `Arc<AtomicBool>` scheme is management of the `Vec<(Option<VMGcRef>, Arc<AtomicBool>)>` inside the roots list. How big do we let that `Vec` get? Do we trigger GC if it reaches some limit?

The old `VMGcRefActivationsTable` had a bump region that did similar-ish things as proposed here, but in service of different ends, and managing its size/capacity proved to be difficult. We wanted to impose a size limit, but because cleanup happened on GC, this meant we needed to trigger full GCs even when it was unlikely we would have any productive work to do other than cleaning up that bump region. This was _very_ slow. We then adapted the limit to grow in an amortized fashion, and this sped things up some (still not as much as eventually removing the bump region completely did), but it also meant that memory overheads started getting bigger and bigger. We ideally want the pooling allocator and its config to control and pre-allocate ~all major allocations and things that are O(wasm module) or O(wasm runtime) or anything like that. If I say I want to give 128MiB to each GC heap in my pooling allocator, I don't want extra bookkeeping allocations outside of that, as much as possible.

The size of this `Vec` would be more directly correlated with the host's behavior than with the guest's, but I could imagine ways to indirectly control it from the guest, like throwing and catching in a loop.

This is not a solution to the previous challenge, but miiiiight alleviate it a little: we could replace the `Vec` with a `HashMap<VMGcRef, Arc<AtomicBool>>` so that we wouldn't ever create multiple `Arc<AtomicBool>` roots for the same object. In turn, it would make root creation that much slower.

We could also have a free list of available `Arc<bool>`s to help lower the cost of (some of) the `Arc`s' `malloc`s. Although the `malloc` implementation's size classes (that it presumably has) would probably already do most of that.

Now that I think about it some more, I don't think we even need the `AtomicBool` and can just have `Arc<()>` and use the `Arc`'s reference count to keep track of whether the `OwnedRoot` or whatever is still alive. We could either hold a weak reference in our `Vec`/`HashMap` or else we could just subtract one from the reference count to get the "real" reference count.
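A minimal sketch of this `Arc<()>` variant (hypothetical names): the store side holds a `Weak<()>` so that "the owner's strong handle still exists" is exactly the liveness condition, with no separate flag to maintain.

```rust
use std::sync::{Arc, Weak};

// Hypothetical stand-in for Wasmtime's internal `VMGcRef`.
type VMGcRef = u32;

// Owner side: just an `Arc<()>`; dropping it *is* the unroot.
struct OwnedRoot(#[allow(dead_code)] Arc<()>);

#[derive(Default)]
struct OwnedRoots {
    // Store side holds a `Weak`, so the owner's drop alone decides liveness.
    roots: Vec<(VMGcRef, Weak<()>)>,
}

impl OwnedRoots {
    fn root(&mut self, gc_ref: VMGcRef) -> OwnedRoot {
        let token = Arc::new(());
        self.roots.push((gc_ref, Arc::downgrade(&token)));
        OwnedRoot(token)
    }

    fn trace(&mut self, mut visit: impl FnMut(VMGcRef)) {
        // A root is live iff its owner's strong `Arc` still exists.
        self.roots.retain(|(gc_ref, weak)| {
            if weak.strong_count() > 0 {
                visit(*gc_ref);
                true
            } else {
                false
            }
        });
    }
}

fn main() {
    let mut roots = OwnedRoots::default();
    let keep = roots.root(1);
    drop(roots.root(2));

    let mut traced = Vec::new();
    roots.trace(|r| traced.push(r));
    assert_eq!(traced, vec![1]);
    drop(keep);
}
```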
Also, this is starting to look very much like what we do with registered types, except with waiting for GC to clean up the "dead" roots rather than expecting callers of `decref` to do it when the ref count reaches zero. (With the type registry, we put types behind an `Arc` so that they can be accessed and their ref counts incremented and decremented without taking the registry's `RwLock`, and the lock is only taken once the ref count reaches zero for deregistration. Taking the lock there is roughly equivalent to accessing the store here. Something to mull over.)
Oh, the other challenge that springs to mind now is updating rooted references after a moving GC. I think we could handle this with `Arc<AtomicU32>` where the `AtomicU32` is the raw `VMGcRef`. We could maybe even have a helper that takes a `&mut StoreOpaque` and then does a relaxed load, because the only thing that could update that reference is a GC, which takes a mutable store; so if you have a mutable store (and it is the correct store; probably we would need a `StoreId` somewhere too), then you know the reference can't move out from under you for the time being.
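A rough sketch of that `Arc<AtomicU32>` idea (types hypothetical). The relaxed load is justified only by the invariant described above: a moving GC requires exclusive store access, so no update can race the reader while the mutable store borrow is held.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;

// Hypothetical shared slot: the owned root and the store's root list
// each hold one `Arc` clone of the raw GC reference.
type SharedGcRef = Arc<AtomicU32>;

// GC side (runs under `&mut Store`): rewrite the reference after moving
// the object it points to.
fn gc_move(slot: &SharedGcRef, new_raw: u32) {
    slot.store(new_raw, Ordering::Relaxed);
}

// Reader side: conceptually takes `&mut StoreOpaque`, so no GC can run
// concurrently and a relaxed load cannot observe a racing update.
fn raw_gc_ref(slot: &SharedGcRef) -> u32 {
    slot.load(Ordering::Relaxed)
}

fn main() {
    let store_side: SharedGcRef = Arc::new(AtomicU32::new(0x10));
    let root_side = store_side.clone(); // held inside the owned root

    assert_eq!(raw_gc_ref(&root_side), 0x10);
    gc_move(&store_side, 0x20); // moving collector relocated the object
    assert_eq!(raw_gc_ref(&root_side), 0x20);
}
```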
cfallin commented on issue #11445:
It's possible I'm misunderstanding, or the thought is a follow-on from your mutations, but

> Oh the other challenge that springs to mind now is updating rooted references after a moving GC. I think we could handle this with `Arc<AtomicU32>` where the `AtomicU32` is the raw `VMGcRef`

isn't an issue in the scheme as proposed, since the `VMGcRef`s are still in the store / GC roots list (indirected as with all other roots) -- only the liveness flag is accessible when one doesn't have a `&mut Store`.
Growth of the roots list is certainly a question; I suppose one could have a list-trim step that is independent of a GC, essentially doing a pass with `Vec::retain` and unrooting the dead roots but doing no other work (and this would be in a context where we're creating a new root, so we have a `&mut Store` already).
That does, however, make me realize something I said above isn't true: one can actually build this abstraction on top of `ManuallyRooted` today. In essence, one could have a `StoreWithNicerRoots<T>` that internally owns a `Store<T>` and carries a list of `ManuallyRooted` and the `Arc<AtomicBool>`; at any point where it scans the root list to clean it up, it has a `&mut Store` and can do a `ManuallyRooted::unroot`. So then it's a question of whether we want to absorb such a library into `Store` or not...
fitzgen edited a comment on issue #11445:

> isn't an issue in the scheme as proposed since the `VMGcRef`s are still in the store / GC roots list (indirected as with all other roots) -- only the liveness flag is accessible when one doesn't have a `&mut Store`.

Ah okay, I think I misunderstood.

Given an `OwnedRoot` and a store, how do Wasmtime internals get the raw `VMGcRef`? That is, what is inside the `OwnedRoot` other than an `Arc<AtomicBool>` handle that lets us get the `VMGcRef` in O(1) time? I'm assuming we don't want to do an O(n) linear scan through the `Vec`, comparing each entry's `Arc` pointer to the `OwnedRoot`'s `Arc` pointer. But I also don't know what it would be:

It can't be an index into the `Vec` unless we never shrink the `Vec`'s size[^0], which means we would pretty much have to do the full free-list thing (e.g. by using `wasmtime_slab::Slab`) but also means that peak memory overheads would never go away once reached, even after GC (unless we want to get into the game of having the free list try to hand out low indices when possible, keeping high indices vacant, and shrinking the `Vec` in an amortized way when we can, but this is getting into full-malloc-implementation territory, which seems like more complexity than we probably want here).

It also can't be a `VMGcRef` unless we do the `Arc<AtomicU32>` thing to deal with moving collectors.

[^0]: Or if we do shrink the `Vec`'s size but use indices inside `OwnedRoot`, then we need a way to fix up all the existing `OwnedRoot`s' indices after we shrink the `Vec`'s size. Basically the same problem as a moving GC.
cfallin commented on issue #11445:

It would be the secret third option: a `GcRootIndex`, just like today's `ManuallyRooted`. As long as the `OwnedRooted` exists, it will not set its liveness flag to false, so the index remains valid (same argument as `ManuallyRooted`, which explicitly deallocates its index when `unroot()` is called). I suppose the part I missed in the description is that the `Vec` really holds the `Arc` along with the associated `GcRootIndex` (or the external library holds a `ManuallyRooted`), not the raw `VMGcRef`.
fitzgen commented on issue #11445:

> It would be the secret third option: a `GcRootIndex`, just like today's `ManuallyRooted`. As long as the `OwnedRooted` exists, it will not set its liveness flag to false, so the index remains valid (same argument as `ManuallyRooted`, which explicitly deallocates its index when `unroot()` is called). I suppose the part I missed in the description is that the `Vec` really holds the `Arc` along with the associated `GcRootIndex` (or the external library holds a `ManuallyRooted`), not the raw `VMGcRef`.

Ah okay, I've got it now; makes sense.

> I suppose one could have a list trim step that is independent of a GC, essentially doing a pass with `Vec::retain` and unrooting the dead roots but doing no other work (and this would be in a context where we're creating a new root, so we have a `&mut Store` already).

This makes sense. It will be slightly tricky tuning this well and making sure it is amortized and all that.

Still slightly concerned about sizes, but I think we have figured out enough of the details that it is worth putting up a PR, where we can have more-concrete discussion.
fitzgen commented on issue #11445:
And FWIW, I don't think this should be an external library thing. If it works well and provides better ergonomics than what we have, then we should just add it to Wasmtime and maybe replace our existing APIs.
cfallin closed issue #11445:
Last updated: Dec 06 2025 at 07:03 UTC