fitzgen opened issue #13231:
My rough assumption as well is that long-term we'll want nurseries as well, right? If that's the case would it make sense to document that in an issue as an open future work item?
Not totally clear to me right now. A nursery probably isn't worth it for our typical short-running programs, since we can mostly just avoid collecting in those cases, and would only slow down wasm execution via the write barriers it would require. (Also, our amortized GC heap growth algorithm effectively means we have N nursery collections on the way to the full heap size, if you squint hard.)
But someone with long-running wasm instance use cases may indeed want a nursery. But then we also wouldn't want to force a nursery on the short-running use cases that don't need it.
Seems like the kind of bridge we can cross when we get to it. I wouldn't go so far as to say that we should expect it to happen with enough certainty that a tracking issue is required at this time.
_Originally posted by @fitzgen in https://github.com/bytecodealliance/wasmtime/issues/13107#issuecomment-4254865035_
Also, regarding my above comment, we could potentially introduce another collector that has a nursery and is tuned for longer-running Wasm instances, while keeping the copying collector as it is today without a nursery.
Or somehow make it a cargo feature or something, but that seems pretty hairy.
In addition to the above discussion, there is the question of how to implement the nursery, and in particular its remembered set, with O(1)-sized host data structures outside of the GC heap's memory.
The remembered set is the set of GC edges from the tenured heap into the nursery.
A typical approach might be something like a chunk-list where we bump into each chunk until we run out of capacity and then allocate another chunk, and chunks form a linked list together. The problem is that this means that
GcStore::write_gc_refmight need to allocate, which might need to grow the GC heap or trigger a collection, so it would need to become async and fallible, which means it needs to keep GC objects in a valid state for retry after GC, which is just a huge can of worms given that the remembered set is required to perform GC correctly.We could alternatively have an intrusive linked list in tenured objects' headers, and inclusion in this list means that one of your edges is in the remembered set. This is a big overapproximation of the remembered set, and if a very large array has one edge in the remembered set, then we will need to scan the whole array when collecting the nursery. But it does give use the O(1)-sized host data structures property.
- We could potentially alleviate the large-array issue by doing a variant of card marking, where we have a bitmap in the array with a bit for every N elements and we set an element's associated bit when it is inserted into the remembered set. This caps wasted tracing to
N - 1at most but imposes size overhead and makes barriers more expensive.Some kind of hybrid set up where we use a chunk-list when possible but fall back to the intrusive linked list of tenured objects when allocating a new chunk fails and would require GC or heap growth? Does the fast thing when it can, falls back to the slower thing when it can't to avoid the fallibility/async/gc in the middle of a barrier question. Downsides are additional complexity, we would always pay for the size of the linked-list link in object headers, even when the remembered set fully fits in the chunk-list, and more-expensive barriers.
Certainly there are other ideas to explore here. I mostly just wanted to write this stuff down and get it out of my head while it is fresh. I don't actually expect us to make any moves here anytime soon.
fitzgen added the wasm-proposal:gc label to Issue #13231.
fitzgen commented on issue #13231:
Other implementation option I thought of the other day: try to allocate new chunks for the remembered set's chunk list when we can, but if it fails, then set a bit somewhere saying that everything in the nursery is conservatively considered referenced from the tenured heap, so we promote everything on next nursery GC. Note that this would only be required for the host implementations of the write barriers, not the JIT code's barriers, so is probably rare and acceptable.
cfallin commented on issue #13231:
There might also be other dimensions to limit space in: for example, per minimal-object-size chunk of the nursery, have 1 (or K) pointers-to-references-to-this-object-in-tenured-heap? E.g. for the 1 pointer case, and let's say min obj size 16 bytes so 25% space overhead (4 bytes per 16 bytes):
write_barrier(field_ptr, pointee): if is_nursery(pointee): if shadow[pointee] != null: slow path (push into linear remembered-set edge list, maybe async pause, ...) else: shadow[pointee] = field_ptrThis is optimizing on a hypothesis that most remembered nursery objects are going to be referenced by at most one edge into the nursery (common "allocate an object/object graph then root it by storing it into durable data structures" pattern). It does have a cliff when doing more than that of course. One could tune K upward, trading off complexity of the write barrier to move that cliff. It's also a not-too-steep cliff if the slow path is a push-into-remembered-list-chunk-with-space-available most of the time.
Last updated: Jun 01 2026 at 09:49 UTC