fitzgen opened PR #13107 from fitzgen:initial-impl-copying-collector to bytecodealliance:main:
This is a classic semi-space copying collector that uses bump allocation and Cheney-style worklists to avoid an explicit stack for grey (relocated but not yet scanned) objects outside the GC heap. Forwarding "pointers" (really
VMGcRefindices) are stored inline in objects in the old semi-space, which again avoids explicit data structures outside of the GC heap. Furthermore, an intrusive linked-list of allexternrefs is maintained to allow for efficiently sweeping their associated host data after collection (and, once again, avoiding additional data structures outside the GC heap).Allocation in compiled Wasm code always happens by calling out to the
gc_alloc_rawlibcall currently. This is expected to become inline bump allocation, very similar to the null collector, in a follow-up commit shortly after this one. It is delayed to make review easier.Collection is not incremental (in the Wasmtime sense, not the GC literature sense) yet and the
CopyingCollection::collect_incrementimplementation only has a single increment. This is delayed to make review easier and is also expected to come shortly in follow-up commits.I've also added new disas tests for the copying collector and also enabled running wast tests which need a GC heap with the copying collector.
Finally, fuzz config generation can now generate configurations that enable the copying collector.
<!--
Please make sure you include the following information:
If this work has been discussed elsewhere, please include a link to that
conversation. If it was discussed in an issue, just mention "issue #...".Explain why this change is needed. If the details are in an issue already,
this can be brief.Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.htmlPlease ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->
fitzgen requested alexcrichton for a review on PR #13107.
fitzgen requested wasmtime-fuzz-reviewers for a review on PR #13107.
fitzgen requested wasmtime-compiler-reviewers for a review on PR #13107.
fitzgen requested wasmtime-core-reviewers for a review on PR #13107.
fitzgen updated PR #13107.
github-actions[bot] added the label wasmtime:api on PR #13107.
github-actions[bot] added the label wasmtime:ref-types on PR #13107.
github-actions[bot] added the label fuzzing on PR #13107.
github-actions[bot] commented on PR #13107:
Subscribe to Label Action
cc @fitzgen
<details>
This issue or pull request has been labeled: "fuzzing", "wasmtime:api", "wasmtime:ref-types"Thus the following users have been cc'd because of the following labels:
- fitzgen: fuzzing, wasmtime:ref-types
To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.
Learn more.
</details>
fitzgen updated PR #13107.
alexcrichton submitted PR review:
Overall I think this looks fantastic, thanks so much!
I've got some minor comments below, but I'm also happy to defer anything to issues as you see fit. My rough assumption as well is that long-term we'll want nurseries as well, right? If that's the case would it make sense to document that in an issue as an open future work item?
alexcrichton created PR review comment:
Random thought while reading this, shouldn't
notrapandalignedbe absent here? Given the sandboxing strategy I'd expect this to not be asserted to be aligned and additionally would be allowed to trap
alexcrichton created PR review comment:
Along the lines of sharing code with other collectors, this is one example where I suspect that the extern/any/exn branch may differ between drc and null/copying, but otherwise all these other variants are probably the same. Additionally even the null/copying collector are probably the same here
alexcrichton created PR review comment:
Minor comment on this and surrounding code -- this is a lot of boilerplate I suspect is more-or-less duplicated between the drc and null collector as well. It might be worth looking into at some point having various hooks into this process where a lot more of the boilerplate can be shared. For example inline allocations are probably completely different per collector but libcall-based allocations I suspect are almost exactly the same. Even calling
init.initialize(...)and such is probably pretty similar would be my guess across collectors.Not needed for now of course, but something that might be worth keeping in the back of our heads
alexcrichton created PR review comment:
This is one example where I suspect that the null collector and the copying collector probably have the same code here (for future refactoring)
alexcrichton created PR review comment:
This logic is present during initialization as well I think? Could that get deduplicated?
alexcrichton created PR review comment:
nowadays Rust's built-in
next_multiple_ofmight work here?
alexcrichton created PR review comment:
Would it make sense to expand this module comment with a
vmoffsets.rs-like comment showing a pseudo-rust-struct for all the various layouts involved?
alexcrichton created PR review comment:
Ah this might answer my question from the
disastests, I thinktrustedmay not be what we want here. This may be a preexisting issue as well, though, in which case feel free to defer. I do see a fair number of::trusted()throughout this module though.
alexcrichton created PR review comment:
Since right after the swap one of these fields are overwritten I'd say this might be clearer. without the swap and instead just set idle to active then set active to none
alexcrichton created PR review comment:
This gotcha is in
initialize_semi_spacesabove too, perhaps extract to a helper?
alexcrichton created PR review comment:
Why not have a
VMCopyingHeaderAndForwardingRefat the head here?
alexcrichton created PR review comment:
I think this is all duplicated with the struct field case above, so maybe a helper function to encapsulate?
alexcrichton created PR review comment:
Perhaps
copy_withininstead ofget_disjoint_mut+copy_from_slice?
alexcrichton commented on PR #13107:
Oh, also, to kick the tire some more I might recommend locally making the copying collector the default (instead of drc) and running the full test suite. Iunno if that'll turn up much but might be good as a first pass
fitzgen commented on PR #13107:
My rough assumption as well is that long-term we'll want nurseries as well, right? If that's the case would it make sense to document that in an issue as an open future work item?
Not totally clear to me right now. A nursery probably isn't worth it for our typical short-running programs, since we can mostly just avoid collecting in those cases, and would only slow down wasm execution via the write barriers it would require. (Also, our amortized GC heap growth algorithm effectively means we have N nursery collections on the way to the full heap size, if you squint hard.)
But someone with long-running wasm instance use cases may indeed want a nursery. But then we also wouldn't want to force a nursery on the short-running use cases that don't need it.
Seems like the kind of bridge we can cross when we get to it. I wouldn't go so far as to say that we should expect it to happen with enough certainty that a tracking issue is required at this time.
fitzgen submitted PR review.
fitzgen created PR review comment:
Its a bit subtle and I went back and forth on this in the original GC implementation days, but basically traps at the clif level aren't really what we want:
- They produce trap code metadata, which we don't care about here
- They prevent GVNing and RLEing of read-only loads, which we do want here
- They prevent dead-load removal, which we do want here
And most of all these loads should never trap, so they shouldn't pessimize compilation in general, but we add sandboxing as an extra layer of safety just in case.
Maybe we should remove the
alignedflag, but again this should never actually happen. So if there was a bug where something was misaligned when we said it was aligned, that will at most result in a segfault or whatever, and that is not really any different from a panic because we said an index was in bounds of a vec but it wasn't, and we wouldn't remove#[cold]from a panic handler or whatever.
fitzgen created PR review comment:
Yeah we could probably define more common helpers.
fitzgen submitted PR review.
fitzgen submitted PR review.
fitzgen created PR review comment:
We could probably even move the
VM*HeaderandVMGcRefdefinitions intowasmtime-environsinceVMGcRefis always 32 bits regardless of the target pointer size.
fitzgen created PR review comment:
I went back and forth on doing that. Can switch to that.
fitzgen submitted PR review.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
What I'm worried about is UB-of-sorts where we're telling Craneilft that this load is always aligned and never traps and then at runtime, assuming there's a bug in either Cranelift or Wasmtime's GC, that's violated (in theory causing UB). I'm wary to bucket this under having a known set of possible outcomes because we're effectively violating a core assumption and I'm not sure we can enumerate all the outcomes. By analogy, the vec OOB isn't UB to hit the
#[cold]block naturally, but here I'm worried that it would be UB somehow to hit a trap here.In thinking more about this it might be a bit of a broader topic. For example right now the GC runtime is very panic-heavy if anything gets corrupted, effectively meaning that any logic bug turns into a DoS CVE. Not as bad as an arbitrary read/write CVE, but still just as high a bar for getting things right. I'm wondering if we should apply a component-model-async-lookalike strategy where everything returns a
Resultwith abail_bug!or similar instead. This is a bit beyond this here, though, but definitely touches on it I think because it affects our stance/assumptions when thinking about the GC runtime
fitzgen requested uweigand for a review on PR #13107.
fitzgen requested wasmtime-default-reviewers for a review on PR #13107.
fitzgen updated PR #13107.
fitzgen has enabled auto merge for PR #13107.
fitzgen commented on PR #13107:
Adding this to the review queue now, but we can definitely continue with some of these other bits in follow ups.
fitzgen submitted PR review.
fitzgen created PR review comment:
Filed https://github.com/bytecodealliance/wasmtime/issues/13112 to continue this discussion
fitzgen submitted PR review.
fitzgen created PR review comment:
Split this out into https://github.com/bytecodealliance/wasmtime/issues/13113
fitzgen created PR review comment:
Filed https://github.com/bytecodealliance/wasmtime/issues/13114 to keep discussing this
fitzgen submitted PR review.
fitzgen edited a comment on PR #13107:
Adding this to the merge queue now, but we can definitely continue with some of these other bits in follow ups.
fitzgen updated PR #13107.
fitzgen added PR #13107 Implement the semi-space copying garbage collector to the merge queue.
github-merge-queue[bot] removed PR #13107 Implement the semi-space copying garbage collector from the merge queue.
fitzgen edited PR #13107:
This is a classic semi-space copying collector that uses bump allocation and Cheney-style worklists to avoid an explicit stack for grey (relocated but not yet scanned) objects outside the GC heap. Forwarding "pointers" (really
VMGcRefindices) are stored inline in objects in the old semi-space, which again avoids explicit data structures outside of the GC heap. Furthermore, an intrusive linked-list of allexternrefs is maintained to allow for efficiently sweeping their associated host data after collection (and, once again, avoiding additional data structures outside the GC heap).Allocation in compiled Wasm code always happens by calling out to the
gc_alloc_rawlibcall currently. This is expected to become inline bump allocation, very similar to the null collector, in a follow-up commit shortly after this one. It is delayed to make review easier.Collection is not incremental (in the Wasmtime sense, not the GC literature sense) yet and the
CopyingCollection::collect_incrementimplementation only has a single increment. This is delayed to make review easier and is also expected to come shortly in follow-up commits.I've also added new disas tests for the copying collector and also enabled running wast tests which need a GC heap with the copying collector.
Finally, fuzz config generation can now generate configurations that enable the copying collector.
Fixes https://github.com/bytecodealliance/wasmtime/issues/10329
<!--
Please make sure you include the following information:
If this work has been discussed elsewhere, please include a link to that
conversation. If it was discussed in an issue, just mention "issue #...".Explain why this change is needed. If the details are in an issue already,
this can be brief.Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.htmlPlease ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->
fitzgen commented on PR #13107:
My rough assumption as well is that long-term we'll want nurseries as well, right? If that's the case would it make sense to document that in an issue as an open future work item?
Not totally clear to me right now. A nursery probably isn't worth it for our typical short-running programs, since we can mostly just avoid collecting in those cases, and would only slow down wasm execution via the write barriers it would require. (Also, our amortized GC heap growth algorithm effectively means we have N nursery collections on the way to the full heap size, if you squint hard.)
But someone with long-running wasm instance use cases may indeed want a nursery. But then we also wouldn't want to force a nursery on the short-running use cases that don't need it.
Seems like the kind of bridge we can cross when we get to it. I wouldn't go so far as to say that we should expect it to happen with enough certainty that a tracking issue is required at this time.
FWIW, I just spun out this issue with some forward thinking ideas/complications related to generational GC and our desire to generally only have O(1)-sized host allocations outside the GC heap: https://github.com/bytecodealliance/wasmtime/issues/13231
fitzgen updated PR #13107.
fitzgen commented on PR #13107:
Rebased on top of
mainand two non-copying-collector-specific bug fixes for pre-existing bugs uncovered by the copying collector (https://github.com/bytecodealliance/wasmtime/pull/13228 and https://github.com/bytecodealliance/wasmtime/pull/13230).Only new commit outside of that is 896b0b0288, which just introduces a few additional tests.
This should be good to land (after another rebase) once those bug fixes land.
fitzgen updated PR #13107.
fitzgen updated PR #13107.
github-actions[bot] added the label cranelift on PR #13107.
fitzgen updated PR #13107.
fitzgen has enabled auto merge for PR #13107.
fitzgen updated PR #13107.
fitzgen added PR #13107 Implement the semi-space copying garbage collector to the merge queue.
:check: fitzgen merged PR #13107.
fitzgen removed PR #13107 Implement the semi-space copying garbage collector from the merge queue.
Last updated: May 03 2026 at 22:13 UTC