fitzgen opened PR #13323 from fitzgen:wasmtime-inline-copying-collector-alloc to bytecodealliance:main:
<!--
Please make sure you include the following information:
If this work has been discussed elsewhere, please include a link to that
conversation. If it was discussed in an issue, just mention "issue #...".Explain why this change is needed. If the details are in an issue already,
this can be brief.Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.htmlPlease ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->
fitzgen requested uweigand for a review on PR #13323.
fitzgen requested wasmtime-compiler-reviewers for a review on PR #13323.
fitzgen requested dicej for a review on PR #13323.
fitzgen requested wasmtime-core-reviewers for a review on PR #13323.
fitzgen unassigned uweigand from PR #13323 Inline the copying collector's bump allocation in Wasm code.
fitzgen requested alexcrichton for a review on PR #13323.
fitzgen requested wasmtime-compiler-reviewers for a review on PR #13323.
github-actions[bot] added the label wasmtime:api on PR #13323.
github-actions[bot] added the label wasmtime:ref-types on PR #13323.
github-actions[bot] commented on PR #13323:
Subscribe to Label Action
cc @fitzgen
<details>
This issue or pull request has been labeled: "wasmtime:api", "wasmtime:ref-types"Thus the following users have been cc'd because of the following labels:
- fitzgen: wasmtime:ref-types
To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.
Learn more.
</details>
:memo: alexcrichton submitted PR review.
:speech_balloon: alexcrichton created PR review comment:
These might be good methods to have on the
PtrSizetrait in case they ever, in the future, need to be parameterized on that?
:speech_balloon: alexcrichton created PR review comment:
Is this by-and-large
emit_gc_raw_allocfrom the drc collector? (modulo some control flow on the edges)
:speech_balloon: alexcrichton created PR review comment:
This is preexisting I think for other collectors, but there's 2 primary ways that this method is UB according to Miri:
- This is deriving a pointer from
&selfwhich is later mutated through in compiled code. That's UB in Rust because by deriving the pointer from&selfyou're saying that the contents can't be mutated. This is a playground example showing the UB (top right, Tools -> Miri). The fix for this specific issue is to change this method to&mut selfand then have the implementation beNonNull::from(&mut self.vmctx_data).cast()- The second problem is that this struture is stored inline with the
CopyingHeap, which means that re-acquiring a mutable pointer toCopyingHeapwill actually invalidate the previous pointer returned by this method. This is a playground example of the UB here.Fixing the latter one is not going to be super easy. I can think a few possible ways:
- Every time we return back to wasm re-acquire the pointer and put it back in the
VMContext. Probably too slow to work.- Place the VM data behind an indirection (e.g.
Box<VMCopyingHeapData>here). The goal is that by using&mut CopyingHeapit doesn't auto-invaildate the vmctx data structures. This alone is not enough, however. Another added fix necessary would be to create the raw pointer at creation ofCopyingHeap, and thenCopyingHeapitself only accesses the data through that pointer. The end result is sort of like this and models what we do withvm::Instanceand its siblingVMContextstorage data that comes after it.I haven't double-checked, but I believe this would all plague the drc/null collectors as well. If you'd like I think it'd be reasonable to fix them all in a subsequent PR.
In terms of detecting all of this, in theory Miri should be able to capture everything here. I think that may mean we're not running much GC-compiled wasms through Miri, which might be a good addition to have? For miri coverage copying this line is probably the easiest. Along those lines
rustup run nightly ./ci/miri-wast.sh ./tests/spec_testsuite/struct.wastlocally shows a Miri violation which may be related to this, but it's always sort of hard to tell where exactly the miri error comes from due to the mix of pulley and lack of strict provenance...
:speech_balloon: alexcrichton created PR review comment:
I think the alias region here is wrong?
:speech_balloon: alexcrichton created PR review comment:
This might be able to use
istore32to avoid theireduceahead of it. (same for updating the bump ptr above)
:speech_balloon: alexcrichton created PR review comment:
How come this became optional? It's passed as
Someat all call-sites it looks like. It seems a bit weird to storeVMSharedTypeIndex::reserved_valuein headers?
:speech_balloon: alexcrichton created PR review comment:
Are you sure about the big/little endian here? That shouldn't swap the order of the fields, but instead just the byte order within the fields?
alexcrichton unassigned dicej from PR #13323 Inline the copying collector's bump allocation in Wasm code.
alexcrichton requested alexcrichton for a review on PR #13323.
fitzgen commented on PR #13323:
FWIW this is a 2% reduction in instructions retired on
splay.wasm, but I haven't been able to measure a statistically significant difference in cycles yet. Will try bumping up the number of iterations.
alexcrichton unassigned alexcrichton from PR #13323 Inline the copying collector's bump allocation in Wasm code.
fitzgen commented on PR #13323:
FWIW this is a 2% reduction in instructions retired on
splay.wasm, but I haven't been able to measure a statistically significant difference in cycles yet. Will try bumping up the number of iterations.With 200 iterations:
execution :: cycles :: benchmarks/splay/splay.wasm Δ = 84634629.66 ± 43938351.82 (confidence = 99%) inline-alloc.so (-Wgc,function-references -Ccollector=copying) is 1.00x to 1.01x faster than main.so (-Wgc,function-references -Ccollector=copying)! [13062747333 13278335414.74 13771371538] inline-alloc.so (-Wgc,function-references -Ccollector=copying) [13062358191 13362970044.40 13846668642] main.so (-Wgc,function-references -Ccollector=copying)
:memo: fitzgen submitted PR review.
:speech_balloon: fitzgen created PR review comment:
We aren't storing into the
VMContextbut we are storing into a vmctx-style structure (VMCopyingHeapData), which we use the same alias region for (same as for e.g.VMStoreContextaccesses).
:memo: fitzgen submitted PR review.
:speech_balloon: fitzgen created PR review comment:
Will handle this in a follow up.
:memo: alexcrichton submitted PR review.
:speech_balloon: alexcrichton created PR review comment:
Oh, another possible fix is like
Store::data_mutwhere provenance of some Miri-safe pointer is used but the address is used like usual, meaning that the actual machine code doesn't change. This will still require an indirection, however, I think.
fitzgen updated PR #13323.
:memo: fitzgen submitted PR review.
:speech_balloon: fitzgen created PR review comment:
Yeah, its a bit funky and probably buggy so lets just do two stores for now (same as the null collector, fwiw).
fitzgen updated PR #13323.
fitzgen requested alexcrichton for a review on PR #13323.
:thumbs_up: alexcrichton submitted PR review:
Any idea what's going on with the splay benchmark? Is it not actually allocating that much? Or is something else so massively dominating that allocation isn't factoring in?
alexcrichton added PR #13323 Inline the copying collector's bump allocation in Wasm code to the merge queue.
github-merge-queue[bot] removed PR #13323 Inline the copying collector's bump allocation in Wasm code from the merge queue.
fitzgen commented on PR #13323:
Any idea what's going on with the splay benchmark? Is it not actually allocating that much? Or is something else so massively dominating that allocation isn't factoring in?
It has a few constants that control the size of the splay tree to create, the number of splay tree mutations to perform once its created, and one other thing that I forget. When porting it to Wasm, I replaced those constants with configurable parameters read from
default.input, and then I also adjusted the default inputs to make executing it take a little longer and match some of our other benchmarks. That last bit involved mostly doing 10x as many (IIRC) mutations to the tree, while keeping it the same size, so this change could have made it less allocation-heavy in an overall relative sense, and exercising GC object accesses more. (IIRC also, trying to make the tree bigger was starting to hit GC OOM).Also, it is written in an "idiomatic" style with lots of little functions, and we don't do any inlining by default. That could probably change things drastically for it.
fitzgen updated PR #13323.
fitzgen has enabled auto merge for PR #13323.
fitzgen added PR #13323 Inline the copying collector's bump allocation in Wasm code to the merge queue.
:check: fitzgen merged PR #13323.
fitzgen removed PR #13323 Inline the copying collector's bump allocation in Wasm code from the merge queue.
fitzgen commented on PR #13323:
Any idea what's going on with the splay benchmark? Is it not actually allocating that much? Or is something else so massively dominating that allocation isn't factoring in?
FWIW, here is a quick
perfprofile ofsplay.wasm:<img width="2042" height="398" alt="image" src="https://github.com/user-attachments/assets/33a26afd-9e5d-4697-a07d-5af5e38d8940" />
Nearly all time is spent inside the collector itself.
Last updated: Jun 01 2026 at 09:49 UTC