fitzgen requested dicej for a review on PR #8196.
fitzgen requested alexcrichton for a review on PR #8196.
fitzgen opened PR #8196 from fitzgen:i31ref to bytecodealliance:main:
This is still a WIP. I think the architecture and everything is there, although there are certain things to improve upon still, like the pooling allocator integration and the question of who allocates the memory used in a GC heap, but I think that can happen in follow-up PRs. The big thing is that there are still some tests failing, a bunch of new tests that need to be written, and at least one blocker (https://github.com/bytecodealliance/wasmtime/issues/8180) to be fixed. Also I need to rebase, which will be fun given all the churn related to tables recently. However, this is big enough that I think those things can happen in parallel with review on the main bits and the architecture and all that.
The `GcRuntime` and `GcCompiler` Traits
This commit factors out the details of the garbage collector away from the rest of the runtime and the compiler. It does this by introducing two new traits, very similar to a subset of [those proposed in the Wasm GC RFC], although not all equivalent functionality has been added yet because Wasmtime doesn't support, for example, GC structs yet:
[those proposed in the Wasm GC RFC]: https://github.com/bytecodealliance/rfcs/blob/main/accepted/wasm-gc.md#defining-the-pluggable-gc-interface
- The `GcRuntime` trait: This trait defines how to create new GC heaps, run collections within them, and execute the various GC barriers the collector requires.

  Rather than monomorphize all of Wasmtime on this trait, we use it as a dynamic trait object. This does imply some virtual call overhead and missing some inlining (and resulting post-inlining) optimization opportunities. However, it is much less disruptive to the existing embedder API, results in a cleaner embedder API anyways, and we don't believe that VM runtime/embedder code is on the hot path for working with the GC at this time anyways (that would be the actual Wasm code, which has inlined GC barriers and direct calls and all of that). In the future, once we have optimized enough of the GC that such code is ever hot, we have options we can investigate at that time to avoid these dynamic virtual calls, like only enabling one single collector at build time and then creating a static type alias like `type TheOneGcImpl = ...;` based on the compile time configuration, and using this type alias in the runtime rather than a dynamic trait object.

  The `GcRuntime` trait additionally defines a method to reset a GC heap, for use by the pooling allocator. This allows reuse of GC heaps across different stores. This integration is very rudimentary at the moment, and is missing all kinds of configuration knobs that we should have before deploying Wasm GC in production. This commit is large enough as it is already! Ideally, in the future, I'd like to make it so that GC heaps receive their memory region, rather than allocate/reserve it themselves, and let each slot in the pooling allocator's memory pool be either a linear memory or a GC heap. This would unask various capacity planning questions such as "what percent of memory capacity should we dedicate to linear memories vs GC heaps?". It also seems like basically all the same configuration knobs we have for linear memories apply equally to GC heaps (see also the "Indexed Heaps" section below).

- The `GcCompiler` trait: This trait defines how to emit CLIF that implements GC barriers for various operations on GC-managed references. The Rust code calls into this trait dynamically via a trait object, but since it is customizing the CLIF that is generated for Wasm code, the Wasm code itself is not making dynamic, indirect calls for GC barriers. The `GcCompiler` implementation can inline the parts of GC barrier that it believes should be inline, and leave out-of-line calls to rare slow paths.

All that said, there is still only a single implementation of each of these traits: the existing deferred reference-counting (DRC) collector. So there is a bunch of code motion in this commit as the DRC collector was further isolated from the rest of the runtime and moved to its own submodule. That said, this was not purely code motion (see "Indexed Heaps" below) so it is worth not simply skipping over the DRC collector's code in review.
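To make the trade-off between a dynamic trait object and a build-time type alias concrete, here is a minimal sketch. The trait below is a hypothetical, heavily simplified stand-in: its methods, `DrcCollector`, `DynStore`, and `StaticStore` are illustrative names and are not the definitions in this PR.

```rust
/// Hypothetical, simplified stand-in for the runtime-side collector trait.
/// The real trait has different (and many more) methods.
trait GcRuntime {
    fn name(&self) -> &'static str;
    /// Run a collection and report how many objects were freed (toy metric).
    fn collect(&mut self) -> usize;
}

/// Toy collector that only tracks a count of dead objects.
struct DrcCollector {
    dead_objects: usize,
}

impl GcRuntime for DrcCollector {
    fn name(&self) -> &'static str {
        "drc"
    }

    fn collect(&mut self) -> usize {
        // Free everything that is dead and reset the counter to zero.
        std::mem::take(&mut self.dead_objects)
    }
}

/// Dynamic dispatch: the store holds a trait object, so every call into the
/// collector from runtime code is a virtual call.
struct DynStore {
    gc: Box<dyn GcRuntime>,
}

/// Possible future alternative: pick exactly one collector at build time.
/// Calls through this alias are statically dispatched and can be inlined.
type TheOneGcImpl = DrcCollector;

struct StaticStore {
    gc: TheOneGcImpl,
}
```

With this shape, `DynStore { gc: Box::new(DrcCollector { dead_objects: 3 }) }` and `StaticStore { gc: DrcCollector { dead_objects: 5 } }` expose the same `collect` call to the rest of the runtime; only the dispatch mechanism differs.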
Indexed Heaps
This commit does bake in a couple assumptions that must be shared across all collector implementations, such as a shared `VMGcHeader` that all objects allocated within a GC heap must begin with, but the most notable and far-reaching of these assumptions is that all collectors will use "indexed heaps".

What we are calling indexed heaps are basically the three following invariants:

1. All GC heaps will be a single contiguous region of memory, and all GC objects will be allocated within this region of memory. The collector may ask the system allocator for additional memory, e.g. to maintain its free lists, but GC objects themselves will never be allocated via `malloc`.

2. A pointer to a GC-managed object (i.e. a `VMGcRef`) is a 32-bit offset into the GC heap's contiguous region of memory. We never hold raw pointers to GC objects (although, of course, we have to compute them and use them temporarily when actually accessing objects). This means that deref'ing GC pointers is equivalent to deref'ing linear memory pointers: we need to add a base and we also check that the GC pointer/index is within the bounds of the GC heap. Furthermore, compressing 64-bit pointers into 32 bits is a fairly common technique among high-performance GC implementations[^compressed-oops][^v8-ptr-compression] so we are in good company.

3. Anything stored inside the GC heap is untrusted. Even each GC reference that is an element of an `(array (ref any))` is untrusted, and bounds checked on access. This means that, for example, we do not store the raw pointer to an `externref`'s host object inside the GC heap. Instead an `externref` now stores an ID that can be used to index into a side table in the store that holds the actual `Box<dyn Any>` host object, and accessing that side table is always checked.

[^compressed-oops]: See "Compressed OOPs" in OpenJDK.
[^v8-ptr-compression]: See V8's pointer compression.
The good news with regards to all the bounds checking that this scheme implies is that we can use all the same virtual memory tricks that linear memories use to omit explicit bounds checks. Additionally, (2) means that the size of GC objects is that much smaller (and therefore that much more cache friendly) because they are only holding onto 32-bit, rather than 64-bit, references to other GC objects. (We can, in the future, support GC heaps up to 16GiB in size without losing 32-bit GC pointers by taking advantage of `VMGcHeader` alignment and storing aligned indices rather than byte indices, while still leaving the bottom bit available for tagging as an `i31ref` discriminant. Should we ever need to support even larger GC heap capacities, we could go to full 64-bit references, but we would need explicit bounds checks.)

The biggest benefit of indexed heaps is that, because we are (explicitly or implicitly) bounds checking GC heap accesses, and because we are not otherwise trusting any data from inside the GC heap, we greatly reduce how badly things can go wrong in the face of collector bugs and GC heap corruption. We are essentially sandboxing the GC heap region, the same way that linear memory is a sandbox. GC bugs could lead to the guest program accessing the wrong GC object, or getting garbage data from within the GC heap. But only garbage data from within the GC heap, never outside it. The worst that could happen would be if we decided not to zero out GC heaps between reuse across stores (which is a valid trade off to make, since zeroing a GC heap is a defense-in-depth technique similar to zeroing a Wasm stack and not semantically visible in the absence of GC bugs) and then a GC bug would allow the current Wasm guest to read old GC data from the old Wasm guest that previously used this GC heap. But again, it could never access host data.
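The 16GiB figure in the parenthetical above can be sanity-checked with a small sketch. It assumes 8-byte object alignment and an encoding that stores offsets in 4-byte units; both are assumptions made for illustration, not something this PR specifies or implements. All names are hypothetical.

```rust
/// Assumed object alignment in bytes (hypothetical; not stated in the PR).
const ALIGN: u64 = 8;
/// Offsets are stored in 4-byte units, so an 8-byte-aligned offset always
/// encodes to an even number, leaving the bottom bit free as a tag.
const SHIFT: u32 = 2;
/// 2^32 encodable values * 4 bytes per unit = 2^34 bytes = 16 GiB.
const MAX_HEAP_BYTES: u64 = 1 << 34;

#[derive(Debug, PartialEq)]
enum Decoded {
    /// Unboxed 31-bit payload (bottom bit of the raw value was 1).
    I31(u32),
    /// Byte offset of an object inside the GC heap.
    HeapOffset(u64),
}

/// Encode an aligned, in-range byte offset as a 32-bit reference.
fn encode_offset(byte_offset: u64) -> Option<u32> {
    if byte_offset % ALIGN != 0 || byte_offset >= MAX_HEAP_BYTES {
        return None;
    }
    Some((byte_offset >> SHIFT) as u32)
}

/// Encode a 31-bit payload; the top bit of `value` is discarded.
fn encode_i31(value: u32) -> u32 {
    (value << 1) | 1
}

/// Decode a raw 32-bit reference back into a payload or a byte offset.
fn decode(raw: u32) -> Decoded {
    if raw & 1 == 1 {
        Decoded::I31(raw >> 1)
    } else {
        Decoded::HeapOffset(u64::from(raw) << SHIFT)
    }
}
```

Under these assumptions the highest encodable object offset is 2^34 - 8 bytes, which still fits in a `u32` after the shift.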
Taken altogether, this allows for collector implementations that are nearly free from `unsafe` code, and unsafety can otherwise be targeted and limited in scope, such as interactions with JIT code. Most importantly, we do not have to maintain critical invariants across the whole system -- invariants which can't be nicely encapsulated or abstracted -- to preserve memory safety. Such holistic invariants that refuse encapsulation are otherwise generally a huge safety problem with GC implementations.
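As a rough illustration of how the indexed-heap invariants permit safe code, the sketch below reads a field through a 32-bit index with an explicit bounds check and resolves an `externref` ID through a checked side table. `GcHeap`, `ExternRefTable`, and their methods are hypothetical names for this sketch, not Wasmtime's actual types, and real implementations may rely on guard pages rather than explicit checks.

```rust
use std::any::Any;

/// Hypothetical GC heap: one contiguous region that holds every GC object.
struct GcHeap {
    bytes: Vec<u8>,
}

impl GcHeap {
    /// Read a little-endian `u32` at `index + field_offset`.
    /// Returns `None` instead of touching memory outside the heap.
    fn read_u32(&self, index: u32, field_offset: u32) -> Option<u32> {
        let start = (index as usize).checked_add(field_offset as usize)?;
        let end = start.checked_add(4)?;
        let slice = self.bytes.get(start..end)?;
        let mut buf = [0u8; 4];
        buf.copy_from_slice(slice);
        Some(u32::from_le_bytes(buf))
    }
}

/// Hypothetical side table living in the store, outside the GC heap.
/// The heap only ever contains IDs into this table, never host pointers.
struct ExternRefTable {
    hosts: Vec<Option<Box<dyn Any>>>,
}

impl ExternRefTable {
    /// Checked lookup: a corrupted ID yields `None`, not a wild pointer.
    fn get<T: 'static>(&self, id: u32) -> Option<&T> {
        self.hosts.get(id as usize)?.as_deref()?.downcast_ref::<T>()
    }
}
```

A corrupted index or ID in this model can at worst select the wrong object or fail the lookup; it cannot reach memory outside the heap or the table.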
`VMGcRef` is NOT `Clone` or `Copy` Anymore
`VMGcRef` used to be `Clone` and `Copy`. It is not anymore. The motivation here was to be sure that I was actually calling GC barriers at all the correct places. I couldn't be sure before. Now, you can still explicitly copy a raw GC reference without running GC barriers if you need to and understand why that's okay (aka you are implementing the collector), but that is something you have to opt into explicitly by calling `unchecked_copy`. The default now is that you can't just copy the reference, and must instead call an explicit `clone` method (not the `Clone` trait, because we need to pass in the GC heap context to run the GC barriers), so it is hard to forget the barrier accidentally. This resulted in a pretty big amount of churn, but I am wayyyyyy more confident that the correct GC barriers are called at the correct times now than I was before.
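The pattern described above can be sketched as follows. `GcRef` and `Heap` are hypothetical stand-ins for this illustration; only the method names `clone` and `unchecked_copy` come from the description, and their real signatures may differ.

```rust
/// Hypothetical stand-in for a GC reference. It deliberately does NOT derive
/// `Clone` or `Copy`, so duplicating it always requires an explicit call.
#[derive(Debug, PartialEq)]
struct GcRef(u32);

/// Toy heap that only counts how many clone barriers have run.
struct Heap {
    clone_barriers_run: usize,
}

impl GcRef {
    /// Barriered clone. It needs the heap context, which is why it cannot be
    /// an implementation of the standard `Clone` trait.
    fn clone(&self, heap: &mut Heap) -> GcRef {
        // Stand-in for the real barrier, e.g. a reference count increment.
        heap.clone_barriers_run += 1;
        GcRef(self.0)
    }

    /// Explicit opt-out that skips the barrier; intended for collector
    /// internals that know why skipping it is sound.
    fn unchecked_copy(&self) -> GcRef {
        GcRef(self.0)
    }
}
```

Because the type has no `Clone` or `Copy` impl, writing `let b = a;` moves the reference rather than silently duplicating it, and any duplication shows up in review as either `a.clone(&mut heap)` or `a.unchecked_copy()`.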
i31ref
I started this commit by trying to add `i31ref` support. And it grew into the whole traits interface because I found that I needed to abstract GC barriers into helpers anyways to avoid running them
[message truncated]
fitzgen requested cfallin for a review on PR #8196.
fitzgen requested wasmtime-compiler-reviewers for a review on PR #8196.
fitzgen requested wasmtime-core-reviewers for a review on PR #8196.
fitzgen requested wasmtime-default-reviewers for a review on PR #8196.
github-actions[bot] commented on PR #8196:
Subscribe to Label Action
cc @fitzgen
<details>
This issue or pull request has been labeled: "cranelift", "cranelift:wasm", "fuzzing"
Thus the following users have been cc'd because of the following labels:
- fitzgen: fuzzing
To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.
Learn more.
</details>
fitzgen updated PR #8196.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
This I think is accidentally doubling the size of the table
fitzgen updated PR #8196.
fitzgen updated PR #8196.
fitzgen updated PR #8196.
dicej submitted PR review:
Looks great! I'm excited to see this move forward.
Please see a few inline comments and suggestions. The only one that might be a blocker is the cast from a pointer to a 64-bit value to a pointer to a 32-bit value due to endianness concerns. Based on our earlier conversation, sounds like you're planning to get rid of that anyway.
dicej created PR review comment:
See above regarding wording.
dicej created PR review comment:
Can we clarify the wording here given the r64/NonZeroU32 duality?
dicej created PR review comment:
The phrasing in this comment is slightly confusing ("will no longer be true", yet the value currently returned is `false`). Perhaps something like "Once we support concrete struct and array types, we'll need to look at the payload to determine whether the type is GC-managed."
dicej created PR review comment:
/// will lead to general incorrectness such as panics or wrong results.
dicej created PR review comment:
Should there be pseudo-code here to update the bump region `next` position?
dicej created PR review comment:
Replace this cast (from ptr to 64-bit value to ptr to 32-bit value) with something safer/portable.
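To illustrate the endianness concern in general terms (this is not the code under review): reinterpreting a pointer to a 64-bit value as a pointer to a 32-bit value reads the first four bytes in memory, which hold the low half on little-endian targets but the high half on big-endian ones. The helper names below are hypothetical.

```rust
use std::convert::TryFrom;

/// Models what a `*const u64` to `*const u32` cast reads on a little-endian
/// target: the first four bytes in memory, which are the low half.
fn first_four_bytes_le(value: u64) -> u32 {
    let b = value.to_le_bytes();
    u32::from_le_bytes([b[0], b[1], b[2], b[3]])
}

/// Models the same cast on a big-endian target: the first four bytes in
/// memory are the HIGH half of the value.
fn first_four_bytes_be(value: u64) -> u32 {
    let b = value.to_be_bytes();
    u32::from_be_bytes([b[0], b[1], b[2], b[3]])
}

/// Portable alternative: convert the value rather than the pointer, and
/// reject values that do not fit in 32 bits.
fn low_u32(value: u64) -> Option<u32> {
    u32::try_from(value).ok()
}
```

For a small value such as `0x2A`, the little-endian model yields `0x2A` while the big-endian model yields `0`, whereas `low_u32` gives the same answer on every target.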
dicej created PR review comment:
See my review comment for crates/types/src/lib.rs above.
alexcrichton submitted PR review.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Yeah I think we want to keep `Store` allocation infallible if we can, so I think it'd be best to defer fallible work to when it's required rather than needing the unwrap here.
alexcrichton created PR review comment:
If this `catch_unwind` is only here to restore `instances`, can this be done with a `Drop` rather than catching unwind?
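A minimal sketch of the `Drop`-based alternative, assuming the goal is only to put a temporarily taken `instances` list back even if the intervening code panics. `Store`, `RestoreInstances`, and `with_instances_taken` are hypothetical names for this illustration, not the code under review.

```rust
/// Hypothetical owner of the `instances` list.
struct Store {
    instances: Vec<u32>,
}

/// Guard that moves the taken list back into the store when dropped.
/// Destructors also run during unwinding, so no `catch_unwind` is needed.
struct RestoreInstances<'a> {
    store: &'a mut Store,
    taken: Vec<u32>,
}

impl<'a> Drop for RestoreInstances<'a> {
    fn drop(&mut self) {
        self.store.instances = std::mem::take(&mut self.taken);
    }
}

/// Temporarily take `instances` out of the store, run `f` on them, and
/// restore them afterwards regardless of how `f` exits.
fn with_instances_taken(store: &mut Store, f: impl FnOnce(&[u32])) {
    let taken = std::mem::take(&mut store.instances);
    let guard = RestoreInstances { store, taken };
    f(&guard.taken);
    // `guard` is dropped here (or during unwinding), restoring the list.
}
```

One caveat of this approach is that it only works when panics unwind; with `panic = "abort"` neither a guard nor `catch_unwind` gets a chance to run.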
alexcrichton created PR review comment:
Isn't the `TODO` here still applicable? Shouldn't funcrefs do the contra/covariance thing?
alexcrichton created PR review comment:
To avoid new/new_async, could the auto-gc here be pushed up to the caller, e.g. communicated through the `Result`?
fitzgen submitted PR review.
fitzgen created PR review comment:
To clarify, you mean define a new error type like `GcHeapOutOfMemory` and return that, allowing callers to decide whether they want to do a GC and try again, rather than automatically doing a GC on their behalf?
fitzgen submitted PR review.
fitzgen created PR review comment:
I guess the original TODO, in the context of the typed function references proposal which did not introduce function type subtyping, is resolved since we have canonicalized types at this point now. But we have a new TODO for Wasm GC, which introduces function type subtyping, to do subtyping checks. I can add this second TODO.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Indeed!
fitzgen updated PR #8196.
fitzgen updated PR #8196.
fitzgen submitted PR review.
fitzgen created PR review comment:
Hm actually I'm not sure how this would work, because if the `externref` allocation fails, we need to give the host value back to the caller. We could stuff it back into the `GcHeapOutOfMemory` error, but then we would need a type parameter on the error type, and things start to get pretty funky. Not completely unworkable, but pretty funky. And then we need a different error type for non-`externref` errors that don't have a host value to return to the caller on OOM.
Do you still think it is worth pursuing this?
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Personally if you're ok with it I'd prefer to stick to the route of a custom error, basically `pub struct GcHeapOutOfMemory<T>(pub T)`. Otherwise I'd fear that there's little chance of embedders remembering to use `new_async` vs `new`
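A sketch of how the suggested error shape could hand the host value back so the caller can collect and retry. Only the `GcHeapOutOfMemory<T>(pub T)` shape comes from the comment above; `ToyHeap`, `alloc_externref`, `gc`, and `alloc_with_retry` are hypothetical and do not reflect Wasmtime's API.

```rust
/// Error shape suggested in the review: it carries the value that could not
/// be allocated, so the caller gets it back on failure.
#[derive(Debug)]
pub struct GcHeapOutOfMemory<T>(pub T);

/// Toy heap with a fixed number of slots, some occupied by garbage.
struct ToyHeap {
    capacity: usize,
    live: Vec<String>,
    garbage: usize,
}

impl ToyHeap {
    /// Try to allocate; on failure return the host value inside the error.
    fn alloc_externref(&mut self, host: String) -> Result<usize, GcHeapOutOfMemory<String>> {
        if self.live.len() + self.garbage >= self.capacity {
            return Err(GcHeapOutOfMemory(host));
        }
        self.live.push(host);
        Ok(self.live.len() - 1)
    }

    /// Toy collection: reclaim every garbage slot.
    fn gc(&mut self) {
        self.garbage = 0;
    }
}

/// Caller-side policy: collect once on OOM, then retry with the same value.
fn alloc_with_retry(heap: &mut ToyHeap, host: String) -> Result<usize, GcHeapOutOfMemory<String>> {
    match heap.alloc_externref(host) {
        Ok(id) => Ok(id),
        Err(GcHeapOutOfMemory(host)) => {
            heap.gc();
            heap.alloc_externref(host)
        }
    }
}
```

Keeping the retry in the caller means a single allocation entry point can serve both sync and async embedders, since the caller chooses how to run the collection.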
fitzgen updated PR #8196.
fitzgen updated PR #8196.
fitzgen updated PR #8196.
fitzgen updated PR #8196.
fitzgen updated PR #8196.
fitzgen updated PR #8196.
fitzgen updated PR #8196.
fitzgen updated PR #8196.
fitzgen updated PR #8196.
fitzgen has enabled auto merge for PR #8196.
fitzgen updated PR #8196.
fitzgen updated PR #8196.
fitzgen updated PR #8196.
fitzgen requested wasmtime-fuzz-reviewers for a review on PR #8196.
fitzgen updated PR #8196.
fitzgen updated PR #8196.
fitzgen has enabled auto merge for PR #8196.
github-actions[bot] commented on PR #8196:
Subscribe to Label Action
cc @peterhuene
<details>
This issue or pull request has been labeled: "wasmtime:c-api"
Thus the following users have been cc'd because of the following labels:
- peterhuene: wasmtime:c-api
To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.
Learn more.
</details>
fitzgen updated PR #8196.
fitzgen updated PR #8196.
fitzgen has enabled auto merge for PR #8196.
fitzgen updated PR #8196.
fitzgen updated PR #8196.
fitzgen updated PR #8196.
fitzgen merged PR #8196.
Last updated: Nov 22 2024 at 16:03 UTC