fitzgen requested wasmtime-core-reviewers for a review on PR #8933.
fitzgen opened PR #8933 from fitzgen:host-alloc-gc-objects
to bytecodealliance:main
:
This commit introduces the
wasmtime::StructRef
type and support for allocating Wasm GC structs from the host. This commit does not add support for thestruct.new
family of Wasm instructions; guests still cannot allocate Wasm GC objects yet, but initial support should be pretty straightforward after this commit lands.The
StructRef
type has everything you expect from other value types in thewasmtime
crate:
A method to get its type or check whether it matches a given type
An implementation of
WasmTy
so that it can be used withFunc::wrap
-style APIsThe ability to upcast it into an
AnyRef
and to do checked downcasts in the opposite directionThere are, additionally, methods for getting, setting, and enumerating a
StructRef
's fields.To allocate a
StructRef
, we need proof that the struct type we are allocating is being kept alive for the duration that the allocation may live. This is required for many reasons, but a basic example is getting a struct instance's type from the embedder API: this does a type-index-to-StructType
lookup and conversion and if the type wasn't kept alive, then the type-index lookup will result in what is logically a use-after-free bug. This won't be a problem for Wasm guests (when we get around to implementing allocation for them) since their module defines the type, the store holds onto its instances' modules, and the allocation cannot outlive the store. For the host, we need another method of keeping the object's type alive, since it might be that the host defined the type and there is no module that also defined it, let alone such a module that is being kept alive in the store.The solution to the struct-type-lifetime problem that this commit implements for hosts is for the store to hold a hash set of
RegisteredType
s specifically for objects which were allocated via the embedder API. But we also don't want to do a hash lookup on every allocation, so we also implement aStructRefPre
type. AStructRefPre
is proof that the embedder has inserted aStructType
's innerRegisteredType
into a store. Structurally, it is a pair of the struct type and a store id. AllStructRef
allocation methods require aStructRefPre
argument, which does a fast store id check, rather than a whole hash table insertion.I opted to require
StructRefPre
in all allocation cases -- even though this has the downside of always forcing callers to create one before they allocate, even if they are only allocating a single object -- because of two reasons. First, this avoids needing to define duplicate methods, with and without aStructRefPre
argument. Second, this avoids a performance footgun in the API where users don't realize that they can avoid extra work by creating a singleStructRefPre
and then using it multiple times. Anecdotally, I've heard multiple people complain about instantiation being slower than advertised but it turns out they weren't usingInstancePre
, and I'd like to avoid that situation for allocation if we can.<!--
Please make sure you include the following information:
If this work has been discussed elsewhere, please include a link to that
conversation. If it was discussed in an issue, just mention "issue #...".Explain why this change is needed. If the details are in an issue already,
this can be brief.Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.htmlPlease ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->
fitzgen requested alexcrichton for a review on PR #8933.
fitzgen submitted PR review.
fitzgen created PR review comment:
Note that this is intentionally left for follow up PRs. I want to focus on getting the spec tests passing and all that first, especially since the DRC collector is just the stand in until we eventually grow our semi-space collector.
github-actions[bot] commented on PR #8933:
Subscribe to Label Action
cc @fitzgen
<details>
This issue or pull request has been labeled: "wasmtime:api", "wasmtime:ref-types"Thus the following users have been cc'd because of the following labels:
- fitzgen: wasmtime:ref-types
To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.
Learn more.
</details>
alexcrichton submitted PR review:
I'm liking how this is all shaping up, very nice! I've got cosmetic comments here and there but nothing major.
alexcrichton submitted PR review:
I'm liking how this is all shaping up, very nice! I've got cosmetic comments here and there but nothing major.
alexcrichton created PR review comment:
Performing a GC requires
&mut
though, right? So given&StoreOpaque
that's static proof that a GC won't happen?Musing though: because
unchecked_try_gc_ref
returns a lifetime bound to&StoreOpaque
doesn't that mean that GC is already statically disallowed while we're working with thegc_ref
? If that's the case could theunchecked_
prefix be removed? (fine to defer this to a future PR of course)
alexcrichton created PR review comment:
Should this perhaps be hoisted up to the
disabled
module layer?
alexcrichton created PR review comment:
Another source of errors is if
fields
have the wrong types forallocator
, right?
alexcrichton created PR review comment:
Should this (and below) perhaps be called
unchecked_cast
to encourage more scrutiny at callsites?
alexcrichton created PR review comment:
I suspect that this looks the same for anyref/arrayref/etc, do you think this could be refactored into a shared method on
Rooted
perhaps or something like that?
alexcrichton created PR review comment:
Oh I thought you made a helper that went from storage type to val type?
alexcrichton created PR review comment:
I think that this is double-verifying the type because this type-check happens in
write_field
as well?
alexcrichton created PR review comment:
Could this be made a safe trait with something like:
pub trait PodValueType: Copy { const SIZE: usize; fn from_le_bytes(bytes: &[u8; Self::SIZE]) -> Self; fn to_le_bytes(&self, dst: &mut [u8; Self::SIZE]); }
alexcrichton created PR review comment:
Above I was thinking that there should be
unchecked_cast
to require more scrutiny, but now I'm wondering if it might be good to have that and specific methods per-type to cast "safely" to the parent types? Not memory-safety-related-safety but more correctness-related-safety. That way the upcast method could useunchecked_cast
internally butunchecked_cast
wouldn't be used externally much as the safer upcasts would be used instead.
alexcrichton created PR review comment:
I haven't gotten far enough in the diff so this may be more apparent later, but the
?
here is a bit suspect while there's uninitialized memory above. What's the consequence of returning an error here with uninitialized memory? Is that ok insofar as the actual reference won't go anywhere and it'll get cleaned up on the next GC?
alexcrichton created PR review comment:
Much of this prefix is shared between here and
set
, would it be possible to refactor that to a shared helper?Also it feels a little odd where reading fields is always done with the iterator here and
field
is defined in terms of the iterator whileset_field
is defined a field-at-a-time. Could the implementation here delegate tofield
instead of the other way around?
alexcrichton created PR review comment:
It took me a bit to work out why this exists given
write_field
above. I think I understand now in that it has to do with gc barriers and things like that. This leads me to a few questions:
- Could the documentation be updated to explicitly discuss the difference between this method and
write_field
?- Would it be possible to change these to a single method? Perhaps with a boolean flag indicating if the previous data is "valid"?
alexcrichton created PR review comment:
To confirm, all data on the gc heap is always initialized, right? We don't have to use
MaybeUninit<u8>
here?
alexcrichton created PR review comment:
(same for the impls below, and of course ok to defer to a future PR if you'd like)
fitzgen submitted PR review.
fitzgen created PR review comment:
Spun this out to https://github.com/bytecodealliance/wasmtime/issues/8940
fitzgen updated PR #8933.
fitzgen submitted PR review.
fitzgen created PR review comment:
heh, I wrote the helper only after I'd already open coded it a bunch and I guess I missed this spot
fitzgen submitted PR review.
fitzgen created PR review comment:
ah, actually I wrote the helper for
wasmtime_types::FieldType
notwasmtime::FieldType
... we might want to actually do some trait nastiness to dedupe all this stuff at some point...
fitzgen submitted PR review.
fitzgen created PR review comment:
oh no nvm again: it is a method on
StorageType
notFieldType
, d'oh
fitzgen submitted PR review.
fitzgen created PR review comment:
How are you imagining this? Because this
From
impl is that safe upcast in my imagination. We could have a trait or something tho toopub trait Upcast<T> {} impl Rooted<T> { pub fn upcast<U>(self) -> Rooted<U> where T: Upcast<U> { // ... } }
fitzgen submitted PR review.
fitzgen created PR review comment:
(we would have to seal the trait as well)
fitzgen submitted PR review.
fitzgen created PR review comment:
From the Rust perspective: yes. We get it zeroed from mmap. At worst it has stale data from a since-reclaimed GC allocation.
From the Wasm perspective, it could be uninitialized, and we have to initialize it before we expose the object to Wasm.
But to answer the question of
MaybeUninit
: no, it shouldn't be necessary.
fitzgen submitted PR review.
fitzgen created PR review comment:
I can certainly add more docs clarifying their relationship.
I don't think I want to merge them into a single method, since if the method isn't inlined then that boolean argument will be materialized and branched on at runtime, which seems pretty silly since all call sites will be statically choosing one or the other. I guess I could have a
const bool
generic parameter... not sure that is actually any nicer than two different methods tho.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Oh I'm thinking something like:
impl Rooted<StructRef> { fn to_anyref(&self) -> Rooted<AnyRef> { self.unchecked_cast() } }
and repeating that for
Rooted<...>
as appropriate and havingto_foo
method for anything that's appropriate
alexcrichton submitted PR review.
alexcrichton created PR review comment:
None of this is perf-sensitive though, right? If perf matters there's tons of places we'd want to improve I think, chiefly being removing the usage of
Val
. Regardless though I think more docs is sufficient for now.
fitzgen created PR review comment:
Ah, gotcha. Yeah I can do that.
fitzgen submitted PR review.
fitzgen submitted PR review.
fitzgen created PR review comment:
It isn't perf-sensitive yet, but it will likely eventually be, yes.
I'd certainly like to add a
TypedStructRefPre
or something to that effect, but I didn't want to do too much too eagerly, especially before I even have the spec tests passing.
fitzgen updated PR #8933.
fitzgen submitted PR review.
fitzgen created PR review comment:
Is that ok insofar as the actual reference won't go anywhere and it'll get cleaned up on the next GC?
That would be the case for tracing GCs. For the DRC collector, I think we would actually permanently leak it, since it's refcount would be stuck at
1
.So, good eye. I'll handle these errors locally and free the allocation before bailing out of the function.
fitzgen updated PR #8933.
fitzgen submitted PR review.
fitzgen created PR review comment:
Spun out into https://github.com/bytecodealliance/wasmtime/issues/8942 but yeah you're right, this is all basically the same.
fitzgen updated PR #8933.
fitzgen updated PR #8933.
fitzgen updated PR #8933.
fitzgen updated PR #8933.
fitzgen has enabled auto merge for PR #8933.
fitzgen requested cfallin for a review on PR #8933.
fitzgen updated PR #8933.
fitzgen requested wasmtime-default-reviewers for a review on PR #8933.
fitzgen has enabled auto merge for PR #8933.
fitzgen updated PR #8933.
fitzgen merged PR #8933.
Last updated: Dec 23 2024 at 12:05 UTC