Stream: git-wasmtime

Topic: wasmtime / PR #8933 Introduce `wasmtime::StructRef` and a...


view this post on Zulip Wasmtime GitHub notifications bot (Jul 10 2024 at 20:13):

fitzgen requested wasmtime-core-reviewers for a review on PR #8933.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 10 2024 at 20:13):

fitzgen opened PR #8933 from fitzgen:host-alloc-gc-objects to bytecodealliance:main:

This commit introduces the wasmtime::StructRef type and support for allocating Wasm GC structs from the host. This commit does not add support for the struct.new family of Wasm instructions; guests still cannot allocate Wasm GC objects yet, but initial support should be pretty straightforward after this commit lands.

The StructRef type has everything you expect from other value types in the wasmtime crate:

There are, additionally, methods for getting, setting, and enumerating a StructRef's fields.

To allocate a StructRef, we need proof that the struct type we are allocating is being kept alive for the duration that the allocation may live. This is required for many reasons, but a basic example is getting a struct instance's type from the embedder API: this does a type-index-to-StructType lookup and conversion and if the type wasn't kept alive, then the type-index lookup will result in what is logically a use-after-free bug. This won't be a problem for Wasm guests (when we get around to implementing allocation for them) since their module defines the type, the store holds onto its instances' modules, and the allocation cannot outlive the store. For the host, we need another method of keeping the object's type alive, since it might be that the host defined the type and there is no module that also defined it, let alone such a module that is being kept alive in the store.

The solution to the struct-type-lifetime problem that this commit implements for hosts is for the store to hold a hash set of RegisteredTypes specifically for objects which were allocated via the embedder API. But we also don't want to do a hash lookup on every allocation, so we also implement a StructRefPre type. A StructRefPre is proof that the embedder has inserted a StructType's inner RegisteredType into a store. Structurally, it is a pair of the struct type and a store id. All StructRef allocation methods require a StructRefPre argument, which does a fast store id check, rather than a whole hash table insertion.

I opted to require StructRefPre in all allocation cases -- even though this has the downside of always forcing callers to create one before they allocate, even if they are only allocating a single object -- because of two reasons. First, this avoids needing to define duplicate methods, with and without a StructRefPre argument. Second, this avoids a performance footgun in the API where users don't realize that they can avoid extra work by creating a single StructRefPre and then using it multiple times. Anecdotally, I've heard multiple people complain about instantiation being slower than advertised but it turns out they weren't using InstancePre, and I'd like to avoid that situation for allocation if we can.

<!--
Please make sure you include the following information:

Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.html

Please ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->

view this post on Zulip Wasmtime GitHub notifications bot (Jul 10 2024 at 20:13):

fitzgen requested alexcrichton for a review on PR #8933.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 10 2024 at 20:15):

fitzgen submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 10 2024 at 20:15):

fitzgen created PR review comment:

Note that this is intentionally left for follow up PRs. I want to focus on getting the spec tests passing and all that first, especially since the DRC collector is just the stand in until we eventually grow our semi-space collector.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 10 2024 at 21:44):

github-actions[bot] commented on PR #8933:

Subscribe to Label Action

cc @fitzgen

<details>
This issue or pull request has been labeled: "wasmtime:api", "wasmtime:ref-types"

Thus the following users have been cc'd because of the following labels:

To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.

Learn more.
</details>

view this post on Zulip Wasmtime GitHub notifications bot (Jul 11 2024 at 15:52):

alexcrichton submitted PR review:

I'm liking how this is all shaping up, very nice! I've got cosmetic comments here and there but nothing major.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 11 2024 at 15:52):

alexcrichton submitted PR review:

I'm liking how this is all shaping up, very nice! I've got cosmetic comments here and there but nothing major.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 11 2024 at 15:52):

alexcrichton created PR review comment:

Performing a GC requires &mut though, right? So given &StoreOpaque that's static proof that a GC won't happen?

Musing though: because unchecked_try_gc_ref returns a lifetime bound to &StoreOpaque doesn't that mean that GC is already statically disallowed while we're working with the gc_ref? If that's the case could the unchecked_ prefix be removed? (fine to defer this to a future PR of course)

view this post on Zulip Wasmtime GitHub notifications bot (Jul 11 2024 at 15:52):

alexcrichton created PR review comment:

Should this perhaps be hoisted up to the disabled module layer?

view this post on Zulip Wasmtime GitHub notifications bot (Jul 11 2024 at 15:52):

alexcrichton created PR review comment:

Another source of errors is if fields have the wrong types for allocator, right?

view this post on Zulip Wasmtime GitHub notifications bot (Jul 11 2024 at 15:52):

alexcrichton created PR review comment:

Should this (and below) perhaps be called unchecked_cast to encourage more scrutiny at callsites?

view this post on Zulip Wasmtime GitHub notifications bot (Jul 11 2024 at 15:52):

alexcrichton created PR review comment:

I suspect that this looks the same for anyref/arrayref/etc, do you think this could be refactored into a shared method on Rooted perhaps or something like that?

view this post on Zulip Wasmtime GitHub notifications bot (Jul 11 2024 at 15:52):

alexcrichton created PR review comment:

Oh I thought you made a helper that went from storage type to val type?

view this post on Zulip Wasmtime GitHub notifications bot (Jul 11 2024 at 15:52):

alexcrichton created PR review comment:

I think that this is double-verifying the type because this type-check happens in write_field as well?

view this post on Zulip Wasmtime GitHub notifications bot (Jul 11 2024 at 15:52):

alexcrichton created PR review comment:

Could this be made a safe trait with something like:

pub trait PodValueType: Copy {
    const SIZE: usize;
    fn from_le_bytes(bytes: &[u8; Self::SIZE]) -> Self;
    fn to_le_bytes(&self, dst: &mut [u8; Self::SIZE]);
}

view this post on Zulip Wasmtime GitHub notifications bot (Jul 11 2024 at 15:52):

alexcrichton created PR review comment:

Above I was thinking that there should be unchecked_cast to require more scrutiny, but now I'm wondering if it might be good to have that and specific methods per-type to cast "safely" to the parent types? Not memory-safety-related-safety but more correctness-related-safety. That way the upcast method could use unchecked_cast internally but unchecked_cast wouldn't be used externally much as the safer upcasts would be used instead.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 11 2024 at 15:52):

alexcrichton created PR review comment:

I haven't gotten far enough in the diff so this may be more apparent later, but the ? here is a bit suspect while there's uninitialized memory above. What's the consequence of returning an error here with uninitialized memory? Is that ok insofar as the actual reference won't go anywhere and it'll get cleaned up on the next GC?

view this post on Zulip Wasmtime GitHub notifications bot (Jul 11 2024 at 15:52):

alexcrichton created PR review comment:

Much of this prefix is shared between here and set, would it be possible to refactor that to a shared helper?

Also it feels a little odd where reading fields is always done with the iterator here and field is defined in terms of the iterator while set_field is defined a field-at-a-time. Could the implementation here delegate to field instead of the other way around?

view this post on Zulip Wasmtime GitHub notifications bot (Jul 11 2024 at 15:52):

alexcrichton created PR review comment:

It took me a bit to work out why this exists given write_field above. I think I understand now in that it has to do with gc barriers and things like that. This leads me to a few questions:

view this post on Zulip Wasmtime GitHub notifications bot (Jul 11 2024 at 15:52):

alexcrichton created PR review comment:

To confirm, all data on the gc heap is always initialized, right? We don't have to use MaybeUninit<u8> here?

view this post on Zulip Wasmtime GitHub notifications bot (Jul 11 2024 at 15:52):

alexcrichton created PR review comment:

(same for the impls below, and of course ok to defer to a future PR if you'd like)

view this post on Zulip Wasmtime GitHub notifications bot (Jul 11 2024 at 16:46):

fitzgen submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 11 2024 at 16:46):

fitzgen created PR review comment:

Spun this out to https://github.com/bytecodealliance/wasmtime/issues/8940

view this post on Zulip Wasmtime GitHub notifications bot (Jul 11 2024 at 17:40):

fitzgen updated PR #8933.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 11 2024 at 17:44):

fitzgen submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 11 2024 at 17:44):

fitzgen created PR review comment:

heh, I wrote the helper only after I'd already open coded it a bunch and I guess I missed this spot

view this post on Zulip Wasmtime GitHub notifications bot (Jul 11 2024 at 17:46):

fitzgen submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 11 2024 at 17:46):

fitzgen created PR review comment:

ah, actually I wrote the helper for wasmtime_types::FieldType not wasmtime::FieldType... we might want to actually do some trait nastiness to dedupe all this stuff at some point...

view this post on Zulip Wasmtime GitHub notifications bot (Jul 11 2024 at 17:47):

fitzgen submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 11 2024 at 17:47):

fitzgen created PR review comment:

oh no nvm again: it is a method on StorageType not FieldType, d'oh

view this post on Zulip Wasmtime GitHub notifications bot (Jul 11 2024 at 17:50):

fitzgen submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 11 2024 at 17:50):

fitzgen created PR review comment:

How are you imagining this? Because this From impl is that safe upcast in my imagination. We could have a trait or something tho too

pub trait Upcast<T> {}

impl Rooted<T> {
    pub fn upcast<U>(self) -> Rooted<U>
    where
        T: Upcast<U>
    {
        // ...
    }
}

view this post on Zulip Wasmtime GitHub notifications bot (Jul 11 2024 at 17:51):

fitzgen submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 11 2024 at 17:51):

fitzgen created PR review comment:

(we would have to seal the trait as well)

view this post on Zulip Wasmtime GitHub notifications bot (Jul 11 2024 at 17:54):

fitzgen submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 11 2024 at 17:54):

fitzgen created PR review comment:

From the Rust perspective: yes. We get it zeroed from mmap. At worst it has stale data from a since-reclaimed GC allocation.

From the Wasm perspective, it could be uninitialized, and we have to initialize it before we expose the object to Wasm.

But to answer the question of MaybeUninit: no, it shouldn't be necessary.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 11 2024 at 17:56):

fitzgen submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 11 2024 at 17:56):

fitzgen created PR review comment:

I can certainly add more docs clarifying their relationship.

I don't think I want to merge them into a single method, since if the method isn't inlined then that boolean argument will be materialized and branched on at runtime, which seems pretty silly since all call sites will be statically choosing one or the other. I guess I could have a const bool generic parameter... not sure that is actually any nicer than two different methods tho.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 11 2024 at 18:03):

alexcrichton submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 11 2024 at 18:03):

alexcrichton created PR review comment:

Oh I'm thinking something like:

impl Rooted<StructRef> {
    fn to_anyref(&self) -> Rooted<AnyRef> { self.unchecked_cast() }
}

and repeating that for Rooted<...> as appropriate and having to_foo method for anything that's appropriate

view this post on Zulip Wasmtime GitHub notifications bot (Jul 11 2024 at 18:04):

alexcrichton submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 11 2024 at 18:04):

alexcrichton created PR review comment:

None of this is perf-sensitive though, right? If perf matters there's tons of places we'd want to improve I think, chiefly being removing the usage of Val. Regardless though I think more docs is sufficient for now.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 11 2024 at 18:07):

fitzgen created PR review comment:

Ah, gotcha. Yeah I can do that.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 11 2024 at 18:07):

fitzgen submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 11 2024 at 18:09):

fitzgen submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 11 2024 at 18:09):

fitzgen created PR review comment:

It isn't perf-sensitive yet, but it will likely eventually be, yes.

I'd certainly like to add a TypedStructRefPre or something to that effect, but I didn't want to do too much too eagerly, especially before I even have the spec tests passing.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 11 2024 at 18:13):

fitzgen updated PR #8933.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 11 2024 at 18:16):

fitzgen submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 11 2024 at 18:16):

fitzgen created PR review comment:

Is that ok insofar as the actual reference won't go anywhere and it'll get cleaned up on the next GC?

That would be the case for tracing GCs. For the DRC collector, I think we would actually permanently leak it, since it's refcount would be stuck at 1.

So, good eye. I'll handle these errors locally and free the allocation before bailing out of the function.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 11 2024 at 18:30):

fitzgen updated PR #8933.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 11 2024 at 18:31):

fitzgen submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 11 2024 at 18:31):

fitzgen created PR review comment:

Spun out into https://github.com/bytecodealliance/wasmtime/issues/8942 but yeah you're right, this is all basically the same.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 11 2024 at 19:32):

fitzgen updated PR #8933.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 11 2024 at 19:35):

fitzgen updated PR #8933.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 11 2024 at 19:51):

fitzgen updated PR #8933.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 11 2024 at 20:37):

fitzgen updated PR #8933.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 11 2024 at 20:37):

fitzgen has enabled auto merge for PR #8933.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 11 2024 at 21:23):

fitzgen requested cfallin for a review on PR #8933.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 11 2024 at 21:23):

fitzgen updated PR #8933.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 11 2024 at 21:23):

fitzgen requested wasmtime-default-reviewers for a review on PR #8933.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 11 2024 at 21:23):

fitzgen has enabled auto merge for PR #8933.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 11 2024 at 22:00):

fitzgen updated PR #8933.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 11 2024 at 22:29):

fitzgen merged PR #8933.


Last updated: Nov 22 2024 at 17:03 UTC