wasmtime / PR #7812 wasi-io: Reimplement wasi-io/poll usi... · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / PR #7812 wasi-io: Reimplement wasi-io/poll usi...

Wasmtime GitHub notifications bot (Jan 24 2024 at 20:15):

badeend requested fitzgen for a review on PR #7812.

Wasmtime GitHub notifications bot (Jan 24 2024 at 20:15):

badeend opened PR #7812 from badeend:pollable to bytecodealliance:main:

Prior discussion: https://bytecodealliance.zulipchat.com/#narrow/stream/217126-wasmtime/topic/Change.20Subscribe.20trait

Renamed the existing Pollable struct to PollableResource

Reimplemented wasi-io/poll. This introduces a new Pollable trait which is lower level, doesn't require heap allocations to poll, has mutable access to the WasiView, and can be used as a standalone resource without a parent. The Subscribe trait is kept intact, but this is now a utility interface, implemented in terms of Pollable.

Eliminate the (now) unnecessary surrogate parent resource of clock pollables

Added ResourceTable take & restore as a general purpose replacement for iter_entries. That one was used only by the old poll implementation.

Additionally:

@pchickey Forbid empty poll list. Fixes: WebAssembly/wasi-io#67

Wasmtime GitHub notifications bot (Jan 24 2024 at 20:15):

badeend requested wasmtime-core-reviewers for a review on PR #7812.

Wasmtime GitHub notifications bot (Jan 24 2024 at 20:15):

badeend requested wasmtime-default-reviewers for a review on PR #7812.

Wasmtime GitHub notifications bot (Jan 24 2024 at 21:45):

github-actions[bot] commented on PR #7812:

Subscribe to Label Action

cc @peterhuene

<details>
This issue or pull request has been labeled: "wasi", "wasmtime:api"

Thus the following users have been cc'd because of the following labels:

peterhuene: wasmtime:api

To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.

Learn more.
</details>

Wasmtime GitHub notifications bot (Jan 24 2024 at 22:01):

fitzgen requested sunfishcode for a review on PR #7812.

Wasmtime GitHub notifications bot (Jan 24 2024 at 22:51):

pchickey requested pchickey for a review on PR #7812.

Wasmtime GitHub notifications bot (Jan 25 2024 at 00:05):

pchickey submitted PR review:

Thanks, this is excellent work. The new internal interface is definitely superior to the old one, and I appreciate the new tests as well. The lease system for resource table is a much better interface than iter_entries.

I believe that all of the Slot and SlotIdentity implementation makes sense and is correct (especially because, surprisingly to me, the only unsafe is in unsafe impl Send), but I wanted to tag @alexcrichton to double-check that part because it is not code I feel super confident in my ability to assess that code. If he is happy with that, this can land.

Wasmtime GitHub notifications bot (Jan 25 2024 at 00:05):

pchickey submitted PR review:

Thanks, this is excellent work. The new internal interface is definitely superior to the old one, and I appreciate the new tests as well. The lease system for resource table is a much better interface than iter_entries.

I believe that all of the Slot and SlotIdentity implementation makes sense and is correct (especially because, surprisingly to me, the only unsafe is in unsafe impl Send), but I wanted to tag @alexcrichton to double-check that part because it is not code I feel super confident in my ability to assess that code. If he is happy with that, this can land.

Wasmtime GitHub notifications bot (Jan 25 2024 at 00:05):

pchickey created PR review comment:

I agree we need this change in the wits, lets just be sure to upstream these to the spec repo as well. We will come up with some process for how we keep the docs evolving and improving while assuring that the interface itself doesn't change.

Wasmtime GitHub notifications bot (Jan 25 2024 at 00:05):

pchickey created PR review comment:

This is a very bikesheddy suggestion, so feel free to disregard it, but is BoxPollable a better name for this? That way Resource<PollableResource> isnt repeating the word.

Wasmtime GitHub notifications bot (Jan 25 2024 at 00:08):

pchickey requested alexcrichton for a review on PR #7812.

Wasmtime GitHub notifications bot (Jan 25 2024 at 01:25):

alexcrichton submitted PR review:

Thanks for this! I've left a few comments but it's getting a bit later here so I'm going to head out. I want to read more about the Lease<T> though as that looks quite subtle and I need to think more about it.

Wasmtime GitHub notifications bot (Jan 25 2024 at 01:25):

alexcrichton submitted PR review:

Thanks for this! I've left a few comments but it's getting a bit later here so I'm going to head out. I want to read more about the Lease<T> though as that looks quite subtle and I need to think more about it.

Wasmtime GitHub notifications bot (Jan 25 2024 at 01:25):

alexcrichton created PR review comment:

There's quite a lot of errors which go through the Err here so I'm a bit wary of using unwrap_err to assert that a particular trap happened, could this instead a particular property about the error, such as its message?

Wasmtime GitHub notifications bot (Jan 25 2024 at 01:25):

alexcrichton created PR review comment:

This I think is actually subtly incorrect because it drops the future after the call to poll which signals that the future should be cancelled rather than keeping it alive until the whole poll is done. I think that means that we may not be guaranteed to get wakeups from cancelled futures, although we might get those for now given how the code is currently constructed. I think that this'll need to be a bit fancier with the adapter instead of having a blanket impl

Wasmtime GitHub notifications bot (Jan 25 2024 at 01:25):

alexcrichton created PR review comment:

Could this, and the functions below, be implemented in terms of poll_ready_fn perhaps?

Wasmtime GitHub notifications bot (Jan 25 2024 at 02:11):

pchickey submitted PR review:

Thanks, this is excellent work. The new internal interface is definitely superior to the old one, and I appreciate the new tests as well. The lease system for resource table is a much better interface than iter_entries.

I believe that all of the Slot and SlotIdentity implementation makes sense and is correct (especially because, surprisingly to me, the only unsafe is in unsafe impl Send), but I wanted to tag @alexcrichton to double-check that part because I do not feel super confident in my ability to assess that code. If he is happy with that, this can land.

Wasmtime GitHub notifications bot (Jan 25 2024 at 08:54):

badeend updated PR #7812.

Wasmtime GitHub notifications bot (Jan 25 2024 at 08:56):

badeend submitted PR review.

Wasmtime GitHub notifications bot (Jan 25 2024 at 08:56):

badeend created PR review comment:

Certainly

Wasmtime GitHub notifications bot (Jan 25 2024 at 08:59):

badeend submitted PR review.

Wasmtime GitHub notifications bot (Jan 25 2024 at 08:59):

badeend created PR review comment:

Check!

Wasmtime GitHub notifications bot (Jan 26 2024 at 19:55):

alexcrichton submitted PR review:

Ok I've gotten a chance now to take a closer look at Lease<T> and the changes ResourceTable. Given this new API design for the pollable trait something along these lines is required (e.g. iter_children can't work any more). The specific implementation here I think has a drawback where it's very "panicky" if you get it wrong. Most other aspects of WASI are "error-y" in that they try to return traps if anything is gotten wrong. This I think is pretty important for not accidentally becoming a DoS vector for embeddings. For example a panicking Drop implementation means that an early-return in an embedder function might accidentally take down the whole process where a wasm trap would only take down a single instance.

Given that my main thought on this is that this should ideally not ever panic and instead should switch to returning errors where possible. I might also recommending going a little bit further perhaps with a scheme such as:

Leave TableEntry::entry as Box<dyn Any>.

Change take to returning Box<T>. This would replace entry with something like Box::new(Tombstone) which is a private type to this module.

Change restore to taking Box<T> and Resource<T>.

That way it's largely up to embedders to "get everything right" but they'd already be required to do so with this current API. Additionally any failed downcasts can additionally add a check for Tombstone to perhaps return a more precise error other than ResourceTableError::WrongType with a new variant such as TakenValue.

Wasmtime GitHub notifications bot (Jan 26 2024 at 20:57):

badeend commented on PR #7812:

Thanks for the feedback.
I agree on the "panicky" point. I'll add an error type and remove the panics.

One thing that's not in this PR, but I assume will most likely be added at some point, are untyped take_any and restore_any variants. The drawback of reverting the Lease & SlotIdentity design, is that the restore(_any) API becomes (even) easier to misuse. Because then the consumer can restore any value at the index of a previously differently typed entry. I'm worried about the developer experience of this, as the place where they encounter the WrongType errors could be miles away from where the problem actually is.

Anyway, I'm fine with your suggestions. I just wanted to make sure the trade-offs are known.

Wasmtime GitHub notifications bot (Jan 26 2024 at 21:00):

badeend edited a comment on PR #7812:

Thanks for the feedback.
I agree on the "panicky" point. I'll add an error type and remove the panics.

One thing that's not in this PR, but I assume will most likely be added at some point, are untyped take_any and restore_any variants. The drawback of reverting the Lease & SlotIdentity design, is that the restore(_any) API becomes (even) easier to misuse. Because then the consumer can restore any value at the index of a previously differently typed entry. I'm worried about the developer experience of this, as the corruption would happen silently and the place where they encounter the WrongType errors could be miles away from where the problem actually is.

Anyway, I'm fine with your suggestions. I just wanted to make sure the trade-offs are known.

Wasmtime GitHub notifications bot (Jan 26 2024 at 21:25):

alexcrichton commented on PR #7812:

Hm ok, for that I think you have a good point about accidentally messing up these APIs. I think this may still be surmountable perhaps with some trickery, but I'd also need to see the usage of take_any and restore to know better. Want to discuss on a future PR with that implemented or hash it out here? (I'm fine either way)

Wasmtime GitHub notifications bot (Feb 01 2024 at 16:37):

badeend updated PR #7812.

Wasmtime GitHub notifications bot (Feb 01 2024 at 16:37):

badeend created PR review comment:

https://github.com/WebAssembly/wasi-io/pull/69

Wasmtime GitHub notifications bot (Feb 01 2024 at 16:37):

badeend submitted PR review.

Wasmtime GitHub notifications bot (Feb 01 2024 at 16:37):

badeend submitted PR review.

Wasmtime GitHub notifications bot (Feb 01 2024 at 16:37):

badeend created PR review comment:

The design has changed in the meantime. Now it's not just a box anymore, so I went with PollableHandle.

Wasmtime GitHub notifications bot (Feb 01 2024 at 16:38):

badeend submitted PR review.

Wasmtime GitHub notifications bot (Feb 01 2024 at 16:38):

badeend created PR review comment:

The tests were failing precisely because of that. To fix it, I ended up rougly back at the Subscribe design, but this time returning a custom WasiFuture type that has an additional WasiView parameter on its poll method. To prevent scope creep of this PR, I kept Subscribe alive for now (as PollableAsync). But in the long run, I don't think there's much value in havin them both, as Subscribe can be trivially converted to Pollable from:
#[async_trait::async_trait]
impl PollableAsync for HostFutureIncomingResponse {
    async fn ready(&mut self) {
        if let Self::Pending(handle) = self {
            *self = Self::Ready(handle.await);
        }
    }
}
to:
impl Pollable for HostFutureIncomingResponse {
    fn ready<'a>(&'a mut self) -> Pin<Box<dyn WasiFuture<Output = ()> + Send + 'a>> {
        Box::pin(async {
            if let Self::Pending(handle) = self {
                *self = Self::Ready(handle.await);
            }
        })
    }
}
One obvious downside is the added visual noise.

Wasmtime GitHub notifications bot (Feb 01 2024 at 16:38):

badeend commented on PR #7812:

I chose to go with a hybrid approach. For the public API, I changed it to what you suggested. Internally, I removed Lease & SlotIdentity. But I kept Slot to perform the resource type check. Also, as part of the updated design (see above) I needed take_any and restore_any so I included those as well.

Wasmtime GitHub notifications bot (Feb 05 2024 at 17:12):

alexcrichton commented on PR #7812:

Reading over this and see how this all turned out, I'm personally starting to get second thoughts on this. We're effectively reimplementing our own Future trait and as I'm sure you've seen we start implementing our own primitive functions (e.g. poll_fn) as well as we can't use standard things like async fn or #[async_trait]. I'm a bit worried that the direction this is taking us is straying off the path of maintainability for async support as things get more advanced over time.

Now that's all easy to say but this PR is still solving a concrete problem which is letting implementations access resources while polling, so I don't think simply closing this PR is an option. That being said after having read over this I wonder if there's perhaps an alternative implementation route that we can take.

Originally when designing Future-the-trait we ran into this issue of situations wanting to pass more context along through the poll method but the context doesn't survive longer than a single call to poll. To do that we ended up creating task-local variables which are like thread locals but instead stick with a task. That doesn't solve the immediate problem at hand though since you want mutable access, not just readable access.

To solve the mutability problem I realized that the take/restore bits look like Option<T> and so they've already got runtime state associated with them. One alternative would be to use a RefCell<T> instead and effectively repurpose that runtime state. That would enable acquiring &mut T from &ResourceTable so long as it's done "correctly" which is basically already the situation we have today (make sure you restore after you take).

How would you feel about something like that? We could still preserve get_mut as a method which has no runtime overhead (apart from storage space) but I'm imagining that a borrow_mut() method would be added. I'll note that get would have to go away in this world and be replaced with borrow() as a consequence, which likely affects code we have today.

While RefCell is unlikely to win any award for being the most ergonomic thing in the world this feels like it might provide a better tradeoff because we wouldn't fall off the well-trodden-path of Rust async into custom traits and such. I would want to make sure it works for your use case though.

I also realize though that you've probably already put in a great deal of work to this PR with 2 versions now so I'm hesitant to ask for a third. I'd be happy to help sketch this out and do some of the refactoring work to see if I feel like it's going to pay off.

Wasmtime GitHub notifications bot (Feb 05 2024 at 22:57):

badeend commented on PR #7812:

I think you mean changing
async fn ready(&mut self);
to
async fn ready(&mut self, table: &ResourceTable);
right?

That doesn;t work because &ResourceTable is not Send as ResourceTable is not Sync.

Wasmtime GitHub notifications bot (Feb 09 2024 at 11:17):

alexcrichton commented on PR #7812:

Good point, yes, I'm more-or-less saying we should do that. (either that or use a task-local but I think that still captures &T).

Mind trying make ResourceTable implement Sync? I think that's probably the addition of a few trait bounds in its internal trait objects. I think everything we put in there is already Sync although if it things aren't currently Sync that'll pose a larger problem.

Wasmtime GitHub notifications bot (Feb 11 2024 at 12:06):

badeend updated PR #7812.

Wasmtime GitHub notifications bot (Feb 11 2024 at 13:23):

badeend commented on PR #7812:

I understand your concerns, yet I'd rather not go for round three right now, which would include reverting https://github.com/bytecodealliance/wasmtime/pull/7802. So instead, I've changed the questionable types to be private to the poll.rs module. That way, all the iffy-ness is contained to just a single file that we can iterate on later. From the outside nothing significant has changed, except that now I can use poll_ready_fn, which is what I personally was after.

Wasmtime GitHub notifications bot (Feb 11 2024 at 14:26):

badeend edited a comment on PR #7812:

I understand your concerns, yet I'd rather not go for round three right now, which would include reverting https://github.com/bytecodealliance/wasmtime/pull/7802. So instead, I've changed the questionable types to be private to the poll.rs module. That way, all the iffy-ness is contained to just a single file that we can iterate on later. From the outside nothing significant has changed, except that now I can use poll_ready_fn, which is what I personally was after.

Hope that's OK for you

Wasmtime GitHub notifications bot (Feb 13 2024 at 21:02):

alexcrichton commented on PR #7812:

Wanted to say I have not forgotten about this, I have been looking for time to write up something longer-form, which I hope to get to by tomorrow. Is this blocking anything though that it would be prudent to land now rather than later? If so I think it's good to go as-is, but otherwise I'd like to take some more time to write up longer-form thoughts.

Wasmtime GitHub notifications bot (Feb 13 2024 at 21:12):

badeend commented on PR #7812:

There's no immediate rush from my side, so feel free to take your time.

Wasmtime GitHub notifications bot (Feb 14 2024 at 18:57):

alexcrichton commented on PR #7812:

Ok thanks again for your patience here, very much appreciated!

I've gotten some time to think and work on this. I was leaning towards merging this, but then I realized that I'd prefer to avoid a situation where we land this and then later revert most of it towards a different strategy. In that sense I wanted, time permitting, to take a moment and figure out if alternative strategies would work. I'm getting a growing sense of unease with this direction as it's more-or-less a custom Future trait and is something we'd ideally avoid.

So assuming that the main goal of this PR is to get access to ResourceTable during async fn ready I originally suggested the borrow/borrow_mut idea above using RefCell. I tried implementing that and turns out it doesn't work. That means that ResourceTable contains RefCell and async fn ready would close over &ResourceTable (e.g. it's a new function argument). In such a situation it means that the returned future, which must be Send, closes over &ResourceTable. That type is not Send because it requires ResourceTable: Sync which is not satisfied with RefCell. So that cans the idea of. using RefCell.

After talking a bit more with @pchickey, however, I'm growing more fond of the idea of using RwLock<T> here instead of RefCell<T>. Not for the actual blocking aspect but instead only for the "it's Sync" aspect. To that end I implemented this on a branch and got tests passing with it. The changes are:

Replace ResourceTable::get with ResourceTable::{borrow, borrow_mut} that return RwLock{Read,Write}Guard<T>.

Remove table methods returning Any

Add &ResourceTable as an argument to the ready async function.

Replace most usage of table().get() with table().get_mut() (avoids locks)

Use u32 indices in Pollable's make_future internals instead of Any

Rewrite headers in wasi-http to avoid needing Any by representing headers as Resource<Resource<hyper::HeaderMap>>

The major consequences of this decision, however, are:

ResourceTable::{borrow, borrow_mut} require atomic manipulations. No blocking, but it's atomics for something that's not contended 99.9% of the time.

The std::sync::RwLock type cannot be used because std::sync::RwLock{Read,Write}Guard is not Send. I temporarily added a tokio dependency to wasmtime-the-crate and used tokio::sync::RwLock instead. Long-term I would like to avoid a tokio dep in the wasmtime crate.

There's a few minor cleanup still to be had in terms of threading a few more errors in a few more places.

Personally I'm inclined to take a route that looks like this, namely threading arguments through async fn rather than threading arguments through fn poll. This is a foundational change to how things work though, especially around a new footgun of not being able to borrow_mut twice. In that sense I'd like to get feedback along the lines of:

@pchickey does this all sound reasonable enough to you?

@badeend does this still solve your original use case, and if so what do you think about this approach vs the poll approach?

Wasmtime GitHub notifications bot (Feb 14 2024 at 18:57):

alexcrichton commented on PR #7812:

also cc @elliottt since you've touched a lot of WASI internals and you probably want to take a look too

Last updated: Apr 18 2025 at 02:29 UTC