badeend requested fitzgen for a review on PR #7812.
badeend opened PR #7812 from badeend:pollable
to bytecodealliance:main
:
Prior discussion: https://bytecodealliance.zulipchat.com/#narrow/stream/217126-wasmtime/topic/Change.20Subscribe.20trait
- Renamed the existing
Pollable
struct toPollableResource
- Reimplemented wasi-io/poll. This introduces a new
Pollable
trait which is lower level, doesn't require heap allocations to poll, has mutable access to the WasiView, and can be used as a standalone resource without a parent. The Subscribe trait is kept intact, but this is now a utility interface, implemented in terms of Pollable.- Eliminate the (now) unnecessary surrogate parent resource of clock pollables
- Added ResourceTable
take
&restore
as a general purpose replacement foriter_entries
. That one was used only by the oldpoll
implementation.
Additionally:
- @pchickey Forbid empty poll list. Fixes: WebAssembly/wasi-io#67
badeend requested wasmtime-core-reviewers for a review on PR #7812.
badeend requested wasmtime-default-reviewers for a review on PR #7812.
github-actions[bot] commented on PR #7812:
Subscribe to Label Action
cc @peterhuene
<details>
This issue or pull request has been labeled: "wasi", "wasmtime:api"Thus the following users have been cc'd because of the following labels:
- peterhuene: wasmtime:api
To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.
Learn more.
</details>
fitzgen requested sunfishcode for a review on PR #7812.
pchickey requested pchickey for a review on PR #7812.
pchickey submitted PR review:
Thanks, this is excellent work. The new internal interface is definitely superior to the old one, and I appreciate the new tests as well. The lease system for resource table is a much better interface than iter_entries.
I believe that all of the
Slot
andSlotIdentity
implementation makes sense and is correct (especially because, surprisingly to me, the only unsafe is inunsafe impl Send
), but I wanted to tag @alexcrichton to double-check that part because it is not code I feel super confident in my ability to assess that code. If he is happy with that, this can land.
pchickey submitted PR review:
Thanks, this is excellent work. The new internal interface is definitely superior to the old one, and I appreciate the new tests as well. The lease system for resource table is a much better interface than iter_entries.
I believe that all of the
Slot
andSlotIdentity
implementation makes sense and is correct (especially because, surprisingly to me, the only unsafe is inunsafe impl Send
), but I wanted to tag @alexcrichton to double-check that part because it is not code I feel super confident in my ability to assess that code. If he is happy with that, this can land.
pchickey created PR review comment:
I agree we need this change in the wits, lets just be sure to upstream these to the spec repo as well. We will come up with some process for how we keep the docs evolving and improving while assuring that the interface itself doesn't change.
pchickey created PR review comment:
This is a very bikesheddy suggestion, so feel free to disregard it, but is
BoxPollable
a better name for this? That wayResource<PollableResource>
isnt repeating the word.
pchickey requested alexcrichton for a review on PR #7812.
alexcrichton submitted PR review:
Thanks for this! I've left a few comments but it's getting a bit later here so I'm going to head out. I want to read more about the
Lease<T>
though as that looks quite subtle and I need to think more about it.
alexcrichton submitted PR review:
Thanks for this! I've left a few comments but it's getting a bit later here so I'm going to head out. I want to read more about the
Lease<T>
though as that looks quite subtle and I need to think more about it.
alexcrichton created PR review comment:
There's quite a lot of errors which go through the
Err
here so I'm a bit wary of usingunwrap_err
to assert that a particular trap happened, could this instead a particular property about the error, such as its message?
alexcrichton created PR review comment:
This I think is actually subtly incorrect because it drops the future after the call to
poll
which signals that the future should be cancelled rather than keeping it alive until the whole poll is done. I think that means that we may not be guaranteed to get wakeups from cancelled futures, although we might get those for now given how the code is currently constructed. I think that this'll need to be a bit fancier with the adapter instead of having a blanket impl
alexcrichton created PR review comment:
Could this, and the functions below, be implemented in terms of
poll_ready_fn
perhaps?
pchickey submitted PR review:
Thanks, this is excellent work. The new internal interface is definitely superior to the old one, and I appreciate the new tests as well. The lease system for resource table is a much better interface than iter_entries.
I believe that all of the
Slot
andSlotIdentity
implementation makes sense and is correct (especially because, surprisingly to me, the only unsafe is inunsafe impl Send
), but I wanted to tag @alexcrichton to double-check that part because I do not feel super confident in my ability to assess that code. If he is happy with that, this can land.
badeend updated PR #7812.
badeend submitted PR review.
badeend created PR review comment:
Certainly
badeend submitted PR review.
badeend created PR review comment:
Check!
alexcrichton submitted PR review:
Ok I've gotten a chance now to take a closer look at
Lease<T>
and the changesResourceTable
. Given this new API design for the pollable trait something along these lines is required (e.g.iter_children
can't work any more). The specific implementation here I think has a drawback where it's very "panicky" if you get it wrong. Most other aspects of WASI are "error-y" in that they try to return traps if anything is gotten wrong. This I think is pretty important for not accidentally becoming a DoS vector for embeddings. For example a panickingDrop
implementation means that an early-return in an embedder function might accidentally take down the whole process where a wasm trap would only take down a single instance.Given that my main thought on this is that this should ideally not ever panic and instead should switch to returning errors where possible. I might also recommending going a little bit further perhaps with a scheme such as:
- Leave
TableEntry::entry
asBox<dyn Any>
.- Change
take
to returningBox<T>
. This would replaceentry
with something likeBox::new(Tombstone)
which is a private type to this module.- Change
restore
to takingBox<T>
andResource<T>
.That way it's largely up to embedders to "get everything right" but they'd already be required to do so with this current API. Additionally any failed downcasts can additionally add a check for
Tombstone
to perhaps return a more precise error other thanResourceTableError::WrongType
with a new variant such asTakenValue
.
badeend commented on PR #7812:
Thanks for the feedback.
I agree on the "panicky" point. I'll add an error type and remove the panics.One thing that's not in this PR, but I assume will most likely be added at some point, are untyped
take_any
andrestore_any
variants. The drawback of reverting the Lease & SlotIdentity design, is that therestore(_any)
API becomes (even) easier to misuse. Because then the consumer can restore any value at the index of a previously differently typed entry. I'm worried about the developer experience of this, as the place where they encounter the WrongType errors could be miles away from where the problem actually is.Anyway, I'm fine with your suggestions. I just wanted to make sure the trade-offs are known.
badeend edited a comment on PR #7812:
Thanks for the feedback.
I agree on the "panicky" point. I'll add an error type and remove the panics.One thing that's not in this PR, but I assume will most likely be added at some point, are untyped
take_any
andrestore_any
variants. The drawback of reverting the Lease & SlotIdentity design, is that therestore(_any)
API becomes (even) easier to misuse. Because then the consumer can restore any value at the index of a previously differently typed entry. I'm worried about the developer experience of this, as the corruption would happen silently and the place where they encounter the WrongType errors could be miles away from where the problem actually is.Anyway, I'm fine with your suggestions. I just wanted to make sure the trade-offs are known.
alexcrichton commented on PR #7812:
Hm ok, for that I think you have a good point about accidentally messing up these APIs. I think this may still be surmountable perhaps with some trickery, but I'd also need to see the usage of
take_any
and restore to know better. Want to discuss on a future PR with that implemented or hash it out here? (I'm fine either way)
badeend updated PR #7812.
badeend created PR review comment:
badeend submitted PR review.
badeend submitted PR review.
badeend created PR review comment:
The design has changed in the meantime. Now it's not just a box anymore, so I went with
PollableHandle
.
badeend submitted PR review.
badeend created PR review comment:
The tests were failing precisely because of that. To fix it, I ended up rougly back at the
Subscribe
design, but this time returning a custom WasiFuture type that has an additionalWasiView
parameter on itspoll
method. To prevent scope creep of this PR, I keptSubscribe
alive for now (asPollableAsync
). But in the long run, I don't think there's much value in havin them both, asSubscribe
can be trivially converted toPollable
from:#[async_trait::async_trait] impl PollableAsync for HostFutureIncomingResponse { async fn ready(&mut self) { if let Self::Pending(handle) = self { *self = Self::Ready(handle.await); } } }
to:
impl Pollable for HostFutureIncomingResponse { fn ready<'a>(&'a mut self) -> Pin<Box<dyn WasiFuture<Output = ()> + Send + 'a>> { Box::pin(async { if let Self::Pending(handle) = self { *self = Self::Ready(handle.await); } }) } }
One obvious downside is the added visual noise.
badeend commented on PR #7812:
I chose to go with a hybrid approach. For the public API, I changed it to what you suggested. Internally, I removed Lease & SlotIdentity. But I kept Slot to perform the resource type check. Also, as part of the updated design (see above) I needed
take_any
andrestore_any
so I included those as well.
alexcrichton commented on PR #7812:
Reading over this and see how this all turned out, I'm personally starting to get second thoughts on this. We're effectively reimplementing our own
Future
trait and as I'm sure you've seen we start implementing our own primitive functions (e.g.poll_fn
) as well as we can't use standard things likeasync fn
or#[async_trait]
. I'm a bit worried that the direction this is taking us is straying off the path of maintainability for async support as things get more advanced over time.Now that's all easy to say but this PR is still solving a concrete problem which is letting implementations access resources while polling, so I don't think simply closing this PR is an option. That being said after having read over this I wonder if there's perhaps an alternative implementation route that we can take.
Originally when designing
Future
-the-trait we ran into this issue of situations wanting to pass more context along through thepoll
method but the context doesn't survive longer than a single call topoll
. To do that we ended up creating task-local variables which are like thread locals but instead stick with a task. That doesn't solve the immediate problem at hand though since you want mutable access, not just readable access.To solve the mutability problem I realized that the
take
/restore
bits look likeOption<T>
and so they've already got runtime state associated with them. One alternative would be to use aRefCell<T>
instead and effectively repurpose that runtime state. That would enable acquiring&mut T
from&ResourceTable
so long as it's done "correctly" which is basically already the situation we have today (make sure yourestore
after youtake
).How would you feel about something like that? We could still preserve
get_mut
as a method which has no runtime overhead (apart from storage space) but I'm imagining that aborrow_mut()
method would be added. I'll note thatget
would have to go away in this world and be replaced withborrow()
as a consequence, which likely affects code we have today.While
RefCell
is unlikely to win any award for being the most ergonomic thing in the world this feels like it might provide a better tradeoff because we wouldn't fall off the well-trodden-path of Rust async into custom traits and such. I would want to make sure it works for your use case though.I also realize though that you've probably already put in a great deal of work to this PR with 2 versions now so I'm hesitant to ask for a third. I'd be happy to help sketch this out and do some of the refactoring work to see if I feel like it's going to pay off.
badeend commented on PR #7812:
I think you mean changing
async fn ready(&mut self);
to
async fn ready(&mut self, table: &ResourceTable);
right?
That doesn;t work because
&ResourceTable
is not Send asResourceTable
is not Sync.
alexcrichton commented on PR #7812:
Good point, yes, I'm more-or-less saying we should do that. (either that or use a task-local but I think that still captures
&T
).Mind trying make
ResourceTable
implementSync
? I think that's probably the addition of a few trait bounds in its internal trait objects. I think everything we put in there is alreadySync
although if it things aren't currentlySync
that'll pose a larger problem.
badeend updated PR #7812.
badeend commented on PR #7812:
I understand your concerns, yet I'd rather not go for round three right now, which would include reverting https://github.com/bytecodealliance/wasmtime/pull/7802. So instead, I've changed the questionable types to be private to the
poll.rs
module. That way, all the iffy-ness is contained to just a single file that we can iterate on later. From the outside nothing significant has changed, except that now I can usepoll_ready_fn
, which is what I personally was after.
badeend edited a comment on PR #7812:
I understand your concerns, yet I'd rather not go for round three right now, which would include reverting https://github.com/bytecodealliance/wasmtime/pull/7802. So instead, I've changed the questionable types to be private to the
poll.rs
module. That way, all the iffy-ness is contained to just a single file that we can iterate on later. From the outside nothing significant has changed, except that now I can usepoll_ready_fn
, which is what I personally was after.Hope that's OK for you
alexcrichton commented on PR #7812:
Wanted to say I have not forgotten about this, I have been looking for time to write up something longer-form, which I hope to get to by tomorrow. Is this blocking anything though that it would be prudent to land now rather than later? If so I think it's good to go as-is, but otherwise I'd like to take some more time to write up longer-form thoughts.
badeend commented on PR #7812:
There's no immediate rush from my side, so feel free to take your time.
alexcrichton commented on PR #7812:
Ok thanks again for your patience here, very much appreciated!
I've gotten some time to think and work on this. I was leaning towards merging this, but then I realized that I'd prefer to avoid a situation where we land this and then later revert most of it towards a different strategy. In that sense I wanted, time permitting, to take a moment and figure out if alternative strategies would work. I'm getting a growing sense of unease with this direction as it's more-or-less a custom
Future
trait and is something we'd ideally avoid.So assuming that the main goal of this PR is to get access to
ResourceTable
duringasync fn ready
I originally suggested theborrow
/borrow_mut
idea above usingRefCell
. I tried implementing that and turns out it doesn't work. That means thatResourceTable
containsRefCell
andasync fn ready
would close over&ResourceTable
(e.g. it's a new function argument). In such a situation it means that the returned future, which must beSend
, closes over&ResourceTable
. That type is notSend
because it requiresResourceTable: Sync
which is not satisfied withRefCell
. So that cans the idea of. usingRefCell
.After talking a bit more with @pchickey, however, I'm growing more fond of the idea of using
RwLock<T>
here instead ofRefCell<T>
. Not for the actual blocking aspect but instead only for the "it'sSync
" aspect. To that end I implemented this on a branch and got tests passing with it. The changes are:
- Replace
ResourceTable::get
withResourceTable::{borrow, borrow_mut}
that returnRwLock{Read,Write}Guard<T>
.- Remove table methods returning
Any
- Add
&ResourceTable
as an argument to theready
async function.- Replace most usage of
table().get()
withtable().get_mut()
(avoids locks)- Use
u32
indices inPollable
'smake_future
internals instead ofAny
- Rewrite headers in wasi-http to avoid needing
Any
by representing headers asResource<Resource<hyper::HeaderMap>>
The major consequences of this decision, however, are:
ResourceTable::{borrow, borrow_mut}
require atomic manipulations. No blocking, but it's atomics for something that's not contended 99.9% of the time.- The
std::sync::RwLock
type cannot be used becausestd::sync::RwLock{Read,Write}Guard
is notSend
. I temporarily added atokio
dependency towasmtime
-the-crate and usedtokio::sync::RwLock
instead. Long-term I would like to avoid atokio
dep in thewasmtime
crate.- There's a few minor cleanup still to be had in terms of threading a few more errors in a few more places.
Personally I'm inclined to take a route that looks like this, namely threading arguments through
async fn
rather than threading arguments throughfn poll
. This is a foundational change to how things work though, especially around a new footgun of not being able toborrow_mut
twice. In that sense I'd like to get feedback along the lines of:
- @pchickey does this all sound reasonable enough to you?
- @badeend does this still solve your original use case, and if so what do you think about this approach vs the
poll
approach?
alexcrichton commented on PR #7812:
also cc @elliottt since you've touched a lot of WASI internals and you probably want to take a look too
Last updated: Dec 23 2024 at 12:05 UTC