@Joel Dice @Roman Volosatovs @Lann Martin continuing our conversation of https://github.com/bytecodealliance/wasmtime/pull/11515 over here.
Sounds like y'all were leaning towards saying that when a value is taken from Source, you're not allowed to return pending, but I wanted to push back on that. Sounds like Joel's also having second thoughts, though, so I'll let him copy them here
Yeah, so I've sketched out yet another revision of the public API for managing the read and write ends of streams in Wasmtime here based on feedback from Roman as he's updating wasmtime-wasi's p3 support to use it. Overall, we're pretty happy with it, but still wrestling a bit with what it means to send a COMPLETED event for a stream.write.
There are two possible interpretations of such an event:

1. The items have been consumed, i.e. irreversibly taken out of the writer's buffer (but not necessarily delivered anywhere yet).
2. The items have been forwarded on to their final destination.
If you go with #1, that means that StreamConsumer::poll_consume should return Poll::Ready(_) immediately upon taking one or more items, thereby allowing the runtime to promptly deliver the COMPLETED event to the writer so it knows what happened. Alternatively, if you go with #2, it's reasonable for StreamConsumer::poll_consume to return Poll::Pending after taking one or more items, since the COMPLETED event should not be delivered until the items have been forwarded on to their destination. That opens the possibility for the writer to cancel the write after the items have been consumed but before forwarding has finished. In that case the consumer could just let it complete without interrupting, and that's probably what it should do, since there's no way to give any of the items back to the writer at that point.
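To make the two interpretations concrete, here's a minimal Rust sketch. The shapes below (a take method, a poll_consume returning an item count) are invented stand-ins for illustration, not the actual signatures from the gist:

use std::task::{Context, Poll};

// Invented stand-in for illustration; the real Source in the gist differs.
trait Source<T> {
    // Irreversibly take up to `max` buffered items from the writer.
    fn take(&mut self, max: usize) -> Vec<T>;
}

// Interpretation #1: COMPLETED means "taken from the writer's buffer".
struct EagerConsumer<T> {
    to_forward: Vec<T>, // taken items, forwarded sometime later
}

impl<T> EagerConsumer<T> {
    fn poll_consume(&mut self, _cx: &mut Context<'_>, source: &mut dyn Source<T>) -> Poll<usize> {
        let taken = source.take(usize::MAX);
        let n = taken.len();
        self.to_forward.extend(taken);
        Poll::Ready(n) // the writer gets COMPLETED(n) right away
    }
}

// Interpretation #2: COMPLETED means "delivered to the destination".
struct ForwardingConsumer<T> {
    in_flight: Vec<T>, // taken but not yet delivered upstream
}

impl<T> ForwardingConsumer<T> {
    fn poll_consume(&mut self, cx: &mut Context<'_>, source: &mut dyn Source<T>) -> Poll<usize> {
        if self.in_flight.is_empty() {
            self.in_flight = source.take(usize::MAX);
        }
        // Stay Pending (delaying COMPLETED) until forwarding finishes; this
        // is what opens the cancel-after-consume window discussed below.
        self.poll_forward_in_flight(cx)
    }

    fn poll_forward_in_flight(&mut self, _cx: &mut Context<'_>) -> Poll<usize> {
        unimplemented!() // destination-specific
    }
}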
It gets tricky, though, if not all the items can be forwarded before the destination is closed. At that point we have no way to say "I consumed N items, but was only able to forward a subset M < N to the destination". Or at best, that would need to be communicated out-of-band.
The good news is that we can "cheat" for stream<u8> (and perhaps other primitive types) since we can allow the consumer to "peek" at the writer's buffer without marking the bytes consumed until it knows they've been forwarded. But for other payload types (e.g. resources), the consumer can't get access to the items without irreversibly consuming them.
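A sketch of that stream<u8> cheat, assuming a hypothetical peek/mark_read-style byte source (peek is a name invented here; mark_read comes up later in this thread):

use std::task::{Context, Poll};

// Simplified byte-source shape, invented for illustration.
trait ByteSource {
    // Look at the writer's buffered bytes without consuming them.
    fn peek(&self) -> &[u8];
    // Irreversibly mark the first `n` peeked bytes as consumed.
    fn mark_read(&mut self, n: usize);
}

struct ByteConsumer;

impl ByteConsumer {
    fn poll_consume(&mut self, cx: &mut Context<'_>, source: &mut dyn ByteSource) -> Poll<usize> {
        // Hand the peeked bytes to the destination without claiming them.
        let forwarded = {
            let bytes = source.peek();
            self.poll_forward(cx, bytes)
        };
        match forwarded {
            Poll::Ready(n) => {
                // Only now do the bytes count as consumed, so if the
                // destination closes early the writer keeps the rest.
                source.mark_read(n);
                Poll::Ready(n)
            }
            Poll::Pending => Poll::Pending,
        }
    }

    fn poll_forward(&mut self, _cx: &mut Context<'_>, _bytes: &[u8]) -> Poll<usize> {
        unimplemented!() // e.g. a non-blocking socket write
    }
}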
I feel like this is a bit entangled, and I'd like to say instead that the Source passed to StreamConsumer is exclusively in charge of keeping track of what to return to wasm. I think, though, that acquiring an item to send and sending back an "it's done" notification should be orthogonal operations. So, for example:
once a host takes a value out of a Source, it's solely responsible for it. It's up to the implementation to do its best, even in the face of cancellation from the guest, to send that value along. The exact meaning of what happens during cancellation sort of depends on what happens after it. The host is still responsible for sending along the values it took out, so if the guest continues to do more writes, then I'd expect the host to apply backpressure and not read anything further from a Source until the previous write completes. If the guest just closes the stream, then the host can sort of do whatever it wants and just shut down everything
Basically I don't think it's possible for us to provide a super strict and/or automatic and/or smooth experience on either side with ironclad guarantees about what happened where, for every possible aspect. What we're left with is a strict boundary of "the value was sent or not", and that's the most we can say. Further operations on a stream like flush/close/etc. can then be used to provide more guarantees about previously sent values
In a sense a higher level protocol and/or more flushing/closing/etc primitives are going to be required to fully learn about what just happened with the sent values
It's like we've said before: just b/c you write to the kernel doesn't mean it's going to make it to the client on the other end of the world
the guarantee the kernel provides is that if you write more data it's going to come after the prior data
Yeah, I'm thinking the same thing. So, concretely, you're in favor of letting StreamConsumer::poll_consume return Poll::Pending after having taken items from source to apply backpressure to the writer, but regardless of what happens after that (e.g. maybe they're all forwarded to the destination or maybe not), we'll end up sending COMPLETED with the count of items taken from source, correct?
correct
and documenting that using Source is tricky and subtle
And if the writer needs to know how many actually made it to the destination, that will need to be communicated out-of-band, e.g. via another future.
effectively yeah
or we add some sort of flush operation to streams or something like that
where the flush would wait for the item to be fully sent
(but even that's ambiguous, it made it to the kernel, now what?)
so in some sense I think truly knowing what made it where requires something higher level at the protocol/WIT layer that we can't be responsible for at the runtime layer
Yeah, and that's especially true for guest->guest writes, in which case the host can't possibly know what happened after the reader took responsibility for the items.
but we're also reflecting what (future) guests can do where they can acknowledge the write and then later say 'ok I took N items'
or... something like that
Like, "remember those N items I took? Well I just finished forwarding them on", or "I forwarded M of them, but then the upstream destination closed on me, sorry"
right
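As a sketch, an application-specific future paired with the stream could resolve to something like this (all names invented, nothing from the spec):

// Hypothetical payload for an application-specific future that travels
// alongside the stream and reports the items' ultimate fate.
enum ForwardOutcome {
    // Every item previously taken from the writer reached the destination.
    Forwarded { count: usize },
    // The destination closed after accepting only `forwarded` of the
    // `taken` items.
    DestinationClosed { taken: usize, forwarded: usize },
}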
The other thing to consider is that, by letting StreamConsumer::poll_consume take items from source and return Poll::Pending, we're letting the host do something that a guest can't do: consume items but delay telling the writer they've been consumed. It's kind of a non-virtualizable host superpower and arguably not "fair".
oh that's what I meant about future wasm
ah, I see now
I forget the comment but Dan made it on some repo somewhere
https://github.com/WebAssembly/wasi-cli/issues/65#issuecomment-3169429925
I guess I'm making a distinction between three separate steps:

1. taking the items from the source
2. telling the writer the items have been consumed (i.e. delivering the COMPLETED event)
3. forwarding the items on to their final destination (their "ultimate fate")
I think what you suggested above covers 1 and 3, but not 2, correct?
i.e. a guest reader can't separate 1 and 2 into distinct steps with an arbitrary delay between them
not quite. Today, you're right, a guest receiving a write is required to bundle 1/2. I'm saying let's assume that in the future the guest can split 1/2, and that the host should be able to split 1/2 today
and (3) is left to higher-level protocols
ok, got it
If we consider TCP, for example, ultimately I don't know what to even do about (3)
at most we can delay (2) to "ok the kernel syscall completed"
but I don't even know how to implement something further than that
Presumably a database client library, for example, will wait for an ack from the remote peer before saying a transaction is complete. So yeah, anything beyond handing bytes to the kernel is solidly an application-level concern, IMO.
Usually TCP ACKs aren't even useful to applications; there are too many middleboxes out there to rely on them for much of anything.
It's buffers all the way down.
yeah, I guess I meant a DB-protocol-level ack
yeah I was asynchronously replying to "what to do about TCP" :slight_smile:
my point there though is that even for the most basic streams we can guarantee nothing about (3)
("the ultimate fate of the items")
so IMO we just can't attempt to do anything beyond providing the guarantee "this is someone else's problem now"
and that's what I mean about something in a higher level protocol, e.g. you wait for a read on something else after you write to confirm your bytes were sent
right, but we can know if the upstream destination explicitly closed on us and definitely did not accept a subset of the items
Jumping late into the conversation!
Personally, I would prefer for consumers to be able to return Poll::Pending after reading items.
Some implementations (like e.g. wasi:filesystem currently) might need to move values, e.g. to operate on them in a separate thread.
With the current design, peeking into byte buffers is allowed, so the consumer can then clone that buffer and eventually "mark" the bytes read once it receives a signal from the thread.
It appears that this covers WASI's needs, but I think it's important to have this use case work in the general case. I also imagine that it would be the least surprising behavior, which would align well with other similar constructs in Rust. That said, if necessary, I think it also makes sense to only allow this for byte buffers in the MVP and relax this requirement in the future.
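A sketch of that worker-thread pattern (all names invented): clone the peeked bytes, do the blocking I/O off-thread, and only mark the bytes read once the worker reports how much it actually wrote:

use std::sync::mpsc;
use std::thread;

// Handle to an in-flight blocking write; poll_consume would hold one of
// these and return Poll::Pending until it resolves. (A real impl would
// also arrange for the task's waker to fire when the worker finishes.)
struct PendingWrite {
    done: mpsc::Receiver<std::io::Result<usize>>,
}

fn start_blocking_write(bytes: Vec<u8>) -> PendingWrite {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // Stand-in for e.g. a blocking file append of `bytes`.
        let written: std::io::Result<usize> = Ok(bytes.len());
        let _ = tx.send(written);
    });
    PendingWrite { done: rx }
}

impl PendingWrite {
    // Called from poll_consume: Some(Ok(n)) means it's now safe to call
    // mark_read(n) and return Poll::Ready(n); None means stay Pending.
    fn check(&self) -> Option<std::io::Result<usize>> {
        self.done.try_recv().ok()
    }
}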
Joel Dice said:
right, but we can know if the upstream destination explicitly closed on us and definitely did not accept a subset of the items
for specific cases perhaps, but not in the general case?
sure; either way we're agreed that this is an application-level concern
@Roman Volosatovs I agree with you yeah, basically returning pending after "officially" reading an item should be allowed in wasmtime
the consequence is that the implementation is quite a bit more complicated since now you're juggling something the guest thinks is done, but I think that's just a fact of life
Yeah, @Roman Volosatovs I think we're all on board with that; I'll update the gist
Is this adequately addressed in the async explainer?
sort of, but probably not what you're looking for
For TCP we can actually be smarter and use various platform-specific things to query the kernel's unsent buffer size to figure out how many bytes we can actually write without blocking. I've left a note about that in the implementation: https://github.com/bytecodealliance/wasmtime/blob/f7ff95747a6e2515c1a5b7a697794e5cc002e6f5/crates/wasi/src/p3/sockets/host/types/tcp.rs#L224-L266
Right, it's sort of addressed by omission
Lann Martin said:
Is this adequately addressed in the async explainer?
the explainer says what happens between guests, and it's very clear what happens there, which is that it's a "rendezvous" and nothing else.
In that sense it's quite clear about this. It does not dive into the implications of this, however.
arguably that's not what the explainer should do though (but maybe some other place should? unsure)
I guess probably the doc comments of what Joel is working on?
@Roman Volosatovs if we wanted to spec something like a hard guarantee for WASI or component model things, we'd have to be able to guarantee that for all platforms and all WASI things
so what you describe I think might make sense as a WASI-specific method, but not a stream thing per se
And for docs it probably makes sense to thoroughly document in WASI what it means to write/read from streams
the consequence is that the implementation is quite a bit more complicated since now you're juggling something the guest thinks is done, but I think that's just a fact of life
are we talking about cancellation? in the case of filesystem and stdio, I would expect cancellation to only finish once the worker thread is done with the I/O operation
as opposed to just saying "here's a stream go ham"
Roman Volosatovs said:
the consequence is that the implementation is quite a bit more complicated since now you're juggling something the guest thinks is done, but I think that's just a fact of life
are we talking about cancellation? in the case of filesystem and stdio, I would expect cancellation to only finish once the worker thread is done with the I/O operation
I'm thinking non-byte-based streams mixed with cancellation. So you had to get an item from Source, but upon cancellation you're still responsible for that item
right, but the host is allowed to block the guest until it's done
So if we're thinking in terms of futures::Sink
it would look something like:
- start_send
- poll_flush, and return Poll::Ready once it's ready

poll_consume does not return ready until the sink's "flush" is done - doesn't that address the problem?
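Roughly, using the real futures::Sink trait but an invented consumer shape around it:

use std::pin::Pin;
use std::task::{Context, Poll};

use futures::Sink;

// Invented wrapper shape; only futures::Sink itself is a real API here.
struct SinkConsumer<S> {
    sink: Pin<Box<S>>,
    in_flight: usize, // items handed to the sink but not yet flushed
}

impl<S> SinkConsumer<S> {
    fn poll_consume<T>(
        &mut self,
        cx: &mut Context<'_>,
        items: Vec<T>, // stand-in for draining the real Source
    ) -> Poll<Result<usize, S::Error>>
    where
        S: Sink<T>,
    {
        for item in items {
            // (A real impl would check poll_ready before each start_send.)
            self.sink.as_mut().start_send(item)?;
            self.in_flight += 1;
        }
        match self.sink.as_mut().poll_flush(cx) {
            // Flush finished: only now is it OK to report COMPLETED(n).
            Poll::Ready(Ok(())) => Poll::Ready(Ok(std::mem::take(&mut self.in_flight))),
            Poll::Ready(Err(e)) => Poll::Ready(Err(e)),
            // Flush still in progress: backpressure on the writer.
            Poll::Pending => Poll::Pending,
        }
    }
}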
and presumably it would also wait for the flush to finish when cancelled, correct?
yes, exactly
ok, I think we're agreed that's the expected behavior for wasmtime-wasi (and other libraries/embeddings that implement StreamConsumer), but Alex might be pointing out that not every implementation is required to behave that way, and indeed guest-based virtualized impls cannot do that currently
it comes down to a sort of "best-effort" guarantee
That makes sense to me. In fact, we should probably just say that it's interface-specific
Yeah, I can include a "best practices" section in the StreamConsumer docs to guide implementers
Some interfaces might expose application-specific flush and might even expect for their writes to be buffered (e.g. for performance reasons)
It seems like just about the only end-to-end guarantee is negative: if a stream write indicates that something wasn't consumed then it definitely wasn't sent
Technically a misbehaving StreamConsumer<u8> could lie about that and never call DirectSource::mark_read, but that's just evil.
Yeah "guarantee" not enforced by code, but something we can definitively say is a bug
oh I also just forgot that cancellation can block, but yes, if we say that cancellation doesn't actually do anything then there's not really any complication
my thinking is that we don't actually want those semantics though and we want cancellation to cancel
I had a quick chat with @Luke Wagner yesterday about this problem and he pointed out that the host is free to finish any work it has to do before reporting cancellation as "done".
I was under the impression that cancellation would immediately abort as well.
allowing the cancellation to "block" certainly makes the host's life easier though
e.g. if we have fired a file append off in a thread, we can just await its completion.
Otherwise, it seems we would need to spawn a dedicated worker thread and use something like message passing to communicate cancellation intent
actually, these kind of operations simply cannot be cancelled :thinking:
so if cancel blocks, the guest will know when the write has finished: when the cancel returns
if cancel does not block, the guest will either never know about it, or will block on the next append attempt until all previous appends (potentially multiple) are done
Here's the docs section I just added to StreamConsumer which addresses this:
/// ## Backpressure
///
/// As mentioned above, an implementation might choose to return
/// `Poll::Pending` after taking items from `source`, which tells the caller
/// to delay sending a `COMPLETED` event to the writer. This can be used as
/// a form of backpressure when the items are forwarded to an upstream sink
/// asynchronously. Note, however, that it's not possible to "put back"
/// items into `source` once they've been taken out, so if the upstream sink
/// is unable to accept all the items, that cannot be communicated to the
/// writer at this level of abstraction. Just as with application-specific,
/// recoverable errors, information about which items could be forwarded and
/// which could not must be communicated out-of-band, e.g. by writing to an
/// application-specific `future`.
///
/// Similarly, if the writer cancels the write after items have been taken
/// from `source` but before the items have all been forwarded to an
/// upstream sink, `poll_consume` will be called with `finish` set to true,
/// and the implementation may either:
///
/// - Interrupt the forwarding process gracefully. This may be preferable
///   if there is an out-of-band channel for communicating to the writer how
///   many items were forwarded before being interrupted.
///
/// - Allow the forwarding to complete without interrupting it. This is
///   usually preferable if there's no out-of-band channel for reporting back
///   to the writer how many items were forwarded.
It seems like "cancel" is a little misleading; it's more like "stop asap".
with pseudocode like this:
let (tx, rx) = stream();
file.append(rx);
let fut = tx.write("hello");
cancel(fut);
let fut = tx.write(", world");
cancel(fut);
let fut = tx.write("!");
cancel(fut);
IMO the expectation is always that "hello, world!" will eventually be written to the file
personally, I'd expect each cancel to "flush"
alternatively, we can say that once a stream write has been cancelled, the stream is "corrupted", file.append resolves to an error and the guest has to try again
remember that for stream<u8> you have the option of not calling DirectSource::mark_read until the write is flushed, so you don't have to worry about the cancel-after-taking-from-source-but-before-flushed in that case
Roman Volosatovs said:
I had a quick chat with Luke Wagner yesterday about this problem and he pointed out that the host is free to finish any work it has to do before reporting cancellation as "done".
I was under the impression that cancellation would immediately abort as well.
To clarify my thinking on this, I understand the host is allowed to do this, and it's going to be required for filesystem things that simply cannot be cancelled. That being said, we should avoid this wherever possible, as it largely defeats the point of cancellation in the first place.
Rust, for example, will synchronously await cancellation of futures meaning that if we liberally use this on the host then it means that all Rust guests will block quite a lot due to this behavior
Alex Crichton said:
Roman Volosatovs said:
I had a quick chat with Luke Wagner yesterday about this problem and he pointed out that the host is free to finish any work it has to do before reporting cancellation as "done".
I was under the impression that cancellation would immediately abort as well.
To clarify my thinking on this, I understand the host is allowed to do this, and it's going to be required for filesystem things that simply cannot be cancelled. That being said, we should avoid this wherever possible, as it largely defeats the point of cancellation in the first place.
Rust, for example, will synchronously await cancellation of futures meaning that if we liberally use this on the host then it means that all Rust guests will block quite a lot due to this behavior
Totally agree with that, most implementations should not do that, but some simply have to
For your "write hello world in 3 chunks" example, given Rust bindings and the predicted WASI implementation, yes cancellation means nothing and it'll just pretend that didn't happen
we as a host, however, have the option of doing something fancier here
for example we could immediately say a cancelled write is indeed cancelled
then the next write is blocked until the previous write actually completes
If I write a buffer of two things to a stream, then cancel, and get a result of progress = 1, is that semantically the same as not cancelling and getting a result of progress = 1?
so the first cancellation call above would return immediately but eventually hello would get written. The other two calls wouldn't actually spawn anything b/c the previous write is still ongoing, so cancellation would successfully actually cancel (and report 0 bytes written immediately)
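A sketch of that scheme as a little state machine (names invented):

// Invented state machine for the scheme above: a cancelled write resolves
// immediately, but the host keeps already-taken items alive and refuses to
// take more until they've actually been written.
enum WriteSlot {
    Idle,
    // Items already taken from the guest; a background flush owns them.
    Flushing { taken: usize },
}

impl WriteSlot {
    // Guest cancels the current write.
    fn on_cancel(&self) -> usize {
        match self {
            // Nothing was taken yet, so cancellation genuinely cancels
            // and reports 0 items immediately.
            WriteSlot::Idle => 0,
            // Items were already taken; report them as consumed and let
            // the background flush finish on its own.
            WriteSlot::Flushing { taken } => *taken,
        }
    }

    // Guest starts another write: don't touch the Source (i.e. stay
    // Pending) until the previous flush completes.
    fn may_take_more(&self) -> bool {
        matches!(self, WriteSlot::Idle)
    }
}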
Lann Martin said:
If I write a buffer of two things to a stream, then cancel, and get a result of progress = 1, is that semantically the same as not cancelling and getting a result of progress = 1?
IMO, yes
@Lann Martin you may get different codes (COMPLETED vs. CANCELLED), though; it's up to the guest how it wants to interpret them
That is a different answer :sweat_smile:
well, at an ABI level yes
but semantically they mean the same thing
from a "what was written" perspective
yeah, they'll both have a payload of 1, in any case
Is CANCELLED actually useful?
I think it's supposed to mean "I could have done more, but you interrupted me, dammit"
Sure but the writer knows what it did
it resolves the race of "did this finish due to cancellation or due to the thing completing"
which is not the most meaningful thing, sure, but why not transfer the extra bit?
Lann Martin said:
Sure but the writer knows what it did
But it doesn't know what the reader would have done otherwise
It makes the semantics of cancellation more ambiguous
but yeah, I agree that it's not very useful in practice
for that I'd recommend an issue on the CM repo for possible 0.3.x adjustments
Alex Crichton said:
it resolves the race of "did this finish due to cancellation or due to the thing completing"
for this to work, runtime would need to do something like this, right?
poll_consume(finish: false)
consumer_cx.wake()
cancel()
poll_consume(finish: false)
if pending {
    poll_consume(finish: true)
}
i.e. if the guest called cancel after the consumer had called wake, but before poll_consume was called (since these things are concurrent), the runtime might need to call poll_consume twice
I'm not sure that cancellation affects that?
but yes, that happens
one event, the "thing completing", and another event "cancel", causing two polls seems normal to me?
e.g. it's "two events generate two polls" and cancellation doesn't seem special here
For the runtime to know whether to return CANCELLED or COMPLETED, it needs to know if poll_consume after wake would result in Poll::Ready
if it gives finish: true, the consumer is forced to return ASAP and the runtime cannot know which one it is anymore
another difference between COMPLETED and CANCELLED: the former can only have a zero payload for a zero-length read/write, while the latter can have a zero payload any time. Whether that justifies the existence of CANCELLED is debatable
Roman Volosatovs said:
if it gives finish: true, the consumer is forced to return ASAP and the runtime cannot know which one it is anymore
yeah, good point
Roman Volosatovs said:
For the runtime to know whether to return CANCELLED or COMPLETED, it needs to know if poll_consume after wake would result in Poll::Ready
I think this depends on poll_consume, if the first poll_consume(finish: false) after the cancel() above says "I'm ready" the host would say COMPLETED and that's not a spec violation
We can always add a Cancelled variant to the StreamState enum if desired
Yeah again, the writer knows it cancelled; I think it's even supposed to trap if it gets CANCELLED otherwise
I guess I'm just trying to make sure I/we aren't missing anything semantically
Joel Dice said:
We can always add a Cancelled variant to the StreamState enum if desired
And rename StreamState::Open to Completed, and Closed to Dropped for consistency
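i.e., hypothetically something like:

// Hypothetical shape after the rename (not what's in the gist yet):
enum StreamState {
    Completed, // was Open
    Cancelled, // new
    Dropped,   // was Closed
}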
that seems reasonable to me
ok, I'm sick of editing that gist, but I've made those changes locally in my Wasmtime clone
I like these changes, awesome work!