badeend opened PR #9129 from badeend:file-hostinputstream
to bytecodealliance:main
:
Fixes #9058
All input streams are implemented using the
HostInputStream
trait. Except files. They got special treatment. This can be seen in the definition ofInputStream
:The special case was introduced to work around the fact that OS'es don't actually provide any _true_ async APIs for files. A more detailed explanation can be read in the PR that introduced this setup: https://github.com/bytecodealliance/wasmtime/pull/6556
This PR properly implements the
HostInputStream
&HostOutputStream
traits for files. And with "properly" I mean:
- methods that shouldn't block (e.g.
read
&write
) now really don't block.- methods that are allowed to block (e.g.
blocking_read
&blocking_write_and_flush
) continue to block and still take advantage ofallow_blocking_current_thread
.Primary changes:
InputStream
has been changed from an enum to a type alias, just likeOutputstream
already is.- The Host(Input/Output)Stream traits have been extended to allow implementors to provide specialized
blocking_*
implementations. In this PR, only the File streams actually override these.- Added an
async fn cancel(&mut self) {}
method to the Host(Input/Output)Stream traits. This is called right before the stream is dropped and can be used to wait for asynchronous cleanup.
- On File streams, this is used to wait on any in flight read or write. Operating systems don't provide _real_ asynchronous APIs for files, so we end up spawning uncancellable blocking background tasks. In the new
cancel
method we wait for that read/write to run to completion. From the guest's point of view,input/output-stream::drop
then appears to block. Certainly less than ideal, but arguably still better than letting the guest rack up an unbounded number of background tasks. Also, the guest is only blocked if the stream was dropped mid-read or mid-write, which is not expected to occur frequently.- I also implemented it for various other stream types. Unlike Files, the background tasks of these stream types are truly async. So aborting _without_ awaiting them was probably already good enough in practice. Nonetheless, waiting for the background task to actually shut down feels more "hygienic" to me in terms of resource management.
Slightly related changes:
- In Preview1 adapter: ignore BlockingMode and always perform blocking I/O. That's what preview1 did anyway.
- Remove
[method]output-stream.forward
from bindgen config. That method does not exist.- Refactor
blocking_splice
to take advantage of specializedblocking_read
&blocking_write_and_flush
implementations- Rename filesystem's
spawn_blocking
->run_blocking
badeend requested pchickey for a review on PR #9129.
badeend requested wasmtime-core-reviewers for a review on PR #9129.
alexcrichton submitted PR review:
Thanks for taking on this refactoring, I really like how it's turned out! I think that this will work quite well.
One thing I can note which I was originally worried about but am no longer is that it's still possible for guests to queue up a lot of tasks. For example if a guest queues a write and then cancels it that'll block the guest waiting to cancel that write. The host may then cancel the host guest which will cancel-the-cancel in a way meaning that the write's just sitting in the ether on a blocking thread waiting to be completed. That initially sounded bad to me but I think this is ok because Tokio has a fixed size of blocking calls which can be "in the ether" and otherwise cancellation of a job in the queue that hasn't actually run yet is successful. This means that it's not possible to get an unbounded number of writes pending at any point with this cancellation.
I've got some minor stylistic thoughts below but otherwise this looks good-to-go to me :+1:
alexcrichton created PR review comment:
The implementation of this seems a little subtle and while I see no issue with it I think it would be best to share as much as possible with the above code. Could something like this work?
- In
blocking_read
first resolve theWaiting
state if any with a.await
- Next handle the
Idle
state by performing arun_blocking
but the result of that is handled in a shared way with whatwait_ready
does (settingself.state
internall)- Next handle the results of the read in the same way as
fn read
above (perhaps even just callingself.read(size)
?)That I think will keep the
blocking_read
bits here deduplicated with the rest and the only responsibility ofblocking_read
vsread
would be to filter the states thatread
can see by doing blocking work up front.
alexcrichton created PR review comment:
Should this perhaps
await
the result of the read here inline?
alexcrichton created PR review comment:
I'm a little confused by this, is it required? I would expect that reads are only initiated from the
read
method above (or blocking-read) and theready
bits here only wait for completion of an already pending read.
badeend submitted PR review.
badeend created PR review comment:
is it required?
Not sure, but the current wasmtime-wasi implementation would push me towards a hesitant "Yes?"
I kinda stole the idea from the stdin implementation that also initiates a new read if there's no data available:This, combined with the fact that there exist multiple places in the code that blindly assume that calling
.ready().await
will yield readable data:https://github.com/bytecodealliance/wasmtime/blob/c7756bd2654fe1ac795f9dd9d13de41a31cafbac/crates/wasi/src/host/io.rs#L178-L182
https://github.com/bytecodealliance/wasmtime/blob/c7756bd2654fe1ac795f9dd9d13de41a31cafbac/crates/wasi/src/host/io.rs#L211-L214
https://github.com/bytecodealliance/wasmtime/blob/c7756bd2654fe1ac795f9dd9d13de41a31cafbac/crates/wasi/src/host/io.rs#L231-L234
https://github.com/bytecodealliance/wasmtime/blob/c7756bd2654fe1ac795f9dd9d13de41a31cafbac/crates/wasi/src/stream.rs#L153-L154
badeend created PR review comment:
Hmm, this should've actually been
unreachable!()
as this state is awaited higher up in the function.
badeend submitted PR review.
badeend updated PR #9129.
badeend submitted PR review.
badeend created PR review comment:
Something like this?
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Perfect!
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Hm ok, I'm not sure if this will survive the test of time but following suit with stdin doesn't seem unreasonable. This shouldn't get hit most of the time for the time being as WASIp1 shouldn't use this. Let's go ahead and land it :+1:
alexcrichton merged PR #9129.
pchickey commented on PR #9129:
@badeend sorry I didn't get time to review this before it landed, but thanks very much for working on it!
Last updated: Jan 24 2025 at 00:11 UTC