Stream: git-wasmtime

Topic: wasmtime / PR #6556 WASI Preview 2: rewrite streams and p...


view this post on Zulip Wasmtime GitHub notifications bot (Jun 28 2023 at 21:30):

pchickey edited PR #6556:

<!--
Please make sure you include the following information:

Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.html

Please ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->

view this post on Zulip Wasmtime GitHub notifications bot (Jun 28 2023 at 21:53):

pchickey updated PR #6556.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 29 2023 at 01:43):

pchickey edited PR #6556:

When we landed the WASI Preview 2 support in wasmtime-wasi we knew the streams/poll/scheduler implementation was going to get torn out and rewritten: see task 4 in https://github.com/bytecodealliance/wasmtime/issues/6370. This is that rewrite.

Features

wasmtime-wasi Preview 2 support is now built on top of tokio, and the host implementations for streams and pollable always use async Rust. It now passes all of the poll_oneoff tests, including on windows.

The new HostInputStream, HostOutputStream, and HostPollable mechanisms are the extension mechanism for wasmtime_wasi, as well as other crates, to provide implementations of the wasi:io/streams.{input-stream,output-stream} and wasi:poll/poll.pollable resources.

Wasmtime-wasi provides a built in AsyncReadStream to wrap any tokio::io::AsyncRead to impl HostInputStream, AsyncWriteStream to adapt AsyncWrite to HostOutputStream, and a AsyncFdStream wrapper for AsynFd that provides both stream impls (only available for cfg(unix)).

Additionally, we have provided an implementation that correctly handles waiting on stdin readines s

Sync and Async uses

Despite using async for its implementation, wasmtime-wasi will still work for synchronous Wasmtime embeddings. A new test-programs suite wasi-preview2-components-sync is identical to wasi-preview2-components in all regards except that it uses wasmtime's synchronous APIs, and the appropriate wasmtime_wasi::command::sync::add_to_linker, rather than wasmtime_wasi::command::add_to_linker which requires an async Config. The synchronous implementation creates its own Tokio runtime (if one does not already exist) and then invokes the async implementation inside of a block_on.

The wasi-common Preview 1 poll_oneoff implementation allowed for a synchronous implementation to work without any sort of dependency on tokio, by abstracting it behind a pluggable WasiSched. This PR eliminates that option for our Preview 2 users. If there is a compelling need, and labor available to implement and maintain it, we could collaborate on re-introducing a fully synchronous tokio-free option in the future. For this implementation, I decided that it was too challenging

view this post on Zulip Wasmtime GitHub notifications bot (Jun 29 2023 at 01:43):

pchickey edited PR #6556:

When we landed the WASI Preview 2 support in wasmtime-wasi we knew the streams/poll/scheduler implementation was going to get torn out and rewritten: see task 4 in https://github.com/bytecodealliance/wasmtime/issues/6370. This is that rewrite.

Features

wasmtime-wasi Preview 2 support is now built on top of tokio, and the host implementations for streams and pollable always use async Rust. It now passes all of the poll_oneoff tests, including on windows.

The new HostInputStream, HostOutputStream, and HostPollable mechanisms are the extension mechanism for wasmtime_wasi, as well as other crates, to provide implementations of the wasi:io/streams.{input-stream,output-stream} and wasi:poll/poll.pollable resources.

Wasmtime-wasi provides a built in AsyncReadStream to wrap any tokio::io::AsyncRead to impl HostInputStream, AsyncWriteStream to adapt AsyncWrite to HostOutputStream, and a AsyncFdStream wrapper for AsynFd that provides both stream impls (only available for cfg(unix)).

Additionally, we have provided an implementation that correctly handles waiting on stdin readines s

Sync and Async uses

Despite using async for its implementation, wasmtime-wasi will still work for synchronous Wasmtime embeddings. A new test-programs suite wasi-preview2-components-sync is identical to wasi-preview2-components in all regards except that it uses wasmtime's synchronous APIs, and the appropriate wasmtime_wasi::command::sync::add_to_linker, rather than wasmtime_wasi::command::add_to_linker which requires an async Config. The synchronous implementation creates its own Tokio runtime (if one does not already exist) and then invokes the async implementation inside of a block_on.

The wasi-common Preview 1 poll_oneoff implementation allowed for a synchronous implementation to work without any sort of dependency on tokio, by abstracting it behind a pluggable WasiSched. This PR eliminates that option for our Preview 2 users. If there is a compelling need, and labor available to implement and maintain it, we could collaborate on re-introducing a fully synchronous tokio-free option in the future. For this implementation, I decided that we just need a tokio backend in order to build a path towards supporting both wasi-sockets and wasi-http with the same modular stream and pollable host traits.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 29 2023 at 02:02):

pchickey edited PR #6556:

When we landed the WASI Preview 2 support in wasmtime-wasi we knew the streams/poll/scheduler implementation was going to get torn out and rewritten: see task 4 in https://github.com/bytecodealliance/wasmtime/issues/6370. This is that rewrite.

Features

wasmtime-wasi Preview 2 support is now built on top of tokio, and the host implementations for streams and pollable always use async Rust. It now passes all of the poll_oneoff tests, including on windows.

The new HostInputStream, HostOutputStream, and HostPollable traits are used to implement the wasi:io/streams.{input-stream,output-stream} and wasi:poll/poll.pollable resources.

Host{Input, Output}Stream design

HostPollable design

Implementations

wasmtime-wasi provides a struct AsyncReadStream which wraps a tokio::io::AsyncRead to provide an impl of HostInputStream, AsyncWriteStream to wraps AsyncWrite to provide HostOutputStream, and, in cfg(unix), AsyncFdStream wraps tokio::io::unix::AsynFd to provide both read and write stream impls.

AsyncReadStream and AsyncWriteStream will manage their own buffer in order for the HostInputStream::read and HostOutputStream::write methods to be non-blocking. These methods call the AsyncRead::poll_read and AsyncWrite::poll_write methods with a noop-waker Context (for now, this trivial implementation has been copied out of std sources until it becomes available in stable Rust) as a mechanism for providing a non-blocking read/write from synchronous code - @alexcrichton showed us that design.

Additionally, we have provided an implementation that correctly handles waiting on stdin readiness on both windows and unix. In Unix, this is built on top of AsyncFdStream, allows tokio to register stdin with epoll(7). Because stdin is a process-wide singleton, and also because epoll will return an error if the same fd is registered multiple times, the stdin resource is a global singleton, created lazily and guarded by a mutex. wasmtime_wasi::preview2::stdio::stdin() returns a struct Stdin which takes a lock on that singleton in the HostImputStream::{read, ready} methods. This means that, if for some reason you have granted the process stdin to multiple wasi streams (in the same, or different, stores) you can get all sorts of strange behavior because the same global resource backs them all behind the scenes - but this is always the case, the implementation in wasmtime_wasi just allows you to do so without hanging or panicking.

On Windows, we use the same approach as in the wasi-common preview 1 scheduler implementation - a (global singleton, lazily created, guarded by a mutex) worker thread running std::io::stdin().lock().fill_buf() as a blocking function. The communication scheme from the old implementation was changed up to use tokio::sync as the messaging primitives. This implementation has never been executed outside of CI because I don't have a windows machine, and I don't think our test coverage of this code is particularly great, so I basically expect someone who tries this on windows to report bugs.

Sync and Async uses

Despite using async for its implementation, wasmtime-wasi will still work for synchronous Wasmtime embeddings. A new test-programs suite wasi-preview2-components-sync is identical to wasi-preview2-components in all regards except that it uses wasmtime's synchronous APIs, and the appropriate wasmtime_wasi::command::sync::add_to_linker, rather than wasmtime_wasi::command::add_to_linker which requires an async Config. The synchronous implementation creates its own Tokio runtime (if one does not already exist) and then invokes the async implementation inside of a block_on.

The wasi-common Preview 1 poll_oneoff implementation allowed for a synchronous implementation to work without any sort of dependency on tokio, by abstracting it behind a pluggable WasiSched. This PR eliminates that option for our Preview 2 users. If there is a compelling need, and labor available to implement and maintain it, we could collaborate on re-introducing a fully synchronous tokio-free option in the future. For this implementation, I decided that we just need a tokio backend in order to build a path towards supporting both wasi-sockets and wasi-http with the same modular stream and pollable host traits.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 29 2023 at 14:31):

dicej created PR review comment:

I wonder if there's a way to optimize this by reusing FuturesUnordered here. From the docs:

This structure is optimized to manage a large number of futures. Futures managed by FuturesUnordered will only be polled when they generate wake-up notifications. This reduces the required amount of work needed to poll large numbers of futures.

The implementation of FuturesUnordered interposes a custom waker which precisely tracks which futures have been woken, queuing them up to be polled at the next opportunity, and not wasting time polling the futures which haven't been woken. This can make a big difference when the number of pollables is large.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 29 2023 at 14:44):

dicej created PR review comment:

I should mention that we wouldn't get the full benefit of FuturesUnordered here given the one-off nature of poll-oneoff, so it might not be worth it until/unless we replace poll-oneoff with a resource that's intended to be polled repeatedly.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 29 2023 at 15:24):

alexcrichton created PR review comment:

I definitely don't disagree about the benefits of FuturesUnordered, but I would personally say we should start off simpler here and see how things go. My guess is that a large number of futures would have many other inefficiencies other than the polling behavior here, and in the meantime, at least to me personally, I feel that a manual "poll the list" is easier to understand than the FuturesUnordered abstraction

view this post on Zulip Wasmtime GitHub notifications bot (Jun 29 2023 at 15:54):

pchickey updated PR #6556.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 29 2023 at 20:11):

pchickey updated PR #6556.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 29 2023 at 21:05):

pchickey updated PR #6556.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 29 2023 at 21:15):

pchickey updated PR #6556.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 29 2023 at 22:24):

pchickey updated PR #6556.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 29 2023 at 22:33):

pchickey updated PR #6556.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 29 2023 at 22:59):

pchickey updated PR #6556.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 29 2023 at 23:05):

pchickey edited PR #6556:

When we landed the WASI Preview 2 support in wasmtime-wasi we knew the streams/poll/scheduler implementation was going to get torn out and rewritten: see task 4 in https://github.com/bytecodealliance/wasmtime/issues/6370. This is that rewrite. Most of this was pair-programmed with @elliottt.

Features

wasmtime-wasi Preview 2 support is now built on top of tokio, and the host implementations for streams and pollable always use async Rust. It now passes all of the poll_oneoff tests, including on windows.

The new HostInputStream, HostOutputStream, and HostPollable traits are used to implement the wasi:io/streams.{input-stream,output-stream} and wasi:poll/poll.pollable resources.

Host{Input, Output}Stream design

input-stream and output-stream resources have a significant number of methods. For the host traits, we managed to boil this down to two required methods, as well as some optional methods to cover vectored IO, splice, and write-zeroes. The default implementations provide an unoptimized but correct implementation - the user may override them if an optimization is desired.

HostInputStream has just two methods: fn read(&mut self, ...) and async fn ready(&mut self). The most important invariant is that read must be non-blocking, and ready must block until the stream is ready for reading.

HostPollable design

The trick to HostPollable is to create a Rust Future indicating whether a stream (or some other resource, e.g. http bodies) is ready without requiring both the stream and pollable hold some Arc<Mutex<...>> to the resource itself. @alexcrichton pointed out the right design here: HostPollable is an enum with two variants:

Implementations

wasmtime-wasi provides a struct AsyncReadStream which wraps a tokio::io::AsyncRead to provide an impl of HostInputStream, AsyncWriteStream to wraps AsyncWrite to provide HostOutputStream, and, in cfg(unix), AsyncFdStream wraps tokio::io::unix::AsynFd to provide both read and write stream impls.

AsyncReadStream and AsyncWriteStream will manage their own buffer in order for the HostInputStream::read and HostOutputStream::write methods to be non-blocking. These methods call the AsyncRead::poll_read and AsyncWrite::poll_write methods with a noop-waker Context (for now, this trivial implementation has been copied out of std sources until it becomes available in stable Rust) as a mechanism for providing a non-blocking read/write from synchronous code - @alexcrichton showed us that design.

Additionally, we have provided an implementation that correctly handles waiting on stdin readiness on both windows and unix. In Unix, this is built on top of AsyncFdStream, allows tokio to register stdin with epoll(7). Because stdin is a process-wide singleton, and also because epoll will return an error if the same fd is registered multiple times, the stdin resource is a global singleton, created lazily and guarded by a mutex. wasmtime_wasi::preview2::stdio::stdin() returns a struct Stdin which takes a lock on that singleton in the HostImputStream::{read, ready} methods. This means that, if for some reason you have granted the process stdin to multiple wasi streams (in the same, or different, stores) you can get all sorts of strange behavior because the same global resource backs them all behind the scenes - but this is always the case, the implementation in wasmtime_wasi just allows you to do so without hanging or panicking.

On Windows, we use the same approach as in the wasi-common preview 1 scheduler implementation - a (global singleton, lazily created, guarded by a mutex) worker thread running std::io::stdin().lock().fill_buf() as a blocking function. The communication scheme from the old implementation was changed up to use tokio::sync as the messaging primitives. This implementation has never been executed outside of CI because I don't have a windows machine, and I don't think our test coverage of this code is particularly great, so I basically expect someone who tries this on windows to report bugs.

Sync and Async uses

Despite using async for its implementation, wasmtime-wasi will still work for synchronous Wasmtime embeddings. A new test-programs suite wasi-preview2-components-sync is identical to wasi-preview2-components in all regards except that it uses wasmtime's synchronous APIs, and the appropriate wasmtime_wasi::command::sync::add_to_linker, rather than wasmtime_wasi::command::add_to_linker which requires an async Config. The synchronous implementation creates its own Tokio runtime (if one does not already exist) and then invokes the async implementation inside of a block_on.

The wasi-common Preview 1 poll_oneoff implementation allowed for a synchronous implementation to work without any sort of dependency on tokio, by abstracting it behind a pluggable WasiSched. This PR eliminates that option for our Preview 2 users. If there is a compelling need, and labor available to implement and maintain it, we could collaborate on re-introducing a fully synchronous tokio-free option in the future. For this implementation, I decided that we just need a tokio backend in order to build a path towards supporting both wasi-sockets and wasi-http with the same modular stream and pollable host traits.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 29 2023 at 23:05):

pchickey has marked PR #6556 as ready for review.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 29 2023 at 23:05):

pchickey requested fitzgen for a review on PR #6556.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 29 2023 at 23:05):

pchickey requested wasmtime-core-reviewers for a review on PR #6556.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 29 2023 at 23:05):

pchickey requested wasmtime-default-reviewers for a review on PR #6556.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 29 2023 at 23:07):

pchickey requested alexcrichton for a review on PR #6556.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 30 2023 at 02:07):

pchickey requested dicej for a review on PR #6556.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 30 2023 at 02:11):

pchickey edited PR #6556:

When we landed the WASI Preview 2 support in wasmtime-wasi we knew the streams/poll/scheduler implementation was going to get torn out and rewritten: see task 4 in https://github.com/bytecodealliance/wasmtime/issues/6370. This is that rewrite. Most of this was pair-programmed with @elliottt.

Features

wasmtime-wasi Preview 2 support is now built on top of tokio, and the host implementations for streams and pollable always use async Rust. It now passes all of the poll_oneoff tests, including on windows.

The new HostInputStream, HostOutputStream, and HostPollable traits are used to implement the wasi:io/streams.{input-stream,output-stream} and wasi:poll/poll.pollable resources.

Host{Input, Output}Stream design

input-stream and output-stream resources have a significant number of methods. For the host traits, we managed to boil this down to two required methods, as well as some optional methods to cover vectored IO, splice, and write-zeroes. The default implementations provide an unoptimized but correct implementation - the user may override them if an optimization is desired.

HostInputStream has just two methods: fn read(&mut self, ...) and async fn ready(&mut self). The most important invariant is that read must be non-blocking, and ready must block until the stream is ready for reading.

HostPollable design

The trick to HostPollable is to create a Rust Future indicating whether a stream (or some other resource, e.g. http bodies) is ready without requiring both the stream and pollable hold some Arc<Mutex<...>> to the resource itself. @alexcrichton pointed out the right design here: HostPollable is an enum with two variants:

Implementations

wasmtime-wasi provides a struct AsyncReadStream which wraps a tokio::io::AsyncRead to provide an impl of HostInputStream, AsyncWriteStream to wraps AsyncWrite to provide HostOutputStream, and, in cfg(unix), AsyncFdStream wraps tokio::io::unix::AsynFd to provide both read and write stream impls.

AsyncReadStream and AsyncWriteStream will manage their own buffer in order for the HostInputStream::read and HostOutputStream::write methods to be non-blocking. These methods call the AsyncRead::poll_read and AsyncWrite::poll_write methods with a noop-waker Context (for now, this trivial implementation has been copied out of std sources until it becomes available in stable Rust) as a mechanism for providing a non-blocking read/write from synchronous code - @alexcrichton showed us that design.

Additionally, we have provided an implementation that correctly handles waiting on stdin readiness on both windows and unix. In Unix, this is built on top of AsyncFdStream, allows tokio to register stdin with epoll(7). Because stdin is a process-wide singleton, and also because epoll will return an error if the same fd is registered multiple times, the stdin resource is a global singleton, created lazily and guarded by a mutex. wasmtime_wasi::preview2::stdio::stdin() returns a struct Stdin which takes a lock on that singleton in the HostImputStream::{read, ready} methods. This means that, if for some reason you have granted the process stdin to multiple wasi streams (in the same, or different, stores) you can get all sorts of strange behavior because the same global resource backs them all behind the scenes - but this is always the case, the implementation in wasmtime_wasi just allows you to do so without hanging or panicking.

On Windows, we use the same approach as in the wasi-common preview 1 scheduler implementation - a (global singleton, lazily created, guarded by a mutex) worker thread running std::io::stdin().lock().fill_buf() as a blocking function. The communication scheme from the old implementation was changed up to use tokio::sync as the messaging primitives. This implementation has never been executed outside of CI because I don't have a windows machine, and I don't think our test coverage of this code is particularly great, so I basically expect someone who tries this on windows to report bugs.

Sync and Async uses

Despite using async for its implementation, wasmtime-wasi will still work for synchronous Wasmtime embeddings. A new test-programs suite wasi-preview2-components-sync is identical to wasi-preview2-components in all regards except that it uses wasmtime's synchronous APIs, and the appropriate wasmtime_wasi::command::sync::add_to_linker, rather than wasmtime_wasi::command::add_to_linker which requires an async Config. The synchronous implementation creates its own Tokio runtime (if one does not already exist) and then invokes the async implementation inside of a block_on.

The wasi-common Preview 1 poll_oneoff implementation allowed for a synchronous implementation to work without any sort of dependency on tokio, by abstracting it behind a pluggable WasiSched. This PR eliminates that option for our Preview 2 users. If there is a compelling need, and labor available to implement and maintain it, we could collaborate on re-introducing a fully synchronous tokio-free option in the future. For this implementation, I decided that we just need a tokio backend in order to build a path towards supporting both wasi-sockets and wasi-http with the same modular stream and pollable host traits.

Bindings traits

After living with the bindings generated underneath crate::preview2::wasi and crate::preview2::wasi::command for a while, I decided that the interface traits belong under crate::preview2::bindings and command belongs under crate::preview2::command.

Beyond that code motion, we now generate async traits for the wasi:io/streams and wasi:poll/poll interfaces, and sync traits for everything else. If you want to use the bindings from an async embedding (i.e. with an async Config), the crate::preview2::bindings::{interface}::add_to_linker will do it, and if you want to use them from a synchronous embedding, use crate::preview2::bindings::sync::{interface}::add_to_linker.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 30 2023 at 18:19):

alexcrichton submitted PR review:

Looking reasonable so far, but the high level things which I think need more looking into are:

view this post on Zulip Wasmtime GitHub notifications bot (Jun 30 2023 at 18:19):

alexcrichton submitted PR review:

Looking reasonable so far, but the high level things which I think need more looking into are:

view this post on Zulip Wasmtime GitHub notifications bot (Jun 30 2023 at 18:19):

alexcrichton created PR review comment:

I think this'd be good to have a doc-block explaining when it's intended to be used, basically exclusively as a bridge between the async implementation and the desire to have a synchronous implementation instead.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 30 2023 at 18:19):

alexcrichton created PR review comment:

One possible update in the future is to use bytes::Bytes here since we're already buying in tokio stuff, but this also may not be used that heavily in which case it wouldn't matter too too much.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 30 2023 at 18:19):

alexcrichton created PR review comment:

I think this is fine for now, but this I think is violating the contract of read where it should be non-blocking right? I think that block_in_place will correctly not block other tasks but from the point of view of wasm this will still be blocking and "possibly taking awhile" since the host call is blocked in reading.

Fixing this though I think will be somewhat involved and require similar adapters for async read/write where a read queues up a request for a read and then during async fn ready it asynchronously blocks on a thread performing the read.

That gets sort of funky though because then you have to say that a file is ready for reading but not actually queue up a read until a read is requested (or something like that).

Anyway that's just an idea, but probably not a good one. Something will need to change here though I think either at the WASI spec level or here or something though to ensure it's all consistent.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 30 2023 at 18:19):

alexcrichton created PR review comment:

If you get a chance I think it'd be good to open an issue for avoiding this _internal_io wrapper module in the future. It seems like it's probably here to avoid clashes in two bindgen! macros or you're hiding stuff generated by bindgen!, but we can probably add more options to the macro in the future to fix that and avoid the need.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 30 2023 at 18:19):

alexcrichton created PR review comment:

I might recommend self.buffer.drain(..read_from_buffer) for this perhaps

view this post on Zulip Wasmtime GitHub notifications bot (Jun 30 2023 at 18:19):

alexcrichton created PR review comment:

I think this'll want to check the length of self.buffer and only block in await if it's zero

view this post on Zulip Wasmtime GitHub notifications bot (Jun 30 2023 at 18:19):

alexcrichton created PR review comment:

Like above I think this would be good to have a doc block, but this ties in with my concerns on the file implementation in that I'm not sure we can actually support a function such as this since it's used in context which aren't supposed to block but this causes them to block?

view this post on Zulip Wasmtime GitHub notifications bot (Jun 30 2023 at 18:19):

alexcrichton created PR review comment:

I think the "fix" here might be to ignore this error actually? If an error happens I think the only error is disconnection, which means that all future attempts to write will also continue to see the same error, so this is otherwise swallowing the bytes that were previously written and reported as written but they ended up not being received in the end.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 30 2023 at 18:19):

alexcrichton created PR review comment:

One thing I might recommend to simplify this a bit is to remove the Writable state since because this is a spsc channel once a reservation is done to acquire an OwnedPermit then it'll forever be writable immediately by the InputStream. Effectively in ready after reserve_owned I think it can call release to avoid the need to store channel-or-permit.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 30 2023 at 18:19):

alexcrichton created PR review comment:

I think this may not be quite right because technically read always succeeds, right?

view this post on Zulip Wasmtime GitHub notifications bot (Jun 30 2023 at 18:19):

alexcrichton created PR review comment:

I think it'd be worth mirroring what files/sockets do in this case. Well ok not files since they just extend the file but for sockets I think it's an EPIPE return (or equivalent thereof)

view this post on Zulip Wasmtime GitHub notifications bot (Jun 30 2023 at 18:19):

alexcrichton created PR review comment:

I think it'd be good to document what bound here is perhaps? I think that this ends up being a bound on "chunks" though rather than bytes which is probably what would be expected? I think it's ok to switch from chunks-to-bytes as a follow-up though.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 30 2023 at 18:19):

alexcrichton created PR review comment:

I think that this is the method where the bound should be checked from above because otherwise this can infinitely be called and it'll never return "wouldblock" and it'll just keep buffering data.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 30 2023 at 18:19):

alexcrichton created PR review comment:

One way to get rid of Clone, Arc, and Mutex here might be to require users of this to lookup the pipe from within the original table that it was inserted into? Although I suppose that runs afoul of WASI closing it which would then drop this, so perhaps not. It feels a bit weird though to have Clone for this type which shouldn't really be cloned and rather it's intended to be more of a "one end writes one end reads" sort of style.

Anyway fine to handle this later.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 30 2023 at 18:19):

alexcrichton created PR review comment:

Could you have a comment for why there's a hash map involved here?

(basically we can only mutably borrow each table entry at most once so all requests for the same table entry must be "canonicalized")

view this post on Zulip Wasmtime GitHub notifications bot (Jun 30 2023 at 18:19):

alexcrichton created PR review comment:

Unlike my thinking way up above about block_in_place for read/write on files I do think that it's the correct API to use here. This is fundamentally a blocking API (even as WASI defines it) so this is saying "hey tokio go do something as necessary as I'm about to block".

In the context of an async server though since it locks up a whole thread where the "correct" thing would be to leave this as an async fn which does a spawn_blocking. Given that I'm not sure how best to handle this. From a CLI point of view either way works, and from an async server point of view the actual real filesystem probably isn't used at all.

Although are the underlying file objects here trait-objects as opposed to "always cap_std::fs::File"? If that's the case then the trait-object I would presume wants async methods, so the async-ness would need to be propagated here to the bindings?

Hm ok I started out this comment thinking "ok block_in_place is right here" but now I'm less certain...

view this post on Zulip Wasmtime GitHub notifications bot (Jun 30 2023 at 18:19):

alexcrichton created PR review comment:

I don't think that this is quite right for a few reasons:

view this post on Zulip Wasmtime GitHub notifications bot (Jun 30 2023 at 18:19):

alexcrichton created PR review comment:

Thinking about this again, I'm not sure that this is actually the desired semantics of poll_oneoff that we want? At least with systems like epoll/select/poll/etc when you wait for a set of operations the wait doesn't fail if one of the operations fails, but rather the wait says "that operation is ready" and when you got to get the result it returns the failure.

In that sense this may actually want to change the ready function to be async fn ready(&self) { ... } with no return value and then if an error is to be returned it's buffered up within the object to get returned by read or write afterwards?

view this post on Zulip Wasmtime GitHub notifications bot (Jun 30 2023 at 18:19):

alexcrichton created PR review comment:

Hm ok so this makes me realize that my idea for "just record the table index" is wrong, although subtly so. Basically what you can do is create a pollable and then close the original stream and that would probably cause a panic here.

One possible solution would be to prevent closure while there are active pollable objects. Looking into a resource-based future though that's not going to work since there's no hook to prevent resource.drop on resources.

Instead this may need to record "this is being polled on" so when it's closed we don't actually fully close the object but instead put it in an area of "this is only here for pollable" and then that gets cleaned up when the pollable is closed. (or something like that, not entirely sure)

view this post on Zulip Wasmtime GitHub notifications bot (Jun 30 2023 at 18:19):

alexcrichton created PR review comment:

I commented on this above for something else as well, but I think the implementation here should be Ok(()) instead of pending

view this post on Zulip Wasmtime GitHub notifications bot (Jun 30 2023 at 18:19):

alexcrichton created PR review comment:

I think it might be good to split this impl block to a "sink" type rather than hanging it on EmptyStream since for writing I think it's reasonable to say that either all writes are eaten without doing anything or no writes are accepted and it always returns Ok(0)

view this post on Zulip Wasmtime GitHub notifications bot (Jun 30 2023 at 18:19):

alexcrichton created PR review comment:

Should this return StreamState::Closed?

view this post on Zulip Wasmtime GitHub notifications bot (Jun 30 2023 at 18:19):

alexcrichton created PR review comment:

Same comments as way above, but I think this which is supposed to be non-blocking is actually blocking now?

view this post on Zulip Wasmtime GitHub notifications bot (Jun 30 2023 at 18:19):

alexcrichton created PR review comment:

Ah this is where I think the I/O split I mentioned above would work better. For example bytes may have been successfully written into the buffer but then ignored due to this error happening. I think it'd be better to split apart the operations so that if any bytes are ever succesfully read they're guaranteed to be read from this stream.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 30 2023 at 18:19):

alexcrichton created PR review comment:

I think this may not be required due to the usage of dest.write above? I think that may update the buffer in-place to be shorter.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 30 2023 at 18:19):

alexcrichton created PR review comment:

This doesn't feel quite right to me in that this shouldn't continuously ressize the buffer but instead if the buffer is non-empty then it's always ready for reading but a read is only done if the buffer is empty (in which case we could for example know it's already been initialized)

view this post on Zulip Wasmtime GitHub notifications bot (Jun 30 2023 at 18:19):

alexcrichton created PR review comment:

This is a bit tricky because the futures returned from ready need to all be "cancel safe", so if there's pending data in this buffer then it'll get lost if the await point below is cancelled.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 30 2023 at 18:19):

alexcrichton created PR review comment:

Sort of like above with pipes but I think this'll want a form of backpressure to avoid the internal buffer from growing too large

view this post on Zulip Wasmtime GitHub notifications bot (Jun 30 2023 at 18:19):

alexcrichton created PR review comment:

The error here is tricky to handle because data has been buffered up from the beginning of the function but this is also reporting an error, so if the application re-wrote the data then it'd be written twice

view this post on Zulip Wasmtime GitHub notifications bot (Jun 30 2023 at 18:19):

alexcrichton created PR review comment:

In terms of "cancel-safety" this needs to be aware of partial writes that succeed, so I don't think this will be able to use write_all but instead will have to loop and incrementally update the internal buffer.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 30 2023 at 18:19):

alexcrichton created PR review comment:

Or put another way, each buf provided to this method I think should become a separate message on the channel and then at most one Vec<u8> is buffered internally.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 30 2023 at 18:19):

alexcrichton created PR review comment:

Alternatively files here might need Arc<Mutex<...>> or something similar. For example a read would spawn a task to do the actual read and then the non-blocking read is basically checking if the task is done, and ready is additionally checking if the task is done. That I think would make the append stream below make more sense since an append would spawn a task doing the write and then further writes/ready would wait for that first task to be done.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 30 2023 at 18:19):

alexcrichton created PR review comment:

(also perhaps check self.state and skip the recv if it's closed already, but that's more minor since it'll still work otherwise without that)

view this post on Zulip Wasmtime GitHub notifications bot (Jun 30 2023 at 18:19):

alexcrichton created PR review comment:

Or put another way, I would expec this type to be "always ready"

view this post on Zulip Wasmtime GitHub notifications bot (Jun 30 2023 at 18:19):

alexcrichton created PR review comment:

Although personally I think it's also reasonable to instead structure this entire method as:

if self.buffer.is_empty() {
    // do I/O
} else {
    // do buffer stuff but no I/O
}

to avoid mixing and matching the I/O branches

view this post on Zulip Wasmtime GitHub notifications bot (Jun 30 2023 at 18:19):

alexcrichton created PR review comment:

Or put another way (again, sorry), the TrySendError::Full case I think should return EWOULDBLOCK so callers know to go call poll_oneoff afterwards.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 07 2023 at 18:31):

dicej submitted PR review:

Looks like a great start overall.

Echoing Alex's concerns about block_in_place:

So we can use spawn_blocking instead, and store the future as part of the stream, but at that point we're kind of reinventing tokio::fs. The tokio folks have already provided a battle-tested implementation, and tokio::fs::File implements AsyncRead and AsyncWrite, so I think we should reuse it wherever possible. I realize there are some cases where we need to use cap-std to enforce sandboxing, but once cap-std gives us a file handle, we should be able to convert it to a tokio::fs::File and use that to handle the details of bridging the async and sync worlds. Does that sounds plausible?

Regarding buffers: I don't think we should ever need to grow buffers beyond their original sizes (and certainly not without bound). I would expect that we would flush (for writes) or fill (for reads) a buffer before moving on to the next one (or reusing it without necessarily changing its size). If you believe there's a scenario where extending a buffer is unavoidable (or at least desirable), I'd like to have a conversation to understand that better.

Generally speaking, I would expect that neither HostInputStream::read nor HostOutputStream::write implementations would do _any_ I/O and would instead leave that to ready, which would fill or drain self.buffer, respectively. That would avoid the need for the no-op waker and any kind of block_in_place trickery. Or put another way, I/O should only be done in an async context where a real waker is available, thus guaranteeing wake events make it back to poll_oneoff.

Hope that's somewhat clear. Happy to discuss in more detail as necessary.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 07 2023 at 18:31):

dicej submitted PR review:

Looks like a great start overall.

Echoing Alex's concerns about block_in_place:

So we can use spawn_blocking instead, and store the future as part of the stream, but at that point we're kind of reinventing tokio::fs. The tokio folks have already provided a battle-tested implementation, and tokio::fs::File implements AsyncRead and AsyncWrite, so I think we should reuse it wherever possible. I realize there are some cases where we need to use cap-std to enforce sandboxing, but once cap-std gives us a file handle, we should be able to convert it to a tokio::fs::File and use that to handle the details of bridging the async and sync worlds. Does that sounds plausible?

Regarding buffers: I don't think we should ever need to grow buffers beyond their original sizes (and certainly not without bound). I would expect that we would flush (for writes) or fill (for reads) a buffer before moving on to the next one (or reusing it without necessarily changing its size). If you believe there's a scenario where extending a buffer is unavoidable (or at least desirable), I'd like to have a conversation to understand that better.

Generally speaking, I would expect that neither HostInputStream::read nor HostOutputStream::write implementations would do _any_ I/O and would instead leave that to ready, which would fill or drain self.buffer, respectively. That would avoid the need for the no-op waker and any kind of block_in_place trickery. Or put another way, I/O should only be done in an async context where a real waker is available, thus guaranteeing wake events make it back to poll_oneoff.

Hope that's somewhat clear. Happy to discuss in more detail as necessary.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 07 2023 at 18:31):

dicej created PR review comment:

According to the docs, it should be considered closed when read returns 0 when dest.len() > 0.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 07 2023 at 18:31):

dicej created PR review comment:

Nits: "poisoned" is misspelled, and per https://doc.rust-lang.org/std/result/enum.Result.html#recommended-message-style, this should probably read "mutex should not be poisoned".

view this post on Zulip Wasmtime GitHub notifications bot (Jul 07 2023 at 18:31):

dicej created PR review comment:

In addition to separating the buffer and I/O cases, you can put the whole thing in a loop that runs until either dest is full or the buffer is empty and I/O would block.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 07 2023 at 18:31):

dicej created PR review comment:

This comment doesn't seem to be relevant anymore.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 13 2023 at 00:15):

elliottt updated PR #6556.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 13 2023 at 22:32):

pchickey updated PR #6556.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 14 2023 at 00:19):

elliottt updated PR #6556.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 14 2023 at 19:38):

pchickey updated PR #6556.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 14 2023 at 19:38):

pchickey updated PR #6556.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 14 2023 at 22:22):

pchickey updated PR #6556.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 15 2023 at 00:07):

pchickey updated PR #6556.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 18 2023 at 21:09):

elliottt updated PR #6556.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 18 2023 at 22:02):

elliottt updated PR #6556.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 18 2023 at 22:32):

elliottt updated PR #6556.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 18 2023 at 22:32):

pchickey submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 18 2023 at 22:33):

pchickey created PR review comment:

Yeah, we'd have to do some challenging stuff to destroy the ctx, which is harder because the methods need both the ctx and table via WasiView, so we decided against changing this now.

We really only expect this impl to be used for tests.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 18 2023 at 22:33):

pchickey submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 18 2023 at 22:33):

pchickey created PR review comment:

An io error that occurs is buffered up, but we wanted to leave it open for ready() to trap execution if something is horribly wrong.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 18 2023 at 22:44):

elliottt updated PR #6556.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 18 2023 at 22:46):

pchickey submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 18 2023 at 22:46):

pchickey created PR review comment:

Yes this is a pretty significant complication. Right now we have decided to punt on it by adding a big warning in the source code, land this PR, then see if any of the assumptions involved change as we switch over to resources, and then fix it. It would be a crash vector for the whole runtime, but it would also require some pretty major work, so I want to defer it to a future PR. We will fix it before we mark the preview2 implementation as production ready.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 19 2023 at 00:02):

elliottt updated PR #6556.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 19 2023 at 16:21):

elliottt updated PR #6556.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 19 2023 at 19:09):

elliottt updated PR #6556.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 19 2023 at 19:27):

elliottt updated PR #6556.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 19 2023 at 20:44):

dicej submitted PR review:

Looks like a great start overall.

Echoing Alex's concerns about block_in_place:

So we can use spawn_blocking instead, and store the future as part of the stream, but at that point we're kind of reinventing tokio::fs. The tokio folks have already provided a battle-tested implementation, and tokio::fs::File implements AsyncRead and AsyncWrite, so I think we should reuse it wherever possible. I realize there are some cases where we need to use cap-std to enforce sandboxing, but once cap-std gives us a file handle, we should be able to convert it to a tokio::fs::File and use that to handle the details of bridging the async and sync worlds. Does that sound plausible?

Regarding buffers: I don't think we should ever need to grow buffers beyond their original sizes (and certainly not without bound). I would expect that we would flush (for writes) or fill (for reads) a buffer before moving on to the next one (or reusing it without necessarily changing its size). If you believe there's a scenario where extending a buffer is unavoidable (or at least desirable), I'd like to have a conversation to understand that better.

Generally speaking, I would expect that neither HostInputStream::read nor HostOutputStream::write implementations would do _any_ I/O and would instead leave that to ready, which would fill or drain self.buffer, respectively. That would avoid the need for the no-op waker and any kind of block_in_place trickery. Or put another way, I/O should only be done in an async context where a real waker is available, thus guaranteeing wake events make it back to poll_oneoff.

Hope that's somewhat clear. Happy to discuss in more detail as necessary.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 19 2023 at 20:46):

elliottt updated PR #6556.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 19 2023 at 21:19):

pchickey updated PR #6556.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 19 2023 at 22:04):

pchickey updated PR #6556.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 19 2023 at 22:38):

pchickey updated PR #6556.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 19 2023 at 22:42):

pchickey updated PR #6556.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 19 2023 at 22:51):

pchickey updated PR #6556.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 19 2023 at 22:58):

pchickey updated PR #6556.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 20 2023 at 00:03):

pchickey updated PR #6556.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 20 2023 at 19:56):

pchickey updated PR #6556.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 20 2023 at 21:36):

pchickey updated PR #6556.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 20 2023 at 21:40):

pchickey updated PR #6556.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 20 2023 at 23:36):

pchickey updated PR #6556.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 20 2023 at 23:42):

pchickey updated PR #6556.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 20 2023 at 23:46):

pchickey updated PR #6556.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 21 2023 at 18:32):

alexcrichton submitted PR review:

Nice :+1:

For the WIT changes, those are from the upstream proposals? Or local changes made temporarily for this PR?

view this post on Zulip Wasmtime GitHub notifications bot (Jul 21 2023 at 18:32):

alexcrichton submitted PR review:

Nice :+1:

For the WIT changes, those are from the upstream proposals? Or local changes made temporarily for this PR?

view this post on Zulip Wasmtime GitHub notifications bot (Jul 21 2023 at 18:32):

alexcrichton created PR review comment:

I find the semantics of this type a bit dubious, but is this necessary? Would it be possible to remove this?

view this post on Zulip Wasmtime GitHub notifications bot (Jul 21 2023 at 18:32):

alexcrichton created PR review comment:

Given the semantics of that this stream is never ready for reading, would a better default be the ClosedInputStream?

view this post on Zulip Wasmtime GitHub notifications bot (Jul 21 2023 at 18:32):

alexcrichton created PR review comment:

Should this case be unreachable! since if the state is "ready" then the task should always be trying to receive one buffer?

view this post on Zulip Wasmtime GitHub notifications bot (Jul 21 2023 at 18:32):

alexcrichton created PR review comment:

Instead of an Option here perhaps add Closed to WriteState?

view this post on Zulip Wasmtime GitHub notifications bot (Jul 21 2023 at 18:32):

alexcrichton created PR review comment:

I think that this may always want to be Ok(()) since read never blocks?

view this post on Zulip Wasmtime GitHub notifications bot (Jul 21 2023 at 18:32):

alexcrichton created PR review comment:

Had a good chuckle reading this :)

Regardless I agree we can't make this truly async and sync close here is the right thing to do IMO

view this post on Zulip Wasmtime GitHub notifications bot (Jul 21 2023 at 20:04):

pchickey submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 21 2023 at 20:04):

pchickey created PR review comment:

Correct.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 21 2023 at 20:05):

pchickey submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 21 2023 at 20:05):

pchickey created PR review comment:

yeah, we did it because it corresponded to one of the output streams but now that you point it out, its really not necessary at all

view this post on Zulip Wasmtime GitHub notifications bot (Jul 21 2023 at 20:07):

pchickey edited PR review comment.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 21 2023 at 20:07):

pchickey edited PR review comment.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 21 2023 at 20:12):

pchickey updated PR #6556.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 21 2023 at 20:13):

pchickey submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 21 2023 at 20:13):

pchickey created PR review comment:

correct- that was left over from a prior factoring of this code, but its unreachable now

view this post on Zulip Wasmtime GitHub notifications bot (Jul 21 2023 at 20:15):

pchickey submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 21 2023 at 20:15):

pchickey created PR review comment:

we had it factored that way, then we were taking advantage of Option::take() so changing the repr made sense, but now I think I can switch it back...

view this post on Zulip Wasmtime GitHub notifications bot (Jul 21 2023 at 20:16):

pchickey edited PR review comment.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 21 2023 at 20:16):

pchickey updated PR #6556.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 21 2023 at 20:31):

pchickey edited PR #6556:

When we landed the WASI Preview 2 support in wasmtime-wasi we knew the streams/poll/scheduler implementation was going to get torn out and rewritten: see task 4 in https://github.com/bytecodealliance/wasmtime/issues/6370. This is that rewrite. Most of this was pair-programmed with @elliottt.

Features

wasmtime-wasi Preview 2 support is now built on top of tokio, and the host implementations for streams and pollable always use async Rust. It now passes all of the poll_oneoff tests, including on windows.

The new HostInputStream, HostOutputStream, and HostPollable traits are used to implement the wasi:io/streams.{input-stream,output-stream} and wasi:poll/poll.pollable resources.

Host{Input, Output}Stream design

input-stream and output-stream resources have a significant number of methods. For the host traits, we managed to boil this down to two required methods, as well as some optional methods to cover vectored IO, splice, and write-zeroes. The default implementations provide an unoptimized but correct implementation - the user may override them if an optimization is desired.

HostInputStream has just two methods: fn read(&mut self, ...) and async fn ready(&mut self). The most important invariant is that read must be non-blocking, and ready must block until the stream is ready for reading.

HostPollable design

The trick to HostPollable is to create a Rust Future indicating whether a stream (or some other resource, e.g. http bodies) is ready without requiring both the stream and pollable hold some Arc<Mutex<...>> to the resource itself. @alexcrichton pointed out the right design here: HostPollable is an enum with two variants:

Implementations

wasmtime-wasi provides a struct AsyncReadStream which wraps a tokio::io::AsyncRead to provide an impl of HostInputStream, AsyncWriteStream to wraps AsyncWrite to provide HostOutputStream, and, in cfg(unix), AsyncFdStream wraps tokio::io::unix::AsynFd to provide both read and write stream impls.

AsyncReadStream and AsyncWriteStream will manage their own buffer in order for the HostInputStream::read and HostOutputStream::write methods to be non-blocking. These methods call the AsyncRead::poll_read and AsyncWrite::poll_write methods with a noop-waker Context (for now, this trivial implementation has been copied out of std sources until it becomes available in stable Rust) as a mechanism for providing a non-blocking read/write from synchronous code - @alexcrichton showed us that design.

Additionally, we have provided an implementation that correctly handles waiting on stdin readiness on both windows and unix. In Unix, this is built on top of AsyncFdStream, allows tokio to register stdin with epoll(7). Because stdin is a process-wide singleton, and also because epoll will return an error if the same fd is registered multiple times, the stdin resource is a global singleton, created lazily and guarded by a mutex. wasmtime_wasi::preview2::stdio::stdin() returns a struct Stdin which takes a lock on that singleton in the HostImputStream::{read, ready} methods. This means that, if for some reason you have granted the process stdin to multiple wasi streams (in the same, or different, stores) you can get all sorts of strange behavior because the same global resource backs them all behind the scenes - but this is always the case, the implementation in wasmtime_wasi just allows you to do so without hanging or panicking.

On Windows, we use the same approach as in the wasi-common preview 1 scheduler implementation - a (global singleton, lazily created, guarded by a mutex) worker thread running std::io::stdin().lock().fill_buf() as a blocking function. The communication scheme from the old implementation was changed up to use tokio::sync as the messaging primitives. This implementation has never been executed outside of CI because I don't have a windows machine, and I don't think our test coverage of this code is particularly great, so I basically expect someone who tries this on windows to report bugs.

Sync and Async uses

Despite using async for its implementation, wasmtime-wasi will still work for synchronous Wasmtime embeddings. A new test-programs suite wasi-preview2-components-sync is identical to wasi-preview2-components in all regards except that it uses wasmtime's synchronous APIs, and the appropriate wasmtime_wasi::command::sync::add_to_linker, rather than wasmtime_wasi::command::add_to_linker which requires an async Config. The synchronous implementation creates its own Tokio runtime (if one does not already exist) and then invokes the async implementation inside of a block_on.

The wasi-common Preview 1 poll_oneoff implementation allowed for a synchronous implementation to work without any sort of dependency on tokio, by abstracting it behind a pluggable WasiSched. This PR eliminates that option for our Preview 2 users. If there is a compelling need, and labor available to implement and maintain it, we could collaborate on re-introducing a fully synchronous tokio-free option in the future. For this implementation, I decided that we just need a tokio backend in order to build a path towards supporting both wasi-sockets and wasi-http with the same modular stream and pollable host traits.

Bindings traits

After living with the bindings generated underneath crate::preview2::wasi and crate::preview2::wasi::command for a while, I decided that the interface traits belong under crate::preview2::bindings and command belongs under crate::preview2::command.

Beyond that code motion, we now generate async traits for the wasi:io/streams, wasi:poll/poll, and wasi:filesystem/filesystem interfaces, and sync traits for everything else. If you want to use the bindings from an async embedding (i.e. with an async Config), the crate::preview2::bindings::{interface}::add_to_linker will do it, and if you want to use them from a synchronous embedding, use crate::preview2::bindings::sync::{interface}::add_to_linker.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 21 2023 at 20:53):

pchickey edited PR #6556:

When we landed the WASI Preview 2 support in wasmtime-wasi we knew the streams/poll/scheduler implementation was going to get torn out and rewritten: see task 4 in https://github.com/bytecodealliance/wasmtime/issues/6370. This is that rewrite. Most of this was pair-programmed with @elliottt.

Features

wasmtime-wasi Preview 2 support is now built on top of tokio, and the host implementations for streams and pollable always use async Rust. It now passes all of the poll_oneoff tests, including on windows.

The new HostInputStream, HostOutputStream, and HostPollable traits are used to implement the wasi:io/streams.{input-stream,output-stream} and wasi:poll/poll.pollable resources.

Host{Input, Output}Stream design

input-stream and output-stream resources have a significant number of methods. For the host traits, we managed to boil this down to two required methods, as well as some optional methods to cover vectored IO, splice, and write-zeroes. The default implementations provide an unoptimized but correct implementation - the user may override them if an optimization is desired.

HostInputStream has just two methods: fn read(&mut self, ...) and async fn ready(&mut self). The most important invariant is that read must be non-blocking, and ready must block until the stream is ready for reading.

Nonblocking IO is actually impossible on regular files, but we hide that detail from WASI programs

There's a huge lie in the description above: internally, wasmtime_wasi uses InternalHost{Input,Output}Stream, which are enums that contain either a Host{Input,Output}Stream, or a private File{Input,Output}Stream. These file streams always perform a blocking read/write, whether or not the WASI program has called the nonblocking read or the blocking blocking_read method. And, any pollable associated with file read or write readiness always returns ready immediately.

This is required because there is no way to implement streams on files with epoll(7) (or on macos or windows using mio, afaik) based readiness - poll(2) will lie and always tell you files are ready for reading of writing, but the actual syscall may end up blocking the thread due to disk IO. And, rather than stick the work on a background task, we must wait until the OS read(2) or write(2) for a file has completed before returning from wasi stream (nonblocking) read or write, because the userlands in WASI expect (reasonably!) that any errors in regular file IO are reported synchronously.

So, we put an escape hatch into wasmtime-wasi which is just for regular file IO. It awaits for a tokio spawn_blocking to complete in both the blocking and non-blocking implementations of the stream reads and writes. Like all the other syscalls in the filesystem implementation, we need the async rust to put any blocking syscalls in a spawn_blocking, in order to not block the executor.

We spent a the better part of a week trying to square this circle and this is the best we could come up to make files work like we expect, but also expose the right interfaces from wasmtime-wasi so that all other streams have the proper non-blocking semantics.

HostPollable design

The trick to HostPollable is to create a Rust Future indicating whether a stream (or some other resource, e.g. http bodies) is ready without requiring both the stream and pollable hold some Arc<Mutex<...>> to the resource itself. @alexcrichton pointed out the right design here: HostPollable is an enum with two variants:

Note that we discovered a major design problem with the TableEntry variant that exposes a crash vector if the parent resource of a pollable is destroyed before the pollable itself. This is a big enough problem that we are setting it aside to be solved in a follow-up PR.

Implementations

wasmtime-wasi provides a struct AsyncReadStream which wraps a tokio::io::AsyncRead to provide an impl of HostInputStream, AsyncWriteStream to wraps AsyncWrite to provide HostOutputStream.

AsyncReadStream and AsyncWriteStream will manage their own buffer in order for the HostInputStream::read and HostOutputStream::write methods to be non-blocking. Each spawns a helper tokio::task to perform async operations in the background. This requires some buffering.

Note, we discussed ensuring these background tasks are reaped properly - right now we believe that the implementations are correct and they will exit on their own when the foreground side gets dropped. However, in the future to implement e.g. splice and forward correctly, we may need to allow these tasks to (optionally!) live longer than the WASI program's execution.

Additionally, we have provided an implementation that correctly handles waiting on stdin readiness on both windows and unix. In Unix, this is built on top of tokio::os::unix::AsyncFd, allows tokio to register stdin with epoll(7). Because stdin is a process-wide singleton, and also because epoll will return an error if the same fd is registered multiple times, the stdin resource is a global singleton, created lazily and guarded by a mutex. wasmtime_wasi::preview2::stdio::stdin() returns a struct Stdin which takes a lock on that singleton in the HostInputStream::{read, ready} methods. This means that, if for some reason you have granted the process stdin to multiple wasi streams (in the same, or different, stores) you can get all sorts of strange behavior because the same global resource backs them all behind the scenes - but this is always the case, the implementation in wasmtime_wasi just allows you to do so without hanging or panicking.

On Windows, the implementation of stdin is broken, and inheriting stdin will panic. We have a path to fixing it, but we are leaving that out of this PR, because it has been open for far too long, and far too much work is predicated on landing it.

Sync and Async uses

Despite using async for its implementation, wasmtime-wasi will still work for synchronous Wasmtime embeddings. A new test-programs suite wasi-preview2-components-sync is identical to wasi-preview2-components in all regards except that it uses wasmtime's synchronous APIs, and the appropriate wasmtime_wasi::command::sync::add_to_linker, rather than wasmtime_wasi::command::add_to_linker which requires an async Config. The synchronous implementation creates its own Tokio runtime (if one does not already exist) and then invokes the async implementation inside of a block_on.

The wasi-common Preview 1 poll_oneoff implementation allowed for a synchronous implementation to work without any sort of dependency on tokio, by abstracting it behind a pluggable WasiSched. This PR eliminates that option for our Preview 2 users. If there is a compelling need, and labor available to implement and maintain it, we could collaborate on re-introducing a fully synchronous tokio-free option in the future. For this implementation, I decided that we just need a tokio backend in order to build a path towards supporting both wasi-sockets and wasi-http with the same modular stream and pollable host traits.

Bindings traits

After living with the bindings generated underneath crate::preview2::wasi and crate::preview2::wasi::command for a while, I decided that the interface traits belong under crate::preview2::bindings and command belongs under crate::preview2::command.

Beyond that code motion, we now generate async traits for the wasi:io/streams, wasi:poll/poll, and wasi:filesystem/filesystem interfaces, and sync traits for everything else. If you want to use the bindings from an async embedding (i.e. with an async Config), the crate::preview2::bindings::{interface}::add_to_linker will do it, and if you want to use them from a synchronous embedding, use crate::preview2::bindings::sync::{interface}::add_to_linker.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 21 2023 at 21:02):

pchickey updated PR #6556.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 21 2023 at 21:36):

pchickey updated PR #6556.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 21 2023 at 22:10):

pchickey updated PR #6556.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 21 2023 at 22:16):

pchickey has enabled auto merge for PR #6556.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 21 2023 at 22:42):

pchickey updated PR #6556.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 22 2023 at 00:10):

pchickey updated PR #6556.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2023 at 04:26):

pchickey merged PR #6556.


Last updated: Dec 23 2024 at 12:05 UTC