Stream: wasi

Topic: Async Wasm reading


view this post on Zulip Frank Rehwinkel (May 24 2024 at 17:19):

Is there something we can read about the 0.3 async plans (even if it is just source), in terms of the ABI the host would provide to a guest to allow the guest to import async functions?

In particular, I wonder if the guest will be allowed to make multiple async calls to its host from a single task, and then choose between various forms of await: await for any, await for all, await for all as long as there hasn't been an error, and so on. And from the host side, whether there is anything to look at for a host written in Rust that uses Futures to bridge the guest's await state with a runtime.
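
(As a rough illustration of those shapes in today's async Rust, not tied to any actual 0.3 ABI: `host_read_a` / `host_read_b` below are hypothetical stand-ins for imported async host functions, and the "await for any" case is the race/select sketch a little further down.)

```rust
// A minimal sketch (not tied to any actual 0.3 ABI): `host_read_a` and
// `host_read_b` stand in for hypothetical imported async host functions.
use futures::future;

async fn host_read_a() -> Result<Vec<u8>, String> { unimplemented!() }
async fn host_read_b() -> Result<Vec<u8>, String> { unimplemented!() }

async fn demo() {
    // "await for all": start both calls, then wait for both results.
    let (_a, _b) = future::join(host_read_a(), host_read_b()).await;

    // "await for all as long as there hasn't been an error": try_join polls
    // both, short-circuits on the first Err, and drops the other future.
    let _both: Result<(Vec<u8>, Vec<u8>), String> =
        future::try_join(host_read_a(), host_read_b()).await;
}
```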

I'm interested in mapping out how guests would get access to Linux io-uring rings that the host could grant access to, presumably as resources, and wanted to see that the guest would be allowed to create multiple points of starting work before then deciding how it wanted to await them. If an io-uring instance were to be represented as a host provided Resource, the host could allow any number of async operations on it from the one guest, and from other guests too for that matter - if the host were giving a name to the Resource such that more than one guest could name the same Resource.

And then being able to see how Futures (are we allowed to call them that in a language agnostic forum) could be cancelled - by dropping their instance(s) within the guest. For the sake of argument, how a guest, written in Rust, could use a select! like macro, or a package like futures-concurrency for batching asynchronous work and then deciding how to proceed as results start to roll in.
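
(Again purely illustrative, assuming the futures-concurrency crate and the same hypothetical host imports as above; nothing here is the actual Component Model ABI.)

```rust
// Sketch of "await for any" plus cancel-by-drop, using the futures-concurrency
// crate; `host_read_a` / `host_read_b` are the same hypothetical host imports.
use futures_concurrency::prelude::*;

async fn host_read_a() -> Vec<u8> { unimplemented!() }
async fn host_read_b() -> Vec<u8> { unimplemented!() }

async fn demo() {
    // Whichever call finishes first wins; the losing future is dropped, which
    // is the guest-side expression of cancellation. How that drop gets relayed
    // to the host so it can abandon the underlying operation is exactly the
    // ABI question being asked here.
    let _first = (host_read_a(), host_read_b()).race().await;
}
```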

Am also curious whether a guest function, an exported function, can spawn a Future based on the runtime the host is providing. I think it's all related to what the ABI is making available.

view this post on Zulip Lann Martin (May 24 2024 at 17:47):

@Luke Wagner's wasm i/o talk is probably the best conceptual overview: YouTube - A stream of consciousness on the future of async in the Component Model by Luke Wagner @ Wasm I/O 24 (the actual 0.3 async part starts around ~12:30, but it's all helpful context imo)

view this post on Zulip Lann Martin (May 24 2024 at 17:50):

For code, see @Joel Dice's async demo using actual working (but incomplete) implementations from forked repos: https://github.com/dicej/component-async-demo

view this post on Zulip Joel Dice (May 24 2024 at 17:54):

Also, @Luke Wagner is working on a draft design spec, including a detailed (and runnable!) Python program illustrating control flow: https://github.com/WebAssembly/component-model/compare/main...async-pt1

view this post on Zulip Joel Dice (May 24 2024 at 17:56):

We've discussed cancellation quite a bit, and I sketched out some loose (but largely untested) ideas in isyswasfa, but we're currently focused on getting everything else working before we add cancellation to the mix.

view this post on Zulip Joel Dice (May 24 2024 at 18:02):

Regarding io-uring: I think the general idea is that the buffer(s) to be read from or written to would always be allocated in the guest, and the associated guest->host async call(s) would complete once the kernel tells the host about completion (causing the host to tell the guest about completion). I'll admit I don't have any direct experience with io-uring yet, though, so I don't know for sure how the CM async ABI will map to the io-uring rules for buffer ownership (and especially how guest-allocated buffers might be reused). @Luke Wagner can perhaps comment on that.
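
(A rough host-side sketch of that flow using the io-uring crate; the translation from the guest's i32 offset to a host pointer and the notification back to the guest are hand-waved, and none of this is the actual Wasmtime implementation.)

```rust
// Sketch only: submit a read whose destination is a pointer into the guest's
// linear memory, then report completion back to the guest task.
use std::os::unix::io::AsRawFd;

use io_uring::{opcode, types, IoUring};

fn read_into_guest(
    ring: &mut IoUring,
    file: &std::fs::File,
    guest_buf_ptr: *mut u8, // host pointer derived from the guest's i32 offset
    guest_buf_len: u32,
) -> std::io::Result<i32> {
    let read = opcode::Read::new(types::Fd(file.as_raw_fd()), guest_buf_ptr, guest_buf_len)
        .build()
        .user_data(0x17); // would identify the waiting guest task

    // Safety: the guest buffer must stay valid (and the guest instance alive)
    // until the corresponding completion arrives -- the ownership question
    // discussed below.
    unsafe { ring.submission().push(&read).expect("submission queue full") };

    ring.submit_and_wait(1)?;
    let cqe = ring.completion().next().expect("completion queue empty");
    // A real host would deliver this result asynchronously to the guest task
    // identified by user_data rather than returning it synchronously.
    Ok(cqe.result())
}
```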

view this post on Zulip Joel Dice (May 24 2024 at 18:04):

See e.g. https://github.com/dicej/component-async-demo/blob/main/http-echo/src/lib.rs for an example of spawning a task in the guest which continues to execute after the function that spawned it returns.
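
(The shape of that pattern, with all names hypothetical: `spawn` stands in for whatever task-spawning helper the demo's generated bindings expose, and the bodies are elided; see the linked source for the real thing.)

```rust
// All names here are hypothetical stand-ins for what the demo's bindings provide.
use std::future::Future;

fn spawn(_task: impl Future<Output = ()> + 'static) {
    // In the demo this would hand the future to the runtime so it keeps
    // getting polled after the current export returns.
    unimplemented!()
}

async fn copy_request_body_to_response() { /* elided */ }

async fn handle_request() {
    // Return the response headers promptly...
    // ...while the body copy continues running after this export has returned.
    spawn(copy_request_body_to_response());
}
```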

view this post on Zulip Luke Wagner (May 24 2024 at 18:07):

Yep, that's a good summary @Joel Dice. Adding a bit more detail: the guest registers a buffer with a stream returned by an async operation (the buffer being a dynamic i32 pointer into the memory identified by a static memidx); the runtime can then add an operation to the io_uring's request queue, storing a translated pointer into the guest's linear memory, and when the corresponding completion event is read, the runtime can notify the guest that its buffer has been filled in.

view this post on Zulip Joel Dice (May 24 2024 at 18:11):

I should note that the ABI I'm using for streams in https://github.com/dicej/component-async-demo is entirely made up and subject to change -- particularly regarding when buffers are allocated. The overall control flow is probably close to where we'll end up, though.

view this post on Zulip Frank Rehwinkel (May 24 2024 at 21:12):

Well, I wouldn't expect support for io-uring to be baked into the ABI; I just mentioned it because I'm interested in exploring it once the tools here are available, and it's a good example of what makes async interesting for platform developers and end users.

I don't know that the model described above, where the buffer belongs to the guest's linear memory, will fly with most Rust/io-uring folks, unless you are planning more control over when the guest's memory might be reclaimed if the guest is terminated early. Most io-uring frameworks written in Rust go to great pains to pass ownership of the buffer to the io-uring driver, and hence the kernel, so there is no chance the kernel might be writing to memory that has already been repurposed through other heap allocations. That buffer ownership model is what has kept io-uring support from being slipped into the existing Tokio runtime: Tokio I/O was built on the idea that a reference to the buffer could be passed to reads and writes, because the buffer is only actually used once the OS has indicated the fd is ready, so the read or write is effectively done synchronously. The io-uring interface is very different: whether or not a syscall is used to hand the buffer address to the kernel, there is nothing a thread blocks on while the kernel reads or writes to the buffer, so the application could be doing anything with that memory if it hadn't given up ownership - for example, the Future that was using it could be cancelled, at which point the buffer's drop function runs. With a design that isn't careful, the same hazard shows up here: a Wasm guest could be dropped, or the guest's Future cancelled, while the kernel is still writing into that memory.
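
(For reference, the ownership-passing style described here is what e.g. tokio-uring's by-value buffer API looks like; a minimal sketch, assuming a Linux host with the tokio-uring crate and a placeholder file name.)

```rust
// Sketch of the "pass ownership of the buffer to the driver" style, using
// tokio-uring's by-value buffer API (Linux only).
use tokio_uring::fs::File;

fn main() -> std::io::Result<()> {
    tokio_uring::start(async {
        let file = File::open("data.bin").await?; // placeholder path

        // The buffer is moved into the operation; we only get it back (together
        // with the result) once the kernel is done with it, so dropping the
        // future early cannot leave the kernel writing into freed memory.
        let buf = vec![0u8; 4096];
        let (res, buf) = file.read_at(buf, 0).await;
        let n = res?;
        println!("read {} bytes: {:?}", n, &buf[..n]);
        Ok(())
    })
}
```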

And that is only one of the ways io-uring can use a buffer - in that form the operation is handed a user-space buffer address, which doesn't scale well for servers that may want to service thousands of connections at a time. An even better buffer model io-uring provides is to preload it with a pool of buffers; the pool is assigned a pool-id, and another form of the read operations takes a pool-id as input. So the kernel already owns the memory it is going to write into (for the read operation), and when it is done, the buffer id from the pool is passed back to the user process so it knows which buffer the kernel picked. Once the user process has finished with the buffer, the buffer id is returned to the kernel's pool. In this way, the kernel can have thousands of TCP connections open but only use a pool of tens or hundreds of preallocated buffers. Here the buffer would likely be copied into the guest so it can be freed promptly, or would be made available as a Resource. I've wondered whether, if the buffer is known to contain a serialized piece of data, like an Rkyv archive message, the deserialize code could live on the guest side, but that might not square with the Wasm memory model. So a copy from the io-uring buffer in host space to an allocated buffer in guest space - it's not the worst solution.
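
(A toy model of that provided-buffer handshake, just to pin down the bookkeeping; it makes no real io_uring calls, and the `BufferPool` type is invented purely for illustration.)

```rust
// Toy model of the pool handshake described above: buffers are lent to the
// kernel up front, completions report which buffer id was filled, and the
// application hands the id back when it is done.
struct BufferPool {
    bufs: Vec<Vec<u8>>,
    lent_to_kernel: Vec<u16>, // buffer ids the kernel may currently write into
}

impl BufferPool {
    fn new(count: u16, size: usize) -> Self {
        Self {
            bufs: (0..count).map(|_| vec![0u8; size]).collect(),
            lent_to_kernel: (0..count).collect(),
        }
    }

    // Completion handling: the kernel reports which buffer id it filled; that
    // buffer now belongs to the application until it is released.
    fn on_completion(&mut self, buffer_id: u16) -> &[u8] {
        self.lent_to_kernel.retain(|id| *id != buffer_id);
        &self.bufs[buffer_id as usize]
    }

    // Hand the buffer id back so the kernel can reuse it for another read.
    fn release(&mut self, buffer_id: u16) {
        self.lent_to_kernel.push(buffer_id);
    }
}
```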

And there are many reasons for making a copy, so the trick for a given application, with a particular kind of TCP or UDP message coming in from the line, will be to pick the model that suits it best.

So looking through the links provided above to see how Futures (or whatever the scheduled units of work end up being called) work on the guest and host sides should go a long way toward understanding what will be possible and how they are represented across the ABI.

No-one has said the system being evolved will preclude multiple async operations from being run concurrently by the guest to the same Resource, so that's nice.

view this post on Zulip Luke Wagner (May 24 2024 at 23:28):

@Frank Rehwinkel Regarding "ownership" of the memory during the async op, I think there are different meanings of the word "ownership" here and we might actually be agreeing on your core point. When I said the memory is "owned" by the guest, I just meant that the memory passed into the io_uring operation could be a pointer directly into the guest's linear memory (as opposed to host memory that then needs a subsequent copy into guest linear memory). But once a pointer into this linear memory is passed into the kernel (via io_uring or, on Windows, Overlapped I/O), a separate question is: what happens if the guest wants to cancel the I/O operation - can the guest do this synchronously, or does the guest have to wait until the cancellation is acknowledged by the kernel? I think the latter is what's required by io_uring, Overlapped I/O, and other async frameworks, so that's what we should do in the design of the C-M cancel. For languages or contexts (say a C++ dtor) where cancellation must be sync, there would still be an option to synchronously wait for cancellation to finish (which will still allow tasks in other components to make progress). With this async cancellation design, you could sort of say that the guest-selected memory is "owned" by the runtime (and kernel) for the duration of the async operation (although it's important to note that the wasm guest would still be physically able to read and write this memory in the interim without trapping, even if it "shouldn't").
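
(A small illustration of that "request cancellation, then wait for the acknowledgement" shape; `request_cancel` and `wait_until_cancelled` are hypothetical runtime calls, not part of any existing ABI.)

```rust
// Hypothetical guest-side guard for an in-flight async operation.
struct InFlightOp {
    handle: u32,
}

fn request_cancel(_handle: u32) { /* tell the runtime to start cancelling */ }
fn wait_until_cancelled(_handle: u32) { /* park until the kernel acknowledges */ }

impl Drop for InFlightOp {
    fn drop(&mut self) {
        // A destructor cannot await, so if the operation is still in flight we
        // ask for cancellation and then wait synchronously -- the guest must
        // not reuse the buffer until the kernel has let go of it.
        request_cancel(self.handle);
        wait_until_cancelled(self.handle);
    }
}
```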

Regarding the buffer pools: yes that makes sense as an optimization and I've pondered it a bit in the background. Although probably not in the initial release, I've been imagining the stream ABI could allow a component to eagerly create pools of buffers that could then be read into from one or more streams that the component is reading from. That being said, if a host has a large number of components all independently reading from a large number of (necessarily-independent) streams, the host wouldn't be able to have a single pool of all the components' buffers since it's necessary for a read from a stream to only write into a buffer of the component reading that specific stream. I guess an alternative is to have a global pool of host-owned buffers and do the extra copy into guest memory... it's a question of what is net better performance.

