Is there something we can read about the 0.3 async plans (even if it is just source), in terms of the ABI the host would provide to a guest to allow the guest to import async functions?
In particular, I wonder if the guest will be allowed to make multiple async calls to its host from a single task, and then use various forms of await: await any, await all, await all as long as there hasn't been an error, and so on. And from the host side, whether there is anything to look at when the host is written in Rust and uses Futures to bridge the guest's await state with a runtime.
I'm interested in mapping out how guests would get access to Linux io-uring rings that the host could grant, presumably as resources, and wanted to confirm that the guest would be allowed to start multiple pieces of work before deciding how it wants to await them. If an io-uring instance were represented as a host-provided Resource, the host could allow any number of async operations on it from one guest, and from other guests too for that matter, if the host gave the Resource a name that more than one guest could refer to.
And then being able to see how Futures (are we allowed to call them that in a language-agnostic forum?) could be cancelled by dropping their instance(s) within the guest. For the sake of argument, how a guest written in Rust could use a select!-like macro, or a package like futures-concurrency, to batch asynchronous work and then decide how to proceed as results start to roll in.
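To make that concrete, here is a rough sketch of the guest-side shapes I have in mind, written against plain Rust and the futures crate; `host_fetch` is a made-up stand-in for an imported async host function, not anything the ABI actually defines:

```rust
use futures::future::{join_all, select_all, try_join_all};

// Stand-in for an imported async host call; name and signature are made up.
async fn host_fetch(_id: u32) -> Result<Vec<u8>, String> {
    unimplemented!()
}

async fn demo() -> Result<(), String> {
    // Await for any: the first call to complete wins
    // (select_all wants Unpin futures, hence Box::pin).
    let calls: Vec<_> = (1u32..=3).map(|i| Box::pin(host_fetch(i))).collect();
    let (_first, _index, _remaining) = select_all(calls).await;

    // Await for all: collect every result, errors included.
    let _all = join_all((1u32..=3).map(host_fetch)).await;

    // Await for all as long as there hasn't been an error: stop at the first Err.
    let _values = try_join_all((1u32..=3).map(host_fetch)).await?;

    Ok(())
}
```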
I'm also curious whether a guest function (an exported function) can spawn a Future on the runtime the host is providing. I think it all comes down to what the ABI makes available.
@Luke Wagner's Wasm I/O talk is probably the best conceptual overview: YouTube - A stream of consciousness on the future of async in the Component Model by Luke Wagner @ Wasm I/O 24 (the actual 0.3 async part starts around ~12:30, but it's all helpful context imo)
For code, see @Joel Dice's async demo using actual working (but incomplete) implementations from forked repos: https://github.com/dicej/component-async-demo
Also, @Luke Wagner is working on a draft design spec, including a detailed (and runnable!) Python program illustrating control flow: https://github.com/WebAssembly/component-model/compare/main...async-pt1
We've discussed cancellation quite a bit, and I sketched out some loose (but largely untested) ideas in isyswasfa, but we're currently focused on getting everything else working before we add cancellation to the mix.
Regarding io-uring: I think the general idea is that the buffer(s) to be read from or written to would always be allocated in the guest, and the associated guest->host async call(s) would complete once the kernel tells the host about completion (causing the host to tell the guest about completion). I'll admit I don't have any direct experience with io-uring yet, though, so I don't know for sure how the CM async ABI will map to the io-uring rules for buffer ownership (and especially how guest-allocated buffers might be reused). @Luke Wagner can perhaps comment on that.
See e.g. https://github.com/dicej/component-async-demo/blob/main/http-echo/src/lib.rs for an example of spawning a task in the guest which continues to execute after the function that spawned it returns.
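For anyone skimming, the shape of that example is roughly the following; the types and the spawn helper here are stand-ins so the sketch stands alone, not the demo's actual generated bindings:

```rust
use std::future::Future;

// Stand-in types for the demo's generated bindings.
struct Request;
struct Response;
struct BodyWriter;

impl Response {
    fn new() -> (Response, BodyWriter) {
        (Response, BodyWriter)
    }
}

// Stand-in for "spawn a task that outlives this export call",
// provided by the guest-side runtime support in the demo.
fn spawn(_task: impl Future<Output = ()> + 'static) {}

async fn copy_body(_req: Request, _out: BodyWriter) {
    // stream the request body into the response body
}

// Exported guest function: it returns the response right away, while the
// spawned task keeps copying the body in the background.
async fn handle(request: Request) -> Response {
    let (response, body_writer) = Response::new();
    spawn(async move { copy_body(request, body_writer).await });
    response
}
```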
Yep, that's a good summary @Joel Dice. Adding a bit more detail: the guest registers a buffer with a stream returned by an async operation (with the buffer represented as a dynamic i32 pointer and the memory identified by a static memidx), then the runtime can submit an operation to the io_uring's submission queue, storing a translated pointer into the guest's linear memory, and when the corresponding completion event is read, the runtime can notify the guest that its buffer has been filled in.
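A very rough host-side sketch of that flow, using the io_uring crate; here a plain Vec stands in for the guest's linear memory, and the offset/length are made-up stand-ins for what the guest would pass across the ABI (a real host would translate against the instance's exported memory instead):

```rust
use io_uring::{opcode, types, IoUring};
use std::{fs::File, os::unix::io::AsRawFd};

fn main() -> std::io::Result<()> {
    let mut ring = IoUring::new(8)?;
    let file = File::open("/etc/hostname")?;

    let mut linear_memory = vec![0u8; 64 * 1024]; // stand-in for guest linear memory
    let (guest_buf_offset, guest_buf_len) = (4096usize, 1024u32); // what the guest handed over

    // Translate the guest's (memidx, i32 offset) into a host pointer and queue a read.
    let host_ptr = unsafe { linear_memory.as_mut_ptr().add(guest_buf_offset) };
    let sqe = opcode::Read::new(types::Fd(file.as_raw_fd()), host_ptr, guest_buf_len)
        .build()
        .user_data(0x17); // tag so the completion can be matched back to the guest task
    unsafe { ring.submission().push(&sqe).expect("submission queue full") };

    ring.submit_and_wait(1)?;
    let cqe = ring.completion().next().expect("completion queue entry");
    println!("read {} bytes for task {}", cqe.result(), cqe.user_data());
    // ...at this point the host would notify the guest that its buffer is filled in.
    Ok(())
}
```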
I should note that the ABI I'm using for streams in https://github.com/dicej/component-async-demo is entirely made up and subject to change -- particularly regarding when buffers are allocated. The overall control flow is probably close to where we'll end up, though.
Well, I wouldn't expect support for io-uring to be baked into the ABI; I just mentioned it because I'm interested in exploring it when the tools here are available, and it's a good example of what makes async interesting for platform developers and end users.
I don't know that the model described above, where the buffer belongs to the guest's linear memory, will fly with most Rust/io-uring folks, unless you are planning more control over when the guest's memory might be reclaimed if the guest is terminated early. Most io-uring frameworks written in Rust go to great pains to pass ownership of the buffer to the io-uring driver, and hence the kernel, so there is no chance the kernel might be writing to memory that has already been repurposed through other heap allocations. That buffer ownership model is what has kept io-uring support from being slipped into the existing Tokio runtime: Tokio I/O has been built on the idea that a reference to the buffer can be passed to reads and writes, because the buffer is only actually used once the OS has indicated the fd is ready, so the read or write is actually done synchronously.

The io-uring interface is very different. Whether or not a syscall is used to give the buffer address to the kernel, there is nothing a thread blocks on while the kernel reads or writes the buffer, so the application could be doing anything with that memory if it hadn't given up ownership - for example, the Future that was using it could be cancelled and the buffer's drop function called. With a design that isn't careful, a Wasm guest could be dropped, or the guest's Future could be cancelled, while the kernel still holds a pointer into that memory.
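For readers who haven't used those frameworks, the API shape they converge on looks roughly like this; the type here is hypothetical, not any particular crate, but it mirrors the "buffer moves in, buffer comes back with the result" pattern:

```rust
// Hypothetical API shape: the read takes the buffer by value and only hands it
// back together with the result, so the caller cannot free or reuse it while
// the kernel still holds its address.
struct UringFile;

impl UringFile {
    async fn read_at(&self, buf: Vec<u8>, _pos: u64) -> (std::io::Result<usize>, Vec<u8>) {
        // A real implementation would submit an SQE and, if this future were
        // dropped before the CQE arrives, park the buffer in the driver until
        // the kernel is finished with it.
        (Ok(0), buf)
    }
}

async fn example(file: &UringFile) -> std::io::Result<Vec<u8>> {
    let buf = vec![0u8; 4096];
    let (result, mut buf) = file.read_at(buf, 0).await; // buffer comes back with the result
    let n = result?;
    buf.truncate(n);
    Ok(buf)
}
```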
And that is only one of the ways io-uring can use a buffer: in that model the operation is handed a user-space buffer address, which doesn't scale well for servers that may want to service thousands of connections at a time. An even better buffer model io-uring provides is that it is preloaded with a pool of buffers, the pool is assigned a pool id, and another form of the read operations takes a pool id as input. The kernel then already owns the memory it is going to write into (for the read operation), and when it is done, the buffer id within the pool is passed back to the user process so it knows which buffer the kernel picked. Once the user process has finished with the buffer, the buffer id is returned to the kernel's pool. In this way, the kernel can have thousands of TCP connections open but only use a pool of tens or hundreds of preallocated buffers.

Here the buffer would likely be copied into the guest so it can be freed, or would be made available as a Resource. I've wondered whether, if the buffer is known to contain a serialized piece of data, like an Rkyv archive message, the deserialize code could run on the guest side, but that might not square with the Wasm memory model. So a copy from the io-uring buffer in host space to an allocated buffer in guest space - it's not the worst solution.
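In code, that pool flow looks roughly like this; the types are purely illustrative stand-ins for io-uring's provided-buffer machinery, not its actual interface:

```rust
// Illustrative stand-ins only: a real host would use io-uring's
// provide-buffers / buffer-ring machinery rather than these stubs.
struct BufferPool {
    group_id: u16,
    buffers: Vec<Vec<u8>>,
}

struct Completion {
    result: i32,    // bytes read (or negative errno)
    buffer_id: u16, // which pool buffer the kernel picked
}

impl BufferPool {
    // Hand the whole pool to the kernel up front under one group id.
    fn register(group_id: u16, count: usize, size: usize) -> Self {
        BufferPool { group_id, buffers: vec![vec![0u8; size]; count] }
    }
    // The kernel picked a buffer for us; borrow it by the id from the CQE.
    fn take(&mut self, id: u16) -> &mut Vec<u8> {
        &mut self.buffers[id as usize]
    }
    // Done with it: make the buffer available to the kernel again.
    fn recycle(&mut self, _id: u16) {
        // re-provide the buffer to the ring
    }
}

fn on_completion(pool: &mut BufferPool, cqe: Completion) {
    let n = cqe.result.max(0) as usize;
    {
        let data = &pool.take(cqe.buffer_id)[..n];
        // ...copy (or otherwise expose) `data` to the guest here...
        let _ = data.len();
    }
    pool.recycle(cqe.buffer_id);
}
```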
And there are many reasons for making a copy, so the trick for a given application, with a particular kind of TCP or UDP message coming in off the wire, will be to pick the model that suits it best.
So looking through the links provided above, to see how Futures (or whatever they end up being called for scheduled work) behave on the guest and host sides, should go a long way toward understanding what will be possible and how Futures are represented across the ABI.
No one has said the system being evolved will preclude multiple async operations from being run concurrently by the guest against the same Resource, so that's nice.
@Frank Rehwinkel Regarding "ownership" of the memory during the async op: I think there are different meanings of the word "ownership" here, and we might actually be agreeing on your core point. When I said the memory is "owned" by the guest, I just meant that the memory passed into the io_uring operation could be a pointer directly into the guest's linear memory (as opposed to host memory that then needs a subsequent copy into guest linear memory).

But once a pointer into this linear memory is passed into the kernel (via io_uring or, on Windows, Overlapped I/O), a separate question is what happens if the guest wants to cancel the I/O operation: can the guest do this synchronously, or does the guest have to wait until the cancellation is acknowledged by the kernel? I think the latter is what's required by io_uring, Overlapped I/O, and other async frameworks, so that's what we should do in the design of the C-M cancel. For languages or contexts (say a C++ dtor) where cancellation must be sync, there would still be an option to synchronously wait for cancellation to finish (which will still allow tasks in other components to make progress). With this async cancellation design, you could sort of say that the guest-selected memory is "owned" by the runtime (and kernel) for the duration of the async operation (although it's important to note that the wasm guest would still be physically able to read and write this memory in the interim without trapping, even if it "shouldn't").
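To put the "cancellation is itself asynchronous" point in code, here is a hypothetical guest-side shape; every name here is made up for illustration and nothing in it reflects the actual C-M design:

```rust
// Hypothetical binding shape: cancellation is a request that must itself be
// awaited until the host/kernel acknowledges it.
struct InFlightRead {
    // handle into the host's table of pending operations (illustrative only)
}

enum CancelOutcome {
    Cancelled,             // the operation was stopped before completing
    AlreadyCompleted(u32), // it raced with completion; here is the byte count
}

impl InFlightRead {
    // Ask the host to cancel; resolves only once the kernel has confirmed it is
    // no longer touching the buffer (either cancelled or already completed).
    async fn cancel(self) -> CancelOutcome {
        CancelOutcome::Cancelled
    }
}

async fn give_up_on(read: InFlightRead) -> CancelOutcome {
    let outcome = read.cancel().await;
    // Only once cancel() has resolved is it safe to reuse or free the buffer.
    outcome
}
```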
Regarding the buffer pools: yes that makes sense as an optimization and I've pondered it a bit in the background. Although probably not in the initial release, I've been imagining the stream ABI could allow a component to eagerly create pools of buffers that could then be read into from one or more streams that the component is reading from. That being said, if a host has a large number of components all independently reading from a large number of (necessarily-independent) streams, the host wouldn't be able to have a single pool of all the components' buffers since it's necessary for a read from a stream to only write into a buffer of the component reading that specific stream. I guess an alternative is to have a global pool of host-owned buffers and do the extra copy into guest memory... it's a question of what is net better performance.