Stream: wasmtime

Topic: Component Model: Passing Large Buffers


Andrew Fields (Feb 03 2024 at 14:46):

Hello all,

I am using Rust and WASM to implement a simulation engine. In short, the Rust host process handles graphics, config, etc. and WASM modules are doing the processing. My current architecture would call for modules to process many "large" (many MB) buffers as quickly as possible.

My problem therefore is passing these buffers around between the WASM modules. Resource types solve the main problem of being able to pass a buffer around by reference. But now my problem is: how do you get the data out of a Resource in an efficient manner? It seems you either have to play nice with the type system (get one record at a time or return an _owned_ list, which requires copying) or you return a memory offset and calculate the pointer in the module.

To highlight this, here's a toy Buffer WIT definition showing a few different ways to get at the data in a resource:

interface buffer-resource {
  resource buffer {
    constructor();
    get-single: func() -> float32;
    get-chunk: func() -> list<float32>;
    get-chunk-pointer: func() -> u32;
  }
}

If you try to implement this on the Host, you get something like this:

impl HostBuffer for BufferManager {
    fn new(&mut self) -> wasmtime::Result<Resource<Buffer>> {
        todo!()
    }

    fn get_single(
        &mut self,
        self_: Resource<Buffer>,
    ) -> wasmtime::Result<f32> {
        todo!()
    }

    fn get_chunk(
        &mut self,
        self_: Resource<Buffer>,
    ) -> wasmtime::Result<Vec<f32>> {
        todo!()
    }

    fn get_chunk_pointer(
        &mut self,
        self_: Resource<Buffer>,
    ) -> wasmtime::Result<u32> {
        todo!()
    }

    fn drop(&mut self, rep: Resource<Buffer>) -> wasmtime::Result<()> {
        todo!()
    }
}

Again, I'm calling out the owned Vec<f32>.
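To make that cost concrete, here is a minimal std-only sketch contrasting an owned return with a borrowed one. It does not use wasmtime or wit-bindgen; the `Buffer` struct, both accessor names, and `shares_storage` are hypothetical stand-ins invented for illustration.

```rust
/// Hypothetical stand-in for the data behind a `buffer` resource.
struct Buffer {
    data: Vec<f32>,
}

impl Buffer {
    /// Mirrors the shape of the generated `get_chunk`: the caller receives
    /// an owned Vec, so every element is copied into a fresh allocation.
    fn get_chunk_owned(&self) -> Vec<f32> {
        self.data.clone()
    }

    /// What a zero-copy accessor looks like in plain Rust: a borrowed view
    /// whose lifetime is tied to the buffer it came from.
    fn get_chunk_borrowed(&self) -> &[f32] {
        &self.data
    }
}

/// Returns true if `chunk` starts at the same address as the buffer's storage.
fn shares_storage(buffer: &Buffer, chunk: &[f32]) -> bool {
    std::ptr::eq(buffer.data.as_ptr(), chunk.as_ptr())
}
```

For a non-empty buffer, `shares_storage` is false for the owned chunk and true for the borrowed one. The borrowed form is the one that a `list<float32>` return type cannot express.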

If trying to implement this in the guest, you get this:

impl GuestBuffer for Buffer {
    fn new() -> Self {
        todo!()
    }

    fn get_single(&self,) -> f32 {
        todo!()
    }

    fn get_chunk(&self,) -> wit_bindgen::rt::vec::Vec::<f32> {
        todo!()
    }

    fn get_chunk_pointer(&self,) -> u32 {
        todo!()
    }
}

What I _think_ the "best" way for my goals is to do the following:

  1. implement a buffer resource type on the Host
  2. allocate data from linear memory
  3. return memory offsets
  4. make a library that all of the processing modules can use to do the offset -> slice unsafe conversion
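Step 4 could be sketched roughly as follows, assuming the raw bytes of linear memory are reachable as a `&[u8]` in the first place. The helper name `f32_slice_at` is hypothetical. The sketch also assumes a little-endian host, since the bytes are reinterpreted in place rather than decoded.

```rust
use std::mem::{align_of, size_of};

/// Reinterpret `len` f32 values starting at byte `offset` of `memory`.
/// Returns None if the range is out of bounds or the start is misaligned.
fn f32_slice_at(memory: &[u8], offset: usize, len: usize) -> Option<&[f32]> {
    // Checked arithmetic so a hostile offset or length cannot wrap around.
    let byte_len = len.checked_mul(size_of::<f32>())?;
    let end = offset.checked_add(byte_len)?;
    let bytes = memory.get(offset..end)?;

    // from_raw_parts requires the pointer to be aligned for f32.
    if bytes.as_ptr() as usize % align_of::<f32>() != 0 {
        return None;
    }

    // SAFETY: the range is in bounds, the pointer is aligned, every bit
    // pattern is a valid f32, and the result borrows from `memory`.
    Some(unsafe { std::slice::from_raw_parts(bytes.as_ptr().cast::<f32>(), len) })
}
```

Keeping the single `unsafe` block behind a checked function like this is what would let the processing modules share it as a library instead of each repeating the pointer arithmetic.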

However, I would really like to play nicer with the type system in the Component Model and avoid unsafe code. I would love to get some feedback on whether this is the right way or if there is some other approach that would be more idiomatic.

Thank you!

Alex Crichton (Feb 03 2024 at 17:59):

If your use case requires quickly sending MB of arrays and the current implementations aren't fast enough, then the component model may not be a great fit for your use case right now. You've already found the "main" way of doing this, which is to use resource handles, but as you've seen the component model has no primitive notion of a buffer at this time, so the only option is to take/return list<f32>. Also, as you've seen, bindings generators turn that into Vec<f32>, an owned allocation, which implies copies.

It's technically possible to achieve what you want with the component model right now, but it will require you to not use bindings generators for the function that transfers things. For example Wasmtime has wasmtime::component::WasmList which is a list that lives in wasm and isn't copied. In Rust you'd have to export your own function which does not have a post-return to clean up the allocation because you wouldn't return an owned allocation. In that sense it's technically feasible to achieve zero-copy transfers, but it's not easy today and bindings generators are not currently built to enable this.

That's why I say that the component model may not be a great fit for this use case right now. If you're hesitant to hack on bindings generators a copy will be required today. If you feel ok diving into all the details here and hacking on bindings generators and/or writing code that's at the bindings generator level, then you can probably achieve this. It'll require a nontrivial amount of knowledge about how lifting/lowering all interact in the component model.

All that being said this is a use case I'd love to at least personally see enabled, so I'd be happy to help out with questions and guidance if you'd like. One thing I'll point out though:

> What I _think_ the "best" way for my goals is to do the following:

One point to keep in mind is that the Component Model as an abstraction doesn't allow embedders to see the raw whole linear memory of the guest module. In that sense there's not even an unsafe way to implement what you outlined above. The "unsafe" way is WasmList<f32> (plus adding an as_le_slice method for f32, which doesn't currently exist) and then plumbing that through the host bindings. The guest side will need to be handwritten at this time.

Andrew Fields (Feb 03 2024 at 21:58):

Thanks Alex for the detailed response.

I'm definitely interested in figuring out a way to do this and document it. If I get deep enough in the weeds I wouldn't mind contributing to the bindgen projects to make it work.

With that being said, it sounds like the things I need to be looking at are:

  1. Unpacking how the bindgen function references actually call WASM code
  2. Implement as_le_slice for WasmList<f32>
  3. Copying the above bindgen mechanism and returning a WasmList<f32> that does a mem::forget or similar
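One detail behind item 2 is byte order: wasm linear memory is little-endian regardless of the host. The std-only sketch below assumes a hypothetical accessor that hands back each f32 as raw little-endian bits in a `u32`. The function names are invented for illustration and are not part of Wasmtime.

```rust
/// Decode one f32 from its raw bits as they would sit in wasm linear memory.
/// `u32::from_le` is a no-op on little-endian hosts and a byte swap otherwise.
fn f32_from_le_bits(raw: u32) -> f32 {
    f32::from_bits(u32::from_le(raw))
}

/// Encode an f32 the way wasm memory would hold it (inverse of the above).
fn f32_to_le_bits(value: f32) -> u32 {
    value.to_bits().to_le()
}

/// Hypothetical consumer: sum a borrowed view of little-endian f32 bits
/// without first collecting it into an owned Vec<f32>.
fn sum_le_f32(raw: &[u32]) -> f32 {
    raw.iter().copied().map(f32_from_le_bits).sum()
}
```

Decoding element by element while iterating keeps the data in place, which is the property the borrowed-list approach is after.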

> One point to keep in mind is that the Component Model as an abstraction doesn't allow embedders to see the raw whole linear memory of the guest module.

I was not aware of this. In my prototyping I must have just got lucky when passing either the pointer directly as a u64 or a memory offset as a u32.

Christof Petig (Feb 04 2024 at 06:56):

I feel this is the use case I tried to address by proposing borrow<list<T>> as a return type.
This would enable reusing pre-allocated buffers in the guest without the convention to free the buffer after use.

While this is equivalent to manually returning an address (s32), plus a length here, it still enables other guest languages to make sense of the return value.

Of course a non-owned list is an alien data type in non-system languages. Sadly there also is no good way to express the lifetime of this buffer in wit.

Writing into a list provided by the guest as an argument (a guest side array as the "out" argument to the function) is a more memory safe way to express this, but now the host can't predict the address before the call. The function could still return the number of valid elements.
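In plain Rust terms, the "out" argument pattern described above might look like the following sketch. The `Source` type and `read_into` method are hypothetical names, and this is not generated binding code.

```rust
/// Hypothetical producer holding samples that the caller wants to read.
struct Source {
    samples: Vec<f32>,
    cursor: usize,
}

impl Source {
    /// Copy as many pending samples as fit into the caller-provided `out`
    /// and return how many leading elements of `out` are now valid.
    fn read_into(&mut self, out: &mut [f32]) -> usize {
        let pending = &self.samples[self.cursor..];
        let n = pending.len().min(out.len());
        out[..n].copy_from_slice(&pending[..n]);
        self.cursor += n;
        n
    }
}
```

This still copies each element once, but into storage the caller owns and can reuse across calls, so no allocation is handed across the boundary and nothing needs to be freed afterwards.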

So perhaps a guest side resource is needed controlling the buffer lifetime (the host could retrieve the address via a method returning a borrow<list>, see above) and possibly a pollable would indicate new data valid, but then there is no good way to avoid overwriting data still used on the guest side without a locking mechanism.

It feels like this quickly evolves into a complex mechanism, perhaps working on preview3's stream<T> is the most elegant and near term solution. :thinking:

Christof Petig (Feb 04 2024 at 08:27):

:thinking: perhaps iceoryx2 provides the right abstraction to model this after, I am going to take a closer look there

Christof Petig (Feb 04 2024 at 08:31):

For reference, here is a link to the previous discussion: https://bytecodealliance.zulipchat.com/#narrow/stream/327223-wit-bindgen/topic/borrowing.20records.3F.20.28shared.20data.29/near/379740418

Christof Petig (Feb 04 2024 at 08:54):

It looks like iceoryx2 is different from iceoryx 2.x (which I wrongly linked to). iceoryx2, the faster Rust rewrite of iceoryx, lives at https://github.com/eclipse-iceoryx/iceoryx2


Alex Crichton (Feb 05 2024 at 14:45):

> With that being said, it sounds like the things I need to be looking at are:

That seems about right! Note that you in theory should not need mem::forget since buffers can still be properly managed even without it (e.g. no copies made). I have not yet implemented this, however, so I can't say that for sure.

> I was not aware of this. In my prototyping I must have just got lucky when passing either the pointer directly as a u64 or a memory offset as a u32.

I'll note that there's a distinction between core wasm and components here. Core wasm modules inside of a component can indeed export their memories; it's components themselves that can't, because a component can only export things that exist in the component model, and that does not include memories.

Alex Crichton (Feb 05 2024 at 14:48):

> It feels like this quickly evolves into a complex mechanism, perhaps working on preview3's stream<T> is the most elegant and near term solution.

This is definitely something that, AFAIK, stream<T> is intended to help solve. I'll note, however, that stream<T> doesn't currently have a concrete design along the lines of "here's what you would do to solve this exact problem", so now's the right time to feed in design constraints!


Last updated: Nov 22 2024 at 16:03 UTC