Stream: general

Topic: Canonical way to read host memory without copy cost?


view this post on Zulip Xinyu Zeng (Nov 15 2024 at 04:24):

Hi, I am wondering what the canonical way is in wasmtime to read host memory without the copy cost. I think one way is to use an imported function/callback, but section 6 (page 8, right half) of this paper says that is also costly because of the conversion of function parameters and return values. It also mentions some tricks to make different virtual memory regions (e.g. one in the host and the other in Wasm) point to the same physical memory, so the copy is avoided. Is that even possible in wasmtime?

view this post on Zulip Lann Martin (Nov 15 2024 at 13:45):

I'm not sure exactly what you mean by "read host memory"; the wasm sandbox very intentionally only allows guests to read their own memories. If you are looking for a way to avoid copies between guests and the host, currently the only really viable option for most guest languages / toolchains is for the guest to allocate a buffer, pass the buffer offset ("pointer", from the guest's perspective) to the host, and for the host to operate on that buffer directly via e.g. Memory::data_mut.
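A minimal sketch of that pattern, assuming a plain byte vector standing in for the guest's linear memory (in Wasmtime the host would get an equivalent `&mut [u8]` from Memory::data_mut). The helper names `guest_slice_mut` and `host_uppercase` are hypothetical, chosen only for illustration:

```rust
/// Bounds-checked view of a guest buffer inside linear memory.
/// `memory` stands in for the `&mut [u8]` the host obtains for the guest.
fn guest_slice_mut(memory: &mut [u8], ptr: u32, len: u32) -> Option<&mut [u8]> {
    let start = ptr as usize;
    let end = start.checked_add(len as usize)?;
    memory.get_mut(start..end)
}

/// Hypothetical host function: upper-cases the guest's buffer in place.
fn host_uppercase(memory: &mut [u8], ptr: u32, len: u32) -> Result<(), &'static str> {
    let buf = guest_slice_mut(memory, ptr, len).ok_or("guest pointer out of bounds")?;
    buf.make_ascii_uppercase();
    Ok(())
}

fn main() {
    // One 64 KiB Wasm page standing in for the guest's linear memory.
    let mut memory = vec![0u8; 65536];
    // Pretend the guest allocated 5 bytes at offset 1024 and filled them in.
    memory[1024..1029].copy_from_slice(b"hello");
    host_uppercase(&mut memory, 1024, 5).unwrap();
    assert_eq!(&memory[1024..1029], b"HELLO");
    // A bad offset from the guest is rejected instead of panicking.
    assert!(host_uppercase(&mut memory, 65534, 8).is_err());
}
```

The important part is that the guest-supplied offset and length are untrusted, so the host validates them before touching the slice.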

view this post on Zulip Chris Fallin (Nov 15 2024 at 17:15):

@Xinyu Zeng the straightforward answer to your question would be that providing accessors as imports avoids the copy -- and it is asymptotically more efficient if you're reading only a small part of a buffer. This pattern occurs today in ad-hoc ways already -- for example wasi-http has a notion of handles to pieces of requests/responses (like sets of headers) and provides an API to query and mutate these; the whole body doesn't need to cross the boundary.

The cost issue of that (many tiny accessor calls) is a separate question and one that could be optimized if needed. A comment on the paper you link: it's important to understand what's "fundamental" and what's simply a property of an implementation, especially when discussing research papers. In that section they seem to be describing a limitation of V8 and/or the particular ABI they have chosen. But there's no reason that a Wasm engine in theory couldn't inline "accessor" functions across module boundaries, or the host/guest boundary, so that the access is as cheap as a local memory access. One could imagine an opaque reference as a handle in an API and accessors that provide access to parts of it. Wasmtime doesn't support such inlining today, either, but at least in the cross-module case, this is theoretically pretty clear (inlining IR across functions) and probably could be done.

view this post on Zulip Xinyu Zeng (Nov 16 2024 at 06:22):

Thanks a lot for the reply! I will try both approaches. Given the info I guess the "host operating via data_mut" method is the less costly one but it is hard to implement in my use cases.

view this post on Zulip Xinyu Zeng (Nov 26 2024 at 07:50):

Hi @Chris Fallin, could you point me to the wasi-http code that uses imports to avoid the copy?

view this post on Zulip Chris Fallin (Nov 26 2024 at 08:27):

Look at the handling of headers for example — the “fields” resource type is a handle to data on the host side with individual get/set accessors. That’s a general pattern that is useful when you have “sparse” access to external data.

view this post on Zulip Xinyu Zeng (Nov 26 2024 at 10:07):

I am not very familiar with the component model... Maybe a more direct question: IIUC, this way of avoiding copies using imports is only possible for small data, e.g. two i32 values, because the data has to be passed via function return values. Is that correct? I still do not understand how to zero-copy a whole buffer using imports...

view this post on Zulip Lann Martin (Nov 26 2024 at 13:40):

Guest components cannot read each other's memories by design. The only way to do zero-copy in the networking sense today is for the host to directly write to one guest's memory.

view this post on Zulip Joel Dice (Nov 26 2024 at 15:23):

In other words, zero-copy in this case would mean that the first time the data is written to memory (e.g. from the network), it should go straight into an already-allocated buffer that lives in the guest's linear memory.

  1. Allocate a buffer in guest memory (e.g. by calling some kind of malloc-style function exported by the guest). In the component model, the function would be cabi_realloc.
  2. Pass the pointer and length of that buffer to e.g. read(2), recv(2), or a similar call.
  3. Call a guest-exported function with the buffer pointer and length of data that was read
  4. Deallocate the buffer (e.g. by calling some kind of free-style function) or reuse the buffer to read more data.
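The four steps above can be sketched with a toy model, assuming a `Vec<u8>` as linear memory and a bump allocator in place of the guest's real allocator export. The `Guest` type, its methods, and `deliver` are all hypothetical stand-ins, not Wasmtime API:

```rust
use std::io::Read;

/// Toy stand-in for a guest: linear memory plus a bump allocator.
/// In a real component the allocator would be the guest's exported
/// `cabi_realloc`; everything here is a hypothetical model.
struct Guest {
    memory: Vec<u8>,
    next_free: usize,
}

impl Guest {
    fn new(size: usize) -> Self {
        Guest { memory: vec![0; size], next_free: 0 }
    }

    /// Step 1: malloc-style export; returns an offset into linear memory.
    fn alloc(&mut self, len: usize) -> Option<usize> {
        let ptr = self.next_free;
        let end = ptr.checked_add(len)?;
        if end > self.memory.len() {
            return None;
        }
        self.next_free = end;
        Some(ptr)
    }

    /// Step 3: guest export that consumes data already in its memory.
    fn on_data(&self, ptr: usize, len: usize) -> u32 {
        self.memory[ptr..ptr + len].iter().map(|&b| b as u32).sum()
    }

    /// Step 4: free-style export (a bump allocator can only reset).
    fn reset(&mut self) {
        self.next_free = 0;
    }
}

/// Host-side driver: the source writes straight into guest memory.
fn deliver(guest: &mut Guest, source: &mut impl Read, cap: usize) -> std::io::Result<u32> {
    let ptr = guest.alloc(cap).ok_or_else(|| {
        std::io::Error::new(std::io::ErrorKind::Other, "guest allocation failed")
    })?;
    // Step 2: the read(2)/recv(2) equivalent targets the guest buffer itself.
    let n = source.read(&mut guest.memory[ptr..ptr + cap])?;
    let result = guest.on_data(ptr, n);
    guest.reset();
    Ok(result)
}

fn main() -> std::io::Result<()> {
    let mut guest = Guest::new(65536);
    let mut network: &[u8] = &[1, 2, 3, 4]; // `&[u8]` implements `Read`
    let sum = deliver(&mut guest, &mut network, 1024)?;
    assert_eq!(sum, 10);
    Ok(())
}
```

No intermediate host-side buffer exists in this flow: the first write of the data lands in the guest's memory.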

view this post on Zulip Chris Fallin (Nov 26 2024 at 16:46):

The approach I was trying to sketch above -- poorly, sorry, so I'll expand it more here! -- is kind of a path to zero-copy, but to be very clear, it's not something that would be efficient today without more work. @Xinyu Zeng the idea is that even for large buffers, you can expose them only through accessors. Consider, e.g., a "resource handle" for a buffer or string; and an imported function that, given that handle and an index, returns the byte at that index. That gives you access to all the data without copying it in wholesale.
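As a rough illustration of the handle-plus-accessor idea, here is a host-side sketch in plain Rust. The `BufferTable` type and the accessor names `len` and `get_byte` are assumptions made for this example; they are not taken from wasi-http or Wasmtime:

```rust
use std::collections::HashMap;

/// Host-side table mapping opaque handles to buffers that never leave the host.
#[derive(Default)]
struct BufferTable {
    next: u32,
    buffers: HashMap<u32, Vec<u8>>,
}

impl BufferTable {
    /// Registers a host buffer and returns the handle the guest will hold.
    fn insert(&mut self, data: Vec<u8>) -> u32 {
        let handle = self.next;
        self.next += 1;
        self.buffers.insert(handle, data);
        handle
    }

    /// Body of the hypothetical imported accessor `len(handle)`.
    fn len(&self, handle: u32) -> Option<u32> {
        self.buffers.get(&handle).map(|b| b.len() as u32)
    }

    /// Body of the hypothetical imported accessor `get_byte(handle, index)`.
    fn get_byte(&self, handle: u32, index: u32) -> Option<u8> {
        self.buffers.get(&handle)?.get(index as usize).copied()
    }
}

fn main() {
    let mut table = BufferTable::default();
    let h = table.insert(b"large host-side payload".to_vec());
    // The guest holds only `h` and pulls individual bytes on demand.
    assert_eq!(table.get_byte(h, 0), Some(b'l'));
    assert_eq!(table.get_byte(h, 1000), None);
    assert_eq!(table.len(h + 1), None); // unknown handle
}
```

Only scalar values (a handle, an index, a byte) cross the boundary, which is why the per-call overhead matters so much for this design.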

There are a number of tradeoffs though:

What I'm describing is in the same spirit as the "string imports" idea floating around in core Wasm spec discussions, and the idea is visible in a coarse granularity today (I mentioned headers in wasi-http because these are not copied wholesale into the guest; you use accessors to get pieces of them); but it's more of a direction we could follow in building out engine optimizations to make this efficient, and very much not something one can take off the shelf today. Sorry if this wasn't clear before!

view this post on Zulip Xinyu Zeng (Nov 27 2024 at 02:08):

Thanks @Lann Martin and @Joel Dice! I understand this approach, but sometimes it is not easy to apply, e.g. when the API of the data's "producer" already allocates its memory on the host side (say, it returns a Bytes).

view this post on Zulip Xinyu Zeng (Nov 27 2024 at 02:23):

Thanks a lot @Chris Fallin for the explanation. I think I get what you mean. In an ideal world with perfect inlining, this imported function call (the "get byte" accessor) behaves exactly like a raw memory read. Because the guest does not have a "native view" of the buffer, if we want to operate on a (large) host buffer from the guest, we may need some wrapper code that turns every byte read of the buffer into a call to that "get byte" accessor.
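A sketch of what such guest-side wrapper code might look like, assuming the imports are modeled as a trait so the example runs natively. `HostAccessors`, `HostBuf`, and `FakeHost` are hypothetical names; a real guest would call actual Wasm imports instead of a trait object:

```rust
/// What the guest can see: only the imported accessors, never the buffer.
trait HostAccessors {
    fn len(&self, handle: u32) -> u32;
    fn get_byte(&self, handle: u32, index: u32) -> u8;
}

/// Guest-side wrapper that makes a host buffer look like a byte sequence.
struct HostBuf<'a, A: HostAccessors> {
    imports: &'a A,
    handle: u32,
}

impl<'a, A: HostAccessors> HostBuf<'a, A> {
    fn len(&self) -> u32 {
        self.imports.len(self.handle)
    }

    /// Bounds-checked read of one byte; costs two accessor calls.
    fn get(&self, index: u32) -> Option<u8> {
        (index < self.len()).then(|| self.imports.get_byte(self.handle, index))
    }

    /// Every item yielded is one accessor call across the boundary.
    fn iter(&self) -> impl Iterator<Item = u8> + '_ {
        (0..self.len()).map(move |i| self.imports.get_byte(self.handle, i))
    }
}

/// Stand-in for the host so the sketch can run outside a Wasm engine.
struct FakeHost(Vec<u8>);

impl HostAccessors for FakeHost {
    fn len(&self, _handle: u32) -> u32 {
        self.0.len() as u32
    }
    fn get_byte(&self, _handle: u32, index: u32) -> u8 {
        self.0[index as usize]
    }
}

fn main() {
    let host = FakeHost(b"key=value".to_vec());
    let buf = HostBuf { imports: &host, handle: 0 };
    // Ordinary iterator code; each step is a boundary crossing.
    let eq = buf.iter().position(|b| b == b'=');
    assert_eq!(eq, Some(3));
    assert_eq!(buf.get(9), None);
}
```

Without cross-boundary inlining in the engine, each `get_byte` here is a full call, so code written this way is convenient but not cheap today.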

view this post on Zulip raskyld (Dec 15 2024 at 09:33):

A bit late to the thread, but you may be interested in these issues:

@Christof Petig may have some insights into that as well, since he has been actively working on building a Component Model abstraction over mapping part of the guests' linear memory to host memory, IIUC the approach.

Introduction and Context There have been a number of conversations and questions raised about the possibility to share subsets of memory between multiple WebAssembly instances and the host environm...
This all started with defining zero copy shared memory over a WIT interface (channel is WIT resource, inspired by iceoryx2): let channel = Channel_u32::new("topic"); loop { let message = channel.al...

Last updated: Dec 23 2024 at 12:05 UTC