Safety when borrowing a subset of the wasm memory · wasmtime

Stream: wasmtime

Topic: Safety when borrowing a subset of the wasm memory

Benjamin Bouvier (Aug 10 2022 at 15:06):

In our embedding, we're using our own bindgen-like macro to generate bindings for our wasm modules to call into the host. It often happens that the host will read function parameters from the wasm memory, then write a result somewhere else in the wasm memory. In some situations, the wasm memory is thus both mutably borrowed and immutably borrowed (think for instance: streaming results as they're computed in the wasm memory, instead of stashing them before writing all of them at once in the wasm memory result sub-region).

We're trying to do this safely, but that basically requires having several mutable references to the underlying wasm memory, as these gets/sets happen at different places in the code. It seems hard to do safely, as this likely means that we'd need to implement some kind of dynamic checking ourselves to track which subregions of the wasm memory are borrowed at the same time, and panic whenever a region that is written to is borrowed more than once.

Is this a problem others have encountered in practice, and if so, how have you dealt with it?

Pat Hickey (Aug 10 2022 at 15:07):

Wiggle was designed to solve this problem

Pat Hickey (Aug 10 2022 at 15:08):

It has a dynamic borrow checker

Pat Hickey (Aug 10 2022 at 15:08):

It won’t panic, but rather return an error if the borrowing rules are broken, because the input pointers are untrusted (controlled by Wasm)
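
To make the idea concrete, here is a minimal sketch of a dynamic borrow checker over guest-memory regions that returns an error instead of panicking. It is not wiggle's actual implementation or API: `Region`, `BorrowChecker`, `BorrowHandle`, and `BorrowError` are hypothetical names chosen for illustration, and the rules assumed are the usual ones (any number of overlapping shared borrows, but a mutable borrow may not overlap anything).

```rust
use std::collections::HashMap;

/// A byte range [start, start + len) in guest linear memory (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Region {
    pub start: u32,
    pub len: u32,
}

impl Region {
    fn overlaps(&self, other: &Region) -> bool {
        // Widen to u64 so that start + len cannot overflow.
        let (a0, a1) = (self.start as u64, self.start as u64 + self.len as u64);
        let (b0, b1) = (other.start as u64, other.start as u64 + other.len as u64);
        a0 < b1 && b0 < a1
    }
}

/// Errors are returned, never panicked on, because pointers come from the guest.
#[derive(Debug, PartialEq, Eq)]
pub enum BorrowError {
    OutOfBounds(Region),
    AlreadyBorrowed(Region),
}

/// Token identifying an active borrow so that it can be released later.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct BorrowHandle(u64);

pub struct BorrowChecker {
    memory_len: u64,
    next_id: u64,
    shared: HashMap<u64, Region>,
    mutable: HashMap<u64, Region>,
}

impl BorrowChecker {
    pub fn new(memory_len: u64) -> Self {
        BorrowChecker { memory_len, next_id: 0, shared: HashMap::new(), mutable: HashMap::new() }
    }

    fn check_bounds(&self, r: Region) -> Result<(), BorrowError> {
        if r.start as u64 + r.len as u64 > self.memory_len {
            return Err(BorrowError::OutOfBounds(r));
        }
        Ok(())
    }

    fn fresh_id(&mut self) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        id
    }

    /// A shared borrow may overlap other shared borrows, but no mutable one.
    pub fn shared_borrow(&mut self, r: Region) -> Result<BorrowHandle, BorrowError> {
        self.check_bounds(r)?;
        if self.mutable.values().any(|m| m.overlaps(&r)) {
            return Err(BorrowError::AlreadyBorrowed(r));
        }
        let id = self.fresh_id();
        self.shared.insert(id, r);
        Ok(BorrowHandle(id))
    }

    /// A mutable borrow may not overlap any active borrow at all.
    pub fn mut_borrow(&mut self, r: Region) -> Result<BorrowHandle, BorrowError> {
        self.check_bounds(r)?;
        let clash = self.shared.values().chain(self.mutable.values()).any(|b| b.overlaps(&r));
        if clash {
            return Err(BorrowError::AlreadyBorrowed(r));
        }
        let id = self.fresh_id();
        self.mutable.insert(id, r);
        Ok(BorrowHandle(id))
    }

    /// Ends a borrow; releasing an unknown handle is a no-op.
    pub fn release(&mut self, handle: BorrowHandle) {
        self.shared.remove(&handle.0);
        self.mutable.remove(&handle.0);
    }
}
```

In a host call that streams results, the input region would be registered with `shared_borrow` and the output region with `mut_borrow`; if the guest passes overlapping pointers, the call fails with `BorrowError::AlreadyBorrowed` and the host can turn that into a trap or an error code.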

Pat Hickey (Aug 10 2022 at 15:10):

If you are using interfaces that aren’t (or can’t be) defined with witx, you can steal the bits of wiggle that do this work; it should be relatively easy to reuse the GuestPtr parts of the crate without using the proc macro code generator

Benjamin Bouvier (Aug 10 2022 at 15:11):

Thanks, will read more about it.
Ok, so we could use wiggle, directly via rewriting our bindings with wit, or indirectly via integrating wiggle in our code base.

Benjamin Bouvier (Aug 10 2022 at 15:11):

Ah, here we go :-)

Pat Hickey (Aug 10 2022 at 15:22):

Wiggle is witx, not wit

Pat Hickey (Aug 10 2022 at 15:22):

But yeah

Pat Hickey (Aug 10 2022 at 15:23):

(I say that because it’s at a dead end and we likely won’t do any more real work on it, unless it’s to make concessions for adapting legacy stuff to wit)

Pat Hickey (Aug 10 2022 at 15:24):

it's dead as in complete and proven in production and red team tests to be solid, not as in bad, though :)

Benjamin Bouvier (Aug 10 2022 at 16:22):

Ah interesting. Does wit have a similar mechanism built-in?

Alex Crichton (Aug 10 2022 at 19:10):

The wit-bindgen generator is currently able to largely sidestep this, since memory is never simultaneously mutably and immutably borrowed. With the component model no one ever gets a mutable view into memory; it's the glue that manages writing to memory, which means this alias checking is all bypassed

Alex Crichton (Aug 10 2022 at 19:10):

it does mean, however, that host APIs tend to look different
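
A rough sketch of what that shape can look like, under simplifying assumptions: the host function only ever sees a shared view of its arguments and returns an owned value, and a separate piece of glue is the only code that writes to guest memory. This is not wit-bindgen's generated code. `call_to_upper`, `to_upper`, and `GlueError` are hypothetical names, memory is modeled as a plain byte slice, and the caller supplies `ret_ptr` directly, whereas real glue would typically obtain the destination by calling into the guest's allocator.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum GlueError {
    OutOfBounds,
    InvalidUtf8,
}

/// Host implementation: takes a shared view, returns an owned result.
/// It never receives a mutable window into guest memory.
fn to_upper(input: &str) -> String {
    input.to_uppercase()
}

/// Hypothetical glue: reads the argument, calls the host function, then writes
/// the result. Returns the number of bytes written at `ret_ptr`.
pub fn call_to_upper(
    memory: &mut [u8],
    arg_ptr: usize,
    arg_len: usize,
    ret_ptr: usize,
) -> Result<usize, GlueError> {
    let result = {
        // Phase 1: zero-copy, read-only view of the argument bytes.
        let end = arg_ptr.checked_add(arg_len).ok_or(GlueError::OutOfBounds)?;
        let bytes = memory.get(arg_ptr..end).ok_or(GlueError::OutOfBounds)?;
        let text = std::str::from_utf8(bytes).map_err(|_| GlueError::InvalidUtf8)?;
        to_upper(text)
        // The shared borrow of `memory` ends with this block.
    };

    // Phase 2: only now is memory borrowed mutably, and only by the glue.
    let ret_end = ret_ptr.checked_add(result.len()).ok_or(GlueError::OutOfBounds)?;
    let dst = memory.get_mut(ret_ptr..ret_end).ok_or(GlueError::OutOfBounds)?;
    dst.copy_from_slice(result.as_bytes());
    Ok(result.len())
}
```

Because the read phase finishes before the write phase starts, Rust's static borrow checker is enough and no dynamic region tracking is needed. The cost is that the result is materialized as an owned value first, which is why streaming results directly into guest memory does not fit this style.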

Jamey Sharp (Aug 10 2022 at 19:44):

one of the principles of the component model is that a component shouldn't be able to tell whether the other side of an interface is implemented by the host or by another component, and since components can't share memory with each other, the component model prohibits the kind of zero-copy optimization that Benjamin is doing—do I have that right?

Peter Huene (Aug 10 2022 at 19:49):

I believe so.

Alex Crichton (Aug 10 2022 at 19:53):

Well it's somewhat subtle: while you're right about inter-component communication, this I think has to do with host APIs, which are very different. The wit-bindgen bindings do in fact give zero-copy views into strings/list<u8>/etc, it's just that you don't get mutable windows even with the component model since there's basically no safe way to do that.

Peter Huene (Aug 10 2022 at 19:53):

although I will say it doesn't prohibit it, per se, just leaves such machinations up to the embedder to come up with. there's no representation of an "address" in the value types (unlike with the witx attributes), but a number is just a number; the embedder could figure out what linear memory to interact with (unsafely); another component can't access the same memory unless imported (e.g. a "shared everything libc" scheme)

Dan Gohman (Aug 10 2022 at 19:57):

Alternatively, with a stream type, hosts can read from and write to the buffers directly without special conventions.

Peter Huene (Aug 10 2022 at 20:07):

Actually, reading over the shared everything libc explainer again, I guess it's up to the adapter to see that the source and destination memories are the same and pass the lowered pointers straight through without a realloc/copy?

Peter Huene (Aug 10 2022 at 20:08):

for component-to-component (anyway, off topic for what Benjamin is talking about)

Last updated: Apr 09 2025 at 09:03 UTC