Stream: git-wasmtime

Topic: wasmtime / issue #7946 `LinearMemory` thread-safety clari...


view this post on Zulip Wasmtime GitHub notifications bot (Feb 15 2024 at 18:00):

V0ldek opened issue #7946:

I am trying to write my own LinearMemory implementation. The issue that I am facing is that it appears the as_ptr function is inherently incompatible with the Sync trait bound.

In essence, as_ptr allows interior mutability, since it's a &self function that returns a mut pointer to data owned by the memory. Interior mutability is explicitly not Sync, though.

Assume T: LinearMemory. It must also be Sync, so &T must be safely shareable between threads. However, then two threads can call as_ptr, _which is a safe function_, and obtain *mut T. At that point we're sharing mutable pointers concurrently and all bets are off. So any useful T cannot be Sync in my mind.

As far as I can grok what the runtime is doing with the memory, it only uses LinearMemory implementations from a single thread as long as wasm threading and SharedMemory is not involved. So it kinda seems like I am required to lie to the Rust compiler that my type is Sync and then trust wasmtime that it will not do anything nasty with it.

I looked at the implementation of MmapMemory used by the actual runtime, and underneath in sys::unix::mmap::Mmap you just wrap the raw pointer in SendSyncPtr and pretend it's Sync, so I am assuming that this is what would be expected from someone implementing LinearMemory from the outside as well.

Here are my questions:

  1. First, am I even correct above or is there something I'm missing?
  2. Could we get this documented in LinearMemory to guide implementors? For example, is it correct to assume that as_ptr will never be called from different threads on the same instance if wasm threads and shared memory are not involved? If yes, then at least I can rest assured that as long as I use my T: LinearMemory only in a single-threaded context with single-threaded wasmtime nothing bad will happen.
  3. Can this Sync requirement be lifted altogether? It seems that since LinearMemory inherently requires interior mutability it'd make sense for it to _not_ be Sync, and instead have wasmtime handle it unsafely itself, for example by wrapping it in a type with unsafe impl Sync that is internal to the runtime. That way it'd guaranteed that thread madness can only happen inside wasmtime.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 15 2024 at 18:00):

V0ldek edited issue #7946:

I am trying to write my own LinearMemory implementation. The issue that I am facing is that it appears the as_ptr function is inherently incompatible with the Sync trait bound. In essence, as_ptr allows interior mutability, since it's a &self function that returns a mut pointer to data owned by the memory. Interior mutability is explicitly not Sync, though.

Assume T: LinearMemory. It must also be Sync, so &T must be safely shareable between threads. However, then two threads can call as_ptr, _which is a safe function_, and obtain *mut T. At that point we're sharing mutable pointers concurrently and all bets are off. So any useful T cannot be Sync in my mind.

As far as I can grok what the runtime is doing with the memory, it only uses LinearMemory implementations from a single thread as long as wasm threading and SharedMemory is not involved. So it kinda seems like I am required to lie to the Rust compiler that my type is Sync and then trust wasmtime that it will not do anything nasty with it.

I looked at the implementation of MmapMemory used by the actual runtime, and underneath in sys::unix::mmap::Mmap you just wrap the raw pointer in SendSyncPtr and pretend it's Sync, so I am assuming that this is what would be expected from someone implementing LinearMemory from the outside as well.

Here are my questions:

  1. First, am I even correct above or is there something I'm missing?
  2. Could we get this documented in LinearMemory to guide implementors? For example, is it correct to assume that as_ptr will never be called from different threads on the same instance if wasm threads and shared memory are not involved? If yes, then at least I can rest assured that as long as I use my T: LinearMemory only in a single-threaded context with single-threaded wasmtime nothing bad will happen.
  3. Can this Sync requirement be lifted altogether? It seems that since LinearMemory inherently requires interior mutability it'd make sense for it to _not_ be Sync, and instead have wasmtime handle it unsafely itself, for example by wrapping it in a type with unsafe impl Sync that is internal to the runtime. That way it'd guaranteed that thread madness can only happen inside wasmtime.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 15 2024 at 18:08):

V0ldek commented on issue #7946:

BTW the "Memory Safety and Threads" section starts with

Currently the wasmtime crate does not implement the wasm threads proposal, but it is planned to do so.

I think this is outdated now, there is threads support :)

view this post on Zulip Wasmtime GitHub notifications bot (Feb 15 2024 at 19:42):

alexcrichton commented on issue #7946:

First, am I even correct above or is there something I'm missing?

You're both correct and incorrect a bit, I can try to help clear this up a bit. I'll note that I'm no expert in unsafe Rust so what follows is my own personal understanding. I may have some exact specifics slightly off, but I think the high-level is right.

To Rust *const T and *mut T are the same in terms of semantic guarantees. You can read from both and while the compiler requires *mut T to write you can safely cast *const T to *mut T so you can sort of write through a *const T as well. In that sense when you say:

At that point we're sharing mutable pointers concurrently and all bets are off

I believe that this is incorrect. If we were talking about &mut T then I believe your statement is correct, but *mut T is different in this regard.

The way I sort of think of it is that &mut T is statically safe and *mut T must be "runtime safe". When a *mut T is mutated it must, at that time, be the only mutator. When *mut T isn't actually being accessed though you can have as many floating around as you'd like.

Could we get this documented in LinearMemory to guide implementors?

Definitely makes sense to me to improve the documentation here!

For example, is it correct to assume that as_ptr will never be called from different threads on the same instance if wasm threads and shared memory are not involved?

To answer this: probably not. The Memory::data API only requires &Store<T> which means that it can be called concurrently on many threads. This exact API does not literally call LinearMemory::as_ptr but it theoretically could from the runtime's perspective. So you shouldn't rely on being only called on one thread at a time, even when wasm threads aren't involved.

Note, though, that you can create *mut u8 from &[u8] via foo.as_ptr().cast_mut() in safe Rust.

Can this Sync requirement be lifted altogether?

No.

The reasoning here is a little nuanced, but the main idea is that everything about Send and Sync is required to make anything about wasmtime::Store<T> both Send and Sync. Note, however, that Sync does not mean "no interior mutability" nor "no mutation". For example Vec<T> is Sync despite allowing mutation, explicitly because all mutation requires &mut T. Also note that types like Mutex<T> and AtomicUsize are both Sync while allowing interior mutation.


That's a lot of words, but my hunch is it won't be the most satisfying answer to you. I'm more than happy to keep answering questions though! We can also chat on Zulip for something a little less async if you'd like too.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 16 2024 at 09:28):

V0ldek commented on issue #7946:

Hey, thanks for the response :) I prefer async comms for now since it takes me quite a bit of time to formulate these points coherently - this stuff is hard!

I think you're technically right about raw pointers being special. They're not Sync/Send more as a lint than anything else, simply because actually doing anything with a pointer (read/write) requires unsafe anyway.

Note, however, that Sync does not mean "no interior mutability" nor "no mutation". For example Vec<T> is Sync despite allowing mutation, explicitly because all mutation requires &mut T. Also note that types like Mutex<T> and AtomicUsize are both Sync while allowing interior mutation.

Yes, of course, but that is precisely my point -- the thing that makes these types safe is that they don't have an API that's just "turn a &self into a mutable pointer". For example, Mutex gives you a guard whose lifetime is the same as &self that ensures thread-safety. But doing something like that with LinearMemory is impossible. I cannot, for example, have struct T { mem: RwLock<*mut u8> } implement LinearMemory, because the as_ptr function won't allow me to take a lock and return the guard.

Let me rephrase my questions to hopefully make this discussion productive :) I'm mostly interested in being able to be reasonable sure that my implementation of LinearMemory isn't going to cause undefined behaviour. Here is what _I believe_ to be the complete list of things that might cause multiple threads to mutably access my T: LinearMemory:

  1. I use T in my code myself and share it between threads and then do nasty stuff with it.
  2. I use wasmtime with multi-threaded wasm code, in which case obviously thread-unsafety in the wasm being ran will result in thread-unsafety in the shared memory.
  3. I misuse the Memory API in my own code, for example holding mutable accesses across wasm calls, or other things that are already listed in the Memory docs as unsafe.

Crucially, note that 1. and 3. are completely "my problems", i.e. I can audit my own code to detect violations of those. Point 2. is a natural consequence of running arbitrary wasm code, but it also carries an implication that if I were to audit all wasm code that runs I would prevent violations.

In other words it'd be good to get some strong guarantee from wasmtime about how exactly the memory will be used, and put it explicitly on LinearMemory docs. Then I could audit my code for violations of the contract and be relatively certain there won't be UB creeping up.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 16 2024 at 09:29):

V0ldek edited a comment on issue #7946:

Hey, thanks for the response :) I prefer async comms for now since it takes me quite a bit of time to formulate these points coherently - this stuff is hard!

I think you're technically right about raw pointers being special. They're not Sync/Send more as a lint than anything else, simply because actually doing anything with a pointer (read/write) requires unsafe anyway.

Note, however, that Sync does not mean "no interior mutability" nor "no mutation". For example Vec<T> is Sync despite allowing mutation, explicitly because all mutation requires &mut T. Also note that types like Mutex<T> and AtomicUsize are both Sync while allowing interior mutation.

Yes, of course, but that is precisely my point -- the thing that makes these types safe is that they don't have an API that's just "turn a &self into a mutable pointer". For example, Mutex gives you a guard whose lifetime is the same as &self that ensures thread-safety. But doing something like that with LinearMemory is impossible. I cannot, for example, have struct T { mem: RwLock<*mut u8> } implement LinearMemory, because the as_ptr function won't allow me to take a lock and return the guard.

Let me rephrase my questions to hopefully make this discussion productive :) I'm mostly interested in being able to be reasonable sure that my implementation of LinearMemory isn't going to cause undefined behaviour. Here is what _I believe_ to be the complete list of things that might cause multiple threads to mutably access my T: LinearMemory:

  1. I use T in my code myself and share &T between threads and then do nasty stuff with it.
  2. I use wasmtime with multi-threaded wasm code, in which case obviously thread-unsafety in the wasm being ran will result in thread-unsafety in the shared memory.
  3. I misuse the Memory API in my own code, for example holding mutable accesses across wasm calls, or other things that are already listed in the Memory docs as unsafe.

Crucially, note that 1. and 3. are completely "my problems", i.e. I can audit my own code to detect violations of those. Point 2. is a natural consequence of running arbitrary wasm code, but it also carries an implication that if I were to audit all wasm code that runs I would prevent violations.

In other words it'd be good to get some strong guarantee from wasmtime about how exactly the memory will be used, and put it explicitly on LinearMemory docs. Then I could audit my code for violations of the contract and be relatively certain there won't be UB creeping up.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 16 2024 at 17:03):

alexcrichton commented on issue #7946:

These are good points, and if you're up for it I'd be happy to review a PR of docs for LinearMemory! Reviewing some code we may not actually end up using LinearMemory implementations for wasm shared memory with wasm threads, but that's sort of a bug in Wasmtime where the intention is that they're still used. I think this is just an oversight.

Otherwise though you're correct that (2) is the main Wasmtime-related thing to guarantee here, and yes custom memories when used with wasm threads can be mutated by wasm in multiple threads. It might be worth clarifying though that as_ptr(&self) -> *mut T, when called, does not represent an intent-to-mutate. It's possible to mutate with unsafe code but part of that contract of the unsafe is that it's done safely with respect to other mutations.

What I can say, though, is that if you're not dealing with shared memory then, yes, Wasmtime guarantees that mutations to memory will happen in at most one thread at a time. That's guaranteed by the nature of requiring &mut Store<T> whenever the memory is mutated and that mutable borrow serves as proof of "I'm the thread allowed to mutate right now"


Last updated: Oct 23 2024 at 20:03 UTC