Stream: git-wasmtime

Topic: wasmtime / issue #5054 wiggle: allow wiggle to use shared...


view this post on Zulip Wasmtime GitHub notifications bot (Oct 13 2022 at 22:36):

abrown commented on issue #5054:

This was originally a part of #4949 but I could use this separately for experimenting with wasi-threads.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 13 2022 at 22:52):

github-actions[bot] commented on issue #5054:

Subscribe to Label Action

cc @kubkon

This issue or pull request has been labeled: "wasi"

Thus the following users have been cc'd because of the following labels:

@kubkon: wasi

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 17 2022 at 14:48):

alexcrichton commented on issue #5054:

Ah thanks for splitting this out, I missed this reading the other patch!

Unfortunately, though, I don't think this can be done: the unsafe here truly is unsafe, and this is exposing a memory-vulnerability-in-waiting due to violating Rust's model of ownership. Or, well, it's not really specific to Rust per se, but it's easiest to explain through Rust.

Wiggle aggressively takes advantage of an internal borrow checker, handing out slices which point raw into wasm linear memory. This means that when the host gets &mut [u8] or &[u8] or &str, those are all pointers into raw wasm memory. While this is valid for single-threaded instances (the contents cannot change), none of these Rust-based views are valid for shared memories, because the contents can change at any time.

All host-based processing of data stored in linear memory needs to change with shared memories. All loads/stores need to be atomic, and additionally nothing the host relies on can ever reside in the linear memory itself. For example, if a wiggle API gives a string to the host, then the host needs to copy the string out and then validate UTF-8, whereas I think the opposite order happens today.
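
Roughly, the shape of that would be something like this; a simplified sketch with made-up names (copy_out_guest_string, base, len), not wiggle's actual API, and with bounds checks assumed to have already happened:

use std::sync::atomic::{AtomicU8, Ordering};

// Copy a guest string out of shared linear memory before validating it.
unsafe fn copy_out_guest_string(base: *const u8, len: usize) -> Result<String, std::string::FromUtf8Error> {
    let mut bytes = Vec::with_capacity(len);
    for i in 0..len {
        // Each byte is loaded atomically (Relaxed is enough) because the
        // guest may be writing to this location concurrently.
        let cell = unsafe { &*(base.add(i) as *const AtomicU8) };
        bytes.push(cell.load(Ordering::Relaxed));
    }
    // UTF-8 validation happens only on the host-owned copy, so a racing
    // guest write cannot invalidate the check after it has passed.
    String::from_utf8(bytes)
}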

I believe that solving this would require a fair bit of work and design within wiggle itself. Given the cost of atomic operations, we probably can't do them by default. Additionally, it may be too expensive to check "is this a shared memory?" at every interaction with linear memory from wiggle, so having a runtime flag to handle this may or may not be the best option either.

Overall I haven't really thought much about how to solve this problem. We've known about this for quite some time now but it's been a moot issue due to the lack of support for the threads proposal in Wasmtime. Also, FWIW, many of the same issues apply to supporting memory64 in WASI, and additionally to supporting either threads or memory64 in Wasmtime's component model support.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 17 2022 at 15:35):

abrown commented on issue #5054:

Overall I haven't really thought much about how to solve this problem. We've known about this for quite some time now but it's been a moot issue due to the lack of support for the threads proposal in Wasmtime.

One thought that I have had about this is to sequentialize host access from the WebAssembly threads by locking on _every_ host call. If this were done, then the Rust views into the underlying memory would be protected from concurrent modification. I looked for a way to do this here, but I am still struggling to figure out how to tell a Linker to "wrap every call with a Mutex::lock" while still passing the expected WASI context type through get_cx without dropping the lock. It seems to me like some refactoring may be needed.
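
Concretely, the shape of the problem is something like this; the types and names below (HostState, WasiCtx, get_cx) are simplified stand-ins for illustration, not the actual wiggle/wasmtime signatures:

use std::sync::{Arc, Mutex};

struct WasiCtx;

struct HostState {
    ctx: Arc<Mutex<WasiCtx>>,
}

// A get_cx-style accessor is expected to hand back a plain `&mut WasiCtx`,
// but a MutexGuard is a local temporary: the mutable borrow cannot escape
// this function while the lock is still held.
fn get_cx(state: &mut HostState) -> &mut WasiCtx {
    let _guard = state.ctx.lock().unwrap();
    // `return &mut *_guard;` does not compile: `_guard` is dropped at the
    // end of this function, which would both release the lock and
    // invalidate the returned reference.
    unimplemented!()
}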

The other idea I had was to lock calls within a WASI proposal, so that one could make concurrent calls to wasi-common and wasi-nn, e.g., but not concurrent calls within wasi-common itself. The motivation for this is that the contexts/states of each WASI proposal are distinct, so there is no reason (along those lines) to lock everything. The point you raise above, however, suggests that locking is not just for protecting WASI contexts/states but also for preserving the Rust expectations of the memory views used by host calls. So maybe I need to abandon this "lock calls within a WASI proposal" idea.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 17 2022 at 15:43):

alexcrichton commented on issue #5054:

Unfortunately I don't think either of those strategies would be sufficient. You would need something akin to the "stop the world" semantics of GCs, because threads not in hostcalls can stomp over memory that one hostcall is working with. This means that a single hostcall is all that's needed for things to go wrong, so locking hostcalls themselves won't be sufficient.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 03 2022 at 21:02):

penzn commented on issue #5054:

All host-based processing of data stored in linear memory needs to change with shared memories. All loads/stores need to be atomic, and additionally nothing the host relies on can ever reside in the linear memory itself. For example, if a wiggle API gives a string to the host, then the host needs to copy the string out and then validate UTF-8, whereas I think the opposite order happens today.

Sorry to barge in :big_smile: Is this always necessary, for example when a host call involves a memory area that only the instance initiating the call is using? In the JS world, accesses from outside are not atomic unless they are explicitly made atomic.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 04 2022 at 02:11):

alexcrichton commented on issue #5054:

Unfortunately, yes, all accesses need to be atomic in Rust. They can be Relaxed atomic loads, for example, but they need to be tagged as "something else can racily write to this location at any time", which Rust can only express with atomics.
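
In other words, a read of shared memory has to look roughly like the sketch below rather than a plain pointer dereference (read_shared_byte is a made-up name for illustration):

use std::sync::atomic::{AtomicU8, Ordering};

// A plain `*ptr` read would be a data race (UB) if another thread can write
// to the same location; the read has to go through an atomic type instead.
unsafe fn read_shared_byte(ptr: *const u8) -> u8 {
    // AtomicU8 has the same size and alignment as u8, so viewing the byte
    // through it is fine; Relaxed is the weakest ordering Rust offers.
    let cell = unsafe { &*(ptr as *const AtomicU8) };
    cell.load(Ordering::Relaxed)
}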

In the JS world, accesses from outside are not atomic

I don't think that this is correct, because in JS the backing memory is in a SharedArrayBuffer and all reads/writes from that are done with an atomic ordering. If that's wrapped up in a Uint32Array and you read an index then it doesn't look atomic, but I believe that under the hood it's implicitly doing an Unordered atomic ordering (at least according to my reading of the spec).

view this post on Zulip Wasmtime GitHub notifications bot (Nov 15 2022 at 00:06):

abrown commented on issue #5054:

@alexcrichton, I rebased this PR on top of all of the previous Wiggle work (https://github.com/bytecodealliance/wasmtime/pull/5225, https://github.com/bytecodealliance/wasmtime/pull/5229, https://github.com/bytecodealliance/wasmtime/pull/5264) since Wiggle seems safe enough at this point. We could optionally wait to merge this until the "thread safety of WASI contexts" story is figured out, but I don't know if that is necessary: if this were merged, users who attempted to use shared memory without taking its "shared-ness" into account would see a panic pointing them to https://github.com/bytecodealliance/wasmtime/issues/5235.

A side note: WasmtimeGuestMemory expects to have a &mut [u8]; to get there, this latest change uses a transmute. Since the only external access to this slice is through GuestMemory::base(&self) -> (*mut u8, u32), perhaps we should just store those types instead? E.g.:

pub struct WasmtimeGuestMemory<'a> {
    ptr: *mut u8,
    len: u32,
    bc: BorrowChecker,
    shared: bool,
    // Tie the raw pointer to the lifetime of the underlying memory so the
    // otherwise-unused `'a` parameter still compiles.
    _marker: std::marker::PhantomData<&'a mut [u8]>,
}

view this post on Zulip Wasmtime GitHub notifications bot (Nov 15 2022 at 02:27):

alexcrichton commented on issue #5054:

Reading over this I had some lingering reservations, even with using a *mut u8/usize combo. Rebasing this on https://github.com/bytecodealliance/wasmtime/pull/5268 would make me comfortable landing this, however.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 15 2022 at 06:09):

penzn edited a comment on issue #5054:

Sorry, slipped my mind.

I don't think that this is correct, because in JS the backing memory is in a SharedArrayBuffer and all reads/writes from that are done with an atomic ordering. If that's wrapped up in a Uint32Array and you read an index then it doesn't look atomic, but I believe that under the hood it's implicitly doing an Unordered atomic ordering (at least according to my reading of the spec).

I don't think JavaScript requires all accesses to a SAB to be atomic, only the ones that are done via the [Atomics][a] object. Outside of that it allows for race conditions.

[a]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Atomics

view this post on Zulip Wasmtime GitHub notifications bot (Nov 15 2022 at 15:09):

alexcrichton commented on issue #5054:

Well, in any case, I'm no JS expert and it doesn't seem like either of us is itching to become one by digging into JS implementations here. What I can say is that for Rust all accesses need to be atomic. Otherwise this is a data race, i.e. UB, which is exactly what Wasmtime exists to prevent.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 15 2022 at 19:23):

penzn commented on issue #5054:

Well, in any case, I'm no JS expert and it doesn't seem like either of us is itching to become one by digging into JS implementations here. What I can say is that for Rust all accesses need to be atomic. Otherwise this is a data race, i.e. UB, which is exactly what Wasmtime exists to prevent.

I'd say what exactly JS engines do is relevant only to a degree; the question is what can be implemented here, both in terms of Rust semantics and in terms of what the project aims to achieve. I am slightly concerned about performance if every access to shared memory becomes atomic.

Just as an aside, the JS spec mandates three things: (a) atomics are honored in all circumstances, (b) writes are observed exactly once, and (c) reads are observed exactly once. It does not require a specific ordering for non-atomic accesses, nor that all bits are written or read by them (i.e., tearing is allowed).

view this post on Zulip Wasmtime GitHub notifications bot (Nov 15 2022 at 21:21):

alexcrichton commented on issue #5054:

Sorry, I don't really have anything to add beyond the fact that this is the absolute bare minimum required to be safe in Rust. There's simply no other option. If you're concerned about performance, then the only "fix" I know of would be to make an RFC in upstream rust-lang/rust to add LLVM's "unordered" memory ordering to Rust's Ordering enum for atomic operations.

