wasmtime / PR #3128 Re-implement atomic load and stores · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / PR #3128 Re-implement atomic load and stores

Wasmtime GitHub notifications bot (Jul 29 2021 at 15:01):

sparker-arm opened PR #3128 from aarch64-atomics to main:

This patch has two components to it:

Introduce new cranelift nodes for uextend atomic loads and
truncating atomic stores.

Reimplement AArch64 support and update RMW and CAS implementations.

New nodes were created because it makes sense to try to make these
memory operations similar to the rest. The lowering from wasm to
cranelift sometimes introduced an uxtend but, it seems, all the
backends were generating the uextend with their loads for the
smaller data types and, I think, conversely for the stores.

The AArch64 support was a bit broken and was using Armv7 style
barriers, which aren't required with Armv8 acquire-release
load/stores.

The fallback CAS loops and RMW, for AArch64, have also been updated
to use acquire-release, exclusive, instructions which, again, remove
the need for barriers. The CAS loop has also been further optimised
by using the extending form of the cmp instruction.

Copyright (c) 2021, Arm Limited.

Wasmtime GitHub notifications bot (Jul 29 2021 at 15:57):

bjorn3 submitted PR review.

Wasmtime GitHub notifications bot (Jul 29 2021 at 15:57):

bjorn3 submitted PR review.

Wasmtime GitHub notifications bot (Jul 29 2021 at 15:57):

bjorn3 created PR review comment:

Same here.

Wasmtime GitHub notifications bot (Jul 29 2021 at 15:57):

bjorn3 created PR review comment:

Please don't change this. In rustc_codegen_cranelift I use atomic_load to load i8 and i16 values.

Wasmtime GitHub notifications bot (Jul 29 2021 at 15:57):

bjorn3 created PR review comment:

Would it make sense to keep ARMv7 support and only emit the new sequence for ARMv8?

Wasmtime GitHub notifications bot (Jul 29 2021 at 15:58):

bjorn3 edited PR review comment.

Wasmtime GitHub notifications bot (Jul 29 2021 at 16:52):

cfallin submitted PR review.

Wasmtime GitHub notifications bot (Jul 29 2021 at 16:52):

cfallin created PR review comment:

We certainly will need a similar sequence when we complete the ARM32 (ARMv7) backend, but that doesn't share code with the aarch64 (aka ARMv8) one we're in here, so I think it's fine/expected that we avoid ARMv7-isms here.

Wasmtime GitHub notifications bot (Jul 29 2021 at 16:53):

cfallin submitted PR review.

Wasmtime GitHub notifications bot (Jul 29 2021 at 16:53):

cfallin created PR review comment:

+1, it's better to be consistent with normal load instructions here and accept any type, I think.

Wasmtime GitHub notifications bot (Jul 29 2021 at 17:10):

uweigand submitted PR review.

Wasmtime GitHub notifications bot (Jul 29 2021 at 17:10):

uweigand submitted PR review.

Wasmtime GitHub notifications bot (Jul 29 2021 at 17:10):

uweigand created PR review comment:

This is wrong, because the lrvh instruction does not perform any extension. To support this, you'd have to do the same that is currently done for a Uload16 and emit an explicit Extend after the load-reverse.

Also, I think you need to support the same set of operations for the little-endian case as for the big-endian case, which means the 16->64 and 32->64 extension cases also should be added. They need to be implemented via the same explicit Extend after the load.

Wasmtime GitHub notifications bot (Jul 29 2021 at 17:10):

uweigand created PR review comment:

The test case is likewise wrong. You can look at load-little.clif for a complete set of load test cases. Those should be usable for atomic loads as well (aligned loads are atomic by default on the architecture).

Wasmtime GitHub notifications bot (Jul 29 2021 at 17:52):

cfallin submitted PR review.

Wasmtime GitHub notifications bot (Jul 29 2021 at 17:52):

cfallin submitted PR review.

Wasmtime GitHub notifications bot (Jul 29 2021 at 17:52):

cfallin created PR review comment:

Is there any reason we can't instead pattern-match a uextend on an atomic_load result?

The general rule we've tried to stick to (see some recent discussion here) is that if we have a behavior that is just the combination of two opcodes, we should pattern-match on the primitives rather than introduce a new combo op at the CLIF level. There are a number of advantages to this approach in terms of simplicity, generality, approachability, and maintainability; the main downside is that it's a little more work at lowering time.

Wasmtime GitHub notifications bot (Jul 30 2021 at 07:43):

sparker-arm submitted PR review.

Wasmtime GitHub notifications bot (Jul 30 2021 at 07:43):

sparker-arm created PR review comment:

I feel if we have instructions that virtually every backend will perform the same pattern matching and combinations on, then it makes sense to have it in the IR. It originally seems to have been a big enough downside that no backend was pattern matching and instead just made assumptions about how atomic load was used, which isn't great! But if not separate instructions, is there a way that we can attach some form of extension/truncation attribute on the load and store instructions? This could be a nice way of refactoring all the memory operations.

Wasmtime GitHub notifications bot (Jul 30 2021 at 07:45):

sparker-arm submitted PR review.

Wasmtime GitHub notifications bot (Jul 30 2021 at 07:45):

sparker-arm created PR review comment:

Okay, thanks.

Wasmtime GitHub notifications bot (Jul 30 2021 at 07:51):

sparker-arm edited PR review comment.

Wasmtime GitHub notifications bot (Jul 30 2021 at 08:15):

sparker-arm submitted PR review.

Wasmtime GitHub notifications bot (Jul 30 2021 at 08:15):

sparker-arm created PR review comment:

Okay.

Wasmtime GitHub notifications bot (Aug 03 2021 at 15:40):

sparker-arm submitted PR review.

Wasmtime GitHub notifications bot (Aug 03 2021 at 15:40):

sparker-arm created PR review comment:

So I'm attempting to revert the IR changes, so we have the atomic_load + uextend. So during lowering, I'm trying to pattern match from the uextend, which I think works fine. But then I'm assuming that the AtomicLoad node is also being lowered so my output code ends up with two loads. Is there a way to explicitly replace a value or remove an instruction?

Wasmtime GitHub notifications bot (Aug 03 2021 at 17:43):

cfallin created PR review comment:

Ah, interesting! So the lowering framework keeps track of reference counts for values; instructions are lowered in a backward pass where inputs are only produced if something depends on them, or if they have side-effects. However in this case the load is considered to have a side effect (it can trap). This actually needs slightly different handling; it's not legal to duplicate the load (imagine if e.g. a store interleaved).

In cases where we merge an effectful op (like a load), we need to call sink_inst() on the context as well; see this comment. An example of how this is used is here in the x64 backend.

Sorry this is a bit subtle!

Wasmtime GitHub notifications bot (Aug 03 2021 at 17:43):

cfallin submitted PR review.

Wasmtime GitHub notifications bot (Aug 03 2021 at 18:21):

bjorn3 submitted PR review.

Wasmtime GitHub notifications bot (Aug 03 2021 at 18:21):

bjorn3 created PR review comment:

is_mergeable_load returns None for ty.bits() < 32.

Wasmtime GitHub notifications bot (Aug 03 2021 at 18:39):

cfallin submitted PR review.

Wasmtime GitHub notifications bot (Aug 03 2021 at 18:39):

cfallin created PR review comment:

That's specific to the x64 backend, because of the way that narrow loads become wider in some cases; I imagine that doesn't apply here since the widening loads don't reach beyond their width.

Wasmtime GitHub notifications bot (Aug 05 2021 at 09:26):

sparker-arm updated PR #3128 from aarch64-atomics to main.

Last updated: Apr 17 2025 at 08:04 UTC