Stream: git-wasmtime

Topic: wasmtime / PR #3128 Re-implement atomic load and stores


view this post on Zulip Wasmtime GitHub notifications bot (Jul 29 2021 at 15:01):

sparker-arm opened PR #3128 from aarch64-atomics to main:

This patch has two components to it:

New nodes were created because it makes sense to try to make these
memory operations similar to the rest. The lowering from wasm to
cranelift sometimes introduced an uxtend but, it seems, all the
backends were generating the uextend with their loads for the
smaller data types and, I think, conversely for the stores.

The AArch64 support was a bit broken and was using Armv7 style
barriers, which aren't required with Armv8 acquire-release
load/stores.

The fallback CAS loops and RMW, for AArch64, have also been updated
to use acquire-release, exclusive, instructions which, again, remove
the need for barriers. The CAS loop has also been further optimised
by using the extending form of the cmp instruction.

Copyright (c) 2021, Arm Limited.

<!--

Please ensure that the following steps are all taken care of before submitting
the PR.

Please ensure all communication adheres to the code of conduct.
-->

view this post on Zulip Wasmtime GitHub notifications bot (Jul 29 2021 at 15:57):

bjorn3 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 29 2021 at 15:57):

bjorn3 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 29 2021 at 15:57):

bjorn3 created PR review comment:

Same here.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 29 2021 at 15:57):

bjorn3 created PR review comment:

Please don't change this. In rustc_codegen_cranelift I use atomic_load to load i8 and i16 values.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 29 2021 at 15:57):

bjorn3 created PR review comment:

Would it make sense to keep ARMv7 support and only emit the new sequence for ARMv8?

view this post on Zulip Wasmtime GitHub notifications bot (Jul 29 2021 at 15:58):

bjorn3 edited PR review comment.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 29 2021 at 16:52):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 29 2021 at 16:52):

cfallin created PR review comment:

We certainly will need a similar sequence when we complete the ARM32 (ARMv7) backend, but that doesn't share code with the aarch64 (aka ARMv8) one we're in here, so I think it's fine/expected that we avoid ARMv7-isms here.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 29 2021 at 16:53):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 29 2021 at 16:53):

cfallin created PR review comment:

+1, it's better to be consistent with normal load instructions here and accept any type, I think.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 29 2021 at 17:10):

uweigand submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 29 2021 at 17:10):

uweigand submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 29 2021 at 17:10):

uweigand created PR review comment:

This is wrong, because the lrvh instruction does not perform any extension. To support this, you'd have to do the same that is currently done for a Uload16 and emit an explicit Extend after the load-reverse.

Also, I think you need to support the same set of operations for the little-endian case as for the big-endian case, which means the 16->64 and 32->64 extension cases also should be added. They need to be implemented via the same explicit Extend after the load.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 29 2021 at 17:10):

uweigand created PR review comment:

The test case is likewise wrong. You can look at load-little.clif for a complete set of load test cases. Those should be usable for atomic loads as well (aligned loads are atomic by default on the architecture).

view this post on Zulip Wasmtime GitHub notifications bot (Jul 29 2021 at 17:52):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 29 2021 at 17:52):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 29 2021 at 17:52):

cfallin created PR review comment:

Is there any reason we can't instead pattern-match a uextend on an atomic_load result?

The general rule we've tried to stick to (see some recent discussion here) is that if we have a behavior that is just the combination of two opcodes, we should pattern-match on the primitives rather than introduce a new combo op at the CLIF level. There are a number of advantages to this approach in terms of simplicity, generality, approachability, and maintainability; the main downside is that it's a little more work at lowering time.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 30 2021 at 07:43):

sparker-arm submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 30 2021 at 07:43):

sparker-arm created PR review comment:

I feel if we have instructions that virtually every backend will perform the same pattern matching and combinations on, then it makes sense to have it in the IR. It originally seems to have been a big enough downside that no backend was pattern matching and instead just made assumptions about how atomic load was used, which isn't great! But if not separate instructions, is there a way that we can attach some form of extension/truncation attribute on the load and store instructions? This could be a nice way of refactoring all the memory operations.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 30 2021 at 07:45):

sparker-arm submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 30 2021 at 07:45):

sparker-arm created PR review comment:

Okay, thanks.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 30 2021 at 07:51):

sparker-arm edited PR review comment.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 30 2021 at 08:15):

sparker-arm submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 30 2021 at 08:15):

sparker-arm created PR review comment:

Okay.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 03 2021 at 15:40):

sparker-arm submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 03 2021 at 15:40):

sparker-arm created PR review comment:

So I'm attempting to revert the IR changes, so we have the atomic_load + uextend. So during lowering, I'm trying to pattern match from the uextend, which I think works fine. But then I'm assuming that the AtomicLoad node is also being lowered so my output code ends up with two loads. Is there a way to explicitly replace a value or remove an instruction?

view this post on Zulip Wasmtime GitHub notifications bot (Aug 03 2021 at 17:43):

cfallin created PR review comment:

Ah, interesting! So the lowering framework keeps track of reference counts for values; instructions are lowered in a backward pass where inputs are only produced if something depends on them, or if they have side-effects. However in this case the load is considered to have a side effect (it can trap). This actually needs slightly different handling; it's not legal to duplicate the load (imagine if e.g. a store interleaved).

In cases where we merge an effectful op (like a load), we need to call sink_inst() on the context as well; see this comment. An example of how this is used is here in the x64 backend.

Sorry this is a bit subtle!

view this post on Zulip Wasmtime GitHub notifications bot (Aug 03 2021 at 17:43):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 03 2021 at 18:21):

bjorn3 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 03 2021 at 18:21):

bjorn3 created PR review comment:

is_mergeable_load returns None for ty.bits() < 32.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 03 2021 at 18:39):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 03 2021 at 18:39):

cfallin created PR review comment:

That's specific to the x64 backend, because of the way that narrow loads become wider in some cases; I imagine that doesn't apply here since the widening loads don't reach beyond their width.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 05 2021 at 09:26):

sparker-arm updated PR #3128 from aarch64-atomics to main.


Last updated: Nov 22 2024 at 16:03 UTC