sparker-arm opened PR #3128 from aarch64-atomics to main:
This patch has two components:

- Introduce new Cranelift nodes for uextending atomic loads and truncating atomic stores.
- Reimplement AArch64 support and update the RMW and CAS implementations.

New nodes were created because it makes sense to make these memory operations similar to the rest. The lowering from wasm to Cranelift sometimes introduced a uextend but, it seems, all the backends were generating the extension as part of their loads for the smaller data types and, I think, conversely for the stores.

The AArch64 support was a bit broken and was using Armv7-style barriers, which aren't required with Armv8 acquire-release loads/stores.

The fallback CAS loops and RMW sequences, for AArch64, have also been updated to use acquire-release exclusive instructions which, again, removes the need for barriers. The CAS loop has also been further optimised by using the extending form of the cmp instruction.

Copyright (c) 2021, Arm Limited.
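For illustration, the pattern in question looks roughly like this in CLIF (a hedged sketch; the exact type and flag spellings may differ from the patch):

```clif
;; A wasm i64.atomic.load16_u becomes an atomic load of the narrow type
;; followed by a zero-extension to the result width.
function %atomic_uload16(i64) -> i64 {
block0(v0: i64):
    v1 = atomic_load.i16 v0
    v2 = uextend.i64 v1
    return v2
}
```

On AArch64 this pair can be emitted as a single ldarh, whose acquire semantics also make a trailing barrier unnecessary.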
bjorn3 submitted PR review.
bjorn3 submitted PR review.
bjorn3 created PR review comment:
Same here.
bjorn3 created PR review comment:
Please don't change this. In rustc_codegen_cranelift I use atomic_load to load i8 and i16 values.
bjorn3 created PR review comment:
Would it make sense to keep ARMv7 support and only emit the new sequence for ARMv8?
bjorn3 edited PR review comment.
cfallin submitted PR review.
cfallin created PR review comment:
We certainly will need a similar sequence when we complete the ARM32 (ARMv7) backend, but that doesn't share code with the aarch64 (aka ARMv8) one we're in here, so I think it's fine/expected that we avoid ARMv7-isms here.
cfallin submitted PR review.
cfallin created PR review comment:
+1, it's better to be consistent with normal load instructions here and accept any type, I think.
uweigand submitted PR review.
uweigand submitted PR review.
uweigand created PR review comment:
This is wrong, because the lrvh instruction does not perform any extension. To support this, you'd have to do the same that is currently done for a Uload16 and emit an explicit Extend after the load-reverse.
Also, I think you need to support the same set of operations for the little-endian case as for the big-endian case, which means the 16->64 and 32->64 extension cases also should be added. They need to be implemented via the same explicit Extend after the load.
uweigand created PR review comment:
The test case is likewise wrong. You can look at load-little.clif for a complete set of load test cases. Those should be usable for atomic loads as well (aligned loads are atomic by default on the architecture).
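A hedged sketch of what such a test might look like, modeled on the existing load-little.clif cases (the function name and flag spelling here are assumptions, not the actual test file):

```clif
function %atomic_uload16_little(i64) -> i64 {
block0(v0: i64):
    v1 = atomic_load.i16 little v0
    v2 = uextend.i64 v1
    return v2
}
```

On s390x, per the comment above, the little-endian case would need a load-reverse followed by an explicit zero-extension, since the load-reverse instructions do not extend.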
cfallin submitted PR review.
cfallin submitted PR review.
cfallin created PR review comment:
Is there any reason we can't instead pattern-match a
uextend
on anatomic_load
result?The general rule we've tried to stick to (see some recent discussion here) is that if we have a behavior that is just the combination of two opcodes, we should pattern-match on the primitives rather than introduce a new combo op at the CLIF level. There are a number of advantages to this approach in terms of simplicity, generality, approachability, and maintainability; the main downside is that it's a little more work at lowering time.
sparker-arm submitted PR review.
sparker-arm created PR review comment:
I feel that if we have instructions on which virtually every backend will perform the same pattern matching and combining, then it makes sense to have it in the IR. It originally seems to have been a big enough downside that no backend was pattern matching and instead just made assumptions about how atomic_load was used, which isn't great! But if not separate instructions, is there a way that we can attach some form of extension/truncation attribute to the load and store instructions? This could be a nice way of refactoring all the memory operations.
sparker-arm submitted PR review.
sparker-arm created PR review comment:
Okay, thanks.
sparker-arm edited PR review comment.
sparker-arm submitted PR review.
sparker-arm created PR review comment:
Okay.
sparker-arm submitted PR review.
sparker-arm created PR review comment:
So I'm attempting to revert the IR changes, so we have the atomic_load + uextend. During lowering, I'm trying to pattern match from the uextend, which I think works fine. But the AtomicLoad node is also being lowered, so my output code ends up with two loads. Is there a way to explicitly replace a value or remove an instruction?
cfallin created PR review comment:
Ah, interesting! So the lowering framework keeps track of reference counts for values; instructions are lowered in a backward pass where inputs are only produced if something depends on them, or if they have side-effects. However in this case the load is considered to have a side effect (it can trap). This actually needs slightly different handling; it's not legal to duplicate the load (imagine if e.g. a store interleaved).
In cases where we merge an effectful op (like a load), we need to call sink_inst() on the context as well; see this comment. An example of how this is used is here in the x64 backend. Sorry this is a bit subtle!
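To make the duplication hazard concrete, here is a hedged CLIF fragment (the interleaved store is hypothetical):

```clif
;; If the backend matches uextend(atomic_load) into one extending load but
;; the atomic_load instruction is also lowered on its own, the generated
;; code performs the atomic load twice. That is observably different: a
;; concurrent store landing between the two loads could make them disagree.
;; Hence the merged load must be sunk to its use, not duplicated.
v2 = atomic_load.i8 v0
v3 = uextend.i32 v2
```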
cfallin submitted PR review.
bjorn3 submitted PR review.
bjorn3 created PR review comment:
is_mergeable_load
returnsNone
forty.bits() < 32
.
cfallin submitted PR review.
cfallin created PR review comment:
That's specific to the x64 backend, because of the way that narrow loads become wider in some cases; I imagine that doesn't apply here since the widening loads don't reach beyond their width.
sparker-arm updated PR #3128 from aarch64-atomics to main.
Last updated: Dec 23 2024 at 12:05 UTC