julian-seward1 opened PR #2149 from atomics-x64-CL to main:
… on newBE/x64. It does this by providing an implementation of the CLIF instructions AtomicRmw, AtomicCas, AtomicLoad, AtomicStore and Fence.

The translation is straightforward. AtomicCas is translated into x64 cmpxchg, AtomicLoad becomes a normal load because x64-TSO provides adequate sequencing, AtomicStore becomes a normal store followed by mfence, and Fence becomes mfence. AtomicRmw is the only complex case: it becomes a normal load, followed by a loop which computes an updated value, tries to cmpxchg it back to memory, and repeats if necessary.
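As a rough Rust model of what that generated AtomicRmw sequence does (illustrative only, not the code the backend emits; the update closure stands for whichever add/sub/and/or/xor/xchg operation the opcode selects):

use std::sync::atomic::{AtomicU64, Ordering};

// Rough model of the lowering described above, for the 64-bit case:
// a plain load, then a loop that computes the updated value and tries to
// cmpxchg it back, retrying until the compare-exchange succeeds.
fn atomic_rmw_model(mem: &AtomicU64, update: impl Fn(u64) -> u64) -> u64 {
    let mut old = mem.load(Ordering::SeqCst);      // the initial normal load
    loop {
        let new = update(old);                     // compute the updated value
        match mem.compare_exchange(old, new, Ordering::SeqCst, Ordering::SeqCst) {
            Ok(_) => return old,                   // success: the old value is the result
            Err(actual) => old = actual,           // lost a race: retry with the fresh value
        }
    }
}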
This is a minimum-effort initial implementation. AtomicRmw could be implemented more efficiently using LOCK-prefixed integer read-modify-write instructions in the case where the old value in memory is not required. Subsequent work could add that, if required.

The x64 emitter has been updated to emit the new instructions, obviously. The LegacyPrefix mechanism has been revised to handle multiple prefix bytes, not just one, since it is now sometimes necessary to emit both 0x66 (Operand Size Override) and F0 (Lock).

In the aarch64 implementation of atomics, there has been some minor renaming for the sake of clarity, and for consistency with this x64 implementation.
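The LegacyPrefix change mentioned above amounts to being able to write more than one legacy prefix byte ahead of an opcode. A minimal sketch under that assumption, using a hypothetical emit_prefixes helper rather than the actual Cranelift type:

// Hypothetical helper, not the real LegacyPrefix implementation: a locked
// 16-bit cmpxchg needs both prefix bytes, 0x66 (operand-size override) and
// 0xF0 (lock), emitted before the opcode bytes.
fn emit_prefixes(sink: &mut Vec<u8>, operand_size_override: bool, lock: bool) {
    if operand_size_override {
        sink.push(0x66);
    }
    if lock {
        sink.push(0xF0);
    }
}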
julian-seward1 updated PR #2149 from atomics-x64-CL to main.
julian-seward1 updated PR #2149 from atomics-x64-CL to main.
julian-seward1 requested bnjbvr for a review on PR #2149.
bnjbvr submitted PR Review.
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
Can you expand the names here, or at least add comments? I have no clue what M/L/S mean.
bnjbvr created PR Review Comment:
Maybe hoist the aarch64 version then, and hoist it in the machinst code or a new isa/common directory?
bnjbvr created PR Review Comment:
nit: here and below, can you use doc comments instead, so they show up in LSP hovers/docs.rs, etc?
bnjbvr created PR Review Comment:
nit: we do not use /* comments in general, can you use // instead?
bnjbvr created PR Review Comment:
With the new load helper introduced in the SIMD PR (that should land soonish, probably), you might be able to just use Inst::load here.
bnjbvr created PR Review Comment:
nit: can you expand insn, and clarify that the second really means the second one in the loop?
bnjbvr created PR Review Comment:
Can you commonize all these match arms together, e.g.
=> Inst::alu_rmi_r(true, AluRmiROpcode::from(op), r10_rmi, r11_w)
to avoid code duplication?
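A hypothetical sketch of the conversion that one-liner assumes (AluRmiROpcode::from(op) is not shown in the PR; Xchg has no ALU counterpart, so it would still need separate handling):

// Hypothetical sketch only: the impl that AluRmiROpcode::from(op) would need.
// Xchg is not an ALU operation, so it cannot go through this path.
impl From<AtomicRMWOp> for AluRmiROpcode {
    fn from(op: AtomicRMWOp) -> AluRmiROpcode {
        match op {
            AtomicRMWOp::Add => AluRmiROpcode::Add,
            AtomicRMWOp::Sub => AluRmiROpcode::Sub,
            AtomicRMWOp::And => AluRmiROpcode::And,
            AtomicRMWOp::Or => AluRmiROpcode::Or,
            AtomicRMWOp::Xor => AluRmiROpcode::Xor,
            AtomicRMWOp::Xchg => unreachable!("Xchg is not an ALU operation"),
        }
    }
}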
bnjbvr created PR Review Comment:
nit: here too, please don't use shorthands for read/write
bnjbvr created PR Review Comment:
nit (twice): what does
rd
mean?
bnjbvr created PR Review Comment:
nit: please expand
CAS
at least once, with the acronym in parentheses.
bnjbvr created PR Review Comment:
Can you open an issue for the improvements, please, and refer to it with a
TODO
comment mentioning the issue number?
bnjbvr created PR Review Comment:
ditto insn
bnjbvr created PR Review Comment:
Could we remove this comment? It's a bit weird to read this here, since there's a sequence of vcode insts, and the vcode inst actually trashing this register already mentions it does that by the code in
get_regs
.
bnjbvr created PR Review Comment:
Can you remove the
r_
prefixes, for consistency with the rest of this file?
bnjbvr created PR Review Comment:
I've tried to keep "Seq" in the name for synthetic instruction sequences; can you put it as a suffix here, please?
bnjbvr created PR Review Comment:
nit: here and below,
"mfence".to_string()
bnjbvr created PR Review Comment:
nit: expand insn
bnjbvr created PR Review Comment:
nit: can you expand
mod
too?
bnjbvr created PR Review Comment:
nit: expand instruction
julian-seward1 submitted PR Review.
julian-seward1 created PR Review Comment:
Well those names are what Intel calls them: mfence, sfence and lfence. I could expand them to what Intel describes them as: Memory Fence, Store Fence and Load Fence respectively. Or maybe I should just add comments on the enum?
julian-seward1 submitted PR Review.
julian-seward1 created PR Review Comment:
Yeah, I just spotted that.
abrown submitted PR Review.
abrown created PR Review Comment:
Should be in
main
now.
julian-seward1 submitted PR Review.
julian-seward1 created PR Review Comment:
Done. Although the result is actually longer:
let i3 = if op == AtomicRMWOp::Xchg {
    Inst::mov_r_r(true, r10, r11_w)
} else {
    let alu_op = match op {
        AtomicRMWOp::Add => AluRmiROpcode::Add,
        AtomicRMWOp::Sub => AluRmiROpcode::Sub,
        AtomicRMWOp::And => AluRmiROpcode::And,
        AtomicRMWOp::Or => AluRmiROpcode::Or,
        AtomicRMWOp::Xor => AluRmiROpcode::Xor,
        AtomicRMWOp::Xchg => unreachable!(),
    };
    Inst::alu_rmi_r(true, alu_op, r10_rmi, r11_w)
};
I tend to assume that rustc/LLVM will do tail-merging and hence cause all the duplication to disappear in the final machine code. I don't know that that's true, though.
julian-seward1 created PR Review Comment:
Read; I fixed all of these, and the wr and mod too.
julian-seward1 submitted PR Review.
julian-seward1 edited PR Review Comment.
julian-seward1 submitted PR Review.
julian-seward1 created PR Review Comment:
Filed as PR #2153.
julian-seward1 submitted PR Review.
julian-seward1 created PR Review Comment:
I moved it into a new file, src/machinst/inst_common.rs.
julian-seward1 updated PR #2149 from atomics-x64-CL to main.
julian-seward1 updated PR #2149 from atomics-x64-CL to main.
julian-seward1 submitted PR Review.
julian-seward1 created PR Review Comment:
Done.
julian-seward1 updated PR #2149 from atomics-x64-CL to main.
julian-seward1 updated PR #2149 from atomics-x64-CL to main.
julian-seward1 requested bnjbvr for a review on PR #2149.
bnjbvr submitted PR Review.
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
With the approach suggested above for Xchg, we could spare the r10 register here, alleviating register pressure.
bnjbvr created PR Review Comment:
nit: please make this a doc comment
///
bnjbvr created PR Review Comment:
Could we not emit it when the opcode is Xchg? (That is, push it down within the else branch below.)
Or even better, it seems all the moves could be avoided in the case of Xchg, since r10 could be passed as the read-only input of the cmpxchg instruction? (I know x86 chips eat moves for dinner, but it seems better not to do any useless decoding work if we can avoid it!)
bnjbvr created PR Review Comment:
Just an idea: if it's generating the same thing as a load, could the lowering be commonized with the rest of the Load-related instructions?
bnjbvr created PR Review Comment:
I sympathize with the need to fix this comment, but I think this one is a bit imprecise too: can you write any virtual regs instead of just any regs?
bnjbvr created PR Review Comment:
Here, you can use let rm = input_to_reg_mem(ctx, inputs[0]); and remove the comment about using 0(addr).
bnjbvr created PR Review Comment:
I think you'll be able to use
lower_to_amode
once #2146 lands; if you happen to land before this, can you add a TODO in my PR please?
bnjbvr created PR Review Comment:
ditto for
lower_to_amode
bnjbvr created PR Review Comment:
nit: can you use the usual Rust camel casing, AtomicRmwOp, please?
julian-seward1 submitted PR Review.
julian-seward1 created PR Review Comment:
Well, probably yes; but I'd prefer to keep it separate as it logically belongs to the atomics group. Also there is an atomics-specific assertion and atomics-specific comments there.
julian-seward1 submitted PR Review.
julian-seward1 created PR Review Comment:
We could do that. I'd prefer to leave such improvements to the followup PR #2153 though. Also, it could be possibly fixed even better, by using
lock xchg ..
in this case. That's definitely PR #2153 territory, though.
julian-seward1 created PR Review Comment:
That doesn't work. It produces movzbq %v5Jb, %v6J, but the original was movzbq 0(%v5J), %v6J.
julian-seward1 submitted PR Review.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
lock xchg is equivalent to xchg: https://stackoverflow.com/questions/3144335/on-a-multicore-x86-is-a-lock-necessary-as-a-prefix-to-xchg
julian-seward1 updated PR #2149 from atomics-x64-CL to main.
julian-seward1 merged PR #2149.