cfallin requested abrown, bnjbvr and julian-seward1 for a review on PR #2539.
cfallin opened PR #2539 from x64-i128
to main
:
This PR is a followup to #2538 and is dependent on it.
x64 backend: implement 128-bit ops.
This implements all of the ops on I128 that are implemented by the
legacy x86 backend, and includes all that are required by at least one
major use-case (cg_clif rustc backend).The sequences are open-coded where necessary; for e.g. the bit
operations, this can be somewhat complex, but these sequences have been
tested carefully. This PR also includes a drive-by fix of clz/ctz for 8-
and 16-bit cases where they were incorrect previously.
cfallin requested abrown, bnjbvr and julian-seward1 for a review on PR #2539.
cfallin requested abrown, bnjbvr and julian-seward1 for a review on PR #2539.
cfallin updated PR #2539 from x64-i128
to main
:
This PR is a followup to #2538 and is dependent on it.
x64 backend: implement 128-bit ops.
This implements all of the ops on I128 that are implemented by the
legacy x86 backend, and includes all that are required by at least one
major use-case (cg_clif rustc backend).The sequences are open-coded where necessary; for e.g. the bit
operations, this can be somewhat complex, but these sequences have been
tested carefully. This PR also includes a drive-by fix of clz/ctz for 8-
and 16-bit cases where they were incorrect previously.
cfallin edited PR #2539 from x64-i128
to main
:
This PR is a followup to #2538 and is dependent on it.
x64 backend: implement 128-bit ops and misc fixes.
This implements all of the ops on I128 that are implemented by the
legacy x86 backend, and includes all that are required by at least one
major use-case (cg_clif rustc backend).The sequences are open-coded where necessary; for e.g. the bit
operations, this can be somewhat complex, but these sequences have been
tested carefully. This PR also includes a drive-by fix of clz/ctz for 8-
and 16-bit cases where they were incorrect previously.Also includes ridealong fixes developed while bringing up cg_clif
support, because they are difficult to completely separate due to
other refactors that occurred in this PR:
fix REX prefix logic for some 8-bit instructions.
When using an 8-bit register in 64-bit mode on x86-64, the REX prefix
semantics are somewhat subtle: without the REX prefix, register numbers
4--7 correspond to the second-to-lowest byte of the first four registers
(AH, CH, BH, DH), whereas with the REX prefix, these register numbers
correspond to the usual encoding (SPL, BPL, SIL, DIL). We could always
emit a REX byte for instructions with 8-bit cases (this is harmless even
if unneeded), but this would unnecessarily inflate code size; instead,
the usual approach is to emit it only for these registers.This logic was present in some cases but missing for some other
instructions: divide, not, negate, shifts.Fixes #2508.
avoid unaligned SSE loads on some f64 ops.
The implementations of several FP ops, such as fabs/fneg, used SSE
instructions. This is not a problem per-se, except that load-op merging
did not take alignment into account. Specifically, if an op on an f64
loaded from memory happened to merge that load, and the instruction into
which it was merged was an SSE instruction, then the SSE instruction
imposes stricter (128-bit) alignment requirements than the load.f64 did.This PR simply forces any instruction lowerings that could use SSE
instructions to implement non-SIMD operations to take inputs in
registers only, and avoid load-op merging.Fixes #2507.
two bugfixes exposed by cg_clif: urem/srem.i8, select.b1.
urem/srem.i8: the 8-bit form of the DIV instruction on x86-64 places
the remainder in AH, not RDX, different from all the other width-forms
of this instruction.select.b1: we were not recognizing selects of boolean values as
integer-typed operations, so we were generating XMM moves instead (!).
cfallin updated PR #2539 from x64-i128
to main
:
This PR is a followup to #2538 and is dependent on it.
x64 backend: implement 128-bit ops and misc fixes.
This implements all of the ops on I128 that are implemented by the
legacy x86 backend, and includes all that are required by at least one
major use-case (cg_clif rustc backend).The sequences are open-coded where necessary; for e.g. the bit
operations, this can be somewhat complex, but these sequences have been
tested carefully. This PR also includes a drive-by fix of clz/ctz for 8-
and 16-bit cases where they were incorrect previously.Also includes ridealong fixes developed while bringing up cg_clif
support, because they are difficult to completely separate due to
other refactors that occurred in this PR:
fix REX prefix logic for some 8-bit instructions.
When using an 8-bit register in 64-bit mode on x86-64, the REX prefix
semantics are somewhat subtle: without the REX prefix, register numbers
4--7 correspond to the second-to-lowest byte of the first four registers
(AH, CH, BH, DH), whereas with the REX prefix, these register numbers
correspond to the usual encoding (SPL, BPL, SIL, DIL). We could always
emit a REX byte for instructions with 8-bit cases (this is harmless even
if unneeded), but this would unnecessarily inflate code size; instead,
the usual approach is to emit it only for these registers.This logic was present in some cases but missing for some other
instructions: divide, not, negate, shifts.Fixes #2508.
avoid unaligned SSE loads on some f64 ops.
The implementations of several FP ops, such as fabs/fneg, used SSE
instructions. This is not a problem per-se, except that load-op merging
did not take alignment into account. Specifically, if an op on an f64
loaded from memory happened to merge that load, and the instruction into
which it was merged was an SSE instruction, then the SSE instruction
imposes stricter (128-bit) alignment requirements than the load.f64 did.This PR simply forces any instruction lowerings that could use SSE
instructions to implement non-SIMD operations to take inputs in
registers only, and avoid load-op merging.Fixes #2507.
two bugfixes exposed by cg_clif: urem/srem.i8, select.b1.
urem/srem.i8: the 8-bit form of the DIV instruction on x86-64 places
the remainder in AH, not RDX, different from all the other width-forms
of this instruction.select.b1: we were not recognizing selects of boolean values as
integer-typed operations, so we were generating XMM moves instead (!).
cfallin updated PR #2539 from x64-i128
to main
:
This PR is a followup to #2538 and is dependent on it.
x64 backend: implement 128-bit ops and misc fixes.
This implements all of the ops on I128 that are implemented by the
legacy x86 backend, and includes all that are required by at least one
major use-case (cg_clif rustc backend).The sequences are open-coded where necessary; for e.g. the bit
operations, this can be somewhat complex, but these sequences have been
tested carefully. This PR also includes a drive-by fix of clz/ctz for 8-
and 16-bit cases where they were incorrect previously.Also includes ridealong fixes developed while bringing up cg_clif
support, because they are difficult to completely separate due to
other refactors that occurred in this PR:
fix REX prefix logic for some 8-bit instructions.
When using an 8-bit register in 64-bit mode on x86-64, the REX prefix
semantics are somewhat subtle: without the REX prefix, register numbers
4--7 correspond to the second-to-lowest byte of the first four registers
(AH, CH, BH, DH), whereas with the REX prefix, these register numbers
correspond to the usual encoding (SPL, BPL, SIL, DIL). We could always
emit a REX byte for instructions with 8-bit cases (this is harmless even
if unneeded), but this would unnecessarily inflate code size; instead,
the usual approach is to emit it only for these registers.This logic was present in some cases but missing for some other
instructions: divide, not, negate, shifts.Fixes #2508.
avoid unaligned SSE loads on some f64 ops.
The implementations of several FP ops, such as fabs/fneg, used SSE
instructions. This is not a problem per-se, except that load-op merging
did not take alignment into account. Specifically, if an op on an f64
loaded from memory happened to merge that load, and the instruction into
which it was merged was an SSE instruction, then the SSE instruction
imposes stricter (128-bit) alignment requirements than the load.f64 did.This PR simply forces any instruction lowerings that could use SSE
instructions to implement non-SIMD operations to take inputs in
registers only, and avoid load-op merging.Fixes #2507.
two bugfixes exposed by cg_clif: urem/srem.i8, select.b1.
urem/srem.i8: the 8-bit form of the DIV instruction on x86-64 places
the remainder in AH, not RDX, different from all the other width-forms
of this instruction.select.b1: we were not recognizing selects of boolean values as
integer-typed operations, so we were generating XMM moves instead (!).
cfallin updated PR #2539 from x64-i128
to main
:
This PR is a followup to #2538 and is dependent on it.
x64 backend: implement 128-bit ops and misc fixes.
This implements all of the ops on I128 that are implemented by the
legacy x86 backend, and includes all that are required by at least one
major use-case (cg_clif rustc backend).The sequences are open-coded where necessary; for e.g. the bit
operations, this can be somewhat complex, but these sequences have been
tested carefully. This PR also includes a drive-by fix of clz/ctz for 8-
and 16-bit cases where they were incorrect previously.Also includes ridealong fixes developed while bringing up cg_clif
support, because they are difficult to completely separate due to
other refactors that occurred in this PR:
fix REX prefix logic for some 8-bit instructions.
When using an 8-bit register in 64-bit mode on x86-64, the REX prefix
semantics are somewhat subtle: without the REX prefix, register numbers
4--7 correspond to the second-to-lowest byte of the first four registers
(AH, CH, BH, DH), whereas with the REX prefix, these register numbers
correspond to the usual encoding (SPL, BPL, SIL, DIL). We could always
emit a REX byte for instructions with 8-bit cases (this is harmless even
if unneeded), but this would unnecessarily inflate code size; instead,
the usual approach is to emit it only for these registers.This logic was present in some cases but missing for some other
instructions: divide, not, negate, shifts.Fixes #2508.
avoid unaligned SSE loads on some f64 ops.
The implementations of several FP ops, such as fabs/fneg, used SSE
instructions. This is not a problem per-se, except that load-op merging
did not take alignment into account. Specifically, if an op on an f64
loaded from memory happened to merge that load, and the instruction into
which it was merged was an SSE instruction, then the SSE instruction
imposes stricter (128-bit) alignment requirements than the load.f64 did.This PR simply forces any instruction lowerings that could use SSE
instructions to implement non-SIMD operations to take inputs in
registers only, and avoid load-op merging.Fixes #2507.
two bugfixes exposed by cg_clif: urem/srem.i8, select.b1.
urem/srem.i8: the 8-bit form of the DIV instruction on x86-64 places
the remainder in AH, not RDX, different from all the other width-forms
of this instruction.select.b1: we were not recognizing selects of boolean values as
integer-typed operations, so we were generating XMM moves instead (!).
cfallin updated PR #2539 from x64-i128
to main
:
This PR is a followup to #2538 and is dependent on it.
x64 backend: implement 128-bit ops and misc fixes.
This implements all of the ops on I128 that are implemented by the
legacy x86 backend, and includes all that are required by at least one
major use-case (cg_clif rustc backend).The sequences are open-coded where necessary; for e.g. the bit
operations, this can be somewhat complex, but these sequences have been
tested carefully. This PR also includes a drive-by fix of clz/ctz for 8-
and 16-bit cases where they were incorrect previously.Also includes ridealong fixes developed while bringing up cg_clif
support, because they are difficult to completely separate due to
other refactors that occurred in this PR:
fix REX prefix logic for some 8-bit instructions.
When using an 8-bit register in 64-bit mode on x86-64, the REX prefix
semantics are somewhat subtle: without the REX prefix, register numbers
4--7 correspond to the second-to-lowest byte of the first four registers
(AH, CH, BH, DH), whereas with the REX prefix, these register numbers
correspond to the usual encoding (SPL, BPL, SIL, DIL). We could always
emit a REX byte for instructions with 8-bit cases (this is harmless even
if unneeded), but this would unnecessarily inflate code size; instead,
the usual approach is to emit it only for these registers.This logic was present in some cases but missing for some other
instructions: divide, not, negate, shifts.Fixes #2508.
avoid unaligned SSE loads on some f64 ops.
The implementations of several FP ops, such as fabs/fneg, used SSE
instructions. This is not a problem per-se, except that load-op merging
did not take alignment into account. Specifically, if an op on an f64
loaded from memory happened to merge that load, and the instruction into
which it was merged was an SSE instruction, then the SSE instruction
imposes stricter (128-bit) alignment requirements than the load.f64 did.This PR simply forces any instruction lowerings that could use SSE
instructions to implement non-SIMD operations to take inputs in
registers only, and avoid load-op merging.Fixes #2507.
two bugfixes exposed by cg_clif: urem/srem.i8, select.b1.
urem/srem.i8: the 8-bit form of the DIV instruction on x86-64 places
the remainder in AH, not RDX, different from all the other width-forms
of this instruction.select.b1: we were not recognizing selects of boolean values as
integer-typed operations, so we were generating XMM moves instead (!).
abrown submitted PR Review.
abrown submitted PR Review.
abrown created PR Review Comment:
I don't think I picked up why these need to be
Copy
now?
abrown created PR Review Comment:
You could save yourself some code with:
; run: %ctz(0x00000000_00000000, 0x00000000_00000000) == 0x96 ; run: %ctz(0x00000000_00010000, 0x00000001_00000000) == 0x16 ...
abrown created PR Review Comment:
It seems preferable to me to just have
AluRmiROpcode::And
and avoid this special-casing if possible. Do you agree? If yes, I wonder if there is a way to standardize register sizes throughoutlower.rs
usingOperandSize
and trying to clarify/simplify all of the special logic related to variables likeis64
,is_64
,is_8bit
, etc.
abrown created PR Review Comment:
I think this is the type of logic that is tricky to understand later: where is
is_8bit
coming from? It's sort of hidden that the registers used, if encoded in the range 4..7, should always emit a REX prefix. I'm not suggesting a great alternative here (sorry!), just noting that this is the type of thing that will make the lowering code tricky to troubleshoot and extend.
abrown submitted PR Review.
cfallin updated PR #2539 from x64-i128
to main
:
This PR is a followup to #2538 and is dependent on it.
x64 backend: implement 128-bit ops and misc fixes.
This implements all of the ops on I128 that are implemented by the
legacy x86 backend, and includes all that are required by at least one
major use-case (cg_clif rustc backend).The sequences are open-coded where necessary; for e.g. the bit
operations, this can be somewhat complex, but these sequences have been
tested carefully. This PR also includes a drive-by fix of clz/ctz for 8-
and 16-bit cases where they were incorrect previously.Also includes ridealong fixes developed while bringing up cg_clif
support, because they are difficult to completely separate due to
other refactors that occurred in this PR:
fix REX prefix logic for some 8-bit instructions.
When using an 8-bit register in 64-bit mode on x86-64, the REX prefix
semantics are somewhat subtle: without the REX prefix, register numbers
4--7 correspond to the second-to-lowest byte of the first four registers
(AH, CH, BH, DH), whereas with the REX prefix, these register numbers
correspond to the usual encoding (SPL, BPL, SIL, DIL). We could always
emit a REX byte for instructions with 8-bit cases (this is harmless even
if unneeded), but this would unnecessarily inflate code size; instead,
the usual approach is to emit it only for these registers.This logic was present in some cases but missing for some other
instructions: divide, not, negate, shifts.Fixes #2508.
avoid unaligned SSE loads on some f64 ops.
The implementations of several FP ops, such as fabs/fneg, used SSE
instructions. This is not a problem per-se, except that load-op merging
did not take alignment into account. Specifically, if an op on an f64
loaded from memory happened to merge that load, and the instruction into
which it was merged was an SSE instruction, then the SSE instruction
imposes stricter (128-bit) alignment requirements than the load.f64 did.This PR simply forces any instruction lowerings that could use SSE
instructions to implement non-SIMD operations to take inputs in
registers only, and avoid load-op merging.Fixes #2507.
two bugfixes exposed by cg_clif: urem/srem.i8, select.b1.
urem/srem.i8: the 8-bit form of the DIV instruction on x86-64 places
the remainder in AH, not RDX, different from all the other width-forms
of this instruction.select.b1: we were not recognizing selects of boolean values as
integer-typed operations, so we were generating XMM moves instead (!).
cfallin submitted PR Review.
cfallin created PR Review Comment:
Much better, thanks!
cfallin submitted PR Review.
cfallin created PR Review Comment:
Ah, I had added that for convenience in several spots but the
clone()
is no big deal; reverted.
cfallin submitted PR Review.
cfallin created PR Review Comment:
In general I would agree, except that we don't use 8-bit ops elsewhere for CLIF-level 8-bit instructions so I went for the limited-in-scope approach here. I'm totally open to a future refactor that centralizes
OperandSize
for all instructions, though -- that would remove a lot of the ad-hoc conditionals, as you note!
cfallin submitted PR Review.
cfallin created PR Review Comment:
Perhaps the
RexFlags
could take anOperandSize
with its constructor, if we centralize use ofOperandSize
; that would at least enforce the need to think about it. OK if we defer to a later refactor?
cfallin merged PR #2539.
Last updated: Jan 24 2025 at 00:11 UTC