abrown requested bnjbvr for a review on PR #2060.
abrown opened PR #2060 from simd-fp-arith
to main
:
This adds support for f32x4 and f64x2 arithmetic in the x64 backend.
Currently, I need some help troubleshooting the error returned by
cargo run --features experimental_x64 -- wast --enable-simd tests/spec_testsuite/proposals/simd/simd_f32x4_arith.wast
:thread '<unnamed>' panicked at 'register allocation: Analysis(EntryLiveinValues)', cranelift/codegen/src/machinst/compile.rs:73:9
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
This is because the
dst
operand inxmm_rm_r
is the read-write operand, so we need to generate a move here, at the virtual register level. It doesn't necessarily materialize as a move instruction, because regalloc will try hard to coalesce the two virtual registers (and assign them the same physical register).
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
Yeah, as Chris noted, we need a way to "connect" the input and output registers, so we need to generate a move instruction here; again, it won't necessarily materialize as a move instruction.
bnjbvr created PR Review Comment:
pre-existing: can you use
Inst::gen_move
here, instead? (and fill in the right bits ingen_move
to support vector instructions)?
bnjbvr created PR Review Comment:
I expect this to be incomplete, or a typo :-)
bnjbvr created PR Review Comment:
Can we tweak this to add a check that the vector type is 128-bits wide?
abrown submitted PR Review.
abrown created PR Review Comment:
Yup, good catch :big_smile:.
abrown updated PR #2060 from simd-fp-arith
to main
:
This adds support for f32x4 and f64x2 arithmetic in the x64 backend.
Currently, I need some help troubleshooting the error returned by
cargo run --features experimental_x64 -- wast --enable-simd tests/spec_testsuite/proposals/simd/simd_f32x4_arith.wast
:thread '<unnamed>' panicked at 'register allocation: Analysis(EntryLiveinValues)', cranelift/codegen/src/machinst/compile.rs:73:9
abrown updated PR #2060 from simd-fp-arith
to main
:
This adds support for f32x4 and f64x2 arithmetic in the x64 backend.
Currently, I need some help troubleshooting the error returned by
cargo run --features experimental_x64 -- wast --enable-simd tests/spec_testsuite/proposals/simd/simd_f32x4_arith.wast
:thread '<unnamed>' panicked at 'register allocation: Analysis(EntryLiveinValues)', cranelift/codegen/src/machinst/compile.rs:73:9
abrown updated PR #2060 from simd-fp-arith
to main
:
This adds support for f32x4 and f64x2 arithmetic in the x64 backend.
Currently, I need some help troubleshooting the error returned by
cargo run --features experimental_x64 -- wast --enable-simd tests/spec_testsuite/proposals/simd/simd_f32x4_arith.wast
:thread '<unnamed>' panicked at 'register allocation: Analysis(EntryLiveinValues)', cranelift/codegen/src/machinst/compile.rs:73:9
abrown submitted PR Review.
abrown created PR Review Comment:
@cfallin I had to switch over to this version of
fneg
due to negating signed 0s (previously I had the "subtract the value from 0.0 version") but I believe that using this temporary register is causing this to fail again in the register allocator?
thread '<unnamed>' panicked at 'register allocation: Analysis(EntryLiveinValues)', cranelift/codegen/src/machinst/compile.rs:73:9
abrown submitted PR Review.
abrown created PR Review Comment:
I should have different paths for scalar and vector values here at some point but currently I am only running
simd_f32x4_arith.wast
.
abrown submitted PR Review.
abrown created PR Review Comment:
Should there be a
gen_move
here as well (forOpcode::Sqrt
)?
abrown submitted PR Review.
abrown created PR Review Comment:
We don't necessarily have to things this way--an alternate approach would be to have all of the different opcodes for packed shifts live separately under
SseOpcode
. I only did this to match what I saw for scalar shifts.
abrown edited PR Review Comment.
abrown submitted PR Review.
abrown created PR Review Comment:
@cfallin proposed to add a
Inst::InputInvariant
(or perhapsInput::InputClobber
) that would allow us to usePXOR
,CMPEQPS
, etc., to generate certain constants. That newInst
would have a singleWritable<Reg>
and it'sget_reg
would contain one def and no uses.
julian-seward1 submitted PR Review.
julian-seward1 created PR Review Comment:
@bnjbvr and I discussed this (the use of
PXOR r,r
,CMPEQPS r,r
etc) to generate constants in registers, and the effect on the regalloc. All that is needed to keep regalloc happy is forget_regs
to spot such cases (where the two registers are the same) and report a singledef
for the register, instead of the usualuse+mod
pair. No need for anyInst::InputInvariant
orInput::InputClobber
or any other form of hinting.
julian-seward1 submitted PR Review.
julian-seward1 created PR Review Comment:
I agree with that; shifts are not magically different here, so they should just be part of
SseOpcode
.
julian-seward1 submitted PR Review.
julian-seward1 created PR Review Comment:
Ah, ignore my previous comment. In fact vector shifts by immediates do need a separate
Inst
case because they have a different "shape" from normal XMM insns -- the read-only (src
) operand is an immediate, not a RM. Note thatRegMemImm
has no place in XMM-world as far as I know. Hence I suggest the following:/// XMM shifts by immediates. XmmShiftByImm { opcode: SseOpcode, amount: u8, dst: Writable<Reg> }
Then add the necessary extra entries to
SseOpcode
, and have the emit function panic forInst::XmmShiftByImm
ifopcode
is not one of the few acceptable values.Maybe this should even be called
XMM_I_R
(so as to be consistent with namingXMM_RM_R
). Then we could use it for any XMM insn which takes an immediate and a modifiable register. I can't think of any except the abovementioned shifts though.
julian-seward1 edited PR Review Comment.
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
I don't think so, since sqrt doesn't have any read-write operand. Am i missing something?
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
+1 to @julian-seward1's suggestion to reuse the existing instructions for this purpose.
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
Agreed here; the "shape" is equivalent to the
InstructionFormat
here, so it makes sense to add a new vcode inst.tiny nit re: naming, since we're moving to some more Rusty-like naming, I'd rather go for properly camelCased
XmmImmReg
here (there's going to be a PR to make the names more consistent soonish).
julian-seward1 submitted PR Review.
julian-seward1 created PR Review Comment:
I agree with @bnjbvr; the insn reads its input register and writes a different output register, but there's no
mod
-ify style behaviour. Note thatget_regs
forXMM_RegMem_Reg
will have to take this into account; that it, it will have to special-caseSseOpcode::Sqrtss
et al accordingly.
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
@julian-seward1 Sqrtss uses a different Vcode inst because of this, since it's rather an "unary" opcode (see also
XmmUnaryRmR
).
abrown submitted PR Review.
abrown created PR Review Comment:
Ok, makes sense.
abrown submitted PR Review.
abrown created PR Review Comment:
Note that RegMemImm has no place in XMM-world as far as I know
I think in this case it does. Packed shifts accept either an
imm8
or axmm/m128
as operands. So there should be a newInst
like:... { opcode: SseOpcode, src: RegMemImm, dst: Writable<Reg> }
Right? But what do I call it?
julian-seward1 submitted PR Review.
julian-seward1 created PR Review Comment:
Ah, ok. So the src operand does have to be
RegMemImm
.As for a name, what it really wants to have is
Xmm_RegMemImm_Reg
. But the Rust naming rules would require jamming all that together,XmmRegMemImmReg
, which is ambiguous: the underscores give meaning. @bnjbvr do you think this is one place where we might apply for an exemption to the normal rules?
abrown submitted PR Review.
abrown created PR Review Comment:
In Zulip @bnjbvr suggested
XmmRmiReg
which is shorter and avoids the ambiguity.
cfallin submitted PR Review.
cfallin created PR Review Comment:
One concern I'd have in using the same instruction format for both e.g.
PXOR
andPAND
(if that's a thing? let's say it exists as a straw-instruction if not), with theget_regs
logic to special-case the same-register case, is that the input-invariant behavior really does depend on the opcode. SoPXOR r1, r1
zeroesr1
, and doesn't use its initial value; butPAND r1, r1
leavesr1
's final value equal to its initial value (so usesr1
).Using a
mod
for the first reg would solve this issue (PAND
would still be correct), but then doesn't give us the behavior we want forPXOR
, becauser1
is still (incorrectly, or rather imprecisely) seen as an input, even though the instruction semantically doesr1 := 0
no matter what.So I think we either need variants of
Inst
, or else we need a flag inside of this variant, or else we need logic inget_regs
to look at the opcode, to know the difference between ops wherea OP b = constant
anda OP b = f(a, b)
. I'd prefer the first (it feels cleanest), but any is fine. Thoughts?
cfallin submitted PR Review.
cfallin created PR Review Comment:
(Actually, I'm warming up to the third option now; basically we want a helper
op_on_self_yields_constant
or somesuch, and call that inget_regs
, which is just a match yielding a bool, which is simple enough.)
julian-seward1 submitted PR Review.
julian-seward1 created PR Review Comment:
For me, the simplest and most obvious solution is also the third. In short we want
get_regs
to always tell us the exact truth, and if it requires looking at the opcode and/or operands, so be it. To allow a situation where it doesn't tell us the exact truth strikes me as setting a dangerous precedent.I wouldn't even bother with a helper -- these cases are sufficiently rare to make that unnecessary. I'd just write something like
if opcode == XOR && src.isReg() && src.Reg() == dst { // special case }
It's one extra line (more or less).
abrown updated PR #2060 from simd-fp-arith
to main
:
This adds support for f32x4 and f64x2 arithmetic in the x64 backend.
Currently, I need some help troubleshooting the error returned by
cargo run --features experimental_x64 -- wast --enable-simd tests/spec_testsuite/proposals/simd/simd_f32x4_arith.wast
:thread '<unnamed>' panicked at 'register allocation: Analysis(EntryLiveinValues)', cranelift/codegen/src/machinst/compile.rs:73:9
abrown updated PR #2060 from simd-fp-arith
to main
:
This adds support for f32x4 and f64x2 arithmetic in the x64 backend.
Currently, I need some help troubleshooting the error returned by
cargo run --features experimental_x64 -- wast --enable-simd tests/spec_testsuite/proposals/simd/simd_f32x4_arith.wast
:thread '<unnamed>' panicked at 'register allocation: Analysis(EntryLiveinValues)', cranelift/codegen/src/machinst/compile.rs:73:9
abrown updated PR #2060 from simd-fp-arith
to main
:
This adds support for f32x4 and f64x2 arithmetic in the x64 backend.
Currently, I need some help troubleshooting the error returned by
cargo run --features experimental_x64 -- wast --enable-simd tests/spec_testsuite/proposals/simd/simd_f32x4_arith.wast
:thread '<unnamed>' panicked at 'register allocation: Analysis(EntryLiveinValues)', cranelift/codegen/src/machinst/compile.rs:73:9
abrown has marked PR #2060 as ready for review.
abrown requested cfallin for a review on PR #2060.
abrown requested cfallin and bnjbvr for a review on PR #2060.
abrown requested cfallin, bnjbvr and julian-seward1 for a review on PR #2060.
abrown updated PR #2060 from simd-fp-arith
to main
:
This adds support for f32x4 and f64x2 arithmetic in the x64 backend.
Currently, I need some help troubleshooting the error returned by
cargo run --features experimental_x64 -- wast --enable-simd tests/spec_testsuite/proposals/simd/simd_f32x4_arith.wast
:thread '<unnamed>' panicked at 'register allocation: Analysis(EntryLiveinValues)', cranelift/codegen/src/machinst/compile.rs:73:9
cfallin submitted PR Review.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Given that this logic is duplicated, let's pull it out into a helper to ensure the two locations stay in sync.
cfallin created PR Review Comment:
I'd prefer to add a bit more detail here -- something like:
"In certain cases, instructions of this format can act as a definition of an XMM register, producing a value that is independent of its initial value. For example, a vector equality comparison (
cmppd
orcmpps
) that compares a register to itself will generate all ones as a result, regardless of its value. From the register allocator's point of view, we should (i) record the first register, which is normally a mod, as a def instread; and (ii) not record the second register as a use, because it is the same as the first register (already handed)."Maybe also "Any opcode for which
a OP b = constant
is true can be included here; we may add more in the future, such asXOR
."?
cfallin created PR Review Comment:
We should be able to do something like:
match (a_op_b_is_constant(op), src) { (true, RegMem::Reg { reg }) if *reg == dst.to_reg() => { ... } _ => { ... }
to combine the
if
s. (And probably ignore my half-baked name for the helper...)
abrown submitted PR Review.
abrown created PR Review Comment:
Ah, nice, let me try that...
bnjbvr submitted PR Review.
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
Can you please add a TODO comment about the fact that it might incur a cross-domain penalty for packed ints, and that we might want to specialize the move later?
bnjbvr created PR Review Comment:
Yeah, it could be at an arbitrary offset. Do you mind implementing it in this PR, please?
bnjbvr created PR Review Comment:
preexisting, but maybe we could remove
I128
andB128
from this list, and have another match arm below with only the v128 test?
bnjbvr created PR Review Comment:
ditto here
bnjbvr created PR Review Comment:
nit: if no other instruction you've implemented in this PR uses this Vcode inst, how about inlining the value of
0x0fc2
where it's used, instead?
bnjbvr created PR Review Comment:
nit: can you add in comments an example of this instruction?
bnjbvr created PR Review Comment:
It's weird to have a commit that introduces this but don't use it directly; no need to worry, we'll just squash before merging.
bnjbvr created PR Review Comment:
Since we also use ucomiss/ucomisd for float comparisons, and those don't have the same set of encodable conditions, could you rename this? Good names are hard to distinguish between other float CCs and these. The main difference is that the cmpps family returns a boolean mask in the dst operand, so how about
CmpMaskPredicate
? (as opposed to the valid set of predicates that can be used in ucomiss asCmpFlagsPredicate
, if there was the need for such an enum)
bnjbvr created PR Review Comment:
Can you add a call to
src.assert_reg_type_is
too, please? Can you also add asserts for valid immediates?
bnjbvr created PR Review Comment:
The way to go here would be to use the
OperandSize
(maybe add variants for lower sizes), and have it in the instruction, instead of having one SSE opcode per size. Since you need a new Inst anyways, how about doing this here?
bnjbvr created PR Review Comment:
Shouldn't this be cmpps or cmppd according to the input type? If not, it might be worth commenting why it's ok to do this here.
bnjbvr created PR Review Comment:
The shift amount should be 1 for andps and andpd, for this to work correctly.
bnjbvr created PR Review Comment:
nit: Can you use
FloatComparisonPredicate::from(FloatCC::Equal)
instead? it doesn't require the explicit type then, which is nice.
bnjbvr created PR Review Comment:
Can you add an assert on the src reg's type here too, please?
bnjbvr created PR Review Comment:
Maybe put a panic with a message explaining which combination has been seen?
bnjbvr created PR Review Comment:
light suggestion: we use
def
fordefinition
/define
in misc places in the code base, so for consistency, it'd be nice to use it here too.
bnjbvr created PR Review Comment:
nit: the above comment and this variable name hide the fact that it can be used for AND'ing and taking the absolute value too.
bnjbvr created PR Review Comment:
Is the lowering missing for these operations?
bnjbvr created PR Review Comment:
Hmm, we shouldn't probably duplicate this comment everywhere, since it's the regular way to do things in x86; we may add a file doc-comment at some point explaining this, maybe. It's fine to keep this comment until then.
abrown submitted PR Review.
abrown created PR Review Comment:
Hm,
match (..., src)
takes ownership and then I can't dosrc.map_uses
later on.
abrown submitted PR Review.
abrown created PR Review Comment:
I had something like the
OperandSize
approach but based on someone's comments, I refactored it to this.
abrown submitted PR Review.
abrown created PR Review Comment:
Nope, see shifts.
abrown submitted PR Review.
abrown created PR Review Comment:
I don't think it matters since the result is all 1s either way and we don't cross domains.
abrown submitted PR Review.
abrown created PR Review Comment:
They are used directly. I think what was happening is that some
*
use above was allowing things to work... which is more weird.
abrown submitted PR Review.
abrown created PR Review Comment:
BTW, I purposely split these commits all up to atomic changes so we wouldn't have to squash.
abrown updated PR #2060 from simd-fp-arith
to main
:
This adds support for f32x4 and f64x2 arithmetic in the x64 backend.
Currently, I need some help troubleshooting the error returned by
cargo run --features experimental_x64 -- wast --enable-simd tests/spec_testsuite/proposals/simd/simd_f32x4_arith.wast
:thread '<unnamed>' panicked at 'register allocation: Analysis(EntryLiveinValues)', cranelift/codegen/src/machinst/compile.rs:73:9
abrown submitted PR Review.
abrown created PR Review Comment:
This is being done in #2071; we need to figure out a merge ordering for these two PRs.
abrown submitted PR Review.
abrown created PR Review Comment:
I'm open to a different name but it should probably have the
Float
in the name somewhere and I'm not too sure aboutMask
--it relates to the instruction semantics but not directly to the predicate so that is a bit confusing.
abrown updated PR #2060 from simd-fp-arith
to main
:
This adds support for f32x4 and f64x2 arithmetic in the x64 backend.
Currently, I need some help troubleshooting the error returned by
cargo run --features experimental_x64 -- wast --enable-simd tests/spec_testsuite/proposals/simd/simd_f32x4_arith.wast
:thread '<unnamed>' panicked at 'register allocation: Analysis(EntryLiveinValues)', cranelift/codegen/src/machinst/compile.rs:73:9
abrown updated PR #2060 from simd-fp-arith
to main
:
This adds support for f32x4 and f64x2 arithmetic in the x64 backend.
Currently, I need some help troubleshooting the error returned by
cargo run --features experimental_x64 -- wast --enable-simd tests/spec_testsuite/proposals/simd/simd_f32x4_arith.wast
:thread '<unnamed>' panicked at 'register allocation: Analysis(EntryLiveinValues)', cranelift/codegen/src/machinst/compile.rs:73:9
abrown updated PR #2060 from simd-fp-arith
to main
:
This adds support for f32x4 and f64x2 arithmetic in the x64 backend.
Currently, I need some help troubleshooting the error returned by
cargo run --features experimental_x64 -- wast --enable-simd tests/spec_testsuite/proposals/simd/simd_f32x4_arith.wast
:thread '<unnamed>' panicked at 'register allocation: Analysis(EntryLiveinValues)', cranelift/codegen/src/machinst/compile.rs:73:9
abrown requested bnjbvr and julian-seward1 for a review on PR #2060.
abrown updated PR #2060 from simd-fp-arith
to main
:
This adds support for f32x4 and f64x2 arithmetic in the x64 backend.
Currently, I need some help troubleshooting the error returned by
cargo run --features experimental_x64 -- wast --enable-simd tests/spec_testsuite/proposals/simd/simd_f32x4_arith.wast
:thread '<unnamed>' panicked at 'register allocation: Analysis(EntryLiveinValues)', cranelift/codegen/src/machinst/compile.rs:73:9
abrown updated PR #2060 from simd-fp-arith
to main
:
This adds support for f32x4 and f64x2 arithmetic in the x64 backend.
Currently, I need some help troubleshooting the error returned by
cargo run --features experimental_x64 -- wast --enable-simd tests/spec_testsuite/proposals/simd/simd_f32x4_arith.wast
:thread '<unnamed>' panicked at 'register allocation: Analysis(EntryLiveinValues)', cranelift/codegen/src/machinst/compile.rs:73:9
bnjbvr submitted PR Review.
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
Maybe you can use
let src = input_to_reg(ctx, inputs[0]);
here instead, to ensure the input is always emitted in a register? (and put the current src and dst definitions under the then branch above).
bnjbvr created PR Review Comment:
My convention has been to put
Seq
in the name for such synthetic instruction sequences, can you add the suffix too, please?
bnjbvr created PR Review Comment:
:tada:
abrown updated PR #2060 from simd-fp-arith
to main
:
This adds support for f32x4 and f64x2 arithmetic in the x64 backend.
Currently, I need some help troubleshooting the error returned by
cargo run --features experimental_x64 -- wast --enable-simd tests/spec_testsuite/proposals/simd/simd_f32x4_arith.wast
:thread '<unnamed>' panicked at 'register allocation: Analysis(EntryLiveinValues)', cranelift/codegen/src/machinst/compile.rs:73:9
abrown merged PR #2060.
Last updated: Dec 23 2024 at 12:05 UTC