abrown requested bnjbvr for a review on PR #2060.
abrown opened PR #2060 from simd-fp-arith to main:
This adds support for f32x4 and f64x2 arithmetic in the x64 backend.
Currently, I need some help troubleshooting the error returned by
cargo run --features experimental_x64 -- wast --enable-simd tests/spec_testsuite/proposals/simd/simd_f32x4_arith.wast:thread '<unnamed>' panicked at 'register allocation: Analysis(EntryLiveinValues)', cranelift/codegen/src/machinst/compile.rs:73:9
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
This is because the
dstoperand inxmm_rm_ris the read-write operand, so we need to generate a move here, at the virtual register level. It doesn't necessarily materialize as a move instruction, because regalloc will try hard to coalesce the two virtual registers (and assign them the same physical register).
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
Yeah, as Chris noted, we need a way to "connect" the input and output registers, so we need to generate a move instruction here; again, it won't necessarily materialize as a move instruction.
bnjbvr created PR Review Comment:
pre-existing: can you use
Inst::gen_movehere, instead? (and fill in the right bits ingen_moveto support vector instructions)?
bnjbvr created PR Review Comment:
I expect this to be incomplete, or a typo :-)
bnjbvr created PR Review Comment:
Can we tweak this to add a check that the vector type is 128-bits wide?
abrown submitted PR Review.
abrown created PR Review Comment:
Yup, good catch :big_smile:.
abrown updated PR #2060 from simd-fp-arith to main:
This adds support for f32x4 and f64x2 arithmetic in the x64 backend.
Currently, I need some help troubleshooting the error returned by
cargo run --features experimental_x64 -- wast --enable-simd tests/spec_testsuite/proposals/simd/simd_f32x4_arith.wast:thread '<unnamed>' panicked at 'register allocation: Analysis(EntryLiveinValues)', cranelift/codegen/src/machinst/compile.rs:73:9
abrown updated PR #2060 from simd-fp-arith to main:
This adds support for f32x4 and f64x2 arithmetic in the x64 backend.
Currently, I need some help troubleshooting the error returned by
cargo run --features experimental_x64 -- wast --enable-simd tests/spec_testsuite/proposals/simd/simd_f32x4_arith.wast:thread '<unnamed>' panicked at 'register allocation: Analysis(EntryLiveinValues)', cranelift/codegen/src/machinst/compile.rs:73:9
abrown updated PR #2060 from simd-fp-arith to main:
This adds support for f32x4 and f64x2 arithmetic in the x64 backend.
Currently, I need some help troubleshooting the error returned by
cargo run --features experimental_x64 -- wast --enable-simd tests/spec_testsuite/proposals/simd/simd_f32x4_arith.wast:thread '<unnamed>' panicked at 'register allocation: Analysis(EntryLiveinValues)', cranelift/codegen/src/machinst/compile.rs:73:9
abrown submitted PR Review.
abrown created PR Review Comment:
@cfallin I had to switch over to this version of
fnegdue to negating signed 0s (previously I had the "subtract the value from 0.0 version") but I believe that using this temporary register is causing this to fail again in the register allocator?
thread '<unnamed>' panicked at 'register allocation: Analysis(EntryLiveinValues)', cranelift/codegen/src/machinst/compile.rs:73:9
abrown submitted PR Review.
abrown created PR Review Comment:
I should have different paths for scalar and vector values here at some point but currently I am only running
simd_f32x4_arith.wast.
abrown submitted PR Review.
abrown created PR Review Comment:
Should there be a
gen_movehere as well (forOpcode::Sqrt)?
abrown submitted PR Review.
abrown created PR Review Comment:
We don't necessarily have to things this way--an alternate approach would be to have all of the different opcodes for packed shifts live separately under
SseOpcode. I only did this to match what I saw for scalar shifts.
abrown edited PR Review Comment.
abrown submitted PR Review.
abrown created PR Review Comment:
@cfallin proposed to add a
Inst::InputInvariant(or perhapsInput::InputClobber) that would allow us to usePXOR,CMPEQPS, etc., to generate certain constants. That newInstwould have a singleWritable<Reg>and it'sget_regwould contain one def and no uses.
julian-seward1 submitted PR Review.
julian-seward1 created PR Review Comment:
@bnjbvr and I discussed this (the use of
PXOR r,r,CMPEQPS r,retc) to generate constants in registers, and the effect on the regalloc. All that is needed to keep regalloc happy is forget_regsto spot such cases (where the two registers are the same) and report a singledeffor the register, instead of the usualuse+modpair. No need for anyInst::InputInvariantorInput::InputClobberor any other form of hinting.
julian-seward1 submitted PR Review.
julian-seward1 created PR Review Comment:
I agree with that; shifts are not magically different here, so they should just be part of
SseOpcode.
julian-seward1 submitted PR Review.
julian-seward1 created PR Review Comment:
Ah, ignore my previous comment. In fact vector shifts by immediates do need a separate
Instcase because they have a different "shape" from normal XMM insns -- the read-only (src) operand is an immediate, not a RM. Note thatRegMemImmhas no place in XMM-world as far as I know. Hence I suggest the following:/// XMM shifts by immediates. XmmShiftByImm { opcode: SseOpcode, amount: u8, dst: Writable<Reg> }Then add the necessary extra entries to
SseOpcode, and have the emit function panic forInst::XmmShiftByImmifopcodeis not one of the few acceptable values.Maybe this should even be called
XMM_I_R(so as to be consistent with namingXMM_RM_R). Then we could use it for any XMM insn which takes an immediate and a modifiable register. I can't think of any except the abovementioned shifts though.
julian-seward1 edited PR Review Comment.
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
I don't think so, since sqrt doesn't have any read-write operand. Am i missing something?
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
+1 to @julian-seward1's suggestion to reuse the existing instructions for this purpose.
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
Agreed here; the "shape" is equivalent to the
InstructionFormathere, so it makes sense to add a new vcode inst.tiny nit re: naming, since we're moving to some more Rusty-like naming, I'd rather go for properly camelCased
XmmImmReghere (there's going to be a PR to make the names more consistent soonish).
julian-seward1 submitted PR Review.
julian-seward1 created PR Review Comment:
I agree with @bnjbvr; the insn reads its input register and writes a different output register, but there's no
mod-ify style behaviour. Note thatget_regsforXMM_RegMem_Regwill have to take this into account; that it, it will have to special-caseSseOpcode::Sqrtsset al accordingly.
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
@julian-seward1 Sqrtss uses a different Vcode inst because of this, since it's rather an "unary" opcode (see also
XmmUnaryRmR).
abrown submitted PR Review.
abrown created PR Review Comment:
Ok, makes sense.
abrown submitted PR Review.
abrown created PR Review Comment:
Note that RegMemImm has no place in XMM-world as far as I know
I think in this case it does. Packed shifts accept either an
imm8or axmm/m128as operands. So there should be a newInstlike:... { opcode: SseOpcode, src: RegMemImm, dst: Writable<Reg> }Right? But what do I call it?
julian-seward1 submitted PR Review.
julian-seward1 created PR Review Comment:
Ah, ok. So the src operand does have to be
RegMemImm.As for a name, what it really wants to have is
Xmm_RegMemImm_Reg. But the Rust naming rules would require jamming all that together,XmmRegMemImmReg, which is ambiguous: the underscores give meaning. @bnjbvr do you think this is one place where we might apply for an exemption to the normal rules?
abrown submitted PR Review.
abrown created PR Review Comment:
In Zulip @bnjbvr suggested
XmmRmiRegwhich is shorter and avoids the ambiguity.
cfallin submitted PR Review.
cfallin created PR Review Comment:
One concern I'd have in using the same instruction format for both e.g.
PXORandPAND(if that's a thing? let's say it exists as a straw-instruction if not), with theget_regslogic to special-case the same-register case, is that the input-invariant behavior really does depend on the opcode. SoPXOR r1, r1zeroesr1, and doesn't use its initial value; butPAND r1, r1leavesr1's final value equal to its initial value (so usesr1).Using a
modfor the first reg would solve this issue (PANDwould still be correct), but then doesn't give us the behavior we want forPXOR, becauser1is still (incorrectly, or rather imprecisely) seen as an input, even though the instruction semantically doesr1 := 0no matter what.So I think we either need variants of
Inst, or else we need a flag inside of this variant, or else we need logic inget_regsto look at the opcode, to know the difference between ops wherea OP b = constantanda OP b = f(a, b). I'd prefer the first (it feels cleanest), but any is fine. Thoughts?
cfallin submitted PR Review.
cfallin created PR Review Comment:
(Actually, I'm warming up to the third option now; basically we want a helper
op_on_self_yields_constantor somesuch, and call that inget_regs, which is just a match yielding a bool, which is simple enough.)
julian-seward1 submitted PR Review.
julian-seward1 created PR Review Comment:
For me, the simplest and most obvious solution is also the third. In short we want
get_regsto always tell us the exact truth, and if it requires looking at the opcode and/or operands, so be it. To allow a situation where it doesn't tell us the exact truth strikes me as setting a dangerous precedent.I wouldn't even bother with a helper -- these cases are sufficiently rare to make that unnecessary. I'd just write something like
if opcode == XOR && src.isReg() && src.Reg() == dst { // special case }It's one extra line (more or less).
abrown updated PR #2060 from simd-fp-arith to main:
This adds support for f32x4 and f64x2 arithmetic in the x64 backend.
Currently, I need some help troubleshooting the error returned by
cargo run --features experimental_x64 -- wast --enable-simd tests/spec_testsuite/proposals/simd/simd_f32x4_arith.wast:thread '<unnamed>' panicked at 'register allocation: Analysis(EntryLiveinValues)', cranelift/codegen/src/machinst/compile.rs:73:9
abrown updated PR #2060 from simd-fp-arith to main:
This adds support for f32x4 and f64x2 arithmetic in the x64 backend.
Currently, I need some help troubleshooting the error returned by
cargo run --features experimental_x64 -- wast --enable-simd tests/spec_testsuite/proposals/simd/simd_f32x4_arith.wast:thread '<unnamed>' panicked at 'register allocation: Analysis(EntryLiveinValues)', cranelift/codegen/src/machinst/compile.rs:73:9
abrown updated PR #2060 from simd-fp-arith to main:
This adds support for f32x4 and f64x2 arithmetic in the x64 backend.
Currently, I need some help troubleshooting the error returned by
cargo run --features experimental_x64 -- wast --enable-simd tests/spec_testsuite/proposals/simd/simd_f32x4_arith.wast:thread '<unnamed>' panicked at 'register allocation: Analysis(EntryLiveinValues)', cranelift/codegen/src/machinst/compile.rs:73:9
abrown has marked PR #2060 as ready for review.
abrown requested cfallin for a review on PR #2060.
abrown requested cfallin and bnjbvr for a review on PR #2060.
abrown requested cfallin, bnjbvr and julian-seward1 for a review on PR #2060.
abrown updated PR #2060 from simd-fp-arith to main:
This adds support for f32x4 and f64x2 arithmetic in the x64 backend.
Currently, I need some help troubleshooting the error returned by
cargo run --features experimental_x64 -- wast --enable-simd tests/spec_testsuite/proposals/simd/simd_f32x4_arith.wast:thread '<unnamed>' panicked at 'register allocation: Analysis(EntryLiveinValues)', cranelift/codegen/src/machinst/compile.rs:73:9
cfallin submitted PR Review.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Given that this logic is duplicated, let's pull it out into a helper to ensure the two locations stay in sync.
cfallin created PR Review Comment:
I'd prefer to add a bit more detail here -- something like:
"In certain cases, instructions of this format can act as a definition of an XMM register, producing a value that is independent of its initial value. For example, a vector equality comparison (
cmppdorcmpps) that compares a register to itself will generate all ones as a result, regardless of its value. From the register allocator's point of view, we should (i) record the first register, which is normally a mod, as a def instread; and (ii) not record the second register as a use, because it is the same as the first register (already handed)."Maybe also "Any opcode for which
a OP b = constantis true can be included here; we may add more in the future, such asXOR."?
cfallin created PR Review Comment:
We should be able to do something like:
match (a_op_b_is_constant(op), src) { (true, RegMem::Reg { reg }) if *reg == dst.to_reg() => { ... } _ => { ... }to combine the
ifs. (And probably ignore my half-baked name for the helper...)
abrown submitted PR Review.
abrown created PR Review Comment:
Ah, nice, let me try that...
bnjbvr submitted PR Review.
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
Can you please add a TODO comment about the fact that it might incur a cross-domain penalty for packed ints, and that we might want to specialize the move later?
bnjbvr created PR Review Comment:
Yeah, it could be at an arbitrary offset. Do you mind implementing it in this PR, please?
bnjbvr created PR Review Comment:
preexisting, but maybe we could remove
I128andB128from this list, and have another match arm below with only the v128 test?
bnjbvr created PR Review Comment:
ditto here
bnjbvr created PR Review Comment:
nit: if no other instruction you've implemented in this PR uses this Vcode inst, how about inlining the value of
0x0fc2where it's used, instead?
bnjbvr created PR Review Comment:
nit: can you add in comments an example of this instruction?
bnjbvr created PR Review Comment:
It's weird to have a commit that introduces this but don't use it directly; no need to worry, we'll just squash before merging.
bnjbvr created PR Review Comment:
Since we also use ucomiss/ucomisd for float comparisons, and those don't have the same set of encodable conditions, could you rename this? Good names are hard to distinguish between other float CCs and these. The main difference is that the cmpps family returns a boolean mask in the dst operand, so how about
CmpMaskPredicate? (as opposed to the valid set of predicates that can be used in ucomiss asCmpFlagsPredicate, if there was the need for such an enum)
bnjbvr created PR Review Comment:
Can you add a call to
src.assert_reg_type_istoo, please? Can you also add asserts for valid immediates?
bnjbvr created PR Review Comment:
The way to go here would be to use the
OperandSize(maybe add variants for lower sizes), and have it in the instruction, instead of having one SSE opcode per size. Since you need a new Inst anyways, how about doing this here?
bnjbvr created PR Review Comment:
Shouldn't this be cmpps or cmppd according to the input type? If not, it might be worth commenting why it's ok to do this here.
bnjbvr created PR Review Comment:
The shift amount should be 1 for andps and andpd, for this to work correctly.
bnjbvr created PR Review Comment:
nit: Can you use
FloatComparisonPredicate::from(FloatCC::Equal)instead? it doesn't require the explicit type then, which is nice.
bnjbvr created PR Review Comment:
Can you add an assert on the src reg's type here too, please?
bnjbvr created PR Review Comment:
Maybe put a panic with a message explaining which combination has been seen?
bnjbvr created PR Review Comment:
light suggestion: we use
deffordefinition/definein misc places in the code base, so for consistency, it'd be nice to use it here too.
bnjbvr created PR Review Comment:
nit: the above comment and this variable name hide the fact that it can be used for AND'ing and taking the absolute value too.
bnjbvr created PR Review Comment:
Is the lowering missing for these operations?
bnjbvr created PR Review Comment:
Hmm, we shouldn't probably duplicate this comment everywhere, since it's the regular way to do things in x86; we may add a file doc-comment at some point explaining this, maybe. It's fine to keep this comment until then.
abrown submitted PR Review.
abrown created PR Review Comment:
Hm,
match (..., src)takes ownership and then I can't dosrc.map_useslater on.
abrown submitted PR Review.
abrown created PR Review Comment:
I had something like the
OperandSizeapproach but based on someone's comments, I refactored it to this.
abrown submitted PR Review.
abrown created PR Review Comment:
Nope, see shifts.
abrown submitted PR Review.
abrown created PR Review Comment:
I don't think it matters since the result is all 1s either way and we don't cross domains.
abrown submitted PR Review.
abrown created PR Review Comment:
They are used directly. I think what was happening is that some
*use above was allowing things to work... which is more weird.
abrown submitted PR Review.
abrown created PR Review Comment:
BTW, I purposely split these commits all up to atomic changes so we wouldn't have to squash.
abrown updated PR #2060 from simd-fp-arith to main:
This adds support for f32x4 and f64x2 arithmetic in the x64 backend.
Currently, I need some help troubleshooting the error returned by
cargo run --features experimental_x64 -- wast --enable-simd tests/spec_testsuite/proposals/simd/simd_f32x4_arith.wast:thread '<unnamed>' panicked at 'register allocation: Analysis(EntryLiveinValues)', cranelift/codegen/src/machinst/compile.rs:73:9
abrown submitted PR Review.
abrown created PR Review Comment:
This is being done in #2071; we need to figure out a merge ordering for these two PRs.
abrown submitted PR Review.
abrown created PR Review Comment:
I'm open to a different name but it should probably have the
Floatin the name somewhere and I'm not too sure aboutMask--it relates to the instruction semantics but not directly to the predicate so that is a bit confusing.
abrown updated PR #2060 from simd-fp-arith to main:
This adds support for f32x4 and f64x2 arithmetic in the x64 backend.
Currently, I need some help troubleshooting the error returned by
cargo run --features experimental_x64 -- wast --enable-simd tests/spec_testsuite/proposals/simd/simd_f32x4_arith.wast:thread '<unnamed>' panicked at 'register allocation: Analysis(EntryLiveinValues)', cranelift/codegen/src/machinst/compile.rs:73:9
abrown updated PR #2060 from simd-fp-arith to main:
This adds support for f32x4 and f64x2 arithmetic in the x64 backend.
Currently, I need some help troubleshooting the error returned by
cargo run --features experimental_x64 -- wast --enable-simd tests/spec_testsuite/proposals/simd/simd_f32x4_arith.wast:thread '<unnamed>' panicked at 'register allocation: Analysis(EntryLiveinValues)', cranelift/codegen/src/machinst/compile.rs:73:9
abrown updated PR #2060 from simd-fp-arith to main:
This adds support for f32x4 and f64x2 arithmetic in the x64 backend.
Currently, I need some help troubleshooting the error returned by
cargo run --features experimental_x64 -- wast --enable-simd tests/spec_testsuite/proposals/simd/simd_f32x4_arith.wast:thread '<unnamed>' panicked at 'register allocation: Analysis(EntryLiveinValues)', cranelift/codegen/src/machinst/compile.rs:73:9
abrown requested bnjbvr and julian-seward1 for a review on PR #2060.
abrown updated PR #2060 from simd-fp-arith to main:
This adds support for f32x4 and f64x2 arithmetic in the x64 backend.
Currently, I need some help troubleshooting the error returned by
cargo run --features experimental_x64 -- wast --enable-simd tests/spec_testsuite/proposals/simd/simd_f32x4_arith.wast:thread '<unnamed>' panicked at 'register allocation: Analysis(EntryLiveinValues)', cranelift/codegen/src/machinst/compile.rs:73:9
abrown updated PR #2060 from simd-fp-arith to main:
This adds support for f32x4 and f64x2 arithmetic in the x64 backend.
Currently, I need some help troubleshooting the error returned by
cargo run --features experimental_x64 -- wast --enable-simd tests/spec_testsuite/proposals/simd/simd_f32x4_arith.wast:thread '<unnamed>' panicked at 'register allocation: Analysis(EntryLiveinValues)', cranelift/codegen/src/machinst/compile.rs:73:9
bnjbvr submitted PR Review.
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
Maybe you can use
let src = input_to_reg(ctx, inputs[0]);here instead, to ensure the input is always emitted in a register? (and put the current src and dst definitions under the then branch above).
bnjbvr created PR Review Comment:
My convention has been to put
Seqin the name for such synthetic instruction sequences, can you add the suffix too, please?
bnjbvr created PR Review Comment:
:tada:
abrown updated PR #2060 from simd-fp-arith to main:
This adds support for f32x4 and f64x2 arithmetic in the x64 backend.
Currently, I need some help troubleshooting the error returned by
cargo run --features experimental_x64 -- wast --enable-simd tests/spec_testsuite/proposals/simd/simd_f32x4_arith.wast:thread '<unnamed>' panicked at 'register allocation: Analysis(EntryLiveinValues)', cranelift/codegen/src/machinst/compile.rs:73:9
abrown merged PR #2060.
Last updated: Dec 06 2025 at 06:05 UTC