afonso360 opened issue #6890:
:wave: Hey,
.clif
Test Casetest interpret test run target x86_64 has_sse41 function %bitselect_vconst_f64x2(f64x2, f64x2) -> f64x2 { block0(v1: f64x2, v2: f64x2): v0 = vconst.f64x2 0xFF00000000000000FF00000000000000 v3 = bitselect v0, v1, v2 return v3 } ; run: %bitselect_vconst_f64x2(0x11111111111111111111111111111111, 0x00000000000000000000000000000000) == 0x11000000000000001100000000000000
Steps to Reproduce
clif-util test ./the-above.clif
Expected Results
The test to pass.
Actual Results
Running `/home/afonso/git/wasmtime/target/debug/clif-util test ./lmao2.clif` ERROR cranelift_filetests::concurrent > FAIL: run FAIL ./lmao2.clif: run Caused by: Failed test: run: %bitselect_vconst_f64x2(0x11111111111111111111111111111111, 0x00000000000000000000000000000000) == 0x11000000000000001100000000000000, actual: 0x11111111111111111111111111111111 1 tests Error: 1 failure
Versions and Environment
Cranelift version or commit: main
Operating system: Linux
Architecture: x86_64
Extra Info
This issue is caused by this optimization to
bitselect
. It checks if every byte in thevconst
is0xFF
or0x00
, which it is in this case, but then emits ablend
instruction of whatever type the original bitselect was issued.This is correct for
i8x16
, but not for any type with a larger lane size.This does not affect wasmtime since wasmtime always bitcasts the inputs to bitselect into a
i8x16
before the operation which is the only type for which this works.We also currently don't remove bitcasts in the midend, so this won't get accidentally converted into a
bitselect.f64x2
.
afonso360 added the bug label to Issue #6890.
afonso360 added the cranelift label to Issue #6890.
afonso360 added the cranelift:area:x64 label to Issue #6890.
jameysharp commented on issue #6890:
I think
bitcast
isn't the cause of this problem, right? Should the issue title saybitselect.f64x2
instead?If I understand this correctly, we can fix this by using
x64_pblendvb
instead ofx64_blend
in this optimized lowering, right?
https://github.com/bytecodealliance/wasmtime/blob/14b39bc234b1c8f9d5212e481a93a91e72b22807/cranelift/codegen/src/isa/x64/lower.isle#L1372-L1376The
x64_blend
term is a helper that tries to use a type-appropriate variant of blend, and is only used from the above rule. Similarlyvconst_all_ones_or_all_zeros
is only used from this rule, and checks each byte of the constant. So I think it's appropriate to use a byte-oriented version of blend whenever that pattern matches.
https://github.com/bytecodealliance/wasmtime/blob/14b39bc234b1c8f9d5212e481a93a91e72b22807/cranelift/codegen/src/isa/x64/inst.isle#L3330-L3334
alexcrichton commented on issue #6890:
Good find @afonso360, thanks! Thinking about what @jameysharp said though plus a little more, I think that this optimization on x64 is even buggier (but still not reachable from wasm)
Then
blend
-style instructions use lane selection so are only valid if the entire lane has the same bit pattern (as a substitute forbitselect
), but the type of the vector and comparison isn't being factored in here. This means that a f64x2 bitselect based on an i8x16 {i,f}cmp is not valid, in addition to the issues with constants that identified.I think essentially
all_ones_or_all_zeros
needs to take the type of thebitselect
as input and test whether the comparison has the same type lane-width-wise and additionally thevconst_all_ones_or_all_zeros
helper needs to take the type as input to test the full lane width is all ones or all zeros, not just each byte is all ones or all zeros.Also, for constants, I think the entire rule should be removed. Instead I think it would be better to transform a
bitselect
-with-constant-mask into ashuffle
which then enables a whole different set of pattern-matching which is much more optimal than loading a constant into a register and performing a blend.
Last updated: Nov 22 2024 at 17:03 UTC