afonso360 opened issue #6890:
:wave: Hey,
Test Case

    test interpret
    test run
    target x86_64 has_sse41

    function %bitselect_vconst_f64x2(f64x2, f64x2) -> f64x2 {
    block0(v1: f64x2, v2: f64x2):
        v0 = vconst.f64x2 0xFF00000000000000FF00000000000000
        v3 = bitselect v0, v1, v2
        return v3
    }
    ; run: %bitselect_vconst_f64x2(0x11111111111111111111111111111111, 0x00000000000000000000000000000000) == 0x11000000000000001100000000000000

Steps to Reproduce
    clif-util test ./the-above.clif

Expected Results
The test to pass.
Actual Results
    Running `/home/afonso/git/wasmtime/target/debug/clif-util test ./lmao2.clif`
    ERROR cranelift_filetests::concurrent > FAIL: run
    FAIL ./lmao2.clif: run

    Caused by:
        Failed test: run: %bitselect_vconst_f64x2(0x11111111111111111111111111111111, 0x00000000000000000000000000000000) == 0x11000000000000001100000000000000, actual: 0x11111111111111111111111111111111
    1 tests
    Error: 1 failure

Versions and Environment
Cranelift version or commit: main
Operating system: Linux
Architecture: x86_64
Extra Info
This issue is caused by this optimization to `bitselect`. It checks if every byte in the `vconst` is `0xFF` or `0x00`, which it is in this case, but then emits a `blend` instruction for whatever type the original `bitselect` was issued with.

This is correct for `i8x16`, but not for any type with a larger lane size.

This does not affect wasmtime, since wasmtime always bitcasts the inputs to `bitselect` into an `i8x16` before the operation, which is the only type for which this works. We also currently don't remove bitcasts in the midend, so this won't get accidentally converted into a `bitselect.f64x2`.
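
To make the mismatch concrete, here is a minimal Rust sketch that models the two operations on a 128-bit value. It is not Cranelift code: the function names `bitselect` and `blend_f64x2` are hypothetical, and the blend model assumes the x86 behaviour of `blendvpd`, which selects each whole 64-bit lane based on the top bit of the corresponding mask lane.

```rust
/// Reference semantics of `bitselect`: take the bits of `x` where `mask`
/// is 1 and the bits of `y` where `mask` is 0.
fn bitselect(mask: u128, x: u128, y: u128) -> u128 {
    (x & mask) | (y & !mask)
}

/// Hypothetical model of a 64-bit-lane blend (assumed to behave like
/// `blendvpd`): each whole 64-bit lane comes from `x` if the top bit of
/// that lane of `mask` is set, otherwise from `y`.
fn blend_f64x2(mask: u128, x: u128, y: u128) -> u128 {
    let mut out = 0u128;
    for lane in 0..2u32 {
        let shift = lane * 64;
        // Bits belonging to this 64-bit lane.
        let lane_bits = (u64::MAX as u128) << shift;
        // Only the sign bit of the mask lane is inspected.
        let sign_set = ((mask >> (shift + 63)) & 1) == 1;
        out |= (if sign_set { x } else { y }) & lane_bits;
    }
    out
}
```

With the mask and inputs from the test case, `bitselect` yields `0x11000000000000001100000000000000`, while `blend_f64x2` yields `0x11111111111111111111111111111111`, which matches the `actual:` value in the failure output.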
afonso360 added the bug label to Issue #6890.
afonso360 added the cranelift label to Issue #6890.
afonso360 added the cranelift:area:x64 label to Issue #6890.
jameysharp commented on issue #6890:
I think `bitcast` isn't the cause of this problem, right? Should the issue title say `bitselect.f64x2` instead?

If I understand this correctly, we can fix this by using `x64_pblendvb` instead of `x64_blend` in this optimized lowering, right?

https://github.com/bytecodealliance/wasmtime/blob/14b39bc234b1c8f9d5212e481a93a91e72b22807/cranelift/codegen/src/isa/x64/lower.isle#L1372-L1376

The `x64_blend` term is a helper that tries to use a type-appropriate variant of blend, and is only used from the above rule. Similarly, `vconst_all_ones_or_all_zeros` is only used from this rule, and checks each byte of the constant. So I think it's appropriate to use a byte-oriented version of blend whenever that pattern matches.
https://github.com/bytecodealliance/wasmtime/blob/14b39bc234b1c8f9d5212e481a93a91e72b22807/cranelift/codegen/src/isa/x64/inst.isle#L3330-L3334
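
As a rough illustration of why a byte-oriented blend pairs well with a per-byte check, the Rust sketch below models both. The names are hypothetical and none of this is the actual ISLE helper or lowering; `byte_blend` assumes the `pblendvb` behaviour of selecting each byte by the top bit of the corresponding mask byte.

```rust
/// Reference semantics of `bitselect`, bit by bit.
fn bitselect(mask: u128, x: u128, y: u128) -> u128 {
    (x & mask) | (y & !mask)
}

/// Hypothetical model of the per-byte check: every byte of the constant
/// must be exactly 0x00 or 0xFF.
fn every_byte_all_ones_or_all_zeros(mask: u128) -> bool {
    mask.to_le_bytes().iter().all(|&b| b == 0x00 || b == 0xFF)
}

/// Hypothetical model of a byte-granular blend (assumed to behave like
/// `pblendvb`): each byte comes from `x` if the top bit of the mask byte
/// is set, otherwise from `y`.
fn byte_blend(mask: u128, x: u128, y: u128) -> u128 {
    let (m, x, y) = (mask.to_le_bytes(), x.to_le_bytes(), y.to_le_bytes());
    let mut out = [0u8; 16];
    for i in 0..16 {
        out[i] = if (m[i] & 0x80) != 0 { x[i] } else { y[i] };
    }
    u128::from_le_bytes(out)
}
```

When `every_byte_all_ones_or_all_zeros(mask)` holds, the top bit of each mask byte describes the whole byte, so `byte_blend` and `bitselect` agree for any `x` and `y`. For a mask such as `0x80`, the check fails and the two results can differ, which is why the check has to guard the rule.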
alexcrichton commented on issue #6890:
Good find @afonso360, thanks! Thinking about what @jameysharp said, plus a little more, I think that this optimization on x64 is even buggier (but still not reachable from wasm).

The `blend`-style instructions use lane selection, so they are only valid (as a substitute for `bitselect`) if the entire lane has the same bit pattern, but the type of the vector and comparison isn't being factored in here. This means that an f64x2 `bitselect` based on an i8x16 `{i,f}cmp` is not valid, in addition to the issues with constants identified above.

I think essentially `all_ones_or_all_zeros` needs to take the type of the `bitselect` as input and test whether the comparison has the same type lane-width-wise. Additionally, the `vconst_all_ones_or_all_zeros` helper needs to take the type as input to test that the full lane width is all ones or all zeros, not just that each byte is all ones or all zeros.

Also, for constants, I think the entire rule should be removed. Instead I think it would be better to transform a `bitselect`-with-constant-mask into a `shuffle`, which then enables a whole different set of pattern-matching which is much more optimal than loading a constant into a register and performing a blend.
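
The two suggestions can be sketched in Rust as follows. This is an illustrative model only, with hypothetical function names, not the ISLE rules themselves. `lanes_all_ones_or_all_zeros` takes the lane width as input, and `shuffle_indices_for_mask` derives byte indices under the assumption that `shuffle` picks bytes 0 to 15 from its first operand and 16 to 31 from its second.

```rust
/// Hypothetical lane-aware check: every `lane_bits`-wide lane of `mask`
/// must be all ones or all zeros (not merely every byte).
fn lanes_all_ones_or_all_zeros(mask: u128, lane_bits: u32) -> bool {
    assert!(matches!(lane_bits, 8 | 16 | 32 | 64));
    let lane_mask = (1u128 << lane_bits) - 1;
    (0..128 / lane_bits).all(|i| {
        let lane = (mask >> (i * lane_bits)) & lane_mask;
        lane == 0 || lane == lane_mask
    })
}

/// Hypothetical conversion of a constant `bitselect` mask into shuffle
/// indices. Byte `i` of the result comes from the first operand (index
/// `i`) when the mask byte is 0xFF and from the second operand (index
/// `16 + i`) when it is 0x00. Any other mask byte cannot be expressed.
fn shuffle_indices_for_mask(mask: u128) -> Option<[u8; 16]> {
    let mut idx = [0u8; 16];
    for (i, b) in mask.to_le_bytes().iter().enumerate() {
        idx[i] = match *b {
            0xFF => i as u8,
            0x00 => 16 + i as u8,
            _ => return None,
        };
    }
    Some(idx)
}

/// Model of a byte shuffle over two 128-bit operands, under the index
/// convention assumed above.
fn shuffle(a: u128, b: u128, idx: [u8; 16]) -> u128 {
    let (a, b) = (a.to_le_bytes(), b.to_le_bytes());
    let mut out = [0u8; 16];
    for (i, &sel) in idx.iter().enumerate() {
        out[i] = if sel < 16 { a[sel as usize] } else { b[(sel - 16) as usize] };
    }
    u128::from_le_bytes(out)
}
```

For the mask in the original test case, the lane-aware check passes with a lane width of 8 but fails with a lane width of 64, so a 64-bit-lane blend would be rejected. The derived shuffle indices reproduce the expected `bitselect` result without needing a mask register at all.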
Last updated: Dec 06 2025 at 06:05 UTC