afonso360 edited issue #6890:
:wave: Hey,
.clif
Test Casetest interpret test run target x86_64 has_sse41 function %bitselect_vconst_f64x2(f64x2, f64x2) -> f64x2 { block0(v1: f64x2, v2: f64x2): v0 = vconst.f64x2 0xFF00000000000000FF00000000000000 v3 = bitselect v0, v1, v2 return v3 } ; run: %bitselect_vconst_f64x2(0x11111111111111111111111111111111, 0x00000000000000000000000000000000) == 0x11000000000000001100000000000000
Steps to Reproduce
clif-util test ./the-above.clif
Expected Results
The test to pass.
Actual Results
Running `/home/afonso/git/wasmtime/target/debug/clif-util test ./lmao2.clif` ERROR cranelift_filetests::concurrent > FAIL: run FAIL ./lmao2.clif: run Caused by: Failed test: run: %bitselect_vconst_f64x2(0x11111111111111111111111111111111, 0x00000000000000000000000000000000) == 0x11000000000000001100000000000000, actual: 0x11111111111111111111111111111111 1 tests Error: 1 failure
Versions and Environment
Cranelift version or commit: main
Operating system: Linux
Architecture: x86_64
Extra Info
This issue is caused by this optimization to
bitselect
. It checks if every byte in thevconst
is0xFF
or0x00
, which it is in this case, but then emits ablend
instruction of whatever type the original bitselect was issued.This is correct for
i8x16
, but not for any type with a larger lane size.This does not affect wasmtime since wasmtime always bitcasts the inputs to bitselect into a
i8x16
before the operation which is the only type for which this works.We also currently don't remove bitcasts in the midend, so this won't get accidentally converted into a
bitselect.f64x2
.
afonso360 commented on issue #6890:
I think bitcast isn't the cause of this problem, right? Should the issue title say bitselect.f64x2 instead?
Oops, yes, bitselect, not bitcast!
jameysharp commented on issue #6890:
I'm fine with converting
bitselect
with a constant mask intoshuffle
if you think that's best, @alexcrichton.But if we don't do that then I'd like to think through the rest of what you said more carefully. I don't see why
vconst_all_ones_or_all_zeros
needs to do anything but a byte-oriented pattern-match, so long as the result is always a byte-oriented blend. As long as both use the same definition of what a "lane" is, then we only fire this rule when the MSB of a lane is the same as all the other bits in the lane, so the intended type doesn't matter. Choosing byte-sized lanes means the rule can match more cases while still giving correct results.
alexcrichton commented on issue #6890:
Good point! Also sorry I should have read your comment more closely and more carefully. I believe you're correct and always using
x64_pblendvb
here is sufficient, and my point about turning things intoshuffle
is orthogonal.(sorry I should have learned by now to read things completely)
abrown closed issue #6890:
:wave: Hey,
.clif
Test Casetest interpret test run target x86_64 has_sse41 function %bitselect_vconst_f64x2(f64x2, f64x2) -> f64x2 { block0(v1: f64x2, v2: f64x2): v0 = vconst.f64x2 0xFF00000000000000FF00000000000000 v3 = bitselect v0, v1, v2 return v3 } ; run: %bitselect_vconst_f64x2(0x11111111111111111111111111111111, 0x00000000000000000000000000000000) == 0x11000000000000001100000000000000
Steps to Reproduce
clif-util test ./the-above.clif
Expected Results
The test to pass.
Actual Results
Running `/home/afonso/git/wasmtime/target/debug/clif-util test ./lmao2.clif` ERROR cranelift_filetests::concurrent > FAIL: run FAIL ./lmao2.clif: run Caused by: Failed test: run: %bitselect_vconst_f64x2(0x11111111111111111111111111111111, 0x00000000000000000000000000000000) == 0x11000000000000001100000000000000, actual: 0x11111111111111111111111111111111 1 tests Error: 1 failure
Versions and Environment
Cranelift version or commit: main
Operating system: Linux
Architecture: x86_64
Extra Info
This issue is caused by this optimization to
bitselect
. It checks if every byte in thevconst
is0xFF
or0x00
, which it is in this case, but then emits ablend
instruction of whatever type the original bitselect was issued.This is correct for
i8x16
, but not for any type with a larger lane size.This does not affect wasmtime since wasmtime always bitcasts the inputs to bitselect into a
i8x16
before the operation which is the only type for which this works.We also currently don't remove bitcasts in the midend, so this won't get accidentally converted into a
bitselect.f64x2
.
Last updated: Dec 23 2024 at 12:05 UTC