Stream: git-wasmtime

Topic: wasmtime / issue #6890 Cranelift: Wrong result for `bitca...


view this post on Zulip Wasmtime GitHub notifications bot (Aug 23 2023 at 12:53):

afonso360 opened issue #6890:

:wave: Hey,

.clif Test Case

test interpret
test run
target x86_64 has_sse41

function %bitselect_vconst_f64x2(f64x2, f64x2) -> f64x2 {
block0(v1: f64x2, v2: f64x2):
    v0 = vconst.f64x2 0xFF00000000000000FF00000000000000
    v3 = bitselect v0, v1, v2
    return v3
}
; run: %bitselect_vconst_f64x2(0x11111111111111111111111111111111, 0x00000000000000000000000000000000) == 0x11000000000000001100000000000000

Steps to Reproduce

Expected Results

The test to pass.

Actual Results

     Running `/home/afonso/git/wasmtime/target/debug/clif-util test ./lmao2.clif`
 ERROR cranelift_filetests::concurrent > FAIL: run
FAIL ./lmao2.clif: run

Caused by:
    Failed test: run: %bitselect_vconst_f64x2(0x11111111111111111111111111111111, 0x00000000000000000000000000000000) == 0x11000000000000001100000000000000, actual: 0x11111111111111111111111111111111
1 tests
Error: 1 failure

Versions and Environment

Cranelift version or commit: main

Operating system: Linux

Architecture: x86_64

Extra Info

This issue is caused by this optimization to bitselect. It checks if every byte in the vconst is 0xFF or 0x00, which it is in this case, but then emits a blend instruction of whatever type the original bitselect was issued.

This is correct for i8x16, but not for any type with a larger lane size.

This does not affect wasmtime since wasmtime always bitcasts the inputs to bitselect into a i8x16 before the operation which is the only type for which this works.

We also currently don't remove bitcasts in the midend, so this won't get accidentally converted into a bitselect.f64x2.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 23 2023 at 12:53):

afonso360 added the bug label to Issue #6890.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 23 2023 at 12:53):

afonso360 added the cranelift label to Issue #6890.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 23 2023 at 12:53):

afonso360 added the cranelift:area:x64 label to Issue #6890.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 23 2023 at 19:12):

jameysharp commented on issue #6890:

I think bitcast isn't the cause of this problem, right? Should the issue title say bitselect.f64x2 instead?

If I understand this correctly, we can fix this by using x64_pblendvb instead of x64_blend in this optimized lowering, right?
https://github.com/bytecodealliance/wasmtime/blob/14b39bc234b1c8f9d5212e481a93a91e72b22807/cranelift/codegen/src/isa/x64/lower.isle#L1372-L1376

The x64_blend term is a helper that tries to use a type-appropriate variant of blend, and is only used from the above rule. Similarly vconst_all_ones_or_all_zeros is only used from this rule, and checks each byte of the constant. So I think it's appropriate to use a byte-oriented version of blend whenever that pattern matches.
https://github.com/bytecodealliance/wasmtime/blob/14b39bc234b1c8f9d5212e481a93a91e72b22807/cranelift/codegen/src/isa/x64/inst.isle#L3330-L3334

view this post on Zulip Wasmtime GitHub notifications bot (Aug 23 2023 at 19:45):

alexcrichton commented on issue #6890:

Good find @afonso360, thanks! Thinking about what @jameysharp said though plus a little more, I think that this optimization on x64 is even buggier (but still not reachable from wasm)

Then blend-style instructions use lane selection so are only valid if the entire lane has the same bit pattern (as a substitute for bitselect), but the type of the vector and comparison isn't being factored in here. This means that a f64x2 bitselect based on an i8x16 {i,f}cmp is not valid, in addition to the issues with constants that identified.

I think essentially all_ones_or_all_zeros needs to take the type of the bitselect as input and test whether the comparison has the same type lane-width-wise and additionally the vconst_all_ones_or_all_zeros helper needs to take the type as input to test the full lane width is all ones or all zeros, not just each byte is all ones or all zeros.

Also, for constants, I think the entire rule should be removed. Instead I think it would be better to transform a bitselect-with-constant-mask into a shuffle which then enables a whole different set of pattern-matching which is much more optimal than loading a constant into a register and performing a blend.


Last updated: Oct 23 2024 at 20:03 UTC