Stream: git-wasmtime

Topic: wasmtime / issue #6890 Cranelift: Wrong result for `bitse...


view this post on Zulip Wasmtime GitHub notifications bot (Aug 23 2023 at 19:50):

afonso360 edited issue #6890:

:wave: Hey,

.clif Test Case

test interpret
test run
target x86_64 has_sse41

function %bitselect_vconst_f64x2(f64x2, f64x2) -> f64x2 {
block0(v1: f64x2, v2: f64x2):
    v0 = vconst.f64x2 0xFF00000000000000FF00000000000000
    v3 = bitselect v0, v1, v2
    return v3
}
; run: %bitselect_vconst_f64x2(0x11111111111111111111111111111111, 0x00000000000000000000000000000000) == 0x11000000000000001100000000000000

Steps to Reproduce

Expected Results

The test to pass.

Actual Results

     Running `/home/afonso/git/wasmtime/target/debug/clif-util test ./lmao2.clif`
 ERROR cranelift_filetests::concurrent > FAIL: run
FAIL ./lmao2.clif: run

Caused by:
    Failed test: run: %bitselect_vconst_f64x2(0x11111111111111111111111111111111, 0x00000000000000000000000000000000) == 0x11000000000000001100000000000000, actual: 0x11111111111111111111111111111111
1 tests
Error: 1 failure

Versions and Environment

Cranelift version or commit: main

Operating system: Linux

Architecture: x86_64

Extra Info

This issue is caused by this optimization to bitselect. It checks if every byte in the vconst is 0xFF or 0x00, which it is in this case, but then emits a blend instruction of whatever type the original bitselect was issued.

This is correct for i8x16, but not for any type with a larger lane size.

This does not affect wasmtime since wasmtime always bitcasts the inputs to bitselect into a i8x16 before the operation which is the only type for which this works.

We also currently don't remove bitcasts in the midend, so this won't get accidentally converted into a bitselect.f64x2.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 23 2023 at 19:52):

afonso360 commented on issue #6890:

I think bitcast isn't the cause of this problem, right? Should the issue title say bitselect.f64x2 instead?

Oops, yes, bitselect, not bitcast!

view this post on Zulip Wasmtime GitHub notifications bot (Aug 23 2023 at 19:58):

jameysharp commented on issue #6890:

I'm fine with converting bitselect with a constant mask into shuffle if you think that's best, @alexcrichton.

But if we don't do that then I'd like to think through the rest of what you said more carefully. I don't see why vconst_all_ones_or_all_zeros needs to do anything but a byte-oriented pattern-match, so long as the result is always a byte-oriented blend. As long as both use the same definition of what a "lane" is, then we only fire this rule when the MSB of a lane is the same as all the other bits in the lane, so the intended type doesn't matter. Choosing byte-sized lanes means the rule can match more cases while still giving correct results.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 23 2023 at 20:09):

alexcrichton commented on issue #6890:

Good point! Also sorry I should have read your comment more closely and more carefully. I believe you're correct and always using x64_pblendvb here is sufficient, and my point about turning things into shuffle is orthogonal.

(sorry I should have learned by now to read things completely)

view this post on Zulip Wasmtime GitHub notifications bot (Aug 28 2023 at 15:04):

abrown closed issue #6890:

:wave: Hey,

.clif Test Case

test interpret
test run
target x86_64 has_sse41

function %bitselect_vconst_f64x2(f64x2, f64x2) -> f64x2 {
block0(v1: f64x2, v2: f64x2):
    v0 = vconst.f64x2 0xFF00000000000000FF00000000000000
    v3 = bitselect v0, v1, v2
    return v3
}
; run: %bitselect_vconst_f64x2(0x11111111111111111111111111111111, 0x00000000000000000000000000000000) == 0x11000000000000001100000000000000

Steps to Reproduce

Expected Results

The test to pass.

Actual Results

     Running `/home/afonso/git/wasmtime/target/debug/clif-util test ./lmao2.clif`
 ERROR cranelift_filetests::concurrent > FAIL: run
FAIL ./lmao2.clif: run

Caused by:
    Failed test: run: %bitselect_vconst_f64x2(0x11111111111111111111111111111111, 0x00000000000000000000000000000000) == 0x11000000000000001100000000000000, actual: 0x11111111111111111111111111111111
1 tests
Error: 1 failure

Versions and Environment

Cranelift version or commit: main

Operating system: Linux

Architecture: x86_64

Extra Info

This issue is caused by this optimization to bitselect. It checks if every byte in the vconst is 0xFF or 0x00, which it is in this case, but then emits a blend instruction of whatever type the original bitselect was issued.

This is correct for i8x16, but not for any type with a larger lane size.

This does not affect wasmtime since wasmtime always bitcasts the inputs to bitselect into a i8x16 before the operation which is the only type for which this works.

We also currently don't remove bitcasts in the midend, so this won't get accidentally converted into a bitselect.f64x2.


Last updated: Dec 23 2024 at 12:05 UTC