alexcrichton opened issue #6562:
The fuzz bug here shows that on
main
this program:(module (func (result v128) i32.const 0 v128.load32_splat align=1 f64x2.convert_low_i32x4_u ) (memory 0 1) )
will crash when compiled for x86_64:
$ cargo run -q --features all-arch compile testcase0.shrunken.wat --target x86_64 thread '<unnamed>' panicked at 'should be implemented in ISLE: inst = `v10 = fcvt_from_uint.f64x2 v14`, type = `Some(types::F64X2)`', cranelift/codegen/src/machinst/lower.rs:749:21 note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Bisection shows that this was introduced in https://github.com/bytecodealliance/wasmtime/pull/6533 and this is because the x64 backend only has one lowering for
fcvt_from_uint
matching the exact shape of the lowering off64x2.convert_low_i32x4_u
. This lowering is transformed via the optimization passes added in #6533, meaning that the lowering isn't triggered which leads to the panic.
alexcrichton labeled issue #6562:
The fuzz bug here shows that on
main
this program:(module (func (result v128) i32.const 0 v128.load32_splat align=1 f64x2.convert_low_i32x4_u ) (memory 0 1) )
will crash when compiled for x86_64:
$ cargo run -q --features all-arch compile testcase0.shrunken.wat --target x86_64 thread '<unnamed>' panicked at 'should be implemented in ISLE: inst = `v10 = fcvt_from_uint.f64x2 v14`, type = `Some(types::F64X2)`', cranelift/codegen/src/machinst/lower.rs:749:21 note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Bisection shows that this was introduced in https://github.com/bytecodealliance/wasmtime/pull/6533 and this is because the x64 backend only has one lowering for
fcvt_from_uint
matching the exact shape of the lowering off64x2.convert_low_i32x4_u
. This lowering is transformed via the optimization passes added in #6533, meaning that the lowering isn't triggered which leads to the panic.
alexcrichton labeled issue #6562:
The fuzz bug here shows that on
main
this program:(module (func (result v128) i32.const 0 v128.load32_splat align=1 f64x2.convert_low_i32x4_u ) (memory 0 1) )
will crash when compiled for x86_64:
$ cargo run -q --features all-arch compile testcase0.shrunken.wat --target x86_64 thread '<unnamed>' panicked at 'should be implemented in ISLE: inst = `v10 = fcvt_from_uint.f64x2 v14`, type = `Some(types::F64X2)`', cranelift/codegen/src/machinst/lower.rs:749:21 note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Bisection shows that this was introduced in https://github.com/bytecodealliance/wasmtime/pull/6533 and this is because the x64 backend only has one lowering for
fcvt_from_uint
matching the exact shape of the lowering off64x2.convert_low_i32x4_u
. This lowering is transformed via the optimization passes added in #6533, meaning that the lowering isn't triggered which leads to the panic.
alexcrichton commented on issue #6562:
Some possible solutions I can think of to this, none of which are great I think, are:
Add a lowering for
fcvt_from_uint.f64x2
for all inputsThis "feels" like the right solution but the generated code on x64 is pretty heinous. Looking at LLVM's translation of:
#[no_mangle] pub unsafe extern "C" fn foo(a: u64x2) -> f64x2 { simd_cast::<u64x2, f64x2>(a) }
it disassembles as:
0000000000000000 <_foo>: 0: 55 pushq %rbp 1: 48 89 e5 movq %rsp, %rbp 4: c5 f1 ef c9 vpxor %xmm1, %xmm1, %xmm1 8: c4 e3 79 02 c9 0a vpblendd $10, %xmm1, %xmm0, %xmm1 ## xmm1 = xmm0[0],xmm1[1],xmm0[2],xmm1[3] e: c5 f1 eb 0d 2a 00 00 00 vpor 42(%rip), %xmm1, %xmm1 ## 0x40 <_foo+0x40> 16: c5 f9 73 d0 20 vpsrlq $32, %xmm0, %xmm0 1b: c5 f9 eb 05 2d 00 00 00 vpor 45(%rip), %xmm0, %xmm0 ## 0x50 <_foo+0x50> 23: c5 f9 5c 05 35 00 00 00 vsubpd 53(%rip), %xmm0, %xmm0 ## 0x60 <_foo+0x60> 2b: c5 f1 58 c0 vaddpd %xmm0, %xmm1, %xmm0 2f: 5d popq %rbp 30: c3 retq
so not exactly a trivial or small lowering.
Add a lowering for this new pattern
For example:
(lower (has_type $F64X2 (fcvt_from_uint (splat (uextend val)))) ...)
This arguably could be a simpler rule than the previous lowering too by doing the conversion then the splat, for example
cvtsi2sd
+pshufd
(or something like that). This feels pretty specific to the problem at hand, though.Add more egraph optimization rules
Sort of like the previous pattern, egraph optimizations could transform
(fcvt_from_uint (splat (uextend val)))
into(splat (fcvt_from_uint (uextend val)))
which would then lower correctly. This still feels very specific, though and still brittle.Revert the original commit
Only added here for completeness, I don't think this should be done. The original egraph rules added seem good to have and I don't think there's a strong reason to remove them.
jameysharp commented on issue #6562:
Transforming
(fcvt_from_uint (splat (uextend val)))
into(splat (fcvt_from_uint (uextend val)))
strikes me as a good idea regardless. I think moving operations insidesplat
increases the odds that we'll find more optimization opportunities since we're much more likely to have good ideas for scalar optimizations.I'd _also_ like to have a non-fragile lowering for vector fcvt, and I think that will be necessary as we expand the set of instructions that we cover with cranelift-fuzzgen.
But in terms of fixing the immediate fuzz bug I like the idea of just adding the additional egraph rules for now.
alexcrichton closed issue #6562:
The fuzz bug here shows that on
main
this program:(module (func (result v128) i32.const 0 v128.load32_splat align=1 f64x2.convert_low_i32x4_u ) (memory 0 1) )
will crash when compiled for x86_64:
$ cargo run -q --features all-arch compile testcase0.shrunken.wat --target x86_64 thread '<unnamed>' panicked at 'should be implemented in ISLE: inst = `v10 = fcvt_from_uint.f64x2 v14`, type = `Some(types::F64X2)`', cranelift/codegen/src/machinst/lower.rs:749:21 note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Bisection shows that this was introduced in https://github.com/bytecodealliance/wasmtime/pull/6533 and this is because the x64 backend only has one lowering for
fcvt_from_uint
matching the exact shape of the lowering off64x2.convert_low_i32x4_u
. This lowering is transformed via the optimization passes added in #6533, meaning that the lowering isn't triggered which leads to the panic.
Last updated: Nov 22 2024 at 17:03 UTC