Stream: git-wasmtime

Topic: wasmtime / issue #6562 Missing lowering on x64 after rece...


view this post on Zulip Wasmtime GitHub notifications bot (Jun 12 2023 at 19:13):

alexcrichton opened issue #6562:

The fuzz bug here shows that on main this program:

(module
  (func (result v128)
    i32.const 0
    v128.load32_splat align=1
    f64x2.convert_low_i32x4_u
  )
  (memory 0 1)
)

will crash when compiled for x86_64:

$ cargo run -q --features all-arch compile testcase0.shrunken.wat --target x86_64
thread '<unnamed>' panicked at 'should be implemented in ISLE: inst = `v10 = fcvt_from_uint.f64x2 v14`, type = `Some(types::F64X2)`', cranelift/codegen/src/machinst/lower.rs:749:21
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Bisection shows that this was introduced in https://github.com/bytecodealliance/wasmtime/pull/6533 and this is because the x64 backend only has one lowering for fcvt_from_uint matching the exact shape of the lowering of f64x2.convert_low_i32x4_u. This lowering is transformed via the optimization passes added in #6533, meaning that the lowering isn't triggered which leads to the panic.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 12 2023 at 19:13):

alexcrichton labeled issue #6562:

The fuzz bug here shows that on main this program:

(module
  (func (result v128)
    i32.const 0
    v128.load32_splat align=1
    f64x2.convert_low_i32x4_u
  )
  (memory 0 1)
)

will crash when compiled for x86_64:

$ cargo run -q --features all-arch compile testcase0.shrunken.wat --target x86_64
thread '<unnamed>' panicked at 'should be implemented in ISLE: inst = `v10 = fcvt_from_uint.f64x2 v14`, type = `Some(types::F64X2)`', cranelift/codegen/src/machinst/lower.rs:749:21
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Bisection shows that this was introduced in https://github.com/bytecodealliance/wasmtime/pull/6533 and this is because the x64 backend only has one lowering for fcvt_from_uint matching the exact shape of the lowering of f64x2.convert_low_i32x4_u. This lowering is transformed via the optimization passes added in #6533, meaning that the lowering isn't triggered which leads to the panic.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 12 2023 at 19:13):

alexcrichton labeled issue #6562:

The fuzz bug here shows that on main this program:

(module
  (func (result v128)
    i32.const 0
    v128.load32_splat align=1
    f64x2.convert_low_i32x4_u
  )
  (memory 0 1)
)

will crash when compiled for x86_64:

$ cargo run -q --features all-arch compile testcase0.shrunken.wat --target x86_64
thread '<unnamed>' panicked at 'should be implemented in ISLE: inst = `v10 = fcvt_from_uint.f64x2 v14`, type = `Some(types::F64X2)`', cranelift/codegen/src/machinst/lower.rs:749:21
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Bisection shows that this was introduced in https://github.com/bytecodealliance/wasmtime/pull/6533 and this is because the x64 backend only has one lowering for fcvt_from_uint matching the exact shape of the lowering of f64x2.convert_low_i32x4_u. This lowering is transformed via the optimization passes added in #6533, meaning that the lowering isn't triggered which leads to the panic.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 12 2023 at 19:23):

alexcrichton commented on issue #6562:

Some possible solutions I can think of to this, none of which are great I think, are:

Add a lowering for fcvt_from_uint.f64x2 for all inputs

This "feels" like the right solution but the generated code on x64 is pretty heinous. Looking at LLVM's translation of:

#[no_mangle]
pub unsafe extern "C" fn foo(a: u64x2) -> f64x2 {
    simd_cast::<u64x2, f64x2>(a)
}

it disassembles as:

0000000000000000 <_foo>:
       0: 55                            pushq   %rbp
       1: 48 89 e5                      movq    %rsp, %rbp
       4: c5 f1 ef c9                   vpxor   %xmm1, %xmm1, %xmm1
       8: c4 e3 79 02 c9 0a             vpblendd        $10, %xmm1, %xmm0, %xmm1 ## xmm1 = xmm0[0],xmm1[1],xmm0[2],xmm1[3]
       e: c5 f1 eb 0d 2a 00 00 00       vpor    42(%rip), %xmm1, %xmm1  ## 0x40 <_foo+0x40>
      16: c5 f9 73 d0 20                vpsrlq  $32, %xmm0, %xmm0
      1b: c5 f9 eb 05 2d 00 00 00       vpor    45(%rip), %xmm0, %xmm0  ## 0x50 <_foo+0x50>
      23: c5 f9 5c 05 35 00 00 00       vsubpd  53(%rip), %xmm0, %xmm0  ## 0x60 <_foo+0x60>
      2b: c5 f1 58 c0                   vaddpd  %xmm0, %xmm1, %xmm0
      2f: 5d                            popq    %rbp
      30: c3                            retq

so not exactly a trivial or small lowering.

Add a lowering for this new pattern

For example:

(lower (has_type $F64X2 (fcvt_from_uint (splat (uextend val)))) ...)

This arguably could be a simpler rule than the previous lowering too by doing the conversion then the splat, for example cvtsi2sd + pshufd (or something like that). This feels pretty specific to the problem at hand, though.

Add more egraph optimization rules

Sort of like the previous pattern, egraph optimizations could transform (fcvt_from_uint (splat (uextend val))) into (splat (fcvt_from_uint (uextend val))) which would then lower correctly. This still feels very specific, though and still brittle.

Revert the original commit

Only added here for completeness, I don't think this should be done. The original egraph rules added seem good to have and I don't think there's a strong reason to remove them.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 12 2023 at 19:31):

jameysharp commented on issue #6562:

Transforming (fcvt_from_uint (splat (uextend val))) into (splat (fcvt_from_uint (uextend val))) strikes me as a good idea regardless. I think moving operations inside splat increases the odds that we'll find more optimization opportunities since we're much more likely to have good ideas for scalar optimizations.

I'd _also_ like to have a non-fragile lowering for vector fcvt, and I think that will be necessary as we expand the set of instructions that we cover with cranelift-fuzzgen.

But in terms of fixing the immediate fuzz bug I like the idea of just adding the additional egraph rules for now.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 12 2023 at 20:50):

alexcrichton closed issue #6562:

The fuzz bug here shows that on main this program:

(module
  (func (result v128)
    i32.const 0
    v128.load32_splat align=1
    f64x2.convert_low_i32x4_u
  )
  (memory 0 1)
)

will crash when compiled for x86_64:

$ cargo run -q --features all-arch compile testcase0.shrunken.wat --target x86_64
thread '<unnamed>' panicked at 'should be implemented in ISLE: inst = `v10 = fcvt_from_uint.f64x2 v14`, type = `Some(types::F64X2)`', cranelift/codegen/src/machinst/lower.rs:749:21
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Bisection shows that this was introduced in https://github.com/bytecodealliance/wasmtime/pull/6533 and this is because the x64 backend only has one lowering for fcvt_from_uint matching the exact shape of the lowering of f64x2.convert_low_i32x4_u. This lowering is transformed via the optimization passes added in #6533, meaning that the lowering isn't triggered which leads to the panic.


Last updated: Nov 22 2024 at 17:03 UTC