Stream: git-wasmtime

Topic: wasmtime / issue #3327 x64: Incorrect codegen for f32x4.a...


view this post on Zulip Wasmtime GitHub notifications bot (Sep 11 2021 at 05:50):

alexcrichton opened issue #3327:

Found via fuzzing this module:

(module
  (func (result v128)
    v128.const f32x4 0 0 0 0
    f32x4.abs
    v128.not)
  (export "1" (func 0))
)

yields:

$ cargo run testcase0.wat --invoke 1 --enable-simd
warning: using `--invoke` with a function that returns values is experimental and may break in the future
0

when it should print u128::MAX.

The disassembly of this function is:

0000000000000000 <_wasm_function_0>:
       0:       55                      push   %rbp
       1:       48 89 e5                mov    %rsp,%rbp
       4:       f3 0f 6f 05 24 00 00    movdqu 0x24(%rip),%xmm0        # 30 <_wasm_function_0+0x30>
       b:       00
       c:       0f 57 c9                xorps  %xmm1,%xmm1
       f:       0f c2 c9 00             cmpeqps %xmm1,%xmm1
      13:       66 0f 72 d1 01          psrld  $0x1,%xmm1
      18:       0f 54 c1                andps  %xmm1,%xmm0
      1b:       0f c2 c9 00             cmpeqps %xmm1,%xmm1
      1f:       0f 57 c1                xorps  %xmm1,%xmm0
      22:       48 89 ec                mov    %rbp,%rsp
      25:       5d                      pop    %rbp
      26:       c3                      retq

I don't for sure know what's going on here with each individual instruction, but this sort of looks like a register allocator issue? I'm not sure what the second xorps is doing there with those registers. If this is a register allocator thing it may or may not be related to https://github.com/bytecodealliance/wasmtime/issues/3216

view this post on Zulip Wasmtime GitHub notifications bot (Sep 11 2021 at 05:50):

alexcrichton labeled issue #3327:

Found via fuzzing this module:

(module
  (func (result v128)
    v128.const f32x4 0 0 0 0
    f32x4.abs
    v128.not)
  (export "1" (func 0))
)

yields:

$ cargo run testcase0.wat --invoke 1 --enable-simd
warning: using `--invoke` with a function that returns values is experimental and may break in the future
0

when it should print u128::MAX.

The disassembly of this function is:

0000000000000000 <_wasm_function_0>:
       0:       55                      push   %rbp
       1:       48 89 e5                mov    %rsp,%rbp
       4:       f3 0f 6f 05 24 00 00    movdqu 0x24(%rip),%xmm0        # 30 <_wasm_function_0+0x30>
       b:       00
       c:       0f 57 c9                xorps  %xmm1,%xmm1
       f:       0f c2 c9 00             cmpeqps %xmm1,%xmm1
      13:       66 0f 72 d1 01          psrld  $0x1,%xmm1
      18:       0f 54 c1                andps  %xmm1,%xmm0
      1b:       0f c2 c9 00             cmpeqps %xmm1,%xmm1
      1f:       0f 57 c1                xorps  %xmm1,%xmm0
      22:       48 89 ec                mov    %rbp,%rsp
      25:       5d                      pop    %rbp
      26:       c3                      retq

I don't for sure know what's going on here with each individual instruction, but this sort of looks like a register allocator issue? I'm not sure what the second xorps is doing there with those registers. If this is a register allocator thing it may or may not be related to https://github.com/bytecodealliance/wasmtime/issues/3216

view this post on Zulip Wasmtime GitHub notifications bot (Sep 11 2021 at 05:50):

alexcrichton labeled issue #3327:

Found via fuzzing this module:

(module
  (func (result v128)
    v128.const f32x4 0 0 0 0
    f32x4.abs
    v128.not)
  (export "1" (func 0))
)

yields:

$ cargo run testcase0.wat --invoke 1 --enable-simd
warning: using `--invoke` with a function that returns values is experimental and may break in the future
0

when it should print u128::MAX.

The disassembly of this function is:

0000000000000000 <_wasm_function_0>:
       0:       55                      push   %rbp
       1:       48 89 e5                mov    %rsp,%rbp
       4:       f3 0f 6f 05 24 00 00    movdqu 0x24(%rip),%xmm0        # 30 <_wasm_function_0+0x30>
       b:       00
       c:       0f 57 c9                xorps  %xmm1,%xmm1
       f:       0f c2 c9 00             cmpeqps %xmm1,%xmm1
      13:       66 0f 72 d1 01          psrld  $0x1,%xmm1
      18:       0f 54 c1                andps  %xmm1,%xmm0
      1b:       0f c2 c9 00             cmpeqps %xmm1,%xmm1
      1f:       0f 57 c1                xorps  %xmm1,%xmm0
      22:       48 89 ec                mov    %rbp,%rsp
      25:       5d                      pop    %rbp
      26:       c3                      retq

I don't for sure know what's going on here with each individual instruction, but this sort of looks like a register allocator issue? I'm not sure what the second xorps is doing there with those registers. If this is a register allocator thing it may or may not be related to https://github.com/bytecodealliance/wasmtime/issues/3216

view this post on Zulip Wasmtime GitHub notifications bot (Sep 11 2021 at 05:50):

alexcrichton labeled issue #3327:

Found via fuzzing this module:

(module
  (func (result v128)
    v128.const f32x4 0 0 0 0
    f32x4.abs
    v128.not)
  (export "1" (func 0))
)

yields:

$ cargo run testcase0.wat --invoke 1 --enable-simd
warning: using `--invoke` with a function that returns values is experimental and may break in the future
0

when it should print u128::MAX.

The disassembly of this function is:

0000000000000000 <_wasm_function_0>:
       0:       55                      push   %rbp
       1:       48 89 e5                mov    %rsp,%rbp
       4:       f3 0f 6f 05 24 00 00    movdqu 0x24(%rip),%xmm0        # 30 <_wasm_function_0+0x30>
       b:       00
       c:       0f 57 c9                xorps  %xmm1,%xmm1
       f:       0f c2 c9 00             cmpeqps %xmm1,%xmm1
      13:       66 0f 72 d1 01          psrld  $0x1,%xmm1
      18:       0f 54 c1                andps  %xmm1,%xmm0
      1b:       0f c2 c9 00             cmpeqps %xmm1,%xmm1
      1f:       0f 57 c1                xorps  %xmm1,%xmm0
      22:       48 89 ec                mov    %rbp,%rsp
      25:       5d                      pop    %rbp
      26:       c3                      retq

I don't for sure know what's going on here with each individual instruction, but this sort of looks like a register allocator issue? I'm not sure what the second xorps is doing there with those registers. If this is a register allocator thing it may or may not be related to https://github.com/bytecodealliance/wasmtime/issues/3216

view this post on Zulip Wasmtime GitHub notifications bot (Sep 11 2021 at 05:50):

alexcrichton labeled issue #3327:

Found via fuzzing this module:

(module
  (func (result v128)
    v128.const f32x4 0 0 0 0
    f32x4.abs
    v128.not)
  (export "1" (func 0))
)

yields:

$ cargo run testcase0.wat --invoke 1 --enable-simd
warning: using `--invoke` with a function that returns values is experimental and may break in the future
0

when it should print u128::MAX.

The disassembly of this function is:

0000000000000000 <_wasm_function_0>:
       0:       55                      push   %rbp
       1:       48 89 e5                mov    %rsp,%rbp
       4:       f3 0f 6f 05 24 00 00    movdqu 0x24(%rip),%xmm0        # 30 <_wasm_function_0+0x30>
       b:       00
       c:       0f 57 c9                xorps  %xmm1,%xmm1
       f:       0f c2 c9 00             cmpeqps %xmm1,%xmm1
      13:       66 0f 72 d1 01          psrld  $0x1,%xmm1
      18:       0f 54 c1                andps  %xmm1,%xmm0
      1b:       0f c2 c9 00             cmpeqps %xmm1,%xmm1
      1f:       0f 57 c1                xorps  %xmm1,%xmm0
      22:       48 89 ec                mov    %rbp,%rsp
      25:       5d                      pop    %rbp
      26:       c3                      retq

I don't for sure know what's going on here with each individual instruction, but this sort of looks like a register allocator issue? I'm not sure what the second xorps is doing there with those registers. If this is a register allocator thing it may or may not be related to https://github.com/bytecodealliance/wasmtime/issues/3216

view this post on Zulip Wasmtime GitHub notifications bot (Sep 12 2021 at 00:19):

alexcrichton commented on issue #3327:

This is another module which I believe is broken:

(module
  (func (;0;) (result i32)
    v128.const i32x4 0xffffffff 0x80bfffff 0x80bf0a0a 0x80bf0a0a
    f64x2.promote_low_f32x4
    v128.not
    v128.not
    v128.not
    v128.not
    v128.not
    v128.not
    v128.not
    v128.const i32x4 0 0 0 0
    f64x2.gt
    v128.not
    i64x2.bitmask)
  (export "" (func 0)))

It still has v128.not but I don't know if it's the same bug, I'm just assuming it's related. This is fuzz-generated and v8 claims the function should produce 0 but Wasmtime produces 3 as the answer.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 13 2021 at 16:37):

cfallin commented on issue #3327:

cc @jlb6740, could you take a look?

view this post on Zulip Wasmtime GitHub notifications bot (Sep 17 2021 at 13:59):

alexcrichton commented on issue #3327:

Another one cropped up today which I'm lumping in with this one, but recording to ensure we can test the fix:

(module
  (type (;0;) (func (param i32) (result i32)))
  (func (;0;) (type 0) (param i32) (result i32)
    local.get 0
    i32x4.splat
    f64x2.abs
    v128.not
    i64x2.bitmask)
  (memory (;0;) 1 1)
  (export "" (func 0))
  (export "1" (func 0)))

view this post on Zulip Wasmtime GitHub notifications bot (Nov 05 2021 at 20:43):

alexcrichton commented on issue #3327:

I've got a fix for this when ISLE lands

view this post on Zulip Wasmtime GitHub notifications bot (Nov 12 2021 at 00:42):

jlb6740 commented on issue #3327:

@alexcrichton I think I see the issue here. The abs lowering:

xorps   xmm1, xmm1
cmpeqps xmm1, xmm1
psrld   xmm1, 1
andps   xmm0, xmm1

leaves xmm1 with:
2147483647
But this number (the largest 32-bit int) is not a float. The not instruction sequence:

"cmpeqps %%xmm1, %%xmm1\n\t"
"xorps %%xmm1, %%xmm0\n\t"

Takes a register (in this case xmm1) and then tries to make it have the float value 1. It does this in order to do then do an xorps to flip the bits. The problem is that the abs sequence has left the register xmm1 with a number that cmpeqps returns an unexpected result. If we do this on another register that just so happens to have a valid float value then this sequence would work, but in our case it doesn't. The fix seems to be to do a logical xor on the float register before doing the cmpeqps. This logical or correctly zeros the register allowing the cmpeqps to work as expected. Maybe a better sequence but will push this to fix this bug.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 12 2021 at 00:43):

jlb6740 edited a comment on issue #3327:

@alexcrichton I think I see the issue here. The abs lowering:

xorps   xmm1, xmm1
cmpeqps xmm1, xmm1
psrld   xmm1, 1
andps   xmm0, xmm1

leaves xmm1 with:
2147483647
But this number (the largest 32-bit int) is not a float. The not instruction sequence:

"cmpeqps %%xmm1, %%xmm1\n\t"
"xorps %%xmm1, %%xmm0\n\t"

Takes a register (in this case xmm1) and then tries to make it have the float value 1. It does this in order to do then do an xorps to flip the bits. The problem is that the abs sequence has left the register xmm1 with a number that cmpeqps returns an unexpected result. If we do this on another register that just so happens to have a valid float value then this sequence would work, but in our case it doesn't. The fix seems to be to do a logical xor on the float register before doing the cmpeqps. This logical or correctly zeros the register allowing the cmpeqps to work as expected. Maybe a better sequence but will push this lowering for not:

"xorps %%xmm1, %%xmm1\n\t"
"cmpeqps %%xmm1, %%xmm1\n\t"
"xorps %%xmm1, %%xmm0\n\t"

to resolve this bug.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 12 2021 at 00:47):

jlb6740 edited a comment on issue #3327:

@alexcrichton I think I see the issue here. The abs lowering:

xorps   xmm1, xmm1
cmpeqps xmm1, xmm1
psrld   xmm1, 1
andps   xmm0, xmm1

leaves xmm1 with:
2147483647
But this number (the largest 32-bit int) is not a float. The not instruction sequence:

"cmpeqps %%xmm1, %%xmm1\n\t"
"xorps %%xmm1, %%xmm0\n\t"

takes a register (in this case xmm1) and then tries to make it have the float value 1. It does this in order to do then do an xorps to flip the bits. The problem is that the abs sequence has left the register xmm1 with a number that cmpeqps returns an unexpected result. If we do this on another register that just so happens to have a valid float value then this sequence would work, but in our case it doesn't. The fix seems to be to do a logical xor on the float register before doing the cmpeqps. This logical or correctly zeros the register allowing the cmpeqps to work as expected. Maybe a better sequence but will push this lowering for not:

"xorps %%xmm1, %%xmm1\n\t"
"cmpeqps %%xmm1, %%xmm1\n\t"
"xorps %%xmm1, %%xmm0\n\t"

to resolve this bug.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 12 2021 at 01:29):

jlb6740 edited a comment on issue #3327:

@alexcrichton I think I see the issue here. The abs lowering:

xorps   xmm1, xmm1
cmpeqps xmm1, xmm1
psrld   xmm1, 1
andps   xmm0, xmm1

leaves xmm1 with:
2147483647
But this number (the largest 32-bit int) is not a float. The not instruction sequence:

"cmpeqps %%xmm1, %%xmm1\n\t"
"xorps %%xmm1, %%xmm0\n\t"

takes a register (in this case xmm1) and then tries to make it have the float value 1. It does this in order to do then do an xorps to flip the bits. The problem is that the abs sequence has left the register xmm1 with a number that cmpeqps doesn't return as expected because it is not a valid float. If we do this on another register that just so happens to have a valid float value then this sequence would work, but in our case it doesn't. The fix seems to be to do a logical xor on the float register before doing the cmpeqps. This logical or correctly zeros the register allowing the cmpeqps to work as expected. Maybe a better sequence but will push this lowering for not:

"xorps %%xmm1, %%xmm1\n\t"
"cmpeqps %%xmm1, %%xmm1\n\t"
"xorps %%xmm1, %%xmm0\n\t"

to resolve this bug.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 12 2021 at 01:44):

jlb6740 edited a comment on issue #3327:

@alexcrichton I think I see the issue here. The abs lowering:

xorps   xmm1, xmm1
cmpeqps xmm1, xmm1
psrld   xmm1, 1
andps   xmm0, xmm1

leaves xmm1 with:
2147483647
But this number (the largest 32-bit int) is not a float. The not instruction sequence:

"cmpeqps %%xmm1, %%xmm1\n\t"
"xorps %%xmm1, %%xmm0\n\t"

takes a register (in this case xmm1) and then tries to make it have the float value 1. It does this in order to then do an xorps to flip the bits. The problem is that the abs sequence has left the register xmm1 with a number that cmpeqps doesn't return as expected because it is not a valid float. If we do this on another register that just so happens to have a valid float value then this sequence would work, but in our case it doesn't. The fix seems to be to do a logical xor on the float register before doing the cmpeqps. This logical or correctly zeros the register allowing the cmpeqps to work as expected. Maybe a better sequence but will push this lowering for not:

"xorps %%xmm1, %%xmm1\n\t"
"cmpeqps %%xmm1, %%xmm1\n\t"
"xorps %%xmm1, %%xmm0\n\t"

to resolve this bug.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 12 2021 at 01:45):

jlb6740 edited a comment on issue #3327:

@alexcrichton I think I see the issue here. The abs lowering:

xorps   xmm1, xmm1
cmpeqps xmm1, xmm1
psrld   xmm1, 1
andps   xmm0, xmm1

leaves xmm1 with:
2147483647
But this number (the largest 32-bit int) is not a float. The not instruction sequence:

"cmpeqps %%xmm1, %%xmm1\n\t"
"xorps %%xmm1, %%xmm0\n\t"

takes a register (in this case xmm1) and then tries to make it have the float value 1. It does this in order to then do an xorps to flip the bits. The problem is that the abs sequence has left the register xmm1 with an invalid float so cmpeqps doesn't return as expected. If we do this on another register that happens to have a valid float value then this sequence would work, but in our case it doesn't. The fix seems to be to do a logical xor on the float register before doing the cmpeqps. This logical or correctly zeros the register allowing the cmpeqps to work as expected. Maybe a better sequence but will push this lowering for not:

"xorps %%xmm1, %%xmm1\n\t"
"cmpeqps %%xmm1, %%xmm1\n\t"
"xorps %%xmm1, %%xmm0\n\t"

to resolve this bug.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 12 2021 at 01:45):

jlb6740 edited a comment on issue #3327:

@alexcrichton I think I see the issue here. The abs lowering:

xorps   xmm1, xmm1
cmpeqps xmm1, xmm1
psrld   xmm1, 1
andps   xmm0, xmm1

leaves xmm1 with:
2147483647
But this number (the largest 32-bit int) is not a float. The not instruction sequence:

"cmpeqps %%xmm1, %%xmm1\n\t"
"xorps %%xmm1, %%xmm0\n\t"

takes a register (in this case xmm1) and then tries to make it have the float value 1. It does this in order to then do an xorps to flip the bits. The problem is that the abs sequence has left the register xmm1 with an invalid float so cmpeqps doesn't return as expected. If we do this on another register that happens to have a valid float value then this sequence would work, but in our case it doesn't. The fix seems to be to do a logical xor on the float register before doing the cmpeqps. This logical xor correctly zeros the register allowing the cmpeqps to work as expected. Maybe a better sequence but will push this lowering for not:

"xorps %%xmm1, %%xmm1\n\t"
"cmpeqps %%xmm1, %%xmm1\n\t"
"xorps %%xmm1, %%xmm0\n\t"

to resolve this bug.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 12 2021 at 01:45):

jlb6740 edited a comment on issue #3327:

@alexcrichton I think I see the issue here. The abs lowering:

xorps   xmm1, xmm1
cmpeqps xmm1, xmm1
psrld   xmm1, 1
andps   xmm0, xmm1

leaves xmm1 with:
2147483647
But this number (the largest 32-bit int) is not a float. The not instruction sequence:

"cmpeqps %%xmm1, %%xmm1\n\t"
"xorps %%xmm1, %%xmm0\n\t"

takes a register (in this case xmm1) and then tries to make it have the float value 1. It does this in order to then do an xorps to flip the bits. The problem is that the abs sequence has left the register xmm1 with an invalid float so cmpeqps doesn't return as expected. If we do this on another register that happens to have a valid float value then this sequence would work, but in our case it doesn't. The fix seems to be to do a logical xor on the float register before doing the cmpeqps. This logical xor correctly zeros the register allowing the cmpeqps to return as desired. Maybe a another instruction sequence exist but will push this lowering for not:

"xorps %%xmm1, %%xmm1\n\t"
"cmpeqps %%xmm1, %%xmm1\n\t"
"xorps %%xmm1, %%xmm0\n\t"

to resolve this bug.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 12 2021 at 01:49):

jlb6740 commented on issue #3327:

@alexcrichton btw .. just read your comment on https://github.com/alexcrichton/wasmtime/commit/f7ccaf2395b4fed071e97eaf6c9d490e42638a65 and indeed the fabs/fneg lowering implements this correctly. The bnot implementation apparently didn't foresee this issue.

I created a jist here that may be useful for other bugs: https://gist.github.com/jlb6740/33e77c7f32beb99bdc392ff456fbc864

view this post on Zulip Wasmtime GitHub notifications bot (Nov 12 2021 at 01:50):

jlb6740 edited a comment on issue #3327:

@alexcrichton btw .. just read your comment on https://github.com/alexcrichton/wasmtime/commit/f7ccaf2395b4fed071e97eaf6c9d490e42638a65 and indeed the fabs/fneg lowering implements this correctly. The bnot implementation apparently didn't foresee this issue.

I created a jist here that may be useful for debugging similar bugs: https://gist.github.com/jlb6740/33e77c7f32beb99bdc392ff456fbc864

view this post on Zulip Wasmtime GitHub notifications bot (Nov 16 2021 at 15:28):

alexcrichton closed issue #3327:

Found via fuzzing this module:

(module
  (func (result v128)
    v128.const f32x4 0 0 0 0
    f32x4.abs
    v128.not)
  (export "1" (func 0))
)

yields:

$ cargo run testcase0.wat --invoke 1 --enable-simd
warning: using `--invoke` with a function that returns values is experimental and may break in the future
0

when it should print u128::MAX.

The disassembly of this function is:

0000000000000000 <_wasm_function_0>:
       0:       55                      push   %rbp
       1:       48 89 e5                mov    %rsp,%rbp
       4:       f3 0f 6f 05 24 00 00    movdqu 0x24(%rip),%xmm0        # 30 <_wasm_function_0+0x30>
       b:       00
       c:       0f 57 c9                xorps  %xmm1,%xmm1
       f:       0f c2 c9 00             cmpeqps %xmm1,%xmm1
      13:       66 0f 72 d1 01          psrld  $0x1,%xmm1
      18:       0f 54 c1                andps  %xmm1,%xmm0
      1b:       0f c2 c9 00             cmpeqps %xmm1,%xmm1
      1f:       0f 57 c1                xorps  %xmm1,%xmm0
      22:       48 89 ec                mov    %rbp,%rsp
      25:       5d                      pop    %rbp
      26:       c3                      retq

I don't for sure know what's going on here with each individual instruction, but this sort of looks like a register allocator issue? I'm not sure what the second xorps is doing there with those registers. If this is a register allocator thing it may or may not be related to https://github.com/bytecodealliance/wasmtime/issues/3216

view this post on Zulip Wasmtime GitHub notifications bot (Nov 16 2021 at 15:28):

alexcrichton commented on issue #3327:

Fixed in https://github.com/bytecodealliance/wasmtime/pull/3517


Last updated: Jan 24 2025 at 00:11 UTC