alexcrichton opened issue #3327:
Found via fuzzing this module:
(module (func (result v128) v128.const f32x4 0 0 0 0 f32x4.abs v128.not) (export "1" (func 0)) )
yields:
$ cargo run testcase0.wat --invoke 1 --enable-simd warning: using `--invoke` with a function that returns values is experimental and may break in the future 0
when it should print
u128::MAX
.The disassembly of this function is:
0000000000000000 <_wasm_function_0>: 0: 55 push %rbp 1: 48 89 e5 mov %rsp,%rbp 4: f3 0f 6f 05 24 00 00 movdqu 0x24(%rip),%xmm0 # 30 <_wasm_function_0+0x30> b: 00 c: 0f 57 c9 xorps %xmm1,%xmm1 f: 0f c2 c9 00 cmpeqps %xmm1,%xmm1 13: 66 0f 72 d1 01 psrld $0x1,%xmm1 18: 0f 54 c1 andps %xmm1,%xmm0 1b: 0f c2 c9 00 cmpeqps %xmm1,%xmm1 1f: 0f 57 c1 xorps %xmm1,%xmm0 22: 48 89 ec mov %rbp,%rsp 25: 5d pop %rbp 26: c3 retq
I don't for sure know what's going on here with each individual instruction, but this sort of looks like a register allocator issue? I'm not sure what the second
xorps
is doing there with those registers. If this is a register allocator thing it may or may not be related to https://github.com/bytecodealliance/wasmtime/issues/3216
alexcrichton labeled issue #3327:
Found via fuzzing this module:
(module (func (result v128) v128.const f32x4 0 0 0 0 f32x4.abs v128.not) (export "1" (func 0)) )
yields:
$ cargo run testcase0.wat --invoke 1 --enable-simd warning: using `--invoke` with a function that returns values is experimental and may break in the future 0
when it should print
u128::MAX
.The disassembly of this function is:
0000000000000000 <_wasm_function_0>: 0: 55 push %rbp 1: 48 89 e5 mov %rsp,%rbp 4: f3 0f 6f 05 24 00 00 movdqu 0x24(%rip),%xmm0 # 30 <_wasm_function_0+0x30> b: 00 c: 0f 57 c9 xorps %xmm1,%xmm1 f: 0f c2 c9 00 cmpeqps %xmm1,%xmm1 13: 66 0f 72 d1 01 psrld $0x1,%xmm1 18: 0f 54 c1 andps %xmm1,%xmm0 1b: 0f c2 c9 00 cmpeqps %xmm1,%xmm1 1f: 0f 57 c1 xorps %xmm1,%xmm0 22: 48 89 ec mov %rbp,%rsp 25: 5d pop %rbp 26: c3 retq
I don't for sure know what's going on here with each individual instruction, but this sort of looks like a register allocator issue? I'm not sure what the second
xorps
is doing there with those registers. If this is a register allocator thing it may or may not be related to https://github.com/bytecodealliance/wasmtime/issues/3216
alexcrichton labeled issue #3327:
Found via fuzzing this module:
(module (func (result v128) v128.const f32x4 0 0 0 0 f32x4.abs v128.not) (export "1" (func 0)) )
yields:
$ cargo run testcase0.wat --invoke 1 --enable-simd warning: using `--invoke` with a function that returns values is experimental and may break in the future 0
when it should print
u128::MAX
.The disassembly of this function is:
0000000000000000 <_wasm_function_0>: 0: 55 push %rbp 1: 48 89 e5 mov %rsp,%rbp 4: f3 0f 6f 05 24 00 00 movdqu 0x24(%rip),%xmm0 # 30 <_wasm_function_0+0x30> b: 00 c: 0f 57 c9 xorps %xmm1,%xmm1 f: 0f c2 c9 00 cmpeqps %xmm1,%xmm1 13: 66 0f 72 d1 01 psrld $0x1,%xmm1 18: 0f 54 c1 andps %xmm1,%xmm0 1b: 0f c2 c9 00 cmpeqps %xmm1,%xmm1 1f: 0f 57 c1 xorps %xmm1,%xmm0 22: 48 89 ec mov %rbp,%rsp 25: 5d pop %rbp 26: c3 retq
I don't for sure know what's going on here with each individual instruction, but this sort of looks like a register allocator issue? I'm not sure what the second
xorps
is doing there with those registers. If this is a register allocator thing it may or may not be related to https://github.com/bytecodealliance/wasmtime/issues/3216
alexcrichton labeled issue #3327:
Found via fuzzing this module:
(module (func (result v128) v128.const f32x4 0 0 0 0 f32x4.abs v128.not) (export "1" (func 0)) )
yields:
$ cargo run testcase0.wat --invoke 1 --enable-simd warning: using `--invoke` with a function that returns values is experimental and may break in the future 0
when it should print
u128::MAX
.The disassembly of this function is:
0000000000000000 <_wasm_function_0>: 0: 55 push %rbp 1: 48 89 e5 mov %rsp,%rbp 4: f3 0f 6f 05 24 00 00 movdqu 0x24(%rip),%xmm0 # 30 <_wasm_function_0+0x30> b: 00 c: 0f 57 c9 xorps %xmm1,%xmm1 f: 0f c2 c9 00 cmpeqps %xmm1,%xmm1 13: 66 0f 72 d1 01 psrld $0x1,%xmm1 18: 0f 54 c1 andps %xmm1,%xmm0 1b: 0f c2 c9 00 cmpeqps %xmm1,%xmm1 1f: 0f 57 c1 xorps %xmm1,%xmm0 22: 48 89 ec mov %rbp,%rsp 25: 5d pop %rbp 26: c3 retq
I don't for sure know what's going on here with each individual instruction, but this sort of looks like a register allocator issue? I'm not sure what the second
xorps
is doing there with those registers. If this is a register allocator thing it may or may not be related to https://github.com/bytecodealliance/wasmtime/issues/3216
alexcrichton labeled issue #3327:
Found via fuzzing this module:
(module (func (result v128) v128.const f32x4 0 0 0 0 f32x4.abs v128.not) (export "1" (func 0)) )
yields:
$ cargo run testcase0.wat --invoke 1 --enable-simd warning: using `--invoke` with a function that returns values is experimental and may break in the future 0
when it should print
u128::MAX
.The disassembly of this function is:
0000000000000000 <_wasm_function_0>: 0: 55 push %rbp 1: 48 89 e5 mov %rsp,%rbp 4: f3 0f 6f 05 24 00 00 movdqu 0x24(%rip),%xmm0 # 30 <_wasm_function_0+0x30> b: 00 c: 0f 57 c9 xorps %xmm1,%xmm1 f: 0f c2 c9 00 cmpeqps %xmm1,%xmm1 13: 66 0f 72 d1 01 psrld $0x1,%xmm1 18: 0f 54 c1 andps %xmm1,%xmm0 1b: 0f c2 c9 00 cmpeqps %xmm1,%xmm1 1f: 0f 57 c1 xorps %xmm1,%xmm0 22: 48 89 ec mov %rbp,%rsp 25: 5d pop %rbp 26: c3 retq
I don't for sure know what's going on here with each individual instruction, but this sort of looks like a register allocator issue? I'm not sure what the second
xorps
is doing there with those registers. If this is a register allocator thing it may or may not be related to https://github.com/bytecodealliance/wasmtime/issues/3216
alexcrichton commented on issue #3327:
This is another module which I believe is broken:
(module (func (;0;) (result i32) v128.const i32x4 0xffffffff 0x80bfffff 0x80bf0a0a 0x80bf0a0a f64x2.promote_low_f32x4 v128.not v128.not v128.not v128.not v128.not v128.not v128.not v128.const i32x4 0 0 0 0 f64x2.gt v128.not i64x2.bitmask) (export "" (func 0)))
It still has
v128.not
but I don't know if it's the same bug, I'm just assuming it's related. This is fuzz-generated and v8 claims the function should produce 0 but Wasmtime produces 3 as the answer.
cfallin commented on issue #3327:
cc @jlb6740, could you take a look?
alexcrichton commented on issue #3327:
Another one cropped up today which I'm lumping in with this one, but recording to ensure we can test the fix:
(module (type (;0;) (func (param i32) (result i32))) (func (;0;) (type 0) (param i32) (result i32) local.get 0 i32x4.splat f64x2.abs v128.not i64x2.bitmask) (memory (;0;) 1 1) (export "" (func 0)) (export "1" (func 0)))
alexcrichton commented on issue #3327:
I've got a fix for this when ISLE lands
jlb6740 commented on issue #3327:
@alexcrichton I think I see the issue here. The abs lowering:
xorps xmm1, xmm1 cmpeqps xmm1, xmm1 psrld xmm1, 1 andps xmm0, xmm1
leaves xmm1 with:
2147483647
But this number (the largest 32-bit int) is not a float. The not instruction sequence:"cmpeqps %%xmm1, %%xmm1\n\t" "xorps %%xmm1, %%xmm0\n\t"
Takes a register (in this case xmm1) and then tries to make it have the float value 1. It does this in order to do then do an xorps to flip the bits. The problem is that the abs sequence has left the register xmm1 with a number that cmpeqps returns an unexpected result. If we do this on another register that just so happens to have a valid float value then this sequence would work, but in our case it doesn't. The fix seems to be to do a logical xor on the float register before doing the cmpeqps. This logical or correctly zeros the register allowing the cmpeqps to work as expected. Maybe a better sequence but will push this to fix this bug.
jlb6740 edited a comment on issue #3327:
@alexcrichton I think I see the issue here. The abs lowering:
xorps xmm1, xmm1 cmpeqps xmm1, xmm1 psrld xmm1, 1 andps xmm0, xmm1
leaves xmm1 with:
2147483647
But this number (the largest 32-bit int) is not a float. The not instruction sequence:"cmpeqps %%xmm1, %%xmm1\n\t" "xorps %%xmm1, %%xmm0\n\t"
Takes a register (in this case xmm1) and then tries to make it have the float value 1. It does this in order to do then do an xorps to flip the bits. The problem is that the abs sequence has left the register xmm1 with a number that cmpeqps returns an unexpected result. If we do this on another register that just so happens to have a valid float value then this sequence would work, but in our case it doesn't. The fix seems to be to do a logical xor on the float register before doing the cmpeqps. This logical or correctly zeros the register allowing the cmpeqps to work as expected. Maybe a better sequence but will push this lowering for not:
"xorps %%xmm1, %%xmm1\n\t" "cmpeqps %%xmm1, %%xmm1\n\t" "xorps %%xmm1, %%xmm0\n\t"
to resolve this bug.
jlb6740 edited a comment on issue #3327:
@alexcrichton I think I see the issue here. The abs lowering:
xorps xmm1, xmm1 cmpeqps xmm1, xmm1 psrld xmm1, 1 andps xmm0, xmm1
leaves xmm1 with:
2147483647
But this number (the largest 32-bit int) is not a float. The not instruction sequence:"cmpeqps %%xmm1, %%xmm1\n\t" "xorps %%xmm1, %%xmm0\n\t"
takes a register (in this case xmm1) and then tries to make it have the float value 1. It does this in order to do then do an xorps to flip the bits. The problem is that the abs sequence has left the register xmm1 with a number that cmpeqps returns an unexpected result. If we do this on another register that just so happens to have a valid float value then this sequence would work, but in our case it doesn't. The fix seems to be to do a logical xor on the float register before doing the cmpeqps. This logical or correctly zeros the register allowing the cmpeqps to work as expected. Maybe a better sequence but will push this lowering for not:
"xorps %%xmm1, %%xmm1\n\t" "cmpeqps %%xmm1, %%xmm1\n\t" "xorps %%xmm1, %%xmm0\n\t"
to resolve this bug.
jlb6740 edited a comment on issue #3327:
@alexcrichton I think I see the issue here. The abs lowering:
xorps xmm1, xmm1 cmpeqps xmm1, xmm1 psrld xmm1, 1 andps xmm0, xmm1
leaves xmm1 with:
2147483647
But this number (the largest 32-bit int) is not a float. The not instruction sequence:"cmpeqps %%xmm1, %%xmm1\n\t" "xorps %%xmm1, %%xmm0\n\t"
takes a register (in this case xmm1) and then tries to make it have the float value 1. It does this in order to do then do an xorps to flip the bits. The problem is that the abs sequence has left the register xmm1 with a number that cmpeqps doesn't return as expected because it is not a valid float. If we do this on another register that just so happens to have a valid float value then this sequence would work, but in our case it doesn't. The fix seems to be to do a logical xor on the float register before doing the cmpeqps. This logical or correctly zeros the register allowing the cmpeqps to work as expected. Maybe a better sequence but will push this lowering for not:
"xorps %%xmm1, %%xmm1\n\t" "cmpeqps %%xmm1, %%xmm1\n\t" "xorps %%xmm1, %%xmm0\n\t"
to resolve this bug.
jlb6740 edited a comment on issue #3327:
@alexcrichton I think I see the issue here. The abs lowering:
xorps xmm1, xmm1 cmpeqps xmm1, xmm1 psrld xmm1, 1 andps xmm0, xmm1
leaves xmm1 with:
2147483647
But this number (the largest 32-bit int) is not a float. The not instruction sequence:"cmpeqps %%xmm1, %%xmm1\n\t" "xorps %%xmm1, %%xmm0\n\t"
takes a register (in this case xmm1) and then tries to make it have the float value 1. It does this in order to then do an xorps to flip the bits. The problem is that the abs sequence has left the register xmm1 with a number that cmpeqps doesn't return as expected because it is not a valid float. If we do this on another register that just so happens to have a valid float value then this sequence would work, but in our case it doesn't. The fix seems to be to do a logical xor on the float register before doing the cmpeqps. This logical or correctly zeros the register allowing the cmpeqps to work as expected. Maybe a better sequence but will push this lowering for not:
"xorps %%xmm1, %%xmm1\n\t" "cmpeqps %%xmm1, %%xmm1\n\t" "xorps %%xmm1, %%xmm0\n\t"
to resolve this bug.
jlb6740 edited a comment on issue #3327:
@alexcrichton I think I see the issue here. The abs lowering:
xorps xmm1, xmm1 cmpeqps xmm1, xmm1 psrld xmm1, 1 andps xmm0, xmm1
leaves xmm1 with:
2147483647
But this number (the largest 32-bit int) is not a float. The not instruction sequence:"cmpeqps %%xmm1, %%xmm1\n\t" "xorps %%xmm1, %%xmm0\n\t"
takes a register (in this case xmm1) and then tries to make it have the float value 1. It does this in order to then do an xorps to flip the bits. The problem is that the abs sequence has left the register xmm1 with an invalid float so cmpeqps doesn't return as expected. If we do this on another register that happens to have a valid float value then this sequence would work, but in our case it doesn't. The fix seems to be to do a logical xor on the float register before doing the cmpeqps. This logical or correctly zeros the register allowing the cmpeqps to work as expected. Maybe a better sequence but will push this lowering for not:
"xorps %%xmm1, %%xmm1\n\t" "cmpeqps %%xmm1, %%xmm1\n\t" "xorps %%xmm1, %%xmm0\n\t"
to resolve this bug.
jlb6740 edited a comment on issue #3327:
@alexcrichton I think I see the issue here. The abs lowering:
xorps xmm1, xmm1 cmpeqps xmm1, xmm1 psrld xmm1, 1 andps xmm0, xmm1
leaves xmm1 with:
2147483647
But this number (the largest 32-bit int) is not a float. The not instruction sequence:"cmpeqps %%xmm1, %%xmm1\n\t" "xorps %%xmm1, %%xmm0\n\t"
takes a register (in this case xmm1) and then tries to make it have the float value 1. It does this in order to then do an xorps to flip the bits. The problem is that the abs sequence has left the register xmm1 with an invalid float so cmpeqps doesn't return as expected. If we do this on another register that happens to have a valid float value then this sequence would work, but in our case it doesn't. The fix seems to be to do a logical xor on the float register before doing the cmpeqps. This logical xor correctly zeros the register allowing the cmpeqps to work as expected. Maybe a better sequence but will push this lowering for not:
"xorps %%xmm1, %%xmm1\n\t" "cmpeqps %%xmm1, %%xmm1\n\t" "xorps %%xmm1, %%xmm0\n\t"
to resolve this bug.
jlb6740 edited a comment on issue #3327:
@alexcrichton I think I see the issue here. The abs lowering:
xorps xmm1, xmm1 cmpeqps xmm1, xmm1 psrld xmm1, 1 andps xmm0, xmm1
leaves xmm1 with:
2147483647
But this number (the largest 32-bit int) is not a float. The not instruction sequence:"cmpeqps %%xmm1, %%xmm1\n\t" "xorps %%xmm1, %%xmm0\n\t"
takes a register (in this case xmm1) and then tries to make it have the float value 1. It does this in order to then do an xorps to flip the bits. The problem is that the abs sequence has left the register xmm1 with an invalid float so cmpeqps doesn't return as expected. If we do this on another register that happens to have a valid float value then this sequence would work, but in our case it doesn't. The fix seems to be to do a logical xor on the float register before doing the cmpeqps. This logical xor correctly zeros the register allowing the cmpeqps to return as desired. Maybe a another instruction sequence exist but will push this lowering for not:
"xorps %%xmm1, %%xmm1\n\t" "cmpeqps %%xmm1, %%xmm1\n\t" "xorps %%xmm1, %%xmm0\n\t"
to resolve this bug.
jlb6740 commented on issue #3327:
@alexcrichton btw .. just read your comment on https://github.com/alexcrichton/wasmtime/commit/f7ccaf2395b4fed071e97eaf6c9d490e42638a65 and indeed the fabs/fneg lowering implements this correctly. The bnot implementation apparently didn't foresee this issue.
I created a jist here that may be useful for other bugs: https://gist.github.com/jlb6740/33e77c7f32beb99bdc392ff456fbc864
jlb6740 edited a comment on issue #3327:
@alexcrichton btw .. just read your comment on https://github.com/alexcrichton/wasmtime/commit/f7ccaf2395b4fed071e97eaf6c9d490e42638a65 and indeed the fabs/fneg lowering implements this correctly. The bnot implementation apparently didn't foresee this issue.
I created a jist here that may be useful for debugging similar bugs: https://gist.github.com/jlb6740/33e77c7f32beb99bdc392ff456fbc864
alexcrichton closed issue #3327:
Found via fuzzing this module:
(module (func (result v128) v128.const f32x4 0 0 0 0 f32x4.abs v128.not) (export "1" (func 0)) )
yields:
$ cargo run testcase0.wat --invoke 1 --enable-simd warning: using `--invoke` with a function that returns values is experimental and may break in the future 0
when it should print
u128::MAX
.The disassembly of this function is:
0000000000000000 <_wasm_function_0>: 0: 55 push %rbp 1: 48 89 e5 mov %rsp,%rbp 4: f3 0f 6f 05 24 00 00 movdqu 0x24(%rip),%xmm0 # 30 <_wasm_function_0+0x30> b: 00 c: 0f 57 c9 xorps %xmm1,%xmm1 f: 0f c2 c9 00 cmpeqps %xmm1,%xmm1 13: 66 0f 72 d1 01 psrld $0x1,%xmm1 18: 0f 54 c1 andps %xmm1,%xmm0 1b: 0f c2 c9 00 cmpeqps %xmm1,%xmm1 1f: 0f 57 c1 xorps %xmm1,%xmm0 22: 48 89 ec mov %rbp,%rsp 25: 5d pop %rbp 26: c3 retq
I don't for sure know what's going on here with each individual instruction, but this sort of looks like a register allocator issue? I'm not sure what the second
xorps
is doing there with those registers. If this is a register allocator thing it may or may not be related to https://github.com/bytecodealliance/wasmtime/issues/3216
alexcrichton commented on issue #3327:
Fixed in https://github.com/bytecodealliance/wasmtime/pull/3517
Last updated: Jan 24 2025 at 00:11 UTC