Stream: git-wasmtime

Topic: wasmtime / issue #3095 Fix for 3089 X64 ext_mul_i8x16 has...


view this post on Zulip Wasmtime GitHub notifications bot (Jul 18 2021 at 19:41):

jlb6740 commented on issue #3095:

Not ready to merge but intended to fix #3089. Passes spec tests but spec tests but spec tests didn't catch the faulty lowering and so need improvements. Can't find tools wat2wasm tools that support extmul_high (wabt for example does not generate from a wat file) .. but will check against v8 implementation as soon as I can generate a wasm binary that has this instruction.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 18 2021 at 20:51):

jlb6740 commented on issue #3095:

Ok ... checked by just decomposing the instruction and comparing results.
Here compared call to low with call to mul_widen_low_s and did the same for the high compared to mul_widen_high_s. Results were consistent. But was just due to incorrect register usage when inlining the widen part for the instructions.

`(module
(func $inputs (result v128 v128)
v128.const i8x16 -1 -2 3 100 124 -38 33 87 92 108 22 8 -43 -128 22 0
v128.const i8x16 -5 -2 6 10 45 -4 4 -2 0 88 92 -102 -98 83 73 54)

(func $input_a (result v128)
v128.const i8x16 -1 -2 3 100 124 -38 33 87 92 108 22 8 -43 -128 22 0)

(func $input_b (result v128)
v128.const i8x16 -5 -2 6 10 45 -4 4 -2 0 88 92 -102 -98 83 73 54)

(func (export "low") (result v128)
call $inputs
i16x8.extmul_low_i8x16_s)
(func (export "high") (result v128)
call $inputs
i16x8.extmul_high_i8x16_s)

(func (export "mul_widen_low_s") (result v128)
call $input_a
i16x8.extend_low_i8x16_s
call $input_b
i16x8.extend_low_i8x16_s
i16x8.mul)

(func (export "mul_widen_high_s") (result v128)
call $input_a
i16x8.extend_high_i8x16_s
call $input_b
i16x8.extend_high_i8x16_s
i16x8.mul)
)`

view this post on Zulip Wasmtime GitHub notifications bot (Jul 18 2021 at 20:51):

jlb6740 edited a comment on issue #3095:

Ok ... checked by just decomposing the instruction and comparing results.
Here compared call to low with call to mul_widen_low_s and did the same for the high compared to mul_widen_high_s. Results were consistent. But was just due to incorrect register usage when inlining the widen part for the instructions.

`
(module
(func $inputs (result v128 v128)
v128.const i8x16 -1 -2 3 100 124 -38 33 87 92 108 22 8 -43 -128 22 0
v128.const i8x16 -5 -2 6 10 45 -4 4 -2 0 88 92 -102 -98 83 73 54)

(func $input_a (result v128)
v128.const i8x16 -1 -2 3 100 124 -38 33 87 92 108 22 8 -43 -128 22 0)

(func $input_b (result v128)
v128.const i8x16 -5 -2 6 10 45 -4 4 -2 0 88 92 -102 -98 83 73 54)

(func (export "low") (result v128)
call $inputs
i16x8.extmul_low_i8x16_s)
(func (export "high") (result v128)
call $inputs
i16x8.extmul_high_i8x16_s)

(func (export "mul_widen_low_s") (result v128)
call $input_a
i16x8.extend_low_i8x16_s
call $input_b
i16x8.extend_low_i8x16_s
i16x8.mul)

(func (export "mul_widen_high_s") (result v128)
call $input_a
i16x8.extend_high_i8x16_s
call $input_b
i16x8.extend_high_i8x16_s
i16x8.mul)
)
`

view this post on Zulip Wasmtime GitHub notifications bot (Jul 18 2021 at 20:51):

jlb6740 edited a comment on issue #3095:

Ok ... checked by just decomposing the instruction and comparing results.
Here compared call to low with call to mul_widen_low_s and did the same for the high compared to mul_widen_high_s. Results were consistent. But was just due to incorrect register usage when inlining the widen part for the instructions.

(module
(func $inputs (result v128 v128)
v128.const i8x16 -1 -2 3 100 124 -38 33 87 92 108 22 8 -43 -128 22 0
v128.const i8x16 -5 -2 6 10 45 -4 4 -2 0 88 92 -102 -98 83 73 54)

(func $input_a (result v128)
v128.const i8x16 -1 -2 3 100 124 -38 33 87 92 108 22 8 -43 -128 22 0)

(func $input_b (result v128)
v128.const i8x16 -5 -2 6 10 45 -4 4 -2 0 88 92 -102 -98 83 73 54)

(func (export "low") (result v128)
call $inputs
i16x8.extmul_low_i8x16_s)
(func (export "high") (result v128)
call $inputs
i16x8.extmul_high_i8x16_s)

(func (export "mul_widen_low_s") (result v128)
call $input_a
i16x8.extend_low_i8x16_s
call $input_b
i16x8.extend_low_i8x16_s
i16x8.mul)

(func (export "mul_widen_high_s") (result v128)
call $input_a
i16x8.extend_high_i8x16_s
call $input_b
i16x8.extend_high_i8x16_s
i16x8.mul)
)

view this post on Zulip Wasmtime GitHub notifications bot (Jul 27 2021 at 03:06):

jlb6740 commented on issue #3095:

LGTM; thanks for the fix!

Regarding extra tests: yes, it might be nice to add some more tests here, and actually I'd prefer to get coverage via our own tests and merge this sooner (since we know we are generating incorrect code) rather than wait for improvements to the upstream spec-test suite and updating that here.

IMHO a runtest might be slightly preferable over a generated-code test; especially with SIMD and FP tests, where the goal is to ensure correctness rather than to see that we're successfully using a particular instruction, I'd prefer not to overfit on particular golden machine code (which may change later with other compiler updates, e.g. regalloc or other opts).

Thanks @cfallin. I'll merge this then and we can focus on a runtest as a separate commit.


Last updated: Jan 24 2025 at 00:11 UTC