jlb6740 commented on issue #3095:
Not ready to merge but intended to fix #3089. Passes spec tests but spec tests but spec tests didn't catch the faulty lowering and so need improvements. Can't find tools wat2wasm tools that support extmul_high (wabt for example does not generate from a wat file) .. but will check against v8 implementation as soon as I can generate a wasm binary that has this instruction.
jlb6740 commented on issue #3095:
Ok ... checked by just decomposing the instruction and comparing results.
Here compared call to low with call to mul_widen_low_s and did the same for the high compared to mul_widen_high_s. Results were consistent. But was just due to incorrect register usage when inlining the widen part for the instructions.`(module
(func $inputs (result v128 v128)
v128.const i8x16 -1 -2 3 100 124 -38 33 87 92 108 22 8 -43 -128 22 0
v128.const i8x16 -5 -2 6 10 45 -4 4 -2 0 88 92 -102 -98 83 73 54)(func $input_a (result v128)
v128.const i8x16 -1 -2 3 100 124 -38 33 87 92 108 22 8 -43 -128 22 0)(func $input_b (result v128)
v128.const i8x16 -5 -2 6 10 45 -4 4 -2 0 88 92 -102 -98 83 73 54)(func (export "low") (result v128)
call $inputs
i16x8.extmul_low_i8x16_s)
(func (export "high") (result v128)
call $inputs
i16x8.extmul_high_i8x16_s)(func (export "mul_widen_low_s") (result v128)
call $input_a
i16x8.extend_low_i8x16_s
call $input_b
i16x8.extend_low_i8x16_s
i16x8.mul)(func (export "mul_widen_high_s") (result v128)
call $input_a
i16x8.extend_high_i8x16_s
call $input_b
i16x8.extend_high_i8x16_s
i16x8.mul)
)`
jlb6740 edited a comment on issue #3095:
Ok ... checked by just decomposing the instruction and comparing results.
Here compared call to low with call to mul_widen_low_s and did the same for the high compared to mul_widen_high_s. Results were consistent. But was just due to incorrect register usage when inlining the widen part for the instructions.`
(module
(func $inputs (result v128 v128)
v128.const i8x16 -1 -2 3 100 124 -38 33 87 92 108 22 8 -43 -128 22 0
v128.const i8x16 -5 -2 6 10 45 -4 4 -2 0 88 92 -102 -98 83 73 54)(func $input_a (result v128)
v128.const i8x16 -1 -2 3 100 124 -38 33 87 92 108 22 8 -43 -128 22 0)(func $input_b (result v128)
v128.const i8x16 -5 -2 6 10 45 -4 4 -2 0 88 92 -102 -98 83 73 54)(func (export "low") (result v128)
call $inputs
i16x8.extmul_low_i8x16_s)
(func (export "high") (result v128)
call $inputs
i16x8.extmul_high_i8x16_s)(func (export "mul_widen_low_s") (result v128)
call $input_a
i16x8.extend_low_i8x16_s
call $input_b
i16x8.extend_low_i8x16_s
i16x8.mul)(func (export "mul_widen_high_s") (result v128)
call $input_a
i16x8.extend_high_i8x16_s
call $input_b
i16x8.extend_high_i8x16_s
i16x8.mul)
)
`
jlb6740 edited a comment on issue #3095:
Ok ... checked by just decomposing the instruction and comparing results.
Here compared call to low with call to mul_widen_low_s and did the same for the high compared to mul_widen_high_s. Results were consistent. But was just due to incorrect register usage when inlining the widen part for the instructions.(module
(func $inputs (result v128 v128)
v128.const i8x16 -1 -2 3 100 124 -38 33 87 92 108 22 8 -43 -128 22 0
v128.const i8x16 -5 -2 6 10 45 -4 4 -2 0 88 92 -102 -98 83 73 54)(func $input_a (result v128)
v128.const i8x16 -1 -2 3 100 124 -38 33 87 92 108 22 8 -43 -128 22 0)(func $input_b (result v128)
v128.const i8x16 -5 -2 6 10 45 -4 4 -2 0 88 92 -102 -98 83 73 54)(func (export "low") (result v128)
call $inputs
i16x8.extmul_low_i8x16_s)
(func (export "high") (result v128)
call $inputs
i16x8.extmul_high_i8x16_s)(func (export "mul_widen_low_s") (result v128)
call $input_a
i16x8.extend_low_i8x16_s
call $input_b
i16x8.extend_low_i8x16_s
i16x8.mul)(func (export "mul_widen_high_s") (result v128)
call $input_a
i16x8.extend_high_i8x16_s
call $input_b
i16x8.extend_high_i8x16_s
i16x8.mul)
)
jlb6740 commented on issue #3095:
LGTM; thanks for the fix!
Regarding extra tests: yes, it might be nice to add some more tests here, and actually I'd prefer to get coverage via our own tests and merge this sooner (since we know we are generating incorrect code) rather than wait for improvements to the upstream spec-test suite and updating that here.
IMHO a runtest might be slightly preferable over a generated-code test; especially with SIMD and FP tests, where the goal is to ensure correctness rather than to see that we're successfully using a particular instruction, I'd prefer not to overfit on particular golden machine code (which may change later with other compiler updates, e.g. regalloc or other opts).
Thanks @cfallin. I'll merge this then and we can focus on a runtest as a separate commit.
Last updated: Jan 24 2025 at 00:11 UTC