wasmtime / PR #3095 Fix for 3089 X64 ext_mul_i8x16 has in... · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / PR #3095 Fix for 3089 X64 ext_mul_i8x16 has in...

Wasmtime GitHub notifications bot (Jul 18 2021 at 19:36):

jlb6740 opened PR #3095 from fix-x64-ext_mul_i8x16 to main:

Wasmtime GitHub notifications bot (Jul 18 2021 at 20:51):

jlb6740 has marked PR #3095 as ready for review.

Wasmtime GitHub notifications bot (Jul 18 2021 at 20:52):

jlb6740 assigned PR #3095 to alexcrichton.

Wasmtime GitHub notifications bot (Jul 18 2021 at 20:52):

jlb6740 assigned PR #3095 to alexcrichton.

Wasmtime GitHub notifications bot (Jul 19 2021 at 14:41):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Jul 19 2021 at 14:41):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Jul 19 2021 at 14:41):

alexcrichton created PR review comment:

This may be my lack of knowledge about the x64 backend, but the above instruction

Inst::gen_move(tmp_reg, lhs, output_ty)

has the destination as the first parameter (tmp_reg I believe) and the parameters to the instruction as the second (lhs).

Does that mean that this should be dst as the first parameter since that should be the destination? (or does xmm_rm_r behave differently in that the destination is the second parameter?)

Wasmtime GitHub notifications bot (Jul 19 2021 at 14:41):

alexcrichton created PR review comment:

Looking at the pattern matching and such, it looks like the codegen here is actually the same as-if the specialized pattern matching in this function weren't present? Basically it looks like this is generating palignr; pmovsxbw for each lhs/rhs, and then it's pmullw for the result. That looks like the same codegen as if the component operations of i16x8.extmul_high_i8x16_s were each translated individually.

If that's the case is it necessary to have a specialized pattern-match here for x64? E.g. were the wasm instructions primarily for other platforms which have a native instruction for the operation?

Wasmtime GitHub notifications bot (Jul 19 2021 at 14:41):

alexcrichton created PR review comment:

Also, how come a temp register is used? It seems like the goal of palignr is to get the high 8 values into the low 8 values, and then the pmovsxbw will sign-extend the low values into 16-bit values, but looking at the intructions it seems like that can all be done with the same register?

Wasmtime GitHub notifications bot (Jul 19 2021 at 14:41):

alexcrichton created PR review comment:

Hm although given xmm_rm_r_imm it looks like the destination isn't the first argument, so I may just not know the backend here in this regard.

Wasmtime GitHub notifications bot (Jul 19 2021 at 15:27):

jlb6740 submitted PR review.

Wasmtime GitHub notifications bot (Jul 19 2021 at 15:27):

jlb6740 created PR review comment:

This may be my lack of knowledge about the x64 backend, but the above instruction

Inst::gen_move(tmp_reg, lhs, output_ty)

has the destination as the first parameter (tmp_reg I believe) and the parameters to the instruction as the second (lhs).

Does that mean that this should be dst as the first parameter since that should be the destination? (or does xmm_rm_r behave differently in that the destination is the second parameter?)

Yes .. good observation. gen_move is unique in this way by having the first operand as the destination. Not sure the history behind that but it is something that perhaps should be refactored to avoid confusion.

Wasmtime GitHub notifications bot (Jul 19 2021 at 15:37):

jlb6740 submitted PR review.

Wasmtime GitHub notifications bot (Jul 19 2021 at 15:37):

jlb6740 created PR review comment:

Looking at the pattern matching and such, it looks like the codegen here is actually the same as-if the specialized pattern matching in this function weren't present? Basically it looks like this is generating palignr; pmovsxbw for each lhs/rhs, and then it's pmullw for the result. That looks like the same codegen as if the component operations of i16x8.extmul_high_i8x16_s were each translated individually.

If that's the case is it necessary to have a specialized pattern-match here for x64? E.g. were the wasm instructions primarily for other platforms which have a native instruction for the operation?

@alexcrichton Another good observation. Yes, obviously this instruction is just the combination of a 3 instruction sequence written as one. For arm these 3 instructions can be lowered efficiently more easily with this 3 instruction combination for all 3 variants, i.e. (i16x8.extmul_high_i8x16_s, i32x4.extmul_high_i16x8_s, i64x2.extmul_high_i32x4_s) so the lowering is optimal but for x64 more optimal lowering is only available for a few of the variants. Namely i16x8.extmul_high_i8x16_s and i16x8.extmul_high_i8x16_u does not have a more optimal lowering so I end up just inlining the codegen here. In fact the mistake was simply a copy paste error in that inlining, not seeing a way to print the final codegen to triple-check, and then not having adequate to testing to catch the mistake. For the other variants the specialized codegen I think saves an instruction or two (though I haven't counted explicitly).

Wasmtime GitHub notifications bot (Jul 19 2021 at 17:50):

jlb6740 updated PR #3095 (assigned to alexcrichton) from fix-x64-ext_mul_i8x16 to main.

Wasmtime GitHub notifications bot (Jul 19 2021 at 18:14):

alexcrichton created PR review comment:

Ah ok! Do you think it'd be possible to structure the code such that it falls-through to the normal lowerings for non-specialized implementations, but specialized implementaitons (the other widths > 8) still have pattern-matched codegen? Either way seems fine to me though

Wasmtime GitHub notifications bot (Jul 19 2021 at 18:14):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Jul 19 2021 at 18:49):

jlb6740 submitted PR review.

Wasmtime GitHub notifications bot (Jul 19 2021 at 18:49):

jlb6740 created PR review comment:

Hi @alexcrichton, I think so but I am not quite confident how to write this ... particular when there are two input registers here. What you gain is the normal benefit of code reuse but the lowering should be no more efficient than what you see here and may introduce a few more jumps. If you are ok with either way I'd prefer to just merge this fix now and refactor after a discussion with @cfallin.

Wasmtime GitHub notifications bot (Jul 19 2021 at 19:32):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Jul 19 2021 at 19:32):

alexcrichton created PR review comment:

Oh sure yeah, I think it's fine to land this now and refactor later (although I think some tests here would still be good)

Wasmtime GitHub notifications bot (Jul 26 2021 at 21:11):

cfallin submitted PR review.

Wasmtime GitHub notifications bot (Jul 26 2021 at 21:11):

cfallin created PR review comment:

FWIW I'm fine with this minor code duplication for now; a bigger refactor later when we have a more systematic way of writing lowering pattern-matching seems reasonable.

Wasmtime GitHub notifications bot (Jul 26 2021 at 21:15):

cfallin submitted PR review.

Wasmtime GitHub notifications bot (Jul 26 2021 at 23:48):

jlb6740 submitted PR review.

Wasmtime GitHub notifications bot (Jul 26 2021 at 23:48):

jlb6740 created PR review comment:

Hi @alexcrichton, did you have in many any particular type of test in mind? There is spec tests, file tests, unit tests, and in the future fuzz testing but for proving correctness I think the spec tests have been what we've used. The unit tests help confirm encodings as long as we are confirming with as or some other tool.

Wasmtime GitHub notifications bot (Jul 27 2021 at 03:04):

jlb6740 deleted PR review comment.

Wasmtime GitHub notifications bot (Jul 27 2021 at 03:06):

jlb6740 merged PR #3095.

Last updated: Dec 06 2025 at 06:05 UTC