abrown opened PR #2397 from packed-shifts to main:
This change first modifies `SyntheticAmode` to allow it to refer to constants: this is helpful because `SyntheticAmode` and `Amode` are used throughout the x64 backend to refer to memory locations, so constants can now fit into that scheme. (In a future PR I will modify `vconst` to simply lower to an `Inst::XmmUnaryRmR`, which accepts a `SyntheticAmode`, instead of using the special `Inst::XmmLoadConst` variant.)

With that in place, I added all of the lowerings for the vector versions of `ishl`, `sshr`, and `ushr`. There are several interesting parts to this:
- vector shifts can accept an immediate shift amount, so this must be handled in each case; if the shift amount is dynamic, it must be moved to an XMM register
- the x86 ISA does not provide `i8x16` shifts, so these are lowered either using a mask table (`ishl` and `ushr`) or using packing/unpacking (`sshr`); see the sketch after this description
- the equivalent of `sshr.i64x2` is also missing in the x86 ISA, so an "extract, scalar shift, insert" scheme is used (as it was in the old backend)

The issues above are documented in detail in the code.
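To make the `i8x16` trick above concrete, here is a minimal scalar model of the `ushr.i8x16` lowering, assuming the approach the description outlines (shift byte pairs as one 16-bit lane, then mask off the bits that crossed the byte boundary). `ushr_i8x16_model` is a hypothetical helper for illustration, not the backend's actual code, which emits a 16-bit vector shift plus an AND against a mask-table entry:

```rust
/// Scalar model of the i8x16 `ushr` lowering: x64 has no 8-bit-lane shift,
/// so each pair of bytes is shifted as one 16-bit lane and the bits that
/// migrated across the byte boundary are masked off afterwards.
fn ushr_i8x16_model(lanes: [u8; 16], amt: u32) -> [u8; 16] {
    let amt = amt % 8; // CLIF takes the shift amount modulo the lane width
    let mask = 0xffu8 >> amt; // clears bits pulled in from the high byte
    let mut out = [0u8; 16];
    for i in (0..16).step_by(2) {
        // Treat bytes i and i+1 as one little-endian 16-bit lane (as PSRLW does).
        let wide = u16::from_le_bytes([lanes[i], lanes[i + 1]]) >> amt;
        let [lo, hi] = wide.to_le_bytes();
        out[i] = lo & mask;
        out[i + 1] = hi & mask;
    }
    out
}

fn main() {
    // Each 0x80 byte shifted right by 1 becomes 0x40, with no bleed from
    // the neighboring byte.
    assert_eq!(ushr_i8x16_model([0x80; 16], 1), [0x40; 16]);
}
```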
abrown requested cfallin, bnjbvr and jlb6740 for a review on PR #2397.
abrown edited PR #2397 from packed-shifts to main.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Perhaps `assert!((ty == types::F32 && lane < 4) || (ty == types::F64 && lane < 1))` above to ensure this is the case?
cfallin created PR Review Comment:
Could we add some documentation here describing how these tables were computed? I think I get the general idea -- for the "low half" 8 bits of each 16-bit shifted-right value, we need to mask out the bits that migrated from the high half to the low half -- but, e.g., why are the first three values of the shift-by-1 case (second row) 0xff, rather than the first, third, fifth, ...?
cfallin created PR Review Comment:
s/then/them/
cfallin created PR Review Comment:
This is somewhat surprising to me; I would have expected the definition of `ushr` to be that it masks the shift amount first, so e.g. a 33-bit shift of a 32-bit lane is the same as a 1-bit shift. The definition in `codegen/meta/src/shared/instructions.rs` seems to indicate this as well (`shift-amounts := y mod B`). Does this mean we need an explicit mask first?
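A tiny illustration of what the `y mod B` definition cited above implies; the PSRLD contrast is standard x86 behavior, where vector shift counts at or above the lane width zero the lanes rather than wrapping:

```rust
fn main() {
    // Under CLIF semantics (`y mod B`), ushr.i32x4 by 33 behaves as a
    // 1-bit shift of each lane.
    let lane: u32 = 0x8000_0000;
    assert_eq!(lane >> (33 % 32), 0x4000_0000);
    // x64's PSRLD instead zeroes the lane for any count >= 32, hence the
    // question of whether the lowering needs an explicit mask first.
}
```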
jlb6740 submitted PR Review.
jlb6740 created PR Review Comment:
I think this is a good catch. `ushr_i32x4` lowers to SHR (unsigned divide), which masks the count to 5 bits, meaning the shift is 1-bit and not 33-bit. @cfallin What do you mean by an explicit mask first .. is it just the filetests that need updating?
abrown submitted PR Review.
abrown created PR Review Comment:
Added some documentation; to make things simpler, I mask both lanes, though this is not strictly necessary, and what you are seeing is `rustfmt` sending some of the mask on to the next line.
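A sketch of the row construction this reply implies, assuming the table holds one row of 16 byte-masks per shift amount; `ushr_mask_row` is an illustrative name, not the one used in the PR:

```rust
// After shifting 16-bit lanes right by `amt`, AND every byte with
// 0xff >> amt. Only the low byte of each pair strictly needs it (the high
// byte's top bits are already zero), but masking both keeps each row uniform.
fn ushr_mask_row(amt: u32) -> [u8; 16] {
    [0xffu8 >> amt; 16]
}
```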
abrown submitted PR Review.
abrown created PR Review Comment:
Huh, yeah. Good thing I added that test :big_smile:; I think I remember adding it to the old backend to remind me of this issue. `code_translator.rs` already does this masking, so the extra masking was overkill, but the old backend was legalizing `ushr` to `x86_psrl` (and the like), so the test was correct for those ISA-specific instructions. When I refactored the test to use `ushr`, it doesn't match the Cranelift semantics, as you observe.

I don't think we should mask twice, once in CLIF and again in the x64 backend, and I don't feel too strongly about which solution is best:
- avoid masking in `code_translator.rs` and mask instead in the backend, leaving the CLIF semantics as-is
- leave the masking in `code_translator.rs` and change the CLIF semantics

Since either change might involve changes to other backends and since this currently passes the Wasm spec tests, I would be in favor of creating an issue and resolving it in a separate PR. What do you think?
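As a sketch of the first option (masking in the backend), under the assumption that lane widths are powers of two; `mask_shift_amount` and `lane_bits` are illustrative names, not the x64 backend's actual API:

```rust
/// Reduce a dynamic shift amount to CLIF's `y mod B` semantics before it
/// reaches the vector shift; for power-of-two lane widths this is a single
/// AND instruction ahead of the move into an XMM register.
fn mask_shift_amount(amt: u64, lane_bits: u64) -> u64 {
    debug_assert!(lane_bits.is_power_of_two());
    amt & (lane_bits - 1)
}

fn main() {
    assert_eq!(mask_shift_amount(33, 32), 1); // ushr.i32x4 by 33 == by 1
}
```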
abrown edited PR Review Comment.
abrown updated PR #2397 from packed-shifts to main.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Hmm, interesting; so this is mainly an issue for (i) CLIF compile tests and (ii) potentially other Cranelift embedders. For the latter's sake, I think we should err on the side of implementing the CLIF semantics as currently defined, i.e., doing the masking in the backends. Happy to have this become a followup issue/PR though. Thanks for digging into it!
cfallin submitted PR Review.
abrown submitted PR Review.
abrown created PR Review Comment:
Sure, I opened https://github.com/bytecodealliance/wasmtime/issues/2409.
abrown merged PR #2397.