Stream: git-wasmtime

Topic: wasmtime / PR #2397 [machinst x64]: implement packed shifts


view this post on Zulip Wasmtime GitHub notifications bot (Nov 12 2020 at 00:12):

abrown opened PR #2397 from packed-shifts to main:

This change first modifies SyntheticAmode to allow it to refer to constants: this is helpful because SyntheticAmode and Amode are used throughout the x64 backend to refer to memory locations so constants can now fit into that scheme. (In a future PR I will modify vconst to simply lower to a Inst::XmmUnaryRmR, which accepts a SyntheticAmode, instead of using the special Inst::XmmLoadConst variant).

With that in place, I added all of the lowerings for the vector versions of ishl, sshr, and ushr. There are several interesting parts to this:

view this post on Zulip Wasmtime GitHub notifications bot (Nov 12 2020 at 00:12):

abrown requested cfallin, bnjbvr and jlb6740 for a review on PR #2397.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 12 2020 at 00:12):

abrown requested cfallin, bnjbvr and jlb6740 for a review on PR #2397.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 12 2020 at 00:12):

abrown requested cfallin, bnjbvr and jlb6740 for a review on PR #2397.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 12 2020 at 00:12):

abrown edited PR #2397 from packed-shifts to main:

This change first modifies SyntheticAmode to allow it to refer to constants: this is helpful because SyntheticAmode and Amode are used throughout the x64 backend to refer to memory locations so constants can now fit into that scheme. (In a future PR I will modify vconst to simply lower to a Inst::XmmUnaryRmR, which accepts a SyntheticAmode, instead of using the special Inst::XmmLoadConst variant).

With that in place, I added all of the lowerings for the vector versions of ishl, sshr, and ushr. There are several interesting parts to this:

The issues above are documented in detail in the code.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 12 2020 at 02:04):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 12 2020 at 02:04):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 12 2020 at 02:04):

cfallin created PR Review Comment:

Perhaps assert!((ty == types::F32 && lane < 4) || (ty == types::F64 && lane < 1)) above to ensure this is the case?

view this post on Zulip Wasmtime GitHub notifications bot (Nov 12 2020 at 02:04):

cfallin created PR Review Comment:

Could we add some documentation here describing how these tables were computed? I think I get the general idea -- for the "low half" 8 bits of each 16-bit shifted-right value, we need to mask out the bits that migrated from the high half to the low half -- but, e.g., why are the first three values of the shift-by-1 case (second row) 0xff, rather than the first, third, fifth, ...?

view this post on Zulip Wasmtime GitHub notifications bot (Nov 12 2020 at 02:04):

cfallin created PR Review Comment:

s/then/them/

view this post on Zulip Wasmtime GitHub notifications bot (Nov 12 2020 at 02:04):

cfallin created PR Review Comment:

This is somewhat surprising to me; I would have expected the definition of ushr to be that it masks the shift amount first, so e.g. a 33-bit shift of a 32-bit lane is the same as a 1-bit shift. The definition in codegen/meta/src/shared/instructions.rs seems to indicate this as well (shift-amount s := y mod B). Does this mean we need an explicit mask first?

view this post on Zulip Wasmtime GitHub notifications bot (Nov 12 2020 at 05:32):

jlb6740 submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 12 2020 at 05:32):

jlb6740 created PR Review Comment:

I think this is a good catch. ushr_i32x4 lowers to SHR (unsigned divide) which is masks the count to 5 bits .. meaning the shift is 1-bit and not 33-bit. @cfallin What do you mean by an explicit mask first .. is it just the file tests that needs updating?

view this post on Zulip Wasmtime GitHub notifications bot (Nov 12 2020 at 19:54):

abrown submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 12 2020 at 19:54):

abrown created PR Review Comment:

Added some documentation; to make things simpler, I mask both lanes though this is not strictly necessary and what you are seeing is rustfmt sending some of the mask on to the next line.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 12 2020 at 20:05):

abrown submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 12 2020 at 20:05):

abrown created PR Review Comment:

Huh, yeah. Good thing I added that test :big_smile:; I think I remember adding it to the old backend to remind me of this issue. code_translator.rs already does this masking so the extra masking was overkill, but the old backend was legalizing ushr to x86_psrl (and the like) so the test was correct for those ISA-specific instructions. When I refactored the test to use ushr, it doesn't match the Cranelift semantics, as you observe.

I don't think we should mask twice,once in CLIF and again in the x64 backend, and I don't feel too strongly about which solution is best:

Since either change might involve changes to other backends and since this currently passes the Wasm spec tests, I would be in favor of creating an issue and resolving it in a separate PR. What do you think?

view this post on Zulip Wasmtime GitHub notifications bot (Nov 12 2020 at 20:06):

abrown edited PR Review Comment.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 12 2020 at 20:06):

abrown updated PR #2397 from packed-shifts to main:

This change first modifies SyntheticAmode to allow it to refer to constants: this is helpful because SyntheticAmode and Amode are used throughout the x64 backend to refer to memory locations so constants can now fit into that scheme. (In a future PR I will modify vconst to simply lower to a Inst::XmmUnaryRmR, which accepts a SyntheticAmode, instead of using the special Inst::XmmLoadConst variant).

With that in place, I added all of the lowerings for the vector versions of ishl, sshr, and ushr. There are several interesting parts to this:

The issues above are documented in detail in the code.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 12 2020 at 20:35):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 12 2020 at 20:35):

cfallin created PR Review Comment:

Hmm, interesting; so this is mainly an issue for (i) CLIF compile tests and (ii) potentially other Cranelift embedders. For the latters' sake, I think we should err on the side of implementing the CLIF semantics as currently defined, i.e., doing the masking in the backends. Happy to have this become a followup issue/PR though. Thanks for digging into it!

view this post on Zulip Wasmtime GitHub notifications bot (Nov 12 2020 at 20:36):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 12 2020 at 22:21):

abrown submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 12 2020 at 22:21):

abrown created PR Review Comment:

Sure, I opened https://github.com/bytecodealliance/wasmtime/issues/2409.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 12 2020 at 22:21):

abrown merged PR #2397.


Last updated: Dec 23 2024 at 12:05 UTC