Stream: git-wasmtime

Topic: wasmtime / PR #1377 Implement I8x16 shift for x86 SIMD


view this post on Zulip Wasmtime GitHub notifications bot (Mar 21 2020 at 02:14):

abrown opened PR #1377 from i8x16-shift to master:

This PR adds an implementation of i8x16 shift. Since x86 does not have such an instruction (i.e. it stops at i16x8), we spent a considerable time discussing the best way to implement this for x86 in a spec issue. In the end, I chose to implement this with constant masks, for which I had to:

With this functionality in place, I then legalized ushr.i8x16 to the equivalent of these seven instructions:

v0 = band_imm [shift index], 7  # this is pre-existent in code_translator.rs
v1 = bitcast i64x2 v0           # this moves the shift index into an XMM register (could be scalar_to_vector)
v2 = x86_psrl v1                # I'm eliding some raw_bitcasts around this for clarity
v3 = ishl_imm [shift index], 4  # this gets the index into the mask
v4 = const_addr [mask ref]      # a RIP-relative LEA
v5 = load_complex v3, v4        # MOVUPS the mask
v6 = band v2, v5                # mask off the bits that would have been zeroed in a true ushr.i8x16
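
To make the shift-then-mask idea in the sequence above concrete, here is a minimal scalar sketch in Rust. It is not the Cranelift legalization itself. It assumes the wide shift operates on 16-bit lanes (as PSRLW does) and that the mask table holds eight 16-byte entries, one per shift amount, each byte being `0xff >> shift`. The names `build_masks`, `ushr_i8x16` and `ushr_i8x16_reference` are hypothetical and exist only for this illustration.

```rust
/// Hypothetical mask table: entry `s` holds `0xff >> s` in all 16 bytes.
/// Each entry is 16 bytes, which is why the IR scales the index by 16 (`ishl_imm ..., 4`).
fn build_masks() -> [[u8; 16]; 8] {
    let mut masks = [[0u8; 16]; 8];
    for (s, mask) in masks.iter_mut().enumerate() {
        *mask = [0xffu8 >> s; 16];
    }
    masks
}

/// Emulate a per-byte logical right shift using only a 16-bit-lane shift and a mask.
/// Assumption: the wide shift behaves like PSRLW on little-endian 16-bit lanes.
fn ushr_i8x16(v: [u8; 16], shift: u32) -> [u8; 16] {
    // v0: reduce the shift amount modulo the lane width (8 bits).
    let amt = shift & 7;

    // v2: shift each 16-bit lane; bits of the high byte leak into the low byte.
    let mut out = [0u8; 16];
    for lane in 0..8 {
        let wide = u16::from_le_bytes([v[2 * lane], v[2 * lane + 1]]);
        let shifted = (wide >> amt).to_le_bytes();
        out[2 * lane] = shifted[0];
        out[2 * lane + 1] = shifted[1];
    }

    // v3..v5: select the 16-byte mask for this shift amount.
    let mask = build_masks()[amt as usize];

    // v6: clear the top `amt` bits of every byte, removing the leaked bits.
    for i in 0..16 {
        out[i] &= mask[i];
    }
    out
}

/// Straightforward per-byte reference used to check the emulation.
fn ushr_i8x16_reference(v: [u8; 16], shift: u32) -> [u8; 16] {
    v.map(|b| b >> (shift & 7))
}
```

Comparing `ushr_i8x16` against `ushr_i8x16_reference` for every shift amount is a quick way to check that the mask removes exactly the bits that cross a byte boundary. A table lookup avoids computing the mask at run time, at the cost of 128 bytes of constant data in this sketch.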

4a9a4cb is actually not directly related to this PR but is useful for the benchmark I am attempting to run.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 21 2020 at 02:15):

abrown updated PR #1377 from i8x16-shift to master.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 21 2020 at 02:16):

abrown edited PR #1377 from i8x16-shift to master:

2f648ea is actually not directly related to this PR but is useful for the benchmark I am attempting to run.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 21 2020 at 03:07):

abrown edited PR #1377 from i8x16-shift to master.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 21 2020 at 03:07):

abrown edited PR #1377 from i8x16-shift to master.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 21 2020 at 03:08):

abrown requested sunfishcode for a review on PR #1377.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 24 2020 at 22:26):

abrown updated PR #1377 from i8x16-shift to master.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 24 2020 at 22:42):

abrown updated PR #1377 from i8x16-shift to master.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 24 2020 at 22:58):

abrown submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 24 2020 at 22:58):

abrown created PR Review Comment:

TODO: I just need to remove this.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 24 2020 at 23:01):

abrown submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 24 2020 at 23:01):

abrown created PR Review Comment:

TODO: remove, included in simd-bitwise-run.clif

view this post on Zulip Wasmtime GitHub notifications bot (Mar 24 2020 at 23:01):

abrown submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 24 2020 at 23:01):

abrown created PR Review Comment:

TODO: rename to ushr_i8x16

view this post on Zulip Wasmtime GitHub notifications bot (Mar 25 2020 at 22:22):

abrown updated PR #1377 from i8x16-shift to master.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 03 2020 at 00:22):

abrown updated PR #1377 from i8x16-shift to master.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 03 2020 at 18:26):

abrown updated PR #1377 from i8x16-shift to master.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 08 2020 at 16:39):

abrown updated PR #1377 from i8x16-shift to master.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 08 2020 at 17:35):

abrown updated PR #1377 from i8x16-shift to master.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 17 2020 at 18:52):

sunfishcode submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 17 2020 at 18:59):

abrown merged PR #1377.


Last updated: Jan 24 2025 at 00:11 UTC