abrown opened PR #1377 from i8x16-shift to master:
This PR adds an implementation of i8x16 shift. Since x86 does not have such an instruction (i.e. it stops at i16x8), we spent considerable time discussing the best way to implement this for x86 in a spec issue. In the end, I chose to implement this with constant masks, for which I had to:
- Make constants declarable in the function preamble: const42 = 0x.....
- Add a const_addr instruction in order to get the base address of a constant in the constant pool

With this functionality in place, I then legalized ushr.i8x16 to the equivalent of these seven instructions:

    v0 = band_imm [shift index], 7   # this already exists in code_translator.rs
    v1 = bitcast i64x2 v0            # this moves the shift index into an XMM register (could be scalar_to_vector)
    v2 = x86_psrl v1                 # I'm eliding some raw_bitcasts around this for clarity
    v3 = ishl_imm [shift index], 4   # this gets the index into the mask
    v4 = const_addr [mask ref]       # a RIP-relative LEA
    v5 = load_complex v3, v4         # MOVUPS the mask
    v6 = band v2, v5                 # mask off the bits that would have been zeroed in a true ushr.i8x16

4a9a4cb is actually not directly related to this PR but is useful for the benchmark I am attempting to run.
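The masking idea in the sequence above can be checked with a small scalar model. This is only a sketch in plain Rust, not the Cranelift legalization itself: the function names, the table layout (eight 16-byte masks, one per shift amount, each byte equal to 0xFF >> n) and the use of the already-masked shift amount for the table offset are assumptions made for illustration. It models the wide shift as a logical right shift on 16-bit lanes, which lets bits from the high byte of each lane leak into the low byte, and then clears those bits with the mask.

```rust
/// Number of byte lanes in a 128-bit vector.
const LANES: usize = 16;

/// Build a hypothetical mask table: one 16-byte mask per shift amount 0..=7.
/// Every byte of mask `n` is 0xFF >> n, i.e. it keeps only the bits that a
/// true per-byte logical right shift by `n` could leave set.
fn build_mask_table() -> [u8; 8 * LANES] {
    let mut table = [0u8; 8 * LANES];
    for n in 0..8 {
        for lane in 0..LANES {
            table[n * LANES + lane] = 0xFFu8 >> n;
        }
    }
    table
}

/// Scalar model of the emulated ushr.i8x16: shift 16-bit lanes, then mask.
fn ushr_i8x16_emulated(v: [u8; LANES], shift: u32, masks: &[u8; 8 * LANES]) -> [u8; LANES] {
    // Mirrors `band_imm [shift index], 7`: the shift amount wraps at 8.
    let amt = (shift & 7) as usize;

    // Models the wide shift: each pair of bytes is one little-endian
    // 16-bit lane, so high-byte bits spill into the low byte.
    let mut out = [0u8; LANES];
    for i in 0..LANES / 2 {
        let lane = u16::from_le_bytes([v[2 * i], v[2 * i + 1]]);
        let shifted = (lane >> amt).to_le_bytes();
        out[2 * i] = shifted[0];
        out[2 * i + 1] = shifted[1];
    }

    // Mirrors `ishl_imm ..., 4`: shift amount * 16 bytes is the offset of
    // the matching mask in the table (assumed layout).
    let offset = amt << 4;

    // Mirrors the final `band`: clear the bits that leaked across bytes.
    for lane in 0..LANES {
        out[lane] &= masks[offset + lane];
    }
    out
}

/// Reference semantics: an independent logical right shift of every byte.
fn ushr_i8x16_reference(v: [u8; LANES], shift: u32) -> [u8; LANES] {
    let amt = shift & 7;
    v.map(|b| b >> amt)
}
```

Calling ushr_i8x16_emulated and ushr_i8x16_reference on the same input for every shift amount is a quick way to see that the mask removes exactly the bits that crossed a byte boundary.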
abrown updated PR #1377 from i8x16-shift to master
abrown edited PR #1377 from i8x16-shift to master:
This PR adds an implementation of i8x16 shift. Since x86 does not have such an instruction (i.e. it stops at i16x8), we spent considerable time discussing the best way to implement this for x86 in a spec issue. In the end, I chose to implement this with constant masks, for which I had to:
- Make constants declarable in the function preamble: const42 = 0x.....
- Add a const_addr instruction in order to get the base address of a constant in the constant pool

With this functionality in place, I then legalized ushr.i8x16 to the equivalent of these seven instructions:

    v0 = band_imm [shift index], 7   # this already exists in code_translator.rs
    v1 = bitcast i64x2 v0            # this moves the shift index into an XMM register (could be scalar_to_vector)
    v2 = x86_psrl v1                 # I'm eliding some raw_bitcasts around this for clarity
    v3 = ishl_imm [shift index], 4   # this gets the index into the mask
    v4 = const_addr [mask ref]       # a RIP-relative LEA
    v5 = load_complex v3, v4         # MOVUPS the mask
    v6 = band v2, v5                 # mask off the bits that would have been zeroed in a true ushr.i8x16

2f648ea is actually not directly related to this PR but is useful for the benchmark I am attempting to run.
abrown edited PR #1377 from i8x16-shift to master
abrown edited PR #1377 from i8x16-shift to master
abrown requested sunfishcode for a review on PR #1377.
abrown updated PR #1377 from i8x16-shift to master
abrown updated PR #1377 from i8x16-shift to master
abrown submitted PR Review.
abrown created PR Review Comment:
TODO: I just need to remove this.
abrown submitted PR Review.
abrown created PR Review Comment:
TODO: remove, included in simd-bitwise-run.clif
abrown submitted PR Review.
abrown created PR Review Comment:
TODO: rename to ushr_i8x16
abrown updated PR #1377 from i8x16-shift to master
abrown updated PR #1377 from i8x16-shift to master
abrown updated PR #1377 from i8x16-shift to master
abrown updated PR #1377 from i8x16-shift to master
abrown updated PR #1377 from i8x16-shift to master
sunfishcode submitted PR Review.
abrown merged PR #1377.
Last updated: Jan 24 2025 at 00:11 UTC