elliottt opened PR #4772 from trevor/x64-shuffle
to main
:
- Lower shuffle in ISLE
- Add a shuffle test that produces zeros
- Add a TODO for the use of vpermi2b
- Lower swizzle in ISLE
<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
elliottt edited PR #4772 from trevor/x64-shuffle
to main
:
Lower
shuffle
andswizzle
in ISLE.This PR surfaced a bug with the lowering of
shuffle
when avx512vl and avx512vbmi are enabled: we usevpermi2b
as the implementation, but panic if the immediate shuffle mask contains any out-of-bounds values. The behavior when the avx512 extensions are not present is that out-of-bounds values are turned into0
in the result.I plan to fix this immediately in a follow-up PR, and have added a TODO that identifies the problem in the x64 lower.isle.
<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
elliottt has marked PR #4772 as ready for review.
elliottt updated PR #4772 from trevor/x64-shuffle
to main
.
elliottt edited PR #4772 from trevor/x64-shuffle
to main
:
Lower
shuffle
andswizzle
in ISLE.This PR surfaced a bug with the lowering of
shuffle
when avx512vl and avx512vbmi are enabled: we usevpermi2b
as the implementation, but panic if the immediate shuffle mask contains any out-of-bounds values. The behavior when the avx512 extensions are not present is that out-of-bounds values are turned into0
in the result.I've resolved this by detecting when the shuffle immediate has out-of-bounds indices in the avx512-enabled lowering, and generating an additional mask to zero out the lanes where those indices occur.
<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
elliottt updated PR #4772 from trevor/x64-shuffle
to main
.
elliottt submitted PR review.
elliottt created PR review comment:
Here's the case where the mask wasn't necessary, thus no
andps
instruction was generated.
elliottt submitted PR review.
elliottt created PR review comment:
Here's a case where the mask was necessary, so the
andps
on line 38 is necessary.
elliottt submitted PR review.
elliottt created PR review comment:
This test already passed for the non-avx512 lowering, and this PR allows it to pass when avx512 is available as well.
elliottt edited PR review comment.
elliottt edited PR #4772 from trevor/x64-shuffle
to main
:
Lower
shuffle
andswizzle
in ISLE.This PR surfaced a bug with the lowering of
shuffle
when avx512vl and avx512vbmi are enabled: we usevpermi2b
as the implementation, but panic if the immediate shuffle mask contains any out-of-bounds values. The behavior when the avx512 extensions are not present is that out-of-bounds values are turned into0
in the result.I've resolved this by detecting when the shuffle immediate has out-of-bounds indices in the avx512-enabled lowering, and generating an additional mask to zero out the lanes where those indices occur. This brings the avx512 case into line with the semantics of the
shuffle
op: https://github.com/bytecodealliance/wasmtime/blob/94bcbe844612306c136193f41f5cbdeec45a42ec/cranelift/codegen/meta/src/shared/instructions.rs#L1495-L1498<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
elliottt submitted PR review.
elliottt created PR review comment:
I'm a little unhappy about this, but we don't have an encoding for xmm instructions that have three arguments currently.
cfallin submitted PR review.
cfallin created PR review comment:
could we say what that behavior is and/or mention "CLIF-level shuffle" here? otherwise it's a bit unclear if one doesn't already have the context I think.
cfallin submitted PR review.
cfallin created PR review comment:
Excellent -- it sounds like you've tested on avx512 hardware?
(As an aside, it might be nice to think about ways to use qemu to get us AVX512 testing in CI, if the GitHub Actions host doesn't natively have it. @abrown any thoughts on that?)
cfallin created PR review comment:
Here we're relying on implicit firing-order heuristics (the above rule before this one, specifically
perm_from_mask...
extractor beforemask
variable binding); I think tests below should ensure this works properly but just wanted to call it out to be sure.
cfallin created PR review comment:
Yeah, we can definitely add such a thing later, and should I think; we'll get to this as part of our "no more mod operands" cleanup on regalloc operands, if not before.
cfallin created PR review comment:
s/shuffle/swizzle/ ?
elliottt created PR review comment:
I have -- my laptop has those extensions :D
elliottt submitted PR review.
elliottt submitted PR review.
elliottt created PR review comment:
Thanks for catching that!
elliottt created PR review comment:
Yep, we should have enough test coverage to catch problems with these two: there are precise-output tests for both rules.
elliottt submitted PR review.
elliottt updated PR #4772 from trevor/x64-shuffle
to main
.
elliottt updated PR #4772 from trevor/x64-shuffle
to main
.
elliottt has enabled auto merge for PR #4772.
elliottt merged PR #4772.
Last updated: Jan 24 2025 at 00:11 UTC