Stream: git-wasmtime

Topic: wasmtime / PR #4772 x64: Lower shuffle and swizzle in ISLE


view this post on Zulip Wasmtime GitHub notifications bot (Aug 24 2022 at 17:32):

elliottt opened PR #4772 from trevor/x64-shuffle to main:

<!--

Please ensure that the following steps are all taken care of before submitting
the PR.

Please ensure all communication adheres to the code of conduct.
-->

view this post on Zulip Wasmtime GitHub notifications bot (Aug 24 2022 at 17:38):

elliottt edited PR #4772 from trevor/x64-shuffle to main:

Lower shuffle and swizzle in ISLE.

This PR surfaced a bug with the lowering of shuffle when avx512vl and avx512vbmi are enabled: we use vpermi2b as the implementation, but panic if the immediate shuffle mask contains any out-of-bounds values. The behavior when the avx512 extensions are not present is that out-of-bounds values are turned into 0 in the result.

I plan to fix this immediately in a follow-up PR, and have added a TODO that identifies the problem in the x64 lower.isle.

<!--

Please ensure that the following steps are all taken care of before submitting
the PR.

Please ensure all communication adheres to the code of conduct.
-->

view this post on Zulip Wasmtime GitHub notifications bot (Aug 24 2022 at 17:38):

elliottt has marked PR #4772 as ready for review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 24 2022 at 18:39):

elliottt updated PR #4772 from trevor/x64-shuffle to main.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 24 2022 at 18:41):

elliottt edited PR #4772 from trevor/x64-shuffle to main:

Lower shuffle and swizzle in ISLE.

This PR surfaced a bug with the lowering of shuffle when avx512vl and avx512vbmi are enabled: we use vpermi2b as the implementation, but panic if the immediate shuffle mask contains any out-of-bounds values. The behavior when the avx512 extensions are not present is that out-of-bounds values are turned into 0 in the result.

I've resolved this by detecting when the shuffle immediate has out-of-bounds indices in the avx512-enabled lowering, and generating an additional mask to zero out the lanes where those indices occur.

<!--

Please ensure that the following steps are all taken care of before submitting
the PR.

Please ensure all communication adheres to the code of conduct.
-->

view this post on Zulip Wasmtime GitHub notifications bot (Aug 24 2022 at 18:44):

elliottt updated PR #4772 from trevor/x64-shuffle to main.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 24 2022 at 18:45):

elliottt submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 24 2022 at 18:45):

elliottt created PR review comment:

Here's the case where the mask wasn't necessary, thus no andps instruction was generated.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 24 2022 at 18:45):

elliottt submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 24 2022 at 18:45):

elliottt created PR review comment:

Here's a case where the mask was necessary, so the andps on line 38 is necessary.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 24 2022 at 18:46):

elliottt submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 24 2022 at 18:46):

elliottt created PR review comment:

This test already passed for the non-avx512 lowering, and this PR allows it to pass when avx512 is available as well.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 24 2022 at 18:46):

elliottt edited PR review comment.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 24 2022 at 19:13):

elliottt edited PR #4772 from trevor/x64-shuffle to main:

Lower shuffle and swizzle in ISLE.

This PR surfaced a bug with the lowering of shuffle when avx512vl and avx512vbmi are enabled: we use vpermi2b as the implementation, but panic if the immediate shuffle mask contains any out-of-bounds values. The behavior when the avx512 extensions are not present is that out-of-bounds values are turned into 0 in the result.

I've resolved this by detecting when the shuffle immediate has out-of-bounds indices in the avx512-enabled lowering, and generating an additional mask to zero out the lanes where those indices occur. This brings the avx512 case into line with the semantics of the shuffle op: https://github.com/bytecodealliance/wasmtime/blob/94bcbe844612306c136193f41f5cbdeec45a42ec/cranelift/codegen/meta/src/shared/instructions.rs#L1495-L1498

<!--

Please ensure that the following steps are all taken care of before submitting
the PR.

Please ensure all communication adheres to the code of conduct.
-->

view this post on Zulip Wasmtime GitHub notifications bot (Aug 24 2022 at 19:31):

elliottt submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 24 2022 at 19:31):

elliottt created PR review comment:

I'm a little unhappy about this, but we don't have an encoding for xmm instructions that have three arguments currently.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 24 2022 at 20:37):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 24 2022 at 20:37):

cfallin created PR review comment:

could we say what that behavior is and/or mention "CLIF-level shuffle" here? otherwise it's a bit unclear if one doesn't already have the context I think.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 24 2022 at 20:37):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 24 2022 at 20:37):

cfallin created PR review comment:

Excellent -- it sounds like you've tested on avx512 hardware?

(As an aside, it might be nice to think about ways to use qemu to get us AVX512 testing in CI, if the GitHub Actions host doesn't natively have it. @abrown any thoughts on that?)

view this post on Zulip Wasmtime GitHub notifications bot (Aug 24 2022 at 20:37):

cfallin created PR review comment:

Here we're relying on implicit firing-order heuristics (the above rule before this one, specifically perm_from_mask... extractor before mask variable binding); I think tests below should ensure this works properly but just wanted to call it out to be sure.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 24 2022 at 20:37):

cfallin created PR review comment:

Yeah, we can definitely add such a thing later, and should I think; we'll get to this as part of our "no more mod operands" cleanup on regalloc operands, if not before.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 24 2022 at 20:37):

cfallin created PR review comment:

s/shuffle/swizzle/ ?

view this post on Zulip Wasmtime GitHub notifications bot (Aug 24 2022 at 20:55):

elliottt created PR review comment:

I have -- my laptop has those extensions :D

view this post on Zulip Wasmtime GitHub notifications bot (Aug 24 2022 at 20:55):

elliottt submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 24 2022 at 20:56):

elliottt submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 24 2022 at 20:56):

elliottt created PR review comment:

Thanks for catching that!

view this post on Zulip Wasmtime GitHub notifications bot (Aug 24 2022 at 20:59):

elliottt created PR review comment:

Yep, we should have enough test coverage to catch problems with these two: there are precise-output tests for both rules.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 24 2022 at 20:59):

elliottt submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 24 2022 at 20:59):

elliottt updated PR #4772 from trevor/x64-shuffle to main.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 24 2022 at 21:12):

elliottt updated PR #4772 from trevor/x64-shuffle to main.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 24 2022 at 21:30):

elliottt has enabled auto merge for PR #4772.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 24 2022 at 21:49):

elliottt merged PR #4772.


Last updated: Jan 24 2025 at 00:11 UTC