alexcrichton opened PR #5977 from `aarch64-shuffles` to `main`:
This is the equivalent of https://github.com/bytecodealliance/wasmtime/pull/5930 but for AArch64. I went through various instructions I saw for AArch64 and added corresponding `shuffle` lowerings where appropriate. These lowerings cover all the lowerings I found in the meshoptimizer repository plus a few more based on various instructions I found while perusing ARM's documentation. Like with x86_64 I've tried to make sure there's a runtest and a precise-output test for each lowering, even if some of them probably overlap with the x86_64 runtests.

I'll note that many of these lowerings probably won't end up getting used by "portable" wasm binaries since some of the shifts here are pretty specific to AArch64 and don't have efficient 1/2 instruction lowerings on x86_64. That being said these are useful to any sort of hypothetical Cranelift-as-an-AArch64-backend-compiler such as rustc_cranelift_codegen since this broadens the spectrum of instructions supported by Cranelift's AArch64 backend.
alexcrichton updated PR #5977 from `aarch64-shuffles` to `main`.
cfallin submitted PR review.
cfallin submitted PR review.
cfallin created PR review comment:
I wonder if it would make these patterns clearer to have an extractor something like `(shuffle_immediate 30 28 26 ...)` (with an external Rust impl that is `Fn(&mut self, imm: Immediate) -> Option<(u8, u8, u8, u8, ...)>`)?
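For illustration only, the per-lane idea suggested above could be sketched in plain Rust as follows. The name `shuffle_lanes` and the use of a byte slice in place of Cranelift's `Immediate` are assumptions made for this sketch; it is not the code in the PR, and the real extractor would be wired into ISLE rather than called directly.

```rust
/// Hypothetical helper (not part of Cranelift): view a 16-byte shuffle
/// immediate as individual lane selectors so a rule could match on
/// `30 28 26 ...` instead of on a single packed value.
///
/// Returns `None` when the input is not exactly 16 bytes long or when a
/// selector is out of range. A two-input byte shuffle can only name
/// bytes 0..=31 (16 from each input vector).
fn shuffle_lanes(imm: &[u8]) -> Option<[u8; 16]> {
    if imm.len() != 16 {
        return None;
    }
    let mut lanes = [0u8; 16];
    lanes.copy_from_slice(imm);
    if lanes.iter().all(|&lane| lane < 32) {
        Some(lanes)
    } else {
        None
    }
}
```

A rule built on such a helper would read the permutation lane by lane, at the cost of hiding the full mask behind a Rust call.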
cfallin created PR review comment:
Can we add doc comments here to describe what pattern in the `Immediate` each of these etors matches on? (Likewise below)
alexcrichton submitted PR review.
alexcrichton created PR review comment:
I originally did this in https://github.com/bytecodealliance/wasmtime/pull/5905 but @jameysharp preferred the hex masks instead. I don't mind myself, but I do think it's worth being consistent across the backends so I'd want to update all the x64 things if these aarch64 rules change as well.
jameysharp submitted PR review.
jameysharp created PR review comment:
Funny, I suggested exactly the opposite in a previous PR :laughing:
alexcrichton updated PR #5977 from `aarch64-shuffles` to `main`.
cfallin submitted PR review.
cfallin created PR review comment:
Interesting!
I see the points in #5905 now about exposing more opportunity to islec by making the full mask visible as one value; that's a reasonable argument I think. My rationale was that I was having some friction converting hex values in my head to understand the permutation (but maybe the right answer to that is just to think in hex directly). I don't feel too strongly about it, so this is fine as-is.
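As a reading aid for the hex-mask style discussed here, a small Rust sketch can convert between a packed 128-bit mask and its lane selectors. It assumes lane 0 is stored in the least significant byte (little-endian packing); that ordering and the function names are assumptions of this sketch, not something stated in the thread.

```rust
/// Hypothetical helper: unpack a 128-bit shuffle mask into lane selectors,
/// assuming lane 0 lives in the least significant byte.
fn mask_to_lanes(mask: u128) -> [u8; 16] {
    mask.to_le_bytes()
}

/// Hypothetical inverse: pack lane selectors back into a single mask value
/// using the same little-endian assumption.
fn lanes_to_mask(lanes: [u8; 16]) -> u128 {
    u128::from_le_bytes(lanes)
}
```

Under that assumption, the mask `0x1e1c1a18_16141210_0e0c0a08_06040200` unpacks to the lanes `0, 2, 4, ..., 30`, i.e. it selects every even-numbered byte.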
alexcrichton merged PR #5977.
Last updated: Jan 24 2025 at 00:11 UTC