afonso360 opened PR #6397 from afonso360:riscv-extract-splat to bytecodealliance:main:
:wave: Hey,
This PR implements both `extractlane` and `splat`.
`splat` is fairly simple: we have a group of move instructions that, by default, splat the source X or F register into the vector register. These are `vmv.v.x`, `vfmv.v.f` and `vmv.v.i`, for X, F and immediate sources respectively. The only noteworthy thing about these instructions is that they have unusual encodings, so I've added two new instruction formats to deal with them. `vmv.v.i` has no source operands, only a destination register. The other ones could maybe be encoded using the existing Imm5 instruction format, but it was becoming a bit weird keeping all of the variations together.
For `extractlane` we have two additional move instructions, `vfmv.f.s` and `vmv.x.s`, which move element 0 of the source vector into the appropriate X or F register. To extract other elements we use `vslidedown`, which moves all elements of a vector down by n positions, and then emit the appropriate move into the destination register.
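The lane semantics described above can be sketched as a small software model. This is only an illustration, not code from the PR: the function names are hypothetical, lanes are modelled as `u64` values, and lanes that slide in from past the end of the register are assumed to read as zero.

```rust
/// Models `vmv.v.x` / `vfmv.v.f` / `vmv.v.i`: every lane receives the scalar.
/// (Hypothetical helper; lanes are plain u64 values in this model.)
fn splat(value: u64, lanes: usize) -> Vec<u64> {
    vec![value; lanes]
}

/// Models `vslidedown`: result[i] = src[i + n].
/// Assumption: indices past the end of the source read as 0.
fn slide_down(src: &[u64], n: usize) -> Vec<u64> {
    (0..src.len())
        .map(|i| {
            i.checked_add(n)
                .and_then(|j| src.get(j))
                .copied()
                .unwrap_or(0)
        })
        .collect()
}

/// Models `vmv.x.s` / `vfmv.f.s`: read element 0 of the vector.
/// Panics on an empty slice, which has no hardware equivalent.
fn move_element_zero(src: &[u64]) -> u64 {
    src[0]
}

/// Lane 0 is a single move; any other lane is a slide down followed by
/// the same move from element 0.
fn extract_lane(src: &[u64], lane: usize) -> u64 {
    if lane == 0 {
        move_element_zero(src)
    } else {
        move_element_zero(&slide_down(src, lane))
    }
}
```

For example, `extract_lane(&[10, 20, 30, 40], 2)` slides the vector down by two positions and then reads element 0 of the result.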
afonso360 requested elliottt for a review on PR #6397.
afonso360 requested wasmtime-compiler-reviewers for a review on PR #6397.
afonso360 requested wasmtime-default-reviewers for a review on PR #6397.
alexcrichton submitted PR review:
Nice!
alexcrichton created PR review comment:
IIRC the aarch64 backend does some trickery along these lines where it iteratively halves the size of a constant if it's splatted, which may serve as good inspiration for supporting this.
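One way to picture the "iteratively halve a splatted constant" idea is the sketch below. It is not taken from the aarch64 backend; `shrink_splat` is a hypothetical name, and the choice of a 128-bit input with an 8-bit minimum lane width is an assumption made for illustration.

```rust
/// Repeatedly checks whether the upper and lower halves of the constant
/// are identical; if so, keeps only one half and tries again.
/// Returns the remaining pattern and its width in bits.
/// (Hypothetical helper; 128-bit input and 8-bit floor are assumptions.)
fn shrink_splat(constant: u128) -> (u128, u32) {
    let mut value = constant;
    let mut bits = 128u32;
    while bits > 8 {
        let half = bits / 2;
        let mask = (1u128 << half) - 1;
        let lo = value & mask;
        let hi = (value >> half) & mask;
        if lo != hi {
            // The halves differ, so the constant is not a splat at this width.
            break;
        }
        value = lo;
        bits = half;
    }
    (value, bits)
}
```

If the returned width is smaller than 128, the constant could in principle be materialized as a splat of the narrower pattern rather than loaded as a full vector constant.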
alexcrichton created PR review comment:
Out of curiosity, is there a particular motivation for having this helper here vs inlining it into the lowering of `extractlane`?
afonso360 created PR review comment:
Mostly to avoid having two rules for float and integer in the slide down cases. `vslidedown` is generic across integer and float elements, but `vmv` is not, so we recursively call this rule to decide the correct `vmv` instruction to use.
That being said, we can probably inline the `vslidedown` rules and have just a generic `vmv` that decides the correct instruction based on the type.
alexcrichton created PR review comment:
Oh sorry, to clarify: I mean that you've got 4 cases of `gen_extractlane` here, but why not have those 4 cases be cases on `lower (extractlane ..)`?
afonso360 created PR review comment:
If I inline it directly into `(lower (extractlane ..))` then it would become 6 rules, right? Since I would have to duplicate `vslidedown.vi+vmv`, `vslidedown.vi+vfmv`, `vslidedown.vx+vmv` and `vslidedown.vx+vfmv`, which are currently just 2 rules.
Or can I then recursively call the lower instructions directly?
afonso360 edited PR review comment.
alexcrichton created PR review comment:
Ah I apologize I missed that crucial bit of this being a recursive rule! In that case yeah definitely makes sense as a standalone decl.
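The recursive shape discussed in this thread can be sketched roughly as follows. The real rules are written in ISLE; this Rust model only mirrors the structure. The `LaneKind` enum, the string "instructions", and the cutoff of 32 for the immediate form (a 5-bit unsigned immediate) are assumptions for illustration, not details confirmed by the PR.

```rust
/// Whether the extracted lane goes to an X (integer) or F (float) register.
#[derive(Clone, Copy, Debug, PartialEq)]
enum LaneKind {
    Int,
    Float,
}

/// Four cases: two base cases for lane 0 (integer and float), plus two
/// slide cases that are shared by both kinds and recurse with lane 0.
fn gen_extractlane(kind: LaneKind, lane: u64, out: &mut Vec<String>) {
    match lane {
        0 => {
            let mv = match kind {
                LaneKind::Int => "vmv.x.s",
                LaneKind::Float => "vfmv.f.s",
            };
            out.push(mv.to_string());
        }
        // Assumption: indices below 32 fit the immediate form.
        n if n < 32 => {
            out.push(format!("vslidedown.vi {}", n));
            gen_extractlane(kind, 0, out);
        }
        // Otherwise the index is assumed to be passed in an X register.
        n => {
            out.push(format!("vslidedown.vx {}", n));
            gen_extractlane(kind, 0, out);
        }
    }
}

/// Convenience wrapper (hypothetical) that returns the emitted sequence.
fn lower_extractlane(kind: LaneKind, lane: u64) -> Vec<String> {
    let mut out = Vec::new();
    gen_extractlane(kind, lane, &mut out);
    out
}
```

Because the slide cases recurse instead of picking the final move themselves, they do not need to be duplicated per register class. This matches the count in the thread: 4 cases in the helper against 6 if everything were inlined.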
afonso360 edited PR review comment.
afonso360 updated PR #6397.
afonso360 has enabled auto merge for PR #6397.
afonso360 merged PR #6397.
Last updated: Nov 22 2024 at 17:03 UTC