afonso360 opened PR #6408 from afonso360:riscv-insertlane
to bytecodealliance:main
:
:wave: Hey,
This PR implements
insertlane
. The implementation is essentially the same as thesplat
instruction, but using the masked version ofvmv
where only one lane is enabled.Almost all instructions in the V spec support a mask. The mask is determined by a single bit in the instruction encoding, and when enabled always sources
v0
.The WASM
replacelane
tests exposed a preexisting bug in the encoding ofvmv.s.x
andvfmv.s.f
, where their source operands are encoded in thevs1
field, instead ofvs2
. A fix for that is also included in this PR.
afonso360 requested elliottt for a review on PR #6408.
afonso360 requested wasmtime-compiler-reviewers for a review on PR #6408.
afonso360 requested wasmtime-default-reviewers for a review on PR #6408.
afonso360 edited PR #6408:
:wave: Hey,
This PR implements
insertlane
. The implementation is essentially the same as thesplat
instruction, but using the masked version ofvmv
where only one lane is enabled.Almost all instructions in the V spec support a mask. The mask is determined by a single bit in the instruction encoding, and when enabled always sources register
v0
as a mask. Masks are bitfields where each bit corresponds to one lane.The WASM
replacelane
tests exposed a preexisting bug in the encoding ofvmv.s.x
andvfmv.s.f
, where their source operands are encoded in thevs1
field, instead ofvs2
. A fix for that is also included in this PR.
afonso360 edited PR #6408:
:wave: Hey,
This PR implements
insertlane
. The implementation is essentially the same as thesplat
instruction, but using the masked version ofvmv
where only one lane is enabled.Almost all instructions in the V spec support a mask. The mask is determined by a single bit in the instruction encoding, and when enabled always sources register
v0
as a mask. Masks are represented as bitfields where each bit corresponds to one lane.The WASM
replacelane
tests exposed a preexisting bug in the encoding ofvmv.s.x
andvfmv.s.f
, where their source operands are encoded in thevs1
field, instead ofvs2
. A fix for that is also included in this PR.
afonso360 edited PR #6408:
:wave: Hey,
This PR implements
insertlane
. The implementation is essentially the same as thesplat
instruction, but using the masked version ofvmv
where only one lane is enabled. Maskedvmv
's are calledvmerge
, and they are only special in the fact that they don't follow the mask policy for masked out elements. Instead, masked out elements are sourced from one of the operands, instead of being ignored.Almost all instructions in the V spec support a mask. The mask is determined by a single bit in the instruction encoding, and when enabled always sources register
v0
as a mask. Masks are represented as bitfields where each bit corresponds to one lane.The WASM
replacelane
tests exposed a preexisting bug in the encoding ofvmv.s.x
andvfmv.s.f
, where their source operands are encoded in thevs1
field, instead ofvs2
. A fix for that is also included in this PR.
alexcrichton submitted PR review.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Drive-by comment on this, not something specific to this PR, but nowadays
format!("{foo}")
support in Rust can probably help simplify a lot of these format strings. This affects all backends, not just risc-v, but may be worth incrementally updating from now on if you're interested.
alexcrichton created PR review comment:
I don't know much about RISC-V vector instructions, but is this the most optimal this can be? For example are there instructions which move immediates into registers if the immediate is small?
I ask because it seems sort of out of place abstractly to have these sorts of operations when dealing with the immediates of insertlane/extractlane, but hey if it's the way it is then that's how it's gonna be.
One thing I thought though is that this might be good to delegate to a general
vconst
helper. Right nowvconst
always performs a load but it might be reasonable to fold a rule like this where if the immediate fits in an i64 it could do the x->v register movement.
afonso360 created PR review comment:
Yeah, sounds good, I'll start using that.
afonso360 created PR review comment:
I don't know much about RISC-V vector instructions, but is this the most optimal this can be? For example are there instructions which move immediates into registers if the immediate is small?
I'm not sure!
There are a few combos here that I can think of that could potentially be better than this in terms of materializing the mask. One of them like you mentioned is using something like the immediate splat instruction
vmv.v.i
, but that only covers a weird subset of masks. Where the mask result can fit in a signed imm5 per lane. So with that one we can do a mask like0b001111
but not0b011111
, though we do get the benefit of it being just a single instruction.I didn't want to try to pull off those sort of weird edge cases yet, but it might be easier than I think if we can look at the whole thing through the lens of
vconst
, I'm going to give that a try.I ask because it seems sort of out of place abstractly to have these sorts of operations when dealing with the immediates of insertlane/extractlane, but hey if it's the way it is then that's how it's gonna be.
Do you mean having a separate move instruction into the X / F registers that has no immediate? We used to have instructions that matched exactly those semantics, but they were removed with the following reasoning:
NOTE: The complementary
vins.v.x
instruction, which allow a write to
any element in a vector register, has been removed. This instruction
would be the only instruction (apart fromvsetvl
) that requires two
integer source operands, and also would be slow to execute in an
implementation with vector register renaming, relegating its main use
to debugger modifications to state. The alternative and more
generally usefulvslide1up
andvslide1down
instructions can be
used to update vector register state in place over a debug link
without accessing memory.
vslide1up
andvslide1down
aren't exactly applicable toinsertlane
andextractlane
. And they have a special note in the standard stating that they are slow instructions, pretty much just reserved for debuggers.One thing I thought though is that this might be good to delegate to a general vconst helper. Right now vconst always performs a load but it might be reasonable to fold a rule like this where if the immediate fits in an i64 it could do the x->v register movement.
Yeah, that seems like a good idea! I'm also going to try to merge the splat const rules into that.
On a broader note I checked what the other engines are doing here w.r.t
insertlane
:V8 seems to be doing exactly what we do.
LLVM does a
vslideup
with a tail undisturbed policy, which is also an option and it does mean they can avoid materializing a mask since they can usevslideup.vi
with the lane as part of the immediate. I have no idea if this is better or worse than the option above since it also emits 2 or 3vsetvli
's.
alexcrichton created PR review comment:
Interesting, and thanks for explaining all that! To be clear I don't think any changes are necessary for this PR, I was mostly just musing. With prior art in V8 this is definitely fine to land (even without that I think it's ok). I agree it's probably not worth it yet to have lots of special cases for various immediates here and there, better to start simple and build out. I was mostly curious if there was a pattern that was going to be implemented later that would simplify this or whether this would be the way forward for awhile.
afonso360 updated PR #6408.
afonso360 has enabled auto merge for PR #6408.
afonso360 created PR review comment:
I think I would prefer handling the merging all of the vconst stuff in a future PR. But yeah, making mask generation as part of that makes a ton of sense!
afonso360 merged PR #6408.
Last updated: Dec 23 2024 at 13:07 UTC