Stream: git-wasmtime

Topic: wasmtime / PR #6408 riscv64: Implement `insertlane`


view this post on Zulip Wasmtime GitHub notifications bot (May 18 2023 at 11:17):

afonso360 opened PR #6408 from afonso360:riscv-insertlane to bytecodealliance:main:

:wave: Hey,

This PR implements insertlane. The implementation is essentially the same as the splat instruction, but using the masked version of vmv where only one lane is enabled.

Almost all instructions in the V spec support a mask. The mask is determined by a single bit in the instruction encoding, and when enabled always sources v0.

The WASM replacelane tests exposed a preexisting bug in the encoding of vmv.s.x and vfmv.s.f, where their source operands are encoded in the vs1 field, instead of vs2. A fix for that is also included in this PR.

view this post on Zulip Wasmtime GitHub notifications bot (May 18 2023 at 11:17):

afonso360 requested elliottt for a review on PR #6408.

view this post on Zulip Wasmtime GitHub notifications bot (May 18 2023 at 11:17):

afonso360 requested wasmtime-compiler-reviewers for a review on PR #6408.

view this post on Zulip Wasmtime GitHub notifications bot (May 18 2023 at 11:17):

afonso360 requested wasmtime-default-reviewers for a review on PR #6408.

view this post on Zulip Wasmtime GitHub notifications bot (May 18 2023 at 11:18):

afonso360 edited PR #6408:

:wave: Hey,

This PR implements insertlane. The implementation is essentially the same as the splat instruction, but using the masked version of vmv where only one lane is enabled.

Almost all instructions in the V spec support a mask. The mask is determined by a single bit in the instruction encoding, and when enabled always sources register v0 as a mask. Masks are bitfields where each bit corresponds to one lane.

The WASM replacelane tests exposed a preexisting bug in the encoding of vmv.s.x and vfmv.s.f, where their source operands are encoded in the vs1 field, instead of vs2. A fix for that is also included in this PR.

view this post on Zulip Wasmtime GitHub notifications bot (May 18 2023 at 11:18):

afonso360 edited PR #6408:

:wave: Hey,

This PR implements insertlane. The implementation is essentially the same as the splat instruction, but using the masked version of vmv where only one lane is enabled.

Almost all instructions in the V spec support a mask. The mask is determined by a single bit in the instruction encoding, and when enabled always sources register v0 as a mask. Masks are represented as bitfields where each bit corresponds to one lane.

The WASM replacelane tests exposed a preexisting bug in the encoding of vmv.s.x and vfmv.s.f, where their source operands are encoded in the vs1 field, instead of vs2. A fix for that is also included in this PR.

view this post on Zulip Wasmtime GitHub notifications bot (May 18 2023 at 11:23):

afonso360 edited PR #6408:

:wave: Hey,

This PR implements insertlane. The implementation is essentially the same as the splat instruction, but using the masked version of vmv where only one lane is enabled. Masked vmv's are called vmerge, and they are only special in the fact that they don't follow the mask policy for masked out elements. Instead, masked out elements are sourced from one of the operands, instead of being ignored.

Almost all instructions in the V spec support a mask. The mask is determined by a single bit in the instruction encoding, and when enabled always sources register v0 as a mask. Masks are represented as bitfields where each bit corresponds to one lane.

The WASM replacelane tests exposed a preexisting bug in the encoding of vmv.s.x and vfmv.s.f, where their source operands are encoded in the vs1 field, instead of vs2. A fix for that is also included in this PR.

view this post on Zulip Wasmtime GitHub notifications bot (May 18 2023 at 14:19):

alexcrichton submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (May 18 2023 at 14:19):

alexcrichton submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (May 18 2023 at 14:19):

alexcrichton created PR review comment:

Drive-by comment on this, not something specific to this PR, but nowadays format!("{foo}") support in Rust can probably help simplify a lot of these format strings. This affects all backends, not just risc-v, but may be worth incrementally updating from now on if you're interested.

view this post on Zulip Wasmtime GitHub notifications bot (May 18 2023 at 14:19):

alexcrichton created PR review comment:

I don't know much about RISC-V vector instructions, but is this the most optimal this can be? For example are there instructions which move immediates into registers if the immediate is small?

I ask because it seems sort of out of place abstractly to have these sorts of operations when dealing with the immediates of insertlane/extractlane, but hey if it's the way it is then that's how it's gonna be.

One thing I thought though is that this might be good to delegate to a general vconst helper. Right now vconst always performs a load but it might be reasonable to fold a rule like this where if the immediate fits in an i64 it could do the x->v register movement.

view this post on Zulip Wasmtime GitHub notifications bot (May 18 2023 at 14:28):

afonso360 created PR review comment:

Yeah, sounds good, I'll start using that.

view this post on Zulip Wasmtime GitHub notifications bot (May 18 2023 at 15:59):

afonso360 created PR review comment:

I don't know much about RISC-V vector instructions, but is this the most optimal this can be? For example are there instructions which move immediates into registers if the immediate is small?

I'm not sure!

There are a few combos here that I can think of that could potentially be better than this in terms of materializing the mask. One of them like you mentioned is using something like the immediate splat instruction vmv.v.i, but that only covers a weird subset of masks. Where the mask result can fit in a signed imm5 per lane. So with that one we can do a mask like 0b001111 but not 0b011111, though we do get the benefit of it being just a single instruction.

I didn't want to try to pull off those sort of weird edge cases yet, but it might be easier than I think if we can look at the whole thing through the lens of vconst, I'm going to give that a try.

I ask because it seems sort of out of place abstractly to have these sorts of operations when dealing with the immediates of insertlane/extractlane, but hey if it's the way it is then that's how it's gonna be.

Do you mean having a separate move instruction into the X / F registers that has no immediate? We used to have instructions that matched exactly those semantics, but they were removed with the following reasoning:

NOTE: The complementary vins.v.x instruction, which allow a write to
any element in a vector register, has been removed. This instruction
would be the only instruction (apart from vsetvl) that requires two
integer source operands, and also would be slow to execute in an
implementation with vector register renaming, relegating its main use
to debugger modifications to state. The alternative and more
generally useful vslide1up and vslide1down instructions can be
used to update vector register state in place over a debug link
without accessing memory.

vslide1up and vslide1down aren't exactly applicable to insertlane and extractlane. And they have a special note in the standard stating that they are slow instructions, pretty much just reserved for debuggers.

One thing I thought though is that this might be good to delegate to a general vconst helper. Right now vconst always performs a load but it might be reasonable to fold a rule like this where if the immediate fits in an i64 it could do the x->v register movement.

Yeah, that seems like a good idea! I'm also going to try to merge the splat const rules into that.


On a broader note I checked what the other engines are doing here w.r.t insertlane:

V8 seems to be doing exactly what we do.

LLVM does a vslideup with a tail undisturbed policy, which is also an option and it does mean they can avoid materializing a mask since they can use vslideup.vi with the lane as part of the immediate. I have no idea if this is better or worse than the option above since it also emits 2 or 3 vsetvli's.

view this post on Zulip Wasmtime GitHub notifications bot (May 18 2023 at 16:45):

alexcrichton created PR review comment:

Interesting, and thanks for explaining all that! To be clear I don't think any changes are necessary for this PR, I was mostly just musing. With prior art in V8 this is definitely fine to land (even without that I think it's ok). I agree it's probably not worth it yet to have lots of special cases for various immediates here and there, better to start simple and build out. I was mostly curious if there was a pattern that was going to be implemented later that would simplify this or whether this would be the way forward for awhile.

view this post on Zulip Wasmtime GitHub notifications bot (May 20 2023 at 10:31):

afonso360 updated PR #6408.

view this post on Zulip Wasmtime GitHub notifications bot (May 20 2023 at 10:33):

afonso360 has enabled auto merge for PR #6408.

view this post on Zulip Wasmtime GitHub notifications bot (May 20 2023 at 10:34):

afonso360 created PR review comment:

I think I would prefer handling the merging all of the vconst stuff in a future PR. But yeah, making mask generation as part of that makes a ton of sense!

view this post on Zulip Wasmtime GitHub notifications bot (May 20 2023 at 11:46):

afonso360 merged PR #6408.


Last updated: Nov 22 2024 at 17:03 UTC