afonso360 edited issue #7188.
afonso360 added the cranelift:area:riscv64 label to Issue #7188.
afonso360 edited issue #7188:
:wave: Hey,
vslideup.vi is a very neat instruction that merges two vector registers by copying the bottom elements of one register into the top elements of another register.
We currently only use it when we cannot perform an operation in a single register due to the register not being large enough.
We should create a ty_vec_fits_twice_in_reg extractor that allows us to know that we can fit two vector values in a single register. That way we know we can place twice the elements in a single register.
Instructions that use this operation are:
{s,u,uu}narrow
These operations do a 2x vnclip and 1x vslideup. We could instead do the vslideup first, merging both values in the same register, and emit a single vnclip.
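To make the reordering concrete, here is a minimal sketch in plain Rust that models the lanes as slices. It is not Cranelift or ISLE code: fits_twice_in_reg is a hypothetical stand-in for the proposed ty_vec_fits_twice_in_reg check, clip models a signed saturating vnclip with a shift of 0, and slide_up models only the merging effect of vslideup. All names and parameters here are assumptions for illustration.

```rust
/// Hypothetical stand-in for the proposed `ty_vec_fits_twice_in_reg` check:
/// true when two values of `lanes * lane_bits` bits fit in one register of
/// `vlen_bits` bits.
fn fits_twice_in_reg(lane_bits: u32, lanes: u32, vlen_bits: u32) -> bool {
    2 * lane_bits * lanes <= vlen_bits
}

/// Signed saturating narrow of every lane (models `vnclip` with shift 0).
fn clip(v: &[i16]) -> Vec<i8> {
    v.iter()
        .map(|&x| x.clamp(i8::MIN as i16, i8::MAX as i16) as i8)
        .collect()
}

/// Places `hi` above `lo` in one wider vector (models the `vslideup` merge).
fn slide_up<T: Copy>(lo: &[T], hi: &[T]) -> Vec<T> {
    let mut out = lo.to_vec();
    out.extend_from_slice(hi);
    out
}

/// Current shape: clip each input separately, then merge the halves.
fn snarrow_two_clips(x: &[i16], y: &[i16]) -> Vec<i8> {
    slide_up(&clip(x), &clip(y))
}

/// Proposed shape: merge first, then clip once over the merged lanes.
fn snarrow_one_clip(x: &[i16], y: &[i16]) -> Vec<i8> {
    clip(&slide_up(x, y))
}
```

Because the clip is applied lane by lane, both orderings produce the same lanes. The merged variant only makes sense when both inputs fit in one register, for example fits_twice_in_reg(16, 8, 256) holds while fits_twice_in_reg(16, 8, 128) does not.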
iadd_pairwise
We don't have a dedicated iadd_pairwise instruction. Instead we shuffle the elements in a register and do a regular vadd.vv.
This works, but requires us to perform twice the instructions that we normally would have to.
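As a rough illustration of the element movement involved, the sketch below models pairwise addition on plain slices in Rust. It compares a straightforward reference with a merged variant that performs one add over both inputs. It is not Cranelift code: compress models only the packing behaviour of vcompress, the function names are hypothetical, and both inputs are assumed to have the same even number of lanes.

```rust
/// Models `vcompress`: packs the elements whose mask bit is set into the
/// low lanes of the result.
fn compress(v: &[i32], mask: &[bool]) -> Vec<i32> {
    let mut out = Vec::new();
    for (&e, &m) in v.iter().zip(mask) {
        if m {
            out.push(e);
        }
    }
    out
}

/// Reference semantics: pairwise sums of `x`, followed by pairwise sums of `y`.
fn iadd_pairwise_reference(x: &[i32], y: &[i32]) -> Vec<i32> {
    x.chunks(2)
        .chain(y.chunks(2))
        .map(|p| p[0].wrapping_add(p[1]))
        .collect()
}

/// Merged variant: one merge, two compresses, one add.
fn iadd_pairwise_merged(x: &[i32], y: &[i32]) -> Vec<i32> {
    // Merge both inputs into one wider vector (the `vslideup` step).
    let mut merged = x.to_vec();
    merged.extend_from_slice(y);

    // Even lanes form the left side of the addition, odd lanes the right.
    let even: Vec<bool> = (0..merged.len()).map(|i| i % 2 == 0).collect();
    let odd: Vec<bool> = even.iter().map(|&m| !m).collect();
    let lhs = compress(&merged, &even);
    let rhs = compress(&merged, &odd);

    // A single add over all lanes (the `vadd` step).
    lhs.iter().zip(&rhs).map(|(a, b)| a.wrapping_add(*b)).collect()
}
```

For x = [1, 2, 3, 4] and y = [10, 20, 30, 40], both functions return [3, 7, 30, 70].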
Ideally we would merge both registers in a single vslideup, use vcompress to extract each side of the addition, and emit a single vadd to sum both sides.
The V8 lowering for i32x4.dot_i16x8_s pulls a similar trick by using LMUL=2, which uses two registers. We can't use LMUL > 1 due to regalloc incompatibilities.
Alternatives
We don't actually need to do this. The RISC-V vector specification allows us to use LMUL > 1 to treat a register as having more space than it actually does.
The way this works is by combining multiple registers into a group and having each instruction operate on the whole group at once.
The reason we don't currently do this is that it adds a bunch of register allocation constraints that we can't describe to regalloc2.
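For a sense of what register grouping implies, here is a small Rust sketch of two rules from the RISC-V vector specification: VLMAX = LMUL * VLEN / SEW, and a register group must start at a register number that is a multiple of LMUL. The function names are hypothetical, only integer LMUL values (1, 2, 4, 8) are modelled, and treating the alignment rule as one of the constraints meant above is an assumption.

```rust
use std::ops::Range;

/// Elements per register group: VLMAX = LMUL * VLEN / SEW.
/// Assumes an integer `lmul` (1, 2, 4 or 8); fractional LMUL is not modelled.
fn vlmax(vlen_bits: u32, sew_bits: u32, lmul: u32) -> u32 {
    lmul * vlen_bits / sew_bits
}

/// Registers covered by a group starting at `base`, or `None` when `base`
/// is not a multiple of `lmul` or the group would run past v31.
fn register_group(base: u32, lmul: u32) -> Option<Range<u32>> {
    if base % lmul == 0 && base + lmul <= 32 {
        Some(base..base + lmul)
    } else {
        None
    }
}
```

With VLEN = 128 and SEW = 32, vlmax gives 4 elements at LMUL = 1 and 8 at LMUL = 2. register_group(2, 2) covers v2 and v3, while register_group(3, 2) is rejected, which is the kind of alignment requirement an allocator would have to honour.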
Last updated: Jan 24 2025 at 00:11 UTC