afonso360 opened issue #6623:
Feature
We currently implement a bunch of rules to try to efficiently lower an `icmp` clif instruction into a single RISC-V vector instruction. We have a table with all of the cases where we know there is an equivalent instruction (see `gen_icmp_mask` in the RISC-V backend).

However, we are still missing one set of cases. When we have an `icmp`+`splat`+`iconst` we can usually emit a `*.vi` opcode. We currently do that only when the `icmp` and `splat` match exactly what the opcode does, but we can go further.

Take this example:
```
function %simd_icmp_sge_splat_const_rhs_i32(i32x4) -> i32x4 {
block0(v0: i32x4):
    v1 = iconst.i32 10
    v2 = splat.i32x4 v1
    v3 = icmp sge v0, v2
    return v3
}
```
While we don't have a `vmsge.vi` instruction, we can use the `vmsgt.vi` instruction to do essentially the same thing by decrementing the immediate by 1, since `x >= imm` is equivalent to `x > imm - 1`. There are 8 cases like this one that we can still improve in the backend. (See the godbolt link below.)
Benefit
This improves the `icmp` codegen by generating fewer instructions for a select number of cases.

Implementation
Here's a godbolt link of what LLVM does in each of the missing scenarios in `gen_icmp_mask`.

For some of them, such as `icmp_ule_lhs_splat`, it looks like we don't have any efficient implementation other than a separate splat and a `.vv` instruction. In these cases, we can leave holes in the table and we'll automatically generate that code.

However, the more interesting cases are `icmp_ule_lhs_splat_const` (and similar), where we generate the reversed IntCC with an immediate of `imm - 1`. That gives us some room to optimize these icmps, as sketched below.
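A sketch of that idea for one case, under the same caveats as above (a hypothetical helper, not the real ISLE rule): `icmp ule splat(imm), x` holds exactly when `x >=u imm`, and since there is no `vmsgeu.vi`, we can emit `vmsgtu.vi` with `imm - 1`.

```rust
// Sketch only: hypothetical helper for the `icmp_ule_lhs_splat_const`
// case. `splat(imm) <=u x` is `x >=u imm`, and with no `vmsgeu.vi`
// available we emit `vmsgtu.vi` with `imm - 1` instead.
fn lower_ule_lhs_splat_const(imm: i64) -> Option<(&'static str, i64)> {
    // `x >u imm - 1` only matches `x >=u imm` for `imm >= 1` (at
    // `imm == 0` the comparison is trivially true), and the adjusted
    // immediate still has to fit the 5-bit signed `.vi` field.
    if imm >= 1 && (-16..=15).contains(&(imm - 1)) {
        Some(("vmsgtu.vi", imm - 1))
    } else {
        None // leave a hole in the table; splat + `.vv` is emitted instead
    }
}

fn main() {
    assert_eq!(lower_ule_lhs_splat_const(10), Some(("vmsgtu.vi", 9)));
    assert_eq!(lower_ule_lhs_splat_const(0), None); // trivially-true case
}
```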
Alternatives

This is all optional, and these improvements only cover 8 cases (out of 50 possible), so they might be fairly rare.
afonso360 labeled issue #6623:
afonso360 labeled issue #6623: