afonso360 opened PR #6609 from afonso360:riscv-simd-icmp to bytecodealliance:main:
These are implemented as a combination of two steps, mask generation and mask expansion. Our comparison rules only return their results as a mask register, so we need to expand the mask into lane-sized elements.
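To make the two steps concrete, here is a scalar Rust model of a single signed less-than comparison. It is a sketch under stated assumptions, not the PR's ISLE lowering: the helper names `vmslt_vv` and `expand_mask` are hypothetical, lanes are assumed to be `i32`, and the mask register is modelled as a `u64` holding one bit per lane.

```rust
/// Hypothetical scalar model of `vmslt.vv`: bit `i` of the returned mask is set
/// when lane `i` of `a` is (signed) less than lane `i` of `b`.
fn vmslt_vv(a: &[i32], b: &[i32]) -> u64 {
    assert_eq!(a.len(), b.len(), "operands must have the same lane count");
    assert!(a.len() <= 64, "this model keeps the whole mask in one u64");
    let mut mask = 0u64;
    for (i, (x, y)) in a.iter().zip(b).enumerate() {
        if x < y {
            mask |= 1u64 << i;
        }
    }
    mask
}

/// Hypothetical mask expansion: each mask bit becomes a full lane,
/// all ones (-1) when the bit is set and 0 otherwise.
fn expand_mask(mask: u64, lanes: usize) -> Vec<i32> {
    (0..lanes)
        .map(|i| if (mask >> i) & 1 == 1 { -1 } else { 0 })
        .collect()
}

fn main() {
    let a = [1, 5, -3, 7];
    let b = [2, 5, -4, 9];
    let mask = vmslt_vv(&a, &b); // step 1: mask generation
    let lanes = expand_mask(mask, a.len()); // step 2: mask expansion
    println!("mask = {mask:#06b}, lanes = {lanes:?}");
}
```

The comparison itself only yields one bit per lane; the all-ones/all-zeros lanes that `icmp` returns only appear after the separate expansion step.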
We have 20 (!) comparison instructions, covering nearly the full table of IntCC codes in the VV, VX and VI formats. However, there are some holes in this table.
They are:
- vmsltu.vi (Less than Unsigned (Vec-Imm))
- vmslt.vi (Less than (Vec-Imm))
- vmsgtu.vv (Greater than Unsigned (Vec-Vec))
- vmsgt.vv (Greater than (Vec-Vec))
- vmsgeu.* (Greater than or equal Unsigned (All formats))
- vmsge.* (Greater than or equal (All formats))
Most of these can be replaced with the inverted IntCC instruction. To minimize the size of this initial PR, I've only implemented rules for the opcodes that have a direct translation.
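The count of 20 and the list of holes can be checked by enumerating the table. The sketch below is a hypothetical Rust model, not code from the PR: `IntCC` is a local enum that only borrows the variant names of Cranelift's `IntCC`, and `is_native` simply encodes the list of missing encodings given above.

```rust
/// Local stand-in that borrows the variant names of Cranelift's `IntCC`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum IntCC {
    Equal,
    NotEqual,
    UnsignedLessThan,
    SignedLessThan,
    UnsignedLessThanOrEqual,
    SignedLessThanOrEqual,
    UnsignedGreaterThan,
    SignedGreaterThan,
    UnsignedGreaterThanOrEqual,
    SignedGreaterThanOrEqual,
}

/// Operand formats: vector-vector, vector-scalar register, vector-immediate.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Format {
    VV,
    VX,
    VI,
}

fn all_ccs() -> [IntCC; 10] {
    use IntCC::*;
    [
        Equal, NotEqual, UnsignedLessThan, SignedLessThan, UnsignedLessThanOrEqual,
        SignedLessThanOrEqual, UnsignedGreaterThan, SignedGreaterThan,
        UnsignedGreaterThanOrEqual, SignedGreaterThanOrEqual,
    ]
}

const ALL_FORMATS: [Format; 3] = [Format::VV, Format::VX, Format::VI];

/// RVV mnemonic for a condition/format pair, e.g. `vmsltu.vi`.
fn mnemonic(cc: IntCC, fmt: Format) -> String {
    use IntCC::*;
    let stem = match cc {
        Equal => "vmseq",
        NotEqual => "vmsne",
        UnsignedLessThan => "vmsltu",
        SignedLessThan => "vmslt",
        UnsignedLessThanOrEqual => "vmsleu",
        SignedLessThanOrEqual => "vmsle",
        UnsignedGreaterThan => "vmsgtu",
        SignedGreaterThan => "vmsgt",
        UnsignedGreaterThanOrEqual => "vmsgeu",
        SignedGreaterThanOrEqual => "vmsge",
    };
    let suffix = match fmt {
        Format::VV => "vv",
        Format::VX => "vx",
        Format::VI => "vi",
    };
    format!("{stem}.{suffix}")
}

/// Whether the vector extension encodes this comparison directly.
fn is_native(cc: IntCC, fmt: Format) -> bool {
    use Format::*;
    use IntCC::*;
    match (cc, fmt) {
        // No vmsltu.vi / vmslt.vi.
        (UnsignedLessThan | SignedLessThan, VI) => false,
        // No vmsgtu.vv / vmsgt.vv.
        (UnsignedGreaterThan | SignedGreaterThan, VV) => false,
        // No vmsgeu.* / vmsge.* in any format.
        (UnsignedGreaterThanOrEqual | SignedGreaterThanOrEqual, _) => false,
        _ => true,
    }
}

/// Mnemonics of every (condition, format) pair without a native encoding.
fn missing_forms() -> Vec<String> {
    let mut missing = Vec::new();
    for cc in all_ccs() {
        for fmt in ALL_FORMATS {
            if !is_native(cc, fmt) {
                missing.push(mnemonic(cc, fmt));
            }
        }
    }
    missing
}

fn native_count() -> usize {
    all_ccs().len() * ALL_FORMATS.len() - missing_forms().len()
}

fn main() {
    println!("{} native compare forms", native_count());
    println!("missing: {:?}", missing_forms());
}
```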
However, in order to get all IntCCs working, I've implemented some of the missing instructions: vmsgtu.vv, vmsgt.vv, vmsgeu.vv and vmsge.vv. These are implemented as aliases of their inverted counterparts.
I'm planning on adding a follow-up commit with the rest of the VX and VI rules for both the LHS and RHS sides. We should end up with 5 rules per IntCC once this is all done.
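The aliasing works because each missing `.vv` greater-than form is the native less-than form with its operands swapped: `a > b` holds exactly when `b < a`, and `a >= b` exactly when `b <= a`. A minimal scalar sketch, assuming hypothetical helper names and modelling the mask register as a `Vec<bool>` with one entry per lane:

```rust
// Hypothetical per-lane models of the native RVV compares.
fn vmslt_vv(a: &[i64], b: &[i64]) -> Vec<bool> {
    a.iter().zip(b).map(|(x, y)| x < y).collect()
}
fn vmsle_vv(a: &[i64], b: &[i64]) -> Vec<bool> {
    a.iter().zip(b).map(|(x, y)| x <= y).collect()
}
fn vmsltu_vv(a: &[u64], b: &[u64]) -> Vec<bool> {
    a.iter().zip(b).map(|(x, y)| x < y).collect()
}
fn vmsleu_vv(a: &[u64], b: &[u64]) -> Vec<bool> {
    a.iter().zip(b).map(|(x, y)| x <= y).collect()
}

// The missing `.vv` forms as aliases: the same native compare, operands swapped.
fn vmsgt_vv(a: &[i64], b: &[i64]) -> Vec<bool> {
    vmslt_vv(b, a) // a > b  <=>  b < a
}
fn vmsge_vv(a: &[i64], b: &[i64]) -> Vec<bool> {
    vmsle_vv(b, a) // a >= b  <=>  b <= a
}
fn vmsgtu_vv(a: &[u64], b: &[u64]) -> Vec<bool> {
    vmsltu_vv(b, a)
}
fn vmsgeu_vv(a: &[u64], b: &[u64]) -> Vec<bool> {
    vmsleu_vv(b, a)
}

fn main() {
    let a: [i64; 4] = [3, -1, 4, 0];
    let b: [i64; 4] = [2, -1, 5, -7];
    println!("vmsgt.vv  {:?}", vmsgt_vv(&a, &b));
    println!("vmsge.vv  {:?}", vmsge_vv(&a, &b));
    // Reinterpret the same bits as unsigned: negative lanes become large values.
    let ua: Vec<u64> = a.iter().map(|&x| x as u64).collect();
    let ub: Vec<u64> = b.iter().map(|&x| x as u64).collect();
    println!("vmsgtu.vv {:?}", vmsgtu_vv(&ua, &ub));
    println!("vmsgeu.vv {:?}", vmsgeu_vv(&ua, &ub));
}
```

Because the swap is free at instruction-selection time, no extra instructions are emitted compared to the native less-than and less-than-or-equal forms.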
I've split the actual mask expansion into its own separate rule, since we are going to need it for the fcmp rules as well.
The instruction selection for icmp is in a separate rule simply because the rules end up less verbose than if they were inlined directly into the icmp rule.
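One way to picture this split is the scalar Rust model below: comparison selection only produces a per-lane mask, and a single shared `expand_mask` helper turns that mask into lane-sized results for both `icmp` and a future `fcmp` lowering. All names here are hypothetical, this is not the PR's ISLE code, and only a few condition codes are modelled.

```rust
// Hypothetical model of the rule split; only a few condition codes are modelled.
#[derive(Clone, Copy, Debug)]
enum IntCC {
    Equal,
    SignedLessThan,
    SignedGreaterThan,
}

#[derive(Clone, Copy, Debug)]
enum FloatCC {
    Equal,
    LessThan,
}

/// Shared mask expansion: a set mask bit becomes an all-ones (-1) lane, a clear bit becomes 0.
fn expand_mask(mask: &[bool]) -> Vec<i32> {
    mask.iter().map(|&set| if set { -1 } else { 0 }).collect()
}

/// icmp instruction selection: produces only the mask.
fn icmp_mask(cc: IntCC, a: &[i32], b: &[i32]) -> Vec<bool> {
    a.iter()
        .zip(b)
        .map(|(x, y)| match cc {
            IntCC::Equal => x == y,
            IntCC::SignedLessThan => x < y,
            // Swapped-operand form of the less-than compare.
            IntCC::SignedGreaterThan => y < x,
        })
        .collect()
}

/// fcmp instruction selection: also produces only the mask.
fn fcmp_mask(cc: FloatCC, a: &[f32], b: &[f32]) -> Vec<bool> {
    a.iter()
        .zip(b)
        .map(|(x, y)| match cc {
            FloatCC::Equal => x == y,
            FloatCC::LessThan => x < y,
        })
        .collect()
}

/// Full lowerings: selection followed by the shared expansion step.
fn lower_icmp(cc: IntCC, a: &[i32], b: &[i32]) -> Vec<i32> {
    expand_mask(&icmp_mask(cc, a, b))
}

fn lower_fcmp(cc: FloatCC, a: &[f32], b: &[f32]) -> Vec<i32> {
    expand_mask(&fcmp_mask(cc, a, b))
}

fn main() {
    println!("{:?}", lower_icmp(IntCC::SignedGreaterThan, &[3, 1, 2], &[2, 2, 2]));
    println!("{:?}", lower_fcmp(FloatCC::LessThan, &[1.0, f32::NAN], &[2.0, 0.0]));
}
```

Keeping selection and expansion apart means each new condition code only touches the selection function, while the expansion logic is written once.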
afonso360 requested cfallin for a review on PR #6609.
afonso360 requested wasmtime-compiler-reviewers for a review on PR #6609.
afonso360 requested wasmtime-default-reviewers for a review on PR #6609.
afonso360 edited PR #6609:
:wave: Hey,
This PR implements SIMD icmp for RISC-V. These rules are implemented as a combination of two steps, mask generation and mask expansion. Our comparison rules only return their results as a mask register, so we need to expand the mask into lane-sized elements.
We have 20 (!) comparison instructions, covering nearly the full table of IntCC codes in the VV, VX and VI formats. However, there are some holes in this table.
They are:
- vmsltu.vi (Less than Unsigned (Vec-Imm))
- vmslt.vi (Less than (Vec-Imm))
- vmsgtu.vv (Greater than Unsigned (Vec-Vec))
- vmsgt.vv (Greater than (Vec-Vec))
- vmsgeu.* (Greater than or equal Unsigned (All formats))
- vmsge.* (Greater than or equal (All formats))
Most of these can be replaced with the inverted IntCC instruction. To minimize the size of this initial PR, I've only implemented rules for the opcodes that have a direct translation.
However, in order to get all IntCCs working, I've implemented some of the missing instructions: vmsgtu.vv, vmsgt.vv, vmsgeu.vv and vmsge.vv. These are implemented as aliases of their inverted counterparts.
I'm planning on adding a follow-up commit with the rest of the VX and VI rules for both the LHS and RHS sides. We should end up with 5 rules per IntCC once this is all done.
I've split the actual mask expansion into its own separate rule, since we are going to need it for the fcmp rules as well.
The instruction selection for icmp is in a separate rule simply because the rules end up less verbose than if they were inlined directly into the icmp rule.
afonso360 edited PR #6609:
:wave: Hey,
This PR implements SIMD icmp for RISC-V. These rules are implemented as a combination of two steps, mask generation and mask expansion. Our comparison rules only return their results as a mask register, so we need to expand the mask into lane-sized elements.
We have 20 (!) comparison instructions, covering nearly the full table of IntCC codes in the VV, VX and VI formats. However, there are some holes in this table.
They are:
- vmsltu.vi (Less than Unsigned (Vec-Imm))
- vmslt.vi (Less than (Vec-Imm))
- vmsgtu.vv (Greater than Unsigned (Vec-Vec))
- vmsgt.vv (Greater than (Vec-Vec))
- vmsgeu.* (Greater than or equal Unsigned (All formats))
- vmsge.* (Greater than or equal (All formats))
Most of these can be replaced with the inverted IntCC instruction. To minimize the size of this initial PR, I've only implemented rules for the opcodes that have a direct translation.
However, in order to get all IntCCs working, I've implemented some of the missing instructions: vmsgtu.vv, vmsgt.vv, vmsgeu.vv and vmsge.vv. These are implemented as aliases of their inverted counterparts (with the inputs swapped).
I'm planning on adding a follow-up commit with the rest of the VX and VI rules for both the LHS and RHS sides. We should end up with 5 rules per IntCC once this is all done.
I've split the actual mask expansion into its own separate rule, since we are going to need it for the fcmp rules as well.
The instruction selection for icmp is in a separate rule simply because the rules end up less verbose than if they were inlined directly into the icmp rule.
alexcrichton submitted PR review:
Nice! I like the organization of the lowerings as it's quite clear to see what's going on and scan over things quickly if necessary.
afonso360 merged PR #6609.
Last updated: Dec 23 2024 at 12:05 UTC