afonso360 opened PR #6366 from afonso360:riscv-regclass-vec
to bytecodealliance:main
:
:wave: Hey,
This PR changes the RISC-V backend to use the new Vector Regclass when dealing with vectors. It also implements the calling convention described in this document. Essentially all vector arguments are passed via stack.
Additionally this also adds a test for the vconst pool, which was supposed to be merged in with #6324 but was failing due to some missing ABI stuff.
I've also had to limit the maximum usable SIMD type down to 1024bits, it looks like regalloc2 does not support spills larger than 2040 bytes and it's not really worth fixing it. 1024bit vectors ought to be enough for anybody.
afonso360 requested cfallin for a review on PR #6366.
afonso360 requested wasmtime-compiler-reviewers for a review on PR #6366.
afonso360 created PR review comment:
I'm not 100% sure about this.
v0
is a completely normal vector register, however it is also the only vector register that can be used as a mask with instructions. i.e. when instructions are masked they implicitly readv0
.Should I mark v0 as non preferable to avoid unnecessary moves, or will regalloc figure it out once we start adding
fixed_def
's?To be clear, masks can be computed in any vector register, they can only be used in
v0
.
afonso360 edited PR review comment.
afonso360 edited PR review comment.
afonso360 edited PR review comment.
afonso360 edited PR review comment.
afonso360 edited PR #6366:
:wave: Hey,
This PR changes the RISC-V backend to use the new Vector Regclass when dealing with vectors. It also implements the calling convention described in this document. Essentially all vector arguments are passed via stack and all registers are caller saved.
Additionally this also adds a test for the vconst pool, which was supposed to be merged in with #6324 but was failing due to some missing ABI stuff.
I've also had to limit the maximum usable SIMD type down to 1024bits, it looks like regalloc2 does not support spills larger than 2040 bytes and it's not really worth fixing it. 1024bit vectors ought to be enough for anybody.
cfallin submitted PR review:
Looks mostly good to me, thanks!
Just one possible improvement below but if you'd rather merge as-is and tackle it in a followup or leave as a TODO, I think that's fine too.
cfallin submitted PR review:
Looks mostly good to me, thanks!
Just one possible improvement below but if you'd rather merge as-is and tackle it in a followup or leave as a TODO, I think that's fine too.
cfallin created PR review comment:
This implementation looks correct, but FWIW the hope at least with the
PRegSet
definition was to allow for constants (they're small bitsets, u128 or now u256) to be returned by backends. I realize the existing backend also doesn't do this, so in the worst case we can keep the status quo. But this might even be a small but visible performance issue for function bodies with frequent calls. Would it be possible to define aconst fn
that computes aPRegSet
here (either from existing constants or with a straight-line "add reg ..." sequence)?
cfallin created PR review comment:
I think it's fine to have all vec-regs be equally preferred; if an instruction constrains an arg to require
v0
then RA2 will do the right thing and put other values in other registers as long as they're available (or split/spill otherwise). The preferred/non-preferred mechanism is mainly there to push allocations toward caller-saves (volatiles) and avoid prologue/epilogue instructions where possible.
afonso360 updated PR #6366.
afonso360 has enabled auto merge for PR #6366.
afonso360 merged PR #6366.
Last updated: Dec 23 2024 at 12:05 UTC