afonso360 opened PR #6240 from afonso360:riscv-vector
to bytecodealliance:main
:
:wave: Hey,
This is the initial PR for implementing SIMD types with RISC-V Vectors, as described in #6118.
This PR implements only
iadd
,load
,store
for SIMD types, the minimum required to get aniadd
runtest working.We currently only support 128bit SIMD types, but It's really easy to change that to 64bit or 32bit or any other width SIMD types as long as they fit in a single register, so I'm also planning on doing that in a future PR.
I didn't include any calling conventions changes, so we have a non standard calling convention for now. The RISC-V Calling Convention passes everything via stack and does not use vector registers to pass arguments. We've inherited the Floating Point part of the calling convention, so we are passing arguments in vector registers. I didn't want to make the initial PR too large, so I decided to make that part of a future PR.
QEMU's register size is set to 256 bits to avoid accidentally depending on 128b vector registers in the future.
afonso360 requested wasmtime-compiler-reviewers for a review on PR #6240.
afonso360 requested abrown for a review on PR #6240.
afonso360 requested wasmtime-default-reviewers for a review on PR #6240.
afonso360 created PR review comment:
This ends up generating a lot of instructions, and is far from ideal. But It's also what we do in regular loads/stores, so I figured it would be best to tackle those together in a future PR.
afonso360 edited PR #6240:
:wave: Hey,
This is the initial PR for implementing SIMD types with RISC-V Vectors, as described in #6118.
This PR implements only
iadd
,load
,store
for SIMD types, the minimum required to get aniadd
runtest working.We currently only support 128bit SIMD types, but It's really easy to change that to 64bit or 32bit or any other width SIMD types as long as they fit in a single register, so I'm also planning on doing that in a future PR.
I didn't include any calling conventions changes, so we have a non standard calling convention for now. The RISC-V Calling Convention passes everything via stack and does not use vector registers to pass arguments. We've inherited the Floating Point part of the calling convention, so we are passing arguments in vector registers. I didn't want to make the initial PR too large, so I decided to make that part of a future PR.
QEMU's register size is set to 256 bits to avoid accidentally depending on 128b vector registers in the future.
None of the vector instructions are recognized by capstone so they show up as
.byte
directives. I've been manually checking these againsllvm-objdump
.
afonso360 edited PR #6240:
:wave: Hey,
This is the initial PR for implementing SIMD types with RISC-V Vectors, as described in #6118.
This PR implements only
iadd
,load
,store
for SIMD types, the minimum required to get aniadd
runtest working.We currently only support 128bit SIMD types, but It's really easy to change that to 64bit or 32bit or any other width SIMD types as long as they fit in a single register, so I'm also planning on doing that in a future PR.
I didn't include any calling conventions changes, so we have a non standard calling convention for now. The RISC-V Calling Convention passes everything via stack and does not use vector registers to pass arguments. We've inherited the Floating Point part of the calling convention, so we are passing arguments in vector registers. I didn't want to make the initial PR too large, so I decided to make that part of a future PR.
QEMU's register size is set to 256 bits to avoid accidentally depending on 128b vector registers in the future.
None of the vector instructions are recognized by capstone so they show up as
.byte
directives. I've been manually checking these againstllvm-objdump
.
afonso360 updated PR #6240.
afonso360 updated PR #6240.
afonso360 updated PR #6240.
alexcrichton submitted PR review:
Some drive-by thoughts from me, but this great to see! Thanks for working on this!
alexcrichton submitted PR review:
Some drive-by thoughts from me, but this great to see! Thanks for working on this!
alexcrichton created PR review comment:
This might be a good candidate for a
(convert ...)
if it's going to show up a lot perhaps?
alexcrichton created PR review comment:
Food-for-thought question on this -- this is likely going to be quite common (everything gated by
has_v
), so one option would be to do what the x64 backend does today where ifenable_simd
is turned on then the code generator refuses to run unlesshas_v
is enabled (or in x64's case it's SSE3, SSSE3, and SSE4.1). That being said I'm working to lowering x64's requirement to SSE2, the baseline for the architecture, at which point I was actually hoping on removing theenable_simd
option altogether.In any case though wanted to throw this out there if you thought that adding
has_v
everywhere was going to be onerous. I suppose though x64 also asserts, for each emitted instruction, that the required instruction sets are all enabled so that would probably need to be added to the risc-v backend to double-check that thehas_v
guards, while not present, still didn't lead to accidentally emitting a vector instruction.
alexcrichton created PR review comment:
The corresponding
VecElementWidth::from_type
in Rust is infallible, so should this perhaps not bepartial
? (or should the Rust version returnOption
?)Sort of like above if this isn't
partial
it seems like it might be a good candidate for(convert ...)
alexcrichton created PR review comment:
We're on capstone 0.9 right now and it looks like the latest is 0.11, want to see if updating it would enable disassembling these instructions? If so I think it's reasonable to try to get in a
cargo vet
entry or an exemption since it's purely used during tests.
afonso360 created PR review comment:
I tried to upgrade and rerun the tests, but It doesn't seem to have helped. I also looked in the docs for capstone to check if we needed to enable some additional feature, but I only found one for the Compressed extension. So it looks like It still doesn't support it which is really unfortunate.
alexcrichton created PR review comment:
Ok now I'm a bit curious. Feel free to disregard this comment in the context of this PR as it's of course fine to have what's here and there's no need to deal with this for this PR.
First question though: how up-to-date is capstone in the Rust bindings? According to this the latest commit there is to update to https://github.com/capstone-engine/capstone/commit/3b2984212fe which is almost a year-old at this point.
...
Ok nevermind answered my own question. I saw
include/riscv.h
in the Rust bindings but didn't see it on themain
branch of thecapstone
repository. As the commit link above shows, though, there's anext
branch which it looks like the Rust bindings are drawing from which does indeed haveinclude/riscv.h
. A quick scan of that header and I don't think there's V-related instructions, or at least I don't see a huge wad ofV
-prefixed instructions in that header, so it's likely that capstone itself doesn't have support for this yet.
afonso360 created PR review comment:
Well, it is partial since CLIF technically supports
i128x{2,4,8,..}
vectors and we can't represent those here. But that seems like a weird edge case so I'm not sure if it's ok to ignore it.
afonso360 edited PR review comment.
afonso360 edited PR review comment.
alexcrichton created PR review comment:
From my naive point of view I'd say that i128-lanes should probably be removed rather than explicitly supported. That's just coming from my (limited) knowledge of the x64 and aarch64 backends which I think would quickly fall over if they were fed i128-lane vectors
afonso360 created PR review comment:
Checking
has_v
everywhere is quite verbose. But I'm also in favor of deleting theenable_simd
setting. (Andenable_float
!)I was planning on adding a
ty_vec_fits_in_reg
extractor after this PR, to enable some more vector types (i8x{2,4,..}
), I think restricting these instructions there would make some sense? i.e. if you don't have V, no vector type fits in a register.Having that extension assert at emit time sounds like a good idea either way.
alexcrichton created PR review comment:
Sounds reasonable to me!
afonso360 edited PR review comment.
afonso360 updated PR #6240.
afonso360 updated PR #6240.
afonso360 created PR review comment:
Sort of like above if this isn't partial it seems like it might be a good candidate for (convert ...)
I added the
convert
rule for this, but on a second thought, almost no instruction uses this. Most instructions get their element width fromvstate
. Loads and Stores are special in that they encode it in the opcode, with the reasoning being that, that would avoid unnecessarily switching states.So, I'm not sure it's worth adding that. It also felt slightly weird passing
ty
intovec_load
twice without any further description.
jameysharp created PR review comment:
When turning on Capstone disassembly in precise-output filetests recently, we discussed that Capstone isn't particularly up to date on most targets. Its main advantage is that it supports a lot of targets. I believe consensus was that we'd be happy to switch disassemblers but nobody wants to spend time investigating alternatives. Also I think we don't know of any one disassembler that supports all our targets, although it wouldn't be awful to use a different disassembler for each target, either.
alexcrichton submitted PR review:
Out of curiosity, @afonso360 do you have a planned reviewer or set-of-reviewers for the SIMD support for RISC-V? If not I'm giving my stamp of approval here as this is well designed, comprehensively tested, and follows the design of https://github.com/bytecodealliance/wasmtime/issues/6118 as well (at least all in my own opinion). I'm happy of course though to defer to the more seasoned Cranelift developers as well too.
cfallin submitted PR review:
I'll add a second r+ for the
EmitState
bits to update implicit vector-machine state: they are indeed exactly how I expected them to be as well, and the infra is minimally intrusive. Thanks!
jameysharp submitted PR review:
Oh, thanks for reminding me this is ready for review, Alex!
I have no concerns about the (quite minimal!) changes to the target-agnostic parts of Cranelift, and I think the implementation of the riscv-specific parts looks great too. I do have one question about the latter, but Afonso, I'll take your word for it either way. So feel free to merge if you're happy with it!
jameysharp submitted PR review:
Oh, thanks for reminding me this is ready for review, Alex!
I have no concerns about the (quite minimal!) changes to the target-agnostic parts of Cranelift, and I think the implementation of the riscv-specific parts looks great too. I do have one question about the latter, but Afonso, I'll take your word for it either way. So feel free to merge if you're happy with it!
jameysharp created PR review comment:
I'm not too familiar with how islands work, so: Does the emission of this instruction need to happen after
let mut start_off = sink.cur_offset();
below?
afonso360 created PR review comment:
I think calling
emit
before that check is correct. Since we recurse intoInst::emit
for thevsetivli
, we pass through that check again before emitting the instruction, and then once that is emitted we check it again for the original instruction.
afonso360 edited PR review comment.
afonso360 merged PR #6240.
Last updated: Jan 24 2025 at 00:11 UTC