wasmtime / PR #6240 riscv64: Initial SIMD Vector Implemen... · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / PR #6240 riscv64: Initial SIMD Vector Implemen...

Wasmtime GitHub notifications bot (Apr 19 2023 at 12:12):

afonso360 opened PR #6240 from afonso360:riscv-vector to bytecodealliance:main:

:wave: Hey,

This is the initial PR for implementing SIMD types with RISC-V Vectors, as described in #6118.

This PR implements only iadd,load,store for SIMD types, the minimum required to get an iadd runtest working.

We currently only support 128bit SIMD types, but It's really easy to change that to 64bit or 32bit or any other width SIMD types as long as they fit in a single register, so I'm also planning on doing that in a future PR.

I didn't include any calling conventions changes, so we have a non standard calling convention for now. The RISC-V Calling Convention passes everything via stack and does not use vector registers to pass arguments. We've inherited the Floating Point part of the calling convention, so we are passing arguments in vector registers. I didn't want to make the initial PR too large, so I decided to make that part of a future PR.

QEMU's register size is set to 256 bits to avoid accidentally depending on 128b vector registers in the future.

Wasmtime GitHub notifications bot (Apr 19 2023 at 12:12):

afonso360 requested wasmtime-compiler-reviewers for a review on PR #6240.

Wasmtime GitHub notifications bot (Apr 19 2023 at 12:12):

afonso360 requested abrown for a review on PR #6240.

Wasmtime GitHub notifications bot (Apr 19 2023 at 12:12):

afonso360 requested wasmtime-default-reviewers for a review on PR #6240.

Wasmtime GitHub notifications bot (Apr 19 2023 at 12:13):

afonso360 created PR review comment:

This ends up generating a lot of instructions, and is far from ideal. But It's also what we do in regular loads/stores, so I figured it would be best to tackle those together in a future PR.

Wasmtime GitHub notifications bot (Apr 19 2023 at 12:15):

afonso360 edited PR #6240:

:wave: Hey,

This is the initial PR for implementing SIMD types with RISC-V Vectors, as described in #6118.

This PR implements only iadd,load,store for SIMD types, the minimum required to get an iadd runtest working.

We currently only support 128bit SIMD types, but It's really easy to change that to 64bit or 32bit or any other width SIMD types as long as they fit in a single register, so I'm also planning on doing that in a future PR.

I didn't include any calling conventions changes, so we have a non standard calling convention for now. The RISC-V Calling Convention passes everything via stack and does not use vector registers to pass arguments. We've inherited the Floating Point part of the calling convention, so we are passing arguments in vector registers. I didn't want to make the initial PR too large, so I decided to make that part of a future PR.

QEMU's register size is set to 256 bits to avoid accidentally depending on 128b vector registers in the future.

None of the vector instructions are recognized by capstone so they show up as .byte directives. I've been manually checking these agains llvm-objdump.

Wasmtime GitHub notifications bot (Apr 19 2023 at 12:15):

afonso360 edited PR #6240:

:wave: Hey,

This is the initial PR for implementing SIMD types with RISC-V Vectors, as described in #6118.

This PR implements only iadd,load,store for SIMD types, the minimum required to get an iadd runtest working.

We currently only support 128bit SIMD types, but It's really easy to change that to 64bit or 32bit or any other width SIMD types as long as they fit in a single register, so I'm also planning on doing that in a future PR.

I didn't include any calling conventions changes, so we have a non standard calling convention for now. The RISC-V Calling Convention passes everything via stack and does not use vector registers to pass arguments. We've inherited the Floating Point part of the calling convention, so we are passing arguments in vector registers. I didn't want to make the initial PR too large, so I decided to make that part of a future PR.

QEMU's register size is set to 256 bits to avoid accidentally depending on 128b vector registers in the future.

None of the vector instructions are recognized by capstone so they show up as .byte directives. I've been manually checking these against llvm-objdump.

Wasmtime GitHub notifications bot (Apr 19 2023 at 12:18):

afonso360 updated PR #6240.

Wasmtime GitHub notifications bot (Apr 19 2023 at 12:22):

afonso360 updated PR #6240.

Wasmtime GitHub notifications bot (Apr 19 2023 at 12:35):

afonso360 updated PR #6240.

Wasmtime GitHub notifications bot (Apr 19 2023 at 16:00):

alexcrichton submitted PR review:

Some drive-by thoughts from me, but this great to see! Thanks for working on this!

Wasmtime GitHub notifications bot (Apr 19 2023 at 16:00):

alexcrichton submitted PR review:

Some drive-by thoughts from me, but this great to see! Thanks for working on this!

Wasmtime GitHub notifications bot (Apr 19 2023 at 16:00):

alexcrichton created PR review comment:

This might be a good candidate for a (convert ...) if it's going to show up a lot perhaps?

Wasmtime GitHub notifications bot (Apr 19 2023 at 16:00):

alexcrichton created PR review comment:

Food-for-thought question on this -- this is likely going to be quite common (everything gated by has_v), so one option would be to do what the x64 backend does today where if enable_simd is turned on then the code generator refuses to run unless has_v is enabled (or in x64's case it's SSE3, SSSE3, and SSE4.1). That being said I'm working to lowering x64's requirement to SSE2, the baseline for the architecture, at which point I was actually hoping on removing the enable_simd option altogether.

In any case though wanted to throw this out there if you thought that adding has_v everywhere was going to be onerous. I suppose though x64 also asserts, for each emitted instruction, that the required instruction sets are all enabled so that would probably need to be added to the risc-v backend to double-check that the has_v guards, while not present, still didn't lead to accidentally emitting a vector instruction.

Wasmtime GitHub notifications bot (Apr 19 2023 at 16:00):

alexcrichton created PR review comment:

The corresponding VecElementWidth::from_type in Rust is infallible, so should this perhaps not be partial? (or should the Rust version return Option?)

Sort of like above if this isn't partial it seems like it might be a good candidate for (convert ...)

Wasmtime GitHub notifications bot (Apr 19 2023 at 16:00):

alexcrichton created PR review comment:

We're on capstone 0.9 right now and it looks like the latest is 0.11, want to see if updating it would enable disassembling these instructions? If so I think it's reasonable to try to get in a cargo vet entry or an exemption since it's purely used during tests.

Wasmtime GitHub notifications bot (Apr 19 2023 at 16:29):

afonso360 created PR review comment:

I tried to upgrade and rerun the tests, but It doesn't seem to have helped. I also looked in the docs for capstone to check if we needed to enable some additional feature, but I only found one for the Compressed extension. So it looks like It still doesn't support it which is really unfortunate.

Wasmtime GitHub notifications bot (Apr 19 2023 at 16:41):

alexcrichton created PR review comment:

Ok now I'm a bit curious. Feel free to disregard this comment in the context of this PR as it's of course fine to have what's here and there's no need to deal with this for this PR.

First question though: how up-to-date is capstone in the Rust bindings? According to this the latest commit there is to update to https://github.com/capstone-engine/capstone/commit/3b2984212fe which is almost a year-old at this point.

...

Ok nevermind answered my own question. I saw include/riscv.h in the Rust bindings but didn't see it on the main branch of the capstone repository. As the commit link above shows, though, there's a next branch which it looks like the Rust bindings are drawing from which does indeed have include/riscv.h. A quick scan of that header and I don't think there's V-related instructions, or at least I don't see a huge wad of V-prefixed instructions in that header, so it's likely that capstone itself doesn't have support for this yet.

Wasmtime GitHub notifications bot (Apr 19 2023 at 17:10):

afonso360 created PR review comment:

Well, it is partial since CLIF technically supports i128x{2,4,8,..} vectors and we can't represent those here. But that seems like a weird edge case so I'm not sure if it's ok to ignore it.

Wasmtime GitHub notifications bot (Apr 19 2023 at 17:15):

afonso360 edited PR review comment.

Wasmtime GitHub notifications bot (Apr 19 2023 at 17:16):

afonso360 edited PR review comment.

Wasmtime GitHub notifications bot (Apr 19 2023 at 17:25):

alexcrichton created PR review comment:

From my naive point of view I'd say that i128-lanes should probably be removed rather than explicitly supported. That's just coming from my (limited) knowledge of the x64 and aarch64 backends which I think would quickly fall over if they were fed i128-lane vectors

Wasmtime GitHub notifications bot (Apr 19 2023 at 17:35):

afonso360 created PR review comment:

Checking has_v everywhere is quite verbose. But I'm also in favor of deleting the enable_simd setting. (And enable_float!)

I was planning on adding a ty_vec_fits_in_reg extractor after this PR, to enable some more vector types (i8x{2,4,..}), I think restricting these instructions there would make some sense? i.e. if you don't have V, no vector type fits in a register.

Having that extension assert at emit time sounds like a good idea either way.

Wasmtime GitHub notifications bot (Apr 19 2023 at 17:38):

alexcrichton created PR review comment:

Sounds reasonable to me!

Wasmtime GitHub notifications bot (Apr 19 2023 at 17:38):

afonso360 edited PR review comment.

Wasmtime GitHub notifications bot (Apr 19 2023 at 18:20):

afonso360 updated PR #6240.

Wasmtime GitHub notifications bot (Apr 19 2023 at 18:24):

afonso360 updated PR #6240.

Wasmtime GitHub notifications bot (Apr 19 2023 at 18:27):

afonso360 created PR review comment:

Sort of like above if this isn't partial it seems like it might be a good candidate for (convert ...)

I added the convert rule for this, but on a second thought, almost no instruction uses this. Most instructions get their element width from vstate. Loads and Stores are special in that they encode it in the opcode, with the reasoning being that, that would avoid unnecessarily switching states.

So, I'm not sure it's worth adding that. It also felt slightly weird passing ty into vec_load twice without any further description.

Wasmtime GitHub notifications bot (Apr 19 2023 at 19:08):

jameysharp created PR review comment:

When turning on Capstone disassembly in precise-output filetests recently, we discussed that Capstone isn't particularly up to date on most targets. Its main advantage is that it supports a lot of targets. I believe consensus was that we'd be happy to switch disassemblers but nobody wants to spend time investigating alternatives. Also I think we don't know of any one disassembler that supports all our targets, although it wouldn't be awful to use a different disassembler for each target, either.

Wasmtime GitHub notifications bot (Apr 20 2023 at 14:21):

alexcrichton submitted PR review:

Out of curiosity, @afonso360 do you have a planned reviewer or set-of-reviewers for the SIMD support for RISC-V? If not I'm giving my stamp of approval here as this is well designed, comprehensively tested, and follows the design of https://github.com/bytecodealliance/wasmtime/issues/6118 as well (at least all in my own opinion). I'm happy of course though to defer to the more seasoned Cranelift developers as well too.

Wasmtime GitHub notifications bot (Apr 20 2023 at 18:27):

cfallin submitted PR review:

I'll add a second r+ for the EmitState bits to update implicit vector-machine state: they are indeed exactly how I expected them to be as well, and the infra is minimally intrusive. Thanks!

Wasmtime GitHub notifications bot (Apr 20 2023 at 18:28):

jameysharp submitted PR review:

Oh, thanks for reminding me this is ready for review, Alex!

I have no concerns about the (quite minimal!) changes to the target-agnostic parts of Cranelift, and I think the implementation of the riscv-specific parts looks great too. I do have one question about the latter, but Afonso, I'll take your word for it either way. So feel free to merge if you're happy with it!

Wasmtime GitHub notifications bot (Apr 20 2023 at 18:28):

jameysharp submitted PR review:

Oh, thanks for reminding me this is ready for review, Alex!

I have no concerns about the (quite minimal!) changes to the target-agnostic parts of Cranelift, and I think the implementation of the riscv-specific parts looks great too. I do have one question about the latter, but Afonso, I'll take your word for it either way. So feel free to merge if you're happy with it!

Wasmtime GitHub notifications bot (Apr 20 2023 at 18:28):

jameysharp created PR review comment:

I'm not too familiar with how islands work, so: Does the emission of this instruction need to happen after let mut start_off = sink.cur_offset(); below?

Wasmtime GitHub notifications bot (Apr 20 2023 at 21:53):

afonso360 created PR review comment:

I think calling emit before that check is correct. Since we recurse into Inst::emit for the vsetivli, we pass through that check again before emitting the instruction, and then once that is emitted we check it again for the original instruction.

Wasmtime GitHub notifications bot (Apr 20 2023 at 21:54):

afonso360 edited PR review comment.

Wasmtime GitHub notifications bot (Apr 20 2023 at 22:25):

afonso360 merged PR #6240.

Last updated: Apr 18 2025 at 10:03 UTC