Stream: cranelift

Topic: Adding a RISC-V32 32-bit Cranelift backend


view this post on Zulip Nihal Pasham (Nov 01 2024 at 10:29):

Hi! Just to preface this, I’m a complete compiler newbie.

I was wondering if there might be value in adding a RISC-V 32-bit (RISCV32) backend to Cranelift. Since a Riscv64Backend already exists, I’m assuming it could be adapted to generate RV32 binaries. However, I’d like to confirm whether a dedicated 32-bit backend might actually produce more efficient binaries.

If so, I’ve been exploring the source code, and here is my initial plan. I’d appreciate it if someone could take a look and let me know if this approach seems reasonable.


view this post on Zulip Nihal Pasham (Nov 03 2024 at 02:39):

Just double-checking to avoid potential overlaps or gotchas – does it make sense to add a 32-bit RISC-V backend to Cranelift?

view this post on Zulip Alex Crichton (Nov 03 2024 at 04:22):

I don't speak for the rest of cranelift folks by any means, but at least personally I'd love to see a 32-bit backend for Cranelift for risc-v. It's perhaps worth cautioning though that this is going to be a relatively significant undertaking since there is no preexisting 32-bit backend that's complete (there's a 32-bit "pulley" backend but it's not fleshed out yet).

Some of the things you'll have to grapple with are:

Those are some things off the top of my head but it's probably not a complete list. We've talked in Cranelift about features to make things like this easier in the past, such as better target-specific legalization support to lower, for example, 64-bit operations to 32-bit operations in the mid-end instead of the backend. That work never finished though and has large-ish remaining open questions. I say this as an example of open-ended design work that doesn't already have an answer in Cranelift and would probably want to be fleshed out along the way.

To be clear though I don't say this to dissuade you, I'd still at least personally love to see this! If you'd like to continue to pursue this though what I might recommend is to attend a Cranelift meeting (they happen weekly on Wednesdays) and we can chat more about it. For example we might want to figure out how to review your work to get it all landed as well (it's hard to get a whole backend in one go). The review part may be sort of hard since cranelift folks are stretched pretty thin right now though.

view this post on Zulip Nihal Pasham (Nov 03 2024 at 05:27):

Thank you for the detailed insights! I was thinking it might be simpler to start with a 32-bit backend for Cranelift that supports only up to 32-bit types (ints, floats, atomics) and excludes Rust’s higher bit types, such as u64, f64, and AtomicI64, as part of an initial implementation. Unsupported types could trigger a compiler error, making the backend’s limitations clear. I hope this approach is acceptable.
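As a rough illustration of the "reject anything wider than 32 bits" idea, a check along these lines could run before lowering. This is only a sketch: the names XLEN_BITS, LowerError, and check_type_width are hypothetical and do not correspond to existing Cranelift APIs.

```rust
// Hypothetical sketch: refuse any type wider than the target register width.
// None of these names exist in Cranelift; they are for illustration only.

/// Register width of the assumed RV32 target, in bits.
pub const XLEN_BITS: u32 = 32;

#[derive(Debug, PartialEq, Eq)]
pub enum LowerError {
    /// The input used a type wider than the backend supports.
    UnsupportedType { bits: u32 },
}

/// Accept types up to XLEN_BITS wide and report everything else as an error,
/// so the backend's limitation surfaces as a clear diagnostic.
pub fn check_type_width(bits: u32) -> Result<(), LowerError> {
    if bits <= XLEN_BITS {
        Ok(())
    } else {
        Err(LowerError::UnsupportedType { bits })
    }
}
```

With this sketch, an 8-, 16-, or 32-bit type passes, while a 64-bit type (for example the result of an iconst.i64) is reported as unsupported.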

I’d also be interested in joining the Cranelift meeting to discuss this further. Is there a way for me to add myself to the invite?

P.S. supporting types with higher bit precision at a higher level could simplify the introduction of additional backends, like Arm-Thumb2, without overcomplicating the backend itself.

view this post on Zulip Afonso Bordado (Nov 03 2024 at 11:22):

I'd also love to see a RISC-V 32-bit backend! I have some time to review PRs, but not as much as I once had.

If we can find a way to lower the 64-bit instructions to 32-bit instructions in the mid-end, I think we can share a lot of rules with the RISC-V 64 backend with minimal changes. But doing that looks like it's going to be hard. It would also solve the 128-bit operations problem.

I think we can reuse pretty much all of the instruction encodings that I recently started moving into a separate file. That file isn't complete, but it should help out.

Starting with just supporting 32bit ops in the backend seems like a good idea.


view this post on Zulip Nihal Pasham (Nov 03 2024 at 12:49):

Afonso Bordado said:

I think we can reuse pretty much all of the instruction encodings that I recently started moving into a separate file. That file isn't complete, but it should help out.

My current plan is to begin with the base instruction set, RV32I, and progressively add the M, F, A, and C extensions. Most of the encode.rs content should be reusable, as it already includes R, I, and S-type encodings. For RV32I, I guess I would only need to add the remaining types: B, U, and J.

My understanding of the instruction formats is as follows:

275CE856-BE94-4269-A490-55B94382A560.jpg

Quick question: I am assuming RV64I and RV32I share the same instruction formats and a very similar base instruction set architecture (ISA). Should encode.rs in riscv64 also include the B, U, and J formats when it’s complete?

Starting with just supporting 32bit ops in the backend seems like a good idea.

Thank you for clarifying. :folded_hands:

view this post on Zulip Afonso Bordado (Nov 03 2024 at 14:02):

Yeah, the instruction formats are the same, they just aren't present in encode.rs since that is a fairly recent addition, and I haven't had the time to migrate the rest of the instruction formats to that file.
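For readers following along, the B, U, and J formats mentioned above can be sketched in Rust roughly as follows. This is a minimal illustration based on the RISC-V base ISA bit layouts; the function names (encode_b_type, encode_u_type, encode_j_type) are hypothetical and are not taken from Cranelift's encode.rs.

```rust
// Illustrative encoders for the RISC-V B, U, and J instruction formats.
// Function names are hypothetical, not the helpers in Cranelift's encode.rs.

/// B-type (conditional branches). `offset` is a byte offset: a multiple of 2
/// in the range -4096..=4094.
pub fn encode_b_type(opcode: u32, funct3: u32, rs1: u32, rs2: u32, offset: i32) -> u32 {
    debug_assert!(offset % 2 == 0 && (-4096..=4094).contains(&offset));
    let imm = offset as u32;
    let imm12 = (imm >> 12) & 0x1; // goes to bit 31
    let imm10_5 = (imm >> 5) & 0x3f; // goes to bits 30:25
    let imm4_1 = (imm >> 1) & 0xf; // goes to bits 11:8
    let imm11 = (imm >> 11) & 0x1; // goes to bit 7
    (imm12 << 31)
        | (imm10_5 << 25)
        | ((rs2 & 0x1f) << 20)
        | ((rs1 & 0x1f) << 15)
        | ((funct3 & 0x7) << 12)
        | (imm4_1 << 8)
        | (imm11 << 7)
        | (opcode & 0x7f)
}

/// U-type (lui, auipc). `imm20` is the upper 20-bit immediate.
pub fn encode_u_type(opcode: u32, rd: u32, imm20: u32) -> u32 {
    ((imm20 & 0xf_ffff) << 12) | ((rd & 0x1f) << 7) | (opcode & 0x7f)
}

/// J-type (jal). `offset` is a byte offset: a multiple of 2
/// in the range -1048576..=1048574.
pub fn encode_j_type(opcode: u32, rd: u32, offset: i32) -> u32 {
    debug_assert!(offset % 2 == 0 && (-1_048_576..=1_048_574).contains(&offset));
    let imm = offset as u32;
    let imm20 = (imm >> 20) & 0x1; // goes to bit 31
    let imm10_1 = (imm >> 1) & 0x3ff; // goes to bits 30:21
    let imm11 = (imm >> 11) & 0x1; // goes to bit 20
    let imm19_12 = (imm >> 12) & 0xff; // goes to bits 19:12
    (imm20 << 31)
        | (imm10_1 << 21)
        | (imm11 << 20)
        | (imm19_12 << 12)
        | ((rd & 0x1f) << 7)
        | (opcode & 0x7f)
}
```

For instance, encode_u_type(0x37, 10, 0x12345) yields 0x12345537 (lui a0, 0x12345), and encode_b_type(0x63, 0, 10, 11, 8) yields 0x00b50463 (beq a0, a1, 8). Because the layouts are identical on RV32 and RV64, the same helpers would serve both.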

view this post on Zulip Alex Crichton (Nov 03 2024 at 14:38):

I was thinking it might be simpler to start with a 32-bit backend for Cranelift that supports only up to 32-bit types

Makes sense to me!

Is there a way for me to add myself to the invite?

There's a bit more info here, but the tl;dr is to DM Chris Fallin on Zulip.


view this post on Zulip Chris Fallin (Nov 03 2024 at 18:29):

Nick owns the calendar event now fwiw (fitzgen on zulip). And +1 to the above in general; have more thoughts I’ll try to write out later

view this post on Zulip Chris Fallin (Nov 03 2024 at 19:43):

To add a bit more: the tracking issue for this is #8768, and last time this came up I left a comment that also links four previous times I braindumped a bit on the general state of things for adding new backends


view this post on Zulip Chris Fallin (Nov 03 2024 at 19:44):

In this case in particular I think there is a strong argument for reusing almost all of the encoding machinery. I wonder even how close we could get to "64-bitness is a backend option", and share the backend altogether? Then essentially we disable all rules that assume 64-bit registers under that flag (and replace them with a lowering in midend as suggested above). Might need a little more parameterization around things like constants but if we can pull it off, that'd be more maintainable than the duplication implied by separate riscv32/riscv64 backends, IMHO

view this post on Zulip Afonso Bordado (Nov 03 2024 at 21:28):

I think that might be doable. I'm going to have a quick peek at our current rules to see which cases are incompatible, but I expect that as long as we don't ever see 128-bit or 64-bit ops it might not be too bad.

view this post on Zulip Nihal Pasham (Nov 07 2024 at 12:18):

Quick question—are MInst variants (defined in inst.isle) meant to strictly represent instruction formats for the target ISA, or is there more to it? I was reviewing the RV64 implementation and noticed several variants using the same instruction format along with some pseudo-instructions. For example:

    ;; I-type Layout:
    ;; 0-------6-7-------11-12------14-15------19-20------------------31
    ;; | Opcode |   rd     |  width   |   rs1    |     Offset[11:0]    |

    ;; The I-type Instruction Format i.e. uses one register source, one immediate and a destination register.
    (AluRRImm12
      (alu_op AluOPRRI)
      (rd WritableReg)
      (rs Reg)
      (imm12 Imm12))

    ;; Loads use the I-type Instruction Format.
    ;; Each load instruction in RV32I takes two operands
    ;; - A destination register (e.g., rd), where the data will be loaded.
    ;; - A base register (e.g., rs1) and an immediate offset, which together specify the memory address to load from.
    (Load
      (rd WritableReg)
      (op LoadOP)
      (flags MemFlags)
      (from AMode))

    ;; Uses the I-type Instruction Format. In non-immediate CSR instructions (CSRRW, CSRRS, CSRRC), rs1 is used to specify
    ;; the register with the value to write.
    (CsrReg
      (op CsrRegOP)
      (rd WritableReg)
      (rs Reg)
      (csr CSR))

    ;; Uses the I-type Instruction Format. In immediate CSR instructions (CSRRWI, CSRRSI, CSRRCI), rs1 is replaced by a 5-bit
    ;; immediate value.
    (CsrImm
      (op CsrImmOP)
      (rd WritableReg)
      (imm UImm5)
      (csr CSR))

In short, how should I read MInst? (This is just for my understanding.)

Note: I have added my comments to the above just to highlight my point.

view this post on Zulip Alex Crichton (Nov 07 2024 at 15:28):

MInst is a bit of a mish-mash, but one of its main guiding principles is how shapes affect register allocation. For example, AluRRImm12 has a destination and source register while CsrImm doesn't (I think?).

Overall though it's sort of what works best in the ISLE code, afaik there's not a hard-and-fast rule one way or another

view this post on Zulip Chris Fallin (Nov 07 2024 at 16:19):

yes, definitely that, and also emission. Think of the base case as "every inst is separate" and then we group together instructions that are all the same except for details we can plumb through, like opcode bits
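The grouping principle described here can be sketched with a toy model in Rust. Everything below is a simplified, hypothetical stand-in (the real MInst is defined in inst.isle with different types): each variant has one operand shape for register allocation and one emission routine, and the differing opcode bits are plumbed through an op field.

```rust
// Toy model of grouping instructions by operand shape. These types are
// hypothetical and far simpler than Cranelift's real MInst.

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum AluOpRRI { Addi, Xori, Ori, Andi }

impl AluOpRRI {
    /// funct3 is the only encoding detail that differs within this group.
    fn funct3(self) -> u32 {
        match self {
            AluOpRRI::Addi => 0b000,
            AluOpRRI::Xori => 0b100,
            AluOpRRI::Ori => 0b110,
            AluOpRRI::Andi => 0b111,
        }
    }
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum CsrImmOp { Csrrwi, Csrrsi, Csrrci }

impl CsrImmOp {
    fn funct3(self) -> u32 {
        match self {
            CsrImmOp::Csrrwi => 0b101,
            CsrImmOp::Csrrsi => 0b110,
            CsrImmOp::Csrrci => 0b111,
        }
    }
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Inst {
    /// Shape: one register def, one register use, one 12-bit immediate.
    AluRRImm12 { op: AluOpRRI, rd: u8, rs: u8, imm12: i16 },
    /// Shape: one register def, no register uses (the source is a 5-bit immediate).
    CsrImm { op: CsrImmOp, rd: u8, imm5: u8, csr: u16 },
}

impl Inst {
    /// What a register allocator cares about: (defs, uses).
    pub fn operands(&self) -> (Vec<u8>, Vec<u8>) {
        match *self {
            Inst::AluRRImm12 { rd, rs, .. } => (vec![rd], vec![rs]),
            Inst::CsrImm { rd, .. } => (vec![rd], Vec::new()),
        }
    }

    /// Emission: both variants use the I-type layout, but fill it differently.
    pub fn emit(&self) -> u32 {
        match *self {
            Inst::AluRRImm12 { op, rd, rs, imm12 } => {
                (((imm12 as u32) & 0xfff) << 20)
                    | (((rs as u32) & 0x1f) << 15)
                    | (op.funct3() << 12)
                    | (((rd as u32) & 0x1f) << 7)
                    | 0x13 // OP-IMM opcode
            }
            Inst::CsrImm { op, rd, imm5, csr } => {
                (((csr as u32) & 0xfff) << 20)
                    | (((imm5 as u32) & 0x1f) << 15)
                    | (op.funct3() << 12)
                    | (((rd as u32) & 0x1f) << 7)
                    | 0x73 // SYSTEM opcode
            }
        }
    }
}
```

In this sketch, addi and xori share a variant because they differ only in funct3, while CsrImm is separate even though it also uses the I-type layout, since it has no register source.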

view this post on Zulip Nihal Pasham (Nov 14 2024 at 09:04):

Quick question: The prelude_lower.isle file contains an internal extractor named has_type. I was trying to find its corresponding Rust implementation in the generated isle-riscv64.rs file in the build-out directory but couldn’t locate it. Am I missing something?

from prelude_lower.isle

;; Extract the type of the instruction's first result and pass along the
;; instruction as well.
(spec (has_type ty arg)
      (provide (= result arg))
      (require (= ty (widthof arg))))
(decl has_type (Type Inst) Inst)
(extractor (has_type ty inst)
           (and (result_type ty)
                inst))

view this post on Zulip Nihal Pasham (Nov 14 2024 at 11:07):

If I understand correctly, internal extractors don’t actually exist (in the sense that they don’t generate Rust code); they simply map to external extractors. In this case, has_type maps to:

(and (result_type ty)
                inst)

where result_type itself is an internal extractor that maps to first_result, which is the actual external extractor:

;; Extract the first result value of the given instruction.
(decl first_result (Value) Inst)
(extern extractor first_result first_result)

;; Extract the type of the instruction's first result.
(decl result_type (Type) Inst)
(extractor (result_type ty)
           (first_result (value_type ty)))

Am I right? I’m still not sure, however, whether "and" is a constructor or an ISLE keyword.

view this post on Zulip Nihal Pasham (Nov 14 2024 at 14:15):

For the "and" part, I believe it’s part of the ISLE language grammar, i.e., part of the syntax.


view this post on Zulip Nihal Pasham (Nov 14 2024 at 14:35):

Did I get this right or am I way off in my understanding?

view this post on Zulip Chris Fallin (Nov 14 2024 at 17:11):

Yep, that's all correct! Internal extractors are expanded (inlined) in place, and "and" is a language keyword. All of this results in calls to the value_type and first_result Rust function implementations.
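To make the inlining concrete, here is a hand-written Rust sketch of what matching (has_type ty inst) boils down to once has_type and result_type are expanded. It is not the generated isle-riscv64.rs code: Type, Value, Inst, and Ctx are hypothetical, simplified stand-ins, and the exact signatures of the real first_result and value_type implementations are assumed, not quoted.

```rust
// Hypothetical, simplified stand-ins for Cranelift's types and lowering
// context. Only the call structure mirrors the discussion above.

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Type { I32, I64 }

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Value(pub usize);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Inst(pub usize);

pub struct Ctx {
    /// results[i] lists the result values of instruction i.
    pub results: Vec<Vec<Value>>,
    /// value_types[v] is the type of value v.
    pub value_types: Vec<Type>,
}

impl Ctx {
    /// Stand-in for the external extractor first_result: fails (None) when
    /// the instruction has no results.
    pub fn first_result(&self, inst: Inst) -> Option<Value> {
        self.results.get(inst.0)?.first().copied()
    }

    /// Stand-in for the external extractor value_type.
    pub fn value_type(&self, value: Value) -> Type {
        self.value_types[value.0]
    }
}

/// The expanded form of `(has_type ty inst)`: there is no has_type or
/// result_type function, only calls to the two external extractors. The
/// `and` pattern simply binds `inst` alongside the extracted type.
pub fn match_has_type(ctx: &Ctx, inst: Inst) -> Option<(Type, Inst)> {
    let value = ctx.first_result(inst)?;
    let ty = ctx.value_type(value);
    Some((ty, inst))
}
```

In this model, an instruction with no results makes the whole pattern fail to match, which is why the function returns an Option.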

view this post on Zulip Nihal Pasham (Dec 07 2024 at 09:34):

Quick question: Does it make sense to introduce a new immediate type (e.g., Imm32) specifically for 32-bit architectures? From my understanding, it seems like we could just use Imm64. I wanted to double-check if I’m overlooking anything here.

For instance, could we use Imm64 in such cases without running into any issues? The following are some examples of what I have in mind:

u64_from_imm64 -> u32_from_imm64  // Extract a u32 from an Imm64
u64_uextend_imm64 -> u32_uextend_imm64  // Zero-extend an Imm64 to a u32

view this post on Zulip bjorn3 (Dec 07 2024 at 21:33):

Imm64 is mostly used for iconst, which should keep accepting 64bit immediates even on 32bit platforms. If you do iconst.i64 on a 32bit target it would just store both halves of the immediate in separate registers.
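The "both halves in separate registers" idea can be illustrated with a small Rust helper. This is only a sketch: split_imm64 is a hypothetical name, and returning the low half first is an assumption made for illustration, not a statement about how a Cranelift backend would order a register pair.

```rust
/// Split a 64-bit immediate into (low, high) 32-bit halves, as a 32-bit
/// target would need in order to hold an i64 constant in two registers.
/// Hypothetical helper; the (low, high) ordering is an assumption.
pub fn split_imm64(imm: u64) -> (u32, u32) {
    let lo = imm as u32; // truncation keeps bits 31:0
    let hi = (imm >> 32) as u32; // bits 63:32
    (lo, hi)
}
```

Each half is then an ordinary 32-bit constant, so it can be materialized the same way as any other 32-bit value.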

view this post on Zulip Nihal Pasham (Dec 10 2024 at 06:14):

Sorry for the delay; I got pulled into something else.

My plan is to start with RV32I (just the base instruction set), without support for types or values larger than 32 bits (i.e., no 64-bit or higher). For this target, I assume it’s acceptable to explicitly error out if we encounter an iconst.i64, correct?

As I understand it, lowering an Iconst Opcode results in emitting an 8-byte value into an in-memory constant pool. Would it make sense to add a 32-bit variant to VCodeConstantData (like a VCodeConstantData::U32), or is that unnecessary?

P.S. Compiler engineering is quite new to me, so please let me know if I’m overlooking something obvious.


view this post on Zulip Chris Fallin (Dec 10 2024 at 18:30):

Sure, a 32-bit constant-pool entry could make sense. It might be worth looking at how other compilers handle loading arbitrary 32-bit values too: for example, is it possible to do it with two immediate-form instructions (load high bits then OR in low bits or similar)? Usually RISC ISAs try to make this fast without going to dcache via a memory load and so have a "somewhat canonical" way of loading constants

view this post on Zulip Chris Fallin (Dec 10 2024 at 18:31):

(To explore that, it might be worthwhile writing some C functions like uint32_t foo() { return 0x12345678; } and compiling with a RISC-V 32 toolchain, or using Compiler Explorer (godbolt.org, add --target riscv32-unknown-linux-gnu to the Clang or rustc command line))


view this post on Zulip Afonso Bordado (Dec 10 2024 at 22:22):

You might want to look at the rules we have for the RV64 backend, the immediate loading instructions are exactly the same ones used in RV32.

This now has a lot of rules, but it essentially boils down to this: a combination of addi and/or lui can produce all values up to 32 bits, and for larger stuff we use a load from a constant pool unless we find a shorter pattern.
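The lui plus addi combination works because lui sets the upper 20 bits and addi adds a sign-extended 12-bit value. The sign extension means the upper part has to be adjusted whenever bit 11 of the constant is set. Below is a minimal Rust sketch of that split for an RV32 target; split_lui_addi and materialize are hypothetical names, not functions from the Cranelift RV64 backend.

```rust
/// Split a 32-bit constant into (hi20, lo12) such that, on RV32,
/// `lui rd, hi20` followed by `addi rd, rd, lo12` produces `value`.
/// Hypothetical helper for illustration.
pub fn split_lui_addi(value: u32) -> (u32, i32) {
    // addi sign-extends its 12-bit immediate, so sign-extend the low 12 bits.
    let lo12 = ((value << 20) as i32) >> 20;
    // Subtracting lo12 compensates in the upper part when lo12 is negative.
    let hi20 = value.wrapping_sub(lo12 as u32) >> 12;
    (hi20, lo12)
}

/// Model what the two instructions compute on a 32-bit register.
pub fn materialize(hi20: u32, lo12: i32) -> u32 {
    (hi20 << 12).wrapping_add(lo12 as u32)
}
```

For example, 0x12345678 splits into hi20 = 0x12345 and lo12 = 0x678, while 0x12345fff splits into hi20 = 0x12346 and lo12 = -1. When hi20 is zero a single addi suffices, and when lo12 is zero a single lui suffices.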


view this post on Zulip Nihal Pasham (Dec 11 2024 at 06:42):

Chris Fallin said:

(To explore that, it might be worthwhile writing some C functions like uint32_t foo() { return 0x12345678; } and compiling with a RISC-V 32 toolchain, or using Compiler Explorer (godbolt.org, add --target riscv32-unknown-linux-gnu to the Clang or rustc command line))

thanks for this. I'll try this.

view this post on Zulip Nihal Pasham (Dec 11 2024 at 07:56):

Afonso Bordado said:

You might want to look at the rules we have for the RV64 backend, the immediate loading instructions are exactly the same ones used in RV32.

This now has a lot of rules, but it essentially boils down to this: a combination of addi and/or lui can produce all values up to 32 bits, and for larger stuff we use a load from a constant pool unless we find a shorter pattern.

I am primarily reusing the RV64 backend, with the main difference being that I’m starting with support for RV32I (excluding all standard extensions). My focus is on creating a bare-minimum RV32 backend to avoid the complexity of handling the entire architecture upfront and to submit a small, manageable PR.

A working implementation can be found here. It reuses the addi and lui rules (both part of the RV32I set) from the RV64 implementation. However, when it came to loading from a constant pool, I used a workaround, which I have yet to fully test.

From your explanation, it looks like the RV64 approach to loading constants might be sufficient for RV32 as well.


Last updated: Dec 23 2024 at 12:05 UTC