Stream: cranelift

Topic: Cranelift architecture overview - documentation


view this post on Zulip Nihal Pasham (Jun 10 2024 at 09:41):

Hi,

I'm wondering if there are any updates on this - https://github.com/bytecodealliance/rfcs/blob/main/accepted/cranelift-roadmap-2023.md#cranelift-porting-how-to-write-a-backend.

Context: I am new to compiler engineering (i.e., I've never worked on compiler development before but have some experience with Rust). I looked at the Cranelift repository and feel a bit lost. I'm trying to understand how we go from a given .clif file (assuming it's a single function) to actual machine code. So far, my understanding is as follows:

CLIF --> ISLE-based Lowering to arch-specific VCode --> Legalization --> Some e-graph-based mid-end optimizations --> Register allocation via regalloc2 --> Final lowering to VCode --> Emitting machine instructions

Additional questions:


view this post on Zulip bjorn3 (Jun 10 2024 at 14:31):

First there are optimizations on CLIF IR, which include the e-graph-based optimizations; legalizations are performed somewhere in the middle of that optimization pipeline. Then comes lowering to arch-specific VCode, which does some peephole optimizations at the same time, then regalloc, and finally machine code emission.

view this post on Zulip Chris Fallin (Jun 10 2024 at 15:47):

hi @Nihal Pasham -- the pipeline you wrote out is a bit out of order; it goes something like:

CLIF -> legalization -> mid-end egraph rewrites if enabled (rules in ISLE) -> lowering to backend-specific VCode (rules in ISLE) -> regalloc -> binary emission

in general we should have better docs, though the module-level doc comments for certain pieces try to be fairly helpful at the low-to-mid level; I'd encourage reading through the code in machinst/ (and context.rs in the cranelift-codegen toplevel is where the compiler driver is, so you can trace from there too).
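
A hypothetical sketch of what that looks like from embedder code, in case a concrete starting point for tracing helps: parse a one-function .clif file with cranelift-reader, build a TargetIsa for the host, and let the Context in cranelift/codegen/src/context.rs drive the whole pipeline (legalization, egraph rewrites, lowering, regalloc, emission). The exact signatures of isa::lookup, finish, and Context::compile have shifted between Cranelift versions, so treat the calls below as approximate rather than authoritative.

```rust
// Assumed dependencies: cranelift-codegen, cranelift-reader, target-lexicon.
use cranelift_codegen::isa;
use cranelift_codegen::settings::{self, Configurable};
use cranelift_codegen::Context;
use cranelift_reader::parse_functions;
use target_lexicon::Triple;

fn main() {
    // A single-function .clif input, like the one the question assumes.
    let clif = r#"
        function %add(i32, i32) -> i32 {
        block0(v0: i32, v1: i32):
            v2 = iadd v0, v1
            return v2
        }
    "#;

    // Shared codegen flags (opt_level etc.), then a backend for the host triple.
    let mut flag_builder = settings::builder();
    flag_builder.set("opt_level", "speed").unwrap();
    let flags = settings::Flags::new(flag_builder);
    let isa = isa::lookup(Triple::host()).unwrap().finish(flags).unwrap();

    // cranelift-reader turns the text format into in-memory CLIF `Function`s.
    for func in parse_functions(clif).unwrap() {
        // `Context` is the per-function compiler driver: legalization, egraph
        // rewrites, lowering to VCode, regalloc, and binary emission all happen
        // behind this one `compile` call.
        let mut ctx = Context::for_function(func);
        let code = ctx.compile(&*isa, &mut Default::default()).unwrap();
        println!("emitted {} bytes of machine code", code.code_buffer().len());
    }
}
```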

The specific "how to write a backend" tutorial idea you link to unfortunately never happened because, well, there's always too much to do; your best bet at this point would be reading through the other backends probably (I'm partial to aarch64 as maybe the cleanest at the moment?).

Finally, there's no reason Cranelift couldn't target Thumb-2 or any other ISA like that as long as it's not too "unusual": machine code with registers and jumps between instructions, etc. (For an example of a little too unusual, we've talked about targeting GPUs before, and there the differences in regalloc and the control flow handling will take some thought; but Thumb-2 is "just" another CPU ISA in comparison)

view this post on Zulip Nihal Pasham (Jun 11 2024 at 09:06):

@bjorn3 and @Chris Fallin - thank you.

Chris Fallin said:

For an example of a little too unusual, we've talked about targeting GPUs before, and there the differences in regalloc and the control flow handling will take some thought

view this post on Zulip Chris Fallin (Jun 11 2024 at 15:32):

Yes, it implements LICM, DCE, GVN, as well as an alias analysis that feeds into redundant-load elimination, and finally (probably most importantly) it runs a bunch of rewrite rules in cranelift/codegen/src/opts/, including constant propagation/constant folding as well as a lot of algebraic identities.
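
To make "rewrite rules" concrete, here is a toy illustration of constant folding plus one algebraic identity, written as a plain Rust rewrite over a made-up expression type -- this is not Cranelift's IR and not its ISLE rules, just the shape of the idea:

```rust
// Toy expression type standing in for IR values; Cranelift's real rules are
// written in ISLE over actual CLIF instructions, this is only the idea.
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Const(i64),
    Add(Box<Expr>, Box<Expr>),
}

/// Two rewrites: constant folding, and the algebraic identity `x + 0 -> x`.
fn simplify(e: Expr) -> Expr {
    match e {
        Expr::Add(a, b) => match (simplify(*a), simplify(*b)) {
            // Constant folding: both operands known, so compute at compile time.
            (Expr::Const(x), Expr::Const(y)) => Expr::Const(x.wrapping_add(y)),
            // Algebraic identity: adding zero is a no-op.
            (x, Expr::Const(0)) | (Expr::Const(0), x) => x,
            // Otherwise rebuild the node with the simplified operands.
            (x, y) => Expr::Add(Box::new(x), Box::new(y)),
        },
        other => other,
    }
}

fn main() {
    // (2 + 3) + 0  ==>  5
    let e = Expr::Add(
        Box::new(Expr::Add(Box::new(Expr::Const(2)), Box::new(Expr::Const(3)))),
        Box::new(Expr::Const(0)),
    );
    assert_eq!(simplify(e), Expr::Const(5));
}
```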

You can find more detail on how the mid-end works in a talk I gave last year (video, slides), though we should probably write a paper or something eventually to describe all the inner workings. And again we try to leave reasonable comments that you can read (in this case in egraph.rs and egraph/elaborate.rs).

Nihal Pasham said:

Got it. Does the same apply to more specialized hardware such as TPUs, NPUs, and DSPs, which come with specialized instruction sets designed specifically for neural network (tensor) operations or signal processing? Would these architectures be more or less amenable to Cranelift's current design?

DSPs are probably not too bad -- they're "normal" except for the VLIW instruction scheduling that most require, which we could handle in a single pass during instruction emission in the worst case -- but TPUs/NPUs are not really general-purpose processors (more like matrix-multiplication accelerators) and so are not suitable for running the general-purpose code that Cranelift compiles.

view this post on Zulip Jacob Lifshay (Jun 12 2024 at 04:24):

AMD's AIE NPU looks much more like a CPU with SIMD to me: it is VLIW with scalar and vector units, and afaict it is Turing-complete (within the memory limits, of course). https://docs.amd.com/r/en-US/am020-versal-aie-ml/AIE-ML-Architecture

view this post on Zulip Jacob Lifshay (Jun 12 2024 at 04:25):

so cranelift could probably target it, but you'd want to vectorize most of your code before feeding it in, or you'd end up with a small fraction of the performance due to just using scalar instructions.

view this post on Zulip bjorn3 (Jun 12 2024 at 07:14):

VLIW means that the compiler has to do instruction scheduling, which Cranelift doesn't currently do.

view this post on Zulip Kirp (Jun 12 2024 at 20:35):

Chris Fallin said:

Yes, it implements LICM, DCE, GVN, as well as an alias analysis that feeds into redundant-load elimination, and finally (probably most importantly) it runs a bunch of rewrite rules in cranelift/codegen/src/opts/, including constant propagation/constant folding as well as a lot of algebraic identities.

I’m probably wrong but don’t DCE and GVN (or equivalents) actually come ‘free’ with the transform to aegraph?

view this post on Zulip Chris Fallin (Jun 12 2024 at 21:27):

Yes, exactly, hence the mid-end implements them :-)
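
For anyone following along, the reason they come "free": the aegraph hash-conses nodes, so a structurally identical value collapses to the same id when it is inserted (that is GVN), and the later elaboration back to a CFG only walks nodes reachable from the live roots, so anything unused is simply never emitted (that is DCE). A deliberately tiny Rust sketch of those two effects -- not Cranelift's actual aegraph types:

```rust
use std::collections::HashMap;

// Toy value nodes; operands refer to earlier nodes by index.
#[derive(Clone, PartialEq, Eq, Hash, Debug)]
enum Node {
    Const(i64),
    Add(usize, usize),
}

#[derive(Default)]
struct Graph {
    nodes: Vec<Node>,
    dedup: HashMap<Node, usize>, // hash-consing table
}

impl Graph {
    /// Insert a node; if an identical node already exists, return its id (GVN).
    fn insert(&mut self, node: Node) -> usize {
        if let Some(&id) = self.dedup.get(&node) {
            return id;
        }
        let id = self.nodes.len();
        self.nodes.push(node.clone());
        self.dedup.insert(node, id);
        id
    }
}

/// Mark everything reachable from `id`; elaboration only ever visits these.
fn mark_live(g: &Graph, id: usize, live: &mut [bool]) {
    if live[id] {
        return;
    }
    live[id] = true;
    if let Node::Add(a, b) = &g.nodes[id] {
        mark_live(g, *a, live);
        mark_live(g, *b, live);
    }
}

fn main() {
    let mut g = Graph::default();
    let one = g.insert(Node::Const(1));
    let two = g.insert(Node::Const(2));
    let x = g.insert(Node::Add(one, two));
    let y = g.insert(Node::Add(one, two)); // same value, same id: GVN for free
    assert_eq!(x, y);

    let _unused = g.insert(Node::Const(99)); // never used by any root

    // Only nodes reachable from the root get "elaborated"; the unused node is
    // never emitted, which is DCE without a separate pass.
    let mut live = vec![false; g.nodes.len()];
    mark_live(&g, x, &mut live);
    println!(
        "{} of {} nodes are live",
        live.iter().filter(|&&l| l).count(),
        g.nodes.len()
    );
}
```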

view this post on Zulip Chris Fallin (Jun 12 2024 at 21:28):

(LICM falls out of the way we do code placement during aegraph->CFG elaboration too; it requires intentional choices, but it is also not a separate pass.)

