blog post on new backend · cranelift

(cc @Till Schneidereit ; I know I promised this all the way back in April or so; sorry for the delay :-) )

fitzgen (he/him) (Sep 18 2020 at 16:05):

Chris Fallin (Sep 18 2020 at 16:05):

I don't do twitter (hi it's me stuck in 2006! may change sometime) but please feel free to tweet it!

fitzgen (he/him) (Sep 19 2020 at 19:54):

@Chris Fallin The new backend is doing greedy search for isel right? No cost model and dynamic programming correct?

fitzgen (he/him) (Sep 19 2020 at 19:55):

@fitzgen Is there an explicit cost model for the instructions? If I am understanding correctly, this is doing greedy instruction selection (maximal munch), there's no dynamic programming involved.

- Varun Gandhi (@typesanitizer)

fitzgen (he/him) (Sep 19 2020 at 19:55):

Chris Fallin (Sep 19 2020 at 20:35):

Yes, exactly, the lowering for a given instruction and its operands is deterministic and maximal. We've had some discussions about heuristics to tune this, e.g. one rule that we tried (but quickly discarded) was to avoid merging operations with > 1 use. It turned out to be better at least for our patterns (on AArch64) to always be greedy, because the operations we're merging tend to be free (e.g. built-in register extends/shifts).

Chris Fallin (Sep 19 2020 at 20:36):

(it occurs to me that I should perhaps join tech-twitter someday so as not to rely on middlefolks such as yourself. Thanks for relaying the question!)

Julian Seward (Sep 19 2020 at 20:38):

That's true (re burg). IMO burg is/was useful for very complex insn sets, most particularly VAX, but in the modern era (arm64, riscv, and to a large extent the subset of x64 that executes fast), it's pretty much irrelevant.

Amanieu (Sep 19 2020 at 21:29):

Note that the AArch64 built-in shift/extend isn't actually free: it adds an extra cycle of latency to the instruction and reduces the max throughput from 3 to 1 since it needs to use the complex ALU execution unit.

Amanieu (Sep 19 2020 at 21:29):

Amanieu (Sep 19 2020 at 21:31):

So in theory if the result of the shift/extend is used by multiple instructions then you're better off leaving it as a separate instruction.

Chris Fallin (Sep 19 2020 at 21:48):

Chris Fallin (Sep 19 2020 at 21:51):

Thinking about this a bit more -- there are implications re: register pressure as well. If we can't spare a temp for the shifted/extended value (and the destination of the add or whatever is also a source, so we can't use it as a temp), then the builtin op saves us. Definitely something to consider more carefully if/when we have a more aggressively-optimized isel mode.

Amanieu (Sep 19 2020 at 21:52):

Another (more common) case is folding an extend into a load: you really don't want to have to duplicate the load! However I will admit that cases where a value is actually used multiple times are quite rare in practice, so it is possible that this won't matter much in real programs.

Amanieu (Sep 19 2020 at 21:56):

Could the register pressure issue be somewhat alleviated by adding rematerialization support to the register allocator?

Chris Fallin (Sep 19 2020 at 21:56):

Indeed. Re: loads, we consider those side-effecting now (both because of traps, and because we've punted on the memory-ordering question until we can think more carefully about it) so we won't ever duplicate. But it's certainly a matter to consider once we adapt e.g. x86 to use memory operands directly when it sees add-from-load.

Chris Fallin (Sep 19 2020 at 21:57):

Re-mat is something that could help in general, yup; another thing on the list that we haven't gotten to quiet yet...

Chris Fallin (Sep 19 2020 at 21:59):

(There are two levels, the first being direct-reload from a spillslot when an ISA supports mem operands; the second is the more general case where we track expressions of pure operators to some depth. Want to get to at least the former, and maybe a simple form of the latter)

bjorn3 (Sep 21 2020 at 12:01):

mental (Sep 21 2020 at 20:15):

I just finished reading the post (I noticed it floating around reddit) and I must say, it's very well done! I very much enjoyed reading it :D

Chris Fallin (Sep 21 2020 at 20:25):

Alyssa Rosenzweig (Oct 10 2020 at 02:05):

Alyssa Rosenzweig (Oct 10 2020 at 02:11):

I bumped into the "linked lists in Rust" problem trying to model backend IR, so I appreciate seeing there's a "better" way for that too. Still new to Rust but from what I've seen cranelift is a work of art. Nice to see it getting even better :heart_eyes:

Alyssa Rosenzweig (Oct 10 2020 at 02:15):

Use-counting is a nice touch as well... wouldn't mind dropping our own fixed-point backend DCE... (though for our stuff the real issue is RA/scheduling)

Chris Fallin (Oct 10 2020 at 05:09):

Alyssa Rosenzweig (Oct 10 2020 at 15:31):

Part of me was hoping to see one of those fancy polynomial-time SSA-based optimal RA algorithms c:

Alyssa Rosenzweig (Oct 10 2020 at 15:41):

Amanieu (Oct 10 2020 at 16:15):

@Alyssa Rosenzweig Actually the old cranelift backend does use SSA-based register allocation.

Alyssa Rosenzweig (Oct 10 2020 at 16:15):

Amanieu (Oct 10 2020 at 16:16):

bytecodealliance/wasmtime

Standalone JIT-style runtime for WebAssembly, using Cranelift - bytecodealliance/wasmtime

Amanieu (Oct 10 2020 at 16:18):

bytecodealliance/regalloc.rs

Modular register allocator algorithms. Contribute to bytecodealliance/regalloc.rs development by creating an account on GitHub.

Alyssa Rosenzweig (Oct 10 2020 at 16:18):

Alyssa Rosenzweig (Oct 10 2020 at 16:27):

There's also something to be said about register pressure / spills having more to do with scheduling than RA, for which the literature isn't quite there (at least on the GPU side). But I digress.

Alyssa Rosenzweig (Oct 10 2020 at 16:49):

@Amanieu --Alright, reading through the first link has successfully (re)convinced me to let other people do out-of-SSA and do RA after, thanks

Amanieu (Oct 10 2020 at 16:50):

Alyssa Rosenzweig (Oct 10 2020 at 16:54):

How many gotchas there are absolutely everywhere for real machines. Something I was prepared for, but underscored seeing a real production implementation.

Alyssa Rosenzweig (Oct 10 2020 at 16:55):

I have a regalloc algorithm implemented in a few hundred lines of C, with excellent support for aliasing and precolouring. It can't claim to be optimal in any sense, but the order-of-magnitude difference in complexity is... noted.

Alyssa Rosenzweig (Oct 10 2020 at 16:56):

(I admit that's an unfair comparison, since out-of-SSA is happening elsewhere with a sufficiently complicated algorithm, and the backend still drives that with its own support code. And liveness analysis is separate since it's used for DCE too. So maybe the gap isn't as aggressive as it looks.)

Alyssa Rosenzweig (Oct 10 2020 at 17:01):

[Maybe the real takeaway is that RA is complicated no matter how you try to hide the complexity... and the cranelift implementations are just more honest about that :sweat_smile: ]

Alyssa Rosenzweig (Oct 10 2020 at 17:08):

If nothing else - I appreciate seeing how production Rust compilers look like as I [redacted]

Stream: cranelift

Topic: blog post on new backend

Chris Fallin (Sep 18 2020 at 15:55):

Chris Fallin (Sep 18 2020 at 15:55):

fitzgen (he/him) (Sep 18 2020 at 16:05):

Chris Fallin (Sep 18 2020 at 16:05):

fitzgen (he/him) (Sep 19 2020 at 19:54):

fitzgen (he/him) (Sep 19 2020 at 19:55):

fitzgen (he/him) (Sep 19 2020 at 19:55):

Chris Fallin (Sep 19 2020 at 20:35):

Chris Fallin (Sep 19 2020 at 20:36):

Julian Seward (Sep 19 2020 at 20:38):

Amanieu (Sep 19 2020 at 21:29):

Amanieu (Sep 19 2020 at 21:29):

Amanieu (Sep 19 2020 at 21:31):

Chris Fallin (Sep 19 2020 at 21:48):

Chris Fallin (Sep 19 2020 at 21:51):

Amanieu (Sep 19 2020 at 21:52):

Amanieu (Sep 19 2020 at 21:56):

Chris Fallin (Sep 19 2020 at 21:56):

Chris Fallin (Sep 19 2020 at 21:57):

Chris Fallin (Sep 19 2020 at 21:59):

bjorn3 (Sep 21 2020 at 12:01):

mental (Sep 21 2020 at 20:15):

Chris Fallin (Sep 21 2020 at 20:25):

Alyssa Rosenzweig (Oct 10 2020 at 02:05):

Alyssa Rosenzweig (Oct 10 2020 at 02:11):

Alyssa Rosenzweig (Oct 10 2020 at 02:15):

Chris Fallin (Oct 10 2020 at 05:09):

Alyssa Rosenzweig (Oct 10 2020 at 15:31):

Alyssa Rosenzweig (Oct 10 2020 at 15:41):

Amanieu (Oct 10 2020 at 16:15):

Alyssa Rosenzweig (Oct 10 2020 at 16:15):

Amanieu (Oct 10 2020 at 16:16):

Amanieu (Oct 10 2020 at 16:18):

Alyssa Rosenzweig (Oct 10 2020 at 16:18):

Alyssa Rosenzweig (Oct 10 2020 at 16:27):

Alyssa Rosenzweig (Oct 10 2020 at 16:49):

Amanieu (Oct 10 2020 at 16:50):

Alyssa Rosenzweig (Oct 10 2020 at 16:54):

Alyssa Rosenzweig (Oct 10 2020 at 16:55):

Alyssa Rosenzweig (Oct 10 2020 at 16:56):

Alyssa Rosenzweig (Oct 10 2020 at 17:01):

Alyssa Rosenzweig (Oct 10 2020 at 17:08):