Cranelift regalloc in Google Summer of Code via Rust project · cranelift

For anyone interested here, a heads-up: via the Rust project's participation in Google Summer of Code, there is a project idea described here that would involve building a fast (very simple, non-backtracking, single-pass) register allocator behind the regalloc2 API to plug into Cranelift for a fast build mode. @Amanieu and I both volunteered to mentor anyone who wants to take on the project; see the Rust project's blog post for the process to apply / get involved.

GitHub - rust-lang/google-summer-of-code: Rust project ideas for Google Summer of Code

Rust participates in Google Summer of Code 2024 | Rust Blog

long_long_float (Feb 26 2024 at 01:46):

Hi, I'm interested in this project. To understand this project, are there any issues that I can work on at the beginning? Thank you.

Chris Fallin (Feb 26 2024 at 02:55):

@long_long_float we unfortunately don't really have any starter issues for this; in theory working on the existing (more complex, backtracking/liverange-splitting) allocator would give you a lot of background but it would also likely be a very painful ramp-up when the goal is to replace it with something simpler :-) It might be best to study the public API of the regalloc2 crate, as a starting point, along with various papers on regalloc (e.g. the classical linear scan paper, though at least I'm envisioning something even a little simpler still, without a pre-analysis to find liveranges but rather a single forward pass)
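As a rough illustration of the "single forward pass, no liverange pre-analysis" idea above, here is a minimal sketch over a straight-line toy program. Everything in it (`Inst`, `Edit`, `Output`, `allocate`) is hypothetical and is not the regalloc2 API. It handles no control flow, and because it has no liveness information it never frees a register until it is forced to evict something.

```rust
use std::collections::HashMap;

/// Hypothetical toy instruction: at most one def plus some uses (vreg numbers).
pub struct Inst {
    pub def: Option<u32>,
    pub uses: Vec<u32>,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Loc {
    Reg(u8),
    Slot(u32),
}

/// Moves the allocator asks the caller to insert before an instruction.
#[derive(Debug, PartialEq, Eq)]
pub enum Edit {
    Spill { vreg: u32, from: u8, to: u32 },
    Reload { vreg: u32, from: u32, to: u8 },
}

pub struct Output {
    /// (instruction index, edit to insert before it), in program order.
    pub edits: Vec<(usize, Edit)>,
    /// Per instruction: registers for the uses, then the register for the def.
    pub allocs: Vec<(Vec<u8>, Option<u8>)>,
}

struct State {
    in_reg: Vec<Option<u32>>, // physical register -> vreg currently held
    loc: HashMap<u32, Loc>,   // vreg -> current location
    next_slot: u32,
    edits: Vec<(usize, Edit)>,
}

impl State {
    /// Puts `vreg` in a register, evicting an unpinned occupant if none is free.
    fn take_reg(&mut self, at: usize, vreg: u32, pinned: &[u8]) -> u8 {
        let r = match self.in_reg.iter().position(|v| v.is_none()) {
            Some(free) => free as u8,
            None => {
                let r = (0..self.in_reg.len() as u8)
                    .find(|r| !pinned.contains(r))
                    .expect("more live operands than registers");
                let victim = self.in_reg[r as usize].take().unwrap();
                let slot = self.next_slot; // fresh slot per spill: simple, not compact
                self.next_slot += 1;
                self.loc.insert(victim, Loc::Slot(slot));
                self.edits.push((at, Edit::Spill { vreg: victim, from: r, to: slot }));
                r
            }
        };
        self.in_reg[r as usize] = Some(vreg);
        self.loc.insert(vreg, Loc::Reg(r));
        r
    }
}

/// Single forward pass: no liveness analysis, no backtracking.
pub fn allocate(insts: &[Inst], num_regs: u8) -> Output {
    let mut st = State {
        in_reg: vec![None; num_regs as usize],
        loc: HashMap::new(),
        next_slot: 0,
        edits: Vec::new(),
    };
    let mut allocs = Vec::new();
    for (i, inst) in insts.iter().enumerate() {
        // Pin registers that already hold this instruction's uses so that a
        // reload for one operand cannot evict another operand.
        let mut pinned: Vec<u8> = inst
            .uses
            .iter()
            .filter_map(|v| match st.loc.get(v) {
                Some(Loc::Reg(r)) => Some(*r),
                _ => None,
            })
            .collect();
        let mut use_regs = Vec::new();
        for &v in &inst.uses {
            let r = match st.loc.get(&v).copied() {
                Some(Loc::Reg(r)) => r,
                Some(Loc::Slot(s)) => {
                    let r = st.take_reg(i, v, &pinned);
                    st.edits.push((i, Edit::Reload { vreg: v, from: s, to: r }));
                    pinned.push(r);
                    r
                }
                None => panic!("v{v} used before its definition"),
            };
            use_regs.push(r);
        }
        // The def gets its own register; uses stay pinned while it is chosen.
        let def_reg = inst.def.map(|v| st.take_reg(i, v, &pinned));
        allocs.push((use_regs, def_reg));
    }
    Output { edits: st.edits, allocs }
}
```

The sketch needs at least one more register than the largest number of operands on any instruction; a real allocator would also have to handle block boundaries, fixed-register constraints, and register classes.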

long_long_float (Feb 26 2024 at 03:13):

long_long_float (Mar 05 2024 at 16:11):

I have a question about the VReg in VCode. Can it represent float or SIMD registers as well as integer ones?
I'm planning to implement the first allocator for integers only, and I want to know whether it would be easy to extend the allocator to floats.

Chris Fallin (Mar 05 2024 at 16:36):

@long_long_float yes it can; VCode must represent any code that Cranelift can generate. Two things though: (i) it’s totally fine to start during development with an allocator that just panics in cases it doesn’t handle yet; but also (ii) you can think of the regalloc problem as mostly N separate problems for N register classes, as they don’t really interact.

Chris Fallin (Mar 05 2024 at 16:48):

(That's not 100% true; spillslot allocation needs to be shared, but that can be as simple as "all classes ask the same allocator for space", and in fact I wouldn't recommend trying to share spillslots between classes)
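One way to picture "N separate problems plus one shared spillslot allocator" is the sketch below. The names (`Class`, `SpillSlots`, `ClassPool`, `Placement`) and the slot sizes are assumptions made for illustration, not regalloc2 types. Each class keeps its own free-register list, every class asks the same `SpillSlots` for stack space, and no slot is ever shared between classes.

```rust
/// Hypothetical register classes; slot sizes below are illustrative only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Class {
    Int,
    Float,
    Vector,
}

impl Class {
    fn slot_bytes(self) -> u32 {
        match self {
            Class::Int | Class::Float => 8,
            Class::Vector => 16,
        }
    }
}

/// The one frame-wide allocator that every class asks for space.
#[derive(Default)]
pub struct SpillSlots {
    next_offset: u32,
}

impl SpillSlots {
    /// Hands out a fresh, size-aligned slot; slots are never reused or shared.
    pub fn alloc(&mut self, class: Class) -> u32 {
        let size = class.slot_bytes();
        let offset = (self.next_offset + size - 1) / size * size;
        self.next_offset = offset + size;
        offset
    }

    pub fn frame_bytes(&self) -> u32 {
        self.next_offset
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum Placement {
    Reg(Class, u8),
    Stack(u32), // byte offset within the spill area
}

/// Per-class state: an independent pool of free physical registers.
pub struct ClassPool {
    class: Class,
    free: Vec<u8>,
}

impl ClassPool {
    pub fn new(class: Class, num_regs: u8) -> Self {
        // Reverse so that `pop` hands out register 0 first.
        ClassPool { class, free: (0..num_regs).rev().collect() }
    }

    /// Takes a register from this class's pool, or falls back to the shared
    /// spillslot allocator when the pool is empty.
    pub fn assign(&mut self, slots: &mut SpillSlots) -> Placement {
        match self.free.pop() {
            Some(r) => Placement::Reg(self.class, r),
            None => Placement::Stack(slots.alloc(self.class)),
        }
    }
}
```

The only cross-class coupling here is the `&mut SpillSlots` argument, which is what makes it reasonable to develop and test one class at a time.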

long_long_float (Mar 06 2024 at 08:59):

Thank you! As I understand it, a spillslot is a memory region that holds spilled registers. Is that right?
And does "all classes ask the same allocator for space" mean that the allocator prepares spillslots for each class?

Chris Fallin (Mar 06 2024 at 14:09):

long_long_float (Mar 10 2024 at 10:27):

Hi, I have a question. We decided to use SSRA with Reverse Linear Scan Allocation. My understanding is that this allocation requires the instructions to be in SSA form. However, VCode doesn't seem to be in SSA form. Is it possible to make the output value of a VCode instruction a single definition?

Chris Fallin (Mar 10 2024 at 16:19):

@long_long_float VCode is actually SSA; a single instruction may have multiple defs, but any given register only has one def. (Various people might have different definitions of SSA, but the salient property for us at least is one-def-per-register.)

Chris Fallin (Mar 10 2024 at 16:20):

I don't think that should be problematic for an algorithm that wants to see defs/uses though -- you can iterate over the defs in the instruction when you scan it (as if there were multiple instructions)?
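The one-def-per-register property is easy to check even when an instruction has several defs: walk the defs of each instruction as if they were separate instructions. The sketch below uses a hypothetical `Inst` type (not VCode itself) and records where each vreg is defined, reporting the first vreg that is defined twice.

```rust
use std::collections::HashMap;

/// Hypothetical toy instruction that may define several vregs at once.
pub struct Inst {
    pub defs: Vec<u32>,
    pub uses: Vec<u32>,
}

/// Maps each vreg to the index of its defining instruction.
/// Returns `Err(vreg)` for the first vreg found with a second definition.
pub fn def_sites(insts: &[Inst]) -> Result<HashMap<u32, usize>, u32> {
    let mut sites = HashMap::new();
    for (i, inst) in insts.iter().enumerate() {
        // Treat every def of a multi-def instruction as its own definition.
        for &d in &inst.defs {
            if sites.insert(d, i).is_some() {
                return Err(d);
            }
        }
    }
    Ok(sites)
}
```

An instruction with `defs: vec![0, 1]` passes this check as long as neither vreg is defined anywhere else, which is the sense of SSA described above.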

Squaaawk (Mar 10 2024 at 20:24):

In the past couple of days I've thrown together a simple implementation of SSRA with support for branching across multiple basic blocks. Right now, the project is directly substituted in place of regalloc2's current allocator, and supports just enough features for evaluation of a toy expression language (w/ conditionals) and the sample long script provided in Matt's blog, recreating the register pressure graph at the bottom.

Now I'm focusing on testing and guaranteeing correctness of the allocator. The ion_checker fuzzer appears to pass everything, even when I intentionally introduce known bugs. It does, however, catch known bugs I introduce into regalloc2. I'm clearly not understanding the tool here. Can you give me an overview of what guarantees the fuzzer should provide, and if there is any existing tooling around for testing and eventually benchmarking the project?

Chris Fallin (Mar 10 2024 at 20:32):

The ion_checker fuzz target is checking that the dataflow of the allocation result is equivalent to the dataflow of the pre-allocation program with virtual registers. Are you seeing oracle validation even when there is an allocation result you know to be incorrect (manually verified)? I'd check the basics too just to be sure, e.g. that the fuzz target is running your new allocator rather than ion.

The next testing + benchmarking step beyond fuzzing is use in the main (only public?) embedding of regalloc2, namely Cranelift. If you have a branch of RA2 that uses your allocator by default instead, and it is correct, dropping it in (by editing the cargo dep in a wasmtime tree to a local path, for example) should give you a wasmtime that can compile and run Wasms. We generally use Sightglass to benchmark that, but for compilation speed, any large wasm file will do.

(The final state of things will be that we release regalloc2 with the new allocator merged in and available as an option in RegallocOptions, and then add a Cranelift flag to set it, so this manual setup won't be necessary in the end.)

Chris Fallin (Mar 10 2024 at 20:33):

oh, and of course the cg_clif benchmarking side -- the other major use of Cranelift aside from Wasmtime -- @bjorn3 can say more about what is idiomatic to do that, I think

bjorn3 (Mar 11 2024 at 10:54):

For cg_clif, you can use ./y.sh bench in the cg_clif repo as a quick benchmark. For anything serious, though, you will need to use https://github.com/rust-lang/rustc-perf/ to run the benchmark suite of rustc itself. Make sure to skip the opt benchmarks, as those don't make much sense for cg_clif, and you probably want to skip the check benchmarks too, as they should have identical performance regardless of how fast the codegen backend is.

GitHub - rust-lang/rustc-perf: Website for graphing performance of rustc

Squaaawk (Mar 14 2024 at 21:07):

I have a number of todo!() statements that were never triggered in a successful overnight of fuzz testing. How feature complete is the ion_checker fuzzer expected to be — should I expect it to do a "pretty good job" of testing pretty much every case for my allocator, or is there more work to do here? Additionally, is an overnight of fuzzing reasonable, or would it take significantly longer to stumble across every case?

Chris Fallin (Mar 14 2024 at 22:02):

@Squaaawk ion_checker is meant to exercise nearly every feature of RA2. I have found fuzzbugs after several days of fuzzing on 32 cores, especially after things have already settled down (easy fuzzbugs fixed). Can you say what your todo!()'d cases are more specifically?

Squaaawk (Mar 14 2024 at 22:53):

Alright, it sounds like for the most part just more time fuzzing is needed. I'm not too worried about tracking down individual missed cases at the moment, as the code is going to go through a fair bit of refactoring, and for some cases not yet supported I'm auto-passing them for the fuzzer. The todo!()s missed so far are specific combinations of reading instruction arguments from various locations (stack/register/first-time-encountering vreg).

Chris Fallin (Mar 14 2024 at 23:42):

When I was developing the existing algorithm the fuzzer was impressively effective at finding weird combinations of operands; so hopefully that sort of thing will become apparent with enough fuzzing time!

Chris Fallin (Mar 14 2024 at 23:43):

Oh, one random tip: I usually fuzz with -s none (turns off sanitizers), since in safe Rust asan doesn't really have any point and we're only concerned with the logical/algorithmic correctness. It buys a decent speedup in fuzz-rate. Also -j N for N-way multicore fuzzing.

lengyijun (Mar 21 2024 at 11:12):

@Squaaawk I have a toy C compiler based on regalloc2; I can help you test your new allocator if necessary.

Squaaawk (Mar 22 2024 at 00:40):

@lengyijun That would be useful! Is it publicly available? If not, don't worry about it; I have sufficient tooling in place to make forward progress at the moment.

lengyijun (Mar 22 2024 at 00:46):

GitHub - lengyijun/kecc

Squaaawk (Mar 22 2024 at 01:07):

I had to comment out #![deny(unused_qualifications)] in order to compile, and several tests fail using that command (with a fresh clone). Is this expected?

lengyijun (Mar 22 2024 at 01:11):

I tested on d1877a15f9d575c275977fc9724b2f8ed166209d
Some timeout warning is expected

Squaaawk (Mar 22 2024 at 01:13):

hash d1877a15f9d575c275977fc9724b2f8ed166209d
tests: 9 passed (2 slow), 4 failed, 0 skipped

badumbatish (Sep 16 2024 at 06:06):

Google Summer of Code

Google Summer of Code is a global program focused on bringing more developers into open source software development.

Chris Fallin (Sep 16 2024 at 14:54):

Chris Fallin (Sep 16 2024 at 14:55):

Fastalloc1 by d-sonuga · Pull Request #181 · bytecodealliance/regalloc2

This is the initial implementation of the fast register allocator in src/fastalloc. It's still a work in progress and I haven't done any kind of optimizations on it. Still using less-than-o...

Stream: cranelift

Topic: Cranelift regalloc in Google Summer of Code via Rust project

Chris Fallin (Feb 22 2024 at 07:39):

Chris Fallin (Sep 16 2024 at 14:55):

badumbatish (Sep 17 2024 at 05:20):