For anyone interested here, a heads-up: via the Rust project's participation in Google Summer of Code, there is a project idea described here that would involve building a fast (very simple, non-backtracking, single-pass) register allocator behind the regalloc2 API to plug into Cranelift for a fast build mode. @Amanieu and I both volunteered to mentor anyone who wants to take on the project; see the Rust project's blog post for the process to apply / get involved.
Hi, I'm interested in this project. To understand this project, are there any issues that I can work on at the beginning? Thank you.
@long_long_float we unfortunately don't really have any starter issues for this; in theory working on the existing (more complex, backtracking/liverange-splitting) allocator would give you a lot of background but it would also likely be a very painful ramp-up when the goal is to replace it with something simpler :-) It might be best to study the public API of the regalloc2 crate, as a starting point, along with various papers on regalloc (e.g. the classical linear scan paper, though at least I'm envisioning something even a little simpler still, without a pre-analysis to find liveranges but rather a single forward pass)
Thank you! I will read the API of regalloc2 and papers about it first.
I have a question about the VReg
in VCode. Can it express registers for float or SIMD other than integer?
I'm planning to implement the first allocator for only integers. I want to know whether it is easy to extend the allocator for floats.
@long_long_float yes it can; VCode must represent any code that Cranelift can generate. Two things though: (i) it’s totally fine to start during development with an allocator that just panics in cases it doesn’t handle yet; but also (ii) you can think of the regalloc problem as mostly N separate problems for N register classes, as they don’t really interact.
(That's not 100% true; spillslot allocation needs to be shared, but that can be as simple as "all classes ask the same allocator for space", and in fact I wouldn't recommend trying to share spillslots between classes)
Thank you! In my understanding, spillslot means memory region to place spilled registers. Is it right?
And does "all classes ask the same allocator for space" mean that the allocator prepares spillslots for each classes?
Yes to both!
Hi, I have a question. We decided to use SSRA with Reverse Linear Scan Allocation. My understanding is that the allocation requires the instruction to be SSA. However VCode doesn't seem to be SSA form. Is it possible to make the output value of a VCode instruction a single definition?
@long_long_float VCode is actually SSA; a single instruction may have multiple defs, but any given register only has one def. (Various people might have different definitions of SSA, but the salient property for us at least is one-def-per-register.)
I don't think that should be problematic for an algorithm that wants to see defs/uses though -- you can iterate over the defs in the instruction when you scan it (as if there were multiple instructions)?
In the past couple of days I've thrown together a simple implementation of SSRA with support for branching across multiple basic blocks. Right now, the project is directly substituted in place of regalloc2's current allocator, and supports just enough features for evaluation of a toy expression language (w/ conditionals) and the sample long script provided in Matt's blog, recreating the register pressure graph at the bottom.
Now I'm focusing on testing and guaranteeing correctness of the allocator. The ion_checker fuzzer appears to pass everything, even when I intentionally introduce known bugs. It does, however, catch known bugs I introduce into regalloc2. I'm clearly not understanding the tool here. Can you give me an overview of what guarantees the fuzzer should provide, and if there is any existing tooling around for testing and eventually benchmarking the project?
@Squaaawk that's great you've gotten so far!
The ion_checker
fuzz target is checking that the dataflow of the allocation result is equivalent to the dataflow of the pre-allocation program with virtual registers. Are you seeing oracle validation even when there is an allocation result you know to be incorrect (manually verified)? I'd check the basics too just to be sure, e.g. that the fuzz target is running your new allocator rather than ion.
The next testing + benchmarking step beyond fuzzing is use in the main (only public?) embedding of regalloc2, namely Cranelift. If you have a branch of RA2 that uses your allocator by default instead, and it is correct, dropping it in (by editing the cargo dep in a wasmtime tree to a local path, for example) should give you a wasmtime
that can compile and run Wasms. We generally use Sightglass to benchmark that, but for compilation speed, any large wasm file will do.
(The final state of things will be that we release regalloc2 with the new allocator merged in and available as an option in RegallocOptions
, and then add a Cranelift flag to set it, so this manual setup won't be necessary in the end.)
oh, and of course the cg_clif
benchmarking side -- the other major use of Cranelift aside from Wasmtime -- @bjorn3 can say more about what is idiomatic to do that, I think
For cg_clif, you can use ./y.sh bench
in the cg_clif repo as quick benchmark. For anything serious you will need to use https://github.com/rust-lang/rustc-perf/ though to run the benchmark suite of rustc itself. Make sure to skip the opt benchmarks though as those don't make much sense for cg_clif and you probably want to skip the check benchmarks too as they should have identical performance independent of how fast the codegen backend is.
I have a number of todo!()
statements that were never triggered in a successful overnight of fuzz testing. How feature complete is the ion_checker fuzzer expected to be — should I expect it to do a "pretty good job" of testing pretty much every case for my allocator, or is there more work to do here? Additionally, is an overnight of fuzzing reasonable, or would it take significantly longer to stumble across every case?
@Squaaawk ion_fuzzer
is meant to exercise nearly every feature of RA2. I have found fuzzbugs after several days of fuzzing on 32 cores, especially after things have already settled down (easy fuzzbugs fixed). Can you say what your todo!()
'd cases are more specifically?
Alright, it sounds like for the most part just more time fuzzing is needed. I'm not too woried about tracking down individual missed cases at the moment, as the code is going to go through a fair bit of refactoring and for some cases not yet supported I'm auto-passing them for the fuzzer. The todo!()
s missed so far are specific combinations of reading instruction arguments from various locations (stack/register/first-time-encountering vreg).
When I was developing the existing algorithm the fuzzer was impressively effective at finding weird combinations of operands; so hopefully that sort of thing will become apparent with enough fuzzing time!
Oh, one random tip: I usually fuzz with -s none
(turns off sanitizers), since in safe Rust asan doesn't really have any point and we're only concerned with the logical/algorithmic correctness. It buys a decent speedup in fuzz-rate. Also -j N
for N multicore fuzzing
@Squaaawk I have a toy c compiler based on regalloc2, I can help you test over your new allocator if necessary
@lengyijun That would be useful! Is it publically available? If not, don't worry about it — I have sufficient tooling in place to make forward progress at the moment
Squaaawk said:
lengyijun That would be useful! Is it publically available? If not, don't worry about it — I have sufficient tooling in place to make forward progress at the moment
https://github.com/lengyijun/kecc/tree/regalloc2
Try RUST_MIN_STACK=33554432 cargo nextest run
I had to comment out #![deny(unused_qualifications)]
in order to compile, and several tests fail using that command (with a fresh clone). Is this expected?
Squaaawk said:
I had to comment out
#![deny(unused_qualifications)]
in order to compile, and several tests fail using that command (with a fresh clone). Is this expected?
I tested on d1877a15f9d575c275977fc9724b2f8ed166209d
Some timeout warning is expected
hash d1877a15f9d575c275977fc9724b2f8ed166209d
tests: 9 passed (2 slow), 4 failed, 0 skipped
hello, is this project still on-going? i saw theres a contributor on the gsoc paper but i'm not sure about new on-going work https://summerofcode.withgoogle.com/programs/2024/projects/zxxeGZMt
Yes, Demilade is still working on it!
See ongoing PR to merge his implementation here
most discussion happens over in the rust-lang Zulip
ah ok i see it now, tysm
Last updated: Dec 23 2024 at 12:05 UTC