Cranelift regalloc on arm64 · cranelift

Note that at line 24 of the asssembly, cranelift decides to move a saved register into x1, dereference it, then overwrite x1 with something else. Is there something weird going on that I'm not aware of around regalloc in cranelift or something like that? Or maybe I'm missing some option (I set opt_level to speed and regalloc_algorithm to backtracking, but the behavior persists...)

Chris Fallin (Nov 17 2025 at 17:27):

The short answer is that regalloc (in general, not just ours) is full of heuristics and sometimes does the wrong thing because it's an NP-hard binpacking problem. Unfortunately I don't have time to look at specific cases at the moment, but if you're willing to, PRs welcome. (Sorry, I don't mean to sound dismissive with that, just trying to level-set -- we aren't at a level of staffing that can allow for fine-tuning at this level at the moment)

pitust (Nov 17 2025 at 17:30):

Yeah, I know it's an NP-hard problem and all, but this specific case is also not a very complex one (there are more available machine registers than values, and not all of the values even need a register of their own, and this definitely has the "some heuristic broke" vibe rather than the "this is just a really hard problem to solve" vibe), so I figured I'd ask anyway in case it's like a known bug or something.

Chris Fallin (Nov 17 2025 at 17:31):

It's a fair question to ask! If you decide to dig into this, I'm at least happy to point at relevant bits of code

pitust (Nov 17 2025 at 17:31):

Chris Fallin (Nov 17 2025 at 17:32):

yes, the trace-log feature in cranelift-codegen guards a bunch of logging to eliminate dynamic checks, so enable that, then WASMTIME_LOG=trace will give you gigabytes of stuff; probably also RAYON_NUM_THREADS=1 so parallel compilation doesn't confuse you

pitust (Nov 17 2025 at 17:33):

Chris Fallin (Nov 17 2025 at 17:33):

so RUST_LOG=trace (or whatever your logging frontend is in your embedding) too

pitust (Nov 17 2025 at 17:37):

pitust (Nov 17 2025 at 17:44):

I think I know what might be going wrong. It seems like cranelift tries really hard to keep the arguments in the same registers as they were passed in, even if that makes no sense...

pitust (Nov 17 2025 at 17:54):

Yeah, the bad codegen disappears if i set opt_level to none and mangle the pointer by running let ctx = b.ins().band_imm(ctx, (!1u64) as i64);...

pitust (Nov 17 2025 at 17:55):

With optimizations enabled, then the bitand gets rematerialized and the bad regalloc is back.

Chris Fallin (Nov 17 2025 at 17:58):

You aren't going to like the answer but unfortunately it's "understand the regalloc's algorithms and see where they are doing something suboptimal" :-/

Chris Fallin (Nov 17 2025 at 17:59):

It's not exactly that, more that regalloc has constraints, the fixed-reg constraints apply to the args, and there are heuristics for when one wants to split liveranges

pitust (Nov 17 2025 at 18:03):

Yeah, I figured. What I think happens is that cranelift decides to not split the liverange but just spill, and then after unspilling it maintains the same register assignment (for whatever reason), and then later it decides to move it into a different register instead of spilling, and then move elision fails for whatever reason.

Chris Fallin (Nov 17 2025 at 18:05):

a few things just to help: (i) there aren't any spills here, just moves between registers (those are distinct); (ii) move elision will remove moves that are redundant, but it's not going to see that A was copied to B and we can use A directly instead of B

pitust (Nov 17 2025 at 18:09):

pitust (Nov 17 2025 at 18:55):

I think the issue is around split_and_requeue_bundle: when a bundle is split because one fixed register constraint contradicts another, then the register allocator should explicitly try to not use that fixed register when allocating, instead of having it be the preferred register...

Chris Fallin (Nov 17 2025 at 18:58):

when a split happens, the split-off portion has a preference but not a hard requirement anymore -- the requirements for each half are recomputed

pitust (Nov 17 2025 at 18:59):

pitust (Nov 17 2025 at 19:00):

The ranges may have been split because that register cannot be used, in which case, the register is explicitly the one you don't want to use.

Chris Fallin (Nov 17 2025 at 19:01):

OK -- if there is a conflict, or if other liveranges have stronger preference, the weights should move the split-off half to another register instead

pitust (Nov 17 2025 at 19:01):

Chris Fallin (Nov 17 2025 at 19:01):

pitust (Nov 17 2025 at 19:01):

Chris Fallin (Nov 17 2025 at 19:03):

Well, it's all heuristics. There are use-weights and there are costs that are considered during assignment. A simplistic "I would simply not use that register" heuristic stacked on top isn't going to be the answer, because there is already a fairly complex cost machinery considering the tradeoffs

pitust (Nov 17 2025 at 19:03):

In the split off LR, the hint is checked as the first thing to be processed. It can work (because the thing stopping the register from being viable is between the two LRs), but it results in some extra shuffling (because to get the register from one LR to the other, some other register needs to be used).

Chris Fallin (Nov 17 2025 at 19:04):

This is the backtracking part of the allocator: when another use comes along that can make use of the register, it has the opportunity to undo that choice

Chris Fallin (Nov 17 2025 at 19:04):

pitust (Nov 17 2025 at 19:04):

Chris Fallin (Nov 17 2025 at 19:04):

I'd caution against simplistic answers here though: "don't consider the hint" is going to break a lot of other stuff

Chris Fallin (Nov 17 2025 at 19:04):

pitust (Nov 17 2025 at 19:23):

I'm confused by something. When cranelift splits the liverange into two parts, the log it prints is

split bundle LiveBundleIndex(0) at progpoint4-pre and requeue with reg hint (for first part) PReg(hw = 0, class = Int, index = 0)

Chris Fallin (Nov 17 2025 at 19:31):

I believe the hint is just for the first part, but the code would be the definitive answer. Unfortunately I can't give much more time to this this morning, but best of luck (sorry!)

pitust (Nov 17 2025 at 19:33):

Amanieu (Nov 18 2025 at 04:36):

I looked into this: the root cause from regalloc2's choice to split bundles into 3 parts instead of 2. It moves the middle part of the live range between 2 conflicting uses directly to the spill bundle (which is then allocated to a register in the second chance allocation pass). This results in the unnecessary move because the middle and end bundles end up assigned different registers since regalloc2 doesn't know they are related.

Amanieu (Nov 18 2025 at 04:44):

I also tried the example code with regalloc3 (disassembly since it has a completely different live range splitting algorithm. However it avoids the splitting in the first place by not putting a fixed-register constraint on the whole live range, instead only treating the constraint as a hint (which only applies to the first part of the live range if it is split).

I also looked into why regalloc2 spills more than regalloc3 and it's because of regalloc2's randomization of the probe order. This will tend to spread bundles over all available registers instead of minimizing the number of registers used.

gist:54d5d89c119ca74409a4b458c4fa013b

GitHub Gist: instantly share code, notes, and snippets.

pitust (Nov 18 2025 at 10:15):

Yeah, the regaloc3 codegen looks much more reasonable. Out of curiousity, is it possible to use regalloc3 with cranelift? (I know it's not yet production-ready, but this isn't very security-critical code and I don't mind getting crashes).

Chris Fallin (Nov 18 2025 at 17:22):

We talked about that over the summer in Cranelift meetings; the conclusion was that we who use Wasmtime+Cranelift in security-critical contexts (i) would need to review regalloc3 but (ii) don't have the bandwidth to do so to the degree that we would understand the codebase and be able to fix it in an emergency or add features as needed. Basically, regalloc is too load-bearing to outsource

pitust (Nov 18 2025 at 17:26):

Ah, and I guess the burden of maintaining both concurrently is too great? That is understandable, then.

Chris Fallin (Nov 18 2025 at 17:28):

Unfortunately, yeah. If we had a large compiler team it would be an obvious choice to ramp up on it, but, sadly, that's not the case

Amanieu (Nov 18 2025 at 17:44):

Add option to use regalloc3 as the register allocator by Amanieu · Pull Request #11213 · bytecodealliance/wasmtime

This is based on bytecodealliance/regalloc2#230 which adds regalloc3 as a back-end to regalloc2. Only one change had to be made to codegen: tail calls were reserving r11 as a scratch register by ma...

Stream: cranelift

Topic: Cranelift regalloc on arm64

pitust (Nov 17 2025 at 17:23):

Chris Fallin (Nov 17 2025 at 17:27):

pitust (Nov 17 2025 at 17:30):

Chris Fallin (Nov 17 2025 at 17:31):

pitust (Nov 17 2025 at 17:31):

Chris Fallin (Nov 17 2025 at 17:32):

pitust (Nov 17 2025 at 17:33):

Chris Fallin (Nov 17 2025 at 17:33):

Chris Fallin (Nov 17 2025 at 17:33):

pitust (Nov 17 2025 at 17:37):

pitust (Nov 17 2025 at 17:44):

pitust (Nov 17 2025 at 17:54):

pitust (Nov 17 2025 at 17:55):

Chris Fallin (Nov 17 2025 at 17:58):

Chris Fallin (Nov 17 2025 at 17:59):

pitust (Nov 17 2025 at 18:03):

Chris Fallin (Nov 17 2025 at 18:05):

pitust (Nov 17 2025 at 18:09):

pitust (Nov 17 2025 at 18:55):

Chris Fallin (Nov 17 2025 at 18:58):

pitust (Nov 17 2025 at 18:59):

pitust (Nov 17 2025 at 19:00):

Chris Fallin (Nov 17 2025 at 19:01):

pitust (Nov 17 2025 at 19:01):

Chris Fallin (Nov 17 2025 at 19:01):

pitust (Nov 17 2025 at 19:01):

pitust (Nov 17 2025 at 19:01):

Chris Fallin (Nov 17 2025 at 19:03):

pitust (Nov 17 2025 at 19:03):

Chris Fallin (Nov 17 2025 at 19:04):

Chris Fallin (Nov 17 2025 at 19:04):

pitust (Nov 17 2025 at 19:04):

Chris Fallin (Nov 17 2025 at 19:04):

Chris Fallin (Nov 17 2025 at 19:04):

pitust (Nov 17 2025 at 19:23):

Chris Fallin (Nov 17 2025 at 19:31):

pitust (Nov 17 2025 at 19:33):

Amanieu (Nov 18 2025 at 04:36):

Amanieu (Nov 18 2025 at 04:44):

pitust (Nov 18 2025 at 10:15):

Chris Fallin (Nov 18 2025 at 17:22):

pitust (Nov 18 2025 at 17:26):

Chris Fallin (Nov 18 2025 at 17:28):

Amanieu (Nov 18 2025 at 17:44):