Stream: cranelift

Topic: RA2: Clobbers on branch instructions ignored?


view this post on Zulip Thijs Molendijk (Jul 24 2024 at 20:20):

Hi all,

I've been toying with regalloc2 for some local dev but I've ran into a bit of a snag. I have the following instruction pseudocode that I'd like to register allocate using regalloc2 (I'm aware that it's not SSA, but I run into the same exact problem if I model this using block parameters instead):

block0:
    mov r0, 1
    jz r0, block1, block2

block1:
    mov r1, 10
    jmp block3

block2:
    mov r1, 20
    jmp block3

block3:
    exit r1

Now, the odditiy of the architecture that I'd like to compile to is that both jz and jmp clobber a set of registers.

The above pseudocode is fed to regalloc2 as follows (format is insn id | insn | operands | clobbers):

== block 0; preds = [], succs = [Block(1), Block(2)] | params = [] ==
insn 0 | MovImm64(r0, 0) | [Def: v0i reg] | []
insn 1 | Jz(r0, BlockId(1), BlockId(2)) | [Use: v0i reg] | ["RAX", "RCX", "RDX", "R8", "R9", "R10", "R11"]

== block 1; preds = [Block(0)], succs = [Block(3)] | params = [] ==
insn 2 | MovImm64(r3, 10) | [Def: v3i reg] | []
insn 3 | Jmp(BlockId(3)) | [] | ["RAX", "RCX", "RDX", "R8", "R9", "R10", "R11"]

== block 2; preds = [Block(0)], succs = [Block(3)] | params = [] ==
insn 4 | MovImm64(r3, 20) | [Def: v3i reg] | []
insn 5 | Jmp(BlockId(3)) | [] | ["RAX", "RCX", "RDX", "R8", "R9", "R10", "R11"]

== block 3; preds = [Block(1), Block(2)], succs = [] | params = [] ==
insn 6 | Exit(r3) | [Use: v3i fixed(p10i)] | []

RA2 successfully computes a result for this, but it's entirely wrong! It outputs the following InstOrEdit trace:

== block 0 ==
MovImm64(r0, 0) [p0i]
Jz(r0, BlockId(1), BlockId(2)) [p0i]

== block 1 ==
MovImm64(r3, 10) [p10i]
Move { from: p10i, to: p4i }
Jmp(BlockId(3)) []

== block 2 ==
MovImm64(r3, 20) [p10i]
Move { from: p10i, to: p4i }
Jmp(BlockId(3)) []

== block 3 ==
Exit(r3) [p10i]

Here, p0i is RAX, p4i is RSI, and p10i is R10. Effectively, it's output the following assembly:

block0:
    mov rax, 1
    jz rax, block1, block2

block1:
    mov r10, 10
    mov rsi, r10
    jmp block3

block2:
    mov r10, 20
    mov rsi, r10
    jmp block3

block3:
    exit r10

It's assigned the r10 register to the 10/20 constants, because exit wants its argument to be in that physical register specifically. The main problem here of course is that the jmp instructions will clobber r10, so the final call to exit will contain garbage. RA2 seemingly is aware of this, because it attempts to back up the value into RSI (which isn't clobbered) first. But then it proceeds to never restore r10 with mov r10, rsi after the branch?

(Additionally, the choice of the initial mov r10, [10/20] is weird too. Why doesn't it immediately assign RSI to these cases, so it doesn't have to move around immediately after?)

At first, I thought this was an issue with my input not being in SSA form. However, restructuring this so that block3 has a block parameter and assigning that appropriately results in exactly the same broken code.

I suspect that the issue has something to do with RA2 not properly considering clobbers if the instruction doing the clobbering is a branch/block terminator. To illustrate, I added a simple NopClobber instruction to my ISA that does nothing except clobber. I placed this after each branch (i.e., its the first instruction in any block with a nonzero amount of predecessors). Re-running RA2 on that input results in the following assignment which, while suboptimal, at least is correct:

block0:
    mov rax, 1
    jz rax, block1, block2

block1:
    nop-clobber
    mov r10, 10
    mov rdi, r10
    jmp block3

block2:
    nop-clobber
    mov r10, 20
    mov rdi, r10
    jmp block3

block3:
    nop-clobber
    mov r10, rdi
    exit r10

This workaround is fine for now, but it results in a suboptimal allocation and is probably indicative of _some_ issue in RA2. Figured I'd reach out and poke you, to see if this is something that RA2 should/could be handling better.

view this post on Zulip Chris Fallin (Jul 24 2024 at 20:52):

Ah, what's going on here is that the blockparam assignment causes edge-moves to be generated, and edge-moves go before the predecessor block's jump because there are multiple preds (block1 and block2) for one successor (block3)

view this post on Zulip Chris Fallin (Jul 24 2024 at 20:52):

In general, branches are special -- there are some restrictions on how they can use parameters as well -- because of the need to insert these moves that logically happen only on the edge actually before the branch over that edge

view this post on Zulip Chris Fallin (Jul 24 2024 at 20:52):

The bug honestly is that we don't error out when branches have clobbers specified...

view this post on Zulip Chris Fallin (Jul 24 2024 at 20:54):

To address this by making the allocator more flexible (actually fully supporting clobbers and all kinds of arguments on branches) would be quite nontrivial because it would mean we actually need to split the edge -- something we don't do currently (the CFG is immutable) -- or else require an empty block on every edge, which would be quite expensive

view this post on Zulip Chris Fallin (Jul 24 2024 at 20:54):

so RA2 was designed with the tradeoff that branches are limited, which worked for the real ISAs we were targeting. Could you say more about why branches have clobbers (maybe there's some other solution we can think of)?

view this post on Zulip Thijs Molendijk (Jul 24 2024 at 21:01):

Fair enough. I can emulate clobbers on branches through the nop-clobber trick so losing that isn't too bad for my usecases.

Am I correct in thinking that RA2 does fixed-register choices before it attempts to compensate for clobbers? That'd explain the suboptimal choice for it to first move the constants into r10 (the fixed register for the exit insn), then work around the clobber that it encounters with the swap in/out of rdi.

My branch instructions cause clobbers because the final assembly also does a function call before the branch for instrumentation purposes, and this function call ends up doing the actual branch, instead of the real x86 instruction (you'll notice that the clobber set in my trace above is exactly the same as the set of caller-save registers on Win64). The reason for not having some kind of Insn::InstrumentCall before the branches is that in the case of Jz I don't want RA2 to spend efforts on ensuring that the argument isn't clobbered when it doesn't suffer from the clobbering (the clobbering happens after the conditional register has been read)

view this post on Zulip Thijs Molendijk (Jul 24 2024 at 21:05):

I suppose the alternative would be to just have the instrumentation back up all registers (i.e. a non-clobbering call) so that I don't have to worry about clobbers in edges at all.

view this post on Zulip Amanieu (Jul 24 2024 at 21:53):

@Thijs Molendijk You can put clobbers on branches if the destination block has more than 1 predecessor. If the destination block only has 1 predecessor then you need to use the nop clobber trick.

view this post on Zulip Chris Fallin (Jul 24 2024 at 22:59):

Am I correct in thinking that RA2 does fixed-register choices before it attempts to compensate for clobbers?

No, this isn't how it works: RA2 collects all requirements, including fixed-register constraints and clobber conflicts, and handles them all together in the main allocation loop. If you had put the same constraints and clobbers on a regular instruction in straight line code, you'd see it work properly. The only issue is with branches, as I mentioned; branches are special because they can come after an edge move that they logically should come before (so their uses are heavily restricted).

view this post on Zulip Thijs Molendijk (Jul 25 2024 at 21:56):

Just to confirm, I'm not allowed to have a block that contains a branching instruction that isn't the final instruction, right? I just ran into some oddities where regallocs were weird/incorrect when I had a block that ended with jz x, a; jmp b, but I presume I need to model that as two blocks, one direct branch and one implicit fallthrough?

view this post on Zulip Thijs Molendijk (Jul 25 2024 at 21:56):

(if that's the case, it'd probably also be good to add an assertion/check for that)

view this post on Zulip Thijs Molendijk (Jul 25 2024 at 21:59):

Hm, nevermind. That still results in a bad allocation. Let me reduce it and see if I can find what is causing it

view this post on Zulip Chris Fallin (Jul 25 2024 at 22:03):

RA2 itself doesn't care about whether the last 1 or last N instructions are branches; it gets its canonical CFG from what you provide re: successor info

view this post on Zulip Thijs Molendijk (Jul 25 2024 at 22:24):

Hm okay, I ran into another bug, but I'm not sure where. I have the following input:

== block 0 | pred = [] | succ = [Block(1), Block(2)] ==
insn 0 | MovImm64(r1, 2147483647) | operands=[Def: v1i reg] | clobbers=[]
insn 1 | Syscall(r2, ...) | operands=[Def: v2i fixed(p0i), Use: v1i fixed(p9i)] | clobbers=["RAX", "RCX", "RDX", "R8", "R9", "R10", "R11"]
insn 2 | Jz(r2, BlockId(1)) | operands=[Use@Late: v2i reg, Def: v5i fixed(p3i)] | clobbers=[]
insn 3 | Jmp(BlockId(2)) | operands=[] | clobbers=[]

== block 1 | pred = [Block(0)] | succ = [] ==
insn 4 | NopClobber | operands=[] | clobbers=["RAX", "RCX", "RDX", "R8", "R9", "R10", "R11"]
insn 5 | Exit(r2) | operands=[Use: v2i fixed(p3i)] | clobbers=[]

== block 2 | pred = [Block(0)] | succ = [] ==
insn 6 | NopClobber | operands=[] | clobbers=["RAX", "RCX", "RDX", "R8", "R9", "R10", "R11"]
insn 7 | Exit(r2) | operands=[Use: v2i fixed(p3i)] | clobbers=[]

This is roughly equivalent to the following assembly:

block0:
    mov r1, 2147483647
    syscall r2, [r1]
    jz r2, block1
    jmp block2

block1:
    exit r2

block2:
    exit r2

There are only clobbers on non-branch instructions here, so we should be fine w.r.t. avoiding the clobbering issues described earlier. I've automatically inserted a NopClobber in both branches, even if it might not be necessary (since they both only have a single predecessor).

Running RA2 on this errors with "Could not allocate minimal bundle, but the allocation problem should be possible to solve" (https://github.com/bytecodealliance/regalloc2/blob/main/src/ion/process.rs#L1266).

Now, if I drop the RAX clobber from the syscall instruction, it successfully completes the allocation with a proper working allocation:

== block 0 | pred = [] | succ = [Block(1), Block(2)] ==
insn 0 | MovImm64(r1, 2147483647) | operands=[Def: v1i reg] | clobbers=[] | allocs=[p9i]
insn 1 | Syscall(r2, ...) | operands=[Def: v2i fixed(p0i), Use: v1i fixed(p9i)] | clobbers=["RCX", "RDX", "R8", "R9", "R10", "R11"] | allocs=[p0i, p9i]
insn 2 | Jz(r2, BlockId(1)) | operands=[Use@Late: v2i reg, Def: v5i fixed(p3i)] | clobbers=[] | allocs=[p0i, p3i]
Move { from: p0i, to: p4i }
insn 3 | Jmp(BlockId(2)) | operands=[] | clobbers=[] | allocs=[]

== block 1 | pred = [Block(0)] | succ = [] ==
insn 4 | NopClobber | operands=[] | clobbers=["RAX", "RCX", "RDX", "R8", "R9", "R10", "R11"] | allocs=[]
Move { from: p4i, to: p3i }
insn 5 | Exit(r2) | operands=[Use: v2i fixed(p3i)] | clobbers=[] | allocs=[p3i]

== block 2 | pred = [Block(0)] | succ = [] ==
insn 6 | NopClobber | operands=[] | clobbers=["RAX", "RCX", "RDX", "R8", "R9", "R10", "R11"] | allocs=[]
Move { from: p4i, to: p3i }
insn 7 | Exit(r2) | operands=[Use: v2i fixed(p3i)] | clobbers=[] | allocs=[p3i]

(well, it's not actually working but that's the result of the NopClobber hack; I'm lacking a NopClobber after the Jz if branch is not taken, so RA2 only moves RAX into RDI after Jz, at which point RAX is already clobbered. But that's my fault, not RA2's).

The docs for Program::inst_clobbers say "Note that it is legal for a register to be both a clobber and an actual def (via pinned vreg or via operand constrained to the reg). This is for convenience: e.g., a call instruction might have a constant clobber set determined by the ABI, but some of those clobbered registers are sometimes return value(s).". The behavior above seems to contradict this.

A new register allocator. Contribute to bytecodealliance/regalloc2 development by creating an account on GitHub.

view this post on Zulip Chris Fallin (Jul 25 2024 at 22:30):

Unfortunately I've run out of spare cycles for support here; I'd encourage you to look at how Cranelift uses RA2 and continue to debug, and I may be able to return later

view this post on Zulip Thijs Molendijk (Jul 25 2024 at 22:33):

No worries, thanks for the help. I'll file an issue on the repo if nothing else, perhaps I'll have time myself to look into it deeper.


Last updated: Nov 22 2024 at 16:03 UTC