Rematerialization: egraph vs regalloc · cranelift

I've been reading up on @Chris Fallin's very impressive work on using an egraph for optimizations. One part specifically caught my eye:

While I can see some value in sinking unary and nullary operations down to their first use to shrink their live range. I think that unconditionally re-emitting them in each block that uses them might be both overly pessimistic (can't re-use a value across blocks) and overly pessimistic (there might still be unnecessary spills within the block).

I believe that regalloc2 might be the best place to perform remateralization since that's the point where we can reliably determine whether it is needed due to register pressure. However it can also be pretty complicated to implement: https://github.com/bytecodealliance/regalloc2/issues/8.

Rematerialization · Issue #8 · bytecodealliance/regalloc2

As an alternative to spilling and reloading, a vreg value can sometimes be recomputed from other available values. There are two ways this can be supported in regalloc2: Simple rematerialization Su...

Chris Fallin (Sep 26 2022 at 16:08):

@Amanieu I've thought a lot about this as well and I agree that in a perfect world, the register allocator is the place to consider the actual register pressure against available rematerializations and their costs (and their implications on liverange extensions for their inputs).

In the absence of some framework for that I opted for a simpler approach that "should never lose" at least wrt register pressure impact: we only remat ops with zero or one inputs, so we are at worst replacing one liverange (result of the remat'd op) with another (the one input to the remat'd op). The cost of the op itself is not taken into account here, but on this question I relied on the macro-level benchmark results: code got significantly faster when I applied this approach.

So in summary, I agree with you on the long-term solution, and I think we'd want to represent this to RA2 in terms of "rematerializable with X cost" metadata on instructions: it's ok to duplicate an instruction and it carries X cost, in the same unit as spillcost maybe. But that needs a lot more thought and we get short-term benefit, as well as exercising the code-motion capabilities during egraph elaboration, with the current approach. The current approach may someday be superseded if we can build the proper RA-integrated approach. Does that seem reasonable to you?

Amanieu (Sep 26 2022 at 16:14):

That sounds great! I just wanted to know what your future plans were regarding rematerialization because it is such a critical optimization. Unfortunately it will probably be a while until I get around to working on remat in regalloc2 since it is quite complex and I have many other things to work on in the meantime.

Amanieu (Sep 26 2022 at 16:15):

Also, I would like to at least wait for non-SSA support to be removed from regalloc2, which would simplify the codebase significantly.

Chris Fallin (Sep 26 2022 at 16:16):

Yes absolutely; a few of the recent fuzzbug fixes have convinced me further that we need to move away from the "no overlap" invariant and solve some of the constraint-fixup issues by allowing overlaps instead. That's only really reasonable to think about with one def per vreg; so we'll want the SSA transition for that reason as well

Chris Fallin (Sep 26 2022 at 16:17):

(that also lets us get rid of the redundant-move eliminator, probably; it was an awkward hack to avoid thinking about whether spillslot values were up-to-date or not more explicitly)

Olivier FAURE (Sep 28 2022 at 16:38):

Chris Fallin (Sep 28 2022 at 17:02):

in RA2, we currently have the invariant that liveranges for a single vreg or within an allocation bundle cannot overlap; in other words, a value lives in only one place at a time

Chris Fallin (Sep 28 2022 at 17:02):

allowing overlap instead lets us solve some constraint-rewriting cases easier and more efficiently

Chris Fallin (Sep 28 2022 at 17:03):

SSA is required for this to happen, because if you have a value in more than one place, you need to generate moves, and tracking which is the most up-to-date is far easier if there is only one def (hence SSA)

Stream: cranelift

Topic: Rematerialization: egraph vs regalloc

Amanieu (Sep 26 2022 at 15:35):

Chris Fallin (Sep 26 2022 at 16:08):

Amanieu (Sep 26 2022 at 16:14):

Amanieu (Sep 26 2022 at 16:15):

Chris Fallin (Sep 26 2022 at 16:16):

Chris Fallin (Sep 26 2022 at 16:17):

Olivier FAURE (Sep 28 2022 at 16:38):

Chris Fallin (Sep 28 2022 at 17:02):

Chris Fallin (Sep 28 2022 at 17:02):

Chris Fallin (Sep 28 2022 at 17:03):