cfallin opened issue #8783:
Currently, Cranelift with regalloc2 supports a maximum of
2^21
(2097152, a little over two million) virtual registers in VCode; in turn this limits the number of SSA values we support. This implementation limit is generally not an issue (we've lived with it for two years at least!) but sometimes does come up with giant functions produced by tooling -- for example, long initialization sequences, or giant state machines, or overeager inlining.The origin of this limit is the dense packing of a
regalloc2::Operand
into 32 bits. The dense packing in turn is extremely important for performance: earlier performance experiments with regalloc2 showed in the neighborhood of 5-10% degradations from moving to a 64-bit Operand. It turns out to be very important for cache locality to have the vreg, its kind and location of mention (early/late use/def), and any constraint (physical reg, stack, reuse, ...) all in one place and in a dense list in each liverange.That said, if the alternative is to fail allocation entirely, a little slowdown doesn't seem like such a bad alternative: so this issue proposes a hybrid solution where we instantiate regalloc2 twice, with a trait parameter that can plug in two implementations of
Operand
, and choose which of these to use at the Cranelift level. At the point that we are creatingOperand
s, in a pass after lowering to build a linear array of them, we know how many vregs have been used; and earlier stages use fullu32
s to store vreg numbers, so we can support large counts up to that pass. The idea would be to include twoVec
s in VCode --Vec<Operand>
andVec<BigOperand>
or similar -- and fill one or the other in the collection pass, then invoke a statically-monomorphized entry point to regalloc2 with aregalloc2::Function
implementation on one or the other operand kind. (This may require a little trickery to "specialize" to reading from one vec or the other, maybe via an adapter struct, I'm not sure; but seems doable.)This way, we would retain exactly the same behavior and performance as today for all functions that successfully compile today (except for an extra 24 bytes in
VCode
for that emptyVec
); and we could support a full 32-bit range for vregs as well.
Last updated: Dec 23 2024 at 12:05 UTC