alexcrichton transferred Issue #627:
I've done some performance analysis of Cranelift that shows handling of live values, particularly the `partition_slice` function, can be hot. For example, in the oryol-quad benchmark the live value list frequently has over 500 values in it, and having to repeatedly partition it is slow.
I have a sketch of a redesign for `LiveValueVec` that might speed things up. It's a bit complicated because `LiveValueVec` gets used in three different ways.

First, in `LiveValueTracker::ebb_top`, the `LiveValueVec` is first cleared. Then we add the live_in values, followed by the args. We subsequently iterate over the live_in values, then the args. So this suggests a structure like so: `live_ins: Vec<LiveValue>, args: Vec<LiveValue>`.
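A minimal sketch of that first form (the `EbbTopValues` wrapper, its methods, and the unit-struct stand-in for `LiveValue` are illustrative, not the real cranelift-codegen definitions):

```rust
/// Minimal stand-in for cranelift-codegen's `LiveValue`; fields omitted
/// so the sketch is self-contained.
struct LiveValue;

/// First form: the iteration order in `ebb_top` (live-ins first, then
/// block arguments) maps directly onto two separate vectors.
struct EbbTopValues {
    live_ins: Vec<LiveValue>,
    args: Vec<LiveValue>,
}

impl EbbTopValues {
    /// Clear both pieces at the start of `ebb_top`.
    fn clear(&mut self) {
        self.live_ins.clear();
        self.args.clear();
    }

    /// Iterate the live-in values first, then the args, matching the
    /// access pattern described above.
    fn iter(&self) -> impl Iterator<Item = &LiveValue> + '_ {
        self.live_ins.iter().chain(self.args.iter())
    }
}
```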
Second, within `process_inst`, we need to partition the `LiveValue`s by endpoint (giving `throughs` and `kills`) and then add zero, one, or two `defs`. This suggests a structure like so: `throughs: FxHashMap<Inst, Vec<LiveValue>>, kills: Vec<LiveValue>, defs: Vec<LiveValue>`.

`throughs` is pre-partitioned: all `LiveValue`s that share the same endpoint will be in the same `Vec`. That way, partitioning it to extract all the `LiveValue`s with a particular endpoint is just a hash table removal. That removed `Vec` then becomes `kills`, and we can just append the defs.
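A rough sketch of that second form, assuming an `FxHashMap` from the `rustc-hash`/`fxhash` crate and stand-in `Inst`/`LiveValue` types; the `InstValues` name and the `partition_at`/`add_def` helpers are hypothetical:

```rust
use rustc_hash::FxHashMap;

/// Stand-ins for the real cranelift-codegen types, only here so the
/// sketch compiles on its own.
struct LiveValue;
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct Inst(u32);

/// Second form: live values grouped by endpoint ahead of time, plus the
/// kills and defs for the instruction currently being processed.
struct InstValues {
    /// Values that live past the current instruction, keyed by the
    /// instruction at which their live range ends.
    throughs: FxHashMap<Inst, Vec<LiveValue>>,
    /// Values whose endpoint is the current instruction.
    kills: Vec<LiveValue>,
    /// Zero, one, or two values defined by the current instruction.
    defs: Vec<LiveValue>,
}

impl InstValues {
    /// "Partitioning" at `inst` is just a hash table removal: every
    /// value whose endpoint is `inst` becomes the kill set.
    fn partition_at(&mut self, inst: Inst) {
        self.kills = self.throughs.remove(&inst).unwrap_or_default();
        self.defs.clear();
    }

    /// Appending a def is a plain push.
    fn add_def(&mut self, def: LiveValue) {
        self.defs.push(def);
    }
}
```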
Third, once `drop_dead` exits, we don't need the `throughs`/`kills`/`defs` distinction, so we would just combine those three pieces into a single piece like so: `values: FxHashMap<Inst, Vec<LiveValue>>`. This is then ready for the next call to `process_inst`.

I hope this will speed things up because partitioning becomes an O(1) operation (a hash table lookup) rather than O(n). But there will be more allocations for the nested data structure, and there's some fiddling about to convert between the three forms as necessary. Also, it's not entirely clear to me what the best way to represent this is... maybe an enum with three variants?
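One way the "enum with three variants" idea could look, with the conversions happening at the `ebb_top` / `process_inst` / `drop_dead` boundaries (again a rough sketch; `TrackedValues`, the variant names, and the stand-in types are illustrative rather than taken from the codebase):

```rust
use rustc_hash::FxHashMap;

/// Stand-ins for the real cranelift-codegen types.
struct LiveValue;
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct Inst(u32);

/// A single type whose variant records which of the three forms the
/// tracker is currently in.
enum TrackedValues {
    /// After `ebb_top`: live-ins followed by the block arguments.
    EbbTop {
        live_ins: Vec<LiveValue>,
        args: Vec<LiveValue>,
    },
    /// During `process_inst`: pre-partitioned by endpoint, plus the
    /// kills and defs for the current instruction.
    AtInst {
        throughs: FxHashMap<Inst, Vec<LiveValue>>,
        kills: Vec<LiveValue>,
        defs: Vec<LiveValue>,
    },
    /// After `drop_dead`: everything merged back into a single map,
    /// ready for the next call to `process_inst`.
    Merged {
        values: FxHashMap<Inst, Vec<LiveValue>>,
    },
}
```

The enum would at least make the conversion points between the three forms explicit, though it doesn't avoid the extra allocations for the nested `Vec`s.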