alexcrichton transferred Issue #627:
I've done some performance analysis of cranelift that shows handling of live values, particularly the
partition_slice
function, can be hot. For example, in theoryol-quad
benchmark the live value list frequently has over 500 values in it, and having to repeatedly partition it is slow.I have a sketch of a redesign for
LiveValueVec
that might speed things up. It's a bit complicated becauseLiveValueVec
gets used in three different ways.First, in
LiveValueTracker::ebb_top
, theLiveValueVec
is first cleared. Then we add the live_in values, followed by the args. We subsequently iterate over the live_in values, then the args. So this suggests a structure like so:live_ins: Vec<LiveValue>, args: Vec<LiveValue>Second, within
process_inst
, we need to partition theLiveValue
s by endpoint (givingthroughs
andkills
) and then add zero, one or twodefs
. This suggests a structure like so:throughs: FxHashMap<Inst, Vec<LiveValue>>, kills: Vec<LiveValue>, defs: Vec<LiveValue>,
throughs
is pre-partitioned -- allLiveValue
s that share the same endpoint will be in the sameVec
. That way partitioning it to extract all theLiveValue
s with a particular endpoint is just a hash table removal. That removedVec
then becomeskills
. And we can just append the defs.Third, once
drop_dead
exits, we don't need thethroughs
/kills
/defs
distinction, so we would just combine those three pieces into a single piece like so:values: FxhashMap<Inst, Vec<LiveValue>>This is then ready for the next call to
process_inst
.I hope this will speed things up because partitioning is an O(1) operation (a hash table lookup) rather than O(n). But there will be more allocations for the nested data structure, and there's some fiddling about to convert between the three forms as necessary. Also, it's not entirely clear to me what the best way to represent this is... maybe an enum with three variants?
Last updated: Jan 24 2025 at 00:11 UTC