cfallin opened PR #3989 from regalloc2-clean
to main
:
This is a draft PR for now, meant to serve as a discussion-starter. I'll work on splitting this into logically separate commits next week, but wanted to get the initial thing up first.
All tests pass on x86-64, aarch64, and s390x (at least locally on Linux, modulo any CI surprises) and performance numbers from #3942 still apply.
There is a summary of the design changes in this document (I'll turn that into more permanent documentation before this merges).
cfallin edited PR #3989 from regalloc2-clean
to main
:
This is a draft PR for now, meant to serve as a discussion-starter. I'll work on splitting this into logically separate commits next week, but wanted to get the initial thing up first.
All tests pass on x86-64, aarch64, and s390x (at least locally on Linux, modulo any CI surprises) and performance numbers from #3942 still apply.
There is a summary of the design changes in this document (I'll turn that into more permanent documentation before this merges).
Closes #3942.
cfallin edited PR #3989 from regalloc2-clean
to main
:
This is a draft PR for now, meant to serve as a discussion-starter. I'll work on splitting this into logically separate commits next week, but wanted to get the initial thing up first.
All tests pass on x86-64, aarch64, and s390x (at least locally on Linux, modulo any CI surprises) and performance numbers from #3942 still apply.
There is a summary of the design changes in this document (I'll turn that into more permanent documentation before this merges).
Closes #3942.
Requires bytecodealliance/regalloc2#38 and subsequent crate version bump/release.
bjorn3 submitted PR review.
bjorn3 created PR review comment:
How to enable the regalloc checker without this option? And what about potential other regallocs?
cfallin submitted PR review.
cfallin created PR review comment:
what about potential other regallocs
Right now regalloc2 doesn't have any other algorithms, but we could certainly add this option back if it gains any. (And, for maximal clarity here, retaining the existing options is unfortunately not really possible because regalloc.rs is so different in certain ways from regalloc2 -- we can't make it a runtime option, we have to switch over all at once.)
How to enable the regalloc checker
I actually need to do a bit more work if we want the checker enabled in Cranelift (as opposed to in the regalloc2 fuzzing harness) -- it doesn't support pinned vregs currently, which the Cranelift glue uses to model RealRegs. I could do that, I just haven't had time yet; not sure if it should block an initial merge or not (open to opinions on this...).
cfallin updated PR #3989 from regalloc2-clean
to main
.
cfallin updated PR #3989 from regalloc2-clean
to main
.
cfallin updated PR #3989 from regalloc2-clean
to main
.
cfallin updated PR #3989 from regalloc2-clean
to main
.
cfallin updated PR #3989 from regalloc2-clean
to main
.
cfallin has marked PR #3989 as ready for review.
cfallin updated PR #3989 from regalloc2-clean
to main
.
cfallin updated PR #3989 from regalloc2-clean
to main
.
cfallin updated PR #3989 from regalloc2-clean
to main
.
cfallin updated PR #3989 from regalloc2-clean
to main
.
cfallin edited PR #3989 from regalloc2-clean
to main
:
This is a draft PR for now, meant to serve as a discussion-starter. I'll work on splitting this into logically separate commits next week, but wanted to get the initial thing up first.
All tests pass on x86-64, aarch64, and s390x (at least locally on Linux, modulo any CI surprises) and performance numbers from #3942 still apply.
There is a summary of the design changes in this document (I'll turn that into more permanent documentation before this merges).
Closes #3942.
~Requires bytecodealliance/regalloc2#38 and subsequent crate version bump/release.~ (done.)
bjorn3 submitted PR review.
bjorn3 created PR review comment:
Can you keep this when debug assertions are enabled?
bjorn3 created PR review comment:
Will the disassembly now show virtual registers or still use physical registers?
bjorn3 submitted PR review.
bjorn3 edited PR review comment.
bjorn3 submitted PR review.
bjorn3 created PR review comment:
Why did you remove those?
bjorn3 created PR review comment:
Same, why are these removed?
bjorn3 submitted PR review.
bjorn3 submitted PR review.
bjorn3 created PR review comment:
I see it still uses physical registers.
cfallin submitted PR review.
cfallin created PR review comment:
This actually became a significant O(n^2) slowdown in debug builds because of the additional edge blocks introduced in the block-lowering order, and was causing some test timeouts. IMHO we've tested and relied on this code enough over the past almost-2-years that enabling the checks only in fuzzing and not in all debug builds is probably OK...
cfallin submitted PR review.
cfallin created PR review comment:
Ah, regalloc2 currently doesn't have serde support so I removed this to get things building, and then forgot to come back to it later... I'll add this.
fitzgen submitted PR review.
fitzgen submitted PR review.
fitzgen created PR review comment:
/// physical registers: this means that they are always constrained
fitzgen created PR review comment:
impl From<VReg> for Reg
andimpl From<Reg> for VReg
? This is a little bit more idiomatic and allows using it in generics, plus you get the correspondingInto
blanket impl.
fitzgen created PR review comment:
Should this debug assert that the allocation matches the real reg?
fitzgen created PR review comment:
Rather than open coding the
< PINNED_VREGS
logic here and elsewhere, can we define a helper and make this something likeif is_pinned(self.0.vreg()) { Some(RealReg(self.0)) } else { None }
and similar below. This way if we change how we encode pinned registers, we don't need to update a bunch of different open-coded places.
fitzgen submitted PR review.
fitzgen created PR review comment:
FWIW, we see ~60x slowdown when fuzzing and only have ~1 minute per fuzz input (so roughly 1s budget on non-fuzz builds), so it seems semi-likely that these checks will be too expensive for fuzzing as well if they are too expensive for
debug_assert!
.
fitzgen submitted PR review.
fitzgen submitted PR review.
fitzgen created PR review comment:
Can we do
enum VCodeDirection { Forwards, Backwards, }
to avoid these boolean arguments that need comments explaining what they are?
fitzgen created PR review comment:
I could see this API being hard to implement with a slice return. For example, if an inst variant is like
Foo { clobber_a: Writable<Reg>, clobber_b: Writable<Reg> },
then you can't actually return both
clobber_{a,b}
as a slice, since you have no vec or array to borrow from.Perhaps this should be a
SmallVec
return? Alternatively, the caller could pass in a&mut Vec
and let the callee fill it, and the caller could reuse the same space for every inst.
fitzgen created PR review comment:
This seems like a sketchy API given that multiple types can lower to the same register class (ie both vector types and floats map to the same register class). How do callees know which type to return? How can callers communicate their expectations?
fitzgen created PR review comment:
Don't want to factor this out into ranges within a single allocation like other bits have done?
// From instruction index to `(start, end)` indices of the clobbers in `clobber_data`. clobbers: FxHashMap<InsnIndex, (u32, u32)>, clobber_data: Vec<PReg>,
fitzgen created PR review comment:
Also all of these
(start, end)
index pairs could beu32
s (aVec
can only holdusize::MAX / 2
items anyways; in practice we won't get anywhere near that either) and we will reduce space further.
fitzgen created PR review comment:
Should this method invert
self.direction
(neself.backward
)? Seems like things could otherwise get out of sync.
fitzgen created PR review comment:
Make this an
Option<String>
for clarity? I know an empty string won't heap allocate, but the API is confusing that this is a "pretty-printed disassembly, if any" but is non-optional.
fitzgen created PR review comment:
Similar story here: we can have a single
vregs: Vec<regalloc2::VReg>
that both block args and block params index into.
fitzgen created PR review comment:
/// This builder has the ability to accept instructions in either
fitzgen created PR review comment:
Might as well consolidate
block_preds
andblock_succs
into a single allocation, no? We never iterate over all blocks that are a predecessor to any other block, only a particular block's predecessors after we've grabbed that sub-range, and similar for successors, so it seems like they could all live in the same backingVec<regalloc2::Block>
and have just one big allocation rather than two big allocations.
fitzgen created PR review comment:
This pattern is cropping up repeatedly all over the place. Might make sense to define a
disasm!(state, "format string", format_args...)
macro to encapsulate it.
fitzgen created PR review comment:
Split into
debug_info: FxHashMap<ValueLabel, (u32, u32)>, debug_info_data: Vec<(InsnIndex, InsnIndex, VReg)>,
like elsewhere?
cfallin created PR review comment:
This one is a bit of an interesting case: it's factoring out from ad-hoc code in several places that was generating loads, stores, or moves of whole registers by choosing some appropriate type. In some sense it's a problem caused earlier in the design by writing a bunch of instruction helpers (for moves, loads, stores) that can only take
Type
s, notRegClass
s, so when we drop down to the regalloc abstraction level and care only about whole registers, but need to insert instructions for data movement, we need to translate back.I'd be happy to rename this to something like
type_that_fully_contains_rc
orcanonical_type_for_rc
or somesuch to make that clearer, if you'd prefer!
cfallin submitted PR review.
cfallin submitted PR review.
cfallin created PR review comment:
I was starting to write an answer here discussing allocations and how I'd prefer to avoid them, and the tradeoffs vs flexibility and maybe
Cow
but that's awkward, etc., then realized we've been down this road before -- we have all the same tradeoffs for ordinary operands. So I think maybe the better answer is to actually go through the operand-collector interface (reg_clobber
alongsidereg_use
,reg_def
and friends), what do you think?
cfallin submitted PR review.
cfallin created PR review comment:
The intent was for it to be called only once, not for it to be possible to flip directions repeatedly; the latter would certainly break many invariants, even if we try to invert
self.direction
!This was meant to be enforced by (i)
reverse
being a private helper method, and (ii)reverse
only being called by the build-function that consumesself
. I'm happy to namereverse
something better, likereverse_backwards_emission_and_finalize
or something like that...
fitzgen submitted PR review.
fitzgen created PR review comment:
Yes,
reg_clobber
makes a lot of sense.
fitzgen submitted PR review.
fitzgen created PR review comment:
I guess
canonical_type_for_rc
makes sense.
fitzgen submitted PR review.
fitzgen created PR review comment:
reverse_and_finalize
? Maybe also some comments for the method explaining that the builder can't be used again after this method is called.
fitzgen submitted PR review.
fitzgen submitted PR review.
fitzgen created PR review comment:
Sort of strange API-wise to give a
Vec
here -- is there a reason we aren't giving a&[Writable<RealReg>]
? Do impls take ownership/mutate/grow the vec internally?
fitzgen created PR review comment:
ditto
fitzgen created PR review comment:
ditto
fitzgen created PR review comment:
ditto
fitzgen submitted PR review.
fitzgen created PR review comment:
I think we can remove the
mov_mitosis
function now as well.
fitzgen submitted PR review.
fitzgen submitted PR review.
fitzgen submitted PR review.
fitzgen created PR review comment:
/// this for xsp / xzr; we have two special registers for those.
cfallin updated PR #3989 from regalloc2-clean
to main
.
fitzgen submitted PR review.
fitzgen submitted PR review.
fitzgen created PR review comment:
Ah okay, we remove it in this commit.
fitzgen submitted PR review.
fitzgen submitted PR review.
fitzgen created PR review comment:
This is out of date now. Not sure whether we want to keep this whole doc around or not?
fitzgen submitted PR review.
cfallin updated PR #3989 from regalloc2-clean
to main
.
cfallin updated PR #3989 from regalloc2-clean
to main
.
cfallin submitted PR review.
cfallin created PR review comment:
Fixed!
cfallin updated PR #3989 from regalloc2-clean
to main
.
cfallin created PR review comment:
Good idea; I've refactored in this direction. This particular helper (
pinned_vreg_to_preg
) is the canonical helper that defines the index split, along withfirst_user_vreg_index()
below (index needed for vreg index allocation during lowering, but exposing it with this name only makes its intended use more clear).
cfallin submitted PR review.
cfallin updated PR #3989 from regalloc2-clean
to main
.
cfallin submitted PR review.
cfallin created PR review comment:
Done!
cfallin submitted PR review.
cfallin created PR review comment:
Yes, good point, done!
cfallin updated PR #3989 from regalloc2-clean
to main
.
cfallin updated PR #3989 from regalloc2-clean
to main
.
cfallin submitted PR review.
cfallin created PR review comment:
Done! The
Forward
option is actually dead, which totally makes sense, because we... don't have any forward VCode-to-VCode passes yet. I expect we will eventually, but IMHO it makes more sense to omit the enum arm (with a TODO explaining why) so that we can add it back only when we adequately test it.(I'd just like to note as an aside that the dead-code analysis working to find unconstructed enum variants is excellent.)
cfallin submitted PR review.
cfallin created PR review comment:
Done! This is also actually dead right now, as calls use phantom defs (a legacy from regalloc.rs). I'll create a followup issue to track using actual clobbers, which should be very marginally faster.
cfallin submitted PR review.
cfallin created PR review comment:
#4027
cfallin submitted PR review.
cfallin created PR review comment:
Done!
cfallin submitted PR review.
cfallin created PR review comment:
Done, good idea, thanks!
cfallin submitted PR review.
cfallin created PR review comment:
Done!
cfallin submitted PR review.
cfallin created PR review comment:
Here it's actually a little more subtle: we append to both of these separately while building a block, so it'd take a bit more state-tracking to append to one consolidated array and keep the ranges straight. I think it's technically possible but it's a bit error-prone and I think I'd prefer to keep the two arrays separate for now.
That said, it might be nice to eventually have a typesafe wrapper around this idiom -- maybe with multiple range-lists pointing into one pool-list and encoding that only one appender is outstanding at a time. Building that sounds like a nice followup task at some point.
cfallin created PR review comment:
Done!
cfallin submitted PR review.
cfallin submitted PR review.
cfallin created PR review comment:
Done!
cfallin submitted PR review.
cfallin created PR review comment:
Fixed!
cfallin submitted PR review.
cfallin created PR review comment:
Done!
cfallin submitted PR review.
cfallin created PR review comment:
Done, with a closure.
cfallin submitted PR review.
cfallin created PR review comment:
The reason is to avoid an allocation: when we compute it, we incrementally build it in a
Vec
; the only use is to callset_clobbered
, and the ABI object needs to subsequently retain it. So if we passed a slice we'd force a copy. Slightly non-idiomatic for the general API case but here the method just ties together two specific points in the code :-)
cfallin submitted PR review.
cfallin created PR review comment:
Likewise, the instruction retains the register list, and we build it in a
Vec
in at least one callsite, so we want to avoid the re-allocation/copy here.
cfallin submitted PR review.
cfallin created PR review comment:
Fixed!
cfallin created PR review comment:
Ah, here we actually append to arbitrary lists as we go, i.e. the ranges are not contiguous, so I think we need to keep separate vecs. We can't do the alternative "build a commingled pool then sort" trick either, as we need to examine the in-progress sequence when we append new content.
cfallin submitted PR review.
cfallin submitted PR review.
cfallin created PR review comment:
Fixed!
cfallin submitted PR review.
cfallin created PR review comment:
Fixed!
cfallin updated PR #3989 from regalloc2-clean
to main
.
cfallin submitted PR review.
cfallin created PR review comment:
Yeah, it was mostly just to help the reviewers (ie, you) as a delta, rather than a snapshot of the whole design, so I'll go ahead and delete it I think. It'll still be around in my branch (I'll delete it in a separate commit) if anyone wants to dig it up later.
cfallin submitted PR review.
cfallin created PR review comment:
Filed #4028 for this.
cfallin submitted PR review.
cfallin created PR review comment:
Filed bytecodealliance/regalloc2#42 for this.
cfallin updated PR #3989 from regalloc2-clean
to main
.
cfallin merged PR #3989.
Last updated: Dec 23 2024 at 12:05 UTC