elliottt opened PR #5843 from trevor/reuse-postorder-blockorder
to main
:
Rework
BlockLoweringOrder
construction to use the postorder traversal created
by theDominatorTree
. Additionally, construct fewer critical edge splits by
identifying edges up-front that will need splitting, and otherwise allowing
regalloc2 to place moves on either the entry or exit edges of a block.<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
elliottt edited PR #5843 from trevor/reuse-postorder-blockorder
to main
:
Rework
BlockLoweringOrder
construction to use the postorder traversal created by theDominatorTree
. Additionally, construct fewer critical edge splits by identifying edges up-front that will need splitting, and otherwise allowing
regalloc2 to place moves on either the entry or exit edges of a block.<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
elliottt edited PR #5843 from trevor/reuse-postorder-blockorder
to main
:
Rework
BlockLoweringOrder
construction to use the postorder traversal created by theDominatorTree
. Additionally, construct fewer critical edge splits by identifying edges up-front that will need splitting, and otherwise allowing regalloc2 to place moves on either the entry or exit edges of a block.The meat of this change is in the first two commits. The first commit changes the lowering order implementation, while the second propagates the heuristic of processing child nodes in reverse order during postorder traversal of the CFG back to the DominatorTree's implementation. The second change caused a bit of churn in the dominator tree tests.
<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
elliottt edited PR #5843 from trevor/reuse-postorder-blockorder
to main
:
Rework
BlockLoweringOrder
construction to use the postorder traversal created by theDominatorTree
. Additionally, construct fewer critical edge splits by identifying edges up-front that will need splitting, and otherwise allowing regalloc2 to place moves on either the entry or exit edges of a block. Identifying critical edges up-front yields fewer critical edges in many cases than the previous algorithm, and produces lowering orders that usually have fewer blocks in them. This change leads to small but measurable improvements in sightglass benchmarks for compilation on bothbz2
andspidermonkey
wasm.The meat of this change is in the first two commits. The first commit changes the lowering order implementation, while the second propagates the heuristic of processing child nodes in reverse order during postorder traversal of the CFG back to the DominatorTree's implementation. The second change caused a bit of churn in the dominator tree tests.
<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
elliottt updated PR #5843 from trevor/reuse-postorder-blockorder
to main
.
elliottt edited PR #5843 from trevor/reuse-postorder-blockorder
to main
:
Rework
BlockLoweringOrder
construction to use the postorder traversal created by theDominatorTree
. Additionally, construct fewer critical edge splits by identifying edges up-front that will need splitting, and otherwise allowing regalloc2 to place moves on either the entry or exit edges of a block. Identifying critical edges up-front yields fewer critical edges in many cases than the previous algorithm, and produces lowering orders that usually have fewer blocks in them. This change leads to small but measurable improvements in sightglass benchmarks for compilation on bothbz2
andspidermonkey
wasm.The meat of this change is in the first commit, which both changes the block lowering order implementation and propagates a heuristic from the old implementation's postorder construction to the DominatorTree's implementation. This heuristic (reversing the children before pushing them on the work stack) is responsible for much of the test change fallout in this PR, but is important in that without it the new lowering order sees significantly more cache misses on the bz2 benchmark.
<!--Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
elliottt updated PR #5843 from trevor/reuse-postorder-blockorder
to main
.
elliottt edited PR #5843 from trevor/reuse-postorder-blockorder
to main
:
Rework
BlockLoweringOrder
construction to use the postorder traversal created by theDominatorTree
. Additionally, construct fewer critical edge splits by identifying edges up-front that will need splitting, and otherwise allowing regalloc2 to place moves on either the entry or exit edges of a block. Identifying critical edges up-front often yields fewer critical edge splits than the previous algorithm, leading to lowering orders with fewer blocks overall. This change leads to small but measurable improvements in sightglass benchmarks for compilation on bothbz2
andspidermonkey
wasm.The meat of this change is in the first commit, which both changes the block lowering order implementation and propagates a heuristic from the old implementation's postorder construction to the DominatorTree's implementation. This heuristic (reversing the children before pushing them on the work stack) is responsible for much of the test change fallout in this PR, but is important in that without it the new lowering order sees significantly more cache misses on the bz2 benchmark.
<!--Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
cfallin submitted PR review.
cfallin submitted PR review.
cfallin created PR review comment:
This isn't quite the same as the existing logic, though today it evaluates the same: the existing logic is that if there are any fixed args (ie, non-blockparam args), we need to ensure no edge-moves to succs end up in this block. Branch tables with empty tables can result in this, but I'd prefer to encode the actual restriction for robustness (in other words, I want correctness written explicitly rather than an emergent property). Can we take the
f.dfg.insts[inst].opcode().is_branch() && f.dfg.inst_fixed_args(inst).len() > 0
condition and use it here?
cfallin created PR review comment:
Is this the index of the branch, or the index of the target within the single branch that can exist? (I prefer the existing
succ_idx
name for that, if that's what it is)
jameysharp submitted PR review.
jameysharp submitted PR review.
jameysharp created PR review comment:
Ah, we added this
branch_idx
field a while after deleting the previoussucc_idx
field when we discovered that it didn't work otherwise. You're right, the old name is fine, and the comment that went with it would be helpful here:/// The successor index in this edge, to distinguish multiple /// edges between the same block pair. succ_idx: usize,
jameysharp created PR review comment:
I'm sorry the following comment is long. I couldn't figure out how to make it shorter and still clear.
I agree, I prefer capturing correctness conditions explicitly instead of checking things which accidentally match the right conditions. But I'm not sure the existing code does that.
This condition is written in terms of the CLIF instructions, but it's trying to establish an RA2 precondition on the machine instructions. At this point lowering hasn't happened yet, and the backend could do anything, so it's not obvious to me that there's any way to define the true correctness criteria at this point.
But let's say we make assumptions about what a reasonable backend might do. Even then it's not obvious to me that the number of
inst_fixed_args
captures the condition we'd need to be safe against future CLIF extensions. We could add a weird branch instruction with aValueList
, or add some entirely different way to pass values into a branch.We could make the additional assumption that we won't invent CLIF instructions like that in the future. But at that point I don't see much difference between encoding our assumptions here versus encoding them somewhere else—especially if that "somewhere else" is right at the point where we call
lower_branch
, which it occurs to me is a great place to legalize away these special cases while staying close to RA2-related code.Since anything we do at the level of CLIF still can't guarantee that the backend never breaks the rules, at the end of the day I think we still need to rely on RA2's "disallowed branch arg" check and our fuzzing and tests to catch bugs around this.
I can see one other option: defer deciding where to insert edge splits until after we call
lower_branch
and can do the same check RA2 would do on the machine instructions while they're temporarily buffered inself.ir_insts
. I don't love that, but I think it's the only way to be sure we're checking the true correctness condition. And since this PR separates the computation of RPO from the decision about where to insert edge blocks, I think it's at least not impossible.
cfallin submitted PR review.
cfallin created PR review comment:
The third option, just to record it for completeness here, is to make RA2 handle this edge (pun intended) case properly, by considering a branch's uses to come after edge-moves if they're inserted before the branch. I'm not advocating for that by any means, but I just wanted to note that end of the spectrum exists and is, in some manner of thinking, the cleanest :-)
Thanks for writing out the reasoning here, in more depth than I had considered -- given that we have the backstop of the checks in RA2 itself, running during fuzzing, I'm fine with this as-written for now. So, LGTM with just the above
succ_idx
change.
elliottt has marked PR #5843 as ready for review.
elliottt updated PR #5843 from trevor/reuse-postorder-blockorder
to main
.
elliottt updated PR #5843 from trevor/reuse-postorder-blockorder
to main
.
elliottt created PR review comment:
I changed this back to
succ_idx
, and also documented that it represents the index into the order specified byvisit_block_succs
.
elliottt submitted PR review.
elliottt merged PR #5843.
Last updated: Dec 23 2024 at 13:07 UTC