wasmtime / PR #5843 Reuse the DominatorTree postorder tra... · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / PR #5843 Reuse the DominatorTree postorder tra...

Wasmtime GitHub notifications bot (Feb 20 2023 at 20:02):

elliottt opened PR #5843 from trevor/reuse-postorder-blockorder to main:

Rework BlockLoweringOrder construction to use the postorder traversal created
by the DominatorTree. Additionally, construct fewer critical edge splits by
identifying edges up-front that will need splitting, and otherwise allowing
regalloc2 to place moves on either the entry or exit edges of a block.

Wasmtime GitHub notifications bot (Feb 20 2023 at 20:03):

elliottt edited PR #5843 from trevor/reuse-postorder-blockorder to main:

Rework BlockLoweringOrder construction to use the postorder traversal created by the DominatorTree. Additionally, construct fewer critical edge splits by identifying edges up-front that will need splitting, and otherwise allowing
regalloc2 to place moves on either the entry or exit edges of a block.

Wasmtime GitHub notifications bot (Feb 20 2023 at 20:05):

elliottt edited PR #5843 from trevor/reuse-postorder-blockorder to main:

Rework BlockLoweringOrder construction to use the postorder traversal created by the DominatorTree. Additionally, construct fewer critical edge splits by identifying edges up-front that will need splitting, and otherwise allowing regalloc2 to place moves on either the entry or exit edges of a block.

The meat of this change is in the first two commits. The first commit changes the lowering order implementation, while the second propagates the heuristic of processing child nodes in reverse order during postorder traversal of the CFG back to the DominatorTree's implementation. The second change caused a bit of churn in the dominator tree tests.

Wasmtime GitHub notifications bot (Feb 20 2023 at 20:08):

elliottt edited PR #5843 from trevor/reuse-postorder-blockorder to main:

Rework BlockLoweringOrder construction to use the postorder traversal created by the DominatorTree. Additionally, construct fewer critical edge splits by identifying edges up-front that will need splitting, and otherwise allowing regalloc2 to place moves on either the entry or exit edges of a block. Identifying critical edges up-front yields fewer critical edges in many cases than the previous algorithm, and produces lowering orders that usually have fewer blocks in them. This change leads to small but measurable improvements in sightglass benchmarks for compilation on both bz2 and spidermonkey wasm.

The meat of this change is in the first two commits. The first commit changes the lowering order implementation, while the second propagates the heuristic of processing child nodes in reverse order during postorder traversal of the CFG back to the DominatorTree's implementation. The second change caused a bit of churn in the dominator tree tests.

Wasmtime GitHub notifications bot (Feb 20 2023 at 20:14):

elliottt updated PR #5843 from trevor/reuse-postorder-blockorder to main.

Wasmtime GitHub notifications bot (Feb 20 2023 at 20:16):

elliottt edited PR #5843 from trevor/reuse-postorder-blockorder to main:

Rework BlockLoweringOrder construction to use the postorder traversal created by the DominatorTree. Additionally, construct fewer critical edge splits by identifying edges up-front that will need splitting, and otherwise allowing regalloc2 to place moves on either the entry or exit edges of a block. Identifying critical edges up-front yields fewer critical edges in many cases than the previous algorithm, and produces lowering orders that usually have fewer blocks in them. This change leads to small but measurable improvements in sightglass benchmarks for compilation on both bz2 and spidermonkey wasm.

The meat of this change is in the first commit, which both changes the block lowering order implementation and propagates a heuristic from the old implementation's postorder construction to the DominatorTree's implementation. This heuristic (reversing the children before pushing them on the work stack) is responsible for much of the test change fallout in this PR, but is important in that without it the new lowering order sees significantly more cache misses on the bz2 benchmark.

Wasmtime GitHub notifications bot (Feb 20 2023 at 23:18):

elliottt updated PR #5843 from trevor/reuse-postorder-blockorder to main.

Wasmtime GitHub notifications bot (Feb 20 2023 at 23:54):

elliottt edited PR #5843 from trevor/reuse-postorder-blockorder to main:

Rework BlockLoweringOrder construction to use the postorder traversal created by the DominatorTree. Additionally, construct fewer critical edge splits by identifying edges up-front that will need splitting, and otherwise allowing regalloc2 to place moves on either the entry or exit edges of a block. Identifying critical edges up-front often yields fewer critical edge splits than the previous algorithm, leading to lowering orders with fewer blocks overall. This change leads to small but measurable improvements in sightglass benchmarks for compilation on both bz2 and spidermonkey wasm.

The meat of this change is in the first commit, which both changes the block lowering order implementation and propagates a heuristic from the old implementation's postorder construction to the DominatorTree's implementation. This heuristic (reversing the children before pushing them on the work stack) is responsible for much of the test change fallout in this PR, but is important in that without it the new lowering order sees significantly more cache misses on the bz2 benchmark.

Wasmtime GitHub notifications bot (Feb 21 2023 at 21:55):

cfallin submitted PR review.

Wasmtime GitHub notifications bot (Feb 21 2023 at 21:55):

cfallin submitted PR review.

Wasmtime GitHub notifications bot (Feb 21 2023 at 21:55):

cfallin created PR review comment:

This isn't quite the same as the existing logic, though today it evaluates the same: the existing logic is that if there are any fixed args (ie, non-blockparam args), we need to ensure no edge-moves to succs end up in this block. Branch tables with empty tables can result in this, but I'd prefer to encode the actual restriction for robustness (in other words, I want correctness written explicitly rather than an emergent property). Can we take the f.dfg.insts[inst].opcode().is_branch() && f.dfg.inst_fixed_args(inst).len() > 0 condition and use it here?

Wasmtime GitHub notifications bot (Feb 21 2023 at 21:55):

cfallin created PR review comment:

Is this the index of the branch, or the index of the target within the single branch that can exist? (I prefer the existing succ_idx name for that, if that's what it is)

Wasmtime GitHub notifications bot (Feb 22 2023 at 02:03):

jameysharp submitted PR review.

Wasmtime GitHub notifications bot (Feb 22 2023 at 02:03):

jameysharp submitted PR review.

Wasmtime GitHub notifications bot (Feb 22 2023 at 02:03):

jameysharp created PR review comment:

Ah, we added this branch_idx field a while after deleting the previous succ_idx field when we discovered that it didn't work otherwise. You're right, the old name is fine, and the comment that went with it would be helpful here:
        /// The successor index in this edge, to distinguish multiple
        /// edges between the same block pair.
        succ_idx: usize,

Wasmtime GitHub notifications bot (Feb 22 2023 at 02:03):

jameysharp created PR review comment:

I'm sorry the following comment is long. I couldn't figure out how to make it shorter and still clear.

I agree, I prefer capturing correctness conditions explicitly instead of checking things which accidentally match the right conditions. But I'm not sure the existing code does that.

This condition is written in terms of the CLIF instructions, but it's trying to establish an RA2 precondition on the machine instructions. At this point lowering hasn't happened yet, and the backend could do anything, so it's not obvious to me that there's any way to define the true correctness criteria at this point.

But let's say we make assumptions about what a reasonable backend might do. Even then it's not obvious to me that the number of inst_fixed_args captures the condition we'd need to be safe against future CLIF extensions. We could add a weird branch instruction with a ValueList, or add some entirely different way to pass values into a branch.

We could make the additional assumption that we won't invent CLIF instructions like that in the future. But at that point I don't see much difference between encoding our assumptions here versus encoding them somewhere else—especially if that "somewhere else" is right at the point where we call lower_branch, which it occurs to me is a great place to legalize away these special cases while staying close to RA2-related code.

Since anything we do at the level of CLIF still can't guarantee that the backend never breaks the rules, at the end of the day I think we still need to rely on RA2's "disallowed branch arg" check and our fuzzing and tests to catch bugs around this.

I can see one other option: defer deciding where to insert edge splits until after we call lower_branch and can do the same check RA2 would do on the machine instructions while they're temporarily buffered in self.ir_insts. I don't love that, but I think it's the only way to be sure we're checking the true correctness condition. And since this PR separates the computation of RPO from the decision about where to insert edge blocks, I think it's at least not impossible.

Wasmtime GitHub notifications bot (Feb 22 2023 at 02:09):

cfallin submitted PR review.

Wasmtime GitHub notifications bot (Feb 22 2023 at 02:09):

cfallin created PR review comment:

The third option, just to record it for completeness here, is to make RA2 handle this edge (pun intended) case properly, by considering a branch's uses to come after edge-moves if they're inserted before the branch. I'm not advocating for that by any means, but I just wanted to note that end of the spectrum exists and is, in some manner of thinking, the cleanest :-)

Thanks for writing out the reasoning here, in more depth than I had considered -- given that we have the backstop of the checks in RA2 itself, running during fuzzing, I'm fine with this as-written for now. So, LGTM with just the above succ_idx change.

Wasmtime GitHub notifications bot (Feb 22 2023 at 16:28):

elliottt has marked PR #5843 as ready for review.

Wasmtime GitHub notifications bot (Feb 23 2023 at 21:12):

elliottt updated PR #5843 from trevor/reuse-postorder-blockorder to main.

Wasmtime GitHub notifications bot (Feb 23 2023 at 21:19):

elliottt updated PR #5843 from trevor/reuse-postorder-blockorder to main.

Wasmtime GitHub notifications bot (Feb 23 2023 at 22:00):

elliottt created PR review comment:

I changed this back to succ_idx, and also documented that it represents the index into the order specified by visit_block_succs.

Wasmtime GitHub notifications bot (Feb 23 2023 at 22:00):

elliottt submitted PR review.

Wasmtime GitHub notifications bot (Feb 23 2023 at 22:49):

elliottt merged PR #5843.

Last updated: Apr 17 2025 at 17:03 UTC