wasmtime / issue #5843 Reuse the DominatorTree postorder ... · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / issue #5843 Reuse the DominatorTree postorder ...

Wasmtime GitHub notifications bot (Feb 20 2023 at 20:47):

I'll review this in detail tomorrow, but one thing I think would be good to explore / understand (if only because I sometimes find that fully explaining why a problem occurred leads to uncovering other surprises, or bugs, or ... basically, Chesterton's Fence): what cases did the existing code not cover that this does?

I'm somewhat surprised because the existing implementation already was supposed to be creating lowered blocks only when necessary: its Orig / Edge / EdgeAndOrig / OrigAndEdge descriptors allow for merging the edge-block into either pred or succ, and for any non-critical edge (such that either pred or succ has at most one succ or pred, respectively), this merging should be possible. In other words, the problem description here "now we only create edge blocks for critical edges" is surprising to me because we were supposed to already be doing this. Was there a case missing or overly conservative, for example? Or is the new implementation not accounting for something?

I see that tests are failing with RA2 rejecting some lowered code ("disallowed branch arg", which occurs when the CFG forces edge moves to the pred side of an edge but the pred ends in a conditional: in this case, we have nowhere to put the moves for just one side of the conditional), which IMHO means we should be extra-careful in understanding exactly what the difference is and what new cases we are now allowing!

Wasmtime GitHub notifications bot (Feb 20 2023 at 23:27):

elliottt commented on issue #5843:

The RA2 error was from treating br_table with an empty jump table as an unconditional jump. I ported forward the workaround from the previous implementation, which was to mark blocks that end in br_table as having at least 2 outgoing edges.

Wasmtime GitHub notifications bot (Feb 20 2023 at 23:52):

elliottt commented on issue #5843:

There were a few different motivations for this PR, that came out of discussions with @jameysharp, and recent refactorings that get us to a state where we can now assume all branch instructions terminate basic blocks.

Starting with the latter first, I was idly investigating what it would take to have the output of BlockLoweringOrder::succ_indices return a single instruction along with a slice of the indices it branches to. This would allow simplifications in a few other places where we jump through hoops to assert that they are all the same instruction now. This was a pretty daunting refactoring with the previous implementation, as the LoweredBlock type makes use of the branch instruction in many places, and restructuring metadata used during the construction of the postorder traversal resisted multiple attempts by both me and @jameysharp.

As to the former, @jameysharp and I had been discussing how it would be nice to reuse the postorder traversal that's cached by the DominatorTree to drive the block lowering order. This is nice for a few reasons: it's not unnecessarily recomputing the order which while cheap to compute is not nothing; it allows us to specify explicitly that we only split critical edges, which was easier for me to think about than the state machine in the current implementation.

Additionally this implementation allows us to avoid splitting edges for unconditional jumps leaving one block but targeting another that has multiple inputs, which in some cases does produce lowering orders with fewer critical edge nodes. One concrete case is the diamond from the test suite: an entry with a brif whose target blocks both branch to the same exit node. Previously that would introduce two edge blocks on the unconditional branches to the exit block, but with this implementation those introduced nodes are not necessary. Investigating how the LoweredBlock values are used by the RA2 interface is what led @jameysharp and I to believe that this would be a reasonable change to make, as it treats all LoweredBlock cases the same unless they are the pure Edge case.

Wasmtime GitHub notifications bot (Feb 21 2023 at 00:23):

cfallin commented on issue #5843:

One concrete case is the diamond from the test suite: an entry with a brif whose target blocks both branch to the same exit node. Previously that would introduce two edge blocks on the unconditional branches to the exit block, but with this implementation those introduced nodes are not necessary.

Hmm... so, as silly as it may seem in the minimal example, in general we do actually need separate edge blocks here, because of blockparams. Specifically, imagine that each edge from A to B assigned a different list of values to the blockparams in B: then RA2 needs a location to do that permutation for each edge. This is why the definition of edge-criticality in the existing code has to do with number of successors and predecessors, without collapsing.

Wasmtime GitHub notifications bot (Feb 21 2023 at 00:33):

elliottt commented on issue #5843:

One concrete case is the diamond from the test suite: an entry with a brif whose target blocks both branch to the same exit node. Previously that would introduce two edge blocks on the unconditional branches to the exit block, but with this implementation those introduced nodes are not necessary.

Hmm... so, as silly as it may seem in the minimal example, in general we _do_ actually need separate edge blocks here, because of blockparams. Specifically, imagine that each edge from A to B assigned a different list of values to the blockparams in B: then RA2 needs a location to do that permutation for each edge. This is why the definition of edge-criticality in the existing code has to do with number of successors and predecessors, without collapsing.

I think that's still easier to implement in the context of these changes, as we're currently adding an edge block only for critical edges; it would be really easy to rephrase this to add an edge block when there's a critical edge, or the successor block has multiple predecessors.

Wasmtime GitHub notifications bot (Feb 21 2023 at 01:35):

cfallin commented on issue #5843:

I think that's still easier to implement in the context of these changes, as we're currently adding an edge block only for critical edges; it would be really easy to rephrase this to add an edge block when there's a critical edge, or the successor block has multiple predecessors.

That's possibly equivalent to the current code, but phrased in a (to me) more confusing way. Given a CFG with two edges from A to B, these edges are critical edges; no need to distinguish the special case with multiple preds. Strictly speaking I guess one needs to see the CFG as a hypergraph (every edge has an identity) -- and this is why the edges use branch instruction IDs to disambiguate.

That suggests a different fix to me: why not start with the current code, but use successor-index, rather than branch instruction, to disambiguate edges?

Given the above fix, I suspect this PR would produce exactly equivalent code to the current code. I'm interested to see whether this is actually the case though.

Wasmtime GitHub notifications bot (Feb 21 2023 at 21:02):

jameysharp commented on issue #5843:

I'd really like to drop the "Treat br_table with an empty jump table as multiple block exits" workaround. The problem it's dealing with is that the input to such instructions is a register use that RA2 would have to account for when emitting moves between blocks. (I didn't understand that when looking at this with Trevor last week.) However in the only case where this can occur, that use is effectively dead: it can't have any effect on the execution of the program. I don't know if we should just make it illegal to have a br_table with an empty jump table (you can always replace it with a jump), or do a legalization pass at some point, but I don't like kludging the block-lowering order as a "solution".

Regarding the edge-splits that this PR removes compared to the existing implementation: The current implementation supports fusing EdgeAndOrig or OrigAndEdge pairs, but if both fusions are legal it picks only one instead of emitting something like EdgeAndOrigAndEdge. That's the case this PR improves on. This is mentioned in the module comment:
//! (note that the edges into CLIF blocks 1 and 2 could be merged with those
//! blocks' original bodies, but the out-edges could not because for simplicity
//! in the successor-function definition, we only ever merge an edge onto one
//! side of an original CLIF block.)
The key observation, I think, is that RA2 can't actually distinguish between Orig, EdgeOrig, and OrigEdge blocks. The only two kinds of blocks we hand it in machinst/lower.rs are either pure critical-edge splitting blocks without any original instructions in them, or original blocks where RA2 may safely add moves to the beginning and end as needed. This PR makes that distinction explicit.

I think you two are talking past each other regarding which edges are critical.

I think the case Chris is talking about is if there are only two blocks (A and B), and A has two edges to B. In that case, yes, both of those edges are critical: they're both edges from a block with multiple successors and to a block with multiple predecessors. As you pointed out, the two edges may have different block parameters so we can't in general place the appropriate moves in either A or B. Those predecessors and successors aren't distinct if we look only at block IDs, but they are distinct edges, and that's what this PR looks at.

The case which Trevor is talking about is when there are four blocks (A, B, C, D), with a conditional branch from A to either B or C, and then B and C both unconditionally branch to D. In that case there are no critical edges, even with block parameters. All of the moves can be attached to block B or C, because they each have exactly one predecessor and one successor. The existing implementation inserts unnecessary edge splits in this case.

I believe this PR already captures the definition that you intended, Chris. It's much simpler than the existing implementation, in my opinion, because it only needs to iterate over all the edges and decide whether each edge is critical or not. That decision is simple after counting all the predecessors and successors of each block.

The only tricky bit in this PR, I think, is in deciding where to insert the edge-splitting blocks in the pre-computed RPO. This PR places them immediately after their predecessors, if I remember correctly, but several other choices would be equally reasonable, I think. There's no "good" choice, as far as I can figure, but this is one choice that I think still produces a valid RPO, which is a reasonable heuristic.

Last updated: Apr 17 2025 at 10:03 UTC