fitzgen requested cfallin for a review on PR #13267.
fitzgen requested wasmtime-compiler-reviewers for a review on PR #13267.
fitzgen requested wasmtime-core-reviewers for a review on PR #13267.
fitzgen opened PR #13267 from fitzgen:cranelift-simplify-block-terminators to bytecodealliance:main:
This requires two tweaks to the egraph pass's machinery:
We maintain a set of reachable blocks. This is initialized to the entry block. After optimizing a block's intructions, we look at the terminator and mark any other block it branches to as reachable. When visiting blocks to optimize their instructions, we skip unreachable blocks and remove them and their instructions from the function's layout.
We always visit blocks in reverse-post order, rather than dominator-tree order (which is related but slightly weaker). This ensures that we always visit predecessors before successors, and therefore that if a block is not marked reachable, then it really is unreachable and is safe to remove. See code comments for the details of when dominator-tree order breaks down.
<!--
Please make sure you include the following information:
If this work has been discussed elsewhere, please include a link to that
conversation. If it was discussed in an issue, just mention "issue #...".Explain why this change is needed. If the details are in an issue already,
this can be brief.Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.htmlPlease ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->
fitzgen updated PR #13267.
:speech_balloon: cfallin created PR review comment:
In all the new tests we have blockcalls with no args; can we ensure that they all have nontrivial (and different-per-path) block args, so that we verify that e.g. when we fold to the else-branch, we get those args and not the first ones?
:thumbs_up: cfallin submitted PR review:
Nice -- I'm happy to see how this is actually pretty straightforward given all of the primitives that we have! The RPO traversal change makes sense; just a comment below on how we describe/justify it. And a small request for testing. Otherwise LGTM!
:speech_balloon: cfallin created PR review comment:
Can we clarify that RPO is a domtree preorder? (There are many domtree preorders as the order of children is unspecified; there are also many RPOs; but the set of all possible RPOs is a subset of the set of all possible domtree preorders)
This is not just academic but important for correctness, as the domtree preorder visit strategy is needed for the scoped-hashmap GVN to work correctly. So I'd rather describe it as "reverse postorder, which is a valid domtree preorder (required for correctness) that also ensures some direct predecessor of a reachable block is always visited before that block."
:memo: fitzgen submitted PR review.
:speech_balloon: fitzgen created PR review comment:
Can we clarify that RPO _is a_ domtree preorder?
Hm, this is what I initially assumed, but actually isn't quite right.
Consider:
block0(v0: i32, v1: i32): brif v0, block1, block3 block1: brif v1, block2, block3 block2: ... block3: ...RPOs:
- 0, 1, 3, 2
- 0, 1, 2, 3
Dominator tree:
- 0
- 1
- 2
- 3
Dominator-tree orderings:
- 0, 1, 2, 3
- 0, 3, 1, 2
So not all RPOs are dominator-tree orderings and vice versa.
Need to think about whether there is always some RPO that is also a dominator-tree ordering, and, if so, how we can compute it cheaply. I think the spanning tree inside the dominator tree might be able to help us out here. Need to think a bit.
:memo: cfallin submitted PR review.
:speech_balloon: cfallin created PR review comment:
So for that domtree,
0, 1, 3, 2is a valid domtree preorder: domtree preorder just means that a node is visited before its children, and0, 1, 3, 2satisfies that condition for all edges in the domtree (0->1: yes; 1->2: yes; 0->3: yes). The children don't need to be ordered contiguously (i.e. domtree preorder doesn't prescribe DFS or BFS).Here's a proof that all RPOs are domtree preorders:
- Consider any pair of blocks A and B in a CFG such that A dom B.
- Domtree preorder requires that A come before B in the sequence.
- RPO generates a sequence by doing a DFS from entry. This DFS will necessarily, under any path choice, visit A before B (by definition of dominance).
- Thus postorder will emit B before A. Thus reverse postorder will emit A before B.
- Requirements for domtree preorder are met.
The reverse is not true.
:memo: fitzgen submitted PR review.
:speech_balloon: fitzgen created PR review comment:
I guess 0132 is a valid preorder if you consider BFS, but the problem is that BFS doesn't allow us to properly reuse scopes in the GVN map, unless I am misunderstanding.
:repeat: cfallin submitted PR review.
:speech_balloon: cfallin created PR review comment:
Wait, hang on -- that is then a reason to avoid the approach in this PR.
Stated another way: the hard requirement for GVN to work is that we visit dominating blocks before dominated ones; that's "domtree preorder" (loose definition / no order preference). That was my initial point above -- this PR is still correct because the RPO we're doing is in fact still a domtree preorder. But it's not a DFS domtree preorder. Given that, I actually am having trouble thinking through the scope-popping below: how do we know that it will visit the dom-subtree in the right order (i.e. pick an RPO that is a DFS domtree preorder)?
:memo: fitzgen submitted PR review.
:speech_balloon: fitzgen created PR review comment:
Yes, this is exactly what I'm getting at / trying to resolve. Sorry for not being more clear.
:memo: cfallin submitted PR review.
:speech_balloon: cfallin created PR review comment:
Got it -- I'll chew on this some more too. Very subtle!
github-actions[bot] added the label cranelift on PR #13267.
github-actions[bot] added the label isle on PR #13267.
github-actions[bot] commented on PR #13267:
Subscribe to Label Action
cc @cfallin, @fitzgen
<details>
This issue or pull request has been labeled: "cranelift", "isle"Thus the following users have been cc'd because of the following labels:
- cfallin: isle
- fitzgen: isle
To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.
Learn more.
</details>
fitzgen updated PR #13267.
:memo: fitzgen submitted PR review.
:speech_balloon: fitzgen created PR review comment:
Fixed the traversal ordering in 6cfa3c70a6
:memo: fitzgen submitted PR review.
:speech_balloon: fitzgen created PR review comment:
Fixed in f9f5a488b8
fitzgen requested cfallin for a review on PR #13267.
:thumbs_up: cfallin submitted PR review:
Looks great -- thanks for the very detailed proof!
:speech_balloon: cfallin created PR review comment:
OK so after thinking about this more I have two conclusions:
- I think we could potentially come up with RPO that is also a DFS domtree preorder and thus satisfies the requirement here; BUT
that's a lot of complexity and subtlety required for correctness. Is there a reason other than "it's nice to optimize multiple things at once" that we have to skip unreachable blocks here?
(Merely processing the block shouldn't cause side-effects that alter reachable code, I think; even code-motion like LICM doesn't happen until we elaborate, so if we never elab an unreachable subset of the skeleton, I think we're OK. And we can compute reachability separately with a fairly cheap DFS over blocks and their terminators before elaboration; of course we should measure that compile-time cost but hopefully it's small...)
cfallin commented on PR #13267:
(Merge conflict to resolve before this can merge)
:memo: fitzgen submitted PR review.
:speech_balloon: fitzgen created PR review comment:
- that's a lot of complexity and subtlety required for correctness. Is there a reason other than "it's nice to optimize multiple things at once" that we _have_ to skip unreachable blocks here?
(Merely processing the block shouldn't cause side-effects that alter reachable code, I think; even code-motion like LICM doesn't happen until we elaborate, so if we never elab an unreachable subset of the skeleton, I think we're OK. And we can compute reachability separately with a fairly cheap DFS over blocks and their terminators before elaboration; of course we should measure that compile-time cost but hopefully it's small...)(Already addressed and approved, but I missed this comment, so replying now for posterity)
We indeed do not have to skip unreachable blocks, but it is pretty nice to avoid spending time optimizing instructions that can't be reached.
Regarding the subtlety around correctness: I believe we have gotten a pretty good handle on it by factoring out the traversal iterator into its own type (and documented the necessary invariants pretty well in its doc comments), so I'm not really concerned about this subtlety anymore. And I think (hope) you also agree with those statements given your PR approval -- if not, then let's talk more!
:memo: cfallin submitted PR review.
:speech_balloon: cfallin created PR review comment:
Yep, that comment predates the latest review, and I'm generally happy with it now. Thanks!
fitzgen updated PR #13267.
fitzgen has enabled auto merge for PR #13267.
fitzgen updated PR #13267.
fitzgen added PR #13267 Cranelift: Rewrite conditional branches with constant conditions into unconditional jumps to the merge queue.
:check: fitzgen merged PR #13267.
fitzgen removed PR #13267 Cranelift: Rewrite conditional branches with constant conditions into unconditional jumps from the merge queue.
Last updated: Jun 01 2026 at 09:49 UTC