jameysharp opened issue #8529:
Feature
Currently, the egraph optimization pass first generates a bunch of alternative instructions, and then does a separate "elaboration" pass to decide which of those instructions to emit and in what order. Then, during lowering, we make some related decisions about which instructions to emit, but we have different information available.
I propose to delay the work of the elaboration pass until lowering and fuse it into the backwards lowering pass.
Benefit
This is motivated by #7321 where we wanted to move some common patterns from backend-specific lowering rules to the legalization phase, but the elaboration pass undid the legalization steps, producing code which the backend then couldn't compile. By making backend-specific lowering rules decide which alternative to pick from each e-class, we avoid that problem.
Independent of that, I suspect there are benefits to compile time from combining two passes over the CLIF into one.
Also, an e-class represents a proof about certain kinds of dataflow facts. Having those proofs available during instruction selection might allow us to select better instructions sometimes.
Implementation
We'd do this by allowing the pattern-matches in `lower` to iterate over all the equivalent instructions, then take the first combination that makes a rule match. If multiple rules match a particular combination, take the highest-priority rule. When iterating over an e-class we'd sort by best cost first, which ensures we pick the best implementation according to our cost model. Note that if an alternative with worse cost matches a higher-priority rule, this plan means we'll pick a lower-priority rule; that's a little confusing, but I think it's the right choice.
When the egraph pass is disabled, this behaves exactly like current lowering, because then there are no union nodes in the data flow graph, so each e-class is a singleton set.
Lowering already tracks which CLIF values are used by later instructions that have already been lowered, and skips lowering pure instructions whose results are unused. Elaboration has the same effect, except it combines it with instruction scheduling.
To remove the elaboration step we'd need to either:
- add all alternative instructions to the layout during the egraph pass while respecting dominance, or
- allow lowering to add instructions to the layout (which acts as the work-queue for lowering) as it goes, using the rules that elaboration currently implements.
Conceptually, I like option 1 because it would be great to be able to see all the alternatives that the egraph pass generated when we print CLIF for debugging, and it might be nice to separate instruction scheduling into a new pass. But practically I think option 2 is straightforward and more efficient.
Alternatives
I would love to say that the cost model should also be part of lowering, but that is expensive to do precisely, and the dynamic-programming heuristic approximation has to be a forward pass, while lowering and elaboration have to be a backward pass. So, at least for now, the egraph pass should still compute a "best cost" for each value based on a cost model that is much simpler than the full lowering rules.
cc: @alexcrichton @cfallin
cfallin commented on issue #8529:
This would be really neat! I've definitely thought that the backend's lowering rules actually serve as the best sort of cost function (some thoughts in my slides, slide 185 -- I should've filed an issue too, sorry): they naturally encapsulate which units of work we can do as one machine instruction or small group of instructions, and we've even encoded preferences with priorities already. (I suspect that we may actually not have too many of the cost-vs.-priority conflicts you describe -- we tend to write higher-priority rules when we know it really is the best option, i.e., incentives are aligned here -- but that's just a hunch.)
The main hurdle, at least for me, in thinking through this is the pass-direction issue: elaboration is a forward pass, while lowering is backward. The main aspect of elaboration that requires forward-direction processing is the implicit GVN: we need to see a block before its domtree subtree so that any values computed higher in the domtree are in the scoped hashmap for subtree blocks. That, combined with the multi-choice behavior of e-classes (if we no longer extract first), feels like it might encode some nonlocal optimization problem, but I'm not sure. In any case, we should see if we can rethink elaboration as a backward pass while preserving GVN.
Last updated: Nov 22 2024 at 16:03 UTC