jameysharp opened issue #6106:
Feature
We should allow the egraph pass to rewrite instructions which have no result values.
This resembles #5908, in that no-result instructions have side effects and that currently prevents us from optimizing them. However the implementation details are quite different.
Benefit
Our current ISLE
simplify
term isn't usable for this because it can only rewrite an SSA value into other SSA values, so instructions without any value results have nothing to replace.The only rewrites which need to be in the egraph pass are those which either pattern-match on SSA values or replace SSA values. But instructions which don't produce value results can still be in both of those categories. For example, if we pattern match the condition operand to
brif
and find that it's a constant, we can rewrite it to an unconditionaljump
. Removing the unreachable edge from the CFG can then mean that some block parameter is always equivalent to another SSA value, enabling more simplification rules.The biggest benefits only come in combination with other optimizations that we haven't written yet. Notably, we need to integrate the "remove constant phis" pass into the egraph pass before branch-folding will change any later optimization results.
Implementation
Which no-result instructions can benefit from rewriting? I claim it's the ones which have no fixed value results, but have at least one fixed value operand. They could have variable operands, variable results, or block-call parameters, but we won't touch any of those.
A rewrite pattern is always based on knowing something about the meaning of a value operand. For variable operands like the function parameters in a
call
, the return values in areturn
, or the block parameters in ajump
, there isn't any generic pattern we can apply to them because we don't know what they mean. Put another way, we don't need any opcode-specific handling for instructions where all we can do is update their value operands to reflect other rewrites.Among those instructions, we also need to exclude store instructions, because we need alias analysis to rewrite those correctly.
The current instructions which meet these criteria are:
- Brif
- BrTable
- Trapz
- Trapnz
- ResumableTrapnz
- CallIndirect
- ReturnCallIndirect
- SetPinnedReg
Conditional branches, either as block terminators or conditional traps, are definitely interesting to rewrite.
Brif
can turn intoJump
;BrTable
can turn into eitherBrif
orJump
;- and
Trapz
/Trapnz
can turn into eitherTrap
orNop
.
Rewriting traps resembles #6080 but in this case we're already replacing a no-result instruction so the mechanism looks pretty different.Indirect calls are interesting to rewrite if we can prove that the call target is a constant.
I don't think SetPinnedReg has any useful rewrites based on the value operand, because it doesn't have any defined meaning in CLIF. It's just however the frontend wants to use it.
Alternatives
There are some implementation choices we haven't decided on.
I'd like to do this in ISLE rules because we may want to pattern-match arbitrary subtrees of the data-flow graph. However @cfallin has argued for doing this in pure Rust until we have good reason to implement rewrite rules that are complicated enough to justify hooking up ISLE.
We might find we have several alternative instructions to choose from, like we currently do for the
simplify
term. We could:
- Save a list of alternative instructions during the equality saturation phase, but that makes it difficult to do further optimizations based on simplifying the CFG.
- Defer rewriting these instructions until elaboration when we have the cost of every operand, but then it's not just difficult but impossible to make use of CFG simplifications during equality saturation.
- Compute instruction costs incrementally during equality saturation so we can pick the best option immediately and optimize the CFG based on the result. @cfallin says that's possible but had reasons for not doing it.
- Require these rewrite rules to pick a single alternative based on a static cost model. This is certainly easiest and may be good enough.
jameysharp commented on issue #6106:
Follow-on work for this would ideally update the CFG after each control-flow rewrite, including updating the post-order and dom-tree. This isn't necessary initially though.
I believe the post-order can be updated incrementally as long as edges are only removed from the CFG, never added. I think the post-order remains valid without any changes unless blocks become unreachable. If they do, removing all the unreachable blocks from the post-order while preserving the order of the remaining blocks should produce a new valid post-order.
I think that when CFG edges are only removed, a block can only move down the dom-tree. I think only the blocks which were the target of a removed edge can have their immediate dominators change. And I have a gut feeling that as long as the post-order is accurate, recomputing the immediate dominator of each block is a local operation that doesn't require re-doing the fixpoint.
I'm not especially confident about any of the above though.
Meanwhile, here are some examples of valid rewrites which are more complicated than just constant-propagation. I don't know if any of these specific patterns occur much or at all. It's just that since I can think of a bunch pretty quickly I'm inclined to guess that we will find patterns of this level of complexity in real code.
For
brif
,trapz
, ortrapnz
:
- when controlled by an
icmp
against zero, we can remove theicmp
in many cases- when controlled by a
bor
orumax
against a non-zero value, orsmax
against a positive value, orsmin
against a negative value, we can treat the condition as always non-zeroFor
br_table
:
- when controlled by a
select
where both the true and false cases are constants, we can replace withbrif
controlled by the same input as the originalselect
- when controlled by a
umin
with a constant, we can make its jump-table length equal the constant, and remove theumin
- when controlled by an
icmp
, we can replace withbrif
since the result oficmp
is either 0 or 1- when controlled by a
urem
with a constant (or an equivalentband
), we can truncate its jump-table- when controlled by an
isub
with a small constant N, we can insert N copies of the default target at the beginning of the jump-table and remove theisub
- when controlled by an
ishl
with a constant shift N, we can remove jump-table targets which aren't a multiple of 2^N and replace theishl
with aband
- when controlled by a
umul
with a constant and we can prove the multiply doesn't overflow, we can remove jump-table targets which aren't a multiple of the constant and remove theumul
In many of these cases we can remove pure instructions, but only because of the context in which they're used, not because they're inherently equivalent to anything else. So purely value-based equality saturation can't implement these rules.
Many of these
br_table
rules can produce new instructions which can match other rules. Some, like theisub
rule, may not always be improvements but may expose other opportunities which are. In short, these look a lot like the sorts of cases we want equality saturation for in the first place.Some of these examples could be subsumed by either value range analysis or known-bits analysis, but not all.
jameysharp labeled issue #6106:
Feature
We should allow the egraph pass to rewrite instructions which have no result values.
This resembles #5908, in that no-result instructions have side effects and that currently prevents us from optimizing them. However the implementation details are quite different.
Benefit
Our current ISLE
simplify
term isn't usable for this because it can only rewrite an SSA value into other SSA values, so instructions without any value results have nothing to replace.The only rewrites which need to be in the egraph pass are those which either pattern-match on SSA values or replace SSA values. But instructions which don't produce value results can still be in both of those categories. For example, if we pattern match the condition operand to
brif
and find that it's a constant, we can rewrite it to an unconditionaljump
. Removing the unreachable edge from the CFG can then mean that some block parameter is always equivalent to another SSA value, enabling more simplification rules.The biggest benefits only come in combination with other optimizations that we haven't written yet. Notably, we need to integrate the "remove constant phis" pass into the egraph pass before branch-folding will change any later optimization results.
Implementation
Which no-result instructions can benefit from rewriting? I claim it's the ones which have no fixed value results, but have at least one fixed value operand. They could have variable operands, variable results, or block-call parameters, but we won't touch any of those.
A rewrite pattern is always based on knowing something about the meaning of a value operand. For variable operands like the function parameters in a
call
, the return values in areturn
, or the block parameters in ajump
, there isn't any generic pattern we can apply to them because we don't know what they mean. Put another way, we don't need any opcode-specific handling for instructions where all we can do is update their value operands to reflect other rewrites.Among those instructions, we also need to exclude store instructions, because we need alias analysis to rewrite those correctly.
The current instructions which meet these criteria are:
- Brif
- BrTable
- Trapz
- Trapnz
- ResumableTrapnz
- CallIndirect
- ReturnCallIndirect
- SetPinnedReg
Conditional branches, either as block terminators or conditional traps, are definitely interesting to rewrite.
Brif
can turn intoJump
;BrTable
can turn into eitherBrif
orJump
;- and
Trapz
/Trapnz
can turn into eitherTrap
orNop
.
Rewriting traps resembles #6080 but in this case we're already replacing a no-result instruction so the mechanism looks pretty different.Indirect calls are interesting to rewrite if we can prove that the call target is a constant.
I don't think SetPinnedReg has any useful rewrites based on the value operand, because it doesn't have any defined meaning in CLIF. It's just however the frontend wants to use it.
Alternatives
There are some implementation choices we haven't decided on.
I'd like to do this in ISLE rules because we may want to pattern-match arbitrary subtrees of the data-flow graph. However @cfallin has argued for doing this in pure Rust until we have good reason to implement rewrite rules that are complicated enough to justify hooking up ISLE.
We might find we have several alternative instructions to choose from, like we currently do for the
simplify
term. We could:
- Save a list of alternative instructions during the equality saturation phase, but that makes it difficult to do further optimizations based on simplifying the CFG.
- Defer rewriting these instructions until elaboration when we have the cost of every operand, but then it's not just difficult but impossible to make use of CFG simplifications during equality saturation.
- Compute instruction costs incrementally during equality saturation so we can pick the best option immediately and optimize the CFG based on the result. @cfallin says that's possible but had reasons for not doing it.
- Require these rewrite rules to pick a single alternative based on a static cost model. This is certainly easiest and may be good enough.
jameysharp labeled issue #6106:
Feature
We should allow the egraph pass to rewrite instructions which have no result values.
This resembles #5908, in that no-result instructions have side effects and that currently prevents us from optimizing them. However the implementation details are quite different.
Benefit
Our current ISLE
simplify
term isn't usable for this because it can only rewrite an SSA value into other SSA values, so instructions without any value results have nothing to replace.The only rewrites which need to be in the egraph pass are those which either pattern-match on SSA values or replace SSA values. But instructions which don't produce value results can still be in both of those categories. For example, if we pattern match the condition operand to
brif
and find that it's a constant, we can rewrite it to an unconditionaljump
. Removing the unreachable edge from the CFG can then mean that some block parameter is always equivalent to another SSA value, enabling more simplification rules.The biggest benefits only come in combination with other optimizations that we haven't written yet. Notably, we need to integrate the "remove constant phis" pass into the egraph pass before branch-folding will change any later optimization results.
Implementation
Which no-result instructions can benefit from rewriting? I claim it's the ones which have no fixed value results, but have at least one fixed value operand. They could have variable operands, variable results, or block-call parameters, but we won't touch any of those.
A rewrite pattern is always based on knowing something about the meaning of a value operand. For variable operands like the function parameters in a
call
, the return values in areturn
, or the block parameters in ajump
, there isn't any generic pattern we can apply to them because we don't know what they mean. Put another way, we don't need any opcode-specific handling for instructions where all we can do is update their value operands to reflect other rewrites.Among those instructions, we also need to exclude store instructions, because we need alias analysis to rewrite those correctly.
The current instructions which meet these criteria are:
- Brif
- BrTable
- Trapz
- Trapnz
- ResumableTrapnz
- CallIndirect
- ReturnCallIndirect
- SetPinnedReg
Conditional branches, either as block terminators or conditional traps, are definitely interesting to rewrite.
Brif
can turn intoJump
;BrTable
can turn into eitherBrif
orJump
;- and
Trapz
/Trapnz
can turn into eitherTrap
orNop
.
Rewriting traps resembles #6080 but in this case we're already replacing a no-result instruction so the mechanism looks pretty different.Indirect calls are interesting to rewrite if we can prove that the call target is a constant.
I don't think SetPinnedReg has any useful rewrites based on the value operand, because it doesn't have any defined meaning in CLIF. It's just however the frontend wants to use it.
Alternatives
There are some implementation choices we haven't decided on.
I'd like to do this in ISLE rules because we may want to pattern-match arbitrary subtrees of the data-flow graph. However @cfallin has argued for doing this in pure Rust until we have good reason to implement rewrite rules that are complicated enough to justify hooking up ISLE.
We might find we have several alternative instructions to choose from, like we currently do for the
simplify
term. We could:
- Save a list of alternative instructions during the equality saturation phase, but that makes it difficult to do further optimizations based on simplifying the CFG.
- Defer rewriting these instructions until elaboration when we have the cost of every operand, but then it's not just difficult but impossible to make use of CFG simplifications during equality saturation.
- Compute instruction costs incrementally during equality saturation so we can pick the best option immediately and optimize the CFG based on the result. @cfallin says that's possible but had reasons for not doing it.
- Require these rewrite rules to pick a single alternative based on a static cost model. This is certainly easiest and may be good enough.
jameysharp labeled issue #6106:
Feature
We should allow the egraph pass to rewrite instructions which have no result values.
This resembles #5908, in that no-result instructions have side effects and that currently prevents us from optimizing them. However the implementation details are quite different.
Benefit
Our current ISLE
simplify
term isn't usable for this because it can only rewrite an SSA value into other SSA values, so instructions without any value results have nothing to replace.The only rewrites which need to be in the egraph pass are those which either pattern-match on SSA values or replace SSA values. But instructions which don't produce value results can still be in both of those categories. For example, if we pattern match the condition operand to
brif
and find that it's a constant, we can rewrite it to an unconditionaljump
. Removing the unreachable edge from the CFG can then mean that some block parameter is always equivalent to another SSA value, enabling more simplification rules.The biggest benefits only come in combination with other optimizations that we haven't written yet. Notably, we need to integrate the "remove constant phis" pass into the egraph pass before branch-folding will change any later optimization results.
Implementation
Which no-result instructions can benefit from rewriting? I claim it's the ones which have no fixed value results, but have at least one fixed value operand. They could have variable operands, variable results, or block-call parameters, but we won't touch any of those.
A rewrite pattern is always based on knowing something about the meaning of a value operand. For variable operands like the function parameters in a
call
, the return values in areturn
, or the block parameters in ajump
, there isn't any generic pattern we can apply to them because we don't know what they mean. Put another way, we don't need any opcode-specific handling for instructions where all we can do is update their value operands to reflect other rewrites.Among those instructions, we also need to exclude store instructions, because we need alias analysis to rewrite those correctly.
The current instructions which meet these criteria are:
- Brif
- BrTable
- Trapz
- Trapnz
- ResumableTrapnz
- CallIndirect
- ReturnCallIndirect
- SetPinnedReg
Conditional branches, either as block terminators or conditional traps, are definitely interesting to rewrite.
Brif
can turn intoJump
;BrTable
can turn into eitherBrif
orJump
;- and
Trapz
/Trapnz
can turn into eitherTrap
orNop
.
Rewriting traps resembles #6080 but in this case we're already replacing a no-result instruction so the mechanism looks pretty different.Indirect calls are interesting to rewrite if we can prove that the call target is a constant.
I don't think SetPinnedReg has any useful rewrites based on the value operand, because it doesn't have any defined meaning in CLIF. It's just however the frontend wants to use it.
Alternatives
There are some implementation choices we haven't decided on.
I'd like to do this in ISLE rules because we may want to pattern-match arbitrary subtrees of the data-flow graph. However @cfallin has argued for doing this in pure Rust until we have good reason to implement rewrite rules that are complicated enough to justify hooking up ISLE.
We might find we have several alternative instructions to choose from, like we currently do for the
simplify
term. We could:
- Save a list of alternative instructions during the equality saturation phase, but that makes it difficult to do further optimizations based on simplifying the CFG.
- Defer rewriting these instructions until elaboration when we have the cost of every operand, but then it's not just difficult but impossible to make use of CFG simplifications during equality saturation.
- Compute instruction costs incrementally during equality saturation so we can pick the best option immediately and optimize the CFG based on the result. @cfallin says that's possible but had reasons for not doing it.
- Require these rewrite rules to pick a single alternative based on a static cost model. This is certainly easiest and may be good enough.
Last updated: Dec 23 2024 at 12:05 UTC