Stream: git-wasmtime

Topic: wasmtime / issue #6109 cranelift: Integrate "remove const...


view this post on Zulip Wasmtime GitHub notifications bot (Mar 27 2023 at 23:15):

jameysharp opened issue #6109:

Feature

Integrate cranelift/codegen/src/remove_constant_phis.rs into the egraph pass instead of running it as a separate pass beforehand.

Benefit

Equality saturation may reveal that multiple values are equivalent even though they were defined by different instructions, so it's useful to defer checking whether a block param is defined by equivalent values at every block-call until then. But aliasing block parameters to a common definition can expose more patterns for equality saturation. Therefore this optimization is most effective if it's interleaved with equality saturation.

Integrating this makes #6106 and #5908 more useful: both of those optimizations can prune control-flow edges, making it more likely that a block parameter has only one definition.

Also, during the egraph pass we're iterating over all blocks and updating all instructions anyway, so folding another pass in avoids extra traversals.

Challenges

The "remove constant phis" pass is based on a forward dataflow analysis, which computes a fixpoint over the whole function. The egraph pass visits each basic block only once without reaching a fixpoint. Without revisiting loops, this pass can only be an approximation. However, that's probably good enough, at least for now. Someday we may offer a mode where the egraph does run multiple iterations, when more thorough optimization is desired.

Even in the absence of loops, the egraph pass doesn't currently guarantee that all predecessors of a block have been visited before the block itself, only that a block is visited before any block it dominates. Without that guarantee, we'll often miss optimization opportunities. I think we could establish this guarantee but it's currently tricky. Fixing #5908 may make this more feasible because I think then we'll have more freedom in block visit order; #5800 limited the traversal order to dom-tree pre-order.

Implementation

On starting equality saturation for a block, loop over the block parameters. For each one, check the corresponding argument in all block-calls which reference this block. If they're all the same value, replace the block parameter with an alias to that value, removing that position from the block header and from all the block-calls.

One improvement is that if some predecessor passes the block parameter itself as the argument, then ignore that predecessor for this purpose. In other words, assume that the block parameter always comes from the same definition, then try to disprove that assumption. (It isn't immediately clear to me whether the existing "remove constant phis" pass implements this improvement; I think it does, but if so it's expressed in a way I don't entirely understand.)

A further improvement is to identify groups of block parameters that can be merged into one. (Again, I _think_ the current implementation does this.) For example, in the following program, v1 and v2 can both be replaced with v0:

block1(v0: i32):
    jump block2(v0, v0)
block2(v1: i32, v2: i32):
    ; ...
    jump block2(v2, v1)

Alternatives

We can leave this pass separate, or even run it both before and after the egraph pass. We could even keep the standalone pass while also adding similar functionality to the egraph pass; the standalone pass can resolve identities which require a full fixpoint, and the egraph pass can resolve loop-free cases using the additional information from equality saturation.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 27 2023 at 23:15):

jameysharp labeled issue #6109:

Feature

Integrate cranelift/codegen/src/remove_constant_phis.rs into the egraph pass instead of running it as a separate pass beforehand.

Benefit

Equality saturation may reveal that multiple values are equivalent even though they were defined by different instructions, so it's useful to defer checking whether a block param is defined by equivalent values at every block-call until then. But aliasing block parameters to a common definition can expose more patterns for equality saturation. Therefore this optimization is most effective if it's interleaved with equality saturation.

Integrating this makes #6106 and #5908 more useful: both of those optimizations can prune control-flow edges, making it more likely that a block parameter has only one definition.

Also, during the egraph pass we're iterating over all blocks and updating all instructions anyway, so folding another pass in avoids extra traversals.

Challenges

The "remove constant phis" pass is based on a forward dataflow analysis, which computes a fixpoint over the whole function. The egraph pass visits each basic block only once without reaching a fixpoint. Without revisiting loops, this pass can only be an approximation. However, that's probably good enough, at least for now. Someday we may offer a mode where the egraph does run multiple iterations, when more thorough optimization is desired.

Even in the absence of loops, the egraph pass doesn't currently guarantee that all predecessors of a block have been visited before the block itself, only that a block is visited before any block it dominates. Without that guarantee, we'll often miss optimization opportunities. I think we could establish this guarantee but it's currently tricky. Fixing #5908 may make this more feasible because I think then we'll have more freedom in block visit order; #5800 limited the traversal order to dom-tree pre-order.

Implementation

On starting equality saturation for a block, loop over the block parameters. For each one, check the corresponding argument in all block-calls which reference this block. If they're all the same value, replace the block parameter with an alias to that value, removing that position from the block header and from all the block-calls.

One improvement is that if some predecessor passes the block parameter itself as the argument, then ignore that predecessor for this purpose. In other words, assume that the block parameter always comes from the same definition, then try to disprove that assumption. (It isn't immediately clear to me whether the existing "remove constant phis" pass implements this improvement; I think it does, but if so it's expressed in a way I don't entirely understand.)

A further improvement is to identify groups of block parameters that can be merged into one. (Again, I _think_ the current implementation does this.) For example, in the following program, v1 and v2 can both be replaced with v0:

block1(v0: i32):
    jump block2(v0, v0)
block2(v1: i32, v2: i32):
    ; ...
    jump block2(v2, v1)

Alternatives

We can leave this pass separate, or even run it both before and after the egraph pass. We could even keep the standalone pass while also adding similar functionality to the egraph pass; the standalone pass can resolve identities which require a full fixpoint, and the egraph pass can resolve loop-free cases using the additional information from equality saturation.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 27 2023 at 23:15):

jameysharp labeled issue #6109:

Feature

Integrate cranelift/codegen/src/remove_constant_phis.rs into the egraph pass instead of running it as a separate pass beforehand.

Benefit

Equality saturation may reveal that multiple values are equivalent even though they were defined by different instructions, so it's useful to defer checking whether a block param is defined by equivalent values at every block-call until then. But aliasing block parameters to a common definition can expose more patterns for equality saturation. Therefore this optimization is most effective if it's interleaved with equality saturation.

Integrating this makes #6106 and #5908 more useful: both of those optimizations can prune control-flow edges, making it more likely that a block parameter has only one definition.

Also, during the egraph pass we're iterating over all blocks and updating all instructions anyway, so folding another pass in avoids extra traversals.

Challenges

The "remove constant phis" pass is based on a forward dataflow analysis, which computes a fixpoint over the whole function. The egraph pass visits each basic block only once without reaching a fixpoint. Without revisiting loops, this pass can only be an approximation. However, that's probably good enough, at least for now. Someday we may offer a mode where the egraph does run multiple iterations, when more thorough optimization is desired.

Even in the absence of loops, the egraph pass doesn't currently guarantee that all predecessors of a block have been visited before the block itself, only that a block is visited before any block it dominates. Without that guarantee, we'll often miss optimization opportunities. I think we could establish this guarantee but it's currently tricky. Fixing #5908 may make this more feasible because I think then we'll have more freedom in block visit order; #5800 limited the traversal order to dom-tree pre-order.

Implementation

On starting equality saturation for a block, loop over the block parameters. For each one, check the corresponding argument in all block-calls which reference this block. If they're all the same value, replace the block parameter with an alias to that value, removing that position from the block header and from all the block-calls.

One improvement is that if some predecessor passes the block parameter itself as the argument, then ignore that predecessor for this purpose. In other words, assume that the block parameter always comes from the same definition, then try to disprove that assumption. (It isn't immediately clear to me whether the existing "remove constant phis" pass implements this improvement; I think it does, but if so it's expressed in a way I don't entirely understand.)

A further improvement is to identify groups of block parameters that can be merged into one. (Again, I _think_ the current implementation does this.) For example, in the following program, v1 and v2 can both be replaced with v0:

block1(v0: i32):
    jump block2(v0, v0)
block2(v1: i32, v2: i32):
    ; ...
    jump block2(v2, v1)

Alternatives

We can leave this pass separate, or even run it both before and after the egraph pass. We could even keep the standalone pass while also adding similar functionality to the egraph pass; the standalone pass can resolve identities which require a full fixpoint, and the egraph pass can resolve loop-free cases using the additional information from equality saturation.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 27 2023 at 23:15):

jameysharp labeled issue #6109:

Feature

Integrate cranelift/codegen/src/remove_constant_phis.rs into the egraph pass instead of running it as a separate pass beforehand.

Benefit

Equality saturation may reveal that multiple values are equivalent even though they were defined by different instructions, so it's useful to defer checking whether a block param is defined by equivalent values at every block-call until then. But aliasing block parameters to a common definition can expose more patterns for equality saturation. Therefore this optimization is most effective if it's interleaved with equality saturation.

Integrating this makes #6106 and #5908 more useful: both of those optimizations can prune control-flow edges, making it more likely that a block parameter has only one definition.

Also, during the egraph pass we're iterating over all blocks and updating all instructions anyway, so folding another pass in avoids extra traversals.

Challenges

The "remove constant phis" pass is based on a forward dataflow analysis, which computes a fixpoint over the whole function. The egraph pass visits each basic block only once without reaching a fixpoint. Without revisiting loops, this pass can only be an approximation. However, that's probably good enough, at least for now. Someday we may offer a mode where the egraph does run multiple iterations, when more thorough optimization is desired.

Even in the absence of loops, the egraph pass doesn't currently guarantee that all predecessors of a block have been visited before the block itself, only that a block is visited before any block it dominates. Without that guarantee, we'll often miss optimization opportunities. I think we could establish this guarantee but it's currently tricky. Fixing #5908 may make this more feasible because I think then we'll have more freedom in block visit order; #5800 limited the traversal order to dom-tree pre-order.

Implementation

On starting equality saturation for a block, loop over the block parameters. For each one, check the corresponding argument in all block-calls which reference this block. If they're all the same value, replace the block parameter with an alias to that value, removing that position from the block header and from all the block-calls.

One improvement is that if some predecessor passes the block parameter itself as the argument, then ignore that predecessor for this purpose. In other words, assume that the block parameter always comes from the same definition, then try to disprove that assumption. (It isn't immediately clear to me whether the existing "remove constant phis" pass implements this improvement; I think it does, but if so it's expressed in a way I don't entirely understand.)

A further improvement is to identify groups of block parameters that can be merged into one. (Again, I _think_ the current implementation does this.) For example, in the following program, v1 and v2 can both be replaced with v0:

block1(v0: i32):
    jump block2(v0, v0)
block2(v1: i32, v2: i32):
    ; ...
    jump block2(v2, v1)

Alternatives

We can leave this pass separate, or even run it both before and after the egraph pass. We could even keep the standalone pass while also adding similar functionality to the egraph pass; the standalone pass can resolve identities which require a full fixpoint, and the egraph pass can resolve loop-free cases using the additional information from equality saturation.


Last updated: Oct 23 2024 at 20:03 UTC