wasmtime / issue #6080 cranelift/egraphs: Allow rewriting... · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / issue #6080 cranelift/egraphs: Allow rewriting...

Wasmtime GitHub notifications bot (Mar 21 2023 at 18:05):

jameysharp opened issue #6080:

Feature

When a mid-end optimization rule in ISLE matches an instruction which has a result value, it should be possible to replace that instruction with an unconditional trap. For example, (udiv _ _ (iconst _ 0)) should rewrite to a trap with code int_divz.

This is only possible once we resolve #5908; until then, ISLE rules never fire for instructions which could trap.

Benefit

This particular rewrite doesn't fit in our current framework, which only supports replacing a value with another value.

Traps, specifically, are special. They make the rest of the current block unreachable, as well as any block dominated by the current block. It's important to drop all dominated blocks because those are exactly the blocks which may have used the result of the original instruction. It's also useful to drop all dominated instructions because then we can avoid running all the egraph machinery on any of them.

Like the branch optimizations that we aren't doing yet, discarding dominated branches may move other blocks down the dominator tree and make some block parameters known. We don't have to update the dominator tree when that happens, but it's useful to do because it makes more information available to the affected blocks, which can lead to better optimizations.

Implementation

The ISLE simplify term can only produce instructions which have a result Value, so it can't directly produce a trap instruction. I think we should change its return type to a new ValueOrTrap enum, and define an implicit conversion from Value to ValueOrTrap. When a rule returns the Trap variant with a trap code, the caller needs to remove the remaining instructions in the current block and insert the appropriate trap instruction into the data-flow graph.

If _any_ rewrite rule says that an instruction is equivalent to a trap, then we can ignore all the other rewrites and just take the trap. This is sort of like how subsume works.

Things would get a little weird if a trap were generated somewhere other than the top-level right-hand side of a rule. All instructions which use the result would be unreachable, and those instructions' other operands would be unused, so we'd end up deleting all the instructions created by the rule except for the trap. Making the return type of simplify be the only place where a trap can appear means we can statically prevent writing rules which do this extra work.

It's also weird if this happens while we're doing recursive simplification on instructions which were newly-created by other rewrites. However, I think that case isn't actually useful and we should just prohibit it, since that means a rule created an instruction which would trap, without being equivalent to any possibly-trapping instruction in the original program. We could panic if we see a rewrite to a trap during recursive simplification.

Alternatives

I haven't thought of any besides keeping the _status quo_, but I think we should do something like this.

Wasmtime GitHub notifications bot (Mar 21 2023 at 18:55):

jameysharp commented on issue #6080:

Also, maybe the block where the trap occurs should be automatically marked cold. It obviously is not going to execute more than once per invocation of the program; blocks don't get much more cold than that.

Wasmtime GitHub notifications bot (Mar 21 2023 at 18:56):

jameysharp labeled issue #6080:

Feature

When a mid-end optimization rule in ISLE matches an instruction which has a result value, it should be possible to replace that instruction with an unconditional trap. For example, (udiv _ _ (iconst _ 0)) should rewrite to a trap with code int_divz.

This is only possible once we resolve #5908; until then, ISLE rules never fire for instructions which could trap.

Benefit

This particular rewrite doesn't fit in our current framework, which only supports replacing a value with another value.

Traps, specifically, are special. They make the rest of the current block unreachable, as well as any block dominated by the current block. It's important to drop all dominated blocks because those are exactly the blocks which may have used the result of the original instruction. It's also useful to drop all dominated instructions because then we can avoid running all the egraph machinery on any of them.

Like the branch optimizations that we aren't doing yet, discarding dominated branches may move other blocks down the dominator tree and make some block parameters known. We don't have to update the dominator tree when that happens, but it's useful to do because it makes more information available to the affected blocks, which can lead to better optimizations.

Implementation

The ISLE simplify term can only produce instructions which have a result Value, so it can't directly produce a trap instruction. I think we should change its return type to a new ValueOrTrap enum, and define an implicit conversion from Value to ValueOrTrap. When a rule returns the Trap variant with a trap code, the caller needs to remove the remaining instructions in the current block and insert the appropriate trap instruction into the data-flow graph.

If _any_ rewrite rule says that an instruction is equivalent to a trap, then we can ignore all the other rewrites and just take the trap. This is sort of like how subsume works.

Things would get a little weird if a trap were generated somewhere other than the top-level right-hand side of a rule. All instructions which use the result would be unreachable, and those instructions' other operands would be unused, so we'd end up deleting all the instructions created by the rule except for the trap. Making the return type of simplify be the only place where a trap can appear means we can statically prevent writing rules which do this extra work.

It's also weird if this happens while we're doing recursive simplification on instructions which were newly-created by other rewrites. However, I think that case isn't actually useful and we should just prohibit it, since that means a rule created an instruction which would trap, without being equivalent to any possibly-trapping instruction in the original program. We could panic if we see a rewrite to a trap during recursive simplification.

Alternatives

I haven't thought of any besides keeping the _status quo_, but I think we should do something like this.

Wasmtime GitHub notifications bot (Mar 21 2023 at 18:56):

jameysharp labeled issue #6080:

Feature

When a mid-end optimization rule in ISLE matches an instruction which has a result value, it should be possible to replace that instruction with an unconditional trap. For example, (udiv _ _ (iconst _ 0)) should rewrite to a trap with code int_divz.

This is only possible once we resolve #5908; until then, ISLE rules never fire for instructions which could trap.

Benefit

This particular rewrite doesn't fit in our current framework, which only supports replacing a value with another value.

Traps, specifically, are special. They make the rest of the current block unreachable, as well as any block dominated by the current block. It's important to drop all dominated blocks because those are exactly the blocks which may have used the result of the original instruction. It's also useful to drop all dominated instructions because then we can avoid running all the egraph machinery on any of them.

Like the branch optimizations that we aren't doing yet, discarding dominated branches may move other blocks down the dominator tree and make some block parameters known. We don't have to update the dominator tree when that happens, but it's useful to do because it makes more information available to the affected blocks, which can lead to better optimizations.

Implementation

The ISLE simplify term can only produce instructions which have a result Value, so it can't directly produce a trap instruction. I think we should change its return type to a new ValueOrTrap enum, and define an implicit conversion from Value to ValueOrTrap. When a rule returns the Trap variant with a trap code, the caller needs to remove the remaining instructions in the current block and insert the appropriate trap instruction into the data-flow graph.

If _any_ rewrite rule says that an instruction is equivalent to a trap, then we can ignore all the other rewrites and just take the trap. This is sort of like how subsume works.

Things would get a little weird if a trap were generated somewhere other than the top-level right-hand side of a rule. All instructions which use the result would be unreachable, and those instructions' other operands would be unused, so we'd end up deleting all the instructions created by the rule except for the trap. Making the return type of simplify be the only place where a trap can appear means we can statically prevent writing rules which do this extra work.

It's also weird if this happens while we're doing recursive simplification on instructions which were newly-created by other rewrites. However, I think that case isn't actually useful and we should just prohibit it, since that means a rule created an instruction which would trap, without being equivalent to any possibly-trapping instruction in the original program. We could panic if we see a rewrite to a trap during recursive simplification.

Alternatives

I haven't thought of any besides keeping the _status quo_, but I think we should do something like this.

Wasmtime GitHub notifications bot (Mar 21 2023 at 18:56):

jameysharp labeled issue #6080:

Feature

When a mid-end optimization rule in ISLE matches an instruction which has a result value, it should be possible to replace that instruction with an unconditional trap. For example, (udiv _ _ (iconst _ 0)) should rewrite to a trap with code int_divz.

This is only possible once we resolve #5908; until then, ISLE rules never fire for instructions which could trap.

Benefit

This particular rewrite doesn't fit in our current framework, which only supports replacing a value with another value.

Traps, specifically, are special. They make the rest of the current block unreachable, as well as any block dominated by the current block. It's important to drop all dominated blocks because those are exactly the blocks which may have used the result of the original instruction. It's also useful to drop all dominated instructions because then we can avoid running all the egraph machinery on any of them.

Like the branch optimizations that we aren't doing yet, discarding dominated branches may move other blocks down the dominator tree and make some block parameters known. We don't have to update the dominator tree when that happens, but it's useful to do because it makes more information available to the affected blocks, which can lead to better optimizations.

Implementation

The ISLE simplify term can only produce instructions which have a result Value, so it can't directly produce a trap instruction. I think we should change its return type to a new ValueOrTrap enum, and define an implicit conversion from Value to ValueOrTrap. When a rule returns the Trap variant with a trap code, the caller needs to remove the remaining instructions in the current block and insert the appropriate trap instruction into the data-flow graph.

If _any_ rewrite rule says that an instruction is equivalent to a trap, then we can ignore all the other rewrites and just take the trap. This is sort of like how subsume works.

Things would get a little weird if a trap were generated somewhere other than the top-level right-hand side of a rule. All instructions which use the result would be unreachable, and those instructions' other operands would be unused, so we'd end up deleting all the instructions created by the rule except for the trap. Making the return type of simplify be the only place where a trap can appear means we can statically prevent writing rules which do this extra work.

It's also weird if this happens while we're doing recursive simplification on instructions which were newly-created by other rewrites. However, I think that case isn't actually useful and we should just prohibit it, since that means a rule created an instruction which would trap, without being equivalent to any possibly-trapping instruction in the original program. We could panic if we see a rewrite to a trap during recursive simplification.

Alternatives

I haven't thought of any besides keeping the _status quo_, but I think we should do something like this.

Wasmtime GitHub notifications bot (Mar 21 2023 at 18:56):

jameysharp labeled issue #6080:

Feature

When a mid-end optimization rule in ISLE matches an instruction which has a result value, it should be possible to replace that instruction with an unconditional trap. For example, (udiv _ _ (iconst _ 0)) should rewrite to a trap with code int_divz.

This is only possible once we resolve #5908; until then, ISLE rules never fire for instructions which could trap.

Benefit

This particular rewrite doesn't fit in our current framework, which only supports replacing a value with another value.

Traps, specifically, are special. They make the rest of the current block unreachable, as well as any block dominated by the current block. It's important to drop all dominated blocks because those are exactly the blocks which may have used the result of the original instruction. It's also useful to drop all dominated instructions because then we can avoid running all the egraph machinery on any of them.

Like the branch optimizations that we aren't doing yet, discarding dominated branches may move other blocks down the dominator tree and make some block parameters known. We don't have to update the dominator tree when that happens, but it's useful to do because it makes more information available to the affected blocks, which can lead to better optimizations.

Implementation

The ISLE simplify term can only produce instructions which have a result Value, so it can't directly produce a trap instruction. I think we should change its return type to a new ValueOrTrap enum, and define an implicit conversion from Value to ValueOrTrap. When a rule returns the Trap variant with a trap code, the caller needs to remove the remaining instructions in the current block and insert the appropriate trap instruction into the data-flow graph.

If _any_ rewrite rule says that an instruction is equivalent to a trap, then we can ignore all the other rewrites and just take the trap. This is sort of like how subsume works.

Things would get a little weird if a trap were generated somewhere other than the top-level right-hand side of a rule. All instructions which use the result would be unreachable, and those instructions' other operands would be unused, so we'd end up deleting all the instructions created by the rule except for the trap. Making the return type of simplify be the only place where a trap can appear means we can statically prevent writing rules which do this extra work.

It's also weird if this happens while we're doing recursive simplification on instructions which were newly-created by other rewrites. However, I think that case isn't actually useful and we should just prohibit it, since that means a rule created an instruction which would trap, without being equivalent to any possibly-trapping instruction in the original program. We could panic if we see a rewrite to a trap during recursive simplification.

Alternatives

I haven't thought of any besides keeping the _status quo_, but I think we should do something like this.

Wasmtime GitHub notifications bot (Apr 08 2025 at 18:51):

fitzgen commented on issue #6080:

While we gained some initial support for optimizing side-effectful instructions in https://github.com/bytecodealliance/wasmtime/pull/10524, that PR specifically disallows anything that changes the CFG because of the complication mentioned in this OP:

Traps, specifically, are special. They make the rest of the current block unreachable, as well as any block dominated by the current block.

Note that brif(true, then, else) ==> jump(then) actually has these same complications, not just traps, due to potentially making the else block no longer dominated by the current block, despite potentially having uses of values defined earlier in the current block.

For posterity, here is the comment about this in the code right now:

https://github.com/bytecodealliance/wasmtime/blob/7bf317237ca2b3903ceebb4143c349b3b3ea0ecf/cranelift/codegen/src/egraph.rs#L515-L562

Chris and I spit-balled a couple of solutions:

As a potential initial milestone, we could make a non-terminator version of trap and this could also define dummy values for every value defined in the rest of the block. This would mean that we don't need to actually deal with code that becomes unreachable or do anything to fix up uses of values that are no longer defined because they are still defined (even tho the uses will never be reachable).

However, in an ideal world, we would eagerly terminate the block with trap and then rather than fixing up uses of values that are no longer defined, we would completely avoid traversing those uses' now-unreachable blocks in the rest of the egraph pass.

In the PR linked above, Chris said:

The branch-folding cases are a bit harder; ideally we do actually skip traversing the newly-unreachable blocks, perhaps by tombstoning them with a flag set by the branch simplification.

I think that we want the inverse of tombstones: reachable bits. Otherwise we cannot differentiate between a block with multiple predecessors, one of which got optimized away but others that did not, versus a block with a single predecessor that got optimized away and now it is unreachable. This should be possible by making the egraph pass use a computed-on-demand pre-order traversal, rather than a pre-computed dominator tree's pre-order traversal (which will not reflect any changes to the CFG that optimizations create).

Wasmtime GitHub notifications bot (Apr 08 2025 at 20:26):

jameysharp commented on issue #6080:

Traps, specifically, are special. They make the rest of the current block unreachable, as well as any block dominated by the current block.

Note that brif(true, then, else) ==> jump(then) actually has these same complications, not just traps, due to potentially making the else block no longer dominated by the current block, despite potentially having uses of values defined earlier in the current block.

I think this is only half right: If the else block uses values defined in the current block, then after removing the edge, either it still must be dominated by this block and also becomes dominated by the then block, or it is no longer reachable at all and should be deleted.

Put another way: If there were edges to it from blocks which are not dominated by this block, then it would already have not been dominated by this block, and would not have uses of values defined in this block.

By a similar argument, I think (but haven't fully thought this through) that while you're right about using reachable bits instead of tombstones, I don't think you need to change the traversal. I think it's sufficient to use the same traversal order computed up front, but when you get to a block you can skip it if by that time it has not been found to be reachable. I think the necessary condition for that to work is that CFG edges are only deleted, not added, which I think is largely a reasonable restriction. Although it might also be okay to move an edge to point to a different block as long as it was dominated by the original target of that edge, or something like that?

That said I'm confused about your mention of pre-order traversal because I don't remember now what the requirements on traversal order are and why we didn't use reverse post-order, so maybe that's where my reasoning falls down, I don't know.

Wasmtime GitHub notifications bot (Apr 08 2025 at 21:01):

cfallin commented on issue #6080:

I think we still want to use the domtree traversal for other reasons (or more precisely: my head hurts trying to think about how a computed-on-demand traversal order would interact with GVN, while the existing domtree traversal gives us exactly what we want for GVN to work with the scoped hashmap). I agree with Jamey that removing edges should be (I think?) possible to handle by skipping blocks during the traversal, and nothing else about dominance or use-def links needs to change, as long as we don't visit unreachable code. Put another way, removing edges can never cause an existing dominance relation to go from true to false for reachable code: if A dom B before removing an edge, that means the only paths from start to B were through A, and removing an edge cannot create a new path not through A, so A dom B still if there are any paths to B.

That said, good call @fitzgen on reachability vs not-reachability; I hadn't thought about the removing-only-one-in-edge case! So I think it could be something like: block_reachable_in_edges: SecondaryMap<Block, u32>, starts at zero, increments when we process a terminator and see targets; at each block visit during the first (egraph-building) traversal, skip a block if the count is zero.

Note also this might combine nicely with blockparam removal, per Jamey's and others' thoughts, but that's a whole other topic :-)

Wasmtime GitHub notifications bot (Apr 08 2025 at 21:10):

jameysharp commented on issue #6080:

A small modification: I don't think you need a count, just a boolean indicating whether the block is reachable or not; set it true every time you visit an out-edge, never clear it, and check if it's set before visiting a block. So an EntitySet of blocks should be plenty rather than a SecondaryMap.

Wasmtime GitHub notifications bot (Apr 08 2025 at 21:25):

jameysharp commented on issue #6080:

Also I'm not sure which idea you mean by "blockparam removal", Chris, but I was thinking again about something along those lines while I was working on https://github.com/jameysharp/skism, and the only place I think we've written the idea down is a parenthetical note at the end of https://github.com/bytecodealliance/wasmtime/issues/7639#issuecomment-1841813631 ("it might be useful to split critical edges before the egraphs pass, so computation which is only used on some branches isn't forced before the branch"), so it's probably worth writing it up in more detail sometime.

Wasmtime GitHub notifications bot (Apr 08 2025 at 22:20):

cfallin commented on issue #6080:

Ah, I had recalled you having thoughts about integrating constant-phis with the egraph pass, and how that might work with a fixpoint (or in the cheap version, no fixpoint, only simplify/remove when we've seen all predecessors already). I guess I was seeing a connection here in that removing predecessors is another way of meeting those criteria (seen all preds and all preds call the block with the same value) -- it seems there would be positive interaction in many cases. In any case, I agree, all of this should be written up in more detail!

Last updated: Apr 18 2025 at 01:31 UTC