cfallin opened issue #5796:
@jameysharp brought to my attention an interesting case:
test optimize set opt_level=speed target x86_64 function %f(i32, i32, i32) -> i32 { block0(v0: i32, v1: i32, v2: i32): brif v0, block1, block2 block1: v3 = udiv v1, v2 return v3 block2: v4 = udiv v1, v2 brif v1, block3, block4 block3: return v4 block4: return v4 }
When run through the egraph framework, the
udiv
is sunk fromblock2
to two copies inblock3
andblock4
. This is code-motion of a side-effect that is actually, strictly speaking, incorrect. It won't cause a program that doesn't trap to trap, or a program that traps to not trap, but it may affect which trap is seen first and/or which side-effects occur prior to the trap.The basic issue is that we sort of treat idempotent operators as both pure -- so GVN'able -- but also as having a location, so remaining in the side-effect skeleton. This combines with the use of an ordinary map, not a scoped hashmap, for GVN'ing. That choice was valid when we processed only pure operators, which "float" outside of control flow and so have no need to limit their visibility to dominated blocks. But in the above program, the GVN map gets an entry when we see
v3
inblock1
, then we visitblock2
, see the GVN-map entry, rewritev4
tov3
, and remove the secondudiv
. Elaboration will then place copies of the udiv back inblock3
andblock4
as the value is used, because it already exists in the layout elsewhere. So we never use a value outside of its defined range (i.e., defs still dominate uses), but we do lose information about where the def occurs.@jameysharp and I were at first thinking we could add a scoped hashmap to not dedup
v3
andv4
to the same value, but actually this is suboptimal: the whole point of interning nodes in the egraph is to share the work of optimizing them once. And an idempotent op is mostly like a fully pure op, so we should be able to mostly reason about it in this shared way, except for where its side effects must have been made visible.A key insight for a better solution is: one can see any idempotent op as an actual pure op, with two outputs, the original and a "has trapped" flag; combined with an actual side-effecting op on the "has trapped" flag. We don't want to actually rewrite in this form. But in principle, the side-effect is separable in that way, and that gives us the GVN semantics that we desire.
The proposal is this: let idempotent ops GVN as pure ops do. Unconditionally remove them from the side-effect skeleton (the
Layout
). But in their place in theLayout
, insert aforce
pseudo-inst that uses the GVN'd value. The only purpose of this pseudo-op is to ensure that the value is computed at the given location during elaboration.This would give the desired behavior for the program above: it would retain the
udiv
in bothblock1
andblock2
, and would neither hoist nor lower it, while still GVN'ing both to one value that could be reasoned about / rewritten once.
cfallin closed issue #5796:
@jameysharp brought to my attention an interesting case:
test optimize set opt_level=speed target x86_64 function %f(i32, i32, i32) -> i32 { block0(v0: i32, v1: i32, v2: i32): brif v0, block1, block2 block1: v3 = udiv v1, v2 return v3 block2: v4 = udiv v1, v2 brif v1, block3, block4 block3: return v4 block4: return v4 }
When run through the egraph framework, the
udiv
is sunk fromblock2
to two copies inblock3
andblock4
. This is code-motion of a side-effect that is actually, strictly speaking, incorrect. It won't cause a program that doesn't trap to trap, or a program that traps to not trap, but it may affect which trap is seen first and/or which side-effects occur prior to the trap.The basic issue is that we sort of treat idempotent operators as both pure -- so GVN'able -- but also as having a location, so remaining in the side-effect skeleton. This combines with the use of an ordinary map, not a scoped hashmap, for GVN'ing. That choice was valid when we processed only pure operators, which "float" outside of control flow and so have no need to limit their visibility to dominated blocks. But in the above program, the GVN map gets an entry when we see
v3
inblock1
, then we visitblock2
, see the GVN-map entry, rewritev4
tov3
, and remove the secondudiv
. Elaboration will then place copies of the udiv back inblock3
andblock4
as the value is used, because it already exists in the layout elsewhere. So we never use a value outside of its defined range (i.e., defs still dominate uses), but we do lose information about where the def occurs.@jameysharp and I were at first thinking we could add a scoped hashmap to not dedup
v3
andv4
to the same value, but actually this is suboptimal: the whole point of interning nodes in the egraph is to share the work of optimizing them once. And an idempotent op is mostly like a fully pure op, so we should be able to mostly reason about it in this shared way, except for where its side effects must have been made visible.A key insight for a better solution is: one can see any idempotent op as an actual pure op, with two outputs, the original and a "has trapped" flag; combined with an actual side-effecting op on the "has trapped" flag. We don't want to actually rewrite in this form. But in principle, the side-effect is separable in that way, and that gives us the GVN semantics that we desire.
The proposal is this: let idempotent ops GVN as pure ops do. Unconditionally remove them from the side-effect skeleton (the
Layout
). But in their place in theLayout
, insert aforce
pseudo-inst that uses the GVN'd value. The only purpose of this pseudo-op is to ensure that the value is computed at the given location during elaboration.This would give the desired behavior for the program above: it would retain the
udiv
in bothblock1
andblock2
, and would neither hoist nor lower it, while still GVN'ing both to one value that could be reasoned about / rewritten once.
Last updated: Dec 23 2024 at 12:05 UTC