fitzgen opened PR #2184 from souper-harvest
to main
:
This PR adds the
souper-harvest
subcommand toclif-util
.Given a clif function, harvest all its integer subexpressions, so that they can be fed into Souper as candidates for superoptimization. For some of these candidates, Souper will successfully synthesize a right-hand side that is equivalent but has lower cost than the left-hand side. Then, we can combine these left- and right-hand sides into a complete optimization, and add it to our peephole passes.
To harvest the expression that produced a given value
x
, we do a post-order traversal of the dataflow graph starting fromx
. As we do this traversal, we maintain a map from clif values to their translated Souper values. We stop traversing when we reach anything that can't be translated into Souper IR: a memory load, a float-to-int conversion, a block parameter, etc. For values produced by these instructions, we create a Soupervar
, which is an input variable to the optimization. For instructions that have a direct mapping into Souper IR, we get the Souper version of each of its operands and then create the Souper version of the instruction itself. It should now be clear why we do a post-order traversal: we need an instruction's translated operands in order to translate the instruction itself. Once this instruction is translated, we update the clif-to-souper map with this new translation so that any other instruction that uses this result as an operand has access to the translated value. When the traversal is complete we return the translation ofx
as the root of left-hand side candidate.@iximeow, do you feel comfortable reviewing this? If so, please take a look. If not, I can find another reviewer, no worries.
cc @jubitaneja @regehr
fitzgen requested iximeow for a review on PR #2184.
fitzgen updated PR #2184 from souper-harvest
to main
:
This PR adds the
souper-harvest
subcommand toclif-util
.Given a clif function, harvest all its integer subexpressions, so that they can be fed into Souper as candidates for superoptimization. For some of these candidates, Souper will successfully synthesize a right-hand side that is equivalent but has lower cost than the left-hand side. Then, we can combine these left- and right-hand sides into a complete optimization, and add it to our peephole passes.
To harvest the expression that produced a given value
x
, we do a post-order traversal of the dataflow graph starting fromx
. As we do this traversal, we maintain a map from clif values to their translated Souper values. We stop traversing when we reach anything that can't be translated into Souper IR: a memory load, a float-to-int conversion, a block parameter, etc. For values produced by these instructions, we create a Soupervar
, which is an input variable to the optimization. For instructions that have a direct mapping into Souper IR, we get the Souper version of each of its operands and then create the Souper version of the instruction itself. It should now be clear why we do a post-order traversal: we need an instruction's translated operands in order to translate the instruction itself. Once this instruction is translated, we update the clif-to-souper map with this new translation so that any other instruction that uses this result as an operand has access to the translated value. When the traversal is complete we return the translation ofx
as the root of left-hand side candidate.@iximeow, do you feel comfortable reviewing this? If so, please take a look. If not, I can find another reviewer, no worries.
cc @jubitaneja @regehr
fitzgen updated PR #2184 from souper-harvest
to main
:
This PR adds the
souper-harvest
subcommand toclif-util
.Given a clif function, harvest all its integer subexpressions, so that they can be fed into Souper as candidates for superoptimization. For some of these candidates, Souper will successfully synthesize a right-hand side that is equivalent but has lower cost than the left-hand side. Then, we can combine these left- and right-hand sides into a complete optimization, and add it to our peephole passes.
To harvest the expression that produced a given value
x
, we do a post-order traversal of the dataflow graph starting fromx
. As we do this traversal, we maintain a map from clif values to their translated Souper values. We stop traversing when we reach anything that can't be translated into Souper IR: a memory load, a float-to-int conversion, a block parameter, etc. For values produced by these instructions, we create a Soupervar
, which is an input variable to the optimization. For instructions that have a direct mapping into Souper IR, we get the Souper version of each of its operands and then create the Souper version of the instruction itself. It should now be clear why we do a post-order traversal: we need an instruction's translated operands in order to translate the instruction itself. Once this instruction is translated, we update the clif-to-souper map with this new translation so that any other instruction that uses this result as an operand has access to the translated value. When the traversal is complete we return the translation ofx
as the root of left-hand side candidate.@iximeow, do you feel comfortable reviewing this? If so, please take a look. If not, I can find another reviewer, no worries.
cc @jubitaneja @regehr
fitzgen updated PR #2184 from souper-harvest
to main
:
This PR adds the
souper-harvest
subcommand toclif-util
.Given a clif function, harvest all its integer subexpressions, so that they can be fed into Souper as candidates for superoptimization. For some of these candidates, Souper will successfully synthesize a right-hand side that is equivalent but has lower cost than the left-hand side. Then, we can combine these left- and right-hand sides into a complete optimization, and add it to our peephole passes.
To harvest the expression that produced a given value
x
, we do a post-order traversal of the dataflow graph starting fromx
. As we do this traversal, we maintain a map from clif values to their translated Souper values. We stop traversing when we reach anything that can't be translated into Souper IR: a memory load, a float-to-int conversion, a block parameter, etc. For values produced by these instructions, we create a Soupervar
, which is an input variable to the optimization. For instructions that have a direct mapping into Souper IR, we get the Souper version of each of its operands and then create the Souper version of the instruction itself. It should now be clear why we do a post-order traversal: we need an instruction's translated operands in order to translate the instruction itself. Once this instruction is translated, we update the clif-to-souper map with this new translation so that any other instruction that uses this result as an operand has access to the translated value. When the traversal is complete we return the translation ofx
as the root of left-hand side candidate.@iximeow, do you feel comfortable reviewing this? If so, please take a look. If not, I can find another reviewer, no worries.
cc @jubitaneja @regehr
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
self.compute_cfg(); self.preopt(isa)?;
Otherwise
preopt
panics in some cases withthread 'main' panicked at 'assertion failed: self.is_valid()', cranelift/codegen/src/flowgraph.rs:161:9
.
fitzgen submitted PR Review.
fitzgen created PR Review Comment:
Interesting, I haven't seen this panic yet when harvesting LHSs. Can you share the stack trace? I wonder if it is just related to the basic block clean ups that preopt does, which are irrelevant for LHS harvesting. If so, we might want to consider some refactoring in preopt so we can run just the peepholes alone.
bjorn3 submitted PR Review.
bjorn3 created PR Review Comment:
thread 'main' panicked at 'assertion failed: self.is_valid()', cranelift/codegen/src/flowgraph.rs:161:9 stack backtrace: [...] 11: std::panicking::begin_panic at /home/bjorn/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/panicking.rs:410 12: cranelift_codegen::flowgraph::ControlFlowGraph::recompute_block at cranelift/codegen/src/flowgraph.rs:161 13: cranelift_codegen::simple_preopt::branch_order at cranelift/codegen/src/simple_preopt.rs:608 14: cranelift_codegen::simple_preopt::do_preopt at cranelift/codegen/src/simple_preopt.rs:1102 15: cranelift_codegen::context::Context::preopt at cranelift/codegen/src/context.rs:321 16: cranelift_codegen::context::Context::souper_harvest at cranelift/codegen/src/context.rs:463 17: clif_util::souper_harvest::run::{{closure}} at cranelift/src/souper_harvest.rs:74 [...]
Clif ir:
test compile set is_pic set enable_simd target x86_64-unknown-linux-gnu haswell function u0:2() system_v { ; symbol _ZN21mini_core_hello_world4main17h4614255fce60abb6E ss0 = explicit_slot 16 gv0 = symbol colocated u1:9 ; alloc29 gv1 = symbol colocated u1:10 ; alloc30 gv2 = symbol colocated u1:11 ; alloc32 gv3 = symbol colocated u1:12 ; trap at Instance { def: Item(WithOptConstParam { did: DefId(0:53 ~ mini_core_hello_world[317d]::main[0]), const_param_did: None }), substs: [] } (_ZN21mini_core_hello_world4main17h4614255fce60abb6E): [corruption] Diverging function returned sig0 = (i64, i64, i64) system_v sig1 = (i64) -> i32 system_v fn0 = u0:4 sig0 ; Instance { def: Item(WithOptConstParam { did: DefId(1:228 ~ mini_core[8787]::panic[0]), const_param_did: None }), substs: [] } fn1 = u0:3 sig1 ; puts block0: nop jump block1 block1: nop @0002 v0 = global_value.i64 gv0 @0002 v1 = load.i64 notrap v0 @0002 v2 = iconst.i64 2 @0001 stack_store v1, ss0 @0001 stack_store v2, ss0+8 @0007 v3 = stack_load.i64 ss0 @0007 v4 = stack_load.i64 ss0+8 @000a v5 = iconst.i64 4 @000a v6 = urem v3, v5 @000f brz v6, block2 @000f jump block3 block2: @000f nop @0010 v7 = iconst.i64 8 @0013 return block3: @0013 nop @0014 v8 = global_value.i64 gv1 @0014 v9 = iconst.i64 27 @0015 v10 = global_value.i64 gv2 @0015 v11 = iadd_imm v10, 0 @0015 call fn0(v8, v9, v11) @0015 v12 = global_value.i64 gv3 @0015 v13 = call fn1(v12) @0015 trap unreachable }
Rust source:
fn main() { let slice = &[0, 1] as &[i32]; let slice_ptr = slice as *const [i32] as *const i32; assert_eq!(slice_ptr as usize % 4, 0); }
bjorn3 edited PR Review Comment.
iximeow submitted PR Review.
iximeow submitted PR Review.
iximeow created PR Review Comment:
also
// TODO: ir::Opcode::IaddCout
?
iximeow created PR Review Comment:
I don't remember how many clif instructions have >1 result, but in the future this could be loosened potentially be loosened up to looking at all results and possibly materializing the results from different Souper-found optimizations, right?
iximeow created PR Review Comment:
If, indeed, there should be an
IaddCout
above, then this should be updated too
iximeow created PR Review Comment:
Me not knowing Souper well: how is the size to truncate to inferred?
fitzgen submitted PR Review.
fitzgen created PR Review Comment:
There aren't a ton that have multiple results. The most common are multi-value calls, which aren't relevant here. But add-with-carry and combined div/mul are candidates.
We would also need to extend peepmatic to work with operations that return multiple values, which it doesn't support right now. I expect this would require more effort than the harvester, which is relatively small and isolated.
fitzgen submitted PR Review.
fitzgen created PR Review Comment:
truncate
instructions require a type annotation on the result. When harvesting, we always add the type annotation to every result (see thesouper_type_of
call after thismatch
). We could elide most of them, allowing souper to infer them, but it seems like extra work and maintenance for very little benefit.
fitzgen updated PR #2184 from souper-harvest
to main
:
This PR adds the
souper-harvest
subcommand toclif-util
.Given a clif function, harvest all its integer subexpressions, so that they can be fed into Souper as candidates for superoptimization. For some of these candidates, Souper will successfully synthesize a right-hand side that is equivalent but has lower cost than the left-hand side. Then, we can combine these left- and right-hand sides into a complete optimization, and add it to our peephole passes.
To harvest the expression that produced a given value
x
, we do a post-order traversal of the dataflow graph starting fromx
. As we do this traversal, we maintain a map from clif values to their translated Souper values. We stop traversing when we reach anything that can't be translated into Souper IR: a memory load, a float-to-int conversion, a block parameter, etc. For values produced by these instructions, we create a Soupervar
, which is an input variable to the optimization. For instructions that have a direct mapping into Souper IR, we get the Souper version of each of its operands and then create the Souper version of the instruction itself. It should now be clear why we do a post-order traversal: we need an instruction's translated operands in order to translate the instruction itself. Once this instruction is translated, we update the clif-to-souper map with this new translation so that any other instruction that uses this result as an operand has access to the translated value. When the traversal is complete we return the translation ofx
as the root of left-hand side candidate.@iximeow, do you feel comfortable reviewing this? If so, please take a look. If not, I can find another reviewer, no worries.
cc @jubitaneja @regehr
fitzgen updated PR #2184 from souper-harvest
to main
:
This PR adds the
souper-harvest
subcommand toclif-util
.Given a clif function, harvest all its integer subexpressions, so that they can be fed into Souper as candidates for superoptimization. For some of these candidates, Souper will successfully synthesize a right-hand side that is equivalent but has lower cost than the left-hand side. Then, we can combine these left- and right-hand sides into a complete optimization, and add it to our peephole passes.
To harvest the expression that produced a given value
x
, we do a post-order traversal of the dataflow graph starting fromx
. As we do this traversal, we maintain a map from clif values to their translated Souper values. We stop traversing when we reach anything that can't be translated into Souper IR: a memory load, a float-to-int conversion, a block parameter, etc. For values produced by these instructions, we create a Soupervar
, which is an input variable to the optimization. For instructions that have a direct mapping into Souper IR, we get the Souper version of each of its operands and then create the Souper version of the instruction itself. It should now be clear why we do a post-order traversal: we need an instruction's translated operands in order to translate the instruction itself. Once this instruction is translated, we update the clif-to-souper map with this new translation so that any other instruction that uses this result as an operand has access to the translated value. When the traversal is complete we return the translation ofx
as the root of left-hand side candidate.@iximeow, do you feel comfortable reviewing this? If so, please take a look. If not, I can find another reviewer, no worries.
cc @jubitaneja @regehr
fitzgen updated PR #2184 from souper-harvest
to main
:
This PR adds the
souper-harvest
subcommand toclif-util
.Given a clif function, harvest all its integer subexpressions, so that they can be fed into Souper as candidates for superoptimization. For some of these candidates, Souper will successfully synthesize a right-hand side that is equivalent but has lower cost than the left-hand side. Then, we can combine these left- and right-hand sides into a complete optimization, and add it to our peephole passes.
To harvest the expression that produced a given value
x
, we do a post-order traversal of the dataflow graph starting fromx
. As we do this traversal, we maintain a map from clif values to their translated Souper values. We stop traversing when we reach anything that can't be translated into Souper IR: a memory load, a float-to-int conversion, a block parameter, etc. For values produced by these instructions, we create a Soupervar
, which is an input variable to the optimization. For instructions that have a direct mapping into Souper IR, we get the Souper version of each of its operands and then create the Souper version of the instruction itself. It should now be clear why we do a post-order traversal: we need an instruction's translated operands in order to translate the instruction itself. Once this instruction is translated, we update the clif-to-souper map with this new translation so that any other instruction that uses this result as an operand has access to the translated value. When the traversal is complete we return the translation ofx
as the root of left-hand side candidate.@iximeow, do you feel comfortable reviewing this? If so, please take a look. If not, I can find another reviewer, no worries.
cc @jubitaneja @regehr
fitzgen requested iximeow for a review on PR #2184.
iximeow submitted PR Review.
fitzgen updated PR #2184 from souper-harvest
to main
:
This PR adds the
souper-harvest
subcommand toclif-util
.Given a clif function, harvest all its integer subexpressions, so that they can be fed into Souper as candidates for superoptimization. For some of these candidates, Souper will successfully synthesize a right-hand side that is equivalent but has lower cost than the left-hand side. Then, we can combine these left- and right-hand sides into a complete optimization, and add it to our peephole passes.
To harvest the expression that produced a given value
x
, we do a post-order traversal of the dataflow graph starting fromx
. As we do this traversal, we maintain a map from clif values to their translated Souper values. We stop traversing when we reach anything that can't be translated into Souper IR: a memory load, a float-to-int conversion, a block parameter, etc. For values produced by these instructions, we create a Soupervar
, which is an input variable to the optimization. For instructions that have a direct mapping into Souper IR, we get the Souper version of each of its operands and then create the Souper version of the instruction itself. It should now be clear why we do a post-order traversal: we need an instruction's translated operands in order to translate the instruction itself. Once this instruction is translated, we update the clif-to-souper map with this new translation so that any other instruction that uses this result as an operand has access to the translated value. When the traversal is complete we return the translation ofx
as the root of left-hand side candidate.@iximeow, do you feel comfortable reviewing this? If so, please take a look. If not, I can find another reviewer, no worries.
cc @jubitaneja @regehr
iximeow submitted PR Review.
fitzgen merged PR #2184.
Last updated: Dec 23 2024 at 12:05 UTC