wasmtime / PR #2184 Harvest left-hand side superoptimizat... · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / PR #2184 Harvest left-hand side superoptimizat...

Wasmtime GitHub notifications bot (Sep 02 2020 at 22:37):

fitzgen opened PR #2184 from souper-harvest to main:

This PR adds the souper-harvest subcommand to clif-util.

Given a clif function, harvest all its integer subexpressions, so that they can be fed into Souper as candidates for superoptimization. For some of these candidates, Souper will successfully synthesize a right-hand side that is equivalent but has lower cost than the left-hand side. Then, we can combine these left- and right-hand sides into a complete optimization, and add it to our peephole passes.

To harvest the expression that produced a given value x, we do a post-order traversal of the dataflow graph starting from x. As we do this traversal, we maintain a map from clif values to their translated Souper values. We stop traversing when we reach anything that can't be translated into Souper IR: a memory load, a float-to-int conversion, a block parameter, etc. For values produced by these instructions, we create a Souper var, which is an input variable to the optimization. For instructions that have a direct mapping into Souper IR, we get the Souper version of each of its operands and then create the Souper version of the instruction itself. It should now be clear why we do a post-order traversal: we need an instruction's translated operands in order to translate the instruction itself. Once this instruction is translated, we update the clif-to-souper map with this new translation so that any other instruction that uses this result as an operand has access to the translated value. When the traversal is complete we return the translation of x as the root of left-hand side candidate.

@iximeow, do you feel comfortable reviewing this? If so, please take a look. If not, I can find another reviewer, no worries.

cc @jubitaneja @regehr

Wasmtime GitHub notifications bot (Sep 02 2020 at 22:37):

fitzgen requested iximeow for a review on PR #2184.

Wasmtime GitHub notifications bot (Sep 02 2020 at 22:57):

fitzgen updated PR #2184 from souper-harvest to main:

This PR adds the souper-harvest subcommand to clif-util.

Given a clif function, harvest all its integer subexpressions, so that they can be fed into Souper as candidates for superoptimization. For some of these candidates, Souper will successfully synthesize a right-hand side that is equivalent but has lower cost than the left-hand side. Then, we can combine these left- and right-hand sides into a complete optimization, and add it to our peephole passes.

To harvest the expression that produced a given value x, we do a post-order traversal of the dataflow graph starting from x. As we do this traversal, we maintain a map from clif values to their translated Souper values. We stop traversing when we reach anything that can't be translated into Souper IR: a memory load, a float-to-int conversion, a block parameter, etc. For values produced by these instructions, we create a Souper var, which is an input variable to the optimization. For instructions that have a direct mapping into Souper IR, we get the Souper version of each of its operands and then create the Souper version of the instruction itself. It should now be clear why we do a post-order traversal: we need an instruction's translated operands in order to translate the instruction itself. Once this instruction is translated, we update the clif-to-souper map with this new translation so that any other instruction that uses this result as an operand has access to the translated value. When the traversal is complete we return the translation of x as the root of left-hand side candidate.

@iximeow, do you feel comfortable reviewing this? If so, please take a look. If not, I can find another reviewer, no worries.

cc @jubitaneja @regehr

Wasmtime GitHub notifications bot (Sep 03 2020 at 15:56):

fitzgen updated PR #2184 from souper-harvest to main:

This PR adds the souper-harvest subcommand to clif-util.

Given a clif function, harvest all its integer subexpressions, so that they can be fed into Souper as candidates for superoptimization. For some of these candidates, Souper will successfully synthesize a right-hand side that is equivalent but has lower cost than the left-hand side. Then, we can combine these left- and right-hand sides into a complete optimization, and add it to our peephole passes.

To harvest the expression that produced a given value x, we do a post-order traversal of the dataflow graph starting from x. As we do this traversal, we maintain a map from clif values to their translated Souper values. We stop traversing when we reach anything that can't be translated into Souper IR: a memory load, a float-to-int conversion, a block parameter, etc. For values produced by these instructions, we create a Souper var, which is an input variable to the optimization. For instructions that have a direct mapping into Souper IR, we get the Souper version of each of its operands and then create the Souper version of the instruction itself. It should now be clear why we do a post-order traversal: we need an instruction's translated operands in order to translate the instruction itself. Once this instruction is translated, we update the clif-to-souper map with this new translation so that any other instruction that uses this result as an operand has access to the translated value. When the traversal is complete we return the translation of x as the root of left-hand side candidate.

@iximeow, do you feel comfortable reviewing this? If so, please take a look. If not, I can find another reviewer, no worries.

cc @jubitaneja @regehr

Wasmtime GitHub notifications bot (Sep 10 2020 at 20:02):

fitzgen updated PR #2184 from souper-harvest to main:

This PR adds the souper-harvest subcommand to clif-util.

Given a clif function, harvest all its integer subexpressions, so that they can be fed into Souper as candidates for superoptimization. For some of these candidates, Souper will successfully synthesize a right-hand side that is equivalent but has lower cost than the left-hand side. Then, we can combine these left- and right-hand sides into a complete optimization, and add it to our peephole passes.

To harvest the expression that produced a given value x, we do a post-order traversal of the dataflow graph starting from x. As we do this traversal, we maintain a map from clif values to their translated Souper values. We stop traversing when we reach anything that can't be translated into Souper IR: a memory load, a float-to-int conversion, a block parameter, etc. For values produced by these instructions, we create a Souper var, which is an input variable to the optimization. For instructions that have a direct mapping into Souper IR, we get the Souper version of each of its operands and then create the Souper version of the instruction itself. It should now be clear why we do a post-order traversal: we need an instruction's translated operands in order to translate the instruction itself. Once this instruction is translated, we update the clif-to-souper map with this new translation so that any other instruction that uses this result as an operand has access to the translated value. When the traversal is complete we return the translation of x as the root of left-hand side candidate.

@iximeow, do you feel comfortable reviewing this? If so, please take a look. If not, I can find another reviewer, no worries.

cc @jubitaneja @regehr

Wasmtime GitHub notifications bot (Sep 11 2020 at 08:15):

bjorn3 submitted PR Review.

Wasmtime GitHub notifications bot (Sep 11 2020 at 08:15):

bjorn3 created PR Review Comment:

        self.compute_cfg();
        self.preopt(isa)?;
Otherwise preopt panics in some cases with thread 'main' panicked at 'assertion failed: self.is_valid()', cranelift/codegen/src/flowgraph.rs:161:9.

Wasmtime GitHub notifications bot (Sep 11 2020 at 15:31):

fitzgen submitted PR Review.

Wasmtime GitHub notifications bot (Sep 11 2020 at 15:31):

fitzgen created PR Review Comment:

Interesting, I haven't seen this panic yet when harvesting LHSs. Can you share the stack trace? I wonder if it is just related to the basic block clean ups that preopt does, which are irrelevant for LHS harvesting. If so, we might want to consider some refactoring in preopt so we can run just the peepholes alone.

Wasmtime GitHub notifications bot (Sep 11 2020 at 16:00):

bjorn3 submitted PR Review.

Wasmtime GitHub notifications bot (Sep 11 2020 at 16:00):

bjorn3 created PR Review Comment:

thread 'main' panicked at 'assertion failed: self.is_valid()', cranelift/codegen/src/flowgraph.rs:161:9
stack backtrace:
[...]
  11: std::panicking::begin_panic
             at /home/bjorn/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/panicking.rs:410
  12: cranelift_codegen::flowgraph::ControlFlowGraph::recompute_block
             at cranelift/codegen/src/flowgraph.rs:161
  13: cranelift_codegen::simple_preopt::branch_order
             at cranelift/codegen/src/simple_preopt.rs:608
  14: cranelift_codegen::simple_preopt::do_preopt
             at cranelift/codegen/src/simple_preopt.rs:1102
  15: cranelift_codegen::context::Context::preopt
             at cranelift/codegen/src/context.rs:321
  16: cranelift_codegen::context::Context::souper_harvest
             at cranelift/codegen/src/context.rs:463
  17: clif_util::souper_harvest::run::{{closure}}
             at cranelift/src/souper_harvest.rs:74
[...]

Clif ir:

test compile
set is_pic
set enable_simd
target x86_64-unknown-linux-gnu haswell

function u0:2() system_v {
; symbol _ZN21mini_core_hello_world4main17h4614255fce60abb6E

    ss0 = explicit_slot 16
    gv0 = symbol colocated u1:9 ; alloc29
    gv1 = symbol colocated u1:10 ; alloc30
    gv2 = symbol colocated u1:11 ; alloc32
    gv3 = symbol colocated u1:12 ; trap at Instance { def: Item(WithOptConstParam { did: DefId(0:53 ~ mini_core_hello_world[317d]::main[0]), const_param_did: None }), substs: [] } (_ZN21mini_core_hello_world4main17h4614255fce60abb6E): [corruption] Diverging function returned
    sig0 = (i64, i64, i64) system_v
    sig1 = (i64) -> i32 system_v
    fn0 = u0:4 sig0 ; Instance { def: Item(WithOptConstParam { did: DefId(1:228 ~ mini_core[8787]::panic[0]), const_param_did: None }), substs: [] }
    fn1 = u0:3 sig1 ; puts

                                block0:
                                    nop
                                    jump block1

                                block1:
                                    nop
@0002                               v0 = global_value.i64 gv0
@0002                               v1 = load.i64 notrap v0
@0002                               v2 = iconst.i64 2
@0001                               stack_store v1, ss0
@0001                               stack_store v2, ss0+8
@0007                               v3 = stack_load.i64 ss0
@0007                               v4 = stack_load.i64 ss0+8
@000a                               v5 = iconst.i64 4
@000a                               v6 = urem v3, v5
@000f                               brz v6, block2
@000f                               jump block3

                                block2:
@000f                               nop
@0010                               v7 = iconst.i64 8
@0013                               return

                                block3:
@0013                               nop
@0014                               v8 = global_value.i64 gv1
@0014                               v9 = iconst.i64 27
@0015                               v10 = global_value.i64 gv2
@0015                               v11 = iadd_imm v10, 0
@0015                               call fn0(v8, v9, v11)
@0015                               v12 = global_value.i64 gv3
@0015                               v13 = call fn1(v12)
@0015                               trap unreachable
}

Rust source:

fn main() {
    let slice = &[0, 1] as &[i32];
    let slice_ptr = slice as *const [i32] as *const i32;

    assert_eq!(slice_ptr as usize % 4, 0);
}

Wasmtime GitHub notifications bot (Sep 11 2020 at 16:14):

bjorn3 edited PR Review Comment.

Wasmtime GitHub notifications bot (Sep 12 2020 at 18:50):

iximeow submitted PR Review.

Wasmtime GitHub notifications bot (Sep 12 2020 at 18:50):

iximeow submitted PR Review.

Wasmtime GitHub notifications bot (Sep 12 2020 at 18:50):

iximeow created PR Review Comment:

also // TODO: ir::Opcode::IaddCout?

Wasmtime GitHub notifications bot (Sep 12 2020 at 18:50):

iximeow created PR Review Comment:

I don't remember how many clif instructions have >1 result, but in the future this could be loosened potentially be loosened up to looking at all results and possibly materializing the results from different Souper-found optimizations, right?

Wasmtime GitHub notifications bot (Sep 12 2020 at 18:50):

iximeow created PR Review Comment:

If, indeed, there should be an IaddCout above, then this should be updated too

Wasmtime GitHub notifications bot (Sep 12 2020 at 18:50):

iximeow created PR Review Comment:

Me not knowing Souper well: how is the size to truncate to inferred?

Wasmtime GitHub notifications bot (Sep 14 2020 at 15:25):

fitzgen submitted PR Review.

Wasmtime GitHub notifications bot (Sep 14 2020 at 15:25):

fitzgen created PR Review Comment:

There aren't a ton that have multiple results. The most common are multi-value calls, which aren't relevant here. But add-with-carry and combined div/mul are candidates.

We would also need to extend peepmatic to work with operations that return multiple values, which it doesn't support right now. I expect this would require more effort than the harvester, which is relatively small and isolated.

Wasmtime GitHub notifications bot (Sep 14 2020 at 15:29):

fitzgen submitted PR Review.

Wasmtime GitHub notifications bot (Sep 14 2020 at 15:29):

fitzgen created PR Review Comment:

truncate instructions require a type annotation on the result. When harvesting, we always add the type annotation to every result (see the souper_type_of call after this match). We could elide most of them, allowing souper to infer them, but it seems like extra work and maintenance for very little benefit.

Wasmtime GitHub notifications bot (Sep 14 2020 at 15:42):

fitzgen updated PR #2184 from souper-harvest to main:

This PR adds the souper-harvest subcommand to clif-util.

Given a clif function, harvest all its integer subexpressions, so that they can be fed into Souper as candidates for superoptimization. For some of these candidates, Souper will successfully synthesize a right-hand side that is equivalent but has lower cost than the left-hand side. Then, we can combine these left- and right-hand sides into a complete optimization, and add it to our peephole passes.

To harvest the expression that produced a given value x, we do a post-order traversal of the dataflow graph starting from x. As we do this traversal, we maintain a map from clif values to their translated Souper values. We stop traversing when we reach anything that can't be translated into Souper IR: a memory load, a float-to-int conversion, a block parameter, etc. For values produced by these instructions, we create a Souper var, which is an input variable to the optimization. For instructions that have a direct mapping into Souper IR, we get the Souper version of each of its operands and then create the Souper version of the instruction itself. It should now be clear why we do a post-order traversal: we need an instruction's translated operands in order to translate the instruction itself. Once this instruction is translated, we update the clif-to-souper map with this new translation so that any other instruction that uses this result as an operand has access to the translated value. When the traversal is complete we return the translation of x as the root of left-hand side candidate.

@iximeow, do you feel comfortable reviewing this? If so, please take a look. If not, I can find another reviewer, no worries.

cc @jubitaneja @regehr

Wasmtime GitHub notifications bot (Sep 14 2020 at 15:43):

fitzgen updated PR #2184 from souper-harvest to main:

This PR adds the souper-harvest subcommand to clif-util.

Given a clif function, harvest all its integer subexpressions, so that they can be fed into Souper as candidates for superoptimization. For some of these candidates, Souper will successfully synthesize a right-hand side that is equivalent but has lower cost than the left-hand side. Then, we can combine these left- and right-hand sides into a complete optimization, and add it to our peephole passes.

To harvest the expression that produced a given value x, we do a post-order traversal of the dataflow graph starting from x. As we do this traversal, we maintain a map from clif values to their translated Souper values. We stop traversing when we reach anything that can't be translated into Souper IR: a memory load, a float-to-int conversion, a block parameter, etc. For values produced by these instructions, we create a Souper var, which is an input variable to the optimization. For instructions that have a direct mapping into Souper IR, we get the Souper version of each of its operands and then create the Souper version of the instruction itself. It should now be clear why we do a post-order traversal: we need an instruction's translated operands in order to translate the instruction itself. Once this instruction is translated, we update the clif-to-souper map with this new translation so that any other instruction that uses this result as an operand has access to the translated value. When the traversal is complete we return the translation of x as the root of left-hand side candidate.

@iximeow, do you feel comfortable reviewing this? If so, please take a look. If not, I can find another reviewer, no worries.

cc @jubitaneja @regehr

Wasmtime GitHub notifications bot (Sep 14 2020 at 15:48):

fitzgen updated PR #2184 from souper-harvest to main:

This PR adds the souper-harvest subcommand to clif-util.

Given a clif function, harvest all its integer subexpressions, so that they can be fed into Souper as candidates for superoptimization. For some of these candidates, Souper will successfully synthesize a right-hand side that is equivalent but has lower cost than the left-hand side. Then, we can combine these left- and right-hand sides into a complete optimization, and add it to our peephole passes.

To harvest the expression that produced a given value x, we do a post-order traversal of the dataflow graph starting from x. As we do this traversal, we maintain a map from clif values to their translated Souper values. We stop traversing when we reach anything that can't be translated into Souper IR: a memory load, a float-to-int conversion, a block parameter, etc. For values produced by these instructions, we create a Souper var, which is an input variable to the optimization. For instructions that have a direct mapping into Souper IR, we get the Souper version of each of its operands and then create the Souper version of the instruction itself. It should now be clear why we do a post-order traversal: we need an instruction's translated operands in order to translate the instruction itself. Once this instruction is translated, we update the clif-to-souper map with this new translation so that any other instruction that uses this result as an operand has access to the translated value. When the traversal is complete we return the translation of x as the root of left-hand side candidate.

@iximeow, do you feel comfortable reviewing this? If so, please take a look. If not, I can find another reviewer, no worries.

cc @jubitaneja @regehr

Wasmtime GitHub notifications bot (Sep 14 2020 at 21:53):

fitzgen requested iximeow for a review on PR #2184.

Wasmtime GitHub notifications bot (Sep 14 2020 at 23:03):

iximeow submitted PR Review.

Wasmtime GitHub notifications bot (Sep 14 2020 at 23:28):

fitzgen updated PR #2184 from souper-harvest to main:

This PR adds the souper-harvest subcommand to clif-util.

Given a clif function, harvest all its integer subexpressions, so that they can be fed into Souper as candidates for superoptimization. For some of these candidates, Souper will successfully synthesize a right-hand side that is equivalent but has lower cost than the left-hand side. Then, we can combine these left- and right-hand sides into a complete optimization, and add it to our peephole passes.

To harvest the expression that produced a given value x, we do a post-order traversal of the dataflow graph starting from x. As we do this traversal, we maintain a map from clif values to their translated Souper values. We stop traversing when we reach anything that can't be translated into Souper IR: a memory load, a float-to-int conversion, a block parameter, etc. For values produced by these instructions, we create a Souper var, which is an input variable to the optimization. For instructions that have a direct mapping into Souper IR, we get the Souper version of each of its operands and then create the Souper version of the instruction itself. It should now be clear why we do a post-order traversal: we need an instruction's translated operands in order to translate the instruction itself. Once this instruction is translated, we update the clif-to-souper map with this new translation so that any other instruction that uses this result as an operand has access to the translated value. When the traversal is complete we return the translation of x as the root of left-hand side candidate.

@iximeow, do you feel comfortable reviewing this? If so, please take a look. If not, I can find another reviewer, no worries.

cc @jubitaneja @regehr

Wasmtime GitHub notifications bot (Sep 14 2020 at 23:30):

iximeow submitted PR Review.

Wasmtime GitHub notifications bot (Sep 15 2020 at 01:01):

fitzgen merged PR #2184.

Last updated: Apr 18 2025 at 20:03 UTC