cranelift dce · cranelift · Zulip Chat Archive

I know there used to be a function in do_dce but I can't seem to find it anymore. Anyone happen to know where it was moved?

Chris Fallin (Jul 01 2024 at 23:58):

It was removed when we removed the old optimization pipeline (in favor of the new egraphs-based pipeline)

Chris Fallin (Jul 01 2024 at 23:58):

The new pipeline also does DCE, implicitly, as part of the "elaboration" pass from the egraph back to CLIF

Leaves (Jul 02 2024 at 02:23):

Any references you would recommend for writing a DCE algorithm for an IR that uses basic block parameters? Specifically, handling the elimination of parameters when used in loops. With PHI's it's somewhat obvious because the phi joins itself with an incoming variable. But the BBparams I guess I need to write some algorithm to discover the same situation.

Leaves (Jul 02 2024 at 15:03):

reading over the OLD cranelift dce pass... It doesn't seem to do anything with block parameters. If unused block parameters are not being removed, then predecessor blocks cannot know that successors don't need them. Which would mean that only code unused by the block it is in is going to be removed. which makes me question the post order traversal in the first place?

fitzgen (he/him) (Jul 02 2024 at 15:47):

Leaves (Jul 02 2024 at 16:28):

during lowering might make sense, since I just can't seem to find where useless bb params are being removed.

Leaves (Jul 02 2024 at 16:31):

David Lloyd (Jul 02 2024 at 16:44):

fitzgen (he/him) (Jul 02 2024 at 16:44):

Leaves (Jul 02 2024 at 23:30):

Leaves (Jul 02 2024 at 23:31):

v0, created in BB1 is dead and has no actual uses other than being passed around in a loop by BB2.

Chris Fallin (Jul 03 2024 at 00:11):

yep, also during lowering, implicitly (due to our demand-driven approach); so we have two implicit DCEs, one in the mid-end and one in the back-end

we separately have a remove_constant_phis pass that is meant to address this (link)

note that integrating this with at least the mid-end DCE (egraph approach overall, really) is something we'd like to do in the future, but for now they're separate

wasmtime/cranelift/codegen/src/remove_constant_phis.rs at main · bytecodealliance/wasmtime

A fast and secure runtime for WebAssembly. Contribute to bytecodealliance/wasmtime development by creating an account on GitHub.

Leaves (Jul 03 2024 at 00:20):

It would seem to me like this would HAVE to be integrated with some sort of DCE pass to produce good results?Additionally, because these are separate, confirming that the original do_dce pass wasn't doing this, what was the point of the dfs post_order traversal in there? Since parameters wouldn't be changed, there could be no interblock/crossblock dce.

The dfs post-order makes sense in the same way that traversing the instructions in a single block backwards make sense, and when the cfg is a DAG my dce algorithm works as it does change block parameters as it traverses the dfs post-order. But without changing block parameters, I can't see what good the traversal of the BBs in that order is(was, since it's removed now) doing?

Chris Fallin (Jul 03 2024 at 00:34):

Well, no, the constant-phis removal pass turned out to be useful at the time even without being in a DCE (or full opt rewrite) fixpoint. I believe the main issue when Julian wrote it (circa 2020 or so) was excessive redundancy in blockparams coming from the frontend at the time; subsequently Jamey did some cleanups to that and that may have reduced some pressure.

Note that SSA allows inter-block use-def links; the only constraint is that the use is in a block dominated by the def (including but not limited to the same block).

Given that, removing dead instructions in a block deeper in the domtree (earlier in postorder) can lead to instructions becoming dead in "parent" (later) blocks.

I believe you're imagining the IR as a sort of "max-SSA" construction where every live value passes through blockparams, so unchanging blockparams "lock down" liveness at block boundaries; that's (fortunately!) not the case.

Leaves (Jul 03 2024 at 00:41):

Oh... I'm definitely imagining the IR as that, very good point. As it stands, I do pass all values from one block to another through block parameters. Why do you say "fortunately"?

Leaves (Jul 03 2024 at 00:42):

And yeah this locking down of liveness does seem to be what I need to deal with.

Leaves (Jul 03 2024 at 00:43):

And also sorry if I haven't been clear, I am not actually using cranelift, I am designing my own IR but am following several of the same design decisions as cranelift(and using Regalloc2 in the process because it's so handy!)

Chris Fallin (Jul 03 2024 at 00:57):

Ah, "fortunately" because it's a huge efficiency loss to require all live values to pass through blockparams, for both direct and indirect reasons (I learned this when building another tool using SSA -- part of weval -- that initially required this invariant for correctness):

Chris Fallin (Jul 03 2024 at 00:58):

Leaves (Jul 03 2024 at 01:39):

Yeah I can see how that might choke things up for regalloc2. Ok going to spend some time reworking things now. No real need for the maximal ssa.

Leaves (Jul 03 2024 at 01:44):

what I really want is a pruned or optimal ssa I guess. Need to figure out how to produce that.

Altan Haan (Sep 18 2024 at 22:53):

Reviving this since I came across something that might be related (let me know if I should make a new thread!)

int  foo(int x) {
    int y;
    if (x) {
        y = 1;
    } else {
        y = 1;
    }
    return y;
}

  (func $foo (type 1) (param i32) (result i32)
    (local i32)
    block
      block
        local.get 0
        i32.eqz
        br_if 0
        i32.const 1
        local.set 1
        br 1
      end
      i32.const 1
      local.set 1
    end
    local.get 1
    return)

0000000000000020 <wasm[0]::function[1]>:
      20:       55                      push   %rbp
      21:       48 89 e5                mov    %rsp,%rbp
      24:       85 d2                   test   %edx,%edx
      26:       0f 84 0a 00 00 00       je     36 <wasm[0]::function[1]+0x16>
      2c:       b8 01 00 00 00          mov    $0x1,%eax
      31:       e9 05 00 00 00          jmp    3b <wasm[0]::function[1]+0x1b>
      36:       b8 01 00 00 00          mov    $0x1,%eax
      3b:       48 89 ec                mov    %rbp,%rsp
      3e:       5d                      pop    %rbp
      3f:       c3                      ret

Evidently, the branch was not optimized away. This was slightly surprising since I expected that egraph GVN + elaboration should be able to eliminate the distinct assignments. Is this optimization missed because it requires composing elaboration with remove_constant_phis and some kind of branch elimination?

Chris Fallin (Sep 18 2024 at 22:58):

Right, we don't remove blockparams unless they are actually taking the same SSA value on all in-edges ("constant phis" is a little bit of a misnomer -- it doesn't refer to integer constants or any sort of constant-value analysis). That's why the value is materialized on both sides of the diamond (though fortunately register hinting and liverange merging worked well in regalloc here, so materialized directly to the return register)

Chris Fallin (Sep 18 2024 at 23:00):

the egraph infrastructure actually doesn't mutate blockparams at all -- that adds significant additional complexity that we haven't worked through yet (it's fair to say it's a research question, I guess?); "constant phi removal" happens first, to clean up anything that SSA construction from the Wasm may have missed

Amanieu (Sep 18 2024 at 23:56):

@Chris Fallin I actually do this in my compiler. I don't use egraphs but a forward-dataflow optimization (const-prop, simplification, etc) but the same principle applies. Blockparams are basically multi-input pure operations, the only tricky part is that the input may come from a later instruction in the processing order. The way to handle this is to re-run the pass until a fixed point: every time an branch argument is modified and the branch target has already been processed in the current pass then you need to do run another pass.

Amanieu (Sep 18 2024 at 23:58):

To avoid redundant work I do track for each value whether it has changed since the last pass. If no instruction inputs have changed then there is no need to re-process it in this pass.

Chris Fallin (Sep 19 2024 at 00:12):

Indeed, it's definitely possible! I do it in waffle (a Wasm-to-Wasm compiler library) as well. The trick with Cranelift is integrating it into the aegraph -- it's not a traditional multipass setup, there are interesting data structure implications to modifying the "skeleton", that's really the challenge

Jacob Lifshay (Sep 19 2024 at 03:17):

I'm not as familiar with cranelift's internals, but assuming you have one aegraph for the function, couldn't you add the blockparams to the aegraph as phi functions since they're pure functions (with an extra input for the program counter)? this would allow pretty trivial dce of blockparams, assuming the aegraph can support operations that have cycles in the data flow graph

Jacob Lifshay (Sep 19 2024 at 03:18):

this would allow you to make it look like the skeleton wasn't changing since the changing parts would instead be the phi functions

Chris Fallin (Sep 19 2024 at 03:51):

Well, not quite; the aegraph is acyclic exactly because we break it at blockparams (i.e. do not see through them), and the acyclicity is critical for a bunch of other properties.

There's a lot of subtlety here; I'd encourage you to study the code (there are a lot of comments) and watch the talk; a number of folks have thought pretty hard about various aspects and "why not do X ..." thoughts are best had after all of that background is absorbed, though I do appreciate it is well-intentioned!

Jacob Lifshay (Sep 19 2024 at 05:40):

I did a while ago, but apparently was brain-dead tired enough today that I forgot about needing acyclicity

Stream: cranelift

Topic: cranelift dce

Leaves (Jul 01 2024 at 22:29):

Chris Fallin (Jul 01 2024 at 23:58):

Chris Fallin (Jul 01 2024 at 23:58):

Leaves (Jul 02 2024 at 02:23):

Leaves (Jul 02 2024 at 15:03):

fitzgen (he/him) (Jul 02 2024 at 15:47):

Leaves (Jul 02 2024 at 16:28):

Leaves (Jul 02 2024 at 16:31):

David Lloyd (Jul 02 2024 at 16:44):

fitzgen (he/him) (Jul 02 2024 at 16:44):

Leaves (Jul 02 2024 at 23:30):

Leaves (Jul 02 2024 at 23:31):

Chris Fallin (Jul 03 2024 at 00:11):

Leaves (Jul 03 2024 at 00:20):

Chris Fallin (Jul 03 2024 at 00:34):

Leaves (Jul 03 2024 at 00:41):

Leaves (Jul 03 2024 at 00:42):

Leaves (Jul 03 2024 at 00:43):

Chris Fallin (Jul 03 2024 at 00:57):

Chris Fallin (Jul 03 2024 at 00:58):

Leaves (Jul 03 2024 at 01:39):

Leaves (Jul 03 2024 at 01:44):

Altan Haan (Sep 18 2024 at 22:53):

Chris Fallin (Sep 18 2024 at 22:58):

Chris Fallin (Sep 18 2024 at 23:00):

Amanieu (Sep 18 2024 at 23:56):

Amanieu (Sep 18 2024 at 23:58):

Chris Fallin (Sep 19 2024 at 00:12):

Jacob Lifshay (Sep 19 2024 at 03:17):

Jacob Lifshay (Sep 19 2024 at 03:18):

Chris Fallin (Sep 19 2024 at 03:51):

Jacob Lifshay (Sep 19 2024 at 05:40):