mmitton commented on issue #5851:
This function contains both BrIf and BrTable with a loop and optimizes it down to a constant return.
function u0:0() -> i32 system_v {
block0:
v0 = iconst.i32 1
v1 = iconst.i32 10
brif v0, block1, block2(v1) ; v0 = 1, v1 = 10block1:
v2 = iconst.i32 0
jump block5(v2) ; v2 = 0block2(v3: i32):
v4 = iconst.i32 1
v5 = isub v3, v4 ; v4 = 1
jump block3block3:
brif.i32 v5, block2(v5), block4block4:
v6 = iconst.i32 8
jump block5(v6) ; v6 = 8block5(v7: i32):
br_table v7, block6, [block7, block8]block6:
v8 = iconst.i32 4
jump block9(v8) ; v8 = 4block7:
v9 = iconst.i32 8
jump block9(v9) ; v9 = 8block8:
v10 = iconst.i32 16
jump block9(v10) ; v10 = 16block9(v11: i32):
return v11
}[cranelift_codegen::isa::x64] disassembly:
pushq %rbp
unwind PushFrameRegs { offset_upward_to_caller_sp: 16 }
movq %rsp, %rbp
unwind DefineNewFrame { offset_upward_to_caller_sp: 16, offset_downward_to_clobbers: 0 }
block0:
jmp label1
block1:
jmp label2
block2:
jmp label3
block3:
jmp label4
block4:
jmp label5
block5:
movl $8, %eax
movq %rbp, %rsp
popq %rbp
ret
jameysharp commented on issue #5851:
This is a fantastic illustration of how much better Cranelift could do with all these optimizations combined. Thank you for putting it together! The example you've given in the comment above is compelling: I love seeing all that code disappear.
I'm having trouble reviewing this all at once, though. I'm hoping we can split it into a series of smaller PRs. Individually they won't be nearly as effective, of course, but we need to be able to reason about the correctness of each change individually.
What I'd like to see first is just brif/br_table folding. I'd like to leave out the changes to block parameters and the changes for domtree reconstruction. Maybe that just means putting your first two commits in a new PR?
I suspect that piece by itself will be uncontroversial and easy to merge, while already being a big improvement. We'll need one more thing before merging it: some new small regression tests in
cranelift/filetests/filetests/egraph/
demonstrating that these kinds of branches are correctly folded away in simple cases. For bonus points, you could measure the impact of this change using our Sightglass benchmark suite.Once that's merged, let's come back to the rest of the changes in this PR!
jameysharp commented on issue #5851:
I've written up a bunch of thoughts about our general requirements for branch folding in a pair of issues. These two issues describe essentially independent tasks, and both are complex enough that we need to solve them separately so we can have a chance at reviewing each of them. On top of that, a third independent issue may make the other two easier. So probably the best order to tackle these is:
- #5908
- #6109
- #6106
The biggest challenge for all of this work, and for this PR as it stands now, is that we want the egraph pass to visit each basic block only once. In fact some fundamental invariants of our egraph implementation currently rely on this. Someday I want to relax that restriction and have the _option_ of running equality saturation to a fixpoint, but we aren't there today.
So we miss some optimization opportunities on loops, but we still find a lot of optimizations, and this restriction keeps the optimization pass fast.
Last updated: Nov 22 2024 at 16:03 UTC