alexcrichton opened issue #3441:
A fuzz-generated test recently found times out during compilation. This single-function module looks like:
(module (func loop end loop end loop end ;; ... ~20k more times loop end loop end))
Compiling this module yields:
DEBUG wasmtime_cranelift::compiler > FuncIndex(0) translated in 1.248188995s TRACE wasmtime_cranelift::compiler > FuncIndex(0) timing info ======== ======== ================================== Total Self Pass -------- -------- ---------------------------------- 0.006 0.006 Translate WASM function 0.624 0.007 Compilation passes 0.003 0.003 Control flow graph 0.003 0.003 Dominator tree 0.000 0.000 Loop analysis 0.000 0.000 Pre-legalization rewriting 0.000 0.000 Dead code elimination 0.001 0.001 Global value numbering 0.002 0.000 Loop invariant code motion 0.000 0.000 Remove unreachable blocks 0.001 0.001 Remove constant phi-nodes 0.005 0.005 VCode lowering 0.000 0.000 VCode post-register allocation finalization 0.270 0.270 VCode emission 0.000 0.000 VCode emission finalization 0.333 0.333 Register allocation 0.000 0.000 Binary machine code emission ======== ======== ==================================
Something here seems accidentally quadratic perhaps? Whatever the reason this function probably shouldn't take upwards of a second to compile.
adamrk commented on issue #3441:
Took a look at this and it seems like there is quadratic behavior coming from
regalloc
here. When calculating the loop depth it's iterating over each block and then traversing the entire dominance tree each time. Opened this PR which I think fixes that issue.Here are the results I see from that.
Baseline:DEBUG wasmtime_cranelift::compiler > FuncIndex(0) translated in 1.950180269s TRACE wasmtime_cranelift::compiler > FuncIndex(0) timing info ======== ======== ================================== Total Self Pass -------- -------- ---------------------------------- 0.014 0.014 Translate WASM function 1.936 0.017 Compilation passes 0.008 0.008 Control flow graph 0.007 0.007 Dominator tree 0.001 0.001 Loop analysis 0.001 0.001 Pre-legalization rewriting 0.000 0.000 Dead code elimination 0.001 0.001 Global value numbering 0.004 0.000 Loop invariant code motion 0.000 0.000 Remove unreachable blocks 0.001 0.001 Remove constant phi-nodes 0.014 0.014 VCode lowering 0.002 0.002 VCode post-register allocation finalization 0.621 0.621 VCode emission 0.000 0.000 VCode emission finalization 1.264 1.264 Register allocation 0.000 0.000 Binary machine code emission ======== ======== ==================================
With the change:
DEBUG wasmtime_cranelift::compiler > FuncIndex(0) translated in 632.089765ms TRACE wasmtime_cranelift::compiler > FuncIndex(0) timing info ======== ======== ================================== Total Self Pass -------- -------- ---------------------------------- 0.013 0.013 Translate WASM function 0.619 0.015 Compilation passes 0.007 0.007 Control flow graph 0.008 0.008 Dominator tree 0.001 0.001 Loop analysis 0.001 0.001 Pre-legalization rewriting 0.000 0.000 Dead code elimination 0.001 0.001 Global value numbering 0.005 0.000 Loop invariant code motion 0.000 0.000 Remove unreachable blocks 0.001 0.001 Remove constant phi-nodes 0.011 0.011 VCode lowering 0.002 0.002 VCode post-register allocation finalization 0.554 0.554 VCode emission 0.000 0.000 VCode emission finalization 0.018 0.018 Register allocation 0.000 0.000 Binary machine code emission ======== ======== ==================================
cfallin closed issue #3441:
A fuzz-generated test recently found times out during compilation. This single-function module looks like:
(module (func loop end loop end loop end ;; ... ~20k more times loop end loop end))
Compiling this module yields:
DEBUG wasmtime_cranelift::compiler > FuncIndex(0) translated in 1.248188995s TRACE wasmtime_cranelift::compiler > FuncIndex(0) timing info ======== ======== ================================== Total Self Pass -------- -------- ---------------------------------- 0.006 0.006 Translate WASM function 0.624 0.007 Compilation passes 0.003 0.003 Control flow graph 0.003 0.003 Dominator tree 0.000 0.000 Loop analysis 0.000 0.000 Pre-legalization rewriting 0.000 0.000 Dead code elimination 0.001 0.001 Global value numbering 0.002 0.000 Loop invariant code motion 0.000 0.000 Remove unreachable blocks 0.001 0.001 Remove constant phi-nodes 0.005 0.005 VCode lowering 0.000 0.000 VCode post-register allocation finalization 0.270 0.270 VCode emission 0.000 0.000 VCode emission finalization 0.333 0.333 Register allocation 0.000 0.000 Binary machine code emission ======== ======== ==================================
Something here seems accidentally quadratic perhaps? Whatever the reason this function probably shouldn't take upwards of a second to compile.
cfallin commented on issue #3441:
Ah, GitHub interpreted my "fixes ..." over on the regalloc.rs side over-optimistically; this isn't actually fixed in Cranelift until we update to regalloc.rs 0.0.32, so I'll reopen this.
cfallin reopened issue #3441:
A fuzz-generated test recently found times out during compilation. This single-function module looks like:
(module (func loop end loop end loop end ;; ... ~20k more times loop end loop end))
Compiling this module yields:
DEBUG wasmtime_cranelift::compiler > FuncIndex(0) translated in 1.248188995s TRACE wasmtime_cranelift::compiler > FuncIndex(0) timing info ======== ======== ================================== Total Self Pass -------- -------- ---------------------------------- 0.006 0.006 Translate WASM function 0.624 0.007 Compilation passes 0.003 0.003 Control flow graph 0.003 0.003 Dominator tree 0.000 0.000 Loop analysis 0.000 0.000 Pre-legalization rewriting 0.000 0.000 Dead code elimination 0.001 0.001 Global value numbering 0.002 0.000 Loop invariant code motion 0.000 0.000 Remove unreachable blocks 0.001 0.001 Remove constant phi-nodes 0.005 0.005 VCode lowering 0.000 0.000 VCode post-register allocation finalization 0.270 0.270 VCode emission 0.000 0.000 VCode emission finalization 0.333 0.333 Register allocation 0.000 0.000 Binary machine code emission ======== ======== ==================================
Something here seems accidentally quadratic perhaps? Whatever the reason this function probably shouldn't take upwards of a second to compile.
cfallin closed issue #3441:
A fuzz-generated test recently found times out during compilation. This single-function module looks like:
(module (func loop end loop end loop end ;; ... ~20k more times loop end loop end))
Compiling this module yields:
DEBUG wasmtime_cranelift::compiler > FuncIndex(0) translated in 1.248188995s TRACE wasmtime_cranelift::compiler > FuncIndex(0) timing info ======== ======== ================================== Total Self Pass -------- -------- ---------------------------------- 0.006 0.006 Translate WASM function 0.624 0.007 Compilation passes 0.003 0.003 Control flow graph 0.003 0.003 Dominator tree 0.000 0.000 Loop analysis 0.000 0.000 Pre-legalization rewriting 0.000 0.000 Dead code elimination 0.001 0.001 Global value numbering 0.002 0.000 Loop invariant code motion 0.000 0.000 Remove unreachable blocks 0.001 0.001 Remove constant phi-nodes 0.005 0.005 VCode lowering 0.000 0.000 VCode post-register allocation finalization 0.270 0.270 VCode emission 0.000 0.000 VCode emission finalization 0.333 0.333 Register allocation 0.000 0.000 Binary machine code emission ======== ======== ==================================
Something here seems accidentally quadratic perhaps? Whatever the reason this function probably shouldn't take upwards of a second to compile.
alexcrichton commented on issue #3441:
Thanks @adamrk for the fix here!
@cfallin I was also slightly worried on this that vcode emission is a pretty big time sink of 600ms in these timings compared to 13ms creation of the clif, do you think that's also worth looking into and/or opening a separate issue for?
cfallin commented on issue #3441:
@alexcrichton I spent some time profiling this and it appears that there is indeed some suboptimal behavior in the branch simplification; specifically all the empty loops cause thousands of labels to alias and this wreaks havoc on things. I'll create an issue and see if I can add a limiter of some sort.
Last updated: Dec 23 2024 at 12:05 UTC