Stream: git-wasmtime

Topic: wasmtime / issue #3441 Seemingly-quadratic behavior compi...


view this post on Zulip Wasmtime GitHub notifications bot (Oct 11 2021 at 16:59):

alexcrichton opened issue #3441:

A fuzz-generated test recently found times out during compilation. This single-function module looks like:

(module
  (func
    loop end
    loop end
    loop end
    ;; ... ~20k more times
    loop end
    loop end))

Compiling this module yields:

 DEBUG wasmtime_cranelift::compiler > FuncIndex(0) translated in 1.248188995s
 TRACE wasmtime_cranelift::compiler > FuncIndex(0) timing info
======== ========  ==================================
   Total     Self  Pass
-------- --------  ----------------------------------
   0.006    0.006  Translate WASM function
   0.624    0.007  Compilation passes
   0.003    0.003  Control flow graph
   0.003    0.003  Dominator tree
   0.000    0.000  Loop analysis
   0.000    0.000  Pre-legalization rewriting
   0.000    0.000  Dead code elimination
   0.001    0.001  Global value numbering
   0.002    0.000  Loop invariant code motion
   0.000    0.000  Remove unreachable blocks
   0.001    0.001  Remove constant phi-nodes
   0.005    0.005  VCode lowering
   0.000    0.000  VCode post-register allocation finalization
   0.270    0.270  VCode emission
   0.000    0.000  VCode emission finalization
   0.333    0.333  Register allocation
   0.000    0.000  Binary machine code emission
======== ========  ==================================

Something here seems accidentally quadratic perhaps? Whatever the reason this function probably shouldn't take upwards of a second to compile.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 19 2021 at 20:17):

adamrk commented on issue #3441:

Took a look at this and it seems like there is quadratic behavior coming from regalloc here. When calculating the loop depth it's iterating over each block and then traversing the entire dominance tree each time. Opened this PR which I think fixes that issue.

Here are the results I see from that.
Baseline:

 DEBUG wasmtime_cranelift::compiler > FuncIndex(0) translated in 1.950180269s
 TRACE wasmtime_cranelift::compiler > FuncIndex(0) timing info
======== ========  ==================================
   Total     Self  Pass
-------- --------  ----------------------------------
   0.014    0.014  Translate WASM function
   1.936    0.017  Compilation passes
   0.008    0.008  Control flow graph
   0.007    0.007  Dominator tree
   0.001    0.001  Loop analysis
   0.001    0.001  Pre-legalization rewriting
   0.000    0.000  Dead code elimination
   0.001    0.001  Global value numbering
   0.004    0.000  Loop invariant code motion
   0.000    0.000  Remove unreachable blocks
   0.001    0.001  Remove constant phi-nodes
   0.014    0.014  VCode lowering
   0.002    0.002  VCode post-register allocation finalization
   0.621    0.621  VCode emission
   0.000    0.000  VCode emission finalization
   1.264    1.264  Register allocation
   0.000    0.000  Binary machine code emission
======== ========  ==================================

With the change:

 DEBUG wasmtime_cranelift::compiler > FuncIndex(0) translated in 632.089765ms
 TRACE wasmtime_cranelift::compiler > FuncIndex(0) timing info
======== ========  ==================================
   Total     Self  Pass
-------- --------  ----------------------------------
   0.013    0.013  Translate WASM function
   0.619    0.015  Compilation passes
   0.007    0.007  Control flow graph
   0.008    0.008  Dominator tree
   0.001    0.001  Loop analysis
   0.001    0.001  Pre-legalization rewriting
   0.000    0.000  Dead code elimination
   0.001    0.001  Global value numbering
   0.005    0.000  Loop invariant code motion
   0.000    0.000  Remove unreachable blocks
   0.001    0.001  Remove constant phi-nodes
   0.011    0.011  VCode lowering
   0.002    0.002  VCode post-register allocation finalization
   0.554    0.554  VCode emission
   0.000    0.000  VCode emission finalization
   0.018    0.018  Register allocation
   0.000    0.000  Binary machine code emission
======== ========  ==================================

view this post on Zulip Wasmtime GitHub notifications bot (Oct 20 2021 at 21:56):

cfallin closed issue #3441:

A fuzz-generated test recently found times out during compilation. This single-function module looks like:

(module
  (func
    loop end
    loop end
    loop end
    ;; ... ~20k more times
    loop end
    loop end))

Compiling this module yields:

 DEBUG wasmtime_cranelift::compiler > FuncIndex(0) translated in 1.248188995s
 TRACE wasmtime_cranelift::compiler > FuncIndex(0) timing info
======== ========  ==================================
   Total     Self  Pass
-------- --------  ----------------------------------
   0.006    0.006  Translate WASM function
   0.624    0.007  Compilation passes
   0.003    0.003  Control flow graph
   0.003    0.003  Dominator tree
   0.000    0.000  Loop analysis
   0.000    0.000  Pre-legalization rewriting
   0.000    0.000  Dead code elimination
   0.001    0.001  Global value numbering
   0.002    0.000  Loop invariant code motion
   0.000    0.000  Remove unreachable blocks
   0.001    0.001  Remove constant phi-nodes
   0.005    0.005  VCode lowering
   0.000    0.000  VCode post-register allocation finalization
   0.270    0.270  VCode emission
   0.000    0.000  VCode emission finalization
   0.333    0.333  Register allocation
   0.000    0.000  Binary machine code emission
======== ========  ==================================

Something here seems accidentally quadratic perhaps? Whatever the reason this function probably shouldn't take upwards of a second to compile.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 20 2021 at 21:57):

cfallin commented on issue #3441:

Ah, GitHub interpreted my "fixes ..." over on the regalloc.rs side over-optimistically; this isn't actually fixed in Cranelift until we update to regalloc.rs 0.0.32, so I'll reopen this.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 20 2021 at 21:57):

cfallin reopened issue #3441:

A fuzz-generated test recently found times out during compilation. This single-function module looks like:

(module
  (func
    loop end
    loop end
    loop end
    ;; ... ~20k more times
    loop end
    loop end))

Compiling this module yields:

 DEBUG wasmtime_cranelift::compiler > FuncIndex(0) translated in 1.248188995s
 TRACE wasmtime_cranelift::compiler > FuncIndex(0) timing info
======== ========  ==================================
   Total     Self  Pass
-------- --------  ----------------------------------
   0.006    0.006  Translate WASM function
   0.624    0.007  Compilation passes
   0.003    0.003  Control flow graph
   0.003    0.003  Dominator tree
   0.000    0.000  Loop analysis
   0.000    0.000  Pre-legalization rewriting
   0.000    0.000  Dead code elimination
   0.001    0.001  Global value numbering
   0.002    0.000  Loop invariant code motion
   0.000    0.000  Remove unreachable blocks
   0.001    0.001  Remove constant phi-nodes
   0.005    0.005  VCode lowering
   0.000    0.000  VCode post-register allocation finalization
   0.270    0.270  VCode emission
   0.000    0.000  VCode emission finalization
   0.333    0.333  Register allocation
   0.000    0.000  Binary machine code emission
======== ========  ==================================

Something here seems accidentally quadratic perhaps? Whatever the reason this function probably shouldn't take upwards of a second to compile.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 20 2021 at 23:07):

cfallin closed issue #3441:

A fuzz-generated test recently found times out during compilation. This single-function module looks like:

(module
  (func
    loop end
    loop end
    loop end
    ;; ... ~20k more times
    loop end
    loop end))

Compiling this module yields:

 DEBUG wasmtime_cranelift::compiler > FuncIndex(0) translated in 1.248188995s
 TRACE wasmtime_cranelift::compiler > FuncIndex(0) timing info
======== ========  ==================================
   Total     Self  Pass
-------- --------  ----------------------------------
   0.006    0.006  Translate WASM function
   0.624    0.007  Compilation passes
   0.003    0.003  Control flow graph
   0.003    0.003  Dominator tree
   0.000    0.000  Loop analysis
   0.000    0.000  Pre-legalization rewriting
   0.000    0.000  Dead code elimination
   0.001    0.001  Global value numbering
   0.002    0.000  Loop invariant code motion
   0.000    0.000  Remove unreachable blocks
   0.001    0.001  Remove constant phi-nodes
   0.005    0.005  VCode lowering
   0.000    0.000  VCode post-register allocation finalization
   0.270    0.270  VCode emission
   0.000    0.000  VCode emission finalization
   0.333    0.333  Register allocation
   0.000    0.000  Binary machine code emission
======== ========  ==================================

Something here seems accidentally quadratic perhaps? Whatever the reason this function probably shouldn't take upwards of a second to compile.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2021 at 14:11):

alexcrichton commented on issue #3441:

Thanks @adamrk for the fix here!

@cfallin I was also slightly worried on this that vcode emission is a pretty big time sink of 600ms in these timings compared to 13ms creation of the clif, do you think that's also worth looking into and/or opening a separate issue for?

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2021 at 18:47):

cfallin commented on issue #3441:

@alexcrichton I spent some time profiling this and it appears that there is indeed some suboptimal behavior in the branch simplification; specifically all the empty loops cause thousands of labels to alias and this wreaks havoc on things. I'll create an issue and see if I can add a limiter of some sort.


Last updated: Nov 22 2024 at 17:03 UTC