wasmtime / issue #4923 Cranelift: Quadratic compile time ... · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / issue #4923 Cranelift: Quadratic compile time ...

Wasmtime GitHub notifications bot (Sep 19 2022 at 15:01):

adambratschikaye labeled issue #4923:

It seems that a wasm source with a chain of blocks each containing a local.get instruction can have quadratic compile time because can_optimize_var_lookup will traverse the entire chain when handling each local.get instruction. As an example, the following wat takes 2.4 seconds to compile on my machine:
(module
  (func (param i32)
    (block (local.set 0 (i32.add (i32.const 1) (local.get 0))))
    (block (local.set 0 (i32.add (i32.const 1) (local.get 0))))
    ... repeated 15,000 times
  )
)
and a flamegraph: ![flamegraph](https://user-images.githubusercontent.com/90606735/191045431-3e7626ef-7c29-4343-b784-4f9f7205d9d5.svg) shows that almost the entire time is spent in can_optimize_var_lookup. Note that the wasm file is 147 KB.

Steps to Reproduce

Download the attached blocks.wasm.gz and unzip it (or take the above wat and convert it to wasm).

Compile it with time target/release/wasmtime compile blocks.wasm.

Expected Results

I'd expect the compilation to be on the order of 100ms as it is for other modules with a single function of similar size on my machine.

Actual Results

The compilation takes over 2 seconds.

Versions and Environment

Cranelift version or commit: commit 27435ae398bcce74bf990b9683e5c47c3f6c5d51

Operating system: Ubuntu 20.04, Linux kernel 5.4.0-122-generic

Architecture:
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   43 bits physical, 48 bits virtual
CPU(s):                          64
On-line CPU(s) list:             0-63
Thread(s) per core:              2
Core(s) per socket:              16
Socket(s):                       2
NUMA node(s):                    1
Vendor ID:                       AuthenticAMD
CPU family:                      23
Model:                           49
Model name:                      AMD EPYC 7302 16-Core Processor
Extra Info

I'm working on a PR which should help with this.

Wasmtime GitHub notifications bot (Sep 19 2022 at 15:01):

adambratschikaye opened issue #4923:

It seems that a wasm source with a chain of blocks each containing a local.get instruction can have quadratic compile time because can_optimize_var_lookup will traverse the entire chain when handling each local.get instruction. As an example, the following wat takes 2.4 seconds to compile on my machine:
(module
  (func (param i32)
    (block (local.set 0 (i32.add (i32.const 1) (local.get 0))))
    (block (local.set 0 (i32.add (i32.const 1) (local.get 0))))
    ... repeated 15,000 times
  )
)
and a flamegraph: ![flamegraph](https://user-images.githubusercontent.com/90606735/191045431-3e7626ef-7c29-4343-b784-4f9f7205d9d5.svg) shows that almost the entire time is spent in can_optimize_var_lookup. Note that the wasm file is 147 KB.

Steps to Reproduce

Download the attached blocks.wasm.gz and unzip it (or take the above wat and convert it to wasm).

Compile it with time target/release/wasmtime compile blocks.wasm.

Expected Results

I'd expect the compilation to be on the order of 100ms as it is for other modules with a single function of similar size on my machine.

Actual Results

The compilation takes over 2 seconds.

Versions and Environment

Cranelift version or commit: commit 27435ae398bcce74bf990b9683e5c47c3f6c5d51

Operating system: Ubuntu 20.04, Linux kernel 5.4.0-122-generic

Architecture:
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   43 bits physical, 48 bits virtual
CPU(s):                          64
On-line CPU(s) list:             0-63
Thread(s) per core:              2
Core(s) per socket:              16
Socket(s):                       2
NUMA node(s):                    1
Vendor ID:                       AuthenticAMD
CPU family:                      23
Model:                           49
Model name:                      AMD EPYC 7302 16-Core Processor
Extra Info

I'm working on a PR which should help with this.

Wasmtime GitHub notifications bot (Sep 19 2022 at 15:01):

adambratschikaye labeled issue #4923:

It seems that a wasm source with a chain of blocks each containing a local.get instruction can have quadratic compile time because can_optimize_var_lookup will traverse the entire chain when handling each local.get instruction. As an example, the following wat takes 2.4 seconds to compile on my machine:
(module
  (func (param i32)
    (block (local.set 0 (i32.add (i32.const 1) (local.get 0))))
    (block (local.set 0 (i32.add (i32.const 1) (local.get 0))))
    ... repeated 15,000 times
  )
)
and a flamegraph: ![flamegraph](https://user-images.githubusercontent.com/90606735/191045431-3e7626ef-7c29-4343-b784-4f9f7205d9d5.svg) shows that almost the entire time is spent in can_optimize_var_lookup. Note that the wasm file is 147 KB.

Steps to Reproduce

Download the attached blocks.wasm.gz and unzip it (or take the above wat and convert it to wasm).

Compile it with time target/release/wasmtime compile blocks.wasm.

Expected Results

I'd expect the compilation to be on the order of 100ms as it is for other modules with a single function of similar size on my machine.

Actual Results

The compilation takes over 2 seconds.

Versions and Environment

Cranelift version or commit: commit 27435ae398bcce74bf990b9683e5c47c3f6c5d51

Operating system: Ubuntu 20.04, Linux kernel 5.4.0-122-generic

Architecture:
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   43 bits physical, 48 bits virtual
CPU(s):                          64
On-line CPU(s) list:             0-63
Thread(s) per core:              2
Core(s) per socket:              16
Socket(s):                       2
NUMA node(s):                    1
Vendor ID:                       AuthenticAMD
CPU family:                      23
Model:                           49
Model name:                      AMD EPYC 7302 16-Core Processor
Extra Info

I'm working on a PR which should help with this.

Wasmtime GitHub notifications bot (Sep 20 2022 at 07:48):

adambratschikaye closed issue #4923:

It seems that a wasm source with a chain of blocks each containing a local.get instruction can have quadratic compile time because can_optimize_var_lookup will traverse the entire chain when handling each local.get instruction. As an example, the following wat takes 2.4 seconds to compile on my machine:
(module
  (func (param i32)
    (block (local.set 0 (i32.add (i32.const 1) (local.get 0))))
    (block (local.set 0 (i32.add (i32.const 1) (local.get 0))))
    ... repeated 15,000 times
  )
)
and a flamegraph: ![flamegraph](https://user-images.githubusercontent.com/90606735/191045431-3e7626ef-7c29-4343-b784-4f9f7205d9d5.svg) shows that almost the entire time is spent in can_optimize_var_lookup. Note that the wasm file is 147 KB.

Steps to Reproduce

Download the attached blocks.wasm.gz and unzip it (or take the above wat and convert it to wasm).

Compile it with time target/release/wasmtime compile blocks.wasm.

Expected Results

I'd expect the compilation to be on the order of 100ms as it is for other modules with a single function of similar size on my machine.

Actual Results

The compilation takes over 2 seconds.

Versions and Environment

Cranelift version or commit: commit 27435ae398bcce74bf990b9683e5c47c3f6c5d51

Operating system: Ubuntu 20.04, Linux kernel 5.4.0-122-generic

Architecture:
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   43 bits physical, 48 bits virtual
CPU(s):                          64
On-line CPU(s) list:             0-63
Thread(s) per core:              2
Core(s) per socket:              16
Socket(s):                       2
NUMA node(s):                    1
Vendor ID:                       AuthenticAMD
CPU family:                      23
Model:                           49
Model name:                      AMD EPYC 7302 16-Core Processor
Extra Info

I'm working on a PR which should help with this.

Wasmtime GitHub notifications bot (Sep 21 2022 at 18:16):

jameysharp commented on issue #4923:

My draft fix for this is in #4939.

Wasmtime GitHub notifications bot (Sep 21 2022 at 18:16):

jameysharp reopened issue #4923:

It seems that a wasm source with a chain of blocks each containing a local.get instruction can have quadratic compile time because can_optimize_var_lookup will traverse the entire chain when handling each local.get instruction. As an example, the following wat takes 2.4 seconds to compile on my machine:
(module
  (func (param i32)
    (block (local.set 0 (i32.add (i32.const 1) (local.get 0))))
    (block (local.set 0 (i32.add (i32.const 1) (local.get 0))))
    ... repeated 15,000 times
  )
)
and a flamegraph: ![flamegraph](https://user-images.githubusercontent.com/90606735/191045431-3e7626ef-7c29-4343-b784-4f9f7205d9d5.svg) shows that almost the entire time is spent in can_optimize_var_lookup. Note that the wasm file is 147 KB.

Steps to Reproduce

Download the attached blocks.wasm.gz and unzip it (or take the above wat and convert it to wasm).

Compile it with time target/release/wasmtime compile blocks.wasm.

Expected Results

I'd expect the compilation to be on the order of 100ms as it is for other modules with a single function of similar size on my machine.

Actual Results

The compilation takes over 2 seconds.

Versions and Environment

Cranelift version or commit: commit 27435ae398bcce74bf990b9683e5c47c3f6c5d51

Operating system: Ubuntu 20.04, Linux kernel 5.4.0-122-generic

Architecture:
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   43 bits physical, 48 bits virtual
CPU(s):                          64
On-line CPU(s) list:             0-63
Thread(s) per core:              2
Core(s) per socket:              16
Socket(s):                       2
NUMA node(s):                    1
Vendor ID:                       AuthenticAMD
CPU family:                      23
Model:                           49
Model name:                      AMD EPYC 7302 16-Core Processor
Extra Info

I'm working on a PR which should help with this.

Wasmtime GitHub notifications bot (Sep 23 2022 at 16:41):

jameysharp closed issue #4923:

It seems that a wasm source with a chain of blocks each containing a local.get instruction can have quadratic compile time because can_optimize_var_lookup will traverse the entire chain when handling each local.get instruction. As an example, the following wat takes 2.4 seconds to compile on my machine:
(module
  (func (param i32)
    (block (local.set 0 (i32.add (i32.const 1) (local.get 0))))
    (block (local.set 0 (i32.add (i32.const 1) (local.get 0))))
    ... repeated 15,000 times
  )
)
and a flamegraph: ![flamegraph](https://user-images.githubusercontent.com/90606735/191045431-3e7626ef-7c29-4343-b784-4f9f7205d9d5.svg) shows that almost the entire time is spent in can_optimize_var_lookup. Note that the wasm file is 147 KB.

Steps to Reproduce

Download the attached blocks.wasm.gz and unzip it (or take the above wat and convert it to wasm).

Compile it with time target/release/wasmtime compile blocks.wasm.

Expected Results

I'd expect the compilation to be on the order of 100ms as it is for other modules with a single function of similar size on my machine.

Actual Results

The compilation takes over 2 seconds.

Versions and Environment

Cranelift version or commit: commit 27435ae398bcce74bf990b9683e5c47c3f6c5d51

Operating system: Ubuntu 20.04, Linux kernel 5.4.0-122-generic

Architecture:
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   43 bits physical, 48 bits virtual
CPU(s):                          64
On-line CPU(s) list:             0-63
Thread(s) per core:              2
Core(s) per socket:              16
Socket(s):                       2
NUMA node(s):                    1
Vendor ID:                       AuthenticAMD
CPU family:                      23
Model:                           49
Model name:                      AMD EPYC 7302 16-Core Processor
Extra Info

I'm working on a PR which should help with this.

Last updated: Apr 18 2025 at 07:03 UTC