Stream: git-wasmtime

Topic: wasmtime / issue #4291 Large compile-time-memory-usage re...


Wasmtime GitHub notifications bot (Jun 21 2022 at 16:29):

alexcrichton labeled issue #4291:

This WebAssembly file, which is the module from this issue reduced to a single function, compiles like this on main:

$ /usr/bin/time -v ./target/release/wasmtime compile extract.wasm
...
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:08.58
...
        Maximum resident set size (kbytes): 6565472
...
        Exit status: 0

By comparison, Wasmtime 0.36.0, which predates regalloc2, yields:

$ /usr/bin/time -v ./wasmtime-v0.36.0-aarch64-linux/wasmtime compile ./extract.wasm
...
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.86
...
        Maximum resident set size (kbytes): 215264
...
        Exit status: 0

I think this means that what previously took ~200M of memory to compile now takes upwards of 6.5G (roughly 30x more), and about 10x as long.
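As a quick sanity check on those figures, here is a minimal Python sketch that converts the two peak-RSS values and computes the ratios. It assumes `/usr/bin/time -v` reports "kbytes" as KiB (1024 bytes); the variable and function names are illustrative only.

```python
# Peak RSS values reported by `/usr/bin/time -v` above, in kbytes (assumed KiB).
rss_main_kib = 6565472
rss_v036_kib = 215264

# Wall-clock times from the same two runs, in seconds.
wall_main_s = 8.58
wall_v036_s = 0.86

def kib_to_mib(kib: int) -> float:
    """Convert KiB to MiB."""
    return kib / 1024

def kib_to_gib(kib: int) -> float:
    """Convert KiB to GiB."""
    return kib / (1024 * 1024)

# How many times more memory and time main needs than v0.36.
mem_ratio = rss_main_kib / rss_v036_kib
time_ratio = wall_main_s / wall_v036_s

print(f"v0.36 peak RSS: {kib_to_mib(rss_v036_kib):.0f} MiB")
print(f"main peak RSS:  {kib_to_gib(rss_main_kib):.2f} GiB")
print(f"memory ratio:   {mem_ratio:.1f}x")
print(f"time ratio:     {time_ratio:.1f}x")
```

In binary units this is about 210 MiB versus 6.26 GiB (about 6.7 GB in decimal units), so "~200M" and "upwards of 6.5G" are both reasonable round figures.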

Wasmtime GitHub notifications bot (Jun 21 2022 at 16:29):

alexcrichton opened issue #4291.

Wasmtime GitHub notifications bot (Jun 25 2022 at 21:25):

cfallin commented on issue #4291:

I did some investigation on this yesterday and today (not quite full-time, as I'm still a bit under the weather, but regalloc hacking is still the best way to pass the time...). I found three distinct things I could improve:

This moved the needle significantly on compilation of the above. The first run below is a build without the fixes (../wasmtime), the second a build with them:

% perf stat ../wasmtime/target/release/wasmtime compile ~/testfile.wasm

 Performance counter stats for '../wasmtime/target/release/wasmtime compile /home/cfallin/testfile.wasm':

          4,206.10 msec task-clock                #    1.053 CPUs utilized
             4,340      context-switches          #    1.032 K/sec
               822      cpu-migrations            #  195.431 /sec
         1,163,585      page-faults               #  276.643 K/sec
    16,856,753,781      cycles                    #    4.008 GHz                      (83.36%)
     1,621,615,014      stalled-cycles-frontend   #    9.62% frontend cycles idle     (83.11%)
     3,111,090,359      stalled-cycles-backend    #   18.46% backend cycles idle      (83.35%)
    28,553,303,978      instructions              #    1.69  insn per cycle
                                                  #    0.11  stalled cycles per insn  (83.38%)
     6,475,239,780      branches                  #    1.539 G/sec                    (83.50%)
        16,905,250      branch-misses             #    0.26% of all branches          (83.33%)

       3.995578486 seconds time elapsed

       2.763566000 seconds user
       1.382605000 seconds sys

% perf stat target/release/wasmtime compile ~/testfile.wasm

 Performance counter stats for 'target/release/wasmtime compile /home/cfallin/testfile.wasm':

          1,006.23 msec task-clock                #    1.267 CPUs utilized
             3,825      context-switches          #    3.801 K/sec
               745      cpu-migrations            #  740.388 /sec
            46,823      page-faults               #   46.533 K/sec
     4,000,880,722      cycles                    #    3.976 GHz                      (83.93%)
       285,506,402      stalled-cycles-frontend   #    7.14% frontend cycles idle     (83.77%)
       302,458,733      stalled-cycles-backend    #    7.56% backend cycles idle      (82.24%)
     4,816,665,288      instructions              #    1.20  insn per cycle
                                                  #    0.06  stalled cycles per insn  (83.49%)
       869,534,746      branches                  #  864.151 M/sec                    (83.48%)
        11,265,004      branch-misses             #    1.30% of all branches          (83.27%)

       0.794473768 seconds time elapsed

       0.844001000 seconds user
       0.143025000 seconds sys

In other words: 4x faster compilation and 24x fewer page faults (roughly 24x less anonymous memory used).
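A minimal Python sketch of where the 4x and 24x figures come from, using counters copied from the two `perf stat` runs above. Labeling the `../wasmtime` build as "before" (no fixes) and the local build as "after" is an assumption based on the numbers; the names are illustrative.

```python
# Counters copied from the two `perf stat` runs above.
# "before" = ../wasmtime build (assumed: main without the fixes),
# "after"  = target/release build (assumed: main with the fixes).
before = {"task_clock_ms": 4206.10, "elapsed_s": 3.995578486, "page_faults": 1_163_585}
after = {"task_clock_ms": 1006.23, "elapsed_s": 0.794473768, "page_faults": 46_823}

def ratio(metric: str) -> float:
    """How many times larger the 'before' value is than the 'after' value."""
    return before[metric] / after[metric]

for metric in before:
    print(f"{metric}: {ratio(metric):.1f}x")
```

Task-clock drops by about 4.2x and elapsed time by about 5x; the page-fault ratio is about 24.85, which the summary above rounds down to 24x.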

For comparison, Wasmtime v0.36 (pre-regalloc2) gives:

% perf stat ~/Downloads/wasmtime-v0.36.0-x86_64-linux/wasmtime compile ~/testfile.wasm

 Performance counter stats for '/home/cfallin/Downloads/wasmtime-v0.36.0-x86_64-linux/wasmtime compile /home/cfallin/testfile.wasm':

            959.79 msec task-clock                #    1.233 CPUs utilized
             5,047      context-switches          #    5.258 K/sec
               697      cpu-migrations            #  726.199 /sec
            58,171      page-faults               #   60.608 K/sec
     3,792,924,189      cycles                    #    3.952 GHz                      (83.95%)
       234,549,074      stalled-cycles-frontend   #    6.18% frontend cycles idle     (82.94%)
       258,495,205      stalled-cycles-backend    #    6.82% backend cycles idle      (82.15%)
     5,110,076,091      instructions              #    1.35  insn per cycle
                                                  #    0.05  stalled cycles per insn  (83.41%)
     1,102,335,350      branches                  #    1.149 G/sec                    (83.58%)
        11,660,266      branch-misses             #    1.06% of all branches          (84.11%)

       0.778638937 seconds time elapsed

       0.772824000 seconds user
       0.166435000 seconds sys

So v0.36 is ever so slightly faster (by ~5%), but curiously, current main with the fixes executes ~5% fewer instructions during compilation; it just achieves a lower IPC. It also incurs fewer page faults (i.e., uses less memory). These numbers are close enough to "within noise" that I'd want to measure more carefully before making strong claims here. Given the above, though, I do feel comfortable saying "anomaly fixed and back to parity".
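The ~5% figures and the IPC gap can be checked with a small Python sketch over the counters copied from the last two `perf stat` runs; the dictionary keys and helper names are illustrative, not part of any tool.

```python
# Counters copied from the fixed build's run and from the Wasmtime v0.36 run above.
fixed = {"task_clock_ms": 1006.23, "instructions": 4_816_665_288, "cycles": 4_000_880_722}
v036 = {"task_clock_ms": 959.79, "instructions": 5_110_076_091, "cycles": 3_792_924_189}

def pct_change(new: float, old: float) -> float:
    """Percentage change of `new` relative to `old`."""
    return (new - old) / old * 100

def ipc(run: dict) -> float:
    """Instructions retired per CPU cycle."""
    return run["instructions"] / run["cycles"]

print(f"task-clock, fixed vs v0.36:   {pct_change(fixed['task_clock_ms'], v036['task_clock_ms']):+.1f}%")
print(f"instructions, fixed vs v0.36: {pct_change(fixed['instructions'], v036['instructions']):+.1f}%")
print(f"IPC: fixed {ipc(fixed):.2f}, v0.36 {ipc(v036):.2f}")
```

The fixed build spends about 4.8% more task-clock time while executing about 5.7% fewer instructions, which is consistent with its lower IPC (1.20 vs. 1.35).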

I suspect this may also be the issue we saw in #4045, but I haven't verified that.

I'll put up proper PRs next week, when I'm fully back; for now the branches are here (regalloc2) and here (Cranelift).

Wasmtime GitHub notifications bot (Jun 28 2022 at 16:02):

cfallin closed issue #4291.


Last updated: Nov 22 2024 at 16:03 UTC