Stream: git-wasmtime

Topic: wasmtime / issue #3469 Avoid quadratic behavior in pathol...


Wasmtime GitHub notifications bot (Oct 21 2021 at 20:35):

cfallin commented on issue #3469:

Out of further curiosity, even 70ms for a function like this seems somewhat high, is that still due to MachBuffer things or is it general "too much elbow grease is needed to bring that down further"

I think it is mostly in the middle-end (analyses and optimizations), which will see the huge CFG with all the loops before it's reduced. The backend stages that are specifically broken out in the clif-util wasm -T output show: 3ms in CLIF -> VCode lowering; 4ms in regalloc; and 4ms in binary emission (MachBuffer + cpu-specific instruction encoding code). So only 10ms in the "backend" and the rest in attempted optimization.

A perf profile of the compilation shows a lot of time in the kernel's pagefault path, so I think that just writing out the data structures has some overhead (for the large function body). I imagine we could probably be smarter about early optimizations that cut down the amount of work the later stages have to do; but nothing immediately obvious or anomalously bad is happening here, I think.
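For reference, the measurements described above can be gathered roughly as follows. This is a sketch, not part of the original comment: `module.wasm` is a placeholder input, and the exact flags may vary across Cranelift versions; the `-T` pass-timing flag is the one cited in the comment.

```shell
# Per-phase compile timings (lowering, regalloc, emission), as cited above
clif-util wasm -T module.wasm

# System-level profile to see where wall-clock time goes,
# e.g. time spent in the kernel's pagefault path
perf record -g -- clif-util wasm module.wasm
perf report
```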

Wasmtime GitHub notifications bot (Oct 21 2021 at 20:36):

cfallin edited a comment on issue #3469:

Out of further curiosity, even 70ms for a function like this seems somewhat high, is that still due to MachBuffer things or is it general "too much elbow grease is needed to bring that down further"

I think it is mostly in the middle-end (analyses and optimizations), which will see the huge CFG with all the loops before it's reduced. The backend stages that are specifically broken out in the clif-util wasm -T output show: 3ms in CLIF -> VCode lowering; 4ms in regalloc; and 4ms in binary emission (MachBuffer + cpu-specific instruction encoding code). So only 11ms (EDIT: I can add I promise) in the "backend" and the rest in attempted optimization.

A perf profile of the compilation shows a lot of time in the kernel's pagefault path, so I think that just writing out the data structures has some overhead (for the large function body). I imagine we could probably be smarter about early optimizations that cut down the amount of work the later stages have to do; but nothing immediately obvious or anomalously bad is happening here, I think.


Last updated: Nov 22 2024 at 17:03 UTC