Stream: wasmtime

Topic: Slow module loading for 40MiB program


Andrew Werner (Feb 26 2025 at 20:00):

I compiled Apache DataFusion as a dependency of a wasm component just a few minutes ago, and that went fine, but then loading it in wasmtime takes ~22s! This is with everything optimized, e.g. a --release build. I've attached an image of a profile. Is this expected? Are there ways I should be holding things differently that might help here?

image.png

Alex Crichton (Feb 26 2025 at 20:02):

To me that's roughly what I would expect in the breakdown of compiling a big program. Not to say there aren't low-hanging fruit to optimize though! Would you be able to share the wasm module in question?

For your own development, you might want to try Winch as a compiler (e.g. --compiler winch on the CLI) which should have much speedier compile times.

Also for improving this, often times historically I've seen that 99% of the compile time is one gargantuan function and by shrinking that function in the guest language itself you can often improve compile time

Andrew Werner (Feb 26 2025 at 20:05):

Thanks for the quick reply! Happy to share a wasm module, give me a few minutes. Will also try using twiggy to see what's dominating. Any other tools or advice to find that?

Piotr Bejda (Feb 26 2025 at 20:35):

Here is the module:
wasm_playground_module.wasm

Piotr Bejda (Feb 26 2025 at 20:47):

Winch does bring loading the module down to 3.4s

Piotr Bejda (Feb 26 2025 at 20:54):

The biggest function's size is 0.14% per twiggy, which doesn't seem like a lot; is there a better measure for finding the function that is most complicated and time-consuming for the compiler?

Piotr Bejda (Feb 26 2025 at 20:56):

actually, it is big, 164kb

Alex Crichton (Feb 26 2025 at 21:05):

One thing you can try is WASMTIME_LOG=wasmtime_cranelift::compiler=debug wasmtime compile foo.wasm and that'll print out things like:

2025-02-26T21:04:38.580142Z DEBUG wasmtime_cranelift::compiler: FuncIndex(8519) translated in 1.525569591s
2025-02-26T21:04:38.581608Z DEBUG wasmtime_cranelift::compiler: FuncIndex(8520) translated in 1.040909ms
...

which you can use to find functions that take a particularly long time to compile

Alex Crichton (Feb 26 2025 at 21:05):

long ones typically end up getting printed at the end
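
Scanning thousands of log lines by eye is tedious, so here is a minimal sketch of ranking them with a script. It assumes the debug output was captured to a file (the name compile.log is hypothetical) and that each line has the "FuncIndex(N) translated in <duration>" shape shown above. Only the s and ms suffixes appear in the sample; the ns/µs/us suffixes are an assumption about how shorter durations are printed. The helper names are made up for illustration.

```python
import re

# Matches lines such as:
#   ... DEBUG wasmtime_cranelift::compiler: FuncIndex(8519) translated in 1.525569591s
LINE_RE = re.compile(r"FuncIndex\((\d+)\) translated in ([0-9.]+)(ns|µs|us|ms|s)\b")

# Multipliers to convert each unit suffix to seconds.
# Only "s" and "ms" are confirmed by the sample output; the rest are assumed.
UNIT_SECONDS = {"ns": 1e-9, "µs": 1e-6, "us": 1e-6, "ms": 1e-3, "s": 1.0}


def parse_timings(lines):
    """Return a list of (func_index, seconds) for every matching log line."""
    timings = []
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            index, value, unit = m.groups()
            timings.append((int(index), float(value) * UNIT_SECONDS[unit]))
    return timings


def slowest(timings, n=10):
    """Return the n entries with the largest time, slowest first."""
    return sorted(timings, key=lambda t: t[1], reverse=True)[:n]


def count_over(timings, threshold_seconds):
    """Count how many functions took longer than the given threshold."""
    return sum(1 for _, secs in timings if secs > threshold_seconds)
```

Possible usage, assuming the log was saved to compile.log: `timings = parse_timings(open("compile.log", encoding="utf-8"))`, then `slowest(timings, 5)` for the top offenders and `count_over(timings, 0.001)` for the number of functions above 1ms.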

Piotr Bejda (Feb 26 2025 at 21:22):

nothing stands out that much, slowest function is 90ms, there are 9k functions over 1ms

Alex Crichton (Feb 26 2025 at 21:23):

yeah I was gonna say I just finished downloading and nothing takes seconds on my machine, it's all ms-or-less

Alex Crichton (Feb 26 2025 at 21:24):

this is a pretty big module with 62k functions, and it's probably just "we can probably apply elbow grease to make things faster"

Chris Fallin (Feb 26 2025 at 21:27):

FWIW @Piotr Bejda, the biggest parts of that compilation time, regalloc2 and the mid-end (egraph optimizer), are both things I wrote and spent months of time squeezing performance out of; there is probably not much low-hanging fruit left. Better results will come from taking different approaches: Winch as mentioned (baseline compiler, no regalloc or optimization at all), or perhaps a different register allocator (though the one that we're considering, regalloc3, goes in the other direction with slower compilation for a little more runtime perf)

Chris Fallin (Feb 26 2025 at 21:28):

to set expectations, our compilation time is about 10x faster than LLVM, so from one perspective this is "very fast" already (for an optimizing compiler); it's just the nature of Wasm that one needs to have a compilation step when loading the module unfortunately, so if the baseline expectation is "just loading a program", it will feel pretty slow

Joel Dice (Feb 26 2025 at 21:38):

@Piotr Bejda @Andrew Werner In case you aren't already aware: you can enable cwasm caching via this config setting which will avoid recompiling a wasm file if it hasn't changed since the last time it was run.

Piotr Bejda (Feb 26 2025 at 21:41):

makes sense, thank you all very much for sharing expertise! we might be able to use more bare parts of the datafusion, seems like higher abstractions are quite bloated compared to what we need

Alex Crichton (Feb 26 2025 at 21:41):

it's probably just "we can probably apply elbow grease to make things faster"

Sorry I should clarify here -- I'm sure we have inefficiencies in schlepping around a lot of functions from cranelift to wasmtime and getting it all into a *.cwasm image. This is not a large part of your profile, though, and even if we were to optimize it, it may only help 1-2% (as a guess, unsure as to exact percentages)

IIRC I think I saw a fuzz timeout of 1k empty functions a while back so I do think there's stuff in that scalable area we have yet to improve. Improving regalloc/optimization though, as Chris mentions, is a much, much taller order

Alex Crichton (Feb 26 2025 at 21:42):

In terms of compile time another thing to mention is the incremental compilation support we have. It's not integrated into the CLI at this time but if you're using a rust embedding you might find it useful as it can improve compile times when only small portions of the module have changed

Piotr Bejda (Feb 26 2025 at 21:44):

Joel Dice said:

@Piotr Bejda @Andrew Werner In case you aren't already aware: you can enable cwasm caching via this config setting which will avoid recompiling a wasm file if it hasn't changed since the last time it was run.

yeah, we don't cache the loaded module, we will, but there are some scenarios where we want a process to start executing wasm fast from the get-go too (where that process should not completely trust the provided wasm binary)

Joel Dice (Feb 26 2025 at 21:44):

Makes sense. I think that scenario is the reason Winch exists.


Last updated: Feb 27 2025 at 23:03 UTC