I compiled Apache DataFusion as a dependency of a wasm component just a few minutes ago, and that went fine, but then loading it in wasmtime takes ~22s! This is with everything optimized, like a --release build. I've attached an image of a profile. Is this expected? Are there ways I should be holding things differently that might help here?
To me that's roughly what I would expect in the breakdown of compiling a big program. Not to say there aren't low-hanging fruit to optimize though! Would you be able to share the wasm module in question?
For your own development, you might want to try Winch as a compiler (e.g. --compiler winch on the CLI), which should have much speedier compile times.
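If you're embedding Wasmtime from Rust rather than using the CLI, the equivalent is roughly the following (a sketch, not tested here; it assumes a recent wasmtime crate built with the winch cargo feature):

```rust
use wasmtime::{Config, Engine, Strategy};

fn main() -> anyhow::Result<()> {
    // Select the Winch baseline compiler instead of Cranelift.
    // This trades runtime performance for much faster compilation.
    let mut config = Config::new();
    config.strategy(Strategy::Winch);
    let engine = Engine::new(&config)?;
    // ... compile and instantiate modules with `engine` as usual ...
    let _ = engine;
    Ok(())
}
```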
Also, for improving this: historically I've often seen that 99% of the compile time is spent in one gargantuan function, and by shrinking that function in the guest language itself you can often improve compile time.
Thanks for the quick reply! Happy to share a wasm module, give me a few minutes. Will also try using twiggy to see what's dominating. Any other tools or advice to find that?
Here is the module:
wasm_playground_module.wasm
Winch does bring loading the module down to 3.4s
The biggest function is 0.14% of the total size per twiggy, which seems like not a lot; is there a better measure for finding the functions that are most complicated and time-consuming for the compiler?
actually, it is big: 164 KB
One thing you can try is WASMTIME_LOG=wasmtime_cranelift::compiler=debug wasmtime compile foo.wasm
and that'll print out things like:
2025-02-26T21:04:38.580142Z DEBUG wasmtime_cranelift::compiler: FuncIndex(8519) translated in 1.525569591s
2025-02-26T21:04:38.581608Z DEBUG wasmtime_cranelift::compiler: FuncIndex(8520) translated in 1.040909ms
...
which you can use to find functions that take a particularly long time to compile
long ones typically end up getting printed at the end
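If the log is long, you can also sort the per-function timings yourself; here's a quick stdlib-only sketch (a hypothetical helper, not part of Wasmtime) that orders lines like the ones above, slowest first. It only handles the "ms" and "s" suffixes shown in the sample output:

```rust
// Sort "FuncIndex(N) translated in D" log lines by duration, slowest first.
fn parse_nanos(dur: &str) -> Option<u128> {
    // Check "ms" before "s", since every "ms" string also ends in "s".
    let (num, scale) = if let Some(n) = dur.strip_suffix("ms") {
        (n, 1_000_000.0)
    } else if let Some(n) = dur.strip_suffix("s") {
        (n, 1_000_000_000.0)
    } else {
        return None;
    };
    num.parse::<f64>().ok().map(|v| (v * scale) as u128)
}

fn sort_by_duration(lines: &[&str]) -> Vec<String> {
    let mut timed: Vec<(u128, &str)> = lines
        .iter()
        .filter_map(|l| {
            // The duration is the last whitespace-separated token.
            let dur = l.rsplit(' ').next()?;
            Some((parse_nanos(dur)?, *l))
        })
        .collect();
    timed.sort_by(|a, b| b.0.cmp(&a.0));
    timed.into_iter().map(|(_, l)| l.to_string()).collect()
}

fn main() {
    let lines = [
        "DEBUG wasmtime_cranelift::compiler: FuncIndex(8520) translated in 1.040909ms",
        "DEBUG wasmtime_cranelift::compiler: FuncIndex(8519) translated in 1.525569591s",
    ];
    for l in sort_by_duration(&lines) {
        println!("{l}");
    }
}
```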
nothing stands out that much; the slowest function is 90ms, and there are 9k functions over 1ms
yeah I was gonna say I just finished downloading and nothing takes seconds on my machine, it's all ms-or-less
this is a pretty big module with 62k functions, and it's probably just "we can probably apply elbow grease to make things faster"
FWIW @Piotr Bejda , the biggest parts of that compilation time, regalloc2 and the mid-end (egraph optimizer) are both things I wrote and spent months of time squeezing performance out of; there is probably not much low-hanging fruit left. Better results will come from taking different approaches: Winch as mentioned (baseline compiler, no regalloc or optimization at all), or perhaps a different register allocator (though the one that we're considering, regalloc3, goes in the other direction with slower compilation for a little more runtime perf)
to set expectations, our compilation time is about 10x faster than LLVM, so from one perspective this is "very fast" already (for an optimizing compiler); it's just the nature of Wasm that one needs to have a compilation step when loading the module unfortunately, so if the baseline expectation is "just loading a program", it will feel pretty slow
@Piotr Bejda @Andrew Werner In case you aren't already aware: you can enable cwasm caching via this config setting which will avoid recompiling a wasm file if it hasn't changed since the last time it was run.
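In a Rust embedding, turning that caching on is roughly the following (a sketch, assuming the wasmtime crate's cache cargo feature; cache_config_load_default reads Wasmtime's default cache configuration from disk):

```rust
use wasmtime::{Config, Engine};

fn main() -> anyhow::Result<()> {
    let mut config = Config::new();
    // Load the default cache configuration; subsequent compilations of an
    // unchanged wasm file are then served from the on-disk cwasm cache
    // instead of being recompiled.
    config.cache_config_load_default()?;
    let engine = Engine::new(&config)?;
    let _ = engine;
    Ok(())
}
```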
makes sense, thank you all very much for sharing your expertise! we might be able to use more bare-bones parts of DataFusion; the higher abstractions seem quite bloated compared to what we need
it's probably just "we can probably apply elbow grease to make things faster"
Sorry, I should clarify here -- I'm sure we have inefficiencies in schlepping around a lot of functions from Cranelift to Wasmtime and getting it all into a *.cwasm image. This is not a large part of your profile, though, and even if we were to optimize it, it may only help 1-2% (as a guess; unsure as to exact percentages)
IIRC I saw a fuzz timeout with 1k empty functions a while back, so I do think there's stuff in that scaling area we have yet to improve. Improving regalloc/optimization, though, as Chris mentions, is a much, much taller order
In terms of compile time, another thing to mention is the incremental compilation support we have. It's not integrated into the CLI at this time, but if you're using a Rust embedding you might find it useful, as it can improve compile times when only small portions of the module have changed
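Wiring that up looks roughly like this, if I remember the trait shape right (a sketch, not tested here; it assumes the wasmtime crate's incremental-cache cargo feature, and InMemoryCache is a toy store of my own invention -- a real embedding would persist entries to disk so Cranelift can reuse per-function artifacts across runs):

```rust
use std::borrow::Cow;
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use wasmtime::{CacheStore, Config, Engine};

// A toy in-memory store keyed by Cranelift's per-function cache keys.
#[derive(Debug, Default)]
struct InMemoryCache(Mutex<HashMap<Vec<u8>, Vec<u8>>>);

impl CacheStore for InMemoryCache {
    fn get(&self, key: &[u8]) -> Option<Cow<'_, [u8]>> {
        self.0.lock().unwrap().get(key).map(|v| Cow::Owned(v.clone()))
    }
    fn insert(&self, key: &[u8], value: Vec<u8>) -> bool {
        self.0.lock().unwrap().insert(key.to_vec(), value);
        true
    }
}

fn main() -> anyhow::Result<()> {
    let mut config = Config::new();
    // Cranelift consults the store function-by-function, so recompiling a
    // module where only a few functions changed reuses most prior work.
    config.enable_incremental_compilation(Arc::new(InMemoryCache::default()))?;
    let engine = Engine::new(&config)?;
    let _ = engine;
    Ok(())
}
```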
Joel Dice said:
Piotr Bejda Andrew Werner In case you aren't already aware: you can enable cwasm caching via this config setting which will avoid recompiling a wasm file if it hasn't changed since the last time it was run.
yeah, we don't cache the loaded module yet; we will, but there are some scenarios where we want a process to start executing wasm fast from the get-go too (where that process should not completely trust the provided wasm binary)
Makes sense. I think that scenario is the reason Winch exists.
Last updated: Feb 27 2025 at 23:03 UTC