girazoki opened issue #3523:
We have implemented this repo https://github.com/girazoki/wasmtime-try to try something that has been happening us with some latest code change we have done at moonbeam. The code change is https://github.com/PureStake/moonbeam/pull/989.
Without the PR, wasmtime takes around 15 seconds to compile the wasm. However, if we include the change (removal of the trait) wasmtime takes less than a second to compile. We wonder how these differences can be so big with such a small code change.
Any help would be appreciated, thanks!
girazoki edited issue #3523:
We have implemented this repo https://github.com/girazoki/wasmtime-try to try something that has been happening us with some latest code change we have done at moonbeam. The code change is https://github.com/PureStake/moonbeam/pull/989.
Without the PR, wasmtime takes around 15 seconds to compile the wasm. However, if we include the change (removal of the trait) wasmtime takes less than a second to compile. We wonder how these differences can be so big with such a small code change.
Both runtimes are in https://github.com/girazoki/wasmtime-try to try them out
Any help would be appreciated, thanks!
alexcrichton commented on issue #3523:
Running the script from your repository this file doesn't exist, but when corrected to
moonbase_fast_runtime.wasm
both modules compile in ~16s for me locally.Can you detail a bit more what you're doing, what's fast, and what's slow?
peterhuene commented on issue #3523:
The repro has a bug where it isn't loading the "fast" wasm file (named
moonbase_fast_runtime.wasm
instead ofmoonbase_runtime_fast.wasm
)With that fixed, I see:
Importing ./moonbase_runtime_slow.wasm... imported in 16830 ms Importing ./moonbase_fast_runtime.wasm... imported in 16952 ms
peterhuene edited a comment on issue #3523:
The repro has a bug where it isn't loading the "fast" wasm file (named
moonbase_fast_runtime.wasm
instead ofmoonbase_runtime_fast.wasm
).With that fixed, I see:
Importing ./moonbase_runtime_slow.wasm... imported in 16830 ms Importing ./moonbase_runtime_fast.wasm... imported in 16952 ms
girazoki commented on issue #3523:
Sorry it was my bad, updated fast. Now you should be able to see the difference, I updated the fast one.
The context is: We have the moonbase runtime https://github.com/PureStake/moonbeam/blob/master/runtime/moonbase/src/lib.rs that adds https://github.com/PureStake/moonbeam/blob/57fdb3f18fe0a48471bfcad2c37d89708a3a274f/runtime/moonbase/src/lib.rs#L1023. This is moonbase_slow
When we inserted this change, we realized an increase on the wasm compilation time. So we are trying to monitor the reason why this happens. For that reason why created this PR https://github.com/PureStake/moonbeam/pull/989, with the code commented, compiled it and measured its wasm compilation time. This is moonbase_fast
For me the first takes 16 secs, while the latter takes around 1. And I am not able to picture the reason of the difference
notlesh commented on issue #3523:
FYI, some debugging I've done indicates that we get into one invocation of
compile_function()
(here) which takes the entire duration of the compilation (that is, if it takes 22 seconds to compile the entire binary, this one job takes 22 seconds itself).
alexcrichton commented on issue #3523:
Thanks for the update, I can indeed reproduce the large discrepancy in compile times. What's happening here is that something pathological is being hit in Cranelift, and it's definitely something that we should fix!
Specifically in this wasm file the
<xcm_executor::XcmExecutor<Config> as xcm::v2::traits::ExecuteXcm<<Config as xcm_executor::config::Config>::Call>>::execute_xcm_in_credit
function takes 16s to compile. In that wasm file it's function index 1198 and the mangled name is_ZN155_$LT$xcm_executor..XcmExecutor$LT$Config$GT$$u20$as$u20$xcm..v2..traits..ExecuteXcm$LT$$LT$Config$u20$as$u20$xcm_executor..config..Config$GT$..Call$GT$$GT$21execute_xcm_in_credit17hf41c2b7a319ac14dE
.Extracting just that one function to a wasm module yields extract.wasm.gz and this can be reproduced locally with
wamtime compile extract.wasm
taking 16-17s locally. The breakdown I see is:$ RUST_LOG=wasmtime_cranelift ./target/release/wasmtime compile extract.wasm DEBUG wasmtime_cranelift::compiler > FuncIndex(989) translated in 16.363414899s TRACE wasmtime_cranelift::compiler > FuncIndex(989) timing info ======== ======== ================================== Total Self Pass -------- -------- ---------------------------------- 0.320 0.320 Translate WASM function 16.039 0.025 Compilation passes 0.001 0.001 Control flow graph 0.002 0.002 Dominator tree 0.000 0.000 Loop analysis 0.001 0.001 Pre-legalization rewriting 0.004 0.004 Dead code elimination 0.012 0.012 Global value numbering 0.036 0.027 Loop invariant code motion 0.000 0.000 Remove unreachable blocks 0.246 0.246 Remove constant phi-nodes 0.188 0.188 VCode lowering 0.022 0.022 VCode post-register allocation finalization 0.026 0.026 VCode emission 0.000 0.000 VCode emission finalization 15.478 15.478 Register allocation 0.004 0.004 Binary machine code emission 0.008 0.008 Layout full renumbering ======== ======== ==================================
There are known pitfalls with our register allocator at this time and plans to move to a better register allocator. My guess is that this is likely to fall into that category of "probably will get fixed", but @cfallin do you think it's worthwhile digging in more here to figure out why something is triggering the slowdown here?
@girazoki and @notlesh in the meantime for your specific use case if you'd like to avoid this slowdown I'd recommend taking a look at
<xcm_executor::XcmExecutor<Config> as xcm::v2::traits::ExecuteXcm<<Config as xcm_executor::config::Config>::Call>>::execute_xcm_in_credit
and trying to simplify it a bit. It's a huge function in the wasm and probably has a ton of stuff#[inline]
'd into it which greatly increases the size of the function. It may be the case that adding#[inline(never)]
or otherwise removing#[inline]
annotations in a few places could improve wasmtime compile times. Nevertheless we should still fix this in Wasmtime, but that's at least a possibility for fixing things in the near future if you're interested.
alexcrichton commented on issue #3523:
This is a slightly smaller minimization
with data strings and import strings all removed as well which makes navigating the*.wat
a bit easier, and still takes ~16s to compile the one function within.
alexcrichton edited a comment on issue #3523:
This is a slightly smaller minimization with data strings and import strings all removed as well which makes navigating the
*.wat
a bit easier, and still takes ~16s to compile the one function within.
Last updated: Jan 24 2025 at 00:11 UTC