Stream: git-wasmtime

Topic: wasmtime / issue #3523 Huge increase in wasmtime compilation


Wasmtime GitHub notifications bot (Nov 15 2021 at 19:16):

girazoki opened issue #3523:

We have set up this repo https://github.com/girazoki/wasmtime-try to reproduce something that has been happening to us with a recent code change we made at moonbeam. The code change is https://github.com/PureStake/moonbeam/pull/989.

Without the PR, wasmtime takes around 15 seconds to compile the wasm. However, if we include the change (removal of the trait), wasmtime takes less than a second to compile. We wonder how the difference can be so big with such a small code change.

Any help would be appreciated, thanks!

Wasmtime GitHub notifications bot (Nov 15 2021 at 19:17):

girazoki edited issue #3523:

We have set up this repo https://github.com/girazoki/wasmtime-try to reproduce something that has been happening to us with a recent code change we made at moonbeam. The code change is https://github.com/PureStake/moonbeam/pull/989.

Without the PR, wasmtime takes around 15 seconds to compile the wasm. However, if we include the change (removal of the trait), wasmtime takes less than a second to compile. We wonder how the difference can be so big with such a small code change.

Both runtimes are in https://github.com/girazoki/wasmtime-try if you want to try them out.
Any help would be appreciated, thanks!
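
For reference, a minimal sketch of this kind of measurement, assuming the wasmtime Rust embedding API (Engine, Module::from_file); the actual script in the repro repository may differ:

use std::time::Instant;
use wasmtime::{Engine, Module};

fn main() -> anyhow::Result<()> {
    let engine = Engine::default();
    // File names as in the repro repository; adjust as needed.
    for path in ["moonbase_runtime_slow.wasm", "moonbase_runtime_fast.wasm"] {
        let start = Instant::now();
        // Module::from_file compiles the module eagerly, so the elapsed
        // time is dominated by Cranelift compilation.
        let _module = Module::from_file(&engine, path)?;
        println!("{} compiled in {:?}", path, start.elapsed());
    }
    Ok(())
}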

Wasmtime GitHub notifications bot (Nov 15 2021 at 19:27):

alexcrichton commented on issue #3523:

Running the script from your repository, the file it references doesn't exist, but when corrected to moonbase_fast_runtime.wasm both modules compile in ~16s for me locally.

Can you detail a bit more what you're doing, what's fast, and what's slow?

Wasmtime GitHub notifications bot (Nov 15 2021 at 19:27):

peterhuene commented on issue #3523:

The repro has a bug where it isn't loading the "fast" wasm file (named moonbase_fast_runtime.wasm instead of moonbase_runtime_fast.wasm)

With that fixed, I see:

Importing ./moonbase_runtime_slow.wasm...
imported in 16830 ms
Importing ./moonbase_fast_runtime.wasm...
imported in 16952 ms

Wasmtime GitHub notifications bot (Nov 15 2021 at 19:28):

peterhuene edited a comment on issue #3523:

The repro has a bug where it isn't loading the "fast" wasm file (named moonbase_fast_runtime.wasm instead of moonbase_runtime_fast.wasm).

With that fixed, I see:

Importing ./moonbase_runtime_slow.wasm...
imported in 16830 ms
Importing ./moonbase_runtime_fast.wasm...
imported in 16952 ms

Wasmtime GitHub notifications bot (Nov 15 2021 at 19:54):

girazoki commented on issue #3523:

Sorry, that was my bad; I've updated the fast one. Now you should be able to see the difference.

The context is: we have the moonbase runtime https://github.com/PureStake/moonbeam/blob/master/runtime/moonbase/src/lib.rs, which adds https://github.com/PureStake/moonbeam/blob/57fdb3f18fe0a48471bfcad2c37d89708a3a274f/runtime/moonbase/src/lib.rs#L1023. This is moonbase_slow.

When we introduced this change, we noticed an increase in the wasm compilation time, so we are trying to track down why this happens. To that end we created this PR https://github.com/PureStake/moonbeam/pull/989 with the code commented out, compiled it, and measured its wasm compilation time. This is moonbase_fast.

For me the former takes 16 seconds, while the latter takes around 1, and I cannot see the reason for the difference.

Wasmtime GitHub notifications bot (Nov 15 2021 at 20:03):

notlesh commented on issue #3523:

FYI, some debugging I've done indicates that we get into one invocation of compile_function() (here) that takes the entire duration of the compilation (that is, if it takes 22 seconds to compile the entire binary, this one job takes 22 seconds by itself).
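
One way to make a single dominant function visible end to end, sketched here assuming the wasmtime build includes the parallel-compilation feature (it is on by default), is to turn parallel compilation off so that one function's cost shows up directly in the total wall-clock time:

use std::time::Instant;
use wasmtime::{Config, Engine, Module};

fn main() -> anyhow::Result<()> {
    let mut config = Config::new();
    // With parallel compilation disabled, a single pathological function's
    // cost is no longer hidden behind the other workers finishing early.
    config.parallel_compilation(false);
    let engine = Engine::new(&config)?;

    let start = Instant::now();
    let _module = Module::from_file(&engine, "moonbase_runtime_slow.wasm")?;
    println!("compiled in {:?}", start.elapsed());
    Ok(())
}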

Wasmtime GitHub notifications bot (Nov 15 2021 at 21:05):

alexcrichton commented on issue #3523:

Thanks for the update; I can indeed reproduce the large discrepancy in compile times. What's happening here is that something pathological is being hit in Cranelift, and it's definitely something that we should fix!

Specifically in this wasm file the <xcm_executor::XcmExecutor<Config> as xcm::v2::traits::ExecuteXcm<<Config as xcm_executor::config::Config>::Call>>::execute_xcm_in_credit function takes 16s to compile. In that wasm file it's function index 1198 and the mangled name is _ZN155_$LT$xcm_executor..XcmExecutor$LT$Config$GT$$u20$as$u20$xcm..v2..traits..ExecuteXcm$LT$$LT$Config$u20$as$u20$xcm_executor..config..Config$GT$..Call$GT$$GT$21execute_xcm_in_credit17hf41c2b7a319ac14dE.

Extracting just that one function to a wasm module yields extract.wasm.gz, and the slowdown can be reproduced locally with wasmtime compile extract.wasm, which takes 16-17s. The breakdown I see is:

$ RUST_LOG=wasmtime_cranelift ./target/release/wasmtime compile extract.wasm
 DEBUG wasmtime_cranelift::compiler > FuncIndex(989) translated in 16.363414899s
 TRACE wasmtime_cranelift::compiler > FuncIndex(989) timing info
======== ========  ==================================
   Total     Self  Pass
-------- --------  ----------------------------------
   0.320    0.320  Translate WASM function
  16.039    0.025  Compilation passes
   0.001    0.001  Control flow graph
   0.002    0.002  Dominator tree
   0.000    0.000  Loop analysis
   0.001    0.001  Pre-legalization rewriting
   0.004    0.004  Dead code elimination
   0.012    0.012  Global value numbering
   0.036    0.027  Loop invariant code motion
   0.000    0.000  Remove unreachable blocks
   0.246    0.246  Remove constant phi-nodes
   0.188    0.188  VCode lowering
   0.022    0.022  VCode post-register allocation finalization
   0.026    0.026  VCode emission
   0.000    0.000  VCode emission finalization
  15.478   15.478  Register allocation
   0.004    0.004  Binary machine code emission
   0.008    0.008  Layout full renumbering
======== ========  ==================================
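
That table is Cranelift's built-in per-pass timing, which wasmtime-cranelift drains and logs after each function (that is where the TRACE output above comes from). A minimal sketch of reading it directly, assuming the cranelift-codegen crate's timing module:

use cranelift_codegen::timing;

fn main() {
    // take_current() drains the per-pass timings accumulated on this
    // thread since the last call; the returned PassTimes implements
    // Display and prints the same table format shown above (all zeros
    // here, since nothing has been compiled yet).
    println!("{}", timing::take_current());
}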

There are known pitfalls with our register allocator at this time, and there are plans to move to a better register allocator. My guess is that this is likely to fall into the category of "probably will get fixed", but @cfallin, do you think it's worthwhile digging in more to figure out what is triggering the slowdown here?

@girazoki and @notlesh, in the meantime, if you'd like to avoid this slowdown for your specific use case I'd recommend taking a look at <xcm_executor::XcmExecutor<Config> as xcm::v2::traits::ExecuteXcm<<Config as xcm_executor::config::Config>::Call>>::execute_xcm_in_credit and trying to simplify it a bit. It's a huge function in the wasm and probably has a ton of stuff #[inline]'d into it, which greatly increases its size. It may be the case that adding #[inline(never)] or otherwise removing #[inline] annotations in a few places could improve wasmtime compile times. We should still fix this in Wasmtime regardless, but that's at least a possibility for improving things in the near future if you're interested.
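
For illustration, a hypothetical sketch of that suggestion; the names below are invented rather than taken from the moonbeam codebase:

struct State {
    counter: u64,
}

// Stand-in for a large body that would otherwise be #[inline]'d into the
// caller: #[inline(never)] keeps it a separate function, so the hot outer
// function stays smaller and cheaper for Cranelift to compile.
#[inline(never)]
fn step(state: &mut State, op: u8) {
    state.counter = state.counter.wrapping_add(u64::from(op));
}

fn execute(state: &mut State, ops: &[u8]) {
    for &op in ops {
        step(state, op);
    }
}

fn main() {
    let mut state = State { counter: 0 };
    execute(&mut state, &[1, 2, 3]);
    println!("{}", state.counter);
}

The trade-off is an extra call at runtime in exchange for a smaller function body for the compiler to process.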

Wasmtime GitHub notifications bot (Nov 15 2021 at 21:11):

alexcrichton commented on issue #3523:

This is a slightly smaller minimization with data strings and import strings all removed as well, which makes navigating the *.wat a bit easier, and still takes ~16s to compile the one function within.

