Stream: wasmtime

Topic: Issue Loading WASM Component with DataFusion Integration


Utilize3214 (Dec 14 2024 at 13:50):

I'm developing a Rust host application that uses Wasmtime to load and execute WebAssembly components.

Here's my setup:

I'm trying to integrate DataFusion into one of my WASM components following examples from:

  1. DataFusion GitHub Issue #177
  2. DataFusion WASM Bindings

The issue occurs when I add the DataFusion context in the code for the WASM module (it compiles to WASM without errors):

```rust
let ctx = SessionContext::new();
```

When loading this specific module, the host gets stuck at:

```rust
println!("extension: {:?}", extension);

// Load Extension from the .wasm file
let component = Component::from_file(&engine, extension.unwrap()).map_err(|e| {
    println!("Error while loading component {:?}", e);
    e
})?;
```

The only output visible in the console is from the first println! showing the extension value.

Questions:

  1. Is there a way to get more detailed error information from Component::from_file?
  2. Do I need to modify how I load the WASM module when using DataFusion?
  3. Are there any specific considerations when using DataFusion in a WASM component?

Additional Context:

  - This loading method works successfully with multiple other WASM modules
  - The issue only occurs with the DataFusion integration

Pat Hickey (Dec 14 2024 at 17:06):

How large is the component, and how long are you waiting? If you run ‘top’ while it’s hanging what does that say - hopefully a few threads busy in wasmtime? Loading the component is going to generate native code for all of the functions inside, which could take a while.

Pat Hickey (Dec 14 2024 at 17:07):

An alternative is that you do the code generation AOT with ‘wasmtime compile’ and then deserialize the compiled component from file. Deserializing should be very fast.
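
A minimal sketch of that compile-once, deserialize-afterwards flow, kept to the standard library so it runs on its own. `needs_recompile` and `load_cached` are hypothetical helper names, not part of Wasmtime. In a real host the `compile` closure would presumably wrap `Engine::precompile_component` and the `load` closure would wrap the `unsafe` `Component::deserialize_file`. Invalidating the cache by file modification time is an assumption made for illustration.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns true when the precompiled artifact is missing or older than
/// the source .wasm file (hypothetical helper, mtime-based).
fn needs_recompile(wasm: &Path, cwasm: &Path) -> io::Result<bool> {
    let src_mtime = fs::metadata(wasm)?.modified()?;
    match fs::metadata(cwasm) {
        Ok(meta) => Ok(meta.modified()? < src_mtime),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(true),
        Err(e) => Err(e),
    }
}

/// Hypothetical helper: pay the compilation cost once, then reuse the
/// artifact on later runs. `compile` turns wasm bytes into a serialized
/// artifact; `load` turns the artifact on disk into a usable value.
fn load_cached<T>(
    wasm: &Path,
    cwasm: &Path,
    compile: impl FnOnce(&[u8]) -> io::Result<Vec<u8>>,
    load: impl FnOnce(&Path) -> io::Result<T>,
) -> io::Result<T> {
    if needs_recompile(wasm, cwasm)? {
        let bytes = fs::read(wasm)?;
        fs::write(cwasm, compile(&bytes)?)?;
    }
    load(cwasm)
}
```

Precompiled artifacts are tied to the Wasmtime version and engine configuration that produced them, so a real cache would likely need to account for those as well as the file timestamp.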

Utilize3214 (Dec 15 2024 at 20:39):

Thanks! The .wasm file is 46 MB. I left it running for longer and it worked; it took about 3 minutes to get through. I'll look into AOT and also see whether the .wasm file can be made smaller. The other files I have are between 100 and 300 kB.
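
To tell a slow load from a real hang, it can help to time the call explicitly. A small sketch using only the standard library; `timed` is a hypothetical helper, not a Wasmtime API:

```rust
use std::time::{Duration, Instant};

/// Runs `f`, reports how long it took on stderr, and returns both the
/// result and the elapsed wall-clock time.
fn timed<T>(label: &str, f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = f();
    let elapsed = start.elapsed();
    eprintln!("{} took {:?}", label, elapsed);
    (out, elapsed)
}
```

In the host this could wrap the load, for example `timed("Component::from_file", || Component::from_file(&engine, path))`, where `path` stands in for the extension path.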

Ramon Klass (Dec 15 2024 at 21:46):

one thing to note is that initializing components in debug builds takes a LONG time, if you were to build your host in release mode it would feel quite fast even with such a big component (I know 45MB+ is currently the "normal" size of a Python component for example)

Utilize3214 (Dec 16 2024 at 20:17):

Thanks for the tip! Building the host with the release flag significantly improved the load time from ~300s to 23s. However, similar projects like https://waynexia.github.io/datafusion-playground/ and https://parquet-viewer.xiangpeng.systems/ load much faster. Could the fact that they use wasm-bindgen explain the performance difference?

Alex Crichton (Dec 16 2024 at 20:23):

What you're running into is the fact that Wasmtime needs to convert WebAssembly to native code. This is done with the Cranelift compiler that Wasmtime uses. This process takes time and is Rust code that runs when you load the wasm module. Unoptimized Rust code is significantly slower than optimized Rust code, hence why a --release build is so much faster.

If you're comparing against the web then that's comparing against different systems. Web browsers are optimized for time-to-execution for WebAssembly and JavaScript and use many tricks for doing this. For example web browsers will compile your code as it's being downloaded so by the time the download is finished everything is ready to go. Web browsers also use a "baseline compiler" for wasm which is significantly faster than the final tier of compilation (which Cranelift is more akin to).

Wasmtime supports baseline compilation through the "Winch" compiler, which generates code much faster than Cranelift. The tradeoff is that the generated code performs much worse.

When you're working with out-of-browser wasm these are concerns that you'll have to balance in your embedding. There's not necessarily a one-size-fits-all solution for you such as "just point a web browser at it" at this time.

Utilize3214 (Dec 18 2024 at 06:49):

Thanks! Winch reduced the processing time to 8 seconds. I'll also look into AOT. Additionally, I think there are potential optimizations I can do in DataFusion.


Last updated: Dec 23 2024 at 13:07 UTC