I'm developing a Rust host application that uses Wasmtime to load and execute WebAssembly components.
Here's my setup:
- cargo component
- .wit file defining its exported interface

I'm trying to integrate DataFusion into one of my WASM components following examples from:
The issue occurs when I add the DataFusion context in the code for the WASM module (it compiles to WASM without errors):
```rust
let ctx = SessionContext::new();
```
When loading this specific module, the host gets stuck at:
```rust
println!("extension: {:?}", extension);
// Load Extension from the .wasm file
let component = Component::from_file(&engine, extension.unwrap()).map_err(|e| {
    println!("Error while loading component {:?}", e);
    e
})?;
```
The only output visible in the console is from the first println!, showing the extension value.
Questions:
Component::from_file?

Additional Context:
- This loading method works successfully with multiple other WASM modules
- The issue only occurs with the DataFusion integration
How large is the component, and how long are you waiting? If you run ‘top’ while it’s hanging what does that say - hopefully a few threads busy in wasmtime? Loading the component is going to generate native code for all of the functions inside, which could take a while.
An alternative is to do the code generation ahead of time (AOT) with ‘wasmtime compile’ and then deserialize the compiled component from a file. Deserializing should be very fast.
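A minimal sketch of the bookkeeping around that AOT step, using only the standard library: it decides whether a precompiled artifact can be reused or has to be regenerated. The helper names `cwasm_path` and `needs_recompile` are hypothetical, and the `.cwasm` extension is assumed as the conventional name for Wasmtime's precompiled output. The Wasmtime calls themselves are only named in comments; in an embedding they would be `Engine::precompile_component` and the `unsafe` `Component::deserialize_file`.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: where the precompiled artifact for `wasm` would live.
/// Assumes the conventional `.cwasm` extension next to the source file.
fn cwasm_path(wasm: &Path) -> PathBuf {
    wasm.with_extension("cwasm")
}

/// Returns true when the artifact is missing or older than the source
/// component, i.e. when the slow compile step has to run again.
fn needs_recompile(wasm: &Path, cwasm: &Path) -> io::Result<bool> {
    let artifact = match fs::metadata(cwasm) {
        Ok(meta) => meta,
        Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(true),
        Err(e) => return Err(e),
    };
    let source_modified = fs::metadata(wasm)?.modified()?;
    Ok(artifact.modified()? < source_modified)
}

// In a host that depends on the `wasmtime` crate, the flow would roughly be:
//
//   if needs_recompile(&wasm, &cwasm)? {
//       let bytes = engine.precompile_component(&fs::read(&wasm)?)?;
//       fs::write(&cwasm, bytes)?;
//   }
//   // Unsafe because the file must be trusted output of a compatible engine.
//   let component = unsafe { Component::deserialize_file(&engine, &cwasm)? };
```

The timestamp comparison is a deliberately simple staleness check; a content hash would be more robust if the files are copied around.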
Thanks! The .wasm file is 46 MB. I left it running for longer and it worked; it took about 3 minutes to get through. I'll look into AOT and also see whether the .wasm file can be made smaller. The other files I have are between 100 and 300 kB.
One thing to note is that initializing components in debug builds takes a LONG time. If you were to build your host in release mode, it would feel quite fast even with such a big component (I know 45 MB+ is currently the "normal" size of a Python component, for example).
Thanks for the tip! Building the host with the release flag significantly improved the load time, from ~300s to 23s. However, similar projects like https://waynexia.github.io/datafusion-playground/ and https://parquet-viewer.xiangpeng.systems/ load much faster. Could the fact that they use wasm-bindgen explain the performance difference?
What you're running into is the fact that Wasmtime needs to convert WebAssembly to native code. This is done with the Cranelift compiler that Wasmtime uses. Cranelift is itself Rust code that runs when you load the wasm module, so this step takes time. Unoptimized Rust code is significantly slower than optimized Rust code, hence why a --release build is so much faster.
If you're comparing against the web, then you're comparing against a different system. Web browsers are optimized for time-to-execution for WebAssembly and JavaScript and use many tricks to achieve this. For example, web browsers will compile your code as it's being downloaded, so by the time the download is finished everything is ready to go. Web browsers also use a "baseline compiler" for wasm, which is significantly faster than the final tier of compilation (which Cranelift is more akin to).
Wasmtime supports baseline compilation through the "Winch" compiler, which generates code much faster than Cranelift. The tradeoff is that the generated code performs much worse.
When you're working with out-of-browser wasm these are concerns that you'll have to balance in your embedding. There's not necessarily a one-size-fits-all solution for you such as "just point a web browser at it" at this time.
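One way an embedding might expose the Cranelift/Winch tradeoff described above is as a setting, so that quick development loads and optimized production runs can use different compilers. The sketch below is standard-library only; the `Compiler` enum and `parse_compiler` function are hypothetical names, and defaulting to Cranelift is an assumption. In a host using the `wasmtime` crate, the selected value would be applied with `Config::strategy`, passing `Strategy::Winch` or `Strategy::Cranelift` (Winch requires the crate's `winch` feature).

```rust
/// Hypothetical mirror of the two compilers discussed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Compiler {
    /// Optimizing compiler: slower to compile, faster generated code.
    Cranelift,
    /// Baseline compiler: faster to compile, slower generated code.
    Winch,
}

/// Parses a setting value (for example from an environment variable or CLI
/// flag) into a compiler choice. A missing or empty value falls back to
/// Cranelift, which is an assumption of this sketch.
fn parse_compiler(value: Option<&str>) -> Result<Compiler, String> {
    match value.map(|v| v.trim().to_ascii_lowercase()).as_deref() {
        None | Some("") | Some("cranelift") => Ok(Compiler::Cranelift),
        Some("winch") => Ok(Compiler::Winch),
        Some(other) => Err(format!("unknown compiler: {}", other)),
    }
}

// With the `wasmtime` crate, the choice would be applied roughly like this:
//
//   let mut config = wasmtime::Config::new();
//   config.strategy(match choice {
//       Compiler::Cranelift => wasmtime::Strategy::Cranelift,
//       Compiler::Winch => wasmtime::Strategy::Winch,
//   });
//   let engine = wasmtime::Engine::new(&config)?;
```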
Thanks! Winch reduced the processing time to 8 seconds. I'll also look into AOT. Additionally, I think there are potential optimizations I can make in DataFusion.
Last updated: Dec 23 2024 at 13:07 UTC