I am using the standard embedded wasmtime host in component mode, using the bindgen! macro and calling the generated guest resource trait methods, and it works great. However, there is a hot function where this interface forces an additional copy (Rust type -> WIT type -> wasm memory), causing non-ideal performance. I think if I could go from Rust type -> wasm memory directly I could shave off some time. I only need to do this for one or two methods. Is there a way to do this?
Not currently, no; there's no way to get core wasm functions out of a component. Would you be able to share the shape of the performance problem you're running into? That could help guide either an alternative architecture or possibly different APIs to add to wasmtime to make this faster.
I created the same real-world component in native Rust and again as a wasm plugin (the only difference is that the wasm version uses RefCell to achieve mutability). I benchmarked the simple hot call that takes a slice of a WIT record as a param (which I convert from a slice of the non-WIT type). The native version is ~5ns per call and the wasm version is ~250ns per call on my M3 Max. I haven't done any real profiling yet, but I suspect the slice conversion and the copy into wasm memory are most of what can be optimized. The only optimization I did was swapping in a SmallVec for a Vec when converting to the WIT type; skipping that allocation shaved off 12% (~285ns -> ~250ns).
This is the function I call on the host side:
fn update_bar_and_calc(
    &mut self,
    data: &[indicator::DataUpdate],
    new_bar: bool,
) -> Result<f64, StringError> {
    // TODO: Copy slice directly into Wasm memory instead of an extra copy/type conversion
    let data = data
        .iter()
        .map(|d| d.into())
        .collect::<SmallVec<[DataUpdate; 4]>>();
    let result = self
        .component
        .trading_indicator_guest()
        .indicator()
        .call_update_bar_and_calc(&mut self.store, self.resource, &data, new_bar);
    Self::map_result(result, |value| value)
}
Guest side:
/// Update the current bar or start a new one
fn update_bar(&self, data: Vec<DataUpdate>, new_bar: bool) -> Result<(), String> {
    if new_bar {
        // Panic safety: data is the same length as dependencies (length validated in 'new')
        self.data.borrow_mut().push(data[0].data);
    } else {
        // Panic safety: data is the same length as dependencies (length validated in 'new')
        self.data.borrow_mut().update_last(data[0].data);
    }
    Ok(())
}

/// Update the current bar or start a new one and calculate the indicator value
fn update_bar_and_calc(&self, data: Vec<DataUpdate>, new_bar: bool) -> Result<f64, String> {
    self.update_bar(data, new_bar)?;
    Ok(self.data.borrow()[0])
}
(Haven't forgotten about this, it's just a busy week with the wasm CG meeting. I'll probably investigate deeper next week myself if no one else gets to it in the meantime.)
Appreciated, and no hurry. I've moved on to other parts of the project for now.
Ok, I looked more into this, and unfortunately I'm not coming up with any sort of low-hanging fruit here. Although, if what you gist'd is what the guest is "doing for real", it seems a bit odd that a whole vec is passed in but only the first element is used? Is that intentional?
Otherwise, the main way to improve this that I can think of would be to push the DataUpdate type further upwards (e.g. don't require the .into() and natively store it as the wit-bindgen type). Either that, or you could try implementing lift/lower directly for indicator::DataUpdate and work with that directly.
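To sketch that second option a bit (purely illustrative: the record's field names/types are placeholders, the host-side type name is made up, and how you get hold of the raw Func and ResourceAny for a resource method is elided since it varies by wasmtime version), deriving wasmtime's ComponentType/Lower on your own type lets you call the export through Func::typed and skip the .into() conversion entirely:

use wasmtime::component::{ComponentType, Func, Lower, ResourceAny};

// Hypothetical host-side record; field names/types must match the WIT
// `record data-update { ... }` for the type-check in `Func::typed` to pass.
#[derive(Copy, Clone, ComponentType, Lower)]
#[component(record)]
pub struct HostDataUpdate {
    pub value: f64,
    pub flags: u32,
}

// Call the exported resource method with host types directly, bypassing the
// generated bindings (and their per-element conversion) for this hot path.
// Looking up `func` for a method nested under an exported interface depends
// on the wasmtime version, so it's taken as a parameter here.
pub fn call_update_bar_and_calc_raw(
    mut store: impl wasmtime::AsContextMut,
    func: Func,
    this: ResourceAny,
    data: &[HostDataUpdate],
    new_bar: bool,
) -> wasmtime::Result<Result<f64, String>> {
    let typed = func
        .typed::<(ResourceAny, &[HostDataUpdate], bool), (Result<f64, String>,)>(&store)?;
    let (ret,) = typed.call(&mut store, (this, data, new_bar))?;
    typed.post_return(&mut store)?;
    Ok(ret)
}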
It is intentional. This is a trait and implementers will have between 0 and n entries in the vec (where n typically won't exceed 4 or 5, so SmallVec works well as an optimization). This particular implementer knows the length will always be 1 for it, and that is guaranteed by my framework, so I don't check and just use the first slot.
That is always a conundrum for me with these frameworks/ORMs/anything with generated types: how far do you let their types spread into the rest of your code? In this case, wasm plugins are an add-on crate, so I hesitate to bring knowledge of wasm further into the other crates, especially given this isn't light stuff. (Thinking more on this: maybe I could add a "wasm" feature and swap between the two impls depending on it; the crate would have to compile wasmtime whenever the feature is on, though.)
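Roughly what I have in mind for that feature idea (the module path, crate layout, and field names below are all placeholders, and this is untested):

// In the crate that defines the domain types, gate which DataUpdate is used
// on a hypothetical "wasm" feature. With the feature on, DataUpdate is just a
// re-export of the bindgen-generated record (from a made-up wasm_bindings
// module), so the host call site needs no conversion; the trade-off is that
// the feature pulls the wasmtime/bindgen machinery into this crate's build.
#[cfg(feature = "wasm")]
pub use crate::wasm_bindings::indicator::DataUpdate;

#[cfg(not(feature = "wasm"))]
#[derive(Copy, Clone, Debug)]
pub struct DataUpdate {
    // Placeholder fields mirroring the WIT record.
    pub value: f64,
    pub flags: u32,
}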
One quick question to wrap this up. I can probably do this legwork myself, but I figure you might know outright: are there any guarantees on the field ordering of the generated code? (I assume it probably isn't repr(C) though...) Are the bitflags just generated with the regular bitflags crate? I'm wondering if I could prove my data structure has the same layout as the generated one and just transmute somehow instead of copying. The only place they could even possibly differ is the bitflags, and those are very likely identical as well. (Thinking my idea at the end of the previous paragraph might be better, though...)
Thanks again for your thoughts. I know you are busy.
For data layouts it's a bit tricky. The layout of structs and such is defined at https://github.com/WebAssembly/component-model/blob/main/design/mvp/CanonicalABI.md, so in that sense it's fixed in stone. That being said, there's no guarantee that the bindings themselves use exactly that layout. For example, a record with a string uses a String in Rust, meaning that passing it through the canonical ABI requires a conversion of the struct itself. Basically, the generated bindings are going to perform a conversion from the raw in-memory data to the actual types in Rust.
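To make that concrete with a made-up record, say record point-label { label: string, x: f64 }:

// The generated Rust binding looks roughly like this:
pub struct PointLabel {
    pub label: String, // an owned Rust String (ptr/len/cap on the host heap)
    pub x: f64,
}
// The Canonical ABI, on the other hand, stores the same record in guest
// linear memory as roughly { ptr: u32, len: u32, x: f64 } (plus alignment
// padding), with the string bytes living elsewhere in guest memory. Lowering
// therefore has to copy the string bytes into the guest and rebuild the
// record in that layout, so the Rust-side struct and the canonical layout
// can't simply be transmuted into each other.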
This sort of reminds me, though, that there may be another source of slowdown here. For example, to remove some copies you'd need to deal with:
- Lift and Lower for indicator::DataUpdate (or something like that), to use the data at rest as it is on the host.
- Vec<T> in the guest, meaning the guest receives a copy of the data and then copies it again. The guest doesn't do this copy for simple types like Vec<u8>, but for aggregates it does perform a copy (illustrated in the sketch below). That could be fixed, however, by improving the logic of wit-bindgen.
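As a conceptual illustration of that second point (this is not the literal wit-bindgen output; the 16-byte element size, the field offsets, and the DataUpdate fields are all assumptions):

#[derive(Copy, Clone)]
struct DataUpdate {
    value: f64,
    flags: u32,
}

// A list<u8> argument can adopt the canon-allocated buffer wholesale: no
// second copy in the guest.
unsafe fn lift_list_u8(ptr: *mut u8, len: usize) -> Vec<u8> {
    unsafe { Vec::from_raw_parts(ptr, len, len) }
}

// A list of an aggregate is decoded element-by-element into a new Vec, so
// the guest pays for a second copy of the payload (and the original
// canonical buffer is then freed by the generated code).
unsafe fn lift_list_data_update(base: *const u8, len: usize) -> Vec<DataUpdate> {
    let mut out = Vec::with_capacity(len);
    for i in 0..len {
        // Decode one record from its (assumed) 16-byte canonical slot.
        let elem = unsafe { base.add(i * 16) };
        let value = f64::from_le_bytes(unsafe { *(elem as *const [u8; 8]) });
        let flags = u32::from_le_bytes(unsafe { *(elem.add(8) as *const [u8; 4]) });
        out.push(DataUpdate { value, flags });
    }
    out
}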
You've given me some reading and things to think about. Thanks again.