Hi, is it possible to run multiple functions on a wasmtime Instance concurrently or do I need to call each function sequentially?
I'm guessing it might not work due to the memory model. But let's say I want to run a type of wasm server: if I have to create a separate instance for each invocation wouldn't that carry quite a big memory overhead?
Buster Styren said:
Hi, is it possible to run multiple functions on a wasmtime Instance concurrently or do I need to call each function sequentially?
For a single instance, calls must be sequential. There is ongoing work to add threading to wasmtime but afaik it isn't generally usable yet.
if I have to create a separate instance for each invocation wouldn't that carry quite a big memory overhead
Not necessarily, though it depends a bit on the host OS. Linux and macOS can use copy-on-write memory initialization which - depending on the module - can give you some of the same memory behavior as multithreading.
Note that the threading support in Wasmtime today is good enough to try out, but it requires cooperation from the wasm module, and modules can't transparently be made multi-threaded. Most modules aren't built in multi-threaded mode (e.g. support in Rust is unstable), so that's probably not viable.
Otherwise, to add to what Lann already mentioned, the overhead for a new instance is relatively small with copy-on-write and largely depends on the guest. If the guest only needs 64k of memory then that's roughly the overhead, but larger guests may require more memory. In that sense the overhead may depend on whether you control the wasm module or it's handed to you as an arbitrary module.
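For reference, a minimal sketch of where this lives in the embedding API (the guest path is a placeholder; copy-on-write linear-memory initialization is on by default, so this is only spelled out for illustration):
use wasmtime::{Config, Engine, Module};

fn main() -> anyhow::Result<()> {
    // Copy-on-write memory-image initialization is enabled by default; the
    // call below just makes that explicit. The .wasm path is a placeholder.
    let mut config = Config::new();
    config.memory_init_cow(true);
    let engine = Engine::new(&config)?;
    let _module = Module::from_file(&engine, "guest.wasm")?;
    Ok(())
}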
Alright, that answers all my questions. Thank you both!
Second, a Caller can be used, as the name implies, to learn about the caller's context, namely its exported memory and exported functions. This allows functions which take pointers as arguments to easily read the memory the pointers point into, or, if a function is expected to call malloc in the wasm module to reserve space for the output, you can do that.
When a host exported function is called from inside wasm, the caller is accessible. Using caller.get_export() I can access an Option<Extern> and get a Func out of it, which is a malloc equivalent exported by wasm. How do I call this malloc when I don't have access to &mut store? Would that not be concurrent execution, which is not allowed?
How do I allocate space for, say, an array and return the array back to the wasm function?
So I double checked where I got my information from: wasmtime::Memory -> Memory and Safety:
This includes getting access to the T on Store<T>, _but it also means that you can't recursively call into WebAssembly for instance_.
But caller is a proxy for store, so this works:
fn main() -> anyhow::Result<()> {
    let engine = wasmtime::Engine::default();
    let module = wasmtime::Module::from_file(
        &engine,
        "../guest/target/wasm32-unknown-unknown/debug/guest.wasm",
    )?;
    let mut store = wasmtime::Store::new(&engine, ());

    // Host function that re-enters wasm through the Caller.
    let from_host = wasmtime::Func::wrap(&mut store, |mut caller: wasmtime::Caller<'_, ()>| {
        println!("called from wasm");
        let o = wasmtime::Val::I32(0);
        let mut v = vec![o];
        caller
            .get_export("foo")
            .expect("foo not found")
            .into_func()
            .expect("foo not a func")
            .call(&mut caller, &[], &mut v)
            .expect("call failed");
        dbg!(v);
    });

    // Host function that just returns a value to wasm.
    let from_host2 = wasmtime::Func::wrap(&mut store, |_caller: wasmtime::Caller<'_, ()>| {
        println!("called from wasm2");
        10
    });

    let instance =
        wasmtime::Instance::new(&mut store, &module, &[from_host.into(), from_host2.into()])?;
    let sum = instance.get_typed_func::<(i32,), i32>(&mut store, "sum")?;
    println!("wasm said: {}", sum.call(&mut store, (1,))?);
    Ok(())
}
Can we conclude the wasmtime::Memory docs are inaccurate? Even in this thread @Lann Martin says "For a single instance, calls must be sequential."
What am I missing?
It makes sense to me that the code you posted should work. I'm not sure I understand what "can’t recursively call into WebAssembly" in the docs means exactly.
For a single instance, calls must be sequential.
This was just in response to whether calls can be concurrent in a single instance, which they can't (modulo the wasm threading proposal); you could say that Wasm supports only a single call stack at a time.
With the Wasmtime embedding API, if you can do it without unsafe then it's safe to do it. The documentation on Memory is about code that may be tempted to use unsafe. You're not using unsafe here, so you're good.
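To make that concrete for the array question above, a rough sketch of the usual pattern (untested, and the malloc/memory export names are assumptions about your guest): call the guest's exported malloc through the Caller, write the bytes into its exported memory, and return the pointer. This would sit alongside from_host in the main above:
// Hypothetical host import: asks the guest's exported malloc for space, copies
// bytes into the guest's exported memory, and returns the pointer to wasm.
let fill = wasmtime::Func::wrap(
    &mut store,
    |mut caller: wasmtime::Caller<'_, ()>, len: i32| -> anyhow::Result<i32> {
        // Re-entering the guest through the Caller is the same call stack,
        // not concurrent execution, so this is fine.
        let malloc = caller
            .get_export("malloc")
            .and_then(|e| e.into_func())
            .ok_or_else(|| anyhow::anyhow!("guest malloc not found"))?
            .typed::<i32, i32>(&caller)?;
        let ptr = malloc.call(&mut caller, len)?;

        let memory = caller
            .get_export("memory")
            .and_then(|e| e.into_memory())
            .ok_or_else(|| anyhow::anyhow!("guest memory not found"))?;
        let data = vec![0u8; len as usize]; // whatever bytes the guest should see
        memory.write(&mut caller, ptr as usize, &data)?;
        Ok(ptr)
    },
);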
So, I just looked into copy-on-write and I'm unsure how I can get it to work properly in a multi-threaded context. Even if it's cheap to create multiple Instances, I still need to supply a mutable reference to the Store in order to add host functions during linking, so I can't "prepare" the whole module and instantiate it (cheaply) at will, I guess?
Which means that I can't readily instantiate Instances from the linked module without doing the actual linking with a separate store, which I guess would keep copy-on-write from sharing any significant part of the memory allocation? Or is it possible to copy or clone the store for each new invocation?
The fastest way to instantiate is by first creating an InstancePre<T> and then repeatedly using that to instantiate. Each instantiation requires a unique Store<T>, but those should be cheap to create and destroy. The copy-on-write optimizations have to do with linear memory initialization and are a per-module property that the embedder doesn't need to actively enable; it happens automatically.
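For example, a minimal sketch of that pattern (the import/export names and guest path are made up, and it assumes a recent Wasmtime where Linker::instantiate_pre takes just the module):
use wasmtime::{Engine, Linker, Module, Store};

fn main() -> anyhow::Result<()> {
    let engine = Engine::default();
    let module = Module::from_file(&engine, "guest.wasm")?; // hypothetical guest

    // Define host functions once on a Linker, then pre-instantiate.
    let mut linker: Linker<()> = Linker::new(&engine);
    linker.func_wrap("env", "from_host", || println!("called from wasm"))?;
    let pre = linker.instantiate_pre(&module)?;

    // Each invocation gets its own (cheap) Store and Instance; copy-on-write
    // memory initialization happens automatically per module where possible.
    for _ in 0..3 {
        let mut store = Store::new(&engine, ());
        let instance = pre.instantiate(&mut store)?;
        let sum = instance.get_typed_func::<(i32,), i32>(&mut store, "sum")?;
        println!("wasm said: {}", sum.call(&mut store, (1,))?);
    }
    Ok(())
}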
Ah beautiful, now it all makes sense. Thank you for being so helpful.
@Buster Styren You may be interested in some performance testing I did using various execution strategies in Wasmtime, including pre-compilation, pre-instantiation, allocation pooling, etc.: https://github.com/dicej/wasmtime-serverless-performance
"how I got wasmtime to be 30x slower than fork
" hehe a great comparison!
I had no idea that a fresh instance with pooling, compared to reusing an instance, was only ~2x; that's actually much lower than I might have expected.
One interesting axis to explore, in case anyone's curious about benchmarks like this, is what happens at higher parallelism too. For example, have N background threads all doing the same work as the main thread and time how long it takes the main thread to do its iteration (we have this in Wasmtime's instantiate benchmarks). Strategies such as CoW and pooling show pretty big effects at >2 concurrency, when IPIs and the kernel come into play more.
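If anyone wants to try that locally, a rough sketch of the setup (not Wasmtime's actual instantiate benchmark, just an illustration; the tiny inline module is a stand-in for a real guest):
use std::sync::Arc;
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;
use std::time::Instant;
use wasmtime::{Engine, Linker, Module, Store};

fn main() -> anyhow::Result<()> {
    let engine = Engine::default();
    // Tiny stand-in module; a real benchmark would use a realistic guest.
    let module = Module::new(&engine, "(module (memory 1))")?;
    let pre = Linker::<()>::new(&engine).instantiate_pre(&module)?;

    // N background threads instantiating as fast as they can.
    let stop = Arc::new(AtomicBool::new(false));
    let mut handles = Vec::new();
    for _ in 0..4 {
        let (engine, pre, stop) = (engine.clone(), pre.clone(), stop.clone());
        handles.push(thread::spawn(move || {
            while !stop.load(Ordering::Relaxed) {
                let mut store = Store::new(&engine, ());
                pre.instantiate(&mut store).unwrap();
            }
        }));
    }

    // Time the main thread's instantiations under that contention.
    let iters: u32 = 10_000;
    let start = Instant::now();
    for _ in 0..iters {
        let mut store = Store::new(&engine, ());
        pre.instantiate(&mut store)?;
    }
    println!("{:?} per instantiation", start.elapsed() / iters);

    stop.store(true, Ordering::Relaxed);
    for h in handles {
        h.join().unwrap();
    }
    Ok(())
}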
This is a year-old thread, but I think it makes sense to reconsider its answer now that no_std is starting to be an option. In an embedded context there are not necessarily copy-on-write capabilities, and creating a new store and instance for each thread might be too costly. Is my understanding correct that this is a limitation of wasmtime and that the wasm specification theoretically permits concurrent usage of a single store? (i.e. running concurrent threads within the same store) Is this a capability that wasmtime might consider adding in the future? (in particular once no_std is supported) Thanks! (I can create a separate discussion thread or open an issue on GitHub if that's preferred)
Wasmtime won't support concurrent execution within a single store, even with no_std support; that part is pretty foundational to Wasmtime right now. Wasmtime has been optimized for cheap instantiation, but you by no means need to use that: you can create a single instance and run with that as well. If the instance-per-thread model is too expensive for you then you probably want to follow the shared-everything-threads proposal, but if you're also interested in using Wasmtime I'd recommend profiling if possible to see what the hot spots are, and we can help see if we can optimize those. It's going to take quite some time to get shared-everything-threads implemented.
But yeah if you'd like to open a tracking issue/discussion place that's also reasonable!
Thanks for the link! I'll follow that proposal. And I'll definitely give wasmtime a try once no-std is supported. But I expect other issues before reaching threads, like supporting smaller page sizes (which I discovered has a proposal now, nice).
Sorry, actually one more question. Do we agree that this instance-per-thread approach requires modules that want parallel threads to import their shared memory? Otherwise each instance would get its own, if I understand Wasmtime correctly. However, my reading of the spec says "For memories of shared type, no state is recorded in the instance itself.", which seems to differ from what Wasmtime does. Or am I missing something?
Modules/instances have no inherent memories attached to them, they need to either import them or define them. In that sense if memory were not imported you'd still need to decide how to represent linear memory. If memory were defined then that wouldn't work in the instance-per-thread model because then each instance would define its own memory, defeating the purpose of sharing memory across threads.
So to answer your question, effectively yes - the instance-per-thread model basically requires a module to import shared linear memory. The shared part is required as it's the only wasm type that's safe to share across threads.
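Roughly, the shape looks like this (a sketch only: the inline WAT guest, the env/memory import, and the export name are invented, and it assumes the threads proposal is enabled via Config::wasm_threads):
use std::thread;
use wasmtime::{Config, Engine, Instance, MemoryType, Module, SharedMemory, Store};

fn main() -> anyhow::Result<()> {
    // Shared memories require the wasm threads proposal.
    let mut config = Config::new();
    config.wasm_threads(true);
    let engine = Engine::new(&config)?;

    // Toy guest that imports a shared memory rather than defining one.
    let module = Module::new(
        &engine,
        r#"(module
             (import "env" "memory" (memory 1 1 shared))
             (func (export "load_first") (result i32)
               (i32.load (i32.const 0))))"#,
    )?;

    // One shared linear memory, handed to every instance.
    let memory = SharedMemory::new(&engine, MemoryType::shared(1, 1))?;

    let mut handles = Vec::new();
    for _ in 0..4 {
        let (engine, module, memory) = (engine.clone(), module.clone(), memory.clone());
        handles.push(thread::spawn(move || -> anyhow::Result<i32> {
            // Instance-per-thread: each thread has its own Store and Instance,
            // but they all import the same SharedMemory.
            let mut store = Store::new(&engine, ());
            let instance = Instance::new(&mut store, &module, &[memory.into()])?;
            let load = instance.get_typed_func::<(), i32>(&mut store, "load_first")?;
            load.call(&mut store, ())
        }));
    }
    for h in handles {
        println!("guest read: {}", h.join().unwrap()?);
    }
    Ok(())
}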
As for the spec, wasmtime implements those semantics: a shared memory is not attached to a particular instance and doesn't close over instance state. It sounds, though, like you've got something which you think contradicts this? I can take a closer look if you'd like to point it out.
Thanks! This answers my first question (and confirms my understanding). Regarding my second question, now that I think about it, it's actually not a contradiction, just a correct implementation detail. (The spec does not record any state in a shared memory instance, but this doesn't prevent an embedder from doing so for convenience. What matters is that the trace of events be consistent, which doesn't prevent storing the shared memory state in its memory instance. To give an example of an alternative design to Wasmtime's: each time a module is instantiated, for each shared memory it defines, the Wasmtime user would have to provide an appropriate shared memory. Wasmtime decided to create one automatically for the user and store it in the memory instance, or somewhere equivalent in the store.)