Hi, is it possible to run multiple functions on a wasmtime Instance concurrently or do I need to call each function sequentially?
I'm guessing it might not work due to the memory model. But let's say I want to run a type of wasm server: if I have to create a separate instance for each invocation wouldn't that carry quite a big memory overhead?
Buster Styren said:
Hi, is it possible to run multiple functions on a wasmtime Instance concurrently or do I need to call each function sequentially?
For a single instance, calls must be sequential. There is ongoing work to add threading to wasmtime but afaik it isn't generally usable yet.
if I have to create a separate instance for each invocation wouldn't that carry quite a big memory overhead
Not necessarily, though it depends a bit on the host OS. Linux and macOS can use copy-on-write memory initialization which - depending on the module - can give you some of the same memory behavior as multithreading.
Note that the threading support in Wasmtime today is good enough to try out, but it requires cooperation from the wasm module, and modules can't transparently be made multi-threaded. Most modules aren't built in multi-threaded mode (e.g. support in Rust is unstable), so that's probably not viable.
Otherwise, to add to what Lann already mentioned, the overhead for a new instance is relatively small with copy-on-write and largely depends on the guest. If the guest only needs 64k of memory then that's roughly the overhead, but larger guests may require more memory. In that sense the overhead may depend on whether you control the wasm module or it's handed to you as an arbitrary module.
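For reference, a minimal sketch of where this lives in the embedding API (the guest path is a placeholder; copy-on-write linear-memory initialization is on by default, so this is only spelled out for illustration):
use wasmtime::{Config, Engine, Module};

fn main() -> anyhow::Result<()> {
    // Copy-on-write memory-image initialization is enabled by default; the
    // call below just makes that explicit. The .wasm path is a placeholder.
    let mut config = Config::new();
    config.memory_init_cow(true);
    let engine = Engine::new(&config)?;
    let _module = Module::from_file(&engine, "guest.wasm")?;
    Ok(())
}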
Alright, that answers all my questions. Thank you both!
Second, a Caller can be used, as the name implies, to learn about the caller's context, namely its exported memory and exported functions. This allows functions which take pointers as arguments to easily read the memory the pointers point into, or, if a function is expected to call malloc in the wasm module to reserve space for the output, you can do that.
When a host exported function is called from inside wasm, the caller is accessible. Using caller.get_export() I can access an Option<Extern> and get a Func out of it, which is a malloc equivalent exported by wasm. How do I call this malloc when I don't have access to &mut store? Would that not be concurrent execution, which is not allowed?
How do I allocate space for, say, an array and return the array back to the wasm function?
So I double checked where I got my information from: wasmtime::Memory -> Memory and Safety:
This includes getting access to the T on Store<T>, _but it also means that you can't recursively call into WebAssembly for instance_.
But caller is a proxy for store, so this works:
fn main() -> anyhow::Result<()> {
    let engine = wasmtime::Engine::default();
    let module = wasmtime::Module::from_file(
        &engine,
        "../guest/target/wasm32-unknown-unknown/debug/guest.wasm",
    )?;
    let mut store = wasmtime::Store::new(&engine, ());

    // Host function that re-enters wasm through the Caller.
    let from_host = wasmtime::Func::wrap(&mut store, |mut caller: wasmtime::Caller<'_, ()>| {
        println!("called from wasm");
        let o = wasmtime::Val::I32(0);
        let mut v = vec![o];
        caller
            .get_export("foo")
            .expect("foo not found")
            .into_func()
            .expect("foo not a func")
            .call(&mut caller, &[], &mut v)
            .expect("call failed");
        dbg!(v);
    });

    // Host function that just returns a value to wasm.
    let from_host2 = wasmtime::Func::wrap(&mut store, |_caller: wasmtime::Caller<'_, ()>| {
        println!("called from wasm2");
        10
    });

    let instance =
        wasmtime::Instance::new(&mut store, &module, &[from_host.into(), from_host2.into()])?;
    let sum = instance.get_typed_func::<(i32,), i32>(&mut store, "sum")?;
    println!("wasm said: {}", sum.call(&mut store, (1,))?);
    Ok(())
}
Can we conclude the wasmtime::Memory docs are inaccurate? Even in this thread @Lann Martin says "For a single instance, calls must be sequential."
What am I missing?
It makes sense to me that the code you posted should work. I'm not sure I understand what "can’t recursively call into WebAssembly" in the docs means exactly.
For a single instance, calls must be sequential.
This was just in response to whether calls can be concurrent in a single instance, which they can't (modulo the wasm threading proposal); you could say that Wasm supports only a single call stack at a time.
With the Wasmtime embedding API, if you can do it without unsafe then it's safe to do it. The documentation on Memory is about code that may be tempted to use unsafe. You're not using unsafe here, so you're good.
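To make that concrete for the array question above, a rough sketch of the usual pattern (untested, and the malloc/memory export names are assumptions about your guest): call the guest's exported malloc through the Caller, write the bytes into its exported memory, and return the pointer. This would sit alongside from_host in the main above:
// Hypothetical host import: asks the guest's exported malloc for space, copies
// bytes into the guest's exported memory, and returns the pointer to wasm.
let fill = wasmtime::Func::wrap(
    &mut store,
    |mut caller: wasmtime::Caller<'_, ()>, len: i32| -> anyhow::Result<i32> {
        // Re-entering the guest through the Caller is the same call stack,
        // not concurrent execution, so this is fine.
        let malloc = caller
            .get_export("malloc")
            .and_then(|e| e.into_func())
            .ok_or_else(|| anyhow::anyhow!("guest malloc not found"))?
            .typed::<i32, i32>(&caller)?;
        let ptr = malloc.call(&mut caller, len)?;

        let memory = caller
            .get_export("memory")
            .and_then(|e| e.into_memory())
            .ok_or_else(|| anyhow::anyhow!("guest memory not found"))?;
        let data = vec![0u8; len as usize]; // whatever bytes the guest should see
        memory.write(&mut caller, ptr as usize, &data)?;
        Ok(ptr)
    },
);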
So, I just looked into copy-on-write and I'm unsure how I can get it to work properly in a multi-threaded context. Even if it's cheap to create multiple Instances, I still need to supply a mutable reference to the Store in order to add host functions during linking, so I can't "prepare" the whole module and instantiate it (cheaply) at will, I guess?
Which means that I can't readily instantiate Instances from the linked module without doing the actual linking with a separate store, which I guess would keep copy-on-write from sharing any significant part of the memory allocation? Or is it possible to copy or clone the store for each new invocation?
The fastest way to instantiate is by first creating an InstancePre<T> and then repeatedly using that to instantiate. Each instantiation requires a unique Store<T>, but those should be cheap to create and destroy. The copy-on-write optimizations have to do with linear memory initialization and are a per-module property that the embedder doesn't need to actively enable; it happens automatically.
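For example, a minimal sketch of that pattern (the import/export names and guest path are made up, and it assumes a recent Wasmtime where Linker::instantiate_pre takes just the module):
use wasmtime::{Engine, Linker, Module, Store};

fn main() -> anyhow::Result<()> {
    let engine = Engine::default();
    let module = Module::from_file(&engine, "guest.wasm")?; // hypothetical guest

    // Define host functions once on a Linker, then pre-instantiate.
    let mut linker: Linker<()> = Linker::new(&engine);
    linker.func_wrap("env", "from_host", || println!("called from wasm"))?;
    let pre = linker.instantiate_pre(&module)?;

    // Each invocation gets its own (cheap) Store and Instance; copy-on-write
    // memory initialization happens automatically per module where possible.
    for _ in 0..3 {
        let mut store = Store::new(&engine, ());
        let instance = pre.instantiate(&mut store)?;
        let sum = instance.get_typed_func::<(i32,), i32>(&mut store, "sum")?;
        println!("wasm said: {}", sum.call(&mut store, (1,))?);
    }
    Ok(())
}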
Ah beautiful, now it all makes sense. Thank you for being so helpful.
@Buster Styren You may be interested in some performance testing I did using various execution strategies in Wasmtime, including pre-compilation, pre-instantiation, allocation pooling, etc.: https://github.com/dicej/wasmtime-serverless-performance
"how I got wasmtime to be 30x slower than fork
" hehe a great comparison!
I had no idea that a fresh instance with pooling, compared to reusing an instance, was only ~2x; that's actually much lower than I might have expected.
One interesting axis to explore, in case anyone's curious about benchmarks like this, is what happens at higher parallelism too. For example, have N background threads all doing the same work as the main thread and time how long it takes the main thread to do its iteration (we have this in Wasmtime's instantiate benchmarks). Strategies such as CoW and pooling show pretty big effects at >2 concurrency, when IPIs and the kernel come into play more.
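If anyone wants to try that locally, a rough sketch of the setup (not Wasmtime's actual instantiate benchmark, just an illustration; the tiny inline module is a stand-in for a real guest):
use std::sync::Arc;
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;
use std::time::Instant;
use wasmtime::{Engine, Linker, Module, Store};

fn main() -> anyhow::Result<()> {
    let engine = Engine::default();
    // Tiny stand-in module; a real benchmark would use a realistic guest.
    let module = Module::new(&engine, "(module (memory 1))")?;
    let pre = Linker::<()>::new(&engine).instantiate_pre(&module)?;

    // N background threads instantiating as fast as they can.
    let stop = Arc::new(AtomicBool::new(false));
    let mut handles = Vec::new();
    for _ in 0..4 {
        let (engine, pre, stop) = (engine.clone(), pre.clone(), stop.clone());
        handles.push(thread::spawn(move || {
            while !stop.load(Ordering::Relaxed) {
                let mut store = Store::new(&engine, ());
                pre.instantiate(&mut store).unwrap();
            }
        }));
    }

    // Time the main thread's instantiations under that contention.
    let iters: u32 = 10_000;
    let start = Instant::now();
    for _ in 0..iters {
        let mut store = Store::new(&engine, ());
        pre.instantiate(&mut store)?;
    }
    println!("{:?} per instantiation", start.elapsed() / iters);

    stop.store(true, Ordering::Relaxed);
    for h in handles {
        h.join().unwrap();
    }
    Ok(())
}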
This is a year-old thread, but I think it makes sense to reconsider its answer now that no_std is starting to be an option. In an embedded context there are not necessarily copy-on-write capabilities, and creating a new store and instance for each thread might be too costly. Is my understanding correct that this is a limitation of wasmtime and that the wasm specification theoretically permits concurrent usage of a single store? (i.e. running concurrent threads within the same store) Is this a capability that wasmtime might consider adding in the future? (in particular once no_std is supported) Thanks! (I can create a separate discussion thread or open an issue on GitHub if that's preferred)
Wasmtime won't support concurrent execution within a single store, even with no_std support; that part is pretty foundational to Wasmtime right now. Wasmtime has been optimized for cheap instantiation, but you by no means need to use that: you can create a single instance and run with that as well. If the instance-per-thread model is too expensive for you then you probably want to follow the shared-everything-threads proposal, but if you're also interested in using Wasmtime I'd recommend profiling if possible to see what the hot spots are, and we can help see if we can optimize those. It's going to take quite some time to get shared-everything-threads implemented.
But yeah if you'd like to open a tracking issue/discussion place that's also reasonable!
Thanks for the link! I'll follow that proposal. And I'll definitely give wasmtime a try once no-std is supported. But I expect other issues before reaching threads, like supporting smaller page sizes (which I discovered has a proposal now, nice).
Sorry, actually one more question. Do we agree that this instance-per-thread approach requires modules that want parallel threads to import their shared memory? Otherwise each instance would get its own, if I understand Wasmtime correctly. However, my reading of the spec says "For memories of shared type, no state is recorded in the instance itself.", which seems to differ from what Wasmtime does. Or am I missing something?
Modules/instances have no inherent memories attached to them, they need to either import them or define them. In that sense if memory were not imported you'd still need to decide how to represent linear memory. If memory were defined then that wouldn't work in the instance-per-thread model because then each instance would define its own memory, defeating the purpose of sharing memory across threads.
So to answer your question, effectively yes - the instance-per-thread model basically requires a module to import shared linear memory. The shared part is required as it's the only wasm type that's safe to share across threads.
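Roughly, the shape looks like this (a sketch only: the inline WAT guest, the env/memory import, and the export name are invented, and it assumes the threads proposal is enabled via Config::wasm_threads):
use std::thread;
use wasmtime::{Config, Engine, Instance, MemoryType, Module, SharedMemory, Store};

fn main() -> anyhow::Result<()> {
    // Shared memories require the wasm threads proposal.
    let mut config = Config::new();
    config.wasm_threads(true);
    let engine = Engine::new(&config)?;

    // Toy guest that imports a shared memory rather than defining one.
    let module = Module::new(
        &engine,
        r#"(module
             (import "env" "memory" (memory 1 1 shared))
             (func (export "load_first") (result i32)
               (i32.load (i32.const 0))))"#,
    )?;

    // One shared linear memory, handed to every instance.
    let memory = SharedMemory::new(&engine, MemoryType::shared(1, 1))?;

    let mut handles = Vec::new();
    for _ in 0..4 {
        let (engine, module, memory) = (engine.clone(), module.clone(), memory.clone());
        handles.push(thread::spawn(move || -> anyhow::Result<i32> {
            // Instance-per-thread: each thread has its own Store and Instance,
            // but they all import the same SharedMemory.
            let mut store = Store::new(&engine, ());
            let instance = Instance::new(&mut store, &module, &[memory.into()])?;
            let load = instance.get_typed_func::<(), i32>(&mut store, "load_first")?;
            load.call(&mut store, ())
        }));
    }
    for h in handles {
        println!("guest read: {}", h.join().unwrap()?);
    }
    Ok(())
}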
As for the spec, wasmtime implements those semantics: a shared memory is not attached to a particular instance and doesn't close over instance state. It sounds, though, like you've got something which you think contradicts this? I can take a closer look if you'd like to point it out.
Thanks! This answers my first question (and confirms my understanding). Regarding my second question, now that I think about it, it's actually not a contradiction, just a correct implementation detail. (The spec does not record any state in a shared memory instance, but this doesn't prevent an embedder from doing so for convenience. What matters is that the trace of events be consistent, which doesn't prevent storing the shared memory state in its memory instance. To give an example of an alternative design to Wasmtime's: each time a module is instantiated, for each shared memory it defines, the Wasmtime user would have to provide an appropriate shared memory. Wasmtime decided to create one automatically for the user and store it in the memory instance, or somewhere equivalent in the store.)