Stream: wasmtime

Topic: Concurrent execution


view this post on Zulip Buster Styren (May 12 2023 at 08:35):

Hi, is it possible to run multiple functions on a wasmtime Instance concurrently or do I need to call each function sequentially?

view this post on Zulip Buster Styren (May 12 2023 at 08:38):

I'm guessing it might not work due to the memory model. But let's say I want to run a kind of wasm server: if I have to create a separate instance for each invocation, wouldn't that carry quite a big memory overhead?

view this post on Zulip Lann Martin (May 12 2023 at 12:28):

Buster Styren said:

Hi, is it possible to run multiple functions on a wasmtime Instance concurrently or do I need to call each function sequentially?

For a single instance, calls must be sequential. There is ongoing work to add threading to wasmtime but afaik it isn't generally usable yet.

view this post on Zulip Lann Martin (May 12 2023 at 12:31):

if I have to create a separate instance for each invocation wouldn't that carry quite a big memory overhead

Not necessarily, though it depends a bit on the host OS. Linux and macOS can use copy-on-write memory initialization which - depending on the module - can give you some of the same memory behavior as multithreading.

view this post on Zulip Alex Crichton (May 12 2023 at 14:20):

Note that the threading support in Wasmtime today is good enough to test out, but it requires cooperation from the wasm module, and modules can't transparently be made multi-threaded. Most modules aren't built in multi-threaded mode (e.g. support in Rust is unstable), so that's probably not viable.

Otherwise though, to add to what Lann already mentioned, the overhead for a new instance is relatively small with copy-on-write, and it largely depends on the guest. If the guest only needs 64k of memory then that's the rough overhead, but larger guests may require more memory. In that sense the overhead may depend on whether you control the wasm module or it's given to you as an arbitrary module.

view this post on Zulip Buster Styren (May 16 2023 at 14:50):

Alright, that answers all my questions. Thank you both!

view this post on Zulip Amit Upadhyay (May 17 2023 at 06:23):

wasmtime::Caller:

Second a Caller can be used as the name implies, learning about the caller’s context, namely it’s exported memory and exported functions. This allows functions which take pointers as arguments to easily read the memory the pointers point into, or if a function is expected to call malloc in the wasm module to reserve space for the output you can do that.

When a host-exported function is called from inside wasm, the caller is accessible. Using caller.get_export() I can access an Option<Extern> and get a Func out of it, which is a malloc equivalent exported by wasm. How do I call this malloc when I do not have access to &mut store? Wouldn't that be concurrent execution, which is not allowed?

How do I allocate space for say an array and return the array back to wasm function?
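One way this can work (a minimal sketch, not from the thread: it assumes the guest exports malloc(size: i32) -> i32 and a memory named "memory", and that store is a wasmtime::Store<()> like the one in the full example further down) is to use the Caller itself as the store context inside the host function:

let write_array = wasmtime::Func::wrap(
    &mut store,
    |mut caller: wasmtime::Caller<'_, ()>| -> anyhow::Result<i32> {
        let data: &[u8] = &[1, 2, 3, 4];

        // The Caller acts as the store context here, so the guest's exported
        // allocator can be called without a separate &mut Store.
        let malloc = caller
            .get_export("malloc")
            .and_then(|e| e.into_func())
            .ok_or_else(|| anyhow::anyhow!("malloc not exported"))?;
        let malloc = malloc.typed::<i32, i32>(&caller)?;
        let ptr = malloc.call(&mut caller, data.len() as i32)?;

        // Copy the bytes into the guest's linear memory and hand the pointer
        // back to wasm as the host function's return value.
        let memory = caller
            .get_export("memory")
            .and_then(|e| e.into_memory())
            .ok_or_else(|| anyhow::anyhow!("memory not exported"))?;
        memory.write(&mut caller, ptr as usize, data)?;
        Ok(ptr)
    },
);

The guest would then typically receive the pointer (and, in practice, a length) and read the array from its own memory.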

view this post on Zulip Amit Upadhyay (May 17 2023 at 08:13):

So I double checked where I got my information from: wasmtime::Memory -> Memory And Safety:

This includes getting access to the T on Store<T>, _but it also means that you can’t recursively call into WebAssembly for instance_.

But Caller is a proxy for the Store, so this works:

fn main() -> anyhow::Result<()> {
    let engine = wasmtime::Engine::default();
    let module = wasmtime::Module::from_file(
        &engine,
        "../guest/target/wasm32-unknown-unknown/debug/guest.wasm",
    )?;

    let mut store = wasmtime::Store::new(&engine, ());

    // Host function that calls back into the guest's exported "foo",
    // using the Caller as the store context.
    let from_host = wasmtime::Func::wrap(&mut store, |mut caller: wasmtime::Caller<'_, ()>| {
        println!("called from wasm");
        let mut results = vec![wasmtime::Val::I32(0)];
        caller
            .get_export("foo")
            .expect("foo not found")
            .into_func()
            .expect("foo not a func")
            .call(&mut caller, &[], &mut results)
            .expect("call failed");
        dbg!(results);
    });

    let from_host2 = wasmtime::Func::wrap(&mut store, |_caller: wasmtime::Caller<'_, ()>| {
        println!("called from wasm2");
        10
    });

    let instance = wasmtime::Instance::new(&mut store, &module, &[from_host.into(), from_host2.into()])?;
    let sum = instance.get_typed_func::<(i32,), i32>(&mut store, "sum")?;
    println!("wasm said: {}", sum.call(&mut store, (1,))?);

    Ok(())
}

Full source.

Can we conclude that the wasmtime::Memory documentation is inaccurate? Even in this thread @Lann Martin says "For a single instance, calls must be sequential."

What am I missing?


view this post on Zulip Lann Martin (May 17 2023 at 13:24):

It makes sense to me that the code you posted should work. I'm not sure I understand what "can’t recursively call into WebAssembly" in the docs means exactly.

For a single instance, calls must be sequential.

This was just in response to whether calls can be concurrent in a single instance, which they can't (modulo the wasm threading proposal); you could say that Wasm supports only a single call stack at a time.

view this post on Zulip Alex Crichton (May 17 2023 at 14:12):

With the Wasmtime embedding API if you can do it without unsafe then it's safe to do it. The documentation on Memory is about code that may be tempted to use unsafe. You're not using unsafe here so you're good.

view this post on Zulip Buster Styren (May 17 2023 at 21:27):

So, I just looked into copy-on-write and I'm unsure how I can get that to work properly in a multi-threaded context. Even if it's cheap to create multiple Instances, I still need to supply a mutable reference to the Store in order to add host functions during linking, so I can't "prepare" the whole module and then instantiate it (cheaply) at will, I guess?

Which means that I can't readily instantiate Instances from the linked module without doing the actual linking with a separate store, which I guess would prevent copy-on-write from covering any significant part of the memory allocation? Or is it possible to copy or clone the store for each new invocation?

view this post on Zulip Alex Crichton (May 17 2023 at 21:30):

The fastest way to instantiate is by first creating an InstancePre<T> and then repeatedly using that to instantiate. Each instantiation requires a unique Store<T> but those should be cheap to create and destroy. The copy-on-write optimizations have to do with linear memory initialization and is a per-module property that the embedder doesn't actively need to enable as it will happen automatically.
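For reference, a minimal sketch of that pattern (the "env"/"host_fn" import name, the guest path, and the exported "sum" function are illustrative assumptions, not anything confirmed in this thread):

fn main() -> anyhow::Result<()> {
    let engine = wasmtime::Engine::default();
    let module = wasmtime::Module::from_file(&engine, "guest.wasm")?;

    // Define host imports once on a Linker, then pre-instantiate the module.
    let mut linker: wasmtime::Linker<()> = wasmtime::Linker::new(&engine);
    linker.func_wrap("env", "host_fn", || -> i32 { 10 })?;
    let instance_pre = linker.instantiate_pre(&module)?;

    // Per invocation (or per thread): a fresh Store plus an instantiation,
    // which benefits from copy-on-write memory initialization automatically.
    for i in 0..4 {
        let mut store = wasmtime::Store::new(&engine, ());
        let instance = instance_pre.instantiate(&mut store)?;
        let sum = instance.get_typed_func::<(i32,), i32>(&mut store, "sum")?;
        println!("call {}: {}", i, sum.call(&mut store, (1,))?);
    }
    Ok(())
}

The Engine and the InstancePre can be shared across threads; only the Store is per-invocation.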

view this post on Zulip Buster Styren (May 17 2023 at 21:35):

Ah beautiful, now it all makes sense. Thank you for being so helpful.

view this post on Zulip Joel Dice (May 17 2023 at 21:52):

@Buster Styren You may be interested in some performance testing I did using various execution strategies in Wasmtime, including pre-compilation, pre-instantiation, allocation pooling, etc.: https://github.com/dicej/wasmtime-serverless-performance


view this post on Zulip Alex Crichton (May 17 2023 at 22:22):

"how I got wasmtime to be 30x slower than fork" hehe a great comparison!

view this post on Zulip Alex Crichton (May 17 2023 at 22:23):

I had no idea that a fresh instance with pooling plus reusing an instance was only ~2x, that's actually much lower than I might have expected

view this post on Zulip Alex Crichton (May 17 2023 at 22:25):

One interesting axis to explore, in case anyone's curious about benchmarks like this, is what happens at higher parallelism too. For example, have N background threads all doing the same work as the main thread and time how long it takes the main thread to do its iteration (we have this in Wasmtime's instantiation benchmarks). Strategies such as CoW and pooling show pretty big effects at >2 concurrency, when IPIs and the kernel come into play more.
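A rough sketch of that measurement shape (not Wasmtime's actual benchmark; the instance_pre/engine setup and the iteration count are assumptions) could look like:

use std::sync::atomic::{AtomicBool, Ordering};
use std::time::{Duration, Instant};

// Time the main thread's instantiations while `n_background` threads do the
// same work concurrently.
fn bench(
    engine: &wasmtime::Engine,
    instance_pre: &wasmtime::InstancePre<()>,
    n_background: usize,
) -> Duration {
    let stop = AtomicBool::new(false);
    std::thread::scope(|s| {
        // Background threads doing the same work as the main thread.
        for _ in 0..n_background {
            s.spawn(|| {
                while !stop.load(Ordering::Relaxed) {
                    let mut store = wasmtime::Store::new(engine, ());
                    let _ = instance_pre.instantiate(&mut store);
                }
            });
        }

        // Measure the main thread's iterations.
        let start = Instant::now();
        for _ in 0..1_000 {
            let mut store = wasmtime::Store::new(engine, ());
            let _ = instance_pre.instantiate(&mut store);
        }
        let elapsed = start.elapsed();

        // Let the background threads exit; the scope joins them.
        stop.store(true, Ordering::Relaxed);
        elapsed
    })
}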

view this post on Zulip Julien Cretin (ia0) (Apr 22 2024 at 13:45):

This is a year-old thread, but I think it makes sense to reconsider its answer now that no-std is starting to be an option. In an embedded context there are not necessarily copy-on-write capabilities, and creating a new store and instance for each thread might be too costly. Is my understanding correct that this is a limitation of wasmtime and that the wasm specification theoretically permits concurrent usage of a single store? (i.e. running concurrent threads within the same store) Is this a capability that wasmtime might consider adding in the future? (in particular once no-std is supported) Thanks! (I can create a separate discussion thread or open an issue on GitHub if that's preferred)

view this post on Zulip Alex Crichton (Apr 22 2024 at 14:44):

Wasmtime won't support concurrent execution within a single store, even with no_std support. That part is pretty foundational to Wasmtime right now. Wasmtime has been optimized for cheap instantiation, but you by no means need to use that; you can create a single instance and run with that as well. If the instance-per-thread model is too expensive for you then you probably want to follow the shared-everything-threads proposal, but if you're also interested in using Wasmtime I'd recommend profiling if possible to see what the hot spots are, and we can help see if we can optimize those. It's going to take quite some time to get shared-everything-threads implemented.

But yeah if you'd like to open a tracking issue/discussion place that's also reasonable!

A draft proposal for spawning threads in WebAssembly - WebAssembly/shared-everything-threads

view this post on Zulip Julien Cretin (ia0) (Apr 22 2024 at 15:49):

Thanks for the link! I'll follow that proposal. And I'll definitely give wasmtime a try once no-std is supported. But I expect other issues before reaching threads, like supporting smaller page sizes (which I discovered has a proposal now, nice).

view this post on Zulip Julien Cretin (ia0) (Apr 23 2024 at 08:08):

Sorry, actually one more question. Do we agree that this instance-per-thread approach requires modules that want parallel threads to import their shared memory? Otherwise each instance would get its own memory, if I understand Wasmtime correctly. However, my reading of the spec says "For memories of shared type, no state is recorded in the instance itself.", which seems to differ from what Wasmtime does. Or am I missing something?

view this post on Zulip Alex Crichton (Apr 23 2024 at 17:34):

Modules/instances have no inherent memories attached to them, they need to either import them or define them. In that sense if memory were not imported you'd still need to decide how to represent linear memory. If memory were defined then that wouldn't work in the instance-per-thread model because then each instance would define its own memory, defeating the purpose of sharing memory across threads.

So to answer your question, effectively yes - the instance-per-thread model basically requires a module to import shared linear memory. The shared part is required as it's the only wasm type that's safe to share across threads.

As for the spec, wasmtime implements those semantics: a shared memory is not attached to a particular instance and doesn't close over instance state. It sounds, though, like you've got something which you think contradicts this? I can take a closer look if you'd like to point it out.
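To make the instance-per-thread shape concrete, here is a minimal sketch (the WAT, the "env"/"memory" import names, and the memory limits are illustrative assumptions): one SharedMemory created against the Engine is imported by every instance, each of which lives in its own Store:

fn main() -> anyhow::Result<()> {
    let mut config = wasmtime::Config::new();
    config.wasm_threads(true);
    let engine = wasmtime::Engine::new(&config)?;

    // A guest that imports its (shared) linear memory instead of defining it.
    let module = wasmtime::Module::new(
        &engine,
        r#"(module
             (import "env" "memory" (memory 1 1 shared))
             (func (export "poke") (i32.store (i32.const 0) (i32.const 42))))"#,
    )?;

    // The shared memory belongs to the engine, not to any one store/instance.
    let shared = wasmtime::SharedMemory::new(&engine, wasmtime::MemoryType::shared(1, 1))?;

    // Each thread/invocation gets its own Store and Instance, but all of them
    // import the same shared memory.
    for _ in 0..2 {
        let mut store = wasmtime::Store::new(&engine, ());
        let instance = wasmtime::Instance::new(
            &mut store,
            &module,
            &[wasmtime::Extern::SharedMemory(shared.clone())],
        )?;
        let poke = instance.get_typed_func::<(), ()>(&mut store, "poke")?;
        poke.call(&mut store, ())?;
    }
    Ok(())
}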

view this post on Zulip Julien Cretin (ia0) (Apr 24 2024 at 09:16):

Thanks! This answers my first question (and confirms my understanding). Regarding my second question, now that I think about it, it's actually not a contradiction, but just a correct implementation detail. (The spec does not record any state in a shared memory instance, but this doesn't prevent an embedder from doing so for convenience. What matters is that the trace of events be consistent, which doesn't prevent storing the shared memory state in its memory instance. To give an example of an alternative design to Wasmtime: each time a module is instantiated, for each shared memory it defines, the Wasmtime user would provide an appropriate shared memory. Wasmtime decided to create one automatically for the user and store it in the memory instance, or somewhere equivalent in the store.)

