Stream: general

Topic: ✔ Concurrent access of WASM modules/components


view this post on Zulip Christoph Brewing (Jan 11 2024 at 07:20):

I want to implement a multi-threaded application in Rust, e.g. with Tokio, that executes functions from stateless (!) WebAssembly (WASM) libraries.

Suppose I have an instance of a WASM library, and a first thread is in the middle of using it to execute a long-running function. At this point, a second thread tries to use the same WASM library instance. Is that possible and OK, or should the instance be protected (locked) to ensure single-threaded access?

view this post on Zulip Notification Bot (Jan 11 2024 at 11:28):

Christoph Brewing has marked this topic as resolved.

view this post on Zulip mainrs (Jan 18 2024 at 12:48):

Christoph Brewing said:

I want to implement a multi-threaded application in Rust, e.g. with Tokio, that executes functions from stateless (!) WebAssembly (WASM) libraries.

Suppose I have an instance of a WASM library, and a first thread is in the middle of using it to execute a long-running function. At this point, a second thread tries to use the same WASM library instance. Is that possible and OK, or should the instance be protected (locked) to ensure single-threaded access?

Could you elaborate on this a little more? How did you solve this? I have a similar use case: my application has a list of files to download and the downloader is written in WASI. I want to make it parallel. Since WASI has no threading yet, I was thinking about running multiple threads in Tokio, each using the instantiated module to run an HTTP request.

I would have the same problem as you: simultaneous access to the WASM library

view this post on Zulip Christoph Brewing (Jan 22 2024 at 20:40):

@mainrs Sorry for the delay. Here is my reasoning:

The first thing I noticed was that in order to call any function from a WASM component (I have components, not modules, but that should not make any difference), or even to instantiate a WASM component, I need a mutable Store (&mut store). My conclusion was that in order to use that in a multi-threaded environment, I would have one store per WASM component instance and I would protect both of them, store and corresponding instance, together. In my case, I use a Mutex to protect the two of them as an ensemble.

I have this kind of ensemble, Store and component instance, multiple times in my application: as many times as I have worker threads.

In contrast to this, I just have one Engine, one Config and one Linker in the entire application.
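The layout described above can be sketched roughly as follows. This is a minimal illustration using only the standard library: `Engine`, `Store`, `Instance`, `Ensemble` and `run_workers` are hypothetical stand-ins chosen so the sketch compiles on its own, not wasmtime's real types. The point is the shape: one shared engine, and one Mutex-protected store-plus-instance pair per worker thread.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical stand-ins for an engine, a store and a component instance,
// so that the sketch needs nothing beyond the standard library.
struct Engine;
struct Store {
    calls: u32,
}
struct Instance {
    id: usize,
}

// Store and instance are only ever used together, so they share one lock.
struct Ensemble {
    store: Store,
    instance: Instance,
}

impl Ensemble {
    fn new(_engine: &Engine, id: usize) -> Self {
        Ensemble {
            store: Store { calls: 0 },
            instance: Instance { id },
        }
    }

    // Mirrors the `&mut store` requirement: a call needs exclusive access.
    fn call(&mut self, input: u32) -> u32 {
        self.store.calls += 1;
        input + self.instance.id as u32
    }
}

// Spawns `workers` threads, each with its own ensemble, and returns how many
// calls each ensemble's store has seen.
fn run_workers(workers: usize, calls_per_worker: u32) -> Vec<u32> {
    let engine = Arc::new(Engine); // one engine for the whole application
    let ensembles: Vec<Arc<Mutex<Ensemble>>> = (0..workers)
        .map(|id| Arc::new(Mutex::new(Ensemble::new(&engine, id))))
        .collect();

    let handles: Vec<_> = ensembles
        .iter()
        .cloned()
        .map(|ensemble| {
            thread::spawn(move || {
                for i in 0..calls_per_worker {
                    // The lock is held only for the duration of one call.
                    let mut guard = ensemble.lock().unwrap();
                    guard.call(i);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }

    let counts: Vec<u32> = ensembles
        .iter()
        .map(|e| e.lock().unwrap().store.calls)
        .collect();
    counts
}

fn main() {
    println!("{:?}", run_workers(4, 10));
}
```

Because each worker has its own ensemble, the locks are rarely contended; the Mutex mainly guarantees that store and instance are never touched by two threads at once.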

There was one interesting "EUREKA" moment during development: In my application, new WASM component instances are created on demand. Thus, it may be the case that multiple instances have to be created at the same time. Due to logical complexity, the instantiation of any of my WASM components takes a few seconds (lots of state being initialized in that time). The first design of the multi-threaded application called Component::from_file() followed by instantiate() each time I needed a new instance. With multiple threads calling for an instantiation at the same time, it took the system quiiiiiite long to handle these calls. I still do not understand the root cause for that. However, at a certain point I started to cache the Component returned by Component::from_file(), so that each physical .wasm file is compiled once and really only once. The instantiation (instantiate()) itself is quite fast. This change improved the overall system a lot.
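A compile-once cache of this kind might look like the sketch below. It is an assumption about the general shape, not the actual implementation: `Compiled`, `ComponentCache`, `get_or_compile` and `compile_concurrently` are hypothetical names, and the "compilation" is a stand-in that only counts how often it runs. Each path gets a `OnceLock` slot, so concurrent requests for the same file wait for a single compilation instead of each starting their own.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex, OnceLock};
use std::thread;

// Hypothetical stand-in for a compiled component.
struct Compiled {
    path: String,
}

struct ComponentCache {
    // One slot per .wasm path; the slot is filled at most once.
    entries: Mutex<HashMap<String, Arc<OnceLock<Arc<Compiled>>>>>,
    // Counts how often the (stand-in) compilation actually ran.
    compilations: AtomicUsize,
}

impl ComponentCache {
    fn new() -> Self {
        ComponentCache {
            entries: Mutex::new(HashMap::new()),
            compilations: AtomicUsize::new(0),
        }
    }

    fn get_or_compile(&self, path: &str) -> Arc<Compiled> {
        // Hold the map lock only long enough to find or create the slot,
        // so a slow compilation of one file does not block other files.
        let slot = {
            let mut map = self.entries.lock().unwrap();
            map.entry(path.to_string()).or_default().clone()
        };
        // The closure runs once per slot; concurrent callers wait for it.
        slot.get_or_init(|| {
            self.compilations.fetch_add(1, Ordering::SeqCst);
            Arc::new(Compiled {
                path: path.to_string(),
            })
        })
        .clone()
    }
}

// Requests the same path from several threads and returns how many
// compilations were actually performed.
fn compile_concurrently(threads: usize, path: &str) -> usize {
    let cache = ComponentCache::new();
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                let component = cache.get_or_compile(path);
                assert_eq!(component.path, path);
            });
        }
    });
    cache.compilations.load(Ordering::SeqCst)
}

fn main() {
    println!("compilations: {}", compile_concurrently(8, "demo.wasm"));
}
```

In a real application the stand-in closure would be replaced by the expensive compile step, and the cached value would be handed to each per-worker instantiation.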

Does this answer help you?

view this post on Zulip Lann Martin (Jan 23 2024 at 13:44):

With multiple threads calling for an instantiation at the same time, it took the system quiiiiiite long to handle these calls. I still do not understand the root cause for that.

This is almost certainly cranelift compiling your wasm (in Component::from_file). It is quite CPU-intensive so many compilations at once would be quite slow.

view this post on Zulip Christoph Brewing (Jan 23 2024 at 13:49):

Agreed, my observation was that the CPU was 100% busy. What surprised me, however, was that each thread returned only after ALL instances were ready.

For example, compilation + instantiation takes 5 s for one instance. When I tried to compile and instantiate 3 instances at the same time, each being processed by a (different) blocking thread on my 4-core machine, each thread returned only after roughly 15 s, and I can see absolutely no reason for them to have "waited" for one another.

view this post on Zulip Lann Martin (Jan 23 2024 at 13:51):

Is caching enabled for your engine?

Actually, maybe this is simpler: cranelift itself uses multiple threads by default, so you would expect 3 identical compilations to complete roughly 3x later than 1 (subject to your OS's scheduler).
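The arithmetic behind this can be made concrete with a deliberately idealized model. It assumes that a single compilation already keeps every core busy, that the scheduler shares the machine fairly, and that there is no other overhead; both function names are made up for this illustration. Under those assumptions, identical jobs started together all finish at the same late moment, whereas jobs run one after another finish staggered.

```rust
// Finish times in seconds, where `solo_secs` is how long one job takes
// when it has the whole machine to itself.

// All jobs start together and share the machine fairly: each gets 1/jobs of
// the CPU, so every job finishes at jobs * solo_secs.
fn finish_times_concurrent(jobs: u32, solo_secs: f64) -> Vec<f64> {
    (0..jobs).map(|_| jobs as f64 * solo_secs).collect()
}

// Jobs run one after another: the k-th job finishes at k * solo_secs.
fn finish_times_sequential(jobs: u32, solo_secs: f64) -> Vec<f64> {
    (1..=jobs).map(|k| k as f64 * solo_secs).collect()
}

fn main() {
    println!("concurrent: {:?}", finish_times_concurrent(3, 5.0));
    println!("sequential: {:?}", finish_times_sequential(3, 5.0));
}
```

With 3 jobs of 5 s each, the concurrent case yields 15 s for every job, which matches the observation that each thread returned only after roughly 15 s. The total work is the same in both cases; only the individual completion times differ.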

view this post on Zulip Christoph Brewing (Jan 23 2024 at 13:54):

Nope, and I am (was) not aware of its existence. Do you expect this to change things a lot?

I kind of implemented my own caching; that is to say, I currently cache compiled components in a separate data structure (a map).

From the short description, I am not sure what it does ..

view this post on Zulip Christoph Brewing (Jan 23 2024 at 13:55):

AHA, your last explanation actually explains what I have observed.
I did not know that, thank you!

view this post on Zulip Lann Martin (Jan 23 2024 at 13:56):

https://docs.rs/wasmtime/16.0.0/wasmtime/struct.Config.html#method.parallel_compilation

view this post on Zulip Christoph Brewing (Jan 23 2024 at 13:58):

Thanks, anyway a great takeaway for any application design with tight timing constraints, I think. To summarize this discussion:

"Compilation of WASM modules/components takes time and CPU cycles, so one had better account for it if speed matters".

view this post on Zulip Lann Martin (Jan 23 2024 at 14:13):

Another option you have (which is what wasmtime's caching feature uses internally) is to precompile wasm for the target machine. This is significantly harder to orchestrate than from_file but gives you control over exactly when and where compilation happens.

view this post on Zulip Christoph Brewing (Jan 23 2024 at 14:14):

That is an interesting point. Well, I would not consider it for the moment, since I kind of "sell" infrastructure to my company-internal customers and have rather limited knowledge of, or influence over, where it might run.

However, yes, as a general feature, very interesting indeed.


Last updated: Nov 22 2024 at 16:03 UTC