Suppose I want to implement a multi-threaded application in Rust, e.g. with Tokio, that is designed to execute functions from stateless (!) WebAssembly (WASM) libraries.
Say I have an instance of a WASM library, and a first thread is in the middle of using it to execute a long-running function. At that point, a second thread tries to use the same WASM library instance. Is that possible and OK, or should the instance be protected (locked) to ensure single-threaded access?
Christoph Brewing has marked this topic as resolved.
Could you elaborate on this a little more? How did you solve it? I have a similar use case: my application has a list of files to download, and the downloader targets WASI. I want to run the downloads in parallel. Since WASI has no threading yet, I was thinking about running multiple threads in Tokio, each using the instantiated module to issue an HTTP request.
I would have the same problem as you: simultaneous access to the WASM library.
@mainrs Sorry for the delay. Here is my reasoning:
The first thing I noticed was that in order to call any function from a WASM component (I have components, not modules, but that should not make any difference), or even to instantiate one, I need a mutable Store (&mut store). My conclusion was that, to use this in a multi-threaded environment, I would have one Store per WASM component instance and protect the two of them, Store and corresponding instance, together. In my case, I use a Mutex to protect both of them as an ensemble.
I have this ensemble of Store and component instance multiple times in my application: as many times as I have worker threads.
In contrast to this, I just have one Engine, one Config and one Linker in the entire application.
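A minimal sketch of that layout, using only the standard library. Store and Instance here are hypothetical stand-ins, not the wasmtime types; they only model the constraint that a call needs &mut access to the store. Slot and run_workers are names made up for this illustration.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical stand-ins for wasmtime's Store and component instance.
// They only model the "every call needs `&mut store`" constraint.
struct Store {
    calls: u64,
}
struct Instance;

impl Instance {
    // Like a real wasm call, this requires exclusive access to the store.
    fn call(&self, store: &mut Store, input: u64) -> u64 {
        store.calls += 1;
        input * 2
    }
}

// Store and instance are protected together, as one unit.
struct Slot {
    store: Store,
    instance: Instance,
}

// Creates one slot per worker thread, runs `jobs_per_worker` calls on each,
// and returns how many calls every store has seen.
fn run_workers(workers: usize, jobs_per_worker: u64) -> Vec<u64> {
    let slots: Vec<Arc<Mutex<Slot>>> = (0..workers)
        .map(|_| {
            Arc::new(Mutex::new(Slot {
                store: Store { calls: 0 },
                instance: Instance,
            }))
        })
        .collect();

    let handles: Vec<_> = slots
        .iter()
        .cloned()
        .map(|slot| {
            thread::spawn(move || {
                for job in 0..jobs_per_worker {
                    // The lock covers store and instance for the whole call.
                    let mut guard = slot.lock().unwrap();
                    let Slot { store, instance } = &mut *guard;
                    instance.call(store, job);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }

    slots.iter().map(|s| s.lock().unwrap().store.calls).collect()
}
```

Because every worker has its own slot, the Mutex is rarely contended; it mainly guarantees that a second thread can never reach a Store while a call is in flight.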
There was one interesting "EUREKA" moment during development. In my application, new WASM component instances are created on demand, so multiple instances may have to be created at the same time. Because of their logical complexity, creating a new instance of any of my WASM components takes a few seconds (lots of state is initialized in that time). The first design of the multi-threaded application called Component::from_file() followed by instantiate() each time I needed a new instance. With multiple threads calling for an instantiation at the same time, it took the system quiiiiiite long to handle these calls. I still do not understand the root cause for that. However, at a certain point I started to cache the Component returned by Component::from_file(), so that each physical .wasm file is compiled once and really only once. The instantiation itself (instantiate()) is quite fast. This change improved the overall system a lot.
Does this answer help you?
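The compile-once cache could look roughly like the sketch below. This is not my actual code: Compiled is a hypothetical stand-in for a compiled Component, ComponentCache and compile_count_after are names invented for the illustration, and the file name downloader.wasm is made up. The point is only that concurrent requests for the same path trigger a single compilation.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex, OnceLock};
use std::thread;

// Hypothetical stand-in for a compiled wasmtime Component.
struct Compiled {
    path: PathBuf,
}

struct ComponentCache {
    // One cell per path; the cell is filled by whichever thread gets there first.
    entries: Mutex<HashMap<PathBuf, Arc<OnceLock<Arc<Compiled>>>>>,
    // Counts how often the (expensive) compile step actually ran.
    compilations: AtomicUsize,
}

impl ComponentCache {
    fn new() -> Self {
        ComponentCache {
            entries: Mutex::new(HashMap::new()),
            compilations: AtomicUsize::new(0),
        }
    }

    fn get(&self, path: &Path) -> Arc<Compiled> {
        // Hold the map lock only long enough to find or create the cell,
        // so compiling one file does not block lookups of other files.
        let cell = {
            let mut map = self.entries.lock().unwrap();
            Arc::clone(map.entry(path.to_path_buf()).or_default())
        };
        // Concurrent callers for the same path wait here for the first one.
        Arc::clone(cell.get_or_init(|| {
            self.compilations.fetch_add(1, Ordering::SeqCst);
            // A real cache would call Component::from_file here.
            Arc::new(Compiled {
                path: path.to_path_buf(),
            })
        }))
    }
}

// Requests the same (hypothetical) file from `threads` threads at once
// and reports how many compilations were performed.
fn compile_count_after(threads: usize) -> usize {
    let cache = Arc::new(ComponentCache::new());
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let cache = Arc::clone(&cache);
            thread::spawn(move || {
                let compiled = cache.get(Path::new("downloader.wasm"));
                assert_eq!(compiled.path, Path::new("downloader.wasm"));
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    cache.compilations.load(Ordering::SeqCst)
}
```

Each worker then only pays for instantiate() on the shared, already compiled component.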
With multiple threads calling for an instantiation at the same time, it took the system quiiiiiite long to handle these calls. I still do not understand the root cause for that.
This is almost certainly Cranelift compiling your wasm (in Component::from_file). It is quite CPU-intensive, so many compilations at once would be quite slow.
Agreed, my observation was that the CPU was 100% busy. What surprised me, however, was that each thread returned only after ALL instances were ready.
For example, compilation + instantiation takes 5 s for one instance. When I tried to compile and instantiate 3 instances at the same time, each processed by a different blocking thread on my 4-core machine, each thread returned only after roughly 15 s, and I could see absolutely no reason for them to have "waited" for one another.
Is caching enabled for your engine?
Actually, maybe this is simpler: cranelift itself uses multiple threads by default, so you would expect 3 identical compilations to complete roughly 3x later than 1 (subject to your OS's scheduler).
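The arithmetic behind that, as a back-of-the-envelope sketch. It assumes perfect scaling across cores and fair sharing by the scheduler, neither of which holds exactly in practice; the 20 core-seconds figure is derived from the reported 5 s on 4 cores under that assumption, and finish_secs is a name made up for this illustration.

```rust
/// Estimated wall-clock seconds until each of `jobs` identical compilations
/// finishes. Assumes perfect scaling and fair sharing of `cores`.
fn finish_secs(work_core_secs: f64, cores: f64, jobs: f64, threads_per_job: f64) -> f64 {
    // Each job gets an equal share of the cores, but can never use more
    // cores than it has compiler threads.
    let cores_per_job = (cores / jobs).min(threads_per_job);
    work_core_secs / cores_per_job
}
```

With 20 core-seconds of work per compilation on 4 cores, finish_secs(20.0, 4.0, 1.0, 4.0) gives 5 s for a single compilation and finish_secs(20.0, 4.0, 3.0, 4.0) gives about 15 s for each of three concurrent ones, matching the observation. Under the same assumptions, limiting each compilation to one thread (finish_secs(20.0, 4.0, 3.0, 1.0)) gives 20 s, so turning parallel compilation off would not by itself make the three finish sooner.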
Nope, and I was not aware of its existence. Do you expect this to change things a lot?
I kind of implemented my own caching; that is to say, I currently cache compiled components in a separate data structure (a map).
From the short description, I am not sure what it does.
AHA, your last explanation actually explains what I have observed.
I did not know that, thank you!
https://docs.rs/wasmtime/16.0.0/wasmtime/struct.Config.html#method.parallel_compilation
Thanks, anyway a great takeaway for any application design with tight timing constraints, I think. To sum up this discussion:
"Compilation of WASM modules/components takes time and cpu cycles, so one better takes care for it if speed matters".
Another option you have (which is what wasmtime's caching feature uses internally) is to precompile wasm for the target machine. This is significantly harder to orchestrate than from_file but gives you control over exactly when and where compilation happens.
That is an interesting point. Well, I would not consider it for the moment, since I kind of "sell" infrastructure to company-internal customers and have rather limited knowledge of, or influence over, where it might run.
However, yes, as a general feature, very interesting indeed.
Last updated: Jan 24 2025 at 00:11 UTC