Hi! I'm currently compiling a lot of tiny wasm modules across multiple threads. They all use the same Store type, so I was looking at the optimization mentioned in the Linker documentation about sharing one Linker between modules. The problem I'm noticing is that putting a Linker behind a Mutex (or RwLock, etc.) actually seems to regress performance a bit because of thread contention, so I was thinking of having one Linker per thread or something.
The reason I need a mutable Linker today is that each module may register some new functions that I haven't encountered before, so it might call, e.g., linker.func_wrap to lazily add those host functions. I could try to do all of this upfront by building a list of all possible functions to register, but the lazy func_wrap calls are really convenient, so I was hoping to avoid a separate pass.
Has anyone done anything similar and could recommend a strategy here? Maybe there's a way to avoid putting the Linker behind a Mutex?
You can just clone the Linker, which does copy symbols internally but avoids reprocessing e.g. add_to_linker calls.
Interesting, that sounds like it could be a good path to try too. I'm also testing out finer-grained locking (e.g., taking the lock only right before I use func_wrap) in case it reduces thread contention in general (the trade-off being more locking/unlocking). A lot of the modules are similar and don't need many func_wrap calls, so maybe this will help.
Actually, looking at the code it appears that Linker uses Arc<str> internally, so you aren't even copying all that much. I doubt you'll beat copying with lock contention.
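To make the Arc<str> point concrete, here is a rough std-only sketch. SymbolTable is a hypothetical stand-in invented for illustration, not wasmtime's actual Linker layout; the only thing it shares with the discussion above is that names are stored as Arc<str>. Cloning such a table copies the map structure but only bumps reference counts on the name strings.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for a linker's symbol table (NOT wasmtime's real layout).
#[derive(Clone, Default)]
struct SymbolTable {
    // Name -> definition id. Keys are reference-counted strings.
    symbols: HashMap<Arc<str>, usize>,
}

impl SymbolTable {
    /// Registers `name`, allocating the string once.
    fn define(&mut self, name: &str, id: usize) {
        self.symbols.insert(Arc::from(name), id);
    }
}

/// Returns true if every key in `b` points at the same string allocation
/// as the matching key in `a` (i.e. the clone did not copy string data).
fn shares_names(a: &SymbolTable, b: &SymbolTable) -> bool {
    b.symbols.keys().all(|k| {
        a.symbols
            .get_key_value(k)
            .map_or(false, |(orig, _)| Arc::ptr_eq(orig, k))
    })
}
```

The hash map buckets themselves are still duplicated on clone, so the per-clone cost grows with the number of entries even though the strings are shared.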
Yeah, definitely possible. I would be concerned about some of the types in there though, e.g. the hashmaps on Linker might add up quickly for thousands of modules.
another way you could reduce contention, if it applies to your use case, is to link a module into an InstancePre once, and then clone the InstancePre to instantiate on many threads
InstancePre is basically amortizing out the linker resolution for modules that get instantiated with the same linker many times
I apologize for the name InstancePre, we couldn't come up with a better word for a thing that is halfway between a module and an instance
it's got all the imports resolved, it just needs a store for the mutable state
Some other possible help I can offer:
They all use the same Store type
One thing to be aware of with this is that if you never destroy a Store it keeps all wasm instances alive, so you'd never deallocate anything that's short-lived. Store, however, is intended to be pretty cheap to create/destroy, but if that's a problem for you let us know and we can help investigate perf.
Also one idea we talked about a long time ago was the idea of hierarchical linkers, where you have a global linker with base functionality and then you can cheaply extend that with further linkers in per-module situations. That never materialized though, so it's not implemented today. We generally recommend having one linker for the whole program if you can with a static set of functions, but that doesn't work for use cases which need to perform instantiations with different sets of imported functions
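Since hierarchical linkers don't exist in wasmtime, the following is purely a sketch of the idea using std types. LayeredTable, define, and resolve are hypothetical names invented here: a shared, immutable base holds the common definitions, and each module gets a small overlay that is cheap to create and is consulted first.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical two-level table: a shared, immutable base plus a small
/// per-module overlay. This is NOT a wasmtime type.
struct LayeredTable {
    base: Arc<HashMap<String, u32>>,
    local: HashMap<String, u32>,
}

impl LayeredTable {
    /// Creating a layer only bumps the base's reference count.
    fn new(base: Arc<HashMap<String, u32>>) -> Self {
        Self { base, local: HashMap::new() }
    }

    /// Adds a definition visible only through this layer.
    fn define(&mut self, name: &str, id: u32) {
        self.local.insert(name.to_string(), id);
    }

    /// Looks in the overlay first, then falls back to the shared base.
    fn resolve(&self, name: &str) -> Option<u32> {
        self.local.get(name).or_else(|| self.base.get(name)).copied()
    }
}
```

Because the base is never mutated after construction, no lock is needed to read it from many threads.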
You could pass around an Arc<Linker> and use it to lazily init thread local clones. I wouldn't personally go that route without compelling benchmarks :shrug:
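For reference, the thread-local approach could look roughly like this with std only. Table is a hypothetical stand-in for Linker<T> (all this sketch assumes is that it is Clone), and with_local is an invented helper name. Each thread clones the shared table on first use and then mutates its private copy without any locking.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for `Linker<T>`: a cloneable name -> id table.
#[derive(Clone, Default)]
struct Table {
    defs: HashMap<String, u32>,
}

thread_local! {
    // Each thread's private copy, created lazily on first use.
    static LOCAL: RefCell<Option<Table>> = RefCell::new(None);
}

/// Runs `f` against this thread's private copy, cloning `shared` the first
/// time the thread calls it. Later calls ignore `shared` and reuse the copy.
fn with_local<R>(shared: &Arc<Table>, f: impl FnOnce(&mut Table) -> R) -> R {
    LOCAL.with(|slot| {
        let mut slot = slot.borrow_mut();
        let table = slot.get_or_insert_with(|| Table::clone(shared));
        f(table)
    })
}
```

The trade-off is that lazily added definitions are not shared: a function registered on one thread has to be registered again on every other thread that encounters it.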
@Pat Hickey thanks for the suggestion! I do cache InstancePre where possible, but the modules can be different, which is where I end up needing to use the Linker again (e.g., I run into a module with a function that I haven't set up yet)
@Alex Crichton Yeah, it's just the Store type that is the same, but every instantiation gets its own Store and destroys it afterwards
Yeah the modules may have different sets of imported functions, although I could look through all modules to union all sets together ahead of time, it's just not very convenient to do that vs. lazily defining imported functions as modules request them
Ah ok, makes sense, and RwLock<Linker<T>> has too much overhead?
RwLock<Linker<T>> seems to be ok. I ended up using RwLock with finer-grained write locking whenever I need to add import functions. Sharing one linker behind an RwLock seems to improve performance slightly (vs. creating or cloning new linkers per module), but not by very much. The RwLock changes might be bottlenecked by my thread contention though, so I'll work on improving that separately
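The finer-grained write locking described above could be sketched like this, again with std only. SharedTable and ensure are hypothetical names standing in for an RwLock<Linker<T>> and the "lazily add this import" step. The common case takes only a read lock; the write lock is taken only when a name is missing, and the check is repeated under the write lock because another thread may have added the name in between.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Hypothetical stand-in for a shared `RwLock<Linker<T>>`.
#[derive(Default)]
struct SharedTable {
    defs: RwLock<HashMap<String, u32>>,
}

impl SharedTable {
    /// Ensures `name` is defined; returns true if this call added it.
    fn ensure(&self, name: &str, make: impl FnOnce() -> u32) -> bool {
        // Fast path: shared read lock, released at the end of this statement.
        let present = self.defs.read().unwrap().contains_key(name);
        if present {
            return false;
        }
        // Slow path: exclusive write lock. Re-check, since another thread
        // may have inserted `name` after the read lock was released.
        let mut defs = self.defs.write().unwrap();
        if defs.contains_key(name) {
            return false;
        }
        defs.insert(name.to_string(), make());
        true
    }

    /// Looks up a definition under a read lock.
    fn get(&self, name: &str) -> Option<u32> {
        self.defs.read().unwrap().get(name).copied()
    }
}
```

When most modules need no new imports, nearly all calls stay on the read path, which fits the small but positive improvement reported above.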
Last updated: Jan 24 2025 at 00:11 UTC