Stream: general

Topic: Sharing Linker across multiple threads for many modules


Josh Groves (Jan 26 2024 at 13:45):

Hi! Currently I'm compiling a lot of tiny wasm modules across multiple threads. They all use the same Store type, so I was looking at the optimization described in the Linker documentation: sharing one Linker between modules. The problem is that putting the Linker behind a Mutex (or RwLock, etc.) actually seems to regress performance a bit because of thread contention, so I was thinking of having one Linker per thread or something similar.

The reason I need a mutable Linker today is that each module may import new functions I haven't encountered before, so I might need to call, e.g., linker.func_wrap to lazily add those host functions. I could do all of this upfront by building a list of every possible function to register, but the lazy func_wraps are really convenient, so I was hoping to avoid a separate pass.
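A minimal sketch of the lazy pattern, using only the Rust standard library. `HostRegistry`, `make_host_fn`, and `ensure_imports` are hypothetical stand-ins, not wasmtime API; in real code the registry would be a `Linker<T>` and `define` would be a `func_wrap(module, name, closure)` call. Before each instantiation, walk the module's imports and define only the ones not yet registered:

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical host function type (stand-in for a wrapped wasmtime closure).
type HostFn = Arc<dyn Fn(i32) -> i32 + Send + Sync>;

/// Hypothetical stand-in for a linker: maps (module, name) to a host function.
/// This is NOT wasmtime's API; it only models lazy registration.
#[derive(Default, Clone)]
pub struct HostRegistry {
    defs: HashMap<(String, String), HostFn>,
}

impl HostRegistry {
    pub fn contains(&self, module: &str, name: &str) -> bool {
        self.defs.contains_key(&(module.to_string(), name.to_string()))
    }

    pub fn define(&mut self, module: &str, name: &str, f: HostFn) {
        self.defs.insert((module.to_string(), name.to_string()), f);
    }

    pub fn len(&self) -> usize {
        self.defs.len()
    }
}

/// Hypothetical factory that builds any host function on demand.
fn make_host_fn(name: &str) -> HostFn {
    let offset = name.len() as i32;
    Arc::new(move |x: i32| x + offset)
}

/// Define only the imports this module needs that are not registered yet.
/// Returns how many new definitions were added.
pub fn ensure_imports(reg: &mut HostRegistry, imports: &[(&str, &str)]) -> usize {
    let mut added = 0;
    for &(module, name) in imports {
        if !reg.contains(module, name) {
            reg.define(module, name, make_host_fn(name));
            added += 1;
        }
    }
    added
}
```

Because `ensure_imports` needs `&mut HostRegistry`, sharing one registry across threads is what forces the Mutex/RwLock question raised here.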

Has anyone done anything similar and could recommend a strategy here? Maybe there's a way to avoid putting the Linker behind a Mutex?

Lann Martin (Jan 26 2024 at 14:27):

You can just clone the Linker, which does copy symbols internally but avoids reprocessing, e.g., add_to_linker calls.

Josh Groves (Jan 26 2024 at 14:38):

Interesting, that sounds like a good path to try too. I'm also testing finer-grained locking now (e.g., locking right before I use func_wrap) in case that reduces thread contention in general (the trade-off being more locking/unlocking). A lot of the modules are similar and don't need many func_wraps, so maybe this will help.

Lann Martin (Jan 26 2024 at 15:08):

Actually, looking at the code it appears that Linker uses Arc<str> internally, so you aren't even copying all that much. I doubt you'll beat copying with lock contention.
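To illustrate why a clone is cheaper than it might look: cloning a map whose keys are `Arc<str>` copies the hash table but only bumps reference counts on the strings. `Names` is a hypothetical stand-in, not wasmtime's internal type:

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for a linker's symbol table with shared string keys.
#[derive(Clone, Default)]
pub struct Names {
    map: HashMap<Arc<str>, usize>,
}

impl Names {
    pub fn insert(&mut self, name: &str, idx: usize) {
        self.map.insert(Arc::from(name), idx);
    }

    /// Returns the stored (shared) key for `name`, if present.
    pub fn key(&self, name: &str) -> Option<&Arc<str>> {
        self.map.get_key_value(name).map(|(k, _)| k)
    }
}
```

The table itself is still duplicated, so each clone costs memory proportional to the number of definitions, even though the string data is shared.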

Josh Groves (Jan 26 2024 at 15:10):

Yeah, definitely possible. I would be concerned about some of the types in there, though; e.g., the hashmaps on Linker might add up quickly across thousands of modules.

Pat Hickey (Jan 26 2024 at 17:36):

another way you could reduce contention, if it applies to your use case, is to link a module into an InstancePre once, and then clone the InstancePre to instantiate on many threads

Pat Hickey (Jan 26 2024 at 17:36):

InstancePre is basically amortizing out the linker resolution for modules that get instantiated with the same linker many times

Pat Hickey (Jan 26 2024 at 17:39):

I apologize for the name InstancePre, we couldn't come up with a better word for a thing that is halfway between a module and an instance

Pat Hickey (Jan 26 2024 at 17:40):

it's got all the imports resolved, it just needs a store for the mutable state
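In wasmtime this is `Linker::instantiate_pre(&module)`, which returns a cloneable `InstancePre<T>`, followed by `InstancePre::instantiate(&mut store)` with a fresh `Store` per instance. The sketch below models only the idea with hypothetical stand-ins (`Pre`, `HostFn`): names are resolved once in `resolve`, and every run after that gets its own fresh state, the role a `Store` plays:

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical host function: receives per-instance state plus an argument.
type HostFn = Arc<dyn Fn(&mut u64, i32) -> i32 + Send + Sync>;

/// Hypothetical "pre-instantiated" module: imports already resolved to functions.
#[derive(Clone)]
pub struct Pre {
    imports: Arc<[HostFn]>,
}

impl Pre {
    /// Resolve every import name exactly once; fails if any is missing.
    pub fn resolve(defs: &HashMap<&str, HostFn>, names: &[&str]) -> Result<Pre, String> {
        let imports = names
            .iter()
            .map(|n| defs.get(n).cloned().ok_or_else(|| format!("missing import {n}")))
            .collect::<Result<Vec<_>, _>>()?;
        Ok(Pre { imports: imports.into() })
    }

    /// "Instantiate" with fresh state (like a new Store) and call each import.
    /// Returns (final state, final value); no name lookups happen here.
    pub fn run(&self, arg: i32) -> (u64, i32) {
        let mut state = 0u64;
        let mut acc = arg;
        for f in self.imports.iter() {
            acc = f(&mut state, acc);
        }
        (state, acc)
    }
}
```

Cloning `Pre` only bumps one reference count, so each thread can hold its own handle without consulting the definitions again.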

Alex Crichton (Jan 26 2024 at 17:46):

Some other possible help I can offer:

"They all use the same Store type"

One thing to be aware of: if you never destroy a Store, it keeps all wasm instances alive, so nothing short-lived is ever deallocated. Store, however, is intended to be pretty cheap to create and destroy; if that's a problem for you, let us know and we can help investigate perf.


Also, one idea we talked about a long time ago was hierarchical linkers: a global linker with base functionality that you can cheaply extend with further linkers in per-module situations. That never materialized, though, so it's not implemented today. We generally recommend having one linker for the whole program with a static set of functions if you can, but that doesn't work for use cases that need to perform instantiations with different sets of imported functions.
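Since hierarchical linkers don't exist in wasmtime, the following is only a sketch of the idea with a hypothetical `Layered` type: a shared, read-only base behind an `Arc` plus a small per-module overlay that is checked first:

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical two-level lookup: a shared, read-only base plus a per-module
/// overlay. This models the idea only; wasmtime does not provide it.
pub struct Layered {
    base: Arc<HashMap<String, u32>>,
    local: HashMap<String, u32>,
}

impl Layered {
    /// Cheap to create: clones the Arc, not the base table.
    pub fn new(base: Arc<HashMap<String, u32>>) -> Self {
        Layered { base, local: HashMap::new() }
    }

    /// Per-module additions go into the overlay; the base is never mutated.
    pub fn define(&mut self, name: &str, value: u32) {
        self.local.insert(name.to_string(), value);
    }

    /// Overlay entries shadow base entries with the same name.
    pub fn get(&self, name: &str) -> Option<u32> {
        self.local.get(name).or_else(|| self.base.get(name)).copied()
    }
}
```

Creating a layer costs one reference-count bump and an empty map, and per-module additions never touch the shared base, so no lock is needed.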

Lann Martin (Jan 26 2024 at 17:49):

You could pass around an Arc<Linker> and use it to lazily init thread local clones. I wouldn't personally go that route without compelling benchmarks :shrug:
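A sketch of that approach with a hypothetical `Table` type standing in for `Linker<T>` (which implements `Clone`): each thread clones the shared base the first time it needs it and then mutates its private copy without locking:

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for a fully populated linker.
pub type Table = HashMap<String, u32>;

thread_local! {
    // Each thread lazily gets its own private, mutable copy.
    static LOCAL: RefCell<Option<Table>> = RefCell::new(None);
}

/// Run `f` against this thread's copy, cloning from `shared` on first use.
/// After the first call on a thread, `shared` is no longer consulted.
pub fn with_local<R>(shared: &Arc<Table>, f: impl FnOnce(&mut Table) -> R) -> R {
    LOCAL.with(|cell| {
        let mut slot = cell.borrow_mut();
        let table = slot.get_or_insert_with(|| (**shared).clone());
        f(table)
    })
}
```

One trade-off of this design: definitions a thread adds to its own copy are invisible to other threads, so each thread may end up defining the same host function independently.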

Josh Groves (Jan 26 2024 at 17:54):

@Pat Hickey thanks for the suggestion! I do cache InstancePre where possible, but the modules can be different which is where I end up needing to use the Linker again (e.g., I run into a module with a function that I haven't set up yet)

Josh Groves (Jan 26 2024 at 17:57):

@Alex Crichton Yeah it's just the Store type that is the same, but every instantiation gets its own Store and destroys it afterwards

Josh Groves (Jan 26 2024 at 18:02):

Yeah, the modules may have different sets of imported functions. I could look through all modules and union those sets ahead of time, but that's much less convenient than lazily defining imported functions as modules request them.
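The upfront alternative amounts to a set union over every module's imports. In wasmtime the pairs could come from `Module::imports()`, whose `ImportType` items expose `module()` and `name()`; here they are plain string pairs, and `union_imports` is a hypothetical helper:

```rust
use std::collections::BTreeSet;

/// Hypothetical import list for one module: (module, name) pairs.
pub type Imports<'a> = Vec<(&'a str, &'a str)>;

/// Union the import sets of all modules so every host function can be
/// defined once, before any thread starts instantiating.
pub fn union_imports<'a>(modules: &[Imports<'a>]) -> BTreeSet<(&'a str, &'a str)> {
    modules.iter().flatten().copied().collect()
}
```

With the union in hand, every host function can be defined once on a single linker before any thread starts; after that the linker is only read, so no lock is needed.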

Alex Crichton (Jan 26 2024 at 18:20):

Ah ok makes sense, and RwLock<Linker<T>> has too much overhead?

Josh Groves (Jan 26 2024 at 18:28):

RwLock<Linker<T>> seems to be OK. I ended up using an RwLock with finer-grained write locking whenever I need to add import functions. Sharing one linker behind an RwLock seems to improve performance slightly (vs. creating or cloning a new linker per module), but not by very much. The RwLock changes might be bottlenecked by thread contention in my own code, though, so I'll work on improving that separately.
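The finer-grained write locking described here can be sketched with a hypothetical `Shared` registry standing in for `RwLock<Linker<T>>`. The common path takes only the shared lock (in wasmtime, `Linker::instantiate_pre` takes `&self`, so a read lock suffices), and the exclusive lock is taken only when an import is missing (`func_wrap` takes `&mut self`). The re-check under the write lock matters because another thread may have defined the same name between the two lock acquisitions:

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Hypothetical shared registry; stands in for `RwLock<Linker<T>>`.
pub struct Shared {
    defs: RwLock<HashMap<String, u32>>,
}

impl Shared {
    pub fn new() -> Self {
        Shared { defs: RwLock::new(HashMap::new()) }
    }

    /// Returns true if this call defined `name`, false if it already existed.
    pub fn ensure(&self, name: &str, make: impl FnOnce() -> u32) -> bool {
        // Fast path: the read guard is dropped at the end of the condition.
        if self.defs.read().unwrap().contains_key(name) {
            return false;
        }
        // Slow path: exclusive lock only when something is missing,
        // re-checking in case another thread defined it in the meantime.
        let mut defs = self.defs.write().unwrap();
        if defs.contains_key(name) {
            return false;
        }
        defs.insert(name.to_string(), make());
        true
    }

    pub fn len(&self) -> usize {
        self.defs.read().unwrap().len()
    }
}
```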


Last updated: Oct 23 2024 at 20:03 UTC