Stream: git-wasmtime

Topic: wasmtime / issue #5243 Is there a performance bottleneck ...


Wasmtime GitHub notifications bot (Nov 10 2022 at 07:03):

asurdo opened issue #5243:

Question:

I use wasmtime in a multithreaded C++ program. I expected performance to improve as I added threads, but it does not. Is there a bottleneck in wasmtime, and how can I avoid it?

Here is my performance test log.

My machine has 24 cores and 64 GB of memory.

1 thread: speed 8217/s, average time cost 120us
2 threads: speed 10137/s, average time cost 196us
10 threads: speed 11190/s, average time cost 892us

My code logic:

The engine and linker are singletons. Each call creates a new store and uses WASI and some host functions.
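
As a rough consistency check on the log above: in a closed loop where each thread issues one call at a time, throughput should be about the thread count divided by the average latency. The sketch below hard-codes the three log lines (the function and variable names are illustrative, not part of the original test) and compares that estimate with the reported speeds, alongside the speedup over one thread.

```python
# (threads, reported calls per second, average latency in microseconds)
measurements = [(1, 8217, 120), (2, 10137, 196), (10, 11190, 892)]


def predicted_throughput(threads, latency_us):
    """Closed-loop estimate: each thread completes one call per latency period."""
    return threads * 1_000_000 / latency_us


baseline = measurements[0][1]
rows = []
for threads, reported, latency_us in measurements:
    predicted = round(predicted_throughput(threads, latency_us))
    speedup = round(reported / baseline, 2)
    rows.append((threads, reported, predicted, speedup))
    print(f"{threads:>2} threads: reported {reported}/s, "
          f"predicted {predicted}/s, speedup {speedup}x")
```

The estimates land within about 1.5% of the reported speeds, so the log is internally consistent: adding threads mostly lengthens each call instead of completing more calls.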

Wasmtime GitHub notifications bot (Nov 10 2022 at 07:14):

asurdo commented on issue #5243:

PS: I compile the module only once and instantiate it on every call.
I separated each call into three steps: initializing the store, instantiating, and calling the instance function. The step whose time increases the most is calling the instance function.

Wasmtime GitHub notifications bot (Nov 10 2022 at 09:43):

bjorn3 commented on issue #5243:

The bottleneck is probably in the Linux kernel. As I understand it, changing memory mappings acquires a process-wide lock on the memory mappings, and once the change is done, every currently running thread of the process is interrupted to make sure that each CPU core knows about the changed mapping. Every time you instantiate a module, memory mappings are changed. This effectively means that the instantiation count per second across the whole process is limited by how fast a single core can handle memory mapping changes, and in addition every instantiation pauses execution of the other threads for a moment.
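
One way to put a number on this explanation is Amdahl's law, a simplifying model that is not used in the thread itself: if a fraction s of each call is serialized across the process, the speedup on n threads is 1 / (s + (1 - s) / n). The sketch below inverts that formula for the throughput figures reported in the issue; the function name is illustrative.

```python
def serial_fraction(speedup, threads):
    """Invert Amdahl's law S = 1 / (s + (1 - s) / n) to solve for s."""
    return (1 / speedup - 1 / threads) / (1 - 1 / threads)


# Calls per second reported in the issue, keyed by thread count.
throughput = {1: 8217, 2: 10137, 10: 11190}

baseline = throughput[1]
estimates = {
    n: round(serial_fraction(rate / baseline, n), 2)
    for n, rate in throughput.items()
    if n > 1
}
print(estimates)
```

Under this model, roughly 60 to 70 percent of each call behaves as if it were serialized, which is consistent with a process-wide lock but does not prove one.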

Wasmtime GitHub notifications bot (Nov 10 2022 at 09:46):

bjorn3 commented on issue #5243:

I believe using the pooling allocation strategy (set using config.allocation_strategy()) is more performant than the default on demand strategy.

Wasmtime GitHub notifications bot (Nov 10 2022 at 12:33):

asurdo commented on issue #5243:

> I believe using the pooling allocation strategy (set using config.allocation_strategy()) is more performant than the default on demand strategy.

Is config.allocation_strategy() only for Rust? I can't find it in the C API. And in fact, I use jemalloc for my C++ program; malloc is in user mode.

I avoided this problem by using multiple processes (at the cost of more memory for hundreds of modules), and it worked as I expected.

Wasmtime GitHub notifications bot (Nov 10 2022 at 12:46):

bjorn3 commented on issue #5243:

> Is config.allocation_strategy() only for Rust?

Looks like it. Seems to be an oversight.

> And in fact, I use jemalloc for my C++ program; malloc is in user mode.

When instantiating, wasmtime mmaps a (memfd) file containing the initial data of the wasm module's linear memory. This lets the kernel load pages lazily and copy them only when they are modified, as opposed to having wasmtime write everything up front even if the vast majority is never used or modified. This is a necessity for fast instantiation.
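
The copy-on-write behaviour described here can be illustrated with Python's standard `mmap` module. This is only a sketch of the general technique, not wasmtime's implementation: an ordinary temporary file stands in for the memfd, and `INITIAL_IMAGE` and `instantiate` are hypothetical names.

```python
import mmap
import tempfile

# Stand-in for a module's initial linear-memory image, padded to one page.
INITIAL_IMAGE = b"initial data segment".ljust(mmap.PAGESIZE, b"\0")


def instantiate(image_file):
    """Map the shared image privately; writes are copy-on-write."""
    return mmap.mmap(image_file.fileno(), len(INITIAL_IMAGE),
                     access=mmap.ACCESS_COPY)


with tempfile.TemporaryFile() as image_file:
    image_file.write(INITIAL_IMAGE)
    image_file.flush()

    a = instantiate(image_file)
    b = instantiate(image_file)

    # Writing through one mapping affects only that mapping's private copy.
    a[0:7] = b"CHANGED"

    a_head = bytes(a[:7])
    b_head = bytes(b[:7])
    image_file.seek(0)
    file_head = image_file.read(7)

    a.close()
    b.close()

print(a_head, b_head, file_head)
```

Each `instantiate` call creates a new memory mapping without copying the image up front, which is the kind of per-instantiation mapping change discussed earlier in the thread.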

Wasmtime GitHub notifications bot (Nov 10 2022 at 14:42):

alexcrichton commented on issue #5243:

I believe that this is largely the same issue as https://github.com/bytecodealliance/wasmtime/issues/4637, so I'm going to close in favor of that.

The pooling allocator is known to help here. The new *_keep_resident configuration options are also known to help, as are very recent Linux kernels. This is an ongoing area of investigation for WebAssembly runtimes and is something we're always interested in improving on.

Wasmtime GitHub notifications bot (Nov 10 2022 at 14:42):

alexcrichton closed issue #5243.


Last updated: Nov 22 2024 at 17:03 UTC