Stream: general

Topic: Loaded modules in child processes


Thibault Charbonnier (Feb 16 2021 at 19:59):

I am embedding Wasmtime in a program which eventually creates child processes. Currently, the embedding works as follows:

  1. Parent process creates a store and loads each wasm_module_t via wasmtime_module_new() in order to validate the modules, their imports, etc.
  2. Parent process calls fork() and creates a child process.
  3. Child process creates new stores and instantiates said modules from the modules created in step 1 (thanks to CoW).
  4. All is well, but when the child process exits (after closing all instances and stores), Valgrind reports the memory leaks shown below.

I don't know what is causing those but I suspect the fork isn't carrying all necessary Rayon resources to the child process; any idea what is going on here? Should loaded stores not be forked at all?

==298987== 1,024 (512 direct, 512 indirect) bytes in 16 blocks are definitely lost in loss record 21 of 36
==298987==    at 0x4840B65: calloc (vg_replace_malloc.c:760)
==298987==    by 0x52FAE72: __cxa_thread_atexit_impl (in /usr/lib/libc-2.33.so)
==298987==    by 0x4CBD759: try_register_dtor<crossbeam_epoch::collector::LocalHandle> (local.rs:490)
==298987==    by 0x4CBD759: _ZN3std6thread5local4fast12Key$LT$T$GT$14try_initialize17h9f6740ecb45ddd03E.llvm.8233611950401219567 (local.rs:471)
==298987==    by 0x4CB6922: try_with<crossbeam_epoch::collector::LocalHandle,closure-0,bool> (local.rs:271)
==298987==    by 0x4CB6922: with_handle<closure-0,bool> (default.rs:43)
==298987==    by 0x4CB6922: is_pinned (default.rs:30)
==298987==    by 0x4CB6922: crossbeam_deque::deque::Stealer<T>::steal (deque.rs:619)
==298987==    by 0x4CB7D0A: {{closure}} (registry.rs:779)
==298987==    by 0x4CB7D0A: {{closure}}<usize,rayon_core::job::JobRef,closure-1> (iterator.rs:2257)
==298987==    by 0x4CB7D0A: {{closure}}<usize,(),core::ops::control_flow::ControlFlow<rayon_core::job::JobRef, ()>,closure-0,closure-0> (mod.rs:1078)
==298987==    by 0x4CB7D0A: call_mut<((), usize),closure-0> (function.rs:269)
==298987==    by 0x4CB7D0A: try_fold<core::ops::range::Range<usize>,(),&mut closure-0,core::ops::control_flow::ControlFlow<rayon_core::job::JobRef, ()>> (iterator.rs:1888)
==298987==    by 0x4CB7D0A: <core::iter::adapters::chain::Chain<A,B> as core::iter::traits::iterator::Iterator>::try_fold (chain.rs:105)
==298987==    by 0x4CB2AFC: try_fold<core::iter::adapters::chain::Chain<core::ops::range::Range<usize>, core::ops::range::Range<usize>>,closure-0,(),closure-0,core::ops::control_flow::ControlFlow<rayon_core::job::JobRef, ()>> (mod.rs:1127)
==298987==    by 0x4CB2AFC: find_map<core::iter::adapters::Filter<core::iter::adapters::chain::Chain<core::ops::range::Range<usize>, core::ops::range::Range<usize>>, closure-0>,rayon_core::job::JobRef,closure-1> (iterator.rs:2263)
==298987==    by 0x4CB2AFC: steal (registry.rs:774)
==298987==    by 0x4CB2AFC: {{closure}} (registry.rs:726)
==298987==    by 0x4CB2AFC: or_else<rayon_core::job::JobRef,closure-0> (option.rs:786)
==298987==    by 0x4CB2AFC: rayon_core::registry::WorkerThread::wait_until_cold (registry.rs:724)
==298987==    by 0x4CB1349: wait_until<rayon_core::latch::CountLatch> (registry.rs:704)
==298987==    by 0x4CB1349: main_loop (registry.rs:837)
==298987==    by 0x4CB1349: rayon_core::registry::ThreadBuilder::run (registry.rs:56)
==298987==    by 0x4CB4A24: {{closure}} (registry.rs:101)
==298987==    by 0x4CB4A24: std::sys_common::backtrace::__rust_begin_short_backtrace (backtrace.rs:125)
==298987==    by 0x4CB451C: {{closure}}<closure-0,()> (mod.rs:474)
==298987==    by 0x4CB451C: call_once<(),closure-0> (panic.rs:322)
==298987==    by 0x4CB451C: do_call<std::panic::AssertUnwindSafe<closure-0>,()> (panicking.rs:381)
==298987==    by 0x4CB451C: try<(),std::panic::AssertUnwindSafe<closure-0>> (panicking.rs:345)
==298987==    by 0x4CB451C: catch_unwind<std::panic::AssertUnwindSafe<closure-0>,()> (panic.rs:396)
==298987==    by 0x4CB451C: {{closure}}<closure-0,()> (mod.rs:473)
==298987==    by 0x4CB451C: core::ops::function::FnOnce::call_once{{vtable-shim}} (function.rs:227)
==298987==    by 0x4FE5AF9: call_once<(),FnOnce<()>,alloc::alloc::Global> (boxed.rs:1307)
==298987==    by 0x4FE5AF9: call_once<(),alloc::boxed::Box<FnOnce<()>, alloc::alloc::Global>,alloc::alloc::Global> (boxed.rs:1307)
==298987==    by 0x4FE5AF9: std::sys::unix::thread::Thread::new::thread_start (thread.rs:71)
==298987==    by 0x4881298: start_thread (in /usr/lib/libpthread-2.33.so)
==298987==    by 0x53BA152: clone (in /usr/lib/libc-2.33.so)

Thibault Charbonnier (Feb 16 2021 at 20:01):

2nd Valgrind report was too long, attached here: valgrind.out

Alex Crichton (Feb 16 2021 at 20:17):

AFAIK a leak like that is likely due to not joining on all child threads before the program exits (e.g. the rayon workers), and is generally a false positive

Alex Crichton (Feb 16 2021 at 20:17):

I'm not sure if rayon provides a way to shut down its thread pool

Thibault Charbonnier (Feb 16 2021 at 20:38):

Ah, thank you @Alex Crichton! I've been treating them as false positives so far but really wanted to hear a more informed opinion.

bjorn3 (Feb 17 2021 at 09:18):

I don't think it is very safe to call fork() after calling wasmtime_module_new. It spawns new threads. Unless your process is guaranteed to be single-threaded, it is generally a bad idea to fork(). After a fork(), only the calling thread exists in the new process, which will lead to deadlocks if any of the other threads were holding a lock. While Wasmtime currently waits for all compilation threads to finish their job before returning from wasmtime_module_new, this is not guaranteed, though it is unlikely to change, I think. What I would be more worried about is rayon not playing nice with fork() and maybe expecting that the worker threads still exist in the new process.
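
The lock hazard described above can be reproduced without Wasmtime or rayon. The following is a minimal sketch in plain C with POSIX threads, not code from this thread: a helper thread takes a mutex, the parent forks while the mutex is held, and the child (which contains only the forking thread) probes the inherited mutex. All names (`held`, `holder`, `child_inherits_held_lock`) are made up for illustration. Strictly speaking, POSIX only permits async-signal-safe calls in the child of a multithreaded process, so the `pthread_mutex_trylock` probe is there purely to observe the inherited state and is an assumption about typical Linux/glibc behavior.

```c
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The lock that will be held by another thread at fork() time. */
static pthread_mutex_t held = PTHREAD_MUTEX_INITIALIZER;

/* Handshake state so the parent forks only while `held` is locked. */
static pthread_mutex_t sync_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t sync_cv = PTHREAD_COND_INITIALIZER;
static int locked = 0;
static int release_lock = 0;

/* Helper thread: takes `held` and keeps it until told to release. */
static void *holder(void *arg) {
    (void)arg;
    pthread_mutex_lock(&held);
    pthread_mutex_lock(&sync_mtx);
    locked = 1;
    pthread_cond_broadcast(&sync_cv);
    while (!release_lock)
        pthread_cond_wait(&sync_cv, &sync_mtx);
    pthread_mutex_unlock(&sync_mtx);
    pthread_mutex_unlock(&held);
    return NULL;
}

/* Returns 1 if the child found `held` locked, 0 if not, -1 on error. */
int child_inherits_held_lock(void) {
    pthread_t t;
    int status = 0;
    int result = -1;

    locked = 0;
    release_lock = 0;
    if (pthread_create(&t, NULL, holder, NULL) != 0)
        return -1;

    /* Wait until the helper thread really owns `held`. */
    pthread_mutex_lock(&sync_mtx);
    while (!locked)
        pthread_cond_wait(&sync_cv, &sync_mtx);
    pthread_mutex_unlock(&sync_mtx);

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: the helper thread does not exist here, but the mutex
         * memory was copied in its locked state, so nobody in this
         * process can ever unlock it. A blocking lock would hang. */
        int rc = pthread_mutex_trylock(&held);
        _exit(rc == EBUSY ? 1 : 0);
    }
    if (pid > 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        result = WEXITSTATUS(status);

    /* Let the helper thread release the lock and finish. */
    pthread_mutex_lock(&sync_mtx);
    release_lock = 1;
    pthread_cond_broadcast(&sync_cv);
    pthread_mutex_unlock(&sync_mtx);
    pthread_join(t, NULL);
    return result;
}
```

Calling `child_inherits_held_lock()` from a small `main` and printing the result is enough to try it out. If the child used `pthread_mutex_lock` instead of the non-blocking probe, it would block forever, which is the deadlock scenario in question.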

Thibault Charbonnier (Feb 20 2021 at 00:45):

I believe you are right @bjorn3 and thank you for the heads-up. This is exactly what I'm observing, so good to know, I'll make the necessary changes.
Could this limitation around fork() be somewhat limiting for server-side embedding use cases, then? This seems to be Rayon's side of the story on shutting down thread pools: https://github.com/rayon-rs/rayon/issues/688. Would running wasmtime_module_new single-threaded be a foolish thing to do/allow?

AFAICT ThreadPool::drop is async, so there's no easy way to ensure that all threads have shut down. Ideally, I'd like the guarantee of all threads having shut down, including running TLS de...
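
One possible shape for the "necessary changes" mentioned above is to fork while the parent is still single-threaded and do any thread-spawning work inside each child. The sketch below is plain C under stated assumptions and does not use the Wasmtime API: `load_modules_with_threads` is a hypothetical stand-in for a loader that spawns and joins worker threads (as wasmtime_module_new is described as doing), and `fork_then_load` is likewise an illustrative name. The trade-off is that each child then does its own loading, so the copy-on-write sharing from the original design is lost.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define N_WORKERS 4

/* Hypothetical worker: stands in for one compilation thread. */
static void *compile_worker(void *arg) {
    int *slot = arg;
    *slot = 1; /* mark this unit of work as done */
    return NULL;
}

/* Hypothetical stand-in for a module loader that spawns threads
 * internally and joins them before returning. Returns 0 on success. */
static int load_modules_with_threads(void) {
    pthread_t threads[N_WORKERS];
    int done[N_WORKERS] = {0};
    int started = 0;
    int ok;

    for (; started < N_WORKERS; started++) {
        if (pthread_create(&threads[started], NULL, compile_worker,
                           &done[started]) != 0)
            break;
    }
    ok = (started == N_WORKERS);
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    for (int i = 0; i < N_WORKERS; i++)
        ok = ok && done[i];
    return ok ? 0 : -1;
}

/* Forks first, while the caller is assumed to be single-threaded,
 * then lets the child do the thread-spawning work.
 * Returns the child's exit status (0 on success), or -1 on error. */
int fork_then_load(void) {
    int status = 0;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: threads are only ever created after the fork, so no
         * lock or worker from another thread is inherited mid-flight. */
        _exit(load_modules_with_threads() == 0 ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

The only ordering constraint the sketch relies on is that nothing in the parent has created threads before `fork_then_load()` is called.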

Last updated: Jan 24 2025 at 00:11 UTC