Stream: general

Topic: Loaded modules in child processes


view this post on Zulip Thibault Charbonnier (Feb 16 2021 at 19:59):

I am embedding Wasmtime in a program that eventually creates child processes. Currently, the embedding works as follows:

  1. Parent process creates a store and loads each wasm_module_t via wasmtime_module_new() in order to validate the modules, their imports, etc.
  2. Parent process calls fork() and creates a child process.
  3. Child process creates new stores and instantiates the modules created in step 1 (shared thanks to CoW).
  4. All is well, but when the child process exits (after closing all instances and stores), Valgrind reports the memory leaks shown below.

I don't know what is causing these, but I suspect the fork isn't carrying all of the necessary Rayon resources over to the child process. Any idea what is going on here? Should loaded stores not be forked at all?

==298987== 1,024 (512 direct, 512 indirect) bytes in 16 blocks are definitely lost in loss record 21 of 36
==298987==    at 0x4840B65: calloc (vg_replace_malloc.c:760)
==298987==    by 0x52FAE72: __cxa_thread_atexit_impl (in /usr/lib/libc-2.33.so)
==298987==    by 0x4CBD759: try_register_dtor<crossbeam_epoch::collector::LocalHandle> (local.rs:490)
==298987==    by 0x4CBD759: _ZN3std6thread5local4fast12Key$LT$T$GT$14try_initialize17h9f6740ecb45ddd03E.llvm.8233611950401219567 (local.rs:471)
==298987==    by 0x4CB6922: try_with<crossbeam_epoch::collector::LocalHandle,closure-0,bool> (local.rs:271)
==298987==    by 0x4CB6922: with_handle<closure-0,bool> (default.rs:43)
==298987==    by 0x4CB6922: is_pinned (default.rs:30)
==298987==    by 0x4CB6922: crossbeam_deque::deque::Stealer<T>::steal (deque.rs:619)
==298987==    by 0x4CB7D0A: {{closure}} (registry.rs:779)
==298987==    by 0x4CB7D0A: {{closure}}<usize,rayon_core::job::JobRef,closure-1> (iterator.rs:2257)
==298987==    by 0x4CB7D0A: {{closure}}<usize,(),core::ops::control_flow::ControlFlow<rayon_core::job::JobRef, ()>,closure-0,closure-0> (mod.rs:1078)
==298987==    by 0x4CB7D0A: call_mut<((), usize),closure-0> (function.rs:269)
==298987==    by 0x4CB7D0A: try_fold<core::ops::range::Range<usize>,(),&mut closure-0,core::ops::control_flow::ControlFlow<rayon_core::job::JobRef, ()>> (iterator.rs:1888)
==298987==    by 0x4CB7D0A: <core::iter::adapters::chain::Chain<A,B> as core::iter::traits::iterator::Iterator>::try_fold (chain.rs:105)
==298987==    by 0x4CB2AFC: try_fold<core::iter::adapters::chain::Chain<core::ops::range::Range<usize>, core::ops::range::Range<usize>>,closure-0,(),closure-0,core::ops::control_flow::ControlFlow<rayon_core::job::JobRef, ()>> (mod.rs:1127)
==298987==    by 0x4CB2AFC: find_map<core::iter::adapters::Filter<core::iter::adapters::chain::Chain<core::ops::range::Range<usize>, core::ops::range::Range<usize>>, closure-0>,rayon_core::job::JobRef,closure-1> (iterator.rs:2263)
==298987==    by 0x4CB2AFC: steal (registry.rs:774)
==298987==    by 0x4CB2AFC: {{closure}} (registry.rs:726)
==298987==    by 0x4CB2AFC: or_else<rayon_core::job::JobRef,closure-0> (option.rs:786)
==298987==    by 0x4CB2AFC: rayon_core::registry::WorkerThread::wait_until_cold (registry.rs:724)
==298987==    by 0x4CB1349: wait_until<rayon_core::latch::CountLatch> (registry.rs:704)
==298987==    by 0x4CB1349: main_loop (registry.rs:837)
==298987==    by 0x4CB1349: rayon_core::registry::ThreadBuilder::run (registry.rs:56)
==298987==    by 0x4CB4A24: {{closure}} (registry.rs:101)
==298987==    by 0x4CB4A24: std::sys_common::backtrace::__rust_begin_short_backtrace (backtrace.rs:125)
==298987==    by 0x4CB451C: {{closure}}<closure-0,()> (mod.rs:474)
==298987==    by 0x4CB451C: call_once<(),closure-0> (panic.rs:322)
==298987==    by 0x4CB451C: do_call<std::panic::AssertUnwindSafe<closure-0>,()> (panicking.rs:381)
==298987==    by 0x4CB451C: try<(),std::panic::AssertUnwindSafe<closure-0>> (panicking.rs:345)
==298987==    by 0x4CB451C: catch_unwind<std::panic::AssertUnwindSafe<closure-0>,()> (panic.rs:396)
==298987==    by 0x4CB451C: {{closure}}<closure-0,()> (mod.rs:473)
==298987==    by 0x4CB451C: core::ops::function::FnOnce::call_once{{vtable-shim}} (function.rs:227)
==298987==    by 0x4FE5AF9: call_once<(),FnOnce<()>,alloc::alloc::Global> (boxed.rs:1307)
==298987==    by 0x4FE5AF9: call_once<(),alloc::boxed::Box<FnOnce<()>, alloc::alloc::Global>,alloc::alloc::Global> (boxed.rs:1307)
==298987==    by 0x4FE5AF9: std::sys::unix::thread::Thread::new::thread_start (thread.rs:71)
==298987==    by 0x4881298: start_thread (in /usr/lib/libpthread-2.33.so)
==298987==    by 0x53BA152: clone (in /usr/lib/libc-2.33.so)
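
For orientation, here is a minimal skeleton of the flow described in steps 1-4 above. The two helpers are hypothetical stand-ins for the embedder's actual Wasmtime C-API calls (they are not real Wasmtime functions); the point is only where fork() sits relative to compilation.

    #include <sys/wait.h>
    #include <unistd.h>

    /* Hypothetical stand-ins for the embedder's Wasmtime code;
       compile_modules() is where wasmtime_module_new() would run and
       where compilation worker threads get spawned. */
    static void compile_modules(void)     { /* step 1 */ }
    static int  instantiate_and_run(void) { /* step 3 */ return 0; }

    int main(void) {
        compile_modules();                /* step 1: parent validates/compiles   */

        pid_t pid = fork();               /* step 2                              */
        if (pid == 0)
            return instantiate_and_run(); /* step 3: new stores in the child,
                                             modules shared via CoW              */

        waitpid(pid, NULL, 0);            /* step 4: leaks reported at child exit */
        return 0;
    }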

view this post on Zulip Thibault Charbonnier (Feb 16 2021 at 20:01):

2nd Valgrind report was too long, attached here: valgrind.out

view this post on Zulip Alex Crichton (Feb 16 2021 at 20:17):

AFAIK a leak like that is likely due to not joining all child threads (e.g. the rayon workers) before the program exits, and is generally a false positive
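
To make the "join before exit" point concrete, a minimal C/pthreads analogy (not Wasmtime code): thread-local allocations, like the crossbeam TLS handles in the trace above, are only torn down when the owning thread exits and is joined; a process that exits while worker threads are still parked leaves them behind, which is the kind of thing Valgrind flags.

    #include <pthread.h>

    /* A worker whose thread-local state (analogous to a rayon worker's
       crossbeam TLS handle) is only cleaned up on a clean thread exit. */
    static void *worker(void *arg) {
        (void)arg;
        return NULL;
    }

    int main(void) {
        pthread_t t;
        if (pthread_create(&t, NULL, worker, NULL) != 0)
            return 1;
        /* Joining lets the thread run its TLS destructors; exiting the
           process while workers are still parked does not. */
        pthread_join(t, NULL);
        return 0;
    }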

view this post on Zulip Alex Crichton (Feb 16 2021 at 20:17):

I'm not sure if rayon provides a way to shut down its thread pool

view this post on Zulip Thibault Charbonnier (Feb 16 2021 at 20:38):

Ah, thank you @Alex Crichton! I've been treating them as false positives so far but really wanted to hear a more informed opinion.

view this post on Zulip bjorn3 (Feb 17 2021 at 09:18):

I don't think it is very safe to call fork() after calling wasmtime_module_new, since it spawns new threads. Unless your process is guaranteed to be single-threaded, fork() is generally a bad idea: after a fork() only the calling thread exists in the new process, which leads to deadlocks if any of the other threads was holding a lock at the time. While Wasmtime currently waits for all compilation threads to finish before returning from wasmtime_module_new, this is not guaranteed (though I think it is unlikely to change). What I would be more worried about is rayon not playing nice with fork() and perhaps expecting that its worker threads still exist in the new process.
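
The lock hazard described here is easy to reproduce in a few lines of C (a self-contained pthreads demo, unrelated to Wasmtime itself): a worker thread takes a mutex, the main thread forks, and the child finds the mutex permanently held by a thread that no longer exists. Compile with `cc demo.c -lpthread`.

    #include <pthread.h>
    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg) {
        (void)arg;
        pthread_mutex_lock(&lock);
        sleep(2);                  /* hold the lock across the fork() below */
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    int main(void) {
        pthread_t t;
        pthread_create(&t, NULL, worker, NULL);
        usleep(100 * 1000);        /* let the worker grab the lock first */

        pid_t pid = fork();
        if (pid == 0) {
            /* Child: only the forking thread was duplicated. The worker
               that owns `lock` does not exist here, so nothing will ever
               unlock it; a plain pthread_mutex_lock() would block forever. */
            if (pthread_mutex_trylock(&lock) != 0)
                printf("child: lock held by a thread that no longer exists\n");
            _exit(0);
        }
        pthread_join(t, NULL);
        waitpid(pid, NULL, 0);
        return 0;
    }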

view this post on Zulip Thibault Charbonnier (Feb 20 2021 at 00:45):

I believe you are right, @bjorn3, and thank you for the heads-up. This is exactly what I'm observing, so good to know; I'll make the necessary changes.
Could this constraint around fork() be limiting for server-side embedding use cases, then? This seems to be Rayon's side of the story on shutting down thread pools: https://github.com/rayon-rs/rayon/issues/688. Would running wasmtime_module_new single-threaded be a foolish thing to do/allow?

AFAICT ThreadPool::drop is async, so there's no easy way to ensure that all threads have shut down. Ideally, I'd like the guarantee of all threads having shut down, including running TLS de...
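
One way to sidestep the hazard bjorn3 describes, sketched below with hypothetical helper functions (not real Wasmtime API): do every fork() while the process is still single-threaded, and let each child compile for itself. Any compilation threads are then spawned after the fork, so dead lock owners and missing rayon workers cannot arise. The trade-off is losing the CoW sharing of compiled modules across children; if the Wasmtime version in use supports module serialization, the parent could instead serialize compiled modules to bytes and have each child deserialize them.

    #include <sys/wait.h>
    #include <unistd.h>

    /* Hypothetical stand-ins for the embedder's Wasmtime code. */
    static void compile_and_validate(void) { /* wasmtime_module_new() etc. */ }
    static int  run_instances(void)        { return 0; }

    int main(void) {
        for (int i = 0; i < 4; i++) {
            pid_t pid = fork();        /* process is still single-threaded here */
            if (pid == 0) {
                compile_and_validate(); /* worker threads spawn post-fork */
                return run_instances();
            }
        }
        while (wait(NULL) > 0) {}      /* reap all children */
        return 0;
    }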
