I am embedding Wasmtime in a program which eventually creates child processes. Currently, the embedding goes as such:
I don't know what is causing those but I suspect the fork isn't carrying all necessary Rayon resources to the child process; any idea what is going on here? Should loaded stores not be forked at all?
==298987== 1,024 (512 direct, 512 indirect) bytes in 16 blocks are definitely lost in loss record 21 of 36
==298987== at 0x4840B65: calloc (vg_replace_malloc.c:760)
==298987== by 0x52FAE72: __cxa_thread_atexit_impl (in /usr/lib/libc-2.33.so)
==298987== by 0x4CBD759: try_register_dtor<crossbeam_epoch::collector::LocalHandle> (local.rs:490)
==298987== by 0x4CBD759: _ZN3std6thread5local4fast12Key$LT$T$GT$14try_initialize17h9f6740ecb45ddd03E.llvm.8233611950401219567 (local.rs:471)
==298987== by 0x4CB6922: try_with<crossbeam_epoch::collector::LocalHandle,closure-0,bool> (local.rs:271)
==298987== by 0x4CB6922: with_handle<closure-0,bool> (default.rs:43)
==298987== by 0x4CB6922: is_pinned (default.rs:30)
==298987== by 0x4CB6922: crossbeam_deque::deque::Stealer<T>::steal (deque.rs:619)
==298987== by 0x4CB7D0A: {{closure}} (registry.rs:779)
==298987== by 0x4CB7D0A: {{closure}}<usize,rayon_core::job::JobRef,closure-1> (iterator.rs:2257)
==298987== by 0x4CB7D0A: {{closure}}<usize,(),core::ops::control_flow::ControlFlow<rayon_core::job::JobRef, ()>,closure-0,closure-0> (mod.rs:1078)
==298987== by 0x4CB7D0A: call_mut<((), usize),closure-0> (function.rs:269)
==298987== by 0x4CB7D0A: try_fold<core::ops::range::Range<usize>,(),&mut closure-0,core::ops::control_flow::ControlFlow<rayon_core::job::JobRef, ()>> (iterator.rs:1888)
==298987== by 0x4CB7D0A: <core::iter::adapters::chain::Chain<A,B> as core::iter::traits::iterator::Iterator>::try_fold (chain.rs:105)
==298987== by 0x4CB2AFC: try_fold<core::iter::adapters::chain::Chain<core::ops::range::Range<usize>, core::ops::range::Range<usize>>,closure-0,(),closure-0,core::ops::control_flow::ControlFlow<rayon_core::job::JobRef, ()>> (mod.rs:1127)
==298987== by 0x4CB2AFC: find_map<core::iter::adapters::Filter<core::iter::adapters::chain::Chain<core::ops::range::Range<usize>, core::ops::range::Range<usize>>, closure-0>,rayon_core::job::JobRef,closure-1> (iterator.rs:2263)
==298987== by 0x4CB2AFC: steal (registry.rs:774)
==298987== by 0x4CB2AFC: {{closure}} (registry.rs:726)
==298987== by 0x4CB2AFC: or_else<rayon_core::job::JobRef,closure-0> (option.rs:786)
==298987== by 0x4CB2AFC: rayon_core::registry::WorkerThread::wait_until_cold (registry.rs:724)
==298987== by 0x4CB1349: wait_until<rayon_core::latch::CountLatch> (registry.rs:704)
==298987== by 0x4CB1349: main_loop (registry.rs:837)
==298987== by 0x4CB1349: rayon_core::registry::ThreadBuilder::run (registry.rs:56)
==298987== by 0x4CB4A24: {{closure}} (registry.rs:101)
==298987== by 0x4CB4A24: std::sys_common::backtrace::__rust_begin_short_backtrace (backtrace.rs:125)
==298987== by 0x4CB451C: {{closure}}<closure-0,()> (mod.rs:474)
==298987== by 0x4CB451C: call_once<(),closure-0> (panic.rs:322)
==298987== by 0x4CB451C: do_call<std::panic::AssertUnwindSafe<closure-0>,()> (panicking.rs:381)
==298987== by 0x4CB451C: try<(),std::panic::AssertUnwindSafe<closure-0>> (panicking.rs:345)
==298987== by 0x4CB451C: catch_unwind<std::panic::AssertUnwindSafe<closure-0>,()> (panic.rs:396)
==298987== by 0x4CB451C: {{closure}}<closure-0,()> (mod.rs:473)
==298987== by 0x4CB451C: core::ops::function::FnOnce::call_once{{vtable-shim}} (function.rs:227)
==298987== by 0x4FE5AF9: call_once<(),FnOnce<()>,alloc::alloc::Global> (boxed.rs:1307)
==298987== by 0x4FE5AF9: call_once<(),alloc::boxed::Box<FnOnce<()>, alloc::alloc::Global>,alloc::alloc::Global> (boxed.rs:1307)
==298987== by 0x4FE5AF9: std::sys::unix::thread::Thread::new::thread_start (thread.rs:71)
==298987== by 0x4881298: start_thread (in /usr/lib/libpthread-2.33.so)
==298987== by 0x53BA152: clone (in /usr/lib/libc-2.33.so)
2nd Valgrind report was too long, attached here: valgrind.out
AFAIK a leak like that is likely due to not joining on all child threads before the program exits (e.g. the rayon workers), and is generally a false positive
I'm not sure if rayon provides a way to shut down its thread pool
Ah, thank you @Alex Crichton! I've been treating them as false positives so far but really wanted to hear a more informed opinion.
I don't think it is very safe to call fork() after calling wasmtime_module_new, since it spawns new threads. Unless your process is guaranteed to be single-threaded, it is generally a bad idea to fork(). After a fork(), only the current thread exists in the new process, which will lead to deadlocks if any of the other threads were holding a lock. While Wasmtime currently waits for all compilation threads to finish their job before returning from wasmtime_module_new, this is not guaranteed, though I think it is unlikely to change. What I would be more worried about is rayon not playing nice with fork() and maybe expecting that the worker threads still exist in the new process.
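To make the lock hazard described above concrete, here is a minimal sketch in plain C with POSIX threads. It does not use Wasmtime or Rayon. The function name child_sees_orphaned_lock and the helper worker are hypothetical, chosen only for this illustration. The sketch assumes Linux/glibc behaviour, since pthread_mutex_trylock is not formally async-signal-safe in a forked child. A helper thread takes a mutex, the main thread forks while the mutex is held, and the child checks whether the mutex still appears locked even though the thread that owns it does not exist there.

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static sem_t locked;   /* posted once the worker holds the mutex */
static sem_t release;  /* posted when the worker may unlock again */

/* Hypothetical worker: holds the mutex until told to let go. */
static void *worker(void *arg) {
    (void)arg;
    pthread_mutex_lock(&lock);
    sem_post(&locked);
    sem_wait(&release);
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Returns 1 if the forked child finds the mutex locked with no thread
 * left to unlock it, 0 if the child could take it, -1 on setup errors. */
int child_sees_orphaned_lock(void) {
    pthread_t tid;
    int status = 0;

    sem_init(&locked, 0, 0);
    sem_init(&release, 0, 0);
    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return -1;
    sem_wait(&locked);              /* the worker now owns the mutex */

    pid_t pid = fork();             /* only this thread is copied */
    if (pid == 0) {
        /* Child: the mutex memory was copied in its locked state, but
         * the worker thread was not. A blocking lock here would hang. */
        int rc = pthread_mutex_trylock(&lock);
        _exit(rc == EBUSY ? 1 : 0);
    }

    if (pid > 0)
        waitpid(pid, &status, 0);

    /* Parent: let the worker finish and clean up. */
    sem_post(&release);
    pthread_join(tid, NULL);
    sem_destroy(&locked);
    sem_destroy(&release);

    if (pid < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Calling child_sees_orphaned_lock() from main and building with -pthread should show the child reporting the mutex as busy. Had the child called pthread_mutex_lock instead of the non-blocking variant, it would wait forever, which is the deadlock scenario when a thread pool's internal locks are held at the moment of fork().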
I believe you are right @bjorn3 and thank you for the heads-up. This is exactly what I'm observing, so good to know, I'll make the necessary changes.
Could this limitation around fork() be somewhat limiting for server-side embedding use-cases then? This seems to be Rayon's side of the story on shutting down thread pools: https://github.com/rayon-rs/rayon/issues/688. Would running wasmtime_module_new single-threaded be a foolish thing to do/allow?
Last updated: Jan 24 2025 at 00:11 UTC