jameysharp added the wasmtime label to Issue #7664.
jameysharp added the enhancement label to Issue #7664.
jameysharp opened issue #7664:
Feature
We can currently only walk the wasm stack after exiting from the guest, whether because of a trap, a host-call, or an epoch or fuel interruption. I would like to be able to walk the wasm stack from a signal handler, such as one triggered by a timer.
Benefit
Wasmtime's guest profiler can currently only take samples on guest exits because it can't collect a stack trace at any other time. When used with epoch interruptions, that biases it to observing execution only at function calls and loop back-edges. It would produce less biased profiles if it could sample at any time with equal probability. However, even if we're not waiting for a guest exit, we still need the guest to stop mutating the stack while we walk it, which suggests doing the work from a signal handler.
Implementation
This is tricky since signal handlers can't take locks or allocate memory. So we need to be able to walk the stack, record program counters, and pass the list of PCs somewhere else, without doing either of those things. All storage and any indications of where to send the results need to be accessible by the signal handler from thread-local storage.
Some possible implementations:
- Pre-allocate a ring-buffer to share between the signal handler and consumer. Use atomics to manipulate the head and tail, and to ensure that writes into the buffer are visible to the consumer before it sees the tail move. Preferably, wake the consumer thread after each write.
- Write program counters to a file descriptor, such as a pipe or socket-pair. Avoid using Rust standard library I/O since it may take locks.
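To make the ring-buffer option concrete, here is a minimal sketch of a fixed-capacity single-producer/single-consumer queue of program counters. The names `PcRing`, `push`, and `pop` are hypothetical and not part of Wasmtime. The sketch stores one PC per slot and omits two things a real design would need: framing to delimit whole stack traces, and waking the consumer after a write. The producer side only uses atomic loads and stores on pre-allocated storage, so it takes no locks and does not allocate.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical pre-allocated ring of program counters, shared between
/// one producer (the signal handler) and one consumer thread.
pub struct PcRing<const N: usize> {
    slots: [AtomicUsize; N],
    /// Count of entries consumed so far; only the consumer stores to it.
    head: AtomicUsize,
    /// Count of entries produced so far; only the producer stores to it.
    tail: AtomicUsize,
}

impl<const N: usize> PcRing<N> {
    /// Build the ring up front, outside of any signal handler.
    pub fn new() -> Self {
        Self {
            slots: std::array::from_fn(|_| AtomicUsize::new(0)),
            head: AtomicUsize::new(0),
            tail: AtomicUsize::new(0),
        }
    }

    /// Producer side. Returns false when the ring is full, in which case
    /// the sample is simply dropped. Assumes the counters never overflow
    /// usize (or that N is a power of two).
    pub fn push(&self, pc: usize) -> bool {
        let tail = self.tail.load(Ordering::Relaxed);
        let head = self.head.load(Ordering::Acquire);
        if tail.wrapping_sub(head) == N {
            return false;
        }
        self.slots[tail % N].store(pc, Ordering::Relaxed);
        // Release: the slot write becomes visible to the consumer
        // before it can observe the new tail.
        self.tail.store(tail.wrapping_add(1), Ordering::Release);
        true
    }

    /// Consumer side. Returns None when the ring is empty.
    pub fn pop(&self) -> Option<usize> {
        let head = self.head.load(Ordering::Relaxed);
        // Acquire pairs with the Release store in `push`.
        let tail = self.tail.load(Ordering::Acquire);
        if head == tail {
            return None;
        }
        let pc = self.slots[head % N].load(Ordering::Relaxed);
        // Release: the slot read completes before the producer can reuse it.
        self.head.store(head.wrapping_add(1), Ordering::Release);
        Some(pc)
    }
}
```

Dropping samples when the ring is full keeps the producer wait-free, at the cost of losing data if the consumer falls behind.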
I'm not sure how any of this could work on Windows, but it would be preferable to have it work on all platforms.
Alternatives
When profiling, we could add a trampoline around every wasm call which maintains some call-stack data structure that can be cloned from a signal handler without taking locks. (An Arc<Vec> might work.) Using the guest profiler normally already requires specific codegen options (such as enabling epoch interruption) so it's reasonable to require special codegen for this case.
The trampoline would record the PC of its caller before calling the real callee, then pop that PC before returning. If there are other references when the guest needs to update the stack, then it must allocate a new copy of the current stack trace, but that's okay because it's not running in signal-handler context.
Then the signal handler just needs to record the PC of the instruction it interrupted, plus the pointer to the cloned stack, passing these to the consumer via a wait-free single-producer queue. Making the stack traces make sense when the signal handler interrupts the trampoline is a tricky detail here.
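A rough sketch of the copy-on-write behavior described above, using `Arc::make_mut` from the standard library. The types `ShadowStack` and `Sample` and their methods are hypothetical, invented for illustration only. The sketch models the bookkeeping in ordinary safe Rust; it does not address the case where the signal handler interrupts the trampoline in the middle of an update, and it does not show the queue that carries samples to the consumer.

```rust
use std::sync::Arc;

/// Hypothetical shadow call stack that per-call trampolines would maintain.
pub struct ShadowStack {
    frames: Arc<Vec<usize>>,
}

/// Hypothetical sample: the interrupted PC plus a shared reference
/// to the caller PCs that were live at that moment.
pub struct Sample {
    pub pc: usize,
    pub callers: Arc<Vec<usize>>,
}

impl ShadowStack {
    pub fn new() -> Self {
        Self { frames: Arc::new(Vec::new()) }
    }

    /// Trampoline entry: record the caller's PC before the real call.
    /// `Arc::make_mut` copies the Vec only if a sample still references
    /// it; this runs in guest context, so allocating here is acceptable.
    pub fn enter(&mut self, caller_pc: usize) {
        Arc::make_mut(&mut self.frames).push(caller_pc);
    }

    /// Trampoline exit: pop the PC recorded on entry.
    pub fn exit(&mut self) -> Option<usize> {
        Arc::make_mut(&mut self.frames).pop()
    }

    /// What the sampler would do: cloning the Arc only increments a
    /// reference count, so no lock is taken and nothing is allocated.
    pub fn snapshot(&self, interrupted_pc: usize) -> Sample {
        Sample {
            pc: interrupted_pc,
            callers: Arc::clone(&self.frames),
        }
    }
}
```

One design consequence worth noting: whoever drops the last reference to a snapshot frees the Vec, so the consumer, not the signal handler, should be the one to drop samples.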
The trade-off of this alternative is that it makes function calls slower and so has a larger impact on the profiled guest's performance.
Last updated: Jan 24 2025 at 00:11 UTC