Stream: git-wasmtime

Topic: wasmtime / issue #7664 Allow stack-walking from a signal ...


Wasmtime GitHub notifications bot (Dec 08 2023 at 21:30):

jameysharp added the wasmtime label to Issue #7664.

Wasmtime GitHub notifications bot (Dec 08 2023 at 21:30):

jameysharp added the enhancement label to Issue #7664.

Wasmtime GitHub notifications bot (Dec 08 2023 at 21:30):

jameysharp opened issue #7664:

Feature

We can currently only walk the wasm stack after exiting from the guest, whether because of a trap, a host call, or an epoch or fuel interruption. I would like to be able to walk the wasm stack from a signal handler, such as one triggered by a timer.

Benefit

Wasmtime's guest profiler can currently only take samples on guest exits because it can't collect a stack trace at any other time. When used with epoch interruptions, that biases it to observing execution only at function calls and loop back-edges. It would produce less biased profiles if it could sample at any time with equal probability. However, even if we're not waiting for a guest exit, we still need the guest to stop mutating the stack while we walk it, which suggests doing the work from a signal handler.

Implementation

This is tricky since signal handlers can't take locks or allocate memory. So we need to be able to walk the stack, record program counters, and pass the list of PCs somewhere else, all without locking or allocating. All storage and any indications of where to send the results need to be accessible by the signal handler from thread-local storage.
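As a rough illustration of the constraint above, here is a minimal sketch of a walk that writes program counters into a fixed-size buffer and never allocates or locks. It is not Wasmtime's code: `Sample`, `walk_frames`, `MAX_FRAMES`, and `END` are hypothetical names, and the frame layout (saved caller frame pointer followed by a return address) is an assumption, modeled here as indices into a slice rather than real stack memory. In real use the `Sample` would have to live in preallocated thread-local storage.

```rust
/// Hypothetical cap on frames recorded per sample.
const MAX_FRAMES: usize = 64;
/// Hypothetical sentinel marking the end of the frame chain.
const END: usize = usize::MAX;

/// Fixed-size sample buffer: filling it needs no heap allocation.
struct Sample {
    pcs: [usize; MAX_FRAMES],
    len: usize,
}

impl Sample {
    const fn empty() -> Self {
        Sample { pcs: [0; MAX_FRAMES], len: 0 }
    }
}

/// Walk a frame-pointer chain modeled inside `memory`.
/// Assumed layout: `memory[fp]` holds the caller's frame pointer and
/// `memory[fp + 1]` holds the return address into the caller.
fn walk_frames(memory: &[usize], mut fp: usize, interrupted_pc: usize, out: &mut Sample) {
    out.len = 0;
    let mut pc = interrupted_pc;
    // The loop is bounded by MAX_FRAMES, so a corrupt chain cannot spin forever.
    while out.len < MAX_FRAMES {
        out.pcs[out.len] = pc;
        out.len += 1;
        if fp == END {
            break;
        }
        // Stop quietly on an out-of-range frame pointer instead of panicking.
        let (Some(&caller_fp), Some(&ret_pc)) = (memory.get(fp), memory.get(fp + 1)) else {
            break;
        };
        pc = ret_pc;
        fp = caller_fp;
    }
}
```

Truncating at `MAX_FRAMES` trades completeness of deep stacks for a buffer whose size is known up front, which is what makes it usable without allocation.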

Some possible implementations:

I'm not sure how any of this could work on Windows, but it would be preferable to have it work on all platforms.

Alternatives

When profiling, we could add a trampoline around every wasm call which maintains some call-stack data structure that can be cloned from a signal handler without taking locks. (An Arc<Vec> might work.) Using the guest profiler normally already requires specific codegen options (such as enabling epoch interruption) so it's reasonable to require special codegen for this case.

The trampoline would record the PC of its caller before calling the real callee, then pop that PC before returning. If there are other references when the guest needs to update the stack, then it must allocate a new copy of the current stack trace, but that's okay because it's not running in signal-handler context.
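The copy-on-write behavior described above can be sketched with `Arc::make_mut`, which clones the inner `Vec` only when another reference to it exists. This is an illustrative sketch, not a proposed implementation: `ShadowStack`, `enter`, `leave`, and `snapshot` are hypothetical names, and PCs are represented as plain `usize` values.

```rust
use std::sync::Arc;

/// Hypothetical shadow call stack maintained by the per-call trampoline.
struct ShadowStack {
    frames: Arc<Vec<usize>>,
}

impl ShadowStack {
    fn new() -> Self {
        ShadowStack { frames: Arc::new(Vec::new()) }
    }

    /// Called by the trampoline before the real callee runs.
    /// `Arc::make_mut` copies the vector first if a snapshot still holds it.
    fn enter(&mut self, caller_pc: usize) {
        Arc::make_mut(&mut self.frames).push(caller_pc);
    }

    /// Called by the trampoline after the callee returns.
    fn leave(&mut self) {
        Arc::make_mut(&mut self.frames).pop();
    }

    /// What a sampler would take: cloning an `Arc` only bumps an atomic
    /// reference count, so it neither locks nor allocates.
    fn snapshot(&self) -> Arc<Vec<usize>> {
        Arc::clone(&self.frames)
    }
}
```

This sketch does not address a signal arriving in the middle of `enter` or `leave`; handling that window safely would need additional design work.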

Then the signal handler just needs to record the PC of the instruction it interrupted, plus the pointer to the cloned stack, passing these to the consumer via a wait-free single-producer queue. Making the stack traces make sense when the signal handler interrupts the trampoline is a tricky detail here.
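One way such a wait-free single-producer queue might look is a fixed-capacity ring buffer built only from atomics, sketched below. `SampleQueue`, `Slot`, `push`, and `pop` are hypothetical names; the stack handle is assumed to be passed as a `usize` (for example a pointer obtained from `Arc::into_raw`), and the sketch assumes exactly one producer and one consumer.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// One queued sample: the interrupted PC plus an opaque stack handle.
struct Slot {
    pc: AtomicUsize,
    stack: AtomicUsize,
}

/// Hypothetical single-producer, single-consumer ring buffer of capacity N.
pub struct SampleQueue<const N: usize> {
    slots: [Slot; N],
    head: AtomicUsize, // next index to read; written only by the consumer
    tail: AtomicUsize, // next index to write; written only by the producer
}

impl<const N: usize> SampleQueue<N> {
    pub fn new() -> Self {
        SampleQueue {
            slots: std::array::from_fn(|_| Slot {
                pc: AtomicUsize::new(0),
                stack: AtomicUsize::new(0),
            }),
            head: AtomicUsize::new(0),
            tail: AtomicUsize::new(0),
        }
    }

    /// Producer side (the signal handler). Returns false, dropping the
    /// sample, when the queue is full; it never blocks or retries.
    pub fn push(&self, pc: usize, stack: usize) -> bool {
        let tail = self.tail.load(Ordering::Relaxed);
        let head = self.head.load(Ordering::Acquire);
        if tail.wrapping_sub(head) == N {
            return false;
        }
        let slot = &self.slots[tail % N];
        slot.pc.store(pc, Ordering::Relaxed);
        slot.stack.store(stack, Ordering::Relaxed);
        // Release publishes the slot contents before the new tail is visible.
        self.tail.store(tail.wrapping_add(1), Ordering::Release);
        true
    }

    /// Consumer side (the profiler thread).
    pub fn pop(&self) -> Option<(usize, usize)> {
        let head = self.head.load(Ordering::Relaxed);
        let tail = self.tail.load(Ordering::Acquire);
        if head == tail {
            return None;
        }
        let slot = &self.slots[head % N];
        let sample = (slot.pc.load(Ordering::Relaxed), slot.stack.load(Ordering::Relaxed));
        // Release hands the slot back to the producer only after it was read.
        self.head.store(head.wrapping_add(1), Ordering::Release);
        Some(sample)
    }
}
```

Dropping samples when the queue is full keeps the producer's work bounded, at the cost of losing data if the consumer falls behind. A real design would also need to decide who releases the stack handle carried by a dropped sample.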

The trade-off of this alternative is that it makes function calls slower and so has a larger impact on the profiled guest's performance.


Last updated: Nov 22 2024 at 16:03 UTC