Stream: git-wasmtime

Topic: wasmtime / PR #11895 Debugging: add a debugger callback m...


view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2025 at 01:26):

cfallin requested alexcrichton for a review on PR #11895.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2025 at 01:26):

cfallin opened PR #11895 from cfallin:wasmtime-debug-callbacks to bytecodealliance:main:

This PR adds a notion of "debug events", and a mechanism in Wasmtime to associate a "debug handler" with a store such that the handler is invoked as-if it were an async hostcall on each event. The async handler owns the store while its future exists, so the whole "world" (within the store) is frozen and the handler can examine any state it likes with a StoreContextMut.

Note that this callback-based scheme is a compromise: eventually, we would like to have a native async API that produces a stream of events, as sketched in #11826 and in [this branch]. However, the async approach implemented naively (that is, with manual fiber suspends and with state passed on the store) suffers from unsoundness in the presence of dropped futures. Alex, Nick and I discussed this extensively and agreed that the Accessor mechanism is the right way to allow for a debugger to have "timesliced"/"shared" access to a store (only when polled/when an event is delivered), but we will defer that for now, because it requires additional work (mainly, converting existing async yield points in the runtime to "give up" the store with the run_concurrent mechanism). I'll file a followup issue to track that. The idea is that we can eventually build that when ready, but the API we provide to a debugger component can remain unchanged; only this plumbing and the glue to the debugger component will be reworked.

With this scheme based on callbacks, we expect that one should be able to implement a debugger using async channels to communicate with the callback. The idea is that there would be a protocol where the callback sends a debug event to the debugger main loop elsewhere in the executor (e.g., over a Tokio channel or other async channel mechanism), and when the debugger wants to allow execution to continue, it sends a "continue" message back. In the meantime, while the world is paused, the debugger can send messages to the callback to query the StoreContextMut it has and read out state. This indirection/proxying of Store access is necessary for soundness: again, teleporting the Store out may look like it almost works ("it is like a mutable reborrow on a hostcall") except in the presence of dropped futures with sandwiched Wasm->host->Wasm situations.

This PR implements debug events for a few cases that can be caught directly in the runtime, e.g., exceptions and traps raised just before re-entry to Wasm. Other kinds of traps, such as those normally implemented by host signals, require additional work (as in #11826) to implement "hostcall injection" on signal reception; and breakpoints will be built on top of that. The point of this PR is only to get the initial plumbing in place for events.

Thanks to Alex for help on working how how to keep this from requiring T: Send throughout Wasmtime!

[this branch]: https://github.com/cfallin/wasmtime/tree/wasmtime-debug-async

<!--
Please make sure you include the following information:

Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.html

Please ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2025 at 01:26):

cfallin requested wasmtime-core-reviewers for a review on PR #11895.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2025 at 02:15):

cfallin updated PR #11895.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2025 at 06:43):

cfallin updated PR #11895.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2025 at 19:09):

fitzgen created PR review comment:

I don't think we ever use the term "activation" in our public docs today, just in the internal unwinding and CallThreadState docs. That isn't to say we shouldn't use it here, but we probably want to define what it means in parens or something.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2025 at 19:09):

fitzgen created PR review comment:

Nice, IIUC we don't even need out own unsafe block whose correctness is justified via DebugHandler: Send because everything falls out from the existing block_on, right?

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2025 at 19:09):

fitzgen created PR review comment:

Nitpick: the docs should have a ## Panics section and note that this panics when async is not enabled.

And should this also be cfg'd on feature = "async"? Because feature = "debug" does not imply that right now (should it?)

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2025 at 19:09):

fitzgen submitted PR review:

LGTM! Will hold off on adding to the merge queue until @alexcrichton has a chance to look at this.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2025 at 19:09):

fitzgen created PR review comment:

Should we add an Arc<anyhow::Error> field here (and update TrapReason::User accordingly) so that the debugger can inspect the host error? Not something we have to do now, in this PR, ofc. And if we are concerned about the extra allocation in TrapReaason::User, even when debugging is disabled, we could potentially do a Cow-esque thing where it is a regular anyhow::Error until it needs to be shared, at which point it gets put into an Arc.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2025 at 20:48):

alexcrichton submitted PR review:

Would it make sense to also start something like a mask of events for a debug handler? For example if a HostcallError isn't desired then that could get masked out while other events would still be processed.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2025 at 20:48):

alexcrichton created PR review comment:

Could this be -> impl Future instead of Box? That'd enable implementing this trait with async fn ...

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2025 at 20:48):

alexcrichton created PR review comment:

We'll definitely want to handle the result of block_on. If block_on returns an Err then that means that the fiber needs to exit ASAP (e.g. a dropped future, forcible cleanup, etc), meaning that if this is ignored then wasm might just continue on its merry way while the caller expects it to exit ASAP.

Basically the Result here should be plumbed to the caller.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2025 at 20:48):

alexcrichton created PR review comment:

One possible alternative: use &self here but have Clone as a supertrait of DebugHandler. That way we don't require Arc here (e.g. if the implementor is zero-sized) and the implementation internally would clone-out-of-the-store to avoid borrowing the store to be able to pass it to the context today (basically as what already happens here)

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2025 at 20:48):

alexcrichton created PR review comment:

Could the other states be listed out in an exhaustive match with a comment why they don't have debug events?

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2025 at 20:48):

alexcrichton created PR review comment:

Two events? Wouldn't UncaughtException always imply the former ThrownException?

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2025 at 20:48):

alexcrichton created PR review comment:

Well, while you're technically true we're still lying. The stack frame above this has &mut VMStore on the stack and that's not Send and it's just invisible from rustc.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2025 at 20:48):

alexcrichton created PR review comment:

This is a subtle, but load bearing, where clause. Could the function have a Rust-level comment (e.g. //, not ///) indicating why? Basically explaining how a &mut dyn VMStore is gonna be on the stack and that is required to be Send, so therefore this must be Send.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2025 at 20:48):

alexcrichton created PR review comment:

To add to this, should this also assert! that the debug_guest is enabled?

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2025 at 20:48):

alexcrichton created PR review comment:

Or, alternatively, this doesn't look like it needs to be owned, so this could be HostcallError(&'a anyhow::Error) maybe?

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2025 at 21:46):

fitzgen submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2025 at 21:46):

fitzgen created PR review comment:

I was assuming that the borrowing probably wouldn't work out, but if it does then yeah let's do that

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2025 at 21:48):

fitzgen submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2025 at 21:48):

fitzgen created PR review comment:

To be sure I understand you correctly: you're effectively saying "push the arc into the T: DebugHandler, if necessary"? If so, I like that :+1:

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2025 at 21:50):

alexcrichton submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2025 at 21:50):

alexcrichton created PR review comment:

Indeed!

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2025 at 22:03):

fitzgen submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2025 at 22:03):

fitzgen created PR review comment:

Oh also: shouldn't this future (eventually?) result in a ContinuationCommand or whatever, where

enum ContinuationCommand {
    // Normal continue.
    Resume(Vec<Val>),

    // Exceptional continue.
    ResumeThrow(Vec<Val>),

    // Forcibly return from the current frame.
    ResumeFinish(Vec<Val>),

    // Terminate execution, raising a trap.
    Terminate(anyhow::Error), // or `wasmtime::Trap`?
}

?

Some variants might be invalid to return when handling some events, and we don't need to implement all variants immediately, but this is the place we would want to do this kind of thing, right?

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2025 at 22:54):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2025 at 22:55):

cfallin created PR review comment:

Either that or a mutation API on the store to alter the return-point/state of the current activation; but honestly I do like the functional style here better.

Right now since we don't support mutating the execution at all, only observing, I don't think we need anything here since there is only one option (Resume with no alternative values specified) implied by return, no?

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2025 at 22:57):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2025 at 22:57):

cfallin created PR review comment:

I did spend a bit of time in lifetime hell yesterday trying to get &'a anyhow::Error to work; it was really not gelling well with the future's lifetime so I gave up. I can try a bit more but I do like the Arc alternative as well from the external perspective at least.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2025 at 23:01):

fitzgen submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2025 at 23:01):

fitzgen created PR review comment:

That sounds right to me, and also I guess ContinuationCommand::Resume shouldn't have a payload, only the other variants should.

I also like the "functional style" better than some roundabout mechanism to stash data on the side temporarily to pass things from a callee to a caller.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2025 at 23:29):

cfallin updated PR #11895.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2025 at 23:29):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2025 at 23:29):

cfallin created PR review comment:

Updated in 08f1ba4a0d, along with a mention of activations in the doc-comment for Store::debug_frames as well.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2025 at 23:34):

cfallin updated PR #11895.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2025 at 23:34):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2025 at 23:34):

cfallin created PR review comment:

Addressed in 08f1ba4a0dffdb1f4686d88c3be28b8f625202e6 and 28c63d58d7ce65f1b6871954735a11a291e1ab45; async is now a feature dependency of debug, assert was added, and added panic conditions for the two required settings.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2025 at 23:35):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2025 at 23:35):

cfallin created PR review comment:

Right; it's at least not a new kind of unsafety relative to the other places that we have fiber suspends from hostcalls, AFAIK.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2025 at 23:39):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2025 at 23:39):

cfallin created PR review comment:

Unfortunately no, because then the trait is not dyn-compatible -- or at least I don't know how to work around this:

error[E0038]: the trait `debug::DebugHandler` is not dyn compatible
   --> crates/wasmtime/src/runtime/store.rs:275:42
    |
275 |     pub(crate) debug_handler: Option<Arc<dyn DebugHandler<Data = T>>>,
    |                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^ `debug::DebugHandler` is not dyn compatible
    |
note: for a trait to be dyn compatible it needs to allow building a vtable
      for more information, visit <https://doc.rust-lang.org/reference/items/traits.html#dyn-compatibility>
   --> crates/wasmtime/src/runtime/debug.rs:512:10
    |
502 | pub trait DebugHandler: Send + Sync + 'static {
    |           ------------ this trait is not dyn compatible...
...
512 |     ) -> impl Future<Output = ()> + Send + 'a;
    |          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ...because method `handle` references an `impl Trait` type in its return type
    = help: consider moving `handle` to another trait

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2025 at 23:45):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2025 at 23:45):

cfallin created PR review comment:

Unfortunately this seems to run afoul of dyn compatibility as well... happy to see it refactored in followup if there is some clever trait-constraint trickery that can work around this. I see dyn_clone and similar things but it seems that may not be worth it for a single use-case like this...

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2025 at 23:48):

cfallin created PR review comment:

My thought process that led here was: I initially wanted events on caught and uncaught exceptions; but we don't yet have the mechanisms to give a view of state at the catch-point, only the throw-point, so the former became "thrown exception"; but then the exception is thrown whether it's caught in Wasm or uncaught.

I suppose I could rename ThrownException to ThrownExceptionToWasm or something similar, but then an observer who wants to see all exception throws needs to watch two different events.

I do agree it's a little awkward to see two distinct hooks at one static point; maybe we just document what ThrownException means and invoke (only) that here...

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2025 at 23:48):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2025 at 23:49):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2025 at 23:49):

cfallin created PR review comment:

Cool -- I'll build this out once we have the other return options.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2025 at 23:52):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 21 2025 at 23:52):

cfallin created PR review comment:

Concretely, we're inside CallThreadState::unwind when we invoke this method; I guess we transmute an Err into a host unwind then? It's a little awkward as it overrides an existing unwind but I guess we can make that work.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 22 2025 at 00:06):

cfallin updated PR #11895.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 22 2025 at 00:06):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 22 2025 at 00:06):

cfallin created PR review comment:

OK, handled in 7e9d38686a.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 22 2025 at 00:06):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 22 2025 at 00:06):

cfallin created PR review comment:

Good idea -- handled in 7e9d38686a.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 22 2025 at 00:06):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 22 2025 at 00:06):

cfallin created PR review comment:

OK, renamed to CaughtExceptionThrown and UncaughtExceptionThrown in 7e9d38686a.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 22 2025 at 00:10):

cfallin updated PR #11895.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 22 2025 at 00:10):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 22 2025 at 00:10):

cfallin created PR review comment:

Yep definitely -- added in 50bb08e40e.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 22 2025 at 00:11):

cfallin updated PR #11895.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 22 2025 at 00:12):

cfallin commented on PR #11895:

Would it make sense to also start something like a mask of events for a debug handler? For example if a HostcallError isn't desired then that could get masked out while other events would still be processed.

Maybe eventually, yeah -- that's a good idea. I might defer that to the point that we're building out the top half to see how we want to use it.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 22 2025 at 00:18):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 22 2025 at 00:18):

cfallin created PR review comment:

In particular it is always some variant of

error: lifetime may not live long enough
    --> crates/wasmtime/src/runtime/store.rs:2726:52
     |
2723 |     fn block_on_debug_handler(&mut self, event: crate::DebugEvent<'_>) -> anyhow::Result<()> {
     |                                          ----- has type `debug::DebugEvent<'1>`
...
2726 |             StoreContextMut(self).block_on(|store| Pin::from(handler.handle(store, event)))
     |                                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ returning this value requires that `'1` must outlive `'static`

even with

   507     /// Handle a debug event.
   508     fn handle<'a>(
   509         self: Arc<Self>,
   510         store: StoreContextMut<'a, Self::Data>,
   511         event: DebugEvent<'a>,
   512     ) -> Box<dyn Future<Output = ()> + Send + 'a>;

that should tie the lifetime inside the event to the store. Something to do with the anonymous lifetime inference with block_on; it feels like it should be possible but I don't have enough constraint-fu right now to make it work, sorry!

view this post on Zulip Wasmtime GitHub notifications bot (Oct 22 2025 at 00:22):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 22 2025 at 00:22):

cfallin created PR review comment:

The Arc variant of this is pretty messy too: it requires taking apart the unwind-state to temporarily take ownership, then reconstructing it later. I think I'm going to defer this as well since the error will come out of the invocation immediately after the debug event anyway. If we can make the borrowing work I'm happy to see a refactor in a followup though!

view this post on Zulip Wasmtime GitHub notifications bot (Oct 22 2025 at 00:23):

cfallin updated PR #11895.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 22 2025 at 00:24):

cfallin commented on PR #11895:

OK, updated based on feedback -- final look before I merge? Thanks!

view this post on Zulip Wasmtime GitHub notifications bot (Oct 22 2025 at 15:09):

fitzgen submitted PR review:

LGTM, thanks!

view this post on Zulip Wasmtime GitHub notifications bot (Oct 22 2025 at 15:22):

alexcrichton commented on PR #11895:

With respect to clones, object-safety, lifetimes, and async fn, it's all possible. That at least compiles in the wasmtime crate, although I didn't go through tests yet. Would you prefer the design in this PR? Or ok if I have a follow-up to improve the API?

view this post on Zulip Wasmtime GitHub notifications bot (Oct 22 2025 at 15:29):

cfallin commented on PR #11895:

Ah, neat, an internal adapter that gets monomorphized in the top layer where we have T -- I'll incorporate that, thanks!

view this post on Zulip Wasmtime GitHub notifications bot (Oct 22 2025 at 20:35):

cfallin updated PR #11895.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 22 2025 at 20:36):

cfallin commented on PR #11895:

All of that worked fine, thanks! I pulled in your patch and then added some more comments to the trait to indicate that it should be cheap to clone, i.e., is recommended to be an Arc under the covers or something similar.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 22 2025 at 20:36):

cfallin has enabled auto merge for PR #11895.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 22 2025 at 22:12):

cfallin updated PR #11895.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 22 2025 at 22:14):

cfallin commented on PR #11895:

I pushed a change to avoid this failure where execution in Pulley did not emit a debug event for a divide-by-zero; this is because the trap-handling path is ever so slightly different in Pulley. I'll address in a followup (we're very close and it'd be honestly easier to keep Pulley working throughout this process than to exclude it in config); the cfg to limit to Cranelift-native architectures is in 97336c19963fda33b229dcfbe33f9b880725fc4f.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 22 2025 at 22:14):

cfallin has enabled auto merge for PR #11895.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 22 2025 at 22:49):

cfallin merged PR #11895.


Last updated: Dec 06 2025 at 06:05 UTC