cfallin opened issue #11896:
In general when designing our guest debugger functionality, we would like to balance a few requirements:
- It should be relatively straightforward to compose debugging functionality on top of an existing embedder / "main Wasm invocation"; i.e., it should not require deep surgery or awkward refactors, or only work in the Wasmtime CLI.
- It should be possible to provide access to the `Store` to the debugger, including mutability. This is needed for eventual "mutable debugger commands" (e.g., updating locals' values) but also even for any access to GC objects (because of the root-set).
- The whole `Store` should pause when the debugger has control, so it can observe state without racing with other tasks.
- The debugger needs to be able to run with access to IO, which in a Rust context almost always means it needs to have async entry points.
We have plans to place the debugger implementation mostly inside a Wasm component, which gives us a little more flexibility to have an "ugly API" underneath, but even still, the closer we get the native host API and paradigm to what the eventual Wasm API describes, the less painful and error-prone the glue will be.
All of these requirements generally push toward a "coroutine"-style async design. In our RFC and in a draft PR (#11826), we have sketched out a general approach to a debug API that contains a "debugger" and "debuggee" as two entities that bounce control back and forth. This is naturally rendered in Rust with an API that literally provides an async API that yields a stream of "debug events", with the debuggee stopped whenever an event is received and running whenever the debugger is polling for the next event. Such an API allows for a nice debugger implementation style: it can keep its main loop in one place, and access the store directly when the debuggee is paused.
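To make the "bounce control back and forth" shape concrete, here is a minimal sketch of a debugger loop driving a paused/running debuggee. It is a synchronous toy, not the draft PR's API: `Store`, `DebugEvent`, `DebugSession`, `next_event`, `store_mut`, and `run_debugger` are all hypothetical stand-ins, and the real design would expose an async method rather than a plain function call.

```rust
// Toy model only: every type here is a hypothetical stand-in, not a Wasmtime type.

/// Stand-in for the state a debugger wants to inspect.
pub struct Store {
    pub pc: usize,
    pub locals: Vec<i64>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum DebugEvent {
    /// Reported after the instruction at `pc` has executed.
    Breakpoint { pc: usize },
    Finished,
}

/// Owns the store and the debuggee; control bounces between the two sides.
pub struct DebugSession {
    store: Store,
    program: Vec<i64>, // each "instruction" adds its value to local 0
    breakpoints: Vec<usize>,
    done: bool,
}

impl DebugSession {
    pub fn new(program: Vec<i64>, breakpoints: Vec<usize>) -> Self {
        let store = Store { pc: 0, locals: vec![0] };
        DebugSession { store, program, breakpoints, done: false }
    }

    /// Run the debuggee until the next event. The `&mut self` borrow means
    /// the debugger cannot touch the store while the debuggee is running.
    pub fn next_event(&mut self) -> Option<DebugEvent> {
        if self.done {
            return None;
        }
        while self.store.pc < self.program.len() {
            let pc = self.store.pc;
            self.store.locals[0] += self.program[pc];
            self.store.pc += 1;
            if self.breakpoints.contains(&pc) {
                return Some(DebugEvent::Breakpoint { pc });
            }
        }
        self.done = true;
        Some(DebugEvent::Finished)
    }

    /// Only callable between events, i.e. while the debuggee is paused.
    pub fn store_mut(&mut self) -> &mut Store {
        &mut self.store
    }
}

/// A debugger whose whole main loop lives in one place.
/// Returns the (pc, local 0) pairs seen at breakpoints and the final local 0.
pub fn run_debugger(mut session: DebugSession) -> (Vec<(usize, i64)>, i64) {
    let mut observed = Vec::new();
    while let Some(event) = session.next_event() {
        match event {
            DebugEvent::Breakpoint { pc } => {
                let store = session.store_mut();
                observed.push((pc, store.locals[0]));
                // A "mutable debugger command": overwrite a local.
                store.locals[0] = 100;
            }
            DebugEvent::Finished => break,
        }
    }
    let final_value = session.store_mut().locals[0];
    (observed, final_value)
}
```

In this toy the borrow checker enforces the pause discipline for free, because the session owns everything. The difficulty discussed in the rest of the issue comes from the real debuggee running on a fiber underneath arbitrary async host code, where no single owner like this exists.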
Unfortunately, through a bunch of conversations, we have determined that this is not sound as implemented in that draft PR. The PR "teleports" a borrow of the `Store` outward from an async yield point, where it performs a fiber yield, back to a `DebugSession` (wrapping the store) on which an `async fn next()` was invoked to get the next debug event. The idea was that the `next()` invocation exclusively owns the store while we pass control back to the guest; when it returns, we can return ownership of the store back to the debugger; this is more-or-less like passing a mutable reborrow of the store to a hostcall, except that we plumb it back out to the surface. We could even get the provenance right by passing (via a raw pointer) the reborrow outward. However...
Unfortunately, `Future` combinators and dropped futures are a thing, and there is a bad case with a "host code sandwich". Consider: debugger context calls Wasm, calls async host code, calls Wasm. In the second Wasm activation, we hit a debug event. We could yield all the way back up to the debugger and pass a reborrowed `Store`; but that yield control flow passes through the async host code, by way of a `Poll::Pending`. That async host code may implement some arbitrary future combinator that chooses to (for example) drop the future, in which case we have a dangling reference to the store and the rest of the debug state we were supposed to examine (e.g. stack frames). One could try to patch this up by holding fibers via reference counting and keeping the fibers alive when paused for debugging; but at that point, we have discovered that...
... we are reinventing a bunch of mechanisms in the component-model async implementation. In particular, (i) the `Accessor` mechanism allows for ownership passing of the `Store` (timeslicing such that access only exists during one poll, with no borrows persisted across suspends) in a way that is already vetted; and (ii) the task model gives us a first-class way to note that a stack is paused for debugging, and keep it alive. (I'm less sure about the details of (ii), but in principle, the concurrent scheduler is a little tiny OS kernel and we can build the moral equivalent of `ptrace` pauses there, I think.)
Given all that, the eventual plan is something like:
- Build a mechanism to set up a concurrent environment with an async debugger that receives a stream of events and can access the store via `Accessor`. The debugger itself needs to be within the context of the `run_concurrent` invocation, but separate/privileged: all tasks except for the debugger pause when the debugger has control.
- Update any point in the Wasmtime runtime that needs to yield a debug event to use the "new-style" async mechanisms, i.e. `Accessor`, to safely give control of the `Store` back to the debugger.
- Move over the "top half" of the debugger that we plan to temporarily build on #11895 and remove that API.
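The "host code sandwich" hazard that motivates this plan can be reproduced with the standard library alone. The sketch below is an illustration under stated assumptions, not Wasmtime code: `YieldOnce`, `FrameGuard`, `inner_activation`, `host_polls_once_then_drops`, `NoopWake`, and `demo` are hypothetical names. An inner future suspends once (standing in for the fiber yield at a debug event), and the surrounding "host" polls it a single time and then drops it, as a timeout or select-style combinator might.

```rust
use std::cell::RefCell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

type Log = Rc<RefCell<Vec<&'static str>>>;

/// A future that suspends exactly once, like a yield at a debug event.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

/// Records the moment the inner task's state is destroyed.
struct FrameGuard(Log);

impl Drop for FrameGuard {
    fn drop(&mut self) {
        self.0.borrow_mut().push("inner state dropped");
    }
}

/// Stand-in for the second Wasm activation that hits a debug event.
async fn inner_activation(log: Log) {
    let _frames = FrameGuard(log.clone());
    log.borrow_mut().push("paused at debug event");
    YieldOnce(false).await;
    // Never reached if the future is dropped while suspended.
    log.borrow_mut().push("resumed");
}

/// A waker that does nothing; enough to poll a future by hand.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Stand-in for async host code in the middle of the sandwich: it polls the
/// inner future once and then abandons it.
fn host_polls_once_then_drops(log: Log) {
    let mut inner = Box::pin(inner_activation(log.clone()));
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let first = inner.as_mut().poll(&mut cx);
    assert!(first.is_pending());
    drop(inner); // the suspended state, including `_frames`, is gone
    log.borrow_mut().push("host moved on");
}

/// Runs the scenario and returns the ordered log of what happened.
pub fn demo() -> Vec<&'static str> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    host_polls_once_then_drops(log.clone());
    let out = log.borrow().clone();
    out
}
```

The suspended state is destroyed between "paused at debug event" and "host moved on", and "resumed" is never logged. Any reference into that state that had been handed outward at the suspend point would now dangle, which is why the plan above prefers access that exists only for the duration of one poll.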
cfallin commented on issue #11896:
cc @alexcrichton @fitzgen -- if I missed anything from our discussion, feel free to add!
fitzgen commented on issue #11896:
Thanks for writing this up!
One small clarification:
> It should be possible to provide access to the `Store` to the debugger, including mutability. This is needed for eventual "mutable debugger commands" (e.g., updating locals' values) but also even for any access to GC objects (because of the root-set).
Note that even reading Wasm state, without mutating it, will mutate store internals (caches and arenas and such) and therefore requires an `AsStoreContextMut`, e.g.:
So giving the debugger APIs mutable access to the `Store` isn't something that can be delayed until we get around to adding support for debugging GC objects or mutation from the debugger at a later point; it must be part of the initial support for reading Wasm state.
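As a rough illustration of why a logically read-only query can still need exclusive access, consider the toy below. It is not Wasmtime's `Store`: `ToyStore`, `read_object`, `root_count`, and `cached_len` are hypothetical, and the cache and root set are simplified stand-ins for the internals mentioned above.

```rust
use std::collections::HashMap;

/// Hypothetical store; the fields are simplified stand-ins.
pub struct ToyStore {
    heap: Vec<String>,                // "GC objects"
    roots: Vec<usize>,                // objects kept alive on behalf of the host
    len_cache: HashMap<usize, usize>, // lazily filled lookup cache
}

impl ToyStore {
    pub fn new(heap: Vec<String>) -> Self {
        ToyStore { heap, roots: Vec::new(), len_cache: HashMap::new() }
    }

    /// Reads an object without changing its value, yet still needs
    /// `&mut self`: it fills a cache entry and extends the root set.
    pub fn read_object(&mut self, index: usize) -> Option<String> {
        if index >= self.heap.len() {
            return None;
        }
        let len = self.heap[index].len();
        self.len_cache.insert(index, len);
        if !self.roots.contains(&index) {
            self.roots.push(index);
        }
        Some(self.heap[index].clone())
    }

    pub fn root_count(&self) -> usize {
        self.roots.len()
    }

    pub fn cached_len(&self, index: usize) -> Option<usize> {
        self.len_cache.get(&index).copied()
    }
}
```

A caller holding only `&ToyStore` could not call `read_object` at all, which mirrors the point that an immutable-only debugger API would be unable to read Wasm state.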
cfallin commented on issue #11896:
Yep, and to your point that it's already needed, the `Store::debug_frames` API that already landed as part of the debug instrumentation requires a mutable context, in order to read out (and root) GC refs from the stack. So we'd have to even regress on what we already have to build an immutable variant.
alexcrichton commented on issue #11896:
cc @dicej on this as well
I suspect that in the near future my job is going to be to reconcile the two async models we have in Wasmtime (e.g. `call_{async,concurrent}`) and get the component-model-async bits to more-or-less work in core wasm as well. I have talked with all y'all about this but wanted to write it down here too.
Last updated: Dec 06 2025 at 07:03 UTC