wasmtime / PR #11826 Debugging: add an async debug-step-r... · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / PR #11826 Debugging: add an async debug-step-r...

Wasmtime GitHub notifications bot (Oct 09 2025 at 07:32):

cfallin opened PR #11826 from cfallin:wasmtime-debug-traps to bytecodealliance:main:

(Stacked on top of #11769.)

As part of the new guest-debugging API, we want to allow the host to
execute the debugged guest code asynchronously, receiving its "debug
step" results each time a debugging-relevant event occurs. In the
fullness of time, this will include: traps, thrown exceptions,
breakpoints and watchpoints hit, single-steps, etc.

As a first step, this PR adds:

A notion of running an asynchronous function call in a "debug
session";

An API on that debug session object (which owns the store while the
function is running) that provides an async method to get the next
DebugStepResult;

An implementation that transmutes traps into a debug-step result,
allowing introspection of the guest state before the trap tears down
its stack;

Access to the stack introspection API provided by https://github.com/bytecodealliance/wasmtime/pull/11769.

The implementation works by performing call injection from the signal
handler. The basic idea is that rather than perform an exception resume
from the signal handler, directly rewriting register state to unwind all
Wasm frames and return the error code to the host, we rewrite register
state to redirect to a handwritten assembly stub. This stub cannot
assume anything about register state (because we don't enforce any
constraints on register state at all the points that trapping signals
could occur); thus, it has to save every register. To allow this
trampoline to do anything at all, we inject a few parameters to it; the
original values of the parameter registers, as well as the original PC
(location of the trap), are saved to the store so they can be restored
into the register-save frame before the injected stub returns (if it
does).

The injected stub can then call into the runtime to perform a
fiber-suspend, setting a "debug yield" value that indicates that a trap
occurred.

A few notes on design constraints that forced my hand in several ways:

We need to inject a call by rewriting only register state, not pushing
a new frame from within the stack handler, because it appears that
Windows vectored exception handlers run on the same stack as the guest
and so there is no room to push an additional frame.

We need access to the store from the signal context now; we can get
this from TLS if we add a raw backpointer from VMStoreContext to
StoreOpaque. I believe we aren't committing any serious pointer
provenance or aliasing-rules crimes here, because dynamically we are
taking ownership of the store back when we're running within the
signal context (it's as if it was passed as an argument, via a very
circuitous route), but I could very well be wrong. I hope we can find
another working approach if so!

The trap suspend protocol looks a little like a resumable trap but
only because we need to properly tear down the future (otherwise we
get a panic on drop). Basically we resume back, and if the trap was a
non-resumable trap, the assembly stub returns not to the original PC
but the PC of another stub that does the original
resume-to-entry-handler action.

Everything is set up here for resumable traps (e.g. for breakpoints) to
also work, but I haven't implemented that yet; that's the next PR (and
requires some other machinery, most notably a private copy of code
memory and the ability to edit and re-publish it; and metadata to
indicate where to patch in breaks; and a pc += BREAK_SIZE somewhere to
skip over on resume).

This is a draft that works on Linux on x86-64; I still need to implement

[ ] aarch64, riscv64, s390x assembly stubs

[ ] Windows and macOS updates to trap handlers

[ ] equivalent behavior on the raise libcall too, not just
signal-based traps

but I wanted to post it now to communicate the current direction and get
any early feedback.

Wasmtime GitHub notifications bot (Oct 09 2025 at 07:35):

cfallin updated PR #11826.

Wasmtime GitHub notifications bot (Oct 09 2025 at 07:38):

cfallin updated PR #11826.

Wasmtime GitHub notifications bot (Oct 09 2025 at 09:45):

github-actions[bot] commented on PR #11826:

Label Messager: wasmtime:config

It looks like you are changing Wasmtime's configuration options. Make sure to
complete this check list:

[ ] If you added a new Config method, you wrote extensive documentation for
it.

<details>

Our documentation should be of the following form:

```text
Short, simple summary sentence.

More details. These details can be multiple paragraphs. There should be
information about not just the method, but its parameters and results as
well.

Is this method fallible? If so, when can it return an error?

Can this method panic? If so, when does it panic?

Example

Optional example here.
```

</details>

[ ] If you added a new Config method, or modified an existing one, you
ensured that this configuration is exercised by the fuzz targets.

<details>

For example, if you expose a new strategy for allocating the next instance
slot inside the pooling allocator, you should ensure that at least one of our
fuzz targets exercises that new strategy.

Often, all that is required of you is to ensure that there is a knob for this
configuration option in [wasmtime_fuzzing::Config][fuzzing-config] (or one
of its nested structs).

Rarely, this may require authoring a new fuzz target to specifically test this
configuration. See [our docs on fuzzing][fuzzing-docs] for more details.

</details>

[ ] If you are enabling a configuration option by default, make sure that it
has been fuzzed for at least two weeks before turning it on by default.

[fuzzing-config]: https://github.com/bytecodealliance/wasmtime/blob/ca0e8d0a1d8cefc0496dba2f77a670571d8fdcab/crates/fuzzing/src/generators.rs#L182-L194
[fuzzing-docs]: https://docs.wasmtime.dev/contributing-fuzzing.html

<details>

To modify this label's message, edit the <code>.github/label-messager/wasmtime-config.md</code> file.

To add new label messages or remove existing label messages, edit the
<code>.github/label-messager.json</code> configuration file.

Learn more.

</details>

Wasmtime GitHub notifications bot (Oct 10 2025 at 22:08):

cfallin updated PR #11826.

Wasmtime GitHub notifications bot (Oct 10 2025 at 23:14):

cfallin updated PR #11826.

Wasmtime GitHub notifications bot (Oct 10 2025 at 23:37):

cfallin updated PR #11826.

Wasmtime GitHub notifications bot (Oct 11 2025 at 00:13):

cfallin updated PR #11826.

Wasmtime GitHub notifications bot (Oct 12 2025 at 01:54):

cfallin updated PR #11826.

Wasmtime GitHub notifications bot (Oct 12 2025 at 06:22):

cfallin updated PR #11826.

Wasmtime GitHub notifications bot (Oct 12 2025 at 22:13):

cfallin edited PR #11826:

(Stacked on top of #11769.)

As part of the new guest-debugging API, we want to allow the host to
execute the debugged guest code asynchronously, receiving its "debug
step" results each time a debugging-relevant event occurs. In the
fullness of time, this will include: traps, thrown exceptions,
breakpoints and watchpoints hit, single-steps, etc.

As a first step, this PR adds:

A notion of running an asynchronous function call in a "debug
session";

An API on that debug session object (which owns the store while the
function is running) that provides an async method to get the next
DebugStepResult;

An implementation that transmutes traps into a debug-step result,
allowing introspection of the guest state before the trap tears down
its stack;

Access to the stack introspection API provided by https://github.com/bytecodealliance/wasmtime/pull/11769.

The implementation works by performing call injection from the signal
handler. The basic idea is that rather than perform an exception resume
from the signal handler, directly rewriting register state to unwind all
Wasm frames and return the error code to the host, we rewrite register
state to redirect to a handwritten assembly stub. This stub cannot
assume anything about register state (because we don't enforce any
constraints on register state at all the points that trapping signals
could occur); thus, it has to save every register. To allow this
trampoline to do anything at all, we inject a few parameters to it; the
original values of the parameter registers, as well as the original PC
(location of the trap), are saved to the store so they can be restored
into the register-save frame before the injected stub returns (if it
does).

The injected stub can then call into the runtime to perform a
fiber-suspend, setting a "debug yield" value that indicates that a trap
occurred.

A few notes on design constraints that forced my hand in several ways:

We need to inject a call by rewriting only register state, not pushing
a new frame from within the stack handler, because it appears that
Windows vectored exception handlers run on the same stack as the guest
and so there is no room to push an additional frame.

We need access to the store from the signal context now; we can get
this from TLS if we add a raw backpointer from VMStoreContext to
StoreOpaque. I believe we aren't committing any serious pointer
provenance or aliasing-rules crimes here, because dynamically we are
taking ownership of the store back when we're running within the
signal context (it's as if it was passed as an argument, via a very
circuitous route), but I could very well be wrong. I hope we can find
another working approach if so!

The trap suspend protocol looks a little like a resumable trap but
only because we need to properly tear down the future (otherwise we
get a panic on drop). Basically we resume back, and if the trap was a
non-resumable trap, the assembly stub returns not to the original PC
but the PC of another stub that does the original
resume-to-entry-handler action.

Everything is set up here for resumable traps (e.g. for breakpoints) to
also work, but I haven't implemented that yet; that's the next PR (and
requires some other machinery, most notably a private copy of code
memory and the ability to edit and re-publish it; and metadata to
indicate where to patch in breaks; and a pc += BREAK_SIZE somewhere to
skip over on resume).

This is a draft that works on Linux on x86-64; I still need to implement

[x] aarch64, riscv64, s390x assembly stubs

[ ] Windows and macOS updates to trap handlers

[ ] equivalent behavior on the raise libcall too, not just
signal-based traps

but I wanted to post it now to communicate the current direction and get
any early feedback.

Wasmtime GitHub notifications bot (Oct 12 2025 at 22:13):

cfallin edited PR #11826:

(Stacked on top of #11769.)

As part of the new guest-debugging API, we want to allow the host to
execute the debugged guest code asynchronously, receiving its "debug
step" results each time a debugging-relevant event occurs. In the
fullness of time, this will include: traps, thrown exceptions,
breakpoints and watchpoints hit, single-steps, etc.

As a first step, this PR adds:

A notion of running an asynchronous function call in a "debug
session";

An API on that debug session object (which owns the store while the
function is running) that provides an async method to get the next
DebugStepResult;

An implementation that transmutes traps into a debug-step result,
allowing introspection of the guest state before the trap tears down
its stack;

Access to the stack introspection API provided by https://github.com/bytecodealliance/wasmtime/pull/11769.

The implementation works by performing call injection from the signal
handler. The basic idea is that rather than perform an exception resume
from the signal handler, directly rewriting register state to unwind all
Wasm frames and return the error code to the host, we rewrite register
state to redirect to a handwritten assembly stub. This stub cannot
assume anything about register state (because we don't enforce any
constraints on register state at all the points that trapping signals
could occur); thus, it has to save every register. To allow this
trampoline to do anything at all, we inject a few parameters to it; the
original values of the parameter registers, as well as the original PC
(location of the trap), are saved to the store so they can be restored
into the register-save frame before the injected stub returns (if it
does).

The injected stub can then call into the runtime to perform a
fiber-suspend, setting a "debug yield" value that indicates that a trap
occurred.

A few notes on design constraints that forced my hand in several ways:

We need to inject a call by rewriting only register state, not pushing
a new frame from within the stack handler, because it appears that
Windows vectored exception handlers run on the same stack as the guest
and so there is no room to push an additional frame.

We need access to the store from the signal context now; we can get
this from TLS if we add a raw backpointer from VMStoreContext to
StoreOpaque. I believe we aren't committing any serious pointer
provenance or aliasing-rules crimes here, because dynamically we are
taking ownership of the store back when we're running within the
signal context (it's as if it was passed as an argument, via a very
circuitous route), but I could very well be wrong. I hope we can find
another working approach if so!

The trap suspend protocol looks a little like a resumable trap but
only because we need to properly tear down the future (otherwise we
get a panic on drop). Basically we resume back, and if the trap was a
non-resumable trap, the assembly stub returns not to the original PC
but the PC of another stub that does the original
resume-to-entry-handler action.

Everything is set up here for resumable traps (e.g. for breakpoints) to
also work, but I haven't implemented that yet; that's the next PR (and
requires some other machinery, most notably a private copy of code
memory and the ability to edit and re-publish it; and metadata to
indicate where to patch in breaks; and a pc += BREAK_SIZE somewhere to
skip over on resume).

This is a draft that works on Linux on x86-64; I still need to implement

[x] aarch64, riscv64, s390x assembly stubs

[ ] Windows and macOS updates to trap handlers

[x] equivalent behavior on the raise libcall too, not just
signal-based traps

but I wanted to post it now to communicate the current direction and get
any early feedback.

Wasmtime GitHub notifications bot (Oct 12 2025 at 22:29):

cfallin commented on PR #11826:

This now has support for all our native architectures, but not macOS or Windows; integrating with the separate exception-handling thread on macOS is proving to be a little unexpectedly interesting and I think the form it may take is that the call-injection machinery I've built here will subsume the existing (non-state-preserving, for-unwinding-traps-only) call injection on macOS. I haven't looked in detail at Windows yet as I'll have to dust off my Windows VM (for the first time since implementing fastcall in 2021!) but I hope the only tricky bit there will be adding a fastcall variant of the x86-64 stub.

One interesting bit that might be good to discuss (cc @alexcrichton / @fitzgen) is the actual API for the "debug step" protocol. I'm relatively happy with the DebugSession in the current PR, with the async fn next(..) -> Option<DebugStepResult> that runs until a trap or exception or breakpoint or ... event. The dynamic store ownership protocol basically works with the safe Rust restrictions there too: one can get at the store only when the Wasm code yields, which is morally like a hostcall that passes a reborrowed &mut Store back. One can then read all store-owned state until one resumes. (To allow the debugger to take control back when running, the plan is that this will compose fine with epochs; we can make an epoch change a debug step event too.) There's the separate issue I wrote up in #11835 about whether "access to store during yields" means StoreOpaque or the whole Store but that's not the issue here.

The thing that I am finding interesting is how to enter a debug session. Right now I have a Func::call_debug that is like call_async but returns a DebugSession, not a future directly. That's fine but feels pretty ad-hoc, and importantly, will not compose with any wit-bindgen-generated host-side glue. For example, attaching a debugger to a WASI CLI-world or HTTP-world component won't be directly possible because the raw calls are inside generated code. So instead I'm considering an alternative (which was actually my first draft before getting lost in Futures Hell and finding an exit to this current world):
let session = store.with_debugger(|store| async {
  // ...
  wasi_bindings.main(&mut store);
  // ...
  Ok(())
});

while let Some(step) = session.next().await {
  update_debug_ui(step);
  update_memory_view(mem.data(&mut session));
  // ...
}
The idea here is that there is that the session wraps an inner arbitrary future that runs with the store. I was tripped up before about the store dynamic ownership-passing protocol but the idea above that debug-steps are morally like hostcalls, so a debug yield passes ownership back, seems to free us from that question. What do you think?

(In the current implementation, nested debug sessions are forbidden dynamically, and the debug session sees only one Wasm activation deep i.e. from Wasm entry to Wasm exit and any hostcall is an atomic step; these simplifying restrictions are important to the coherency of the above too, IMHO.)

Wasmtime GitHub notifications bot (Oct 12 2025 at 22:30):

cfallin edited a comment on PR #11826:

This now has support for all our native architectures, but not macOS or Windows; integrating with the separate exception-handling thread on macOS is proving to be a little unexpectedly interesting and I think the form it may take is that the call-injection machinery I've built here will subsume the existing (non-state-preserving, for-unwinding-traps-only) call injection on macOS. I haven't looked in detail at Windows yet as I'll have to dust off my Windows VM (for the first time since implementing fastcall in 2021!) but I hope the only tricky bit there will be adding a fastcall variant of the x86-64 stub.

One interesting bit that might be good to discuss (cc @alexcrichton / @fitzgen) is the actual API for the "debug step" protocol. I'm relatively happy with the DebugSession in the current PR, with the async fn next(..) -> Option<DebugStepResult> that runs until a trap or exception or breakpoint or ... event. The dynamic store ownership protocol basically works with the safe Rust restrictions there too: one can get at the store only when the Wasm code yields, which is morally like a hostcall that passes a reborrowed &mut Store back. One can then read all store-owned state until one resumes. (To allow the debugger to take control back when running, the plan is that this will compose fine with epochs; we can make an epoch change a debug step event too.) There's the separate issue I wrote up in #11835 about whether "access to store during yields" means StoreOpaque or the whole Store but that's not the issue here.

The thing that I am finding interesting is how to enter a debug session. Right now I have a Func::call_debug that is like call_async but returns a DebugSession, not a future directly. That's fine but feels pretty ad-hoc, and importantly, will not compose with any wit-bindgen-generated host-side glue. For example, attaching a debugger to a WASI CLI-world or HTTP-world component won't be directly possible because the raw calls are inside generated code. So instead I'm considering an alternative (which was actually my first draft before getting lost in Futures Hell and finding an exit to this current world):
let session = store.with_debugger(|store| async {
  // ...
  wasi_instance.main(&mut store);
  // ...
  Ok(())
});

while let Some(step) = session.next().await {
  update_debug_ui(step);
  update_memory_view(mem.data(&mut session));
  // ...
}
The idea here is that there is that the session wraps an inner arbitrary future that runs with the store. I was tripped up before about the store dynamic ownership-passing protocol but the idea above that debug-steps are morally like hostcalls, so a debug yield passes ownership back, seems to free us from that question. What do you think?

(In the current implementation, nested debug sessions are forbidden dynamically, and the debug session sees only one Wasm activation deep i.e. from Wasm entry to Wasm exit and any hostcall is an atomic step; these simplifying restrictions are important to the coherency of the above too, IMHO.)

Wasmtime GitHub notifications bot (Oct 13 2025 at 06:19):

cfallin updated PR #11826.

Wasmtime GitHub notifications bot (Oct 13 2025 at 18:44):

alexcrichton commented on PR #11826:

integrating with the separate exception-handling thread on macOS is proving to be a little unexpectedly interesting

One idea to work with this is that, for all platforms, when the signal handler updates state to the trampoline to call out to the host anything clobbered is pushed to the stack instead of saved in the store. For example the stack pointer would be decremented by 32, the first 16 bytes being the saved return address/frame pointer (pretending it's a called frame) and the next 16 bytes would be 2 clobbered registers or something like that. That would work on macOS and all other platforms as well and means that the store isn't necessary in the signal handler routine at least.

Also, somewhat orthogonal, but I don't think that the asm stubs need to save all registers, only the caller-saved ones according to the native ABI, right?

The thing that I am finding interesting is how to enter a debug session

I'm not sure of a way other than what you've described you've done in this PR already with a call_debug. The call/call_async interfaces effectively fundamentally don't do what you want which is that they take and "lock" the store for the entire duration of the call. There's no way to interrupt the call halfway through and get the store back at the caller side. This works for host imports because once within the future we can temporarily loan the store to the host during a host call, but that doesn't work for giving the store back to the original caller. I do agree though that call_debug is not great and doesn't compose well with generated bindings, so I agree it'd be worthwhile to try to fix this.

What might work best is to go ahead and sketch out call_debug and test/implement with that for now and we can brainstorm later about a possible alternative. My suspicion is that it's going to look like run_concurrent from the component-model-async proposal.

Wasmtime GitHub notifications bot (Oct 13 2025 at 19:26):

cfallin commented on PR #11826:

Also, somewhat orthogonal, but I don't think that the asm stubs need to save all registers, only the caller-saved ones according to the native ABI, right?

Ah, in this case we do actually need to save everything: we're interrupting guest code and we don't have regalloc clobbers on the trap-causing instruction so we need to effectively do a full context switch. (Including vector registers, so this is somewhat heavyweight.)

More is in this comment in this PR.

I'm not sure of a way other than what you've described you've done in this PR already with a call_debug.

What do you think about the with_debugger sketch above?

There's no way to interrupt the call halfway through and get the store back at the caller side.

I guess this is what I'm trying to get at with

The dynamic store ownership protocol basically works with the safe Rust restrictions there too: one can get at the store only when the Wasm code yields, which is morally like a hostcall that passes a reborrowed &mut Store back.

and also restated over in this comment; a hostcall is effectively an interrupt to a call, and so if one sees any debug-step yield that occurs at a trapping instruction as a fancy way of that instruction "calling" back to the host, I think this should actually work. Very important is the way that the lifetimes are tied together on the async fn next on the session: it takes a Pin<&mut Self> with the implicit lifetime there tied to the future, so it does own the store until the future is ready; but the future becomes ready (async fn returns) every time a "debug step result" / debug event occurs, which is effectively such a hostcall. Does that make sense? I think this capability is pretty important for the feasibility of the whole enterprise here so I'm happy to try to explain it another way if needed :-)

Wasmtime GitHub notifications bot (Oct 13 2025 at 20:13):

fitzgen commented on PR #11826:

The thing that I am finding interesting is how to _enter_ a debug session. Right now I have a Func::call_debug that is like call_async but returns a DebugSession, not a future directly. That's fine but feels pretty ad-hoc, and importantly, will not compose with any wit-bindgen-generated host-side glue. For example, attaching a debugger to a WASI CLI-world or HTTP-world component won't be directly possible because the raw calls are inside generated code. So instead I'm considering an alternative (which was actually my first draft before getting lost in Futures Hell and finding an exit to this current world):

This API makes sense to me, modulo bike shedding the exact naming and such.

We could alternatively, if we wanted to rearrange some deck chairs, make the API a callback on the Store that is given the debugging-equivalent of Caller and a step/break/etc event, instead of designing the API as a coroutine that returns many step/break/etc events until the computation completes. This is essentially what SpiderMonkey's Debugger API exposes: when you set a breakpoint, for example, you provide a callback that is invoked with the Debugger.Frame object when the breakpoint is hit (for us it would be that and the Caller) and you return a "continuation value" which is morally enum { Return(Value), Throw(Value), Panic }. This is potentially easier to integrate transparently with existing API usage (e.g. an existing call into host bindgen! code).

But these two approaches are basically the same at the end of the day, and we should be able to make either work if we can make one of them work.

Wasmtime GitHub notifications bot (Oct 13 2025 at 20:14):

fitzgen commented on PR #11826:

(In the current implementation, nested debug sessions are forbidden dynamically, and the debug session sees only one Wasm activation deep i.e. from Wasm entry to Wasm exit and any hostcall is an atomic step; these simplifying restrictions are important to the coherency of the above too, IMHO.)

Callbacks, rather than coroutines, should also Just Work for multiple activations, I think.

Wasmtime GitHub notifications bot (Oct 13 2025 at 20:26):

cfallin commented on PR #11826:

That's fair, yeah; the thing I am trying to aim for is a nice API for the debugger main event loop, and a callback-based approach would have to timeslice debugger and debuggee at the top level and use a channel to push events from the callback, then pause if waiting for a "continue" token; this is also more awkward in a world that we have the debugger component using all this from behind a wit interface. Whereas the async coroutine approach unfolds this in a way that can work in a single thread without channels; the program-under-test is "just" a thing that one can poll for the next output. But, either could work.

Wasmtime GitHub notifications bot (Oct 15 2025 at 04:51):

cfallin updated PR #11826.

Wasmtime GitHub notifications bot (Oct 15 2025 at 06:49):

github-actions[bot] commented on PR #11826:

Subscribe to Label Action

cc @fitzgen

<details>
This issue or pull request has been labeled: "cranelift", "pulley", "wasmtime:api", "wasmtime:config"

Thus the following users have been cc'd because of the following labels:

fitzgen: pulley

To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.

Learn more.
</details>

Wasmtime GitHub notifications bot (Nov 01 2025 at 18:20):

cfallin closed without merge PR #11826.

Wasmtime GitHub notifications bot (Nov 01 2025 at 18:20):

cfallin commented on PR #11826:

I'm closing this for now but I'll keep the branch around -- I'm going to write up an issue describing a simpler path, but we can keep the call-injection stubs around for future performance work one day, if we need them.

Last updated: Feb 24 2026 at 04:36 UTC