cfallin requested alexcrichton for a review on PR #12183.
cfallin opened PR #12183 from cfallin:debugger-crate to bytecodealliance:main:
The debug support in Wasmtime so far is structured around async callbacks that occur at certain kinds of events, like breakpoints. This is a suitable foundation, but it makes a top-level debugger awkward to implement: such a debugger likely has an event loop dealing with user commands (via a UI or a protocol connection) and expects to perform actions such as "run until next breakpoint".

This PR introduces a new crate that wraps a `Store` in a `Debugger`. This wrapper embodies an inner async body that can perform whatever actions it likes on the `Store` that is passed back in. This inner body is spawned as an async task. The debugger wrapper registers its own `DebugHandler` callback that communicates with the outside world via bidirectional command/response queues.

On the "outside", the `Debugger` presents an interface suitable for inserting into a debug protocol server or UI: an async method that runs until the next event and returns that event, and a method that permits querying or modifying the store whenever the `run` method is not executing. The latter operates by sending a closure over the queue, because the `Store` must continue to be owned by the async task that is (still) running and suspended in async callbacks.

Right now, this is exercised only via a few unit tests, but the intent is to next build up the "top half" of the debugger using this abstraction, e.g. by running a gdbstub protocol server (likely as a Wasm component in a "debug-main WIT world" -- RFC needed for this).

Also, when we eventually move debugging over to native use of `run_concurrent`, this paradigm should remain mostly unchanged at this level of API: there can still be an object that has an async method that runs and yields the next event, and there can still be a method that takes a closure that can operate (within its scope only) on the `Store`.

A few warts that I could use feedback on:

- Cancellation safety is weird. Fibers panic when dropped before execution of their body completes, and this seems to mean that we can't allow a `Debugger` to drop early (or, at least, that we can't allow the `tokio::test` unit test that owns the runtime running the async task to finish before the debugged body completes!). If there is a better way to handle cancellation safety here, I'm all ears.
- It's not clear to me whether the boxed-closure-and-`Any` approach to providing access to the `Store` is the best we can do, but I suspect it is.
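To make the shape of this interface concrete, here is a minimal sketch of what the wrapper could look like; the type and method names (`Debugger`, `run`, `with_store`) and all signatures are illustrative assumptions rather than the crate's actual API, and the queues and spawned task are elided behind `unimplemented!()`:

```rust
use anyhow::Result;
use wasmtime::Store;

/// Placeholder for whatever event type the debug callbacks surface
/// (breakpoint hit, body complete, ...).
pub enum Event {
    Breakpoint,
    Complete,
}

/// Hypothetical wrapper: holds the command/response queues; the `Store<T>`
/// itself lives inside a spawned async task running the inner body.
pub struct Debugger<T> {
    _marker: std::marker::PhantomData<T>,
}

impl<T: Send + 'static> Debugger<T> {
    /// Resume the debugged body and return the next event it produces.
    pub async fn run(&mut self) -> Result<Event> {
        unimplemented!("send a resume command, await the next event")
    }

    /// Query or modify the store while `run` is not executing. The closure
    /// is sent over the queue because the `Store` must stay owned by the
    /// suspended async task.
    pub async fn with_store<R, F>(&mut self, _f: F) -> Result<R>
    where
        F: FnOnce(&mut Store<T>) -> R + Send + 'static,
        R: Send + 'static,
    {
        unimplemented!("box the closure, send it, downcast the boxed-Any reply")
    }
}
```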
cfallin requested wasmtime-core-reviewers for a review on PR #12183.
cfallin requested wasmtime-default-reviewers for a review on PR #12183.
cfallin updated PR #12183.
cfallin updated PR #12183.
fitzgen submitted PR review:
Have to context switch, but want to get my comments thus far posted.
fitzgen created PR review comment:
Could be good to have an ASCII diagram of the state machine here. Assuming we start in the `Paused` state, effectively always beginning at a breakpoint just before the program's first instantiation, I'm thinking something like

```
    |
    |
    V
  Paused <---------> Queried
   ^ |
   | V
  Running
    |
    |
    V
 Complete
```

But now that I think about it some more, it seems like complete and paused are actually the same for "reactor" style programs? That is, those that are a bag of exported functions that can be called in any order, rather than those defining just a `main`. This is because you start out in the initial paused state, call into some exported function and do stuff, and then that can complete, but you can always call back in again and do more stuff and debug these subsequent entries into the program as well.

This also makes me think of something that I don't think we've discussed thus far in our debugging conversations: debugging the start function and instantiation in general. For example, what if a data segment's initialization traps due to an out-of-bounds memory access? We want to support debugging that, but I don't think we raise a debugging trap event for that yet, do we? (Obviously, we don't need to address all that in this PR, but I think this state machine should reflect the actual states we want to support in the fullness of time, IMO.) So given that, I think we actually want something like

```
        |
        V
  Uninstantiated <---.
     ^ |             |
     | |  (debugger says terminate)
     | V             |
   Running           |
     ^ |             |
     | V             |
   Paused -----------'
     ^ |
     | V
   Queried
```

where uninstantiated to running involves instantiation, and returning from `main` or any other export that we called into as the entry point puts us back at paused (since we can inspect memories and such in this state, but can't in the uninstantiated state).
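As a side note on the shape being proposed, the states in that second diagram could be written down as something like the following enum; this is purely illustrative and not code from the PR:

```rust
/// Illustrative only: the lifecycle sketched in the diagram above.
#[allow(dead_code)]
enum DebuggerState {
    /// No instance exists yet; memories and globals cannot be inspected.
    Uninstantiated,
    /// Instantiation, `main`, or some other entry-point export is executing.
    Running,
    /// Stopped at a breakpoint or after an entry point returned; the store
    /// can be inspected, resumed, or terminated back to `Uninstantiated`.
    Paused,
    /// A query/modification of the store is being serviced.
    Queried,
}
```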
fitzgen created PR review comment:
Maybe name this `Paused`, because the name is a bit ambiguous with `Complete` otherwise (a program stops when it is finished, does it not?)
cfallin submitted PR review.
cfallin created PR review comment:
Happy to add a state diagram!
One point of clarification regarding "before instantiation" or distinctions between different lifecycle states of a Wasm instance/component: this interface actually has no concept of instantiation. The "inner body" can be anything: it could create a bunch of instances, it could continue to call into an already-existing instance, etc. It inverts control so that the code on the outside gets events when the callback occurs, that's it; and the body doesn't run until the first `run()`.

So, yes, we could add debug events for out-of-bounds accesses by data segments. Any breakpoints in the start function should already "just work", though, in the sense that it'll be Wasm execution within the scope of the inner body.
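To illustrate that inversion of control (with a hypothetical `Debugger::new` signature, and assuming `store`, `linker`, and `module` are already in scope), the inner body is just an arbitrary async closure that is handed the store back:

```rust
// Hypothetical usage sketch; the real constructor and its signature may differ.
let mut debugger = Debugger::new(store, |mut store| async move {
    // The body decides everything about instantiation: one instance, many,
    // or calls into instances created elsewhere.
    let instance = linker.instantiate_async(&mut store, &module).await?;
    let start = instance.get_typed_func::<(), ()>(&mut store, "_start")?;
    start.call_async(&mut store, ()).await?;
    anyhow::Ok(())
});
// None of the above runs until the first `debugger.run().await`.
```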
alexcrichton submitted PR review:
> this seems to mean that we can't allow a `Debugger` to drop early
This is definitely a bug in Wasmtime that we need to fix. Can you add a failing test or gist one that I can help poke around?
alexcrichton created PR review comment:
You can drop `debug`/`std`/`async` here since that'll come in through the normal dependency
alexcrichton created PR review comment:
should this be omitted?
alexcrichton created PR review comment:
This should be define-able with `async fn handle(...)` and you don't need to worry about `Send`
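For reference, the suggestion is that an `async fn` in the impl desugars to the anonymous future for you; a generic illustration follows (the actual `DebugHandler` trait and its `handle` signature in Wasmtime may differ):

```rust
use anyhow::Result;

// Stand-in trait; not Wasmtime's actual `DebugHandler` definition.
trait Handler {
    async fn handle(&self, event: u32) -> Result<()>;
}

struct LoggingHandler;

impl Handler for LoggingHandler {
    // Writing this as `async fn` means no hand-rolled
    // `Pin<Box<dyn Future<Output = Result<()>> + Send>>` return type.
    async fn handle(&self, event: u32) -> Result<()> {
        println!("debug event: {event}");
        Ok(())
    }
}
```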
cfallin commented on PR #12183:
> This is definitely a bug in Wasmtime that we need to fix. Can you add a failing test or gist one that I can help poke around?
Sure thing -- with this commit and `cargo test -p wasmtime-internal-debugger -- basic_debugger` I see:

```
thread 'test::basic_debugger' (1129985) panicked at /home/cfallin/work/wasmtime/crates/wasmtime/src/runtime/fiber.rs:199:70:
called `Option::unwrap()` on a `None` value
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```
cfallin commented on PR #12183:
(It's entirely possible that I'm holding some part of the async machinery wrong, either in this PR or in earlier work to add the async debugger hook -- happy to have your opinions on that if so!)
alexcrichton commented on PR #12183:
That'll get fixed by this:
```diff
diff --git a/crates/wasmtime/src/runtime/fiber.rs b/crates/wasmtime/src/runtime/fiber.rs
index 68dce6884f..1ab5120ef3 100644
--- a/crates/wasmtime/src/runtime/fiber.rs
+++ b/crates/wasmtime/src/runtime/fiber.rs
@@ -358,7 +358,7 @@ impl StoreOpaque {
     }
 
     /// Returns whether `block_on` will succeed or panic.
-    #[cfg(feature = "call-hook")]
+    #[cfg(any(feature = "call-hook", feature = "debug"))]
     pub(crate) fn can_block(&mut self) -> bool {
         self.fiber_async_state_mut().current_future_cx.is_some()
     }
diff --git a/crates/wasmtime/src/runtime/store.rs b/crates/wasmtime/src/runtime/store.rs
index 0ce69fc6c3..ac6929d376 100644
--- a/crates/wasmtime/src/runtime/store.rs
+++ b/crates/wasmtime/src/runtime/store.rs
@@ -2839,6 +2839,9 @@ unsafe impl<T> VMStore for StoreInner<T> {
     #[cfg(feature = "debug")]
     fn block_on_debug_handler(&mut self, event: crate::DebugEvent<'_>) -> anyhow::Result<()> {
         if let Some(handler) = self.debug_handler.take() {
+            if !self.can_block() {
+                bail!("couldn't grab async_cx for debug handler")
+            }
             log::trace!("about to raise debug event {event:?}");
             StoreContextMut(self).with_blocking(|store, cx| {
                 cx.block_on(Pin::from(handler.handle(store, event)).as_mut())
```

tl;dr: when a store is dropped, all fibers are resumed with a trap that says "please die immediately", but there are some circumstances in internal code where, during the process of returning an error from everything, something async is attempted. The async attempt fails because the blocking context (rightfully) wasn't configured.

This happened before with call hooks, for example, where we would try to invoke the call hook when the fiber is exiting saying "hey, a trap happened". The fix here is to just skip debug events/call hooks/etc. during this situation, since if the store is being dropped that's not interesting anyway.
alexcrichton edited a comment on PR #12183:
(if you wouldn't mind this'd be a good thing to add a test for too)
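A test along those lines could look roughly like the following; the `Debugger` constructor, body, and `run` method are hypothetical stand-ins for the crate's real API, and a real test would arrange for the body to be suspended at a debug event at the time of the drop:

```rust
// Hypothetical sketch; real API names and setup may differ.
#[tokio::test]
async fn drop_debugger_before_body_completes() -> anyhow::Result<()> {
    let mut config = wasmtime::Config::new();
    config.async_support(true);
    let engine = wasmtime::Engine::new(&config)?;
    let store = wasmtime::Store::new(&engine, ());

    let mut debugger = Debugger::new(store, |_store| async move {
        // Body that would keep producing debug events if resumed.
        anyhow::Ok(())
    });

    // Run until the first event, then drop the debugger (and the store the
    // inner task owns) without letting the body complete; this must not
    // panic in the fiber machinery.
    let _event = debugger.run().await?;
    drop(debugger);
    Ok(())
}
```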
cfallin updated PR #12183.
cfallin submitted PR review.
cfallin created PR review comment:
Added in cce7e85d8c.
cfallin submitted PR review.
cfallin created PR review comment:
Ah, good point -- fixed in fac011ac15ccdef503841b073b1886215bd43bb2.
cfallin submitted PR review.
cfallin created PR review comment:
Good point, thanks -- fixed in fac011ac15ccdef503841b073b1886215bd43bb2.
cfallin submitted PR review.
cfallin created PR review comment:
Ah, yes, that was a copy-paste leftover -- fixed in fac011ac15ccdef503841b073b1886215bd43bb2.
cfallin submitted PR review.
cfallin created PR review comment:
Yes, good point. Fixed in fac011ac15ccdef503841b073b1886215bd43bb2.
cfallin updated PR #12183.
cfallin updated PR #12183.
cfallin commented on PR #12183:
Awesome, thanks @alexcrichton for that fix! Updated and added a test with an early drop of debugger/store.
alexcrichton submitted PR review:
:+1: from me (I defer to you/Nick to possibly await his approval too)
cfallin commented on PR #12183:
Thanks! I'll go ahead and merge and we can always tweak anything we need as we build on top of this.
cfallin commented on PR #12183:
The CI failure on i686 (i.e., pulley32) is due to a problem with `PreserveAll` interacting poorly with Pulley's call1/2/3/4 optimization; #12213 fixes it (tested locally). I'll merge that here once that PR lands.
cfallin updated PR #12183.
cfallin has enabled auto merge for PR #12183.
alexcrichton commented on PR #12183:
The CI error there is:
```
Running: `"curl" "--user-agent" "bytecodealliance/wasmtime auto-publish script" "https://crates.io/api/v1/crates/wasmtime-internal-debugger/owners"`
thread 'main' (2599) panicked at scripts/publish.rs:623:17:
crate wasmtime-internal-debugger is not owned by wasmtime-publish, please run:
    cargo owner -a wasmtime-publish wasmtime-internal-debugger
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```

Notably, we verify that all crates are owned by `wasmtime-publish` and already published to crates.io -- this ensures the auto-publish later succeeds. Can you publish a stub crate and invite the `wasmtime-publish` owner?
cfallin commented on PR #12183:
Yep, I've just published the crate and invited `wasmtime-publish` -- let me know when ready to re-attempt landing.
cfallin commented on PR #12183:
I'm not sure I understand the failure now -- I added `wasmtime-publish` and it shows as an owner; I removed myself as an owner; it's still failing. @alexcrichton any ideas?
alexcrichton commented on PR #12183:
Oh, that's wasmtime-publish-the-GitHub-team, not wasmtime-publish-the-GitHub-user. If you run the command it printed out, it should be resolved.
cfallin commented on PR #12183:
Hmm. Unfortunately I removed myself from the crate after `wasmtime-publish` accepted the invite, so I'm not able to make any more permissions changes.

Do you have credentials for that user so you can log in and invite the team? Or have we lost the crate name forever?
(FWIW, I want to reiterate my suggestion from earlier that we have a runbook for all of this now that we have such a complex publishing and permissions scheme, because this was all very ambiguous and undocumented to me)
alexcrichton commented on PR #12183:
Yeah, the BA has credentials for wasmtime-publish in 1Password (that's how I accepted the invite in the first place). I added the team. Agreed this can be documented better (like pretty much everything else infrastructure-related), and it's something I hope to handle in January with the move to trusted publishing.
cfallin updated PR #12183.
cfallin has enabled auto merge for PR #12183.
cfallin commented on PR #12183:
Thanks!
cfallin merged PR #12183.