cfallin requested fitzgen for a review on PR #12176.
cfallin opened PR #12176 from cfallin:debugging-host-sandwich to bytecodealliance:main:
In the initial design for the
DebugFrameCursor, I was concerned about the effects of host async on the stability of visiting earlier activations (see also the discussion of async combinators in #11896). The basic hypothesized problem was that when Wasm calls host-code calls Wasm, the sequence of activations on the stack is not even stable between async polls; so any debugger hook, which is an async function, should not be allowed to hold a frame cursor across a yield point since it could become invalidated if the next poll stacks up the activations differently.In further conversations it's become clear that this is not actually a possibility, for the simple reason that the inner re-entrant activations into the same store take full ownership (mutably reborrow) that store, and that mut reborrow becomes part of the future; so the exact chain of activations will remain in the same sequence when re-polled. Said another way, it is impossible at any given level of async host-code to create more than one future that re-enters the same store and somehow poll those in different orders at different times. The worst that a host-code async combinator can do is drop the future that re-enters the store. This drops and invalidates whatever frames a cursor held over a yield might be referencing, but it also drops the async invocation of the debugger hook itself, and due to lifetimes the cursor cannot escape that hook, so everything is still sound.
This PR thus updates the
DebugFrameCursorto visit all activations. I've generalized the backtrace code a bit to enable this, and built an internalStoreBacktracethat is an iterator over all activations associated with the store.At the
DebugFrameCursor(public API) level, the two basic choices were to present a sentinel for host frame(s) explicitly and make all Wasm-specific accessors returnOption<T>, or skip over host frames. I opted for the latter, withmove_to_parent()returning an enum value now that indicates whether it moved to a new activation.A note regarding the async component ABI: once debugging is possible within a
run_concurrentenvironment, it will again be the case that a single frame cursor should see only one activation, because each (re)-entry into the store becomes a new task, if my understanding is correct. At that time, we should build an API that lets the debugger see the activation for each task separately. That's a simpler model ultimately, and it will be nice when we move to it, but as long as we have the sync component ABI with async host code and the ability to stack activations as we do today, we need to provide the debugger this visibility.(Aside: why does the debugger need to see more than one activation? In addition to presenting a weird and incoherent view of the world to the user if we don't, it is also necessary to implement the "next" (step-over) debugger action, because otherwise a call to a host function that re-enters the store may lead to a state with fewer, but completely disjoint, stack frames on the "one latest activation" from which it's not possible to reason about whether we've left the called-into function yet.)
<!--
Please make sure you include the following information:
If this work has been discussed elsewhere, please include a link to that
conversation. If it was discussed in an issue, just mention "issue #...".Explain why this change is needed. If the details are in an issue already,
this can be brief.Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.htmlPlease ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->
cfallin requested wasmtime-core-reviewers for a review on PR #12176.
cfallin updated PR #12176.
cfallin updated PR #12176.
cfallin updated PR #12176.
fitzgen submitted PR review.
fitzgen created PR review comment:
Kind of feel like this constructor should be safe and the
StoreBacktrace::store_mutmethod should be unsafe, since that is the thing that actually lets you get the mutable store and potentially mutate the stack.
cfallin updated PR #12176.
cfallin submitted PR review.
cfallin created PR review comment:
Ah, that's a great point -- updated in 0e6ec66e1d.
cfallin has enabled auto merge for PR #12176.
cfallin updated PR #12176.
cfallin merged PR #12176.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Could you clarify what you're worried about with the
unsafehere? I can't actually imagine anything myself that would, in safe code, do something that violates a stack trace. While the stack is sort of morally mutatable/owned by the store we don't allow arbitrary mutations anywhere. Allowing arbitrary mutations itself would be quite unsafe due to return pointers and such on the stack.Given that I would, perhaps naively, expect this to be able to be a safe function, which would in theory help clean up callers of this a bit too
cfallin submitted PR review.
cfallin created PR review comment:
I remember discussing this safety condition in an earlier PR, too, where we decided to keep this
unsafe(constructing an iterator over the stack has always had anunsafe, it isn't new in this PR). It's forward-looking to APIs that might, e.g., allow for early returns or exception-throws to be enacted from within the debugger. That would undoubtedly require using araise-libcall-like strategy to resume to the new execution point after returning into the breakpoint trampoline, but might edit stack frames in the meantime. It seems fine to me to say that we putunsafeon such APIs when they exist instead. Happy to clean this up in upcoming work.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
I have vague recollections of that, but if this is actually
unsafe, even in a forward looking way, then this needs to be deleted:impl<'a, T: 'static> AsContextMut for DebugFrameCursor<'a, T> { fn as_context_mut(&mut self) -> StoreContextMut<'_, Self::Data> { StoreContextMut(self.iter.store.0) // SAFETY: `StoreContextMut` does not provide any methods that // could remove frames from the stack, so the iterator remains // valid. unsafe { StoreContextMut(self.iter.store_mut().0) } } }You're saying that looking forward we'll allow editing the stack which would make this genuinely unsafe, but this impl also is only possible to exist if there's not actually any unsafety.
Something about this I think needs to be reconciled. Ideally the iterator would actually borrow the stack and would force a
.collector something like that into a vector so we wouldn't have to deal with these sorts of questions. I don't have all this in my head to know what the best route forward is, but I do think that something here needs to change as I believe this PR is otherwise self-contradictory where it's trying to future-proof against something that it's also saying can't ever happen.
alexcrichton edited PR review comment.
cfallin submitted PR review.
cfallin created PR review comment:
I'm not sure I follow -- the suggestion I gave above was to make any methods that actually edit the stack
unsaferather than the cursor construction. Wouldn't that also be sufficient? In other words, you can get aStoreContextMutsafely, you can create a cursor safely, but using mutable access to the store for nefarious bits that invalidate a cursor is stillunsafe.A few other constraints:
- We can't collect into a vector first -- that makes the whole operation O(n) even if you only want to look at (e.g.) the latest frame. This is a hard constraint and was mentioned by Nick in initial frame-cursor review, as I had actually built it this way first before making it a true cursor...
- We need to provide mutable access to the store while the cursor exists because the debugger may want to use the values it pulls out to examine other bits of state, e.g. GC objects. This is also a hard constraint because it is needed for base functionality.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
With stack editing things as
unsafe, this API here is safe, right? That's mostly what I am trying to clarify.
cfallin submitted PR review.
cfallin created PR review comment:
Yep, exactly. So my suggestion from above is
It seems fine to me to say that we put unsafe on such APIs when they exist instead. Happy to clean this up in upcoming work.
So (i) this function is safe, (ii)
DebugFrameCursorcan still safely give us aStoreContextMut, and (iii)unsafeis used in the future for anything that might invalidate a cursor by editing frames. I think we're agreeing on that final state but let me know if I'm missing something :-)
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Sounds good! Sorry just wasn't clear to me from your original comment, but all good nonetheless
Last updated: Jan 09 2026 at 13:15 UTC