cfallin opened PR #11769 from cfallin:wasmtime-debug-instrumentation to bytecodealliance:main:
This PR implements ideas from the [recent RFC] to serve as the basis
for Wasm (guest) debugging: it adds a stackslot to each function
translated from Wasm, stores to replicate Wasm VM state in the
stackslot as the program runs, and metadata to describe the format of
that state and allow reading it out at runtime.As an initial user of this state, this PR adds a basic "stack view"
API that, from host code that has been called from Wasm, can examine
Wasm frames currently on the stack and read out all of their locals
and stack slots.Note in particular that this PR does not include breakpoints,
watchpoints, stepped execution, or any sort of user interface for any
of this; it is only a foundation.This PR still has a few unsatisfying bits that I intend to address:
The "stack view" performs some O(n) work when the view is initially
taken, computing some internal data per frame. This is forced by the
current design ofBacktrace, which takes a closure and walks that
closure over stack frames eagerly (rather than work as an
iterator). It's got some impressive iterator-chain stuff going on
internally, so refactoring it to the latter approach might not
be too bad, but I haven't tackled it yet.A O(1) stack view, that is, one that does work only for frames as
the host API is used to walk up the stack, is desirable because some
use-cases may want to quickly examine e.g. only the deepest
frame (say, running with a breakpoint condition that needs to read a
particular local's value after each step).It includes a new
Config::compiler_force_inlining()option that is
used only for testing that we get the correct frames after
inlining. I couldn't get the existing flags to work on a Wasmtime
config level and suspect there may be an existing bug there; I will
try to split out a fix for it.This PR renames the existing
debugoption tonative_debug, to
distinguish it from the new approach.(Stacked on #11768.)
[recent RFC]: https://github.com/bytecodealliance/rfcs/pull/44
cfallin requested wasmtime-default-reviewers for a review on PR #11769.
cfallin requested fitzgen for a review on PR #11769.
cfallin requested wasmtime-core-reviewers for a review on PR #11769.
cfallin requested wasmtime-compiler-reviewers for a review on PR #11769.
cfallin commented on PR #11769:
(Marking this as a draft until I resolve the two issues above; posting to show how the Cranelift side is used in practice, in case that's useful.)
cfallin updated PR #11769.
cfallin updated PR #11769.
cfallin updated PR #11769.
cfallin updated PR #11769.
cfallin updated PR #11769.
cfallin updated PR #11769.
cfallin updated PR #11769.
cfallin updated PR #11769.
cfallin updated PR #11769.
cfallin updated PR #11769.
cfallin updated PR #11769.
cfallin updated PR #11769.
cfallin updated PR #11769.
cfallin updated PR #11769.
cfallin updated PR #11769.
cfallin updated PR #11769.
cfallin edited PR #11769:
This PR implements ideas from the [recent RFC] to serve as the basis
for Wasm (guest) debugging: it adds a stackslot to each function
translated from Wasm, stores to replicate Wasm VM state in the
stackslot as the program runs, and metadata to describe the format of
that state and allow reading it out at runtime.As an initial user of this state, this PR adds a basic "stack view"
API that, from host code that has been called from Wasm, can examine
Wasm frames currently on the stack and read out all of their locals
and stack slots.Note in particular that this PR does not include breakpoints,
watchpoints, stepped execution, or any sort of user interface for any
of this; it is only a foundation.This PR renames the existing
debugoption tonative_debug, to
distinguish it from the new approach.(Stacked on #11768 and #11783, and depends on #11784.)
[recent RFC]: https://github.com/bytecodealliance/rfcs/pull/44
cfallin has marked PR #11769 as ready for review.
cfallin commented on PR #11769:
Alright, I've rebased this on the latest #11768, and updated it to take an iterator-based approach instead. The borrowing is a little tricky but does work out: we implement a custom iterator up the stack frames (
StackView), which owns the store while alive, and a mutable borrow of that iterator must be passed into methods on the objects it returns (FrameView) to actually read values. All of that gives a safe, lazy (O(1)) way of walking the stack.In #11783 (underlying iterator that this refactor used) I chose not to replicate all of the complex logic to walk all activations and continuation stack-chains; for now, this API walks only the last activation that entered the host. That seems sufficient for a basic debugger application to me, but I can update it as desired.
This is now stacked on #11768 and #11783, and depends on the just-filed #11784 to be resolved somehow.
cfallin updated PR #11769.
cfallin updated PR #11769.
cfallin updated PR #11769.
cfallin edited PR #11769:
This PR implements ideas from the [recent RFC] to serve as the basis
for Wasm (guest) debugging: it adds a stackslot to each function
translated from Wasm, stores to replicate Wasm VM state in the
stackslot as the program runs, and metadata to describe the format of
that state and allow reading it out at runtime.As an initial user of this state, this PR adds a basic "stack view"
API that, from host code that has been called from Wasm, can examine
Wasm frames currently on the stack and read out all of their locals
and stack slots.Note in particular that this PR does not include breakpoints,
watchpoints, stepped execution, or any sort of user interface for any
of this; it is only a foundation.This PR renames the existing
debugoption tonative_debug, to
distinguish it from the new approach.(Stacked on #11768, and depends on #11784.)
[recent RFC]: https://github.com/bytecodealliance/rfcs/pull/44
cfallin commented on PR #11769:
Rebased on #11783 now that it's landed.
Also a few more refined thoughts on
I chose not to replicate all of the complex logic to walk all activations and continuation stack-chains; for now, this API walks only the last activation that entered the host. That seems sufficient for a basic debugger application to me, but I can update it as desired.
I am realizing in building the next step that we probably do want a "debug session" to apply only to one Wasm activation -- if for no other reason than that the API is much more semantically clear when exits back to the host are atomic events rather than somehow "translucent" (host code invisible but nested Wasm invocations visible). And there are weird corner cases where, for example, the recursively invoked host code itself tries to set up a (nested) debug context that we'd want to rule out anyway. So: the limit to introspect only the most recent activation is a feature, not a bug!
cfallin updated PR #11769.
cfallin commented on PR #11769:
And rebased out #11768 as well; now this only depends on #11784 (either accepting the
compiler_force_inliningoption added here, or doing something else).
cfallin edited PR #11769:
This PR implements ideas from the [recent RFC] to serve as the basis
for Wasm (guest) debugging: it adds a stackslot to each function
translated from Wasm, stores to replicate Wasm VM state in the
stackslot as the program runs, and metadata to describe the format of
that state and allow reading it out at runtime.As an initial user of this state, this PR adds a basic "stack view"
API that, from host code that has been called from Wasm, can examine
Wasm frames currently on the stack and read out all of their locals
and stack slots.Note in particular that this PR does not include breakpoints,
watchpoints, stepped execution, or any sort of user interface for any
of this; it is only a foundation.This PR renames the existing
debugoption tonative_debug, to
distinguish it from the new approach.(Depends on #11784.)
[recent RFC]: https://github.com/bytecodealliance/rfcs/pull/44
cfallin updated PR #11769.
cfallin edited PR #11769:
This PR implements ideas from the [recent RFC] to serve as the basis
for Wasm (guest) debugging: it adds a stackslot to each function
translated from Wasm, stores to replicate Wasm VM state in the
stackslot as the program runs, and metadata to describe the format of
that state and allow reading it out at runtime.As an initial user of this state, this PR adds a basic "stack view"
API that, from host code that has been called from Wasm, can examine
Wasm frames currently on the stack and read out all of their locals
and stack slots.Note in particular that this PR does not include breakpoints,
watchpoints, stepped execution, or any sort of user interface for any
of this; it is only a foundation.This PR renames the existing
debugoption tonative_debug, to
distinguish it from the new approach.[recent RFC]: https://github.com/bytecodealliance/rfcs/pull/44
cfallin commented on PR #11769:
Rebased out temporary config fix to #11784 as well now that #11797 merged.
github-actions[bot] commented on PR #11769:
Label Messager: wasmtime:config
It looks like you are changing Wasmtime's configuration options. Make sure to
complete this check list:
[ ] If you added a new
Configmethod, you wrote extensive documentation for
it.<details>
Our documentation should be of the following form:
```text
Short, simple summary sentence.More details. These details can be multiple paragraphs. There should be
information about not just the method, but its parameters and results as
well.Is this method fallible? If so, when can it return an error?
Can this method panic? If so, when does it panic?
Example
Optional example here.
```</details>
[ ] If you added a new
Configmethod, or modified an existing one, you
ensured that this configuration is exercised by the fuzz targets.<details>
For example, if you expose a new strategy for allocating the next instance
slot inside the pooling allocator, you should ensure that at least one of our
fuzz targets exercises that new strategy.Often, all that is required of you is to ensure that there is a knob for this
configuration option in [wasmtime_fuzzing::Config][fuzzing-config] (or one
of its nestedstructs).Rarely, this may require authoring a new fuzz target to specifically test this
configuration. See [our docs on fuzzing][fuzzing-docs] for more details.</details>
[ ] If you are enabling a configuration option by default, make sure that it
has been fuzzed for at least two weeks before turning it on by default.[fuzzing-config]: https://github.com/bytecodealliance/wasmtime/blob/ca0e8d0a1d8cefc0496dba2f77a670571d8fdcab/crates/fuzzing/src/generators.rs#L182-L194
[fuzzing-docs]: https://docs.wasmtime.dev/contributing-fuzzing.html
<details>
To modify this label's message, edit the <code>.github/label-messager/wasmtime-config.md</code> file.
To add new label messages or remove existing label messages, edit the
<code>.github/label-messager.json</code> configuration file.</details>
alexcrichton submitted PR review:
These are some initial thoughts I had from reading this last night and just now. I wanted to take a closer look at
debug.rsbut I stopped once I realized that I think the current API proposed here is unsound. Assuming you agree with my comment below I'll take another look once it's updated.
alexcrichton created PR review comment:
Could this go in
before_translate_operatorrather than being an additional "before" hook?
alexcrichton created PR review comment:
Would it make sense to store this in
stackdirectly? That'd avoid the need to always try to keep the two in sync
alexcrichton created PR review comment:
Is this safety comment still relevant? Naively I'd expect this to be a safe function since getting a backtrace of a store should always be a safe operation
alexcrichton created PR review comment:
Passing thought about this -- how do stack maps come into play here with GC? E.g. if we implement a moving GC, wouldn't all of these values need to be roots for the GC?
alexcrichton created PR review comment:
This is technically a breaking change for the CLI and our existing documentation around this. I don't have a great idea of what to do instead, but I'm wary of breaking existing users of our debug info without prior warning
alexcrichton created PR review comment:
s/EXTERNREF/FUNCREF/
alexcrichton created PR review comment:
Instead of duplicating this check from below, could this be refactored either into a loop outside of this conditional or some sort of helper method?
alexcrichton created PR review comment:
If
debugis the default here, should it also be in the default list for thewasmtimecrate?
alexcrichton created PR review comment:
This scoping seems like something we'll want to change perhaps? For example maybe the entire iterator holds
AutoAssertNoGc? It seems like if a GC happened in the middle of a stack walk that things would probably go bad...There's also the question of the scope that these rooted values are pushed onto. It seems like that should be a pseudo-requirement of a caller? Or maybe just documented? I'm not sure if there's too much we can do to help this from a caller's perspective...
alexcrichton created PR review comment:
Would it make sense to pass these around as a unit-of-sorts if they all live in some sort of
FuncEnvironmentstructure or similar? They're otherwise being passesd around pretty frequently and it's a lot of boilerplate to skip over
alexcrichton created PR review comment:
I think this is going to have to change one way or another to make this sound. Some things I can think of to break soundness with this are:
- I think a
FrameViewcould get paired with the wrongStore.- A
FrameViewcould be persisted outside of a stack trace and then later read once the frames are all gone (or with the wrong store)Personally I think it would be worth trying to consolidate the
unsafehere in this module as much as possible. Ideally we'd just have one tiny "thing", whatever that is, which does theunsafebits of actually reading from the stack and, ideally, we'd be able to read the definition of this "thing" and trivially conclude "yep that's fine".
alexcrichton created PR review comment:
GIven that
Valis self-describing in its type, could theValTypereturn be omitted?
alexcrichton created PR review comment:
GIven that wasm modules are generally all
u32-heavy on the encoding side, could this be au32andlocalwould also take au32below?
alexcrichton created PR review comment:
To make this more amenable to strict provenance (hypothetically one day being supported in Wasmtime...) instead of passing addresses around as
usizecould they be passed around as*const u8or similar?
cfallin commented on PR #11769:
Regarding unsoundness of
FrameViewbeing unattached to a lifetime:Hmm. I agree that as written this API permits those patterns; I had been mostly concerned with keeping the stack "frozen" with the
StackViewiterator, then realized that theStoreneeds to be kept separate from whatever the iterator yields (because you cannot yield a&mut _more than once from an iterator if they overlap, and the alternative would otherwise be to have the store mut-borrow be reborrowed byFrameView, but that's not viable here).I suspect that the easiest sound option may be to give up on providing an actual iterator, that literally yields externalizable values (
FrameView) for each frame, but rather integrate the reading-of-the-frame methods into theStackViewitself. That cleanly disallows any of the "save it for later" or "use it with the wrong store" cases because theStackViewholds the store borrow.It does mean a user would look something like
while !stack_view.done() { let instance = stack_view.instance(); let func = stack_view.defined_func_index(); let value = stack_view.local(0); // ... stack_view.move_up(); }What do you think?
alexcrichton commented on PR #11769:
Yeah that matches my intuition as well where the disconnect of the iterator value from the store felt a bit fishy. "Lending iterators" is definitely something that's not natural/easy to do in Rust, but I agree that having a custom API to handle this is reasonable. Best-in-class ergonomics seems ok to not have as a goal for a debugging-exclusive API.
cfallin submitted PR review.
cfallin created PR review comment:
It's possible; it would mean moving things around so that
stack.push*()helpers, and all other manipulations ofstack.stack, directly create a stack-shape node as well, which involves deeper refactoring. I was trying to keep changes to the code translator to a minimum in the initial PR. I'll play with this though!
cfallin submitted PR review.
cfallin created PR review comment:
Sure, I was trying to invent a policy to keep the old debugging approach separate by renaming it to "native debugging", but I'm happy to swap things around so existing native debug remains
debug-infoand the new thing becomesguest-debugor something like that. I guess I was leaning on our "semver lets us change things each major release" policy but I agree CLI stability here may be desirable.What do you think about
guest-debugas a general moniker for the new approach?
cfallin submitted PR review.
cfallin created PR review comment:
I believe so: the difference between this and
Backtrace::tracefor example is that the latter doesn't give you a mutable store while it's doing the tracing. In theory, a maximally evil client could take the owned store, translate that to full ownership of the raw stack, and mutate the frames that we're walking through.That said, we don't provide any other APIs on a store today (to my knowledge!) that could mutate the stack arbitrarily, e.g. on-stack replacement of some form; so perhaps we can kick this can down the road and re-assign responsibility for safety to whichever mechanism eventually does that. Happy to do what you feel is best!
cfallin submitted PR review.
cfallin created PR review comment:
Yes, I think so; I'll add a note about this.
I can see two options:
- We could extend the "tail is new and needs to be synced" state to arbitrary dirtyness, and mark all ref-typed values as dirty after a safepoint; this would then force the stack sync to re-write all (potentially updated) GC refs to the instrumentation slots.
- We could extend the instrumentation frame-shape metadata to be able to refer directly to other stackslots; then we could let the spilling that already happens for safepoints also give us full debug visibility.
I favor the second overall but probably want to defer this for now, to get the basic functionality working...
cfallin submitted PR review.
cfallin created PR review comment:
Yes, assert-no-gc seems reasonable; I initially was thinking I didn't want to impose that limitation because a mutating debugger interface may want to allow evaluations that could allocate other things, but that's far-future stuff; happy to defer it.
(And actually, just to note it, this intersects with your above point about GC rooting too; in the presence of a moving GC we'd have to be careful to have debug metadata refer directly to the existing user-stack-map for that to work, rather than re-spill values to instrumentation after a safepoint, because while still walking the stack that re-spill code wouldn't run.)
cfallin updated PR #11769.
cfallin submitted PR review.
cfallin created PR review comment:
(Updated in 2df6a9140c.)
cfallin created PR review comment:
Fixed in 2df6a9140c per the sketch below.
cfallin submitted PR review.
cfallin submitted PR review.
cfallin created PR review comment:
Good point; I think this is left over from something earlier (was I trying to return a precise type including runtime GC types? I'm not sure) but it's simpler not to. Removed in 2df6a9140c.
cfallin submitted PR review.
cfallin created PR review comment:
Fixed via removal in 2df6a9140c.
cfallin submitted PR review.
cfallin created PR review comment:
Note added in 2df6a9140c; to be addressed in a more thorough way later.
cfallin submitted PR review.
cfallin created PR review comment:
(Kept this but feel free to ask me to remove or reword if you feel differently.)
cfallin commented on PR #11769:
Pushed a fix to the API to encapsulate the frame with the iterator; will address the other bits later tonight or tomorrow. Thanks!
cfallin submitted PR review.
cfallin created PR review comment:
OK, I've pushed on this a bit locally, but I think it makes things much worse:
- a lot of the operand stack management assumes it can take slices of
&[Value]and even mutate those in some cases, so although(Value, FrameStackShape)is the natural type for a stack entry (or rather(Value, Option<FrameStackShape>)since instrumentation is optional), that alters a ton of code;- push/pop methods on
stacksrequire the builder now, which creates a huge diff that alters every operator translation (it's pretty mechanical, but still);- the distinction between
stackandstack_shapeactually corresponds an important algorithmic detail: we let them go out of sync (specifically,stack_shapecan always be shorter thanstackif not all values are flushed) then do the sync at the end of a given operator translation. This is important because we get the specific type from the validator's view of the Wasm state and it only makes sense to query that after the operator translation has updated the stack completely to match the current program point, rather than at each individual push/pop.The last one is the most fundamental, the other two are "only" big mechanical refactors, but for all these reasons I think the current approach is the most workable (and the doc-comment here does a not terrible job explaining the invariant imho).
cfallin updated PR #11769.
cfallin created PR review comment:
Sure, merged this (and the after-hook too) into existing hooks in 32b15c30f8.
cfallin submitted PR review.
cfallin updated PR #11769.
cfallin submitted PR review.
cfallin created PR review comment:
Sure, addressed in 645dfe1470.
cfallin updated PR #11769.
cfallin submitted PR review.
cfallin created PR review comment:
Sure, added in 618bd094ef.
cfallin updated PR #11769.
cfallin submitted PR review.
cfallin created PR review comment:
So currently
stackandfunc_envare kept separate throughout the whole translator for what I suspect are deep borrowing-related reasons; it just so happens that we needstackin a few more places.srclocon the other hand is in theory known by thebuilderanywhere we have that, so I've added an accessor and removed it in a few places in e19e867b0d.
cfallin submitted PR review.
cfallin created PR review comment:
Sure, addressed in 21c3b4fe4b (and used
with_exposed_provenanceto go from the FP that comes in from other runtime machinery as ausizeto the*const u8we use here).
cfallin submitted PR review.
cfallin created PR review comment:
Sure, addressed in 011f656292.
cfallin updated PR #11769.
cfallin submitted PR review.
cfallin created PR review comment:
Alright, I've renamed some things in f660ba9296 so the current (native DWARF) approach remains under the same option names on the CLI and
Configand the new thing is "guest debugging".
cfallin commented on PR #11769:
I believe I've addressed all review comments so far; thanks!
cfallin updated PR #11769.
cfallin updated PR #11769.
alexcrichton created PR review comment:
Nah yeah I understand the rationale for the rename and I agree that if we were to start from scratch I'd prefer to rename this, but also yeah the CLI generally needs more stability than the major version bumps technically allow so I think it'll cause unnecessary churn to rename the CLI option.
That being said I still like the idea of classifying debugging as "native debugging" or "guest debugging" internally in Wasmtime. For example the new tunables option
debug_guestand usingguest-debughere in the CLI sounds reasonable to me. I think it'd be reasonable to "deprecate" this-Ddebug-infooption in favor of-Dnative-debugwhere "deprecate" is in quotes because we don't actually warn on deprecations, but we could technicaly accept both.
alexcrichton submitted PR review.
fitzgen created PR review comment:
Similarly to how you already merged the debug instrumentation before/after each op into the existing before/after each op hook, can we put the
debug_instrumentation_at_{start,end}stuff inside the existing{before,after}_translate_functionhooks?
fitzgen submitted PR review:
Didn't get to finish looking over everything yet, but what I've seen looks great, one minor nitpick below.
alexcrichton submitted PR review:
:+1: from me, but @fitzgen should probably also sign off too
alexcrichton created PR review comment:
Could the tunables option here retain the "native" part of the name? To mirror the
debug_guestoption I think it'd be reasonable to rename this asdebug_nativetoo
alexcrichton created PR review comment:
s/teh/the/
alexcrichton created PR review comment:
If most of this module operated on
&StoreOpaquerather than&mut StoreOpaquethat might be a way to avoid thisVectoo perhaps
alexcrichton created PR review comment:
Could this, and maybe stackwalking in general, actually take
&StoreOpaque, notably not mutable? That way I think stack iteration could be 100% safe since mutations are impossible while the store is held, and it'd also prevent a GC since that requires a mutable store
alexcrichton created PR review comment:
Could the documentation here link to the types in the public API which are enabled when this option is turned on?
alexcrichton created PR review comment:
Mind adding a "SAFETY" comment about why this is safe?
alexcrichton created PR review comment:
Ah ok yeah that all makes sense, thanks for pushing on this regardless!
alexcrichton created PR review comment:
Could this point to a spot in code translation and/or Cranelift where offset 0 is the vmctx? (and ideally point from there to here too)
alexcrichton created PR review comment:
Ah ok that makes sense to me yeah, but can the safety comment here be worded specifically around how the store is a public field? The current docs feel a bit inaccurate because the store isn't yielded on each iteration nor is it possible to store this iterator, return to wasm, and then resume the iterator later.
(also unresolving for visibility)
alexcrichton created PR review comment:
Mind also filing an issue for this? And maybe making a tag for guest-debug related things? Mostly want to make sure we don't lose track of this as this seems pretty important and if it's just a comment we'll inevitably forget it a few months from now.
(unresolving this conversation for visibility)
alexcrichton created PR review comment:
Shouldn't this be a bug or debug assert if a program point isn't found?
cfallin created PR review comment:
The tricky thing here is that reading out a GC reference actually does mutate the store: it roots the reference and the root-set is in the store. This is not super-satisfying to me either but it's forced by the design today I think...
cfallin submitted PR review.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Ah good point, and yeah that forces our hand here.
cfallin updated PR #11769.
cfallin updated PR #11769.
cfallin created PR review comment:
Done in 0b20becd0c!
cfallin submitted PR review.
cfallin updated PR #11769.
cfallin submitted PR review.
cfallin created PR review comment:
Done in 690d88c2fb!
cfallin updated PR #11769.
cfallin submitted PR review.
cfallin created PR review comment:
Done (to
debug_native) in 59df486fca.
cfallin updated PR #11769.
cfallin submitted PR review.
cfallin created PR review comment:
Done in 6b48c9fd61.
cfallin submitted PR review.
cfallin created PR review comment:
Done in 6b48c9fd61.
cfallin submitted PR review.
cfallin created PR review comment:
Done; added a few references in 6b48c9fd61.
cfallin submitted PR review.
cfallin created PR review comment:
Yes, I think this was left over from some earlier thinking before I realized that debug instrumentation should be uniform across the whole activation because we're in one store with one set of settings... removed in
cfallin edited PR review comment.
cfallin created PR review comment:
(see other comment about mutability -- unfortunately I don't think we can change this at the moment)
cfallin submitted PR review.
cfallin submitted PR review.
cfallin created PR review comment:
OK cool -- I'll add such deprecation to my TODO list for the public-facing interface once I get there on my marathon (I don't think we should do it before we have the alternative thing available) :-)
cfallin updated PR #11769.
cfallin submitted PR review.
cfallin created PR review comment:
Oops, sorry, I missed that outdated wording -- fixed in ef26d17428.
cfallin submitted PR review.
cfallin created PR review comment:
Filed #11841, but as I wrote that, I realized that even for our current non-moving GC this is a problem, because DCE could remove the part of the program that would otherwise root a value (e.g., for a Wasm program that takes an
anyrefas a parameter, pushes it on the operand stack, then drops it at the end; that's live for the debugger but won't be in any user stack maps).I'll see about adding debug metadata traversal to tracing directly, so every ref-typed value at the current program point gets traced...
cfallin updated PR #11769.
cfallin updated PR #11769.
cfallin submitted PR review.
cfallin created PR review comment:
OK, I've actually implemented the proper rooting in 66e650522e. I have a basic test with a GC ref semantically-live-but-optimizing-compiler-dead across a hostcall, with that hostcall observing the value in the slot. A really proper test would actually invoke a GC on the store during that hostcall, which I can't do in this PR yet (the followup gives a store context that would let me do that) but hopefully this is sufficient for now.
cfallin commented on PR #11769:
Alright, I think this is ready for a final look -- thanks for the feedback so far!
alexcrichton created PR review comment:
Musing about this, and I realize this isn't supported right now anyway, but will this be enough to support modifying the stack or locals during a debugging session. I was naively thinking "no" because locals can have a more refined type as well as values on the stack. Effectively the wasm validator understands a local/stack entry to have a particular type, and the runtime value may be that type or a subtype. Mutating the stack should allow anything that's a subtype of the known type, but by only recording the top type here we lose the knowledge of precisely what type it needs to be.
I suppose bringing it back to this PR: I would be surprised if
FrameValTypecontinues to support future use cases as-is. In the end what I might naively expect is thatFrameValTypebecomes literallyWasmValType. Do you think it might be worthwhile to go ahead and plumb that to avoid a future conversion? (and of course deferring this to an issue is also fine)
alexcrichton submitted PR review:
One final thought, but doesn't change the fact that I'm still :+1: on this landing as-is
cfallin updated PR #11769.
cfallin submitted PR review.
cfallin created PR review comment:
Interesting; yeah, I think you're right.
The driving factor here so far (as I'm sure you intuited) is that I didn't want to pull runtime types down into
environ; but I guess theenviron-level metadata can use module-level type indices and then we translate that at runtime via the engine type registry, like everywhere else.I've created #11847 to track. I think that's a bit much to put on top of this PR, IMHO, but happy to do it once needed for mutation. It'd also give a nice ability to query the precise type (in runtime terms) on any local/stack slot.
fitzgen submitted PR review:
r=me with the bit about stack tracing using stack maps instead of also debug info (https://github.com/bytecodealliance/wasmtime/issues/11841#issuecomment-3399148128) and then also a couple nitpicks/bikesheds that I am happy to be convinced otherwise about
Thanks Chris!
fitzgen created PR review comment:
Bikeshedding: So this doesn't view the whole stack anymore, just one frame in the linked list that makes up the stack, so what do you think of
FrameViewinstead? And it is for the purposes of debugging, so maybeDebugFrameView? Or maybe justDebugFrame?
fitzgen created PR review comment:
Placeholder for https://github.com/bytecodealliance/wasmtime/issues/11841#issuecomment-3399148128
fitzgen created PR review comment:
This name is assuming that the stack grows down, which isn't really defined at the Wasm level.
How about
pub fn parent(&'a mut self) -> Option<DebugFrame<'a>> { ... }?
Or if we want to keep it cursor-style, rather than returning a new frame, maybe we should call it
move_to_parentand also call the typeDebugStackCursor. (I thinkViewimplies a more slice-like, O(1)-access thing thanCursordoes).What I like about returning a new frame, over the cursor style, is that it is very clear when you've reached to initial stack frame and there is nothing else to traverse. I personally just find it more natural as well. (The lifetimes should make it so that you can't have two
DebugFrames alive at the same time for the same stack anyways).
fitzgen created PR review comment:
/// The flattened list of (symbol name, FuncKey, compiled /// function) triples, as they will be laid out in the object file.
cfallin updated PR #11769.
cfallin updated PR #11769.
cfallin commented on PR #11769:
@fitzgen
the bit about stack tracing using stack maps instead of also debug info (https://github.com/bytecodealliance/wasmtime/issues/11841#issuecomment-3399148128)
I built this out in e462fda5eb but I'm not super happy about it for three reasons:
- user stack maps were emitted only on calls (as they were designed for); we now need to add them to sequence point instructions, and in the future will need to add them to every trapping instruction, because on trap, we yield to the debugger, and the instrumentation-slot values need to remain rooted until we eventually tear down (and with mutable access to the store that could include a GC). So that means that e.g. every load/store, and every ALU instruction emit case on x86-64 (because mem folding).
- That would be fine if we handle it in one place (take stack map before emitting every inst), but now we have a pretty big blowup of metadata space. The stack map will contain an entry for every ref-typed local, and duplicate these for every Wasm memory instruction -- quadratic space. The debug frame state in contrast stores the types and offsets for all locals once.
- There's the awkward bit about zero-length sequence points and duplicated stack maps; again not fatal, we can handle it, but shows how this is pushing what stackmaps were designed for a bit.
The asymptotic efficiency bit is the fundamental one I think.
I can make an argument in principle for GC tracing using debug metadata as well: debug state has different requirements (much more frequent state points, and many of them are identical i.e. locals), and is logically a separate kind of rooting area (albeit one per frame), just as globals and user roots are separate areas.
What do you think? Happy to revert the last commit if you agree.
cfallin edited a comment on PR #11769:
@fitzgen
the bit about stack tracing using stack maps instead of also debug info (https://github.com/bytecodealliance/wasmtime/issues/11841#issuecomment-3399148128)
I built this out in e462fda5eb but I'm not super happy about it for three reasons:
- user stack maps were emitted only on calls (as they were designed for); we now need to add them to sequence point instructions, and in the future will need to add them to every trapping instruction, because on trap, we yield to the debugger, and the instrumentation-slot values need to remain rooted until we eventually tear down (and with mutable access to the store that could include a GC). So that means e.g. every load/store, and every ALU instruction emit case on x86-64 (because mem folding).
- That would be fine if we handle it in one place (take stack map before emitting every inst), but now we have a pretty big blowup of metadata space. The stack map will contain an entry for every ref-typed local, and duplicate these for every Wasm memory instruction -- quadratic space. The debug frame state in contrast stores the types and offsets for all locals once.
- There's the awkward bit about zero-length sequence points and duplicated stack maps; again not fatal, we can handle it, but shows how this is pushing what stackmaps were designed for a bit.
The asymptotic efficiency bit is the fundamental one I think.
I can make an argument in principle for GC tracing using debug metadata as well: debug state has different requirements (much more frequent state points, and many of them are identical i.e. locals), and is logically a separate kind of rooting area (albeit one per frame), just as globals and user roots are separate areas.
What do you think? Happy to revert the last commit if you agree.
cfallin edited a comment on PR #11769:
@fitzgen
the bit about stack tracing using stack maps instead of also debug info (https://github.com/bytecodealliance/wasmtime/issues/11841#issuecomment-3399148128)
I built this out in e462fda5eb but I'm not super happy about it for three reasons:
- user stack maps were emitted only on calls (as they were designed for); we now need to add them to sequence point instructions, and in the future will need to add them to every trapping instruction, because on trap, we yield to the debugger, and the instrumentation-slot values need to remain rooted until we eventually tear down (and with mutable access to the store that could include a GC). So that means e.g. every load/store, and every ALU instruction emit case on x86-64 (because mem folding).
- That would be fine if we handle it in one place (take stack map before emitting every inst), but now we have a pretty big blowup of metadata space. The stack map will contain an entry for every ref-typed local, and duplicate these for every Wasm instruction -- quadratic space. The debug frame state in contrast stores the types and offsets for all locals once.
- There's the awkward bit about zero-length sequence points and duplicated stack maps; again not fatal, we can handle it, but shows how this is pushing what stackmaps were designed for a bit.
The asymptotic efficiency bit is the fundamental one I think.
I can make an argument in principle for GC tracing using debug metadata as well: debug state has different requirements (much more frequent state points, and many of them are identical i.e. locals), and is logically a separate kind of rooting area (albeit one per frame), just as globals and user roots are separate areas.
What do you think? Happy to revert the last commit if you agree.
fitzgen commented on PR #11769:
Overall, the latest commit looks like what I expected, and seems good to me.
The asymptotic efficiency bit is the fundamental one I think.
It is true that the total size of stack maps is technically
O(num_live_gc_refs * num_safepoints). However, I don't think this will actually be an issue in practice because
- Each stack map is already a packed bitmap, minimizing the effect of
num_live_gc_refsto a degreenum_live_gc_refsis not actually large in practicenum_safepointsis only large when debugging in practice- All of this only affects code size in edge cases, not run time performance
But if we are concerned about this when adding debugging instrumentation's stack map entries, then I think we should also be concerned about stack maps as they exist today. It is certainly already possible to construct a Wasm that just does a ton of calls (making
num_safepointsgrow) with a ton of live GC refs on the stack. So if we are concerned about this, then we should make the stack maps in general have prefix sharing similar to what you've done for debug info. (Also I'm sure it is also possible to construct Wasm inputs that have similar Wasm:native code size blow ups in general due to Cranelift being an optimizing/non-template compiler with remat, demand-based egraph elaboration in the face of conditional branches, etc.)But I don't really want to keep holding up your debugging work, especially since you have so many PRs stacked on top of this one already. So if you still prefer the previous approach, then go for it. I won't draw any lines in the sand here.
cfallin updated PR #11769.
cfallin commented on PR #11769:
OK, yeah, I pushed on this a bit more to see if I could make it nicer; I do agree that in principle stack maps should be the sole owner of "GC refs on the stack", and also that any efficiency concerns with them could be addressed directly.
That said I was trying to harmonize the stackmap emission a bit so it isn't special-cased for
sequence_pointand calls only, since we'll ultimately need them on many different kinds of instructions; and working through that ended up calling for reinvention of a lot of what debug tags already handle, e.g. pre- and post-points on instructions, and seemed to be going against the grain in general. So I think I'll keep it as-is but we can always try this refactor later. Thanks for the push to try this in any case!
cfallin created PR review comment:
(Resolving for now but we can try this refactor later if we can make supporting stackmaps across all instructions nicer)
cfallin submitted PR review.
cfallin submitted PR review.
cfallin created PR review comment:
Fixed in 4fdcd4eb1b, thanks!
cfallin updated PR #11769.
cfallin submitted PR review.
cfallin created PR review comment:
Ah, I like the cursor naming here; switched to
DebugFrameCursor, an accessordebug_frames()on the store, andmove_to_parent(). Thanks! Updated in 4fdcd4eb1b.
cfallin submitted PR review.
cfallin created PR review comment:
Updated to
DebugFrameCursorper below in 4fdcd4eb1b.
cfallin has enabled auto merge for PR #11769.
cfallin updated PR #11769.
cfallin has enabled auto merge for PR #11769.
cfallin merged PR #11769.
Last updated: Dec 06 2025 at 07:03 UTC