cfallin opened PR #12133 from cfallin:wasm-breakpoints to bytecodealliance:main:
This is a PR that puts together a bunch of earlier pieces (patchable calls in #12061 and #12101, private copies of code in #12051, and all the prior debug event and instrumentation infrastructure) to implement breakpoints in the guest debugger.
These are implemented in the way we have planned in #11964: each sequence point (location prior to a Wasm opcode) is now a patchable call instruction, patched out (replaced with NOPs) by default. When patched in, the breakpoint callsite calls a trampoline with the
patchableABI which then invokes thebreakpointhostcall. That hostcall emits the debug event and nothing else.A few of the interesting bits in this PR include:
- Implementations of "unpublish" (switch permissions back to read/write from read/execute) for mmap'd code memory on all our platforms.
- Infrastructure in the frame-tables (debug info) metadata producer and parser to record "breakpoint patches".
- A tweak to the NOP metadata packaged with the
MachBufferto allow multiple NOP sizes. This lets us use one 5-byte NOP on x86-64, for example (did you know x86-64 had these?!) rather than five 1-byte NOPs.This PR also implements single-stepping with a global-per-
Storeflag, because at this point why not; it's a small additional bit of logic to do all patches in all modules registered in theStorewhen that flag is enabled.A few realizations for future work:
- The need for an introspection API available to a debugger to see the modules within a component is starting to become clear; either that, or the "module and PC" location identifier for a breakpoint switches to a "module or component" sum type. Right now, the tests for this feature use only core modules. Extending to components should not actually be hard at all, we just need to build the API for it.
- The interaction between inlining and
patchable_callis interesting: what happens if we inline apatchable_callat atry_callcallsite? Right now, we do not update thepatchable_callto atry_call, because there is nopatchable_try_call; this is fine in the Wasmtime embedding in practice because we never (today!) throw exceptions from a breakpoint handler. This does suggest to me that maybe we should make patchability a property of any callsite, and allow try-calls to be patchable too (with the same restriction about no return values as the only restriction); but happy to discuss that one further.<!--
Please make sure you include the following information:
If this work has been discussed elsewhere, please include a link to that
conversation. If it was discussed in an issue, just mention "issue #...".Explain why this change is needed. If the details are in an issue already,
this can be brief.Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.htmlPlease ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->
cfallin requested abrown for a review on PR #12133.
cfallin requested wasmtime-compiler-reviewers for a review on PR #12133.
cfallin requested dicej for a review on PR #12133.
cfallin requested wasmtime-core-reviewers for a review on PR #12133.
cfallin requested wasmtime-default-reviewers for a review on PR #12133.
cfallin requested fitzgen for a review on PR #12133.
cfallin requested alexcrichton for a review on PR #12133.
cfallin updated PR #12133.
cfallin updated PR #12133.
cfallin commented on PR #12133:
The s390x failure looks like an oversight on my part in the patchable-ABI implementation on that ISA -- the clobber-save code implicitly assumes that clobber set fits in that ABI's special clobber-save region in each frame but that's no longer true when everything is clobbered. I'll rework it in a separate PR then rebase this.
alexcrichton created PR review comment:
Well, either all that, or the clause about other threads should be removed from the
unsafedocs since that should not be applicable given the rules of&mut
alexcrichton created PR review comment:
s/FnMut/FnOnce/
alexcrichton created PR review comment:
I think that the documentation here is out-of-date (this doesn't change permissions), and additionally I think we can probably make this a safe function. The safety requirement of
self.mmap.as_mut_slice()is satisfied by checkingself.publishedand other internal mapping checks to ensure that it's read/write.I'd also recommend changing this to return
-> &mut [u8]and perhaps renaming it totext_mut()to make it easier to call and more clear which part it's providing mutable access to.
alexcrichton created PR review comment:
How come this points to the end vs the start?
alexcrichton created PR review comment:
Personally I'd say that this
Safetysection andunsafeis unnecessary since, if&mutexists, it's already guaranteed that this has exclusive access over the memory.Put another way, the unsafety here, other threads executing code, already shows unsafety unrelated to calling this function insofar as it should not be possible to acquire
&mut CodeMemorywhile other threads are executing in the code.
alexcrichton created PR review comment:
Instead of adding a new kind here, I would have expected the ABI of the
breakpointbuiltin to be different from the rest. There's no use for a non-patchable version of thebreakpointbuiltin for example, right?
alexcrichton created PR review comment:
I think this'll want to return an error with
bail!for when virtual memory isn't supported
alexcrichton created PR review comment:
I think something around this is going to need some special care, specifically when
unpublishreturns an error. There are situations that this implementation doesn't currently handle such as:
- If
unpublishfails then it's documented as leaving things in an indeterminate state which means the entire store needs to be locked down and completely unusable (or something like that), but no such enforcement happens here.- If
unpublishsucceeds, there's no guarantee that the futurepublishinDropwill succeed. I don't think that a futuerpublishfailing should result in a panic-during-Dropsince that's easy to get into a double-panic situation that aborts the process.Overall I don't know how to handle this though. If
unpublishfails and things are indeterminate, we can't handle that because returning from wasm requires executing some wasm. Ifpublishfails then things definitely aren't executable, so we can't return to wasm to jump back to the other end of wasm. I hadn't really thought about this until reading this just now, but do you have ideas on how to handle the failures here?
alexcrichton submitted PR review.
alexcrichton created PR review comment:
I suppose the closest thing that I can think of for handling this is to document abort-on-error semantics both here during
unpublishand inDropwhere the re-publish happens. It's on us to ensure that nothing outside of like "kernel exceeded limit of VMAs" errors happen and in such a situation there's not much we can do other than abort sort of...
alexcrichton submitted PR review.
cfallin updated PR #12133.
cfallin created PR review comment:
This is the "avoid storing the same number twice" trick that we use in various places in Cranelift too -- when the ranges are consecutive we just store one endpoint; and we store the end rather than the start because one can always get the end of the previous, or zero for first entry, but the end of the whole pool may have been padded. (I'll add to the comment to note that start is implicit as last end.)
cfallin submitted PR review.
cfallin submitted PR review.
cfallin created PR review comment:
Hmm, yeah, I am just realizing that even an ordinary panic isn't sufficient here: that will unwind to the top of the hostcall stack frames, but then we'll catch-unwind and return an error through Wasm, but oops the trampoline isn't executable anymore, so we segfault.
We could almost make this work if we insisted that our trampolines (where the error check and raise that will longjmp back over the activation) were executed in engine code, not store code. Then an error would make Wasm non-executable but leave trampolines executable. We currently use engine-code pointers at some points (as you probably remember from the private-code PR) but for the Wasm-to-host trampolines for builtins, we'll use store-code variants because we emit direct calls from Wasm functions to those trampolines, so that's not a solution (or at least not without making all breakpoint patchable calls indirects and burning another register).
I agree that there's not much we can do if (in the Linux case) the kernel fails to allocate a new
struct vma, and also that that's the only error case (ENOMEM generally) that I see in the Linux manpage formprotectthat isn't a bad-parameter sort of error avoided by construction. I could almost make an argument that the re-publish case should only be merging VMAs or keeping the same number, not splitting them (the whole mmap is initially one; initially publishing text splits into three (R, R+X, R); flipping the middle to R+W won't cause a merge nor a split); of course we can't rely on that for edge-case arguments about kernel OOM'ing and syscall errors but it should put us slightly more at ease...Is there an idiomatic way to panic-and-really-abort-the-process even in a panic=unwind build?
cfallin edited PR review comment.
cfallin submitted PR review.
cfallin created PR review comment:
Alternately, an idea occurs: this all happens on a fiber in practice, since guest-debugging relies on that. Could we suspend out of the fiber and "poison" it somehow? Then we eliminate the re-execution of now-non-executable code by dropping those stackframes on the floor, never returning into them again.
cfallin submitted PR review.
cfallin created PR review comment:
That's true currently, yeah; the reason I built it this way is that ABI is already a notion baked into the
FuncKeyKind(explicitly during codegen for each such trampoline, and exposed viaAbitoo) and it seemed less error-prone to render that detail explicitly than to try to special-case one builtin (which would possibly require plumbing the builtin index into more places and conditionalizing a bunch of stuff). The lazy generation of trampolines on demand during compilation also means we don't ever actually generate abreakpointtrampoline with kindWasmToBuiltin. Maybe I'm missing somewhere this leads to an efficiency concern though?
cfallin created PR review comment:
The custom-publish case should otherwise work if we don't
bailhere, and is ordinarily used in non-virtual-memory builds; is there a reason we want to rule that out? This function body is basically derived by starting withpublishand then switching the permissions (and cprop'ing out the "if needs executable" because we want to make R/W regardless, e.g. for pulley)
cfallin submitted PR review.
cfallin submitted PR review.
cfallin created PR review comment:
Yeah, I wasn't totally sure what to do here: it's not clear (to me) that executing code in the code memory is meant to be morally equivalent to an immutable borrow from that code. I guess it has to be since that's the only safe API we provide that can read the machine code (
.text()) and executing instructions implies reading them. In that sense, though, the Wasm code is technically violating the borrowing invariants because it is fetching the instructions (immutable borrow) and also has a separate mut borrow to the wholeStore.Seen another way: should it be possible to cause a segfault by executing non-executable-bit regions when using only safe APIs on this struct within a hostcall? If this method isn't
unsafe, then it is: we get a mutStoreas a parameter to a hostcall, we unpublish, we return. Wasm code passed in a mut borrow so that was our right; but ending that borrow does not really end its effect.(By the way that's what I was trying to capture with the
editfunction that takes the closure, though a refactor pulled the actual unpublish method out because I wanted to make the edits batched -- I think that's the right call but we need to figure out the above re: safe semantics)
cfallin submitted PR review.
cfallin created PR review comment:
(I don't see the word "thread" anywhere in the docs on this method but let me know which part you want clarified and I'm happy to do so!)
cfallin updated PR #12133.
cfallin created PR review comment:
Updated, thanks! Done in c90e749f9b.
cfallin submitted PR review.
cfallin submitted PR review.
cfallin created PR review comment:
Added comment in c90e749f9b; thanks!
cfallin submitted PR review.
cfallin created PR review comment:
Updated in c90e749f9b, thanks!
cfallin updated PR #12133.
cfallin updated PR #12133.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
I may be misunderstanding, but it looks like the custom-unpublished returns early above meaning it won't reach here. Returning an error if
!mmap.supports_virtual_memory()is the same aspublishabove where if the host has virtual memory but the mapping doesn't support it then this operation fails. (technically this should be an assert of some kind since it would have failed inpublish, but that's ok)
alexcrichton created PR review comment:
Yeah modeling this all in Rust's borrowing I've never quite wrapped my head around and it feels a bit like a leaky abstraction. One way to interpret this is that wasm gave the host a
&mut Thingborrow and the host did*thing = Thing::default(). Another way to interpret it though is that wasm is holding pointers intoThing(e.g. return addresses on the stack) so it's not valid for wasm to hand out&mut Thingto the host.On the other hand though a controlled deterministic crash is not something I would consider
unsafe. For example aborting the process is a safe operation and here the possible fault of forgetting to callpublishis a deterministic "you don't have permissions to execute code there" failure.So overall I feel like this still doesn't meet the criteria of being
unsafe. I'm not aware of UB if this function is called in how it interacts with the rest of the system at least.As for
edit-- it makes sense to have that style of API but I don't think that this is the right layer to implement it. Rather theBreakpointEditabstraction you added feels more appropriate. Additionally when designingunsafeprimitives I've historically felt that the abstraction needs to be as easy-to-use as possible without increasing the risk of UB. For exampletext_mutdoesn't increase the risk of UB it just makes it easier to use. Effectively writing non-idiomatic code is in a sense more risky than using standard borrowing/etc idioms with clearly definedunsafecontracts and such.As for "thread", sorry about that I skimmed the docs and just interpreted actively-in-use as threads in the system. What I mean is that
&mut CodeMemoryprovides a static guarantee that it's not actively-in-use and thus that part of the unsafety contract can go away as Rust's type system provides such a guarantee.
alexcrichton created PR review comment:
For aborting that'd be
std::process::abort, and I agree that it's sort of our only recourse here.Personally I've always felt that man pages are not a reliable source of information when it comes to "what are the exhaustive set of error conditions that could apply to this function" and instead I frequently have seen errors not documented in man pages. Given that I don't think we can reason our way around this and conclude that it shouldn't ever actually panic on Linux so we're ok. Instead I think we should expect such errors to be rare (if ever) and still ensure something reasonable happens as a result (e.g. aborting)
Another option for aborting would be to funnel things through an
extern "C"function pointer which is defined as "cannot unwind" and will catch unwinds and abort the process. That's arguably better since we want to handle panics-while-editing in addition to actual mprotect errors. Or maybe have some sort of flag onBreakpointEditwhich indicates a "yes this is a normal drop" vs an unwinding drop.I do think it would be somewhat possible to basically leak the entire fiber but that's definitely resource leakage.
Do you know how other engines cope with this? I would expect other JITs to more-or-less run into the same problem here modulo they might not have to return back to code to execute a shim to jump to the start. Otherwise though this seems like something almost all JITs would eventually run into.
alexcrichton created PR review comment:
Personally I feel like functions are easier to understand when they can't have variable ABIs and instead just have a single ABI for any particular definition, so I'd lean towards removing this new kind and updating the ABI definition of the breakpoint builtin.
There might be perf-ish concerns related to Nick's recent refactorings where sets are either dense-or-not and how things are encoded in metadata, but I'm not too concerned about that. I'm mostly worried about accidentally using a non-patchable-abi at some point since everything would more-or-less "just work" until runtime if that happened.
cfallin submitted PR review.
cfallin created PR review comment:
I guess I'm still a little skeptical: there are baked-in notions in various places that the "Wasm ABI" means standard
tail, and finding any places where we have to add a special case to break that seems worse and more error-prone than saying that there is a new kind of trampoline. I guess I don't see this as a "variable ABI" so much as "we can generate as needed a trampoline of the desired ABI1-to-ABI2 variant", with current cases Wasm-to-host and Patchable-to-host. That seems cleaner to me? Curious what @fitzgen thinks as well...
cfallin updated PR #12133.
cfallin commented on PR #12133:
Fixed s390x in #12148; that commit is also on top here to see the fix in CI but I'll rebase out once that merges first.
cfallin submitted PR review.
cfallin created PR review comment:
It looks like SpiderMonkey also landed on the "just abort the process because there's no better option" option -- see AutoWritableJitCodeFallible. Notably the "fallible" bit is to make it writable; see the comment there about how it cannot fail to make it executable again in the destructor, and that giant blinking
MOZ_CRASHinvocation ifmakeExecutableAndFlushICachefails.
alexcrichton created PR review comment:
Ok that seems convincing that we should probably do similar yeah. I'm curious though, what led you to document in
unpublishthat if it returns an error that things are left in an indeterminate state? I would naively expect that to be able to document/implement "if this fails then nothing changes, that's a guarantee" which would mean that starting a breakpoint-edit session is fallible, and it's just dropping the session that's infallible/aborts
alexcrichton submitted PR review.
cfallin updated PR #12133.
fitzgen submitted PR review.
fitzgen created PR review comment:
We already have a bunch of specific
FooToBarTrampolinefunc keys for trampolines that convert from theFooABI/caller to theBarABI/callee -- why would we diverge from that precedent/convention here?@alexcrichton are you suggesting that we should reuse
WasmToBuiltinfor this case? That would make sense to me if these trampolines were from the Wasm ABI and calling out to a builtin, but unless I am misunderstanding that is not the case: they are from the patchable ABI to a builtin. We don't already have aFuncKeykind for that pair, so I think it does merit a new kind.There might be perf-ish concerns related to Nick's recent refactorings where sets are either dense-or-not and how things are encoded in metadata, but I'm not too concerned about that.
I also doubt that this would be an issue in this case.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Well, overall this is a pretty minor point, so I don't want to over-rotate on this. At the same time though I see
FuncKeyas a more "logically these are the buckets of functions we have". We don't really have all that manyFooToBarTrampolinevariants actually:
ArrayToWasmTrampoline- this has evolved into the purpose of an entry trampoline and would probably be more aptly named as such.WasmToArrayTrampoline- this is probably better named "exit trampoline" or something like thatWasmToBuiltinTrampoline- this is effectively a different flavor of exit trampoline where it's exiting via a statically known libcall as opposed to some host function of a particular type signatureOthers, however, have little to do with ABI naming-wise:
DefinedWasmFunction,PulleyHostCall,ComponentTrampoline,ResourceDropTrampoline,UnsafeIntrinsic.I suppose a better way to phrase my concern here is that the naming here hasn't stood up well to the changes over time that have happened, specifically switching from two host ABIs (native + array) to just the one host ABI (array).
To me, with the lens of "these are logical buckets", then there's no need for this new kind because the bucket is "it's an exit trampoline for a builtin" which we already have. ABI is heavily implied in all of these variants (and more explicitly specified in some), but the ABI also goes far beyond the literal choice of something like
system_vorpatchable. For exampleWasmToArrayTrampolineABI-wise assumes that it gets aVMArrayCallHostFuncContextwhich has its own ABI.Basically it doesn't feel right to me to focus so heavily on the literal Cranelift ABI used here in these keys. I think it makes more sense to focus on logical buckets, which I feel that this new builtin fits very well into.
cfallin submitted PR review.
cfallin created PR review comment:
I understand where you're coming from in the sense that there is more to this enum than just Cranelift ABI; but on the other hand, from the perspective of "I want to use the type system to ensure that I write correct software", it doesn't feel right to me to add a single exception to one of the details ("
WasmToBuiltinmeans all of these settings, except for this one special case, so be sure to check thebuiltinindex"). Basically: the upside to your approach is that we have one less enum entry; the downside is an unknown audit of the whole Wasm-to-Cranelift compiler to ensure that we aren't implicitly assuming things aboutWasmToBuiltin, and an ongoing complexity cost in this one special exception. We leverage the type system in lots of other places to represent reality and ensure that we cover all cases and get things right; why avoid that here?
fitzgen submitted PR review:
r=me with outstanding threads resolved
fitzgen created PR review comment:
Not 100% necessary in this PR but: we might want to newtype this, just because we are often also talking about native PCs and text offsets in various places, and we don't want to mix them all up. (We should probably newtype those things too, IMO).
cfallin updated PR #12133.
cfallin submitted PR review.
cfallin created PR review comment:
OK, yup, all of that makes sense -- removed the
unsafequalifier here. Thanks!
cfallin submitted PR review.
cfallin created PR review comment:
Ah, right, I was misreading the logic in
publishabove for this case -- added the equivalentbail!here too.
cfallin updated PR #12133.
cfallin submitted PR review.
cfallin created PR review comment:
OK, updated to
std::process::aborton failure to re-publish in the drop impl (see 809b8e54f9).I've also updated the docs on
unpublishto state that if it fails, permissions are left as executable. I believe this is the case for the actual virtual memory syscall on all supported platforms -- e.g. ifmprotectfails it will leave the original permissions.I'm curious though, what led you to document in unpublish that if it returns an error that things are left in an indeterminate state?
I suppose I hadn't thought too much about what
mprotectfailure actually meant.
cfallin submitted PR review.
cfallin created PR review comment:
Agreed -- I'll take this on in followup work as I build out more APIs for this.
cfallin commented on PR #12133:
OK, I'm going to go ahead and merge based on Nick's approval here -- thanks Alex and Nick for all the comments!
cfallin has enabled auto merge for PR #12133.
cfallin updated PR #12133.
cfallin commented on PR #12133:
I had to add some icache coherence handling for aarch64 to make macOS/aarch64 happy in CI (curiously did not reproduce locally on my M1 laptop; but there could be several reasons why cache incoherency would show up differently on different uarchs or nondeterministically in general). @fitzgen mind giving fecbc22 a look and re-r+'ing if OK?
cfallin commented on PR #12133:
(That commit should properly fix #3310 once it merges as well.)
fitzgen created PR review comment:
Better to use https://doc.rust-lang.org/nightly/std/primitive.usize.html#method.next_multiple_of than to open-code this stuff so that readers have to double-check their hacker's delight knowledge.
let mut start = (start - 1).next_multiple_of(CACHE_LINE_SIZE); let end = end.next_multiple_of(CACHE_LINE_SIZE);
fitzgen submitted PR review:
New commit LGTM with one nitpick below
fitzgen created PR review comment:
(FWIW, I verified that LLVM can recreate the hacker's delight style solution for calls to these methods with constant powers-of-two multiples.)
fitzgen submitted PR review.
cfallin submitted PR review.
cfallin created PR review comment:
Ah, yeah, this is an old idiom I just have stuck in my brain. To illustrate the subtlety (and why I just regurgigate the idiom), though, the snippet in your comment is slightly wrong: it needs to be
start - CACHE_LINE_SIZE + 1).next_multiple_of(CACHE_LINE_SIZE)to round down properly (consider otherwise e.g. an address halfway through a cache-line).I'll update to use the methods but it sure would be nice for there to be a
previous_multiple_oforround_downor something like that too!
cfallin edited PR review comment.
cfallin updated PR #12133.
cfallin has enabled auto merge for PR #12133.
fitzgen submitted PR review.
fitzgen created PR review comment:
Ah yes of course, good eye.
I also often wish that
prev_multiple_ofexisted...
fitzgen submitted PR review.
fitzgen created PR review comment:
There is some discussion of it in https://github.com/rust-lang/rust/issues/88581 fwiw, but it also looks like a bit of a mess...
cfallin merged PR #12133.
Last updated: Dec 13 2025 at 19:03 UTC