wasmtime / PR #12051 Debugging: implement private copies ... · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / PR #12051 Debugging: implement private copies ...

Wasmtime GitHub notifications bot (Nov 20 2025 at 00:36):

cfallin opened PR #12051 from cfallin:you-get-a-code-segment-you-get-a-code-segment-everyone-gets-a-code-segment to bytecodealliance:main:

This will be needed to allow code-patching for our basic breakpoint strategy (see #11964). The idea is that when we have guest-debugging enabled in an engine's configuration, each store in that engine needs its own copy of each CodeMemory that executes any code with that store so that we can patch breakpoints.

This PR first adds "deep-cloning" to the MmapVec and CodeMemory abstractions: the idea is that when possible, we re-map the underlying file with another private mapping, and otherwise, in the worst case (when we don't know the underlying file or when the image was provided to us as an external slice), we make a full copy into a new allocation.

It then updates the per-store registry to track private copies of each CodeMemory, identified by CompiledModuleId, and deep-clone when it sees a new one for the first time. This is slightly subtle because of the way that components share a single CodeMemory (and do not have a CompiledModule) for all internal modules.

This CodeMemory is then threaded through to the appropriate places that feed back pointers to the code: e.g., accessors for trampolines and Wasm function pointers require the specific code memory to be specified.

Finally, at the instance level, we keep the instantiated CodeMemory alongside the Module in the ModuleRuntimeInfo.

I went back and forth between a few different ways to plumb all this together. I'm not completely happy with the "give me the CodeMemory you mean" API for getting functions -- that could be a footgun. On the other hand, pushing the cloning upward causes a lot more churn, and bifurcates the core-module vs. component situation more. This still seems reasonably manageable in the sense that only a few specific accessors are affected and they are used only internally, in places where we have the CodeMemory right at hand from the particular instance or from the (updated) "lookup module by PC" helper on the store's registry. Happy to take ideas for a better design here if one occurs to someone!

Wasmtime GitHub notifications bot (Nov 20 2025 at 00:36):

cfallin requested fitzgen for a review on PR #12051.

Wasmtime GitHub notifications bot (Nov 20 2025 at 00:36):

cfallin requested wasmtime-core-reviewers for a review on PR #12051.

Wasmtime GitHub notifications bot (Nov 20 2025 at 01:04):

cfallin updated PR #12051.

Wasmtime GitHub notifications bot (Nov 20 2025 at 01:06):

cfallin updated PR #12051.

Wasmtime GitHub notifications bot (Nov 20 2025 at 01:14):

cfallin updated PR #12051.

Wasmtime GitHub notifications bot (Nov 21 2025 at 15:22):

alexcrichton created PR review comment:

Technically the trampoline accessors don't need a CodeMemory argument, right? We'll only be patching/changing "finished functions" so the new argument here and for wasm->array trampolines below might be able to be dropped?

Wasmtime GitHub notifications bot (Nov 21 2025 at 15:22):

alexcrichton created PR review comment:

Since this doesn't actually use &self at all here any more, maybe this could move to CodeMemory directly?

Wasmtime GitHub notifications bot (Nov 21 2025 at 15:22):

alexcrichton submitted PR review:

I've left a few comments here and there, but a high-level thought I had here is that I was originally hoping that it would be possible to use Arc::get_mut to dynamically prove that it's safe to mutate the memory. In reading over this though we have quite a lot of clones in quite a lot of places so I'm realizing that it may not be quite as applicable as previously thought.

Another high-level thought though is that there's a fair bit of juggling of trying to make sure we're using the right code memory in the right context and it feels relatively cumbersome to maintain that everywhere.

Would it be possible to maybe do the deep clone of-a-sort on the Module itself to avoid having to juggle one-vs-the-other? I'd ideally like to keep the self-contained nature of Module as-is where callers don't have to worry about passing in the right object or making sure they're dynamically selecting the right object. One semi-rough idea is that we could change the representation of Module to be an enum which is either "pointer to the stuff" or "private code plus pointer to the stuff"

Wasmtime GitHub notifications bot (Nov 21 2025 at 15:22):

alexcrichton created PR review comment:

Technically I think this'll be wrong for private code memories? This sould use the "active" code as opposed to the default one, right?

Wasmtime GitHub notifications bot (Nov 21 2025 at 15:22):

alexcrichton created PR review comment:

Although as I say we try to reduce clones... there's another map lookup here for Instance::new-style constructors and then there's also module.clone() below. So maybe we have some refactoring to do to reduce all the clones here...

Wasmtime GitHub notifications bot (Nov 21 2025 at 15:22):

alexcrichton created PR review comment:

Similar comment to components above -- I think this new argument to register_module can be dropped and the engine can be derived from the module itself

Wasmtime GitHub notifications bot (Nov 21 2025 at 15:22):

alexcrichton created PR review comment:

To avoid having to re-guess what the alignment is here, could the alignment be passed in to more constructors?

Wasmtime GitHub notifications bot (Nov 21 2025 at 15:22):

alexcrichton created PR review comment:

Given that this is the hot path of instantiation I'd like to try to keep clones out of this path where possible. For this I think the engine is reachable via the component in addition to the store, so could the new engine argument to register_component come from component.engine() instead? (may not end up needing a new argument in that case)

Wasmtime GitHub notifications bot (Nov 21 2025 at 16:33):

cfallin commented on PR #12051:

Thanks for your comments @alexcrichton! A high-level discussion point:

Another high-level thought though is that there's a fair bit of juggling of trying to make sure we're using the right code memory in the right context and it feels relatively cumbersome to maintain that everywhere.

Would it be possible to maybe do the deep clone of-a-sort on the Module itself to avoid having to juggle one-vs-the-other? I'd ideally like to keep the self-contained nature of Module as-is where callers don't have to worry about passing in the right object or making sure they're dynamically selecting the right object. One semi-rough idea is that we could change the representation of Module to be an enum which is either "pointer to the stuff" or "private code plus pointer to the stuff"

I think this is the ModuleRuntimeInfo with my changes -- it goes from a simple Module to a Module plus Arc<CodeMemory>.

In a little more detail: I did explore the "clone all the way up" approach before taking the current direction, and the main issue I came across was that (to say it bluntly) "it's a lie": in the presence of introspection APIs, e.g. the ability to get a module from an Instance handle (and for one to get all instances via debugging introspection APIs), we want the same module to be the same. It also adds a lot more cloning cost obviously, which can be mitigated with a bunch of internal Arcs, but that's not great either. (It puts most of the cost in the debugging case rather than base case, but does mean there are more indirections in the base case.)

My thought was that the ModuleRuntimeInfo would be the place where a module is joined together with a particular copy of its code. The most proper ("most truthful") refactor from there would be to remove all code-related accessors from Module and put them on ModuleRuntimeInfo instead, but that's also a large refactor that affects a bunch of callsites. Actually now that I think about it, though -- I had been worried about the accessors that fetch data from the image rather than return pointers to code (Wasm data, exception table, ...) and what they would do, but we could create an ad-hoc ModuleRuntimeInfo at that point. Or describe the abstraction as a new ModuleWithCode, replace ModuleRuntimeInfo::Module's payload with that, and hang all the accessors that need code off ModuleWithCode. Then we can get a "module with default code" from the Module itself if we know we aren't going to return pointers to it.

What do you think?

Wasmtime GitHub notifications bot (Nov 21 2025 at 17:10):

cfallin commented on PR #12051:

Then we can get a "module with default code" from the Module itself if we know we aren't going to return pointers to it.

And, actually, as a refinement on that: we could do some type-system shenanigans so that a "module with default code" is not allowed to return raw code pointers, only other (meta)data from the image, while a "module with code" obtained from an instance is.

Wasmtime GitHub notifications bot (Nov 21 2025 at 17:13):

fitzgen commented on PR #12051:

but that's also a large refactor that affects a bunch of callsites.

But it also sounds really nice...

Spitballing without having dug into the PR yet: is it possible that some lsp actions could help make this mass refactor easier?

Wasmtime GitHub notifications bot (Nov 21 2025 at 17:20):

cfallin commented on PR #12051:

Yeah, I guess "large refactor that affects a lot of callsites" was my reason for not doing it so far, but I'm happy to do it to get the right conceptual model.

To be clear, that's ModuleWithCode and moving image-related accessors onto that (and getting at them from the ModuleRuntimeInfo), not clone-Module-all-the-way-up. @alexcrichton does this sound good to you? I want to make sure you're onboard before I spend a day doing this :-)

Wasmtime GitHub notifications bot (Nov 21 2025 at 18:32):

fitzgen created PR review comment:

I originally expected this to be an assert that the entry was not occupied, but I double checked and we do in fact potentially register the same code multiple times (e.g. we register a module's code on each Instance::new call and you can obviously instantiate the same module multiple times in the same store).

This subtlety pushes me towards asking for some kind of assertion that the existing private code object matches our code_object parameter, but I don't see how that is really possible without a full byte-for-byte comparison of the whole mmap, which seems like it is probably too expensive.

So ultimately maybe we just expand this method's doc comment to note that the method is idempotent and that modules will be registered multiple times in certain scenarios? That reminder/context probably would have let me avoid going down this rabbit hole.

Or maybe we should invert some logic here, and always use self.private_code (after renaming it to something like self.registered_code) so that we can super easily tell if a code object is already registered and return early if so, without needing to actually do the careful text range queries below? I think this inversion would make the idempotency logic much clearer. Thoughts?

Wasmtime GitHub notifications bot (Nov 21 2025 at 18:32):

fitzgen submitted PR review:

To briefly repeat some comments from DM: I am in favor of the refactor discussed in this PR's main thread where

we create a type separation between the shared code at the engine level and the (potentially private copy) code used at the store level

all accessors for function pointers and such move to the store-level code type (or a Module and store-level code pair type)

the act of creating the store-level code is the thing one place we are making a private copy or reusing the shared copy

bikeshedding names for these two new types:

EngineCode and StoreCode

CodeModule and CodeInstance

CodeFactory and CodeInstance lol

Wasmtime GitHub notifications bot (Nov 21 2025 at 18:32):

fitzgen created PR review comment:

We should probably have tests for non-aligned inputs to deserialize_raw if we don't already.

But also, this seems like the kind of thing where we can simply update the invariants of deserialize_raw if it becomes overly constraining. shrug

Wasmtime GitHub notifications bot (Nov 21 2025 at 18:32):

fitzgen created PR review comment:

I know this bit might go away after the refactoring we have been discussing in the main comment thread for this PR, but if not:

It would be good to clarify why it is okay to do this for trampolines to host functions (we never have private copies of host functions so we don't need to worry about which copy of the code we are using) because it is not okay to use the default code memory for trampolines to Wasm functions (and a reader might otherwise mistakenly believe it is okay to use the default code memory for all trampolines, without the suggested context added to the comment).

Wasmtime GitHub notifications bot (Nov 26 2025 at 23:28):

cfallin commented on PR #12051:

Just an update on progress here: I'm midway through the refactor, and it's proving quite a lot more invasive than I had hoped. On the positive side, I've worked out a way to have truly unique ownership of a StoreCode, so mutation can be proved to be safe given a mut borrow of the Store. But in terms of impact, this is very deeply invasive, and I worry that (for example) indirect function calls are going to be penalized: lazily setting up the funcref now requires looking up the StoreCode for the module in question, so rather than a few fast array accesses in the CompiledModule, we have a BTree lookup. The impact on the guest profiler is pretty deep (it handles all its metadata regarding PCs at the module level) and InstancePre (though the latter may be OK only if we allow executing Wasm-to-array trampolines in the engine code rather than the store code, allowing pre-resolution of host funcrefs; this breaks a desirable invariant that we never execute engine code directly though).

I'll keep plugging away at this, but just wanted to surface the cost of the above requested refactor :-) I will say that with the types as now defined, I feel much more reasonable about having not missed any place that would execute the wrong copy of code.

Wasmtime GitHub notifications bot (Nov 29 2025 at 23:53):

cfallin updated PR #12051.

Wasmtime GitHub notifications bot (Nov 29 2025 at 23:54):

cfallin submitted PR review.

Wasmtime GitHub notifications bot (Nov 29 2025 at 23:54):

cfallin created PR review comment:

Yes, the logic here is pre-existing and a little weird, but appears to be necessary because of the way that modules within a component share one compiled artifact (or at least, without a deeper refactor of the whole components runtime). I've added more doc-comments in the refactor on the ModuleRegistry that hopefully address this (most of this file is new now as well).

Wasmtime GitHub notifications bot (Nov 29 2025 at 23:56):

cfallin submitted PR review.

Wasmtime GitHub notifications bot (Nov 29 2025 at 23:56):

cfallin created PR review comment:

Yep, I've added some more comments to this effect, and also had to add a store_invariant_func(..) accessor and a notion of FuncKey::is_store_invariant to make this concept explicit. I've documented it more in the appropriate comment locations as well as the new commit message -- basically the structure of the component runtime and also InstancePre both require us to use engine-wide trampoline pointers in some places currently, and that's fine for trampolines that go out of Wasm (Wasm-to-array and Wasm-to-builtin).

Wasmtime GitHub notifications bot (Nov 29 2025 at 23:57):

cfallin edited PR #12051:

This will allow patching code to implement e.g. breakpoints. (That is, for now the copies are redundant, but soon they will not be.)

This change follows the discussion [here] and offline to define a few types that better encapsulate the distinction we want to enforce. Basically, there is almost never a bare CodeMemory; they are always wrapped in an EngineCode or StoreCode, the latter being a per-store instance of the former. Accessors are moved to the relevant place so that, for example, one cannot get a pointer to a Wasm function's body without being in the context of a Store where the containing module has been registered. The registry then returns a ModuleWithCode that boxes up a Module reference and StoreCode together for cases where we need both the metadata from the module and the raw code to derive something.

The only case where we return raw code pointers to the EngineCode directly have to do with Wasm-to-array trampolines: in some cases, e.g. InstancePre pre-creating data structures with references to host functions, it breaks our expected performance characteristics to make the function pointers store-specific. This is fine as long as the Wasm-to-array trampolines never bake in direct calls to Wasm functions; the strong invariant is that Wasm functions never execute from EngineCode directly. Some parts of the component runtime would also have to be substantially refactored if we wanted to do away with this exception.

The per-Store module registry is substantially refactored in this PR. I got rid of the modules-without-code distinction (the case where a module only has trampolines and no defined functions still works fine), and organized the BTreeMaps to key on start address rather than end address, which I find a little more intuitive (one then queries with the dual to the range -- 0-up-to-PC and last entry found).

[here]: https://github.com/bytecodealliance/wasmtime/pull/12051#pullrequestreview-3493711812

Last updated: Feb 24 2026 at 04:36 UTC