cfallin opened PR #12051 from cfallin:you-get-a-code-segment-you-get-a-code-segment-everyone-gets-a-code-segment to bytecodealliance:main:
This will be needed to allow code-patching for our basic breakpoint strategy (see #11964). The idea is that when we have guest-debugging enabled in an engine's configuration, each store in that engine needs its own copy of each
CodeMemorythat executes any code with that store so that we can patch breakpoints.This PR first adds "deep-cloning" to the
MmapVecandCodeMemoryabstractions: the idea is that when possible, we re-map the underlying file with another private mapping, and otherwise, in the worst case (when we don't know the underlying file or when the image was provided to us as an external slice), we make a full copy into a new allocation.It then updates the per-store registry to track private copies of each
CodeMemory, identified byCompiledModuleId, and deep-clone when it sees a new one for the first time. This is slightly subtle because of the way that components share a single CodeMemory (and do not have a CompiledModule) for all internal modules.This CodeMemory is then threaded through to the appropriate places that feed back pointers to the code: e.g., accessors for trampolines and Wasm function pointers require the specific code memory to be specified.
Finally, at the instance level, we keep the instantiated CodeMemory alongside the Module in the
ModuleRuntimeInfo.I went back and forth between a few different ways to plumb all this together. I'm not completely happy with the "give me the CodeMemory you mean" API for getting functions -- that could be a footgun. On the other hand, pushing the cloning upward causes a lot more churn, and bifurcates the core-module vs. component situation more. This still seems reasonably manageable in the sense that only a few specific accessors are affected and they are used only internally, in places where we have the
CodeMemoryright at hand from the particular instance or from the (updated) "lookup module by PC" helper on the store's registry. Happy to take ideas for a better design here if one occurs to someone!<!--
Please make sure you include the following information:
If this work has been discussed elsewhere, please include a link to that
conversation. If it was discussed in an issue, just mention "issue #...".Explain why this change is needed. If the details are in an issue already,
this can be brief.Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.htmlPlease ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->
cfallin requested fitzgen for a review on PR #12051.
cfallin requested wasmtime-core-reviewers for a review on PR #12051.
cfallin updated PR #12051.
cfallin updated PR #12051.
cfallin updated PR #12051.
alexcrichton created PR review comment:
Technically the trampoline accessors don't need a
CodeMemoryargument, right? We'll only be patching/changing "finished functions" so the new argument here and for wasm->array trampolines below might be able to be dropped?
alexcrichton created PR review comment:
Since this doesn't actually use
&selfat all here any more, maybe this could move toCodeMemorydirectly?
alexcrichton submitted PR review:
I've left a few comments here and there, but a high-level thought I had here is that I was originally hoping that it would be possible to use
Arc::get_mutto dynamically prove that it's safe to mutate the memory. In reading over this though we have quite a lot of clones in quite a lot of places so I'm realizing that it may not be quite as applicable as previously thought.Another high-level thought though is that there's a fair bit of juggling of trying to make sure we're using the right code memory in the right context and it feels relatively cumbersome to maintain that everywhere.
Would it be possible to maybe do the deep clone of-a-sort on the
Moduleitself to avoid having to juggle one-vs-the-other? I'd ideally like to keep the self-contained nature ofModuleas-is where callers don't have to worry about passing in the right object or making sure they're dynamically selecting the right object. One semi-rough idea is that we could change the representation ofModuleto be anenumwhich is either "pointer to the stuff" or "private code plus pointer to the stuff"
alexcrichton created PR review comment:
Technically I think this'll be wrong for private code memories? This sould use the "active" code as opposed to the default one, right?
alexcrichton created PR review comment:
Although as I say we try to reduce clones... there's another map lookup here for
Instance::new-style constructors and then there's alsomodule.clone()below. So maybe we have some refactoring to do to reduce all the clones here...
alexcrichton created PR review comment:
Similar comment to components above -- I think this new argument to
register_modulecan be dropped and the engine can be derived from the module itself
alexcrichton created PR review comment:
To avoid having to re-guess what the alignment is here, could the alignment be passed in to more constructors?
alexcrichton created PR review comment:
Given that this is the hot path of instantiation I'd like to try to keep clones out of this path where possible. For this I think the engine is reachable via the
componentin addition to the store, so could the newengineargument toregister_componentcome fromcomponent.engine()instead? (may not end up needing a new argument in that case)
cfallin commented on PR #12051:
Thanks for your comments @alexcrichton! A high-level discussion point:
Another high-level thought though is that there's a fair bit of juggling of trying to make sure we're using the right code memory in the right context and it feels relatively cumbersome to maintain that everywhere.
Would it be possible to maybe do the deep clone of-a-sort on the Module itself to avoid having to juggle one-vs-the-other? I'd ideally like to keep the self-contained nature of Module as-is where callers don't have to worry about passing in the right object or making sure they're dynamically selecting the right object. One semi-rough idea is that we could change the representation of Module to be an enum which is either "pointer to the stuff" or "private code plus pointer to the stuff"
I think this is the
ModuleRuntimeInfowith my changes -- it goes from a simpleModuleto aModuleplusArc<CodeMemory>.In a little more detail: I did explore the "clone all the way up" approach before taking the current direction, and the main issue I came across was that (to say it bluntly) "it's a lie": in the presence of introspection APIs, e.g. the ability to get a module from an
Instancehandle (and for one to get all instances via debugging introspection APIs), we want the same module to be the same. It also adds a lot more cloning cost obviously, which can be mitigated with a bunch of internalArcs, but that's not great either. (It puts most of the cost in the debugging case rather than base case, but does mean there are more indirections in the base case.)My thought was that the
ModuleRuntimeInfowould be the place where a module is joined together with a particular copy of its code. The most proper ("most truthful") refactor from there would be to remove all code-related accessors fromModuleand put them onModuleRuntimeInfoinstead, but that's also a large refactor that affects a bunch of callsites. Actually now that I think about it, though -- I had been worried about the accessors that fetch data from the image rather than return pointers to code (Wasm data, exception table, ...) and what they would do, but we could create an ad-hocModuleRuntimeInfoat that point. Or describe the abstraction as a newModuleWithCode, replaceModuleRuntimeInfo::Module's payload with that, and hang all the accessors that need code offModuleWithCode. Then we can get a "module with default code" from theModuleitself if we know we aren't going to return pointers to it.What do you think?
cfallin commented on PR #12051:
Then we can get a "module with default code" from the Module itself if we know we aren't going to return pointers to it.
And, actually, as a refinement on that: we could do some type-system shenanigans so that a "module with default code" is not allowed to return raw code pointers, only other (meta)data from the image, while a "module with code" obtained from an instance is.
fitzgen commented on PR #12051:
but that's also a large refactor that affects a bunch of callsites.
But it also sounds really nice...
Spitballing without having dug into the PR yet: is it possible that some lsp actions could help make this mass refactor easier?
cfallin commented on PR #12051:
Yeah, I guess "large refactor that affects a lot of callsites" was my reason for not doing it so far, but I'm happy to do it to get the right conceptual model.
To be clear, that's
ModuleWithCodeand moving image-related accessors onto that (and getting at them from theModuleRuntimeInfo), not clone-Module-all-the-way-up. @alexcrichton does this sound good to you? I want to make sure you're onboard before I spend a day doing this :-)
fitzgen created PR review comment:
I originally expected this to be an assert that the entry was not occupied, but I double checked and we do in fact potentially register the same code multiple times (e.g. we register a module's code on each
Instance::newcall and you can obviously instantiate the same module multiple times in the same store).This subtlety pushes me towards asking for some kind of assertion that the existing private code object matches our
code_objectparameter, but I don't see how that is really possible without a full byte-for-byte comparison of the whole mmap, which seems like it is probably too expensive.So ultimately maybe we just expand this method's doc comment to note that the method is idempotent and that modules will be registered multiple times in certain scenarios? That reminder/context probably would have let me avoid going down this rabbit hole.
Or maybe we should invert some logic here, and always use
self.private_code(after renaming it to something likeself.registered_code) so that we can super easily tell if a code object is already registered and return early if so, without needing to actually do the careful text range queries below? I think this inversion would make the idempotency logic much clearer. Thoughts?
fitzgen submitted PR review:
To briefly repeat some comments from DM: I am in favor of the refactor discussed in this PR's main thread where
- we create a type separation between the shared code at the engine level and the (potentially private copy) code used at the store level
- all accessors for function pointers and such move to the store-level code type (or a
Moduleand store-level code pair type)- the act of creating the store-level code is the thing one place we are making a private copy or reusing the shared copy
bikeshedding names for these two new types:
EngineCodeandStoreCodeCodeModuleandCodeInstanceCodeFactoryandCodeInstancelol
fitzgen created PR review comment:
We should probably have tests for non-aligned inputs to
deserialize_rawif we don't already.But also, this seems like the kind of thing where we can simply update the invariants of
deserialize_rawif it becomes overly constraining. shrug
fitzgen created PR review comment:
I know this bit might go away after the refactoring we have been discussing in the main comment thread for this PR, but if not:
It would be good to clarify why it is okay to do this for trampolines to host functions (we never have private copies of host functions so we don't need to worry about which copy of the code we are using) because it is not okay to use the default code memory for trampolines to Wasm functions (and a reader might otherwise mistakenly believe it is okay to use the default code memory for all trampolines, without the suggested context added to the comment).
cfallin commented on PR #12051:
Just an update on progress here: I'm midway through the refactor, and it's proving quite a lot more invasive than I had hoped. On the positive side, I've worked out a way to have truly unique ownership of a
StoreCode, so mutation can be proved to be safe given a mut borrow of theStore. But in terms of impact, this is very deeply invasive, and I worry that (for example) indirect function calls are going to be penalized: lazily setting up the funcref now requires looking up theStoreCodefor the module in question, so rather than a few fast array accesses in theCompiledModule, we have a BTree lookup. The impact on the guest profiler is pretty deep (it handles all its metadata regarding PCs at the module level) andInstancePre(though the latter may be OK only if we allow executing Wasm-to-array trampolines in the engine code rather than the store code, allowing pre-resolution of host funcrefs; this breaks a desirable invariant that we never execute engine code directly though).I'll keep plugging away at this, but just wanted to surface the cost of the above requested refactor :-) I will say that with the types as now defined, I feel much more reasonable about having not missed any place that would execute the wrong copy of code.
cfallin updated PR #12051.
cfallin submitted PR review.
cfallin created PR review comment:
Yes, the logic here is pre-existing and a little weird, but appears to be necessary because of the way that modules within a component share one compiled artifact (or at least, without a deeper refactor of the whole components runtime). I've added more doc-comments in the refactor on the
ModuleRegistrythat hopefully address this (most of this file is new now as well).
cfallin submitted PR review.
cfallin created PR review comment:
Yep, I've added some more comments to this effect, and also had to add a
store_invariant_func(..)accessor and a notion ofFuncKey::is_store_invariantto make this concept explicit. I've documented it more in the appropriate comment locations as well as the new commit message -- basically the structure of the component runtime and alsoInstancePreboth require us to use engine-wide trampoline pointers in some places currently, and that's fine for trampolines that go out of Wasm (Wasm-to-array and Wasm-to-builtin).
cfallin edited PR #12051:
This will allow patching code to implement e.g. breakpoints. (That is, for now the copies are redundant, but soon they will not be.)
This change follows the discussion [here] and offline to define a few types that better encapsulate the distinction we want to enforce. Basically, there is almost never a bare
CodeMemory; they are always wrapped in anEngineCodeorStoreCode, the latter being a per-store instance of the former. Accessors are moved to the relevant place so that, for example, one cannot get a pointer to a Wasm function's body without being in the context of aStorewhere the containing module has been registered. The registry then returns aModuleWithCodethat boxes up aModulereference andStoreCodetogether for cases where we need both the metadata from the module and the raw code to derive something.The only case where we return raw code pointers to the
EngineCodedirectly have to do with Wasm-to-array trampolines: in some cases, e.g.InstancePrepre-creating data structures with references to host functions, it breaks our expected performance characteristics to make the function pointers store-specific. This is fine as long as the Wasm-to-array trampolines never bake in direct calls to Wasm functions; the strong invariant is that Wasm functions never execute fromEngineCodedirectly. Some parts of the component runtime would also have to be substantially refactored if we wanted to do away with this exception.The per-
Storemodule registry is substantially refactored in this PR. I got rid of the modules-without-code distinction (the case where a module only has trampolines and no defined functions still works fine), and organized the BTreeMaps to key on start address rather than end address, which I find a little more intuitive (one then queries with the dual to the range -- 0-up-to-PC and last entry found).[here]: https://github.com/bytecodealliance/wasmtime/pull/12051#pullrequestreview-3493711812
Last updated: Dec 06 2025 at 06:05 UTC