In the interest of helping move the story around wasip3 forward, I'd like to push a bit on the guest-language side of things. Currently Wizer can be an important component of Rust/Go/JS initialization, and component-init is the equivalent for componentize-py. JS is expected to start requiring component-init as it migrates to wit-dylib, which componentize-py will also move to (componentize-py will still use component-init).
Currently component-init lives in a repo under @Joel Dice's github account and @Pat Hickey has a PR to merge it into Wizer. I'd like to propose that we go a step further and merge both projects into the Wasmtime repository itself. My initial proposal would be a wasmtime-wizer crate and a wasmtime wizer subcommand. The goals of this merge would be:
wasmtime wizer would effectively share all the same CLI arguments as wasmtime run, and it's easy to share implementations from within Wasmtime itself.
After reviewing wizer and component-init, I'm envisioning that a few changes would be made along the way. Nothing ground-shatteringly different, but still worth noting:
memory.init because data segments change after initialization. While memory.init is valid during initialization, its meaning data-segment-wise would change afterwards. LLVM-based toolchains don't emit memory.init, though.
ref.null, table.size, table.get, etc. I believe this is an artifact of "let's enable reference-types but just the call-overlong-leb bit", but I think it's worth being more precise about which instructions are allowed or not.
Those are the things I can remember off the top of my head, at least after reviewing these codebases. I suspect further changes/tweaks might be necessary.
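One possible reading of these instruction rules, written as a check over a hypothetical, simplified instruction enum (`Instr` and `check_instr` are illustrative names, not Wizer's or wasmparser's API). It rejects `memory.init` and, as an assumption here, every table-mutating instruction, while keeping the read-only `ref.null`, `table.size`, and `table.get` allowed; exactly where that line falls is the open question above.

```rust
/// Hypothetical, simplified stand-in for a Wasm operator; a real tool would
/// match on a parser's operator type instead.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Instr {
    MemoryInit,
    TableGet,
    TableSet,
    TableSize,
    TableGrow,
    TableFill,
    TableCopy,
    TableInit,
    RefNull,
    Other,
}

/// One possible policy (an assumption, not Wizer's actual rules): reject
/// `memory.init` and every table mutation, allow read-only instructions.
pub fn check_instr(instr: Instr) -> Result<(), &'static str> {
    match instr {
        // The snapshot rewrites data segments, so `memory.init` would copy
        // different bytes after initialization than it did during it.
        Instr::MemoryInit => Err("memory.init is not allowed"),
        // Tables are only snapshot-safe if element segments can be replayed
        // as-is, which assumes the module never mutates a table.
        Instr::TableSet
        | Instr::TableGrow
        | Instr::TableFill
        | Instr::TableCopy
        | Instr::TableInit => Err("table mutation is not allowed"),
        // Read-only reference/table instructions and everything else.
        Instr::TableGet | Instr::TableSize | Instr::RefNull | Instr::Other => Ok(()),
    }
}
```

A validator would run `check_instr` over every function body and report the first error.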
Ok, that's a lot of words and exposition, but I wanted to double-check that others are ok with this as well. There's been no activity on the issue I opened in April, and my impression is that folks would prefer to have Wizer taken care of by someone else, so I wanted to provide an opportunity for any objections to be raised before committing to too much.
Thanks Alex, that sounds like a good plan to me. It's probably worthwhile to merge component-init and wizer as part of the changes that happen along the way; much of the implementation can be shared between the two.
We'll want to reject globals of any reference type; these basically can't be snapshotted. Tables of non-funcref reference types also need to be disallowed, and tables of funcref are allowed only because element segment initialization is always replayed and the tables are known not to be mutated in the body of the module.
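A minimal sketch of these type rules as a static check over declared globals and tables. `ValType`, `RefType`, `check_global`, and `check_table` are hypothetical stand-ins, not any real crate's types.

```rust
/// Hypothetical, simplified reference types for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RefType {
    FuncRef,
    ExternRef,
    AnyRef,
}

/// Hypothetical, simplified value types for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ValType {
    I32,
    I64,
    F32,
    F64,
    V128,
    Ref(RefType),
}

/// Globals of any reference type are rejected: their values can't be
/// written back out as constant initializers.
pub fn check_global(ty: ValType) -> Result<(), String> {
    match ty {
        ValType::Ref(r) => Err(format!("global of type {:?} cannot be snapshotted", r)),
        _ => Ok(()),
    }
}

/// Only funcref tables are accepted; they work because element segments are
/// replayed unchanged and the module never mutates the table.
pub fn check_table(elem: RefType) -> Result<(), String> {
    match elem {
        RefType::FuncRef => Ok(()),
        other => Err(format!("table of {:?} cannot be snapshotted", other)),
    }
}
```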
When you say "reject" you mean report an error if any of these are around at the time of snapshotting, right? I.e., it'd be perfectly fine to use reference types, as long as they're all dropped when the snapshot is taken? And same for Component Model resource handles?
Also, I guess this means there's no story for GC, right? Is it even possible to snapshot those, even theoretically, or is that not viable within wasm semantics?
just a small note, at least I currently rely on the @bytecodealliance/wizer npm package, would be nice if those were still released, but otherwise the change sounds great
Is it even possible to snapshot those, even theoretically, or is that not viable within wasm semantics?
There's no way to write constant initializer expressions that construct an arbitrary object graph with cycles, if I understand correctly (p. 73 of https://webassembly.github.io/spec/versions/core/WebAssembly-3.0-draft.pdf does show struct.new*, array.new*, etc as valid constant initializers, assuming their field values/operands are constant, but it's noted there that initializers for a global can only refer to previously defined globals, so there's no way to build a cycle).
A way around this would be to write out a sequence of initialization code that rebuilds the object graph, of course; either all objects for simplicity, or only "tying the knot" on the residual cycles/forward refs if we analyze for that. Then prepend this code to the start function (or add one).
I don't think any of this would require adding mutability to globals or fields, since the original program to construct the cycle would already require that mutability to construct the original graph we're snapshotting.
That's my quick analysis anyway, not to say any of this is easy only that it seems possible with enough work :-)
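A rough sketch of the "tying the knot" variant, assuming the snapshot has already been reduced to a hypothetical `Graph` of object indices and reference fields (`Graph`, `Patch`, and `plan` are illustrative names; emitting the actual `struct.new`/`struct.set` instructions is left out). Objects are emitted in DFS post-order, so most references point at earlier globals and fit in constant initializers; an edge back to an object still on the DFS stack closes a cycle, so it starts out null and is recorded as a patch for the injected init code. Post-order is one simple choice and doesn't necessarily minimize the number of patches.

```rust
/// Hypothetical snapshot of a GC heap: `fields[i]` holds the reference fields
/// of object `i`, each either `Some(target_object)` or `None` for null.
pub struct Graph {
    pub fields: Vec<Vec<Option<usize>>>,
}

/// A write the injected init function performs after all objects exist:
/// set reference field `field` of object `obj` to object `target`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Patch {
    pub obj: usize,
    pub field: usize,
    pub target: usize,
}

#[derive(Clone, Copy, PartialEq)]
enum State {
    Unvisited,
    OnStack,
    Done,
}

fn visit(g: &Graph, i: usize, state: &mut [State], order: &mut Vec<usize>, patches: &mut Vec<Patch>) {
    state[i] = State::OnStack;
    for (field, target) in g.fields[i].iter().enumerate() {
        if let Some(t) = *target {
            match state[t] {
                // Emit the target first so this field can reference an
                // already-defined global in a constant initializer.
                State::Unvisited => visit(g, t, state, order, patches),
                // `t` is still being emitted (an ancestor, or `i` itself), so
                // this edge closes a cycle: initialize null, patch later.
                State::OnStack => patches.push(Patch { obj: i, field, target: t }),
                // Already emitted earlier; nothing to do.
                State::Done => {}
            }
        }
    }
    state[i] = State::Done;
    order.push(i);
}

/// Returns the order in which to emit one global per object, plus the
/// patches needed to tie cyclic references after construction.
pub fn plan(g: &Graph) -> (Vec<usize>, Vec<Patch>) {
    let n = g.fields.len();
    let mut state = vec![State::Unvisited; n];
    let mut order = Vec::with_capacity(n);
    let mut patches = Vec::new();
    for i in 0..n {
        if state[i] == State::Unvisited {
            visit(g, i, &mut state, &mut order, &mut patches);
        }
    }
    (order, patches)
}
```

For a two-object cycle `0 -> 1 -> 0`, this emits object 1 (with a null field), then object 0 pointing at it, and one patch setting object 1's field back to object 0. The patches only write fields the original program already had to make mutable to build the cycle.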
thanks, that's very helpful. My hunch would be that only the "tying the knot" version of this would lead to viable performance, but I might be wrong about that
My thinking was that tables-of-references and globals-of-references would be outright rejected at validation time. No escape hatch for "ok well you didn't modify it so that's ok". The main reason for me is that with the component model it's not actually possible to look at the value of any reference-typed global or table.
The reason function tables sort-of-work is that we don't modify any element segments and all table mutations are forbidden (e.g. table.set will be disallowed and you can't mutate items you fetch via table.get since they're funcref). Tables or globals of anyref could, for example, have mutated contents after you get them and there'd be no way to observe that across a component boundary.
We could consider relaxing these restrictions for the core wasm version of initialization, where we can actually witness the shape of object graphs and such, though. My thinking was that this probably wasn't necessary, so we'd disallow it anyway. Otherwise, Chris's idea is the best I'd have. All start functions are already removed, so we'd basically just be replacing the start function, if any, and that would replay all mutations that we couldn't stick in constant initializers.
just a small note, at least I currently rely on the @bytecodealliance/wizer npm package, would be nice if those were still released, but otherwise the change sounds great
Thanks for highlighting that! I've got no experience setting up npm publishing from CI, much less from the Wasmtime repo, so this'll be interesting...
Alex Crichton said:
There's been no activity on the issue I opened in April
Huh, I never saw that issue, sorry!
In general I'm in favor.
Have you thought at all about how to expose Wizer's programmatic API from Wasmtime's API?
regarding snapshotting references and mutating tables, we have an issue open with some brainstormed ideas from a while ago: https://github.com/bytecodealliance/wizer/issues/29
straightline code on an injected __init_gc_heap function or whatever to create the initial heap graphs is also what I have been imagining for GC snapshots.
but given that this tool is effectively becoming wasmtime-specific anyways, we could potentially have a custom section that encodes the heap graph, and then compilation to a .cwasm could create a GC heap CoW image from that custom section or something. Or we could use the custom section only as a hint to analyze the entry points and check that the init-gc-heap function is the first thing called on all paths, and if so, do compile-time evaluation of the init-gc-heap function, make a CoW image from that, and then remove the calls to the init-gc-heap function from the .cwasm
but this stuff can all be discussed after the initial merge into wasmtime
FWIW, "GC object initializers including cycles" seems like a reasonable topic to bring up at the CG -- for parity with linear memories, the ability to set up the Wasm module state to any state reachable at runtime seems like a clearly missing capability...
(I realize "let's standardize it!" is not at all the shortest path here but just adding my voice in favor of that route)
(also, in my infinite free cycles, I'd be interested in helping to explore what that would look like; maybe something like rec-groups for types? each "GC data segment" is a letrec?)
Chris Fallin said:
A way around this would be to write out a sequence of initialization code that rebuilds the object graph, of course; either all objects for simplicity, or only "tying the knot" on the residual cycles/forward refs if we analyze for that. Then prepend this code to the start function (or add one).
This seems like the most pragmatic, runtime-agnostic option to me (at least until there's a standard, declarative way to do it). component-init already removes any and all start functions as part of its job; adding a new one just for building (or adding cycles to) an object graph seems reasonable.
Standards-wise, we're also likely going to want to figure out a way to plumb reference types through the component model WIT boundary, or perhaps make all this wasmtime-internal; either way, we need some way of dealing with reference types and components
I agree that having good standards support for this is very important, yes. In the meantime, in addition to the options laid out above, an option could be to make snapshotting emit cwasm files, not wasm, and do whatever works best inside Wasmtime, i.e., snapshot the GC heap as effectively a linear memory
we can't literally snapshot the GC heap's linear memory because it contains VMSharedTypeIndex values, which will differ between Engines at runtime (depending on the order that modules are loaded/unloaded and all that)
we could try to factor those out to a side table somehow, and use module type indices instead, but then casts and subtyping and everything would get slower, especially when linking modules that use the same GC types together
ah, that makes sense. And building a patch table wouldn't work?
(and also instance IDs in exception objects' headers, because of the generativity)
That said, it does seem desirable to make the GC heap image a pure function of the Wasm execution only, just for the CoW startup advantages, especially in a future where many more languages use GC...
Till Schneidereit said:
ah, that makes sense. And building a patch table wouldn't work?
we could always do that, but every single object's header would need patching, so I doubt we would actually gain much in practice (it would be rare that a page wouldn't get patched, forcing a copy)
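A back-of-envelope sketch of that concern: if every object header has to be rewritten at load time, a page stays shared only if it contains no object header at all. `dirtied_pages` is a hypothetical helper that assumes headers never straddle a page boundary.

```rust
use std::collections::BTreeSet;

/// Counts how many pages of a heap image would be dirtied (and therefore
/// copied under CoW) if the header at the start of every object had to be
/// patched at load time. `object_offsets` are byte offsets of object headers.
pub fn dirtied_pages(object_offsets: &[u64], page_size: u64) -> usize {
    assert!(page_size > 0);
    object_offsets
        .iter()
        .map(|off| off / page_size) // page index containing this header
        .collect::<BTreeSet<_>>()   // deduplicate pages
        .len()
}
```

For example, with 32-byte objects packed into a 64 KiB heap and 4 KiB pages, all 16 pages get dirtied, so the CoW image saves essentially nothing.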
Chris Fallin said:
(and also instance IDs in exception objects' headers, because of the generativity)
ah yes, good point. not used to thinking about exceptions yet!
Chris Fallin said:
That said, it does seem desirable to make the GC heap image a pure function of the Wasm execution only, just for the CoW startup advantages, especially in a future where many more languages use GC...
yeah if we can figure out how to keep ref.{test,cast} and br_on_cast[_fail] fast, that would be fantastic.
I guess the common case is same-module stuff, so maybe we could replace the VMSharedTypeIndex in the object header with (ModuleId, ModuleInternedTypeIndex) packed as a u64. The fast path goes from a pair of u32 loads and a comparison to a pair of u64 loads and a comparison, and it doesn't handle cross-module compares anymore. Then we have a series of side tables that JIT code can access to go from (ModuleId, ModuleInternedTypeIndex) to VMSharedTypeIndex at runtime, or something like that. So with this scheme, every GC object gets a little bigger, and our fast path is a tiny bit slower in theory (maybe not actually in practice?) and handles slightly fewer cases, but we gain snapshot-ability. Seems not too bad
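A sketch of the packed-header idea under the stated assumptions: module ID in the high 32 bits, module-interned type index in the low 32 bits, and a side table to an engine-wide 32-bit index standing in for `VMSharedTypeIndex`. `PackedTypeId` and `SharedTypes` are hypothetical names; only exact-type equality is shown, not subtyping or the JIT-side lookup.

```rust
use std::collections::HashMap;

/// Hypothetical header tag: (module ID, module-interned type index) packed
/// into one u64 so the fast-path comparison is a single 64-bit compare.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct PackedTypeId(pub u64);

impl PackedTypeId {
    pub fn new(module: u32, ty: u32) -> Self {
        PackedTypeId((u64::from(module) << 32) | u64::from(ty))
    }
    pub fn module(self) -> u32 {
        (self.0 >> 32) as u32
    }
    pub fn ty(self) -> u32 {
        self.0 as u32
    }
}

/// Side table from packed tags to an engine-wide type index (a stand-in for
/// `VMSharedTypeIndex`, assumed here to be 32 bits).
pub struct SharedTypes {
    pub map: HashMap<PackedTypeId, u32>,
}

impl SharedTypes {
    pub fn same_type(&self, a: PackedTypeId, b: PackedTypeId) -> bool {
        // Fast path: identical tags mean identical types (the common
        // same-module case).
        if a == b {
            return true;
        }
        // Slow path: different tags may still name the same type, e.g. one
        // defined in two modules; compare engine-wide indices instead.
        match (self.map.get(&a), self.map.get(&b)) {
            (Some(x), Some(y)) => x == y,
            _ => false,
        }
    }
}
```

Because the header no longer depends on engine-wide registration order, a heap image built from these tags is a function of the Wasm execution and the modules involved.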
first step is here: https://github.com/bytecodealliance/wasmtime/pull/11805
(ModuleId, ModuleInternedTypeIndex) packed as a u64
Do we need that much index space for both of these? As in, could we get by with using 16 bit for each, or something a bit more annoying like 14 or 12 for module IDs and the rest for type indices?
Details aside, this makes sense to me
probably we could? but also the existing system assumes 32 bits and it might be annoying to track down all the places we make those assumptions. but maybe not, worth trying
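To put numbers on the narrower split, here's a hypothetical `pack_u32`/`unpack_u32` pair with a configurable number of module-ID bits. A 16/16 split allows 65,536 modules with 65,536 types each; 12/20 allows 4,096 modules with 1,048,576 types each; 14/18 allows 16,384 modules with 262,144 types each.

```rust
/// Packs (module ID, type index) into a u32 using `module_bits` high bits for
/// the module ID and the remaining low bits for the type index. Returns None
/// if either value doesn't fit in its field.
pub fn pack_u32(module_bits: u32, module: u32, ty: u32) -> Option<u32> {
    assert!(module_bits > 0 && module_bits < 32);
    let type_bits = 32 - module_bits;
    if module >= (1u32 << module_bits) || ty >= (1u32 << type_bits) {
        return None;
    }
    Some((module << type_bits) | ty)
}

/// Inverse of `pack_u32` for the same `module_bits`.
pub fn unpack_u32(module_bits: u32, packed: u32) -> (u32, u32) {
    assert!(module_bits > 0 && module_bits < 32);
    let type_bits = 32 - module_bits;
    (packed >> type_bits, packed & ((1u32 << type_bits) - 1))
}
```

The overflow checks are the part that would need a real answer: something has to happen when an engine loads its 4,097th module under a 12-bit split.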
Ok Wizer is now merged by merging two unrelated histories, and a subsequent PR has landed showing repo CI is still green.
If any issues crop up, feel free to ping me. Fingers crossed that my assertion that bisection/blame all work well will hold up over time...
Thanks Alex!
https://github.com/bytecodealliance/wasmtime/pull/11878 is the next major step for component-init
Ok this is now more-or-less done. Wizer's README is updated to point to wasmtime and the source code of wizer itself now uses wasmtime (git dep for now). I've got a PR for component-init too.
I think that wasmtime-wizer is at feature parity with both component-init and wizer, so anything remaining is a bug I'm not aware of
Alex Crichton said:
I think that wasmtime-wizer is at feature parity with both component-init and wizer
Does that include the module index mapping hack we talked about which componentize-py uses, or are we hoping to make that obsolete by way of improved component support in wasmtime-py?
I'm hoping that no one ever learns about that but you and me so I can continue to lie and say there's feature parity
well, I was hoping that, not any more lol
but yeah I'm hoping to obsolete it
what about the guest crate for component-init?
it should be renamed from component-init to wizer probably?
that's true, the guest side of tooling isn't fleshed out yet
the init function was also renamed to wizer-initialize to avoid ambiguity
yeah, that's a good name
also to be the same as core wasm which is also now wizer-initialize (to be the same as components)
related: if we want to have library code which can take advantage of wizer, can we allow it to be wizer-initialize-*?
naively I'd expect that to be a C++-like ctor/initializer ideally
wizer-init? initialize sure is a lot of syllables... :slight_smile:
what do you mean by C++-like ctor/initializer?
e.g. _initialize basically
the old wasip1 convention which wasm-tools component new recognizes
i don't think i follow
Lann Martin said:
wizer-init? initialize sure is a lot of syllables... :slight_smile:
You can change the default when invoking wasmtime wizer, fwiw
sry heading out to dinner now, I'll respond more later though
Ok, to expand on _initialize more: for C++, for example, when compiled with -mexec-model=reactor, all C++ initializers (e.g. constructors for statics) are executed from _initialize, and the linker is responsible for synthesizing __wasm_call_ctors, which calls the constructor functions from all libraries. That is then executed during wizer-initialize, since it's an export and wasm-tools component new recognizes _initialize and runs it. Effectively, __wasm_call_ctors should be run during initialization, and IMO it is a better vector for library-based initialization.
fwiw I always just call __wasm_call_ctors() from wizer.initialize before doing anything else since it doesn't hurt to call it if there are no ctors
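For a Rust guest, that pattern might look like the sketch below. It assumes the `wizer-initialize` export name discussed above, gates the `__wasm_call_ctors` declaration on `target_arch = "wasm32"` so the same code also builds on a host, and uses a hypothetical `INITIALIZED` flag to stand in for the guest's own setup. The `#[unsafe(export_name = ...)]` and `unsafe extern` syntax needs Rust 1.82 or newer.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical stand-in for the guest's own initialization work.
pub static INITIALIZED: AtomicBool = AtomicBool::new(false);

#[cfg(target_arch = "wasm32")]
fn run_ctors() {
    unsafe extern "C" {
        // Synthesized by wasm-ld; runs the static constructors collected
        // from all linked libraries (e.g. C++ global constructors).
        fn __wasm_call_ctors();
    }
    unsafe { __wasm_call_ctors() }
}

// Host builds have no linker-synthesized constructor function.
#[cfg(not(target_arch = "wasm32"))]
fn run_ctors() {}

/// Exported under the default name the snapshotting tool looks for.
#[unsafe(export_name = "wizer-initialize")]
pub extern "C" fn wizer_initialize() {
    // Run linker-collected constructors first; per the note above this is
    // harmless when there are none.
    run_ctors();
    INITIALIZED.store(true, Ordering::SeqCst);
}
```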
I didn’t know this worked. That’s super helpful
Is there enough exposed so that wasmtime wizer can run _initialize and nothing else if there’s no wizer-initialize export?
For core wasm there's support for specifying the initialization function as _initialize, but for components there's no support just yet for "just instantiate this component, don't actually invoke anything"
right but is there anything that stops us from doing that besides adding the right logic to wasmtime wizer itself?
basically what I'm looking for is that, if I'm producing components and intend to wizer them, I can stop using the #[component-init] proc macro and just use #[ctor] instead, and not lose anything by doing so
yeah it would be pretty easy to configure wizer to just instantiate and do nothing else
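A hypothetical fallback policy along those lines: call the configured export if present, else `_initialize`, else just instantiate and call nothing. `InitAction` and `choose_init` are illustrative names, not flags or APIs of `wasmtime wizer`, and as noted above the instantiate-only path isn't supported for components yet.

```rust
/// What the snapshotting tool should do during initialization.
#[derive(Debug, PartialEq, Eq)]
pub enum InitAction<'a> {
    /// Invoke this export, then snapshot.
    Call(&'a str),
    /// Only instantiate, invoke nothing, then snapshot.
    InstantiateOnly,
}

/// Prefer the configured export, fall back to the WASI reactor
/// `_initialize` convention, otherwise just instantiate.
pub fn choose_init<'a>(exports: &[&'a str], preferred: &'a str) -> InitAction<'a> {
    if exports.contains(&preferred) {
        InitAction::Call(preferred)
    } else if exports.contains(&"_initialize") {
        InitAction::Call("_initialize")
    } else {
        InitAction::InstantiateOnly
    }
}
```

With the `_initialize` fallback, a guest that relies only on linker-collected constructors would still have them run before the snapshot.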
now I'll also caution that iunno if #[ctor] supports wasm just yet
I forget if rustc exposes enough things to plumb through to LLVM to support that
it claims to
oh nice
Last updated: Dec 06 2025 at 06:05 UTC