Gentle opened PR #12633 from Gentle:feat-savestates to bytecodealliance:main:
Rationale: If you keep the instrumentation information inside the output wasm file, then you can at any time use wizer to snapshot it again, allowing you to effectively save and resume later or on another machine
- adds a `keep_instrumentation` option that preserves `__wizer_*` exports in the output module
- adds a `parse_instrumented` method that re-parses a previously snapshotted module so it can be snapshotted again
- `parser::parse` has been adjusted to either reject or require existing `__wizer_*` exports depending on a new bool arg

`parse_instrumented` skips validating the wasm module; the logic behind this is that if the wasm file was successfully instrumented before, then it must be valid. From micro-benchmarking, `wasm_validate` takes longer than instrumentation itself, so there is a significant speedup from reusing the instrumentation information.
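The round-trip the description aims at can be modelled in a few lines of Rust. This is a self-contained sketch, not wizer's actual API: the `Module` type and the bodies of `snapshot` and `parse_instrumented` below are stand-ins invented for illustration; only the option and method names come from the PR text.

```rust
/// Stand-in for a wasm module; `instrumented` models whether the
/// `__wizer_*` exports are still present. Not wizer's real types.
#[derive(Clone)]
struct Module {
    instrumented: bool,
}

/// Model of snapshotting: with `keep_instrumentation`, the `__wizer_*`
/// exports survive into the output, so it can be snapshotted again.
fn snapshot(module: &Module, keep_instrumentation: bool) -> Module {
    let _ = module;
    Module { instrumented: keep_instrumentation }
}

/// Model of `parse_instrumented`: re-parse a previously snapshotted
/// module, failing if the instrumentation exports were stripped.
fn parse_instrumented(module: &Module) -> Result<Module, &'static str> {
    if module.instrumented {
        Ok(module.clone())
    } else {
        Err("missing __wizer_* exports")
    }
}

fn main() {
    // Snapshot once, keeping the instrumentation in the output...
    let first = snapshot(&Module { instrumented: true }, true);
    // ...so it can be re-parsed and snapshotted again later,
    // possibly on another machine.
    let reparsed = parse_instrumented(&first).expect("still instrumented");
    let second = snapshot(&reparsed, true);
    assert!(second.instrumented);
}
```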
Gentle requested fitzgen for a review on PR #12633.
Gentle requested wasmtime-core-reviewers for a review on PR #12633.
Gentle updated PR #12633.
github-actions[bot] added the label wizer on PR #12633.
github-actions[bot] commented on PR #12633:
Subscribe to Label Action
cc @fitzgen
<details>
This issue or pull request has been labeled: "wizer"
Thus the following users have been cc'd because of the following labels:
- fitzgen: wizer
To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.
Learn more.
</details>
bjorn3 commented on PR #12633:
parse_instrumented skips validating the wasm module, the logic behind this is that if the wasm file was successfully instrumented before, then it must be valid.
Is that safe to do when the wasm module comes from an untrusted source?
Gentle commented on PR #12633:
well no, at that point you are trusting that whoever added the __wizer_* exports did not lie and did validate the wasm before adding them
it's possible that my microbenchmarks were bad, but with a 20MB wasm file, validate, parse, then snapshot was 80% slower than just parse and snapshot. It doesn't break anything to add self.wasm_validate() in parse_instrumented, I just think it's redundant but I'd be happy to implement it differently if there are concerns
Gentle edited a comment on PR #12633:
well no, at that point you are trusting that whoever added the `__wizer_*` exports did not lie and did validate the wasm before adding them
it's possible that my microbenchmarks were bad, but with a 20MB wasm file, validate, parse, then snapshot was 80% slower than just parse and snapshot. It doesn't break anything to add self.wasm_validate() in parse_instrumented, I just think it's redundant but I'd be happy to implement it differently if there are concerns
Gentle edited a comment on PR #12633:
well no, at that point you are trusting that whoever added the `__wizer_*` exports did not lie and did validate the wasm before adding them
it's possible that my microbenchmarks were bad, but with a 20MB wasm file, validate, parse, then snapshot took roughly 6 times as long as just parse and snapshot. It doesn't break anything to add self.wasm_validate() in parse_instrumented, I just think it's redundant but I'd be happy to implement it differently if there are concerns
tschneidereit commented on PR #12633:
well no, at that point you are trusting that whoever added the `__wizer_*` exports did not lie and did validate the wasm before adding them
Validation is there in part for security reasons: the following steps can then assume they're handling valid wasm content. If the validation step is skipped, that's not a safe assumption anymore, so any code depending on it might do wrong things, potentially in ways that an attacker can exploit.
If you think about it, if we trusted the author of a module with these exports to create a valid module, why wouldn't we do that for all content in general?
with a 20MB wasm file, validate, parse, then snapshot took roughly 6 times as long as just parse and snapshot.
6x overhead for validation seems like a lot. Would you be able to share the module in question, by any chance?
Gentle updated PR #12633.
Gentle commented on PR #12633:
I'm sorry for the confusion, my test setup was indeed flawed; in a clean reproduction I can see that validate takes a reasonable amount of time even with large code sections. I added it back to parse_instrumented
(there was an issue in the allocator of my runtime, so I actually found a bug in the way I use wasmtime in my code, but that was entirely unrelated to wizer)
Gentle edited PR #12633:
Rationale: If you keep the instrumentation information inside the output wasm file, then you can at any time use wizer to snapshot it again, allowing you to effectively save and resume later or on another machine
- adds a `keep_instrumentation` option that preserves `__wizer_*` exports in the output module
- adds a `parse_instrumented` method that re-parses a previously snapshotted module so it can be snapshotted again
- `parser::parse` has been adjusted to either reject or require existing `__wizer_*` exports depending on a new bool arg
alexcrichton commented on PR #12633:
Could you detail a bit more your use case here? I think it should work today to wizen a module/component twice, e.g. running one export on one machine and another export on another machine. The only downside I can think of to doing this is that the instrumentation takes a small amount of time to generate, but I would expect that to be negligible compared to compilation as a whole.
So, for more info, could you describe if performance is a primary concern here? Or if wizening a module twice is or isn't appropriate? Or if you're doing something else with the instrumented wizer artifact?
Gentle commented on PR #12633:
I use this for basically durable execution and stateful modules
I have an RPC system and there are complex tasks that need processing. I always keep instrumentation info in the modules. When starting a task, the fresh module is instantiated, told to load the input task it should process, then it runs the task, possibly sending out RPC requests. The guest runs eagerly either until it finishes the task or until it only has pending requests that await responses.
In the case of nothing left to do but not finished yet, I take the current state in memory and the last snapshot I made and use these to snapshot the long-running instance, then store that file and shut down the worker. When the RPC responses arrive, I wake the worker up, resolve the response, snapshot the module again and save it until the next response arrives. This can loop until the task is finished even if some outgoing requests may take days or require human intervention
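The suspend/resume loop described above can be sketched as a toy state machine. Everything here is a stand-in invented for illustration (`Task`, `StepResult`, `run_until_blocked` are not wizer or RPC-system types); it only models the control flow: run eagerly, snapshot when blocked, wake on each response, repeat until done.

```rust
/// Outcome of running the guest eagerly: it either finished the task or
/// is blocked on outstanding RPC responses.
enum StepResult {
    Finished,
    AwaitingResponses(usize), // number of pending RPC responses
}

/// Toy stand-in for the guest's task state.
struct Task {
    remaining_responses: usize,
}

/// Model of "the guest runs eagerly either until it finishes the task or
/// until it only has pending requests that await responses".
fn run_until_blocked(task: &Task) -> StepResult {
    if task.remaining_responses == 0 {
        StepResult::Finished
    } else {
        StepResult::AwaitingResponses(task.remaining_responses)
    }
}

fn main() {
    let mut task = Task { remaining_responses: 2 };
    let mut snapshots_taken = 0;
    loop {
        match run_until_blocked(&task) {
            StepResult::Finished => break,
            StepResult::AwaitingResponses(_) => {
                // Snapshot the instance, store the file, shut down the
                // worker; later a response arrives, the worker wakes up
                // and resolves it, then the loop repeats.
                snapshots_taken += 1;
                task.remaining_responses -= 1;
            }
        }
    }
    assert_eq!(snapshots_taken, 2);
}
```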
Gentle commented on PR #12633:
from previous attempts, the instrumentation info has to be in the wasm file when it is instantiated or it will fail to snapshot, I tried naively just running the regular wasm file, then running the file through instrument; snapshot to learn that this won't work
so at a minimum I need --keep-instrumentation and parser::parse needs to not reject files that were already instrumented to make this feature work, but skipping instrument when possible would be optimal
Gentle edited a comment on PR #12633:
from previous attempts, the instrumentation info has to be in the wasm file when it is instantiated or it will fail to snapshot, I tried naively just running the regular wasm file, then running the file through instrument; snapshot to learn that this won't work
so at a minimum I need --keep-instrumentation and parser::parse needs to not reject files that were already instrumented to make this feature work, but skipping instrument when possible would be optimal
Edit: to clarify, my use case is
- instantiate
- at some later point load wizer only if needed
- snapshot
in that order. This is because I am actually running the wasm as V8 WebAssembly.Instance and I use wizer compiled to wasm, so effectively I can dynamically load wizer only if required, but otherwise run the wasm file regularly. But for that to work the file has to always include instrumentation exports
Gentle edited a comment on PR #12633:
from previous attempts, the instrumentation info has to be in the wasm file when it is instantiated or it will fail to snapshot, I tried naively just running the regular wasm file, then running the file through instrument; snapshot to learn that this won't work
so at a minimum I need --keep-instrumentation and parser::parse needs to not reject files that were already instrumented to make this feature work, but skipping instrument when possible would be optimal
Edit: to clarify, my use case is
- instantiate
- at some later point load wizer only if needed
- snapshot if not finished
in that order. This is because I am actually running the wasm as V8 WebAssembly.Instance and I use wizer compiled to wasm, so effectively I can dynamically load wizer only if required, but otherwise run the wasm file regularly. But for that to work the file has to always include instrumentation exports
alexcrichton commented on PR #12633:
I think what I'm confused about is it sounds like your use case is satisfied today with the API of wasmtime-wizer, so I'm not sure why this PR is needed. For example:
- When you load a module, you call `Wizer::instrument` to get a context + wasm
- You execute the wasm provided with RPCs and such
- Eventually it's decided the wasm should be serialized, so `Wizer::snapshot` is used
- The result of `Wizer::snapshot` is saved
- Eventually the process is repeated when the module is reloaded.
Am I missing something though? Would this flow not work for your use case?
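The flow in that list can be sketched as a compilable model. `instrument` and `snapshot` match the method names in the comment, but the stub types and bodies below are assumptions for illustration, not the wasmtime-wizer implementation.

```rust
/// Stand-in for wizer's ModuleContext: in the real crate this holds the
/// parsed information needed to rebuild the module at snapshot time.
struct ModuleContext {
    original_len: usize,
}

/// Stand-in for a live instance whose state gets captured.
struct Instance {
    memory: Vec<u8>, // live linear memory to fold into the snapshot
}

/// Step 1: instrument the input wasm, yielding a context plus the
/// instrumented bytes that actually get executed.
fn instrument(wasm: &[u8]) -> (ModuleContext, Vec<u8>) {
    (ModuleContext { original_len: wasm.len() }, wasm.to_vec())
}

/// Step 3: combine the saved context with the live instance to produce
/// the snapshot wasm that gets persisted and later reloaded.
fn snapshot(ctx: &ModuleContext, instance: &Instance) -> Vec<u8> {
    let mut out = Vec::with_capacity(ctx.original_len + instance.memory.len());
    out.extend_from_slice(&instance.memory);
    out
}

fn main() {
    let wasm = vec![0u8; 16];
    let (ctx, instrumented) = instrument(&wasm);
    // Step 2: run the instrumented wasm with RPCs and such
    // (modelled here as just holding the bytes as live memory).
    let instance = Instance { memory: instrumented };
    // Steps 3-4: snapshot and save the result.
    let saved = snapshot(&ctx, &instance);
    assert_eq!(saved.len(), 16);
}
```

The key constraint the rest of the thread turns on is that `ctx` must stay alive in-process between steps 1 and 3.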
Gentle commented on PR #12633:
for that approach I would need to somehow save ModuleContext to a file that can be loaded again on demand
what's currently possible:
- wizer always instruments and we need to keep ctx alive
- if the running wasm wants to suspend, ctx and the instance are given to Wizer::snapshot
what I want to do/am doing with the branch:
- instantiate the module
- if the module wants to suspend:
- instantiate wizer.wasm
- snapshot the module using wizer.wasm
- throw away wizer instance, freeing memory
Gentle edited a comment on PR #12633:
for that approach I would need to somehow save ModuleContext to a file that can be loaded again on demand
what's currently possible:
- wizer always instruments and we need to keep ctx alive
- if the running wasm wants to suspend, ctx and the instance are given to Wizer::snapshot
what I want to do/am doing with the branch:
- instantiate the module
- if the module wants to suspend:
- instantiate wizer.wasm
- snapshot the module using wizer.wasm
- throw away wizer instance, freeing memory
and I am using wizer compiled to wasm so that this can be run on cloudflare and browsers
alexcrichton commented on PR #12633:
Oh I was imagining that you'd just save `ModuleContext` in-process. You'd assume that any module might suspend so it's pre-instrumented with a `ModuleContext` on the side. If the module isn't suspended then the instrumentation isn't really any overhead, and if it's suspended the context is available to know how to suspend it.
Would that solve the need of putting `ModuleContext` into a file?
Gentle commented on PR #12633:
yeah if I do "instantiate wizer, instrument, run my instrumented instance, maybe snapshot" it's pretty wasteful since wizer.wasm keeps all the memory it needed during instrumentation, I would much rather only instantiate wizer if I actually want to snapshot and not sooner
Gentle edited a comment on PR #12633:
yeah if I do "instantiate wizer, instrument, run my instrumented instance, maybe snapshot" it's pretty wasteful since wizer.wasm keeps all the memory it needed during instrumentation, I would much rather only instantiate wizer if I actually want to snapshot and not sooner
I would absolutely do exactly what you said if I was using wasmtime and wizer, but I'm using a standalone wizer together with JS WebAssembly.Instance
alexcrichton commented on PR #12633:
Is this perhaps a case where making `ModuleSnapshot` serializable would help? That would mean you could throw away the Wizer instance I believe, only retaining the snapshot on the side?
Gentle commented on PR #12633:
do you mean ModuleContext?
If I could serialize ModuleContext to a file then I could drop wizer and dynamically import it again when I need it.
Right now, ModuleContext is <'a>, keeping the input bytes &'a [u8] alive, a version of ModuleContext that only holds what it actually needs would probably also be more efficient than re-parsing the whole wasm before snapshotting :)
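The lifetime point here can be shown with a minimal borrow-holding struct. The field name below is an assumption, not the crate's actual layout; the point is only that a `<'a>` context pins the input buffer for as long as the context lives.

```rust
/// Illustrative only: a context that borrows slices of the input wasm,
/// the way the comment describes ModuleContext<'a> doing.
struct ModuleContext<'a> {
    raw_sections: Vec<&'a [u8]>, // slices into the original wasm buffer
}

fn parse<'a>(wasm: &'a [u8]) -> ModuleContext<'a> {
    // Pretend each half of the input is a "section".
    let mid = wasm.len() / 2;
    ModuleContext {
        raw_sections: vec![&wasm[..mid], &wasm[mid..]],
    }
}

fn main() {
    let wasm = vec![1u8, 2, 3, 4];
    let ctx = parse(&wasm);
    // `wasm` cannot be dropped (or serialized away) while `ctx` is alive;
    // this is why an owned, file-serializable variant was discussed.
    assert_eq!(ctx.raw_sections.len(), 2);
    // drop(wasm); // would not compile: `ctx` still borrows it
}
```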
Gentle commented on PR #12633:
some investigations and dirty code later, I realized that, since we need the raw sections to build the new wasm module, the ModuleContext is always roughly the same size as the wasm. I still think a public method that allows you to parse instrumented wasm bytes into a ModuleContext is the most straightforward way to allow lazily loading wizer only when snapshotting
alexcrichton commented on PR #12633:
Ah yes sorry I meant `ModuleContext`, and actually yeah that's a good point. You could keep the un-instrumented wasm bytes around too and create a `ModuleContext` from that during snapshotting since it's pretty cheap and it's also idempotent.
Instead of creating `ModuleContext` from an instrumented module, could the original module be preserved and `ModuleContext` re-created?
Gentle commented on PR #12633:
if you run the uninstrumented wasm and then later come with after-the-fact instrumented wasm then the running instance has no __wizer_ globals so snapshotting fails
Gentle edited a comment on PR #12633:
if you run the uninstrumented wasm and then later come with after-the-fact instrumented wasm then the running instance has no `__wizer_` globals so snapshotting fails
alexcrichton commented on PR #12633:
Indeed, yes, what I meant was:
- Given a wasm W, you produce and run the instrumented version I(W)
- Eventually you want to snapshot, so you re-instrument W to produce a `ModuleContext`.
- This `ModuleContext` is used to snapshot the live instance of I(W).
- This new wasm W' is what you save to disk and rinse/repeat with later.
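Those four steps lean on instrumentation being idempotent: re-instrumenting the same W yields a context that matches the live I(W) instance. A stub model (all types invented here, not wizer's API) of that property:

```rust
/// Toy stand-in for a wasm binary; `instrumented` models the presence of
/// the `__wizer_*` exports/globals.
#[derive(Clone)]
struct Wasm {
    bytes: Vec<u8>,
    instrumented: bool,
}

/// Stand-in for the context derived from W, sufficient to snapshot.
struct ModuleContext {}

/// Instrumentation as a pure function of W: running it twice on the same
/// input gives equivalent results, which is what makes step 2 safe.
fn instrument(w: &Wasm) -> (ModuleContext, Wasm) {
    (
        ModuleContext {},
        Wasm { bytes: w.bytes.clone(), instrumented: true },
    )
}

/// Snapshot the live instance using the (re-created) context; the output
/// W' has the instrumentation stripped again in this model.
fn snapshot(_ctx: &ModuleContext, live: &Wasm) -> Wasm {
    Wasm { bytes: live.bytes.clone(), instrumented: false }
}

fn main() {
    let w = Wasm { bytes: vec![0; 8], instrumented: false };
    // 1. produce and run I(W)
    let (_ctx0, i_w) = instrument(&w);
    // 2. at snapshot time, re-instrument W to get a fresh context
    let (ctx, _) = instrument(&w);
    // 3. snapshot the live instance of I(W); 4. persist W' and repeat later
    let w_prime = snapshot(&ctx, &i_w);
    assert_eq!(w_prime.bytes.len(), 8);
}
```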
Gentle commented on PR #12633:
that does work but still requires wizer at instantiation and again at snapshotting, but my changes are indeed just an optimization for the workflow you described
I understand how currently, `__wizer_` exports are an internal side-effect and can evolve freely in the code, while my change would make this an official API. I would also be hesitant to accept a change like this.
The thought of "function F produces outputs A and B, we run F twice, first discarding A, later discarding B" feels unnecessary but is an understandable restriction if you want to keep the instrumentation opaque
is there another reason why my approach is a bad idea?
alexcrichton commented on PR #12633:
My main personal motivation is reducing the maintenance burden of wizer. I feel like its steps today are primitive enough that it's understandable to maintainers and users alike of how everything is combined, and additions like this at a high level inevitably increase the possible state space of what needs to be managed and maintained. If the preexisting primitives work well enough for your use case I'd say that would ideally be the way to go to avoid increasing maintainership burden, but if the primitives don't work for your use case (e.g. for performance reasons) that's a different conversation.
Gentle commented on PR #12633:
thank you for the fruitful discussion. I'll explore possible optimizations in a different branch. As you say, this PR as formulated is invalid since it only adds performance optimizations, not a new feature, so it should probably be closed; I might come back if the differences are big enough to warrant the API changes
alexcrichton closed without merge PR #12633.
alexcrichton commented on PR #12633:
Ok sounds reasonable to me. And yeah if the performance is a problem and/or limiting factor definitely feel free to resubmit this or reopen this, I think it would be quite reasonable to explore changes from that angle.
Last updated: Mar 23 2026 at 16:19 UTC