Stream: wasmtime

Topic: Why does Wasmtime compile entire modules


view this post on Zulip Yage Hu (Mar 02 2026 at 16:42):

I'm trying to understand the history behind Wasmtime's development and why it adopted Lucet's AOT model. I initially understood Wasmtime as a QEMU-style JIT but apparently it compiles all module functions in parallel at the start.

Some obvious reasons for compiling whole modules and using a static cache are:

I want to understand: were there any other reasons? And were there any public benchmarks that informed the decision?

view this post on Zulip Chris Fallin (Mar 02 2026 at 16:48):

@Dan Gohman probably could answer best about early history and the actual path that decisions took, but a few additional considerations I can add are:

view this post on Zulip Dan Gohman (Mar 02 2026 at 18:22):

Agreed on all the above points. Another reason is that unlike in native code, Wasm code never has any ambiguity about where the function boundaries are or what the code alignment is, never has any ambiguity about code vs jump tables vs constant pools or other things, and never has any dynamically modified code. So unlike native code translators, it doesn't need any knowledge from the runtime state of the application to determine which code to compile.

view this post on Zulip fitzgen (he/him) (Mar 02 2026 at 21:40):

Yage Hu said:

I'm trying to understand the history behind Wasmtime's development and why it adopted Lucet's AOT model.

I may be misremembering, but I am 98% sure that even before Wasmtime gained Lucet's features, it was already compiling all Wasm functions in a module eagerly; it never waited until a function was called to compile it.

view this post on Zulip Chris Fallin (Mar 02 2026 at 22:35):

Yes, that's right, it's been that way at least since late 2019 when I started working on it

view this post on Zulip Yage Hu (Mar 03 2026 at 14:38):

Thanks for the info!

view this post on Zulip Till Schneidereit (Mar 03 2026 at 15:10):

There is experimental support for something somewhat related, FWIW: an incremental cache that allows reusing compilation results for unchanged functions. I think it's been quite a while since anyone touched that though, so I'm not sure what the current state is

view this post on Zulip Chris Fallin (Mar 03 2026 at 15:37):

IIRC it was in production at least at Embark. We never turned it on by default because of vague concerns carried over from Rust's incremental-compilation trail of bugs, but this infra is designed much differently (it uses types and struct nesting to separate state so there can't be cache-invalidation bugs, more or less by construction). Honestly, I think we should consider turning it on by default (we even have a pretty thorough fuzz target for it).

view this post on Zulip Wenwen Wang (Mar 03 2026 at 15:43):

A follow-up question about this compiling-whole-module strategy: how does Wasmtime address the startup latency the first time a Wasm module is executed, especially when the module contains many functions? This strategy seems to differ from those adopted by JavaScript engines like V8 and SpiderMonkey.

view this post on Zulip Chris Fallin (Mar 03 2026 at 15:58):

The short answer is that we don't: it's a different use-case. We aren't primarily targeting a Web-engine world where time to first frame / interaction / execution / ... is important; we're targeting use-cases where start-up time is ok (e.g., server-side modules that process requests can tolerate some latency in compiling and switching to a new version of a module, and then the module is resident while it processes many requests).

view this post on Zulip Chris Fallin (Mar 03 2026 at 15:59):

I'll note that Cranelift's compile time is still ~10x faster than LLVM's, so things aren't as bad as one might naively expect from a "compile the whole module" workflow. We also have Winch as an alternative backend that is a few times faster still.

view this post on Zulip Wenwen Wang (Mar 03 2026 at 16:12):

Thanks for the clarification. That makes sense. Another thing is that separating compilation from execution may also miss dynamic optimization opportunities, e.g., optimizations based on execution profiling, like V8's tiered compilation, which allows more optimized code to be generated.

view this post on Zulip Chris Fallin (Mar 03 2026 at 16:36):

That's true, but keep in mind that there is much more opportunity for that in JS (for example) than in Wasm -- no dynamic types, no virtual dispatch, no hairy language semantics with happy paths you'd want to optimize, etc. I've heard that V8 uses ICs for some parts of Wasm GC but MVP-ish Wasm can get away without them entirely. Profiling info could inform inlining but for single modules that's marginal as well. Wasm really is more like a conventional binary for a conventional ISA than it is like JavaScript (for these purposes at least!), and (absent far-out research like the DynamoRIO stuff) "JIT optimization for conventional binaries" is not too much of a thing

view this post on Zulip Chris Fallin (Mar 03 2026 at 16:38):

Said another way: tiering for Wasm in Web engines lets one trade off compile time and run-time by not optimizing some functions as much (or at all). We optimize all functions all the way, so we don't have any deficits in run-time from not tiering. We just take longer to compile.

view this post on Zulip Wenwen Wang (Mar 03 2026 at 16:52):

Thanks very much for the explanation. It is great to learn the design rationale behind Wasmtime. Indeed, Wasmtime is a great runtime system, giving us unprecedented opportunities to explore new design choices and to rethink conventional strategies in different usage scenarios. Again, thank you!

view this post on Zulip fitzgen (he/him) (Mar 04 2026 at 05:07):

Regarding start-up latency, see:


Last updated: Mar 23 2026 at 16:19 UTC