I'm trying to understand the history behind Wasmtime's development and why it adopted Lucet's AOT model. I initially understood Wasmtime to be a QEMU-style JIT that translates code lazily as it runs, but apparently it compiles all of a module's functions in parallel up front.
Some obvious reasons for compiling whole modules and using a static cache come to mind, but I want to understand whether there were any other reasons. Were there any public benchmarks that informed the decision?
@Dan Gohman could probably answer best about the early history and the actual path the decisions took, but a few additional considerations I can add are:
Agreed on all the above points. Another reason is that unlike in native code, Wasm code never has any ambiguity about where the function boundaries are or what the code alignment is, never has any ambiguity about code vs jump tables vs constant pools or other things, and never has any dynamically modified code. So unlike native code translators, it doesn't need any knowledge from the runtime state of the application to determine which code to compile.
Yage Hu said:
I'm trying to understand the history behind Wasmtime's development and why it adopted Lucet's AOT model.
I may be misremembering, but I am 98% sure that even before Wasmtime gained Lucet's features, it was still compiling all Wasm functions in a module eagerly; it never waited until a function was called to compile it.
Yes, that's right, it's been that way at least since late 2019 when I started working on it
Thanks for the info!
There is experimental support for something somewhat related, FWIW: an incremental cache that allows reusing compilation results for unchanged functions. I think it's been quite a while since anyone touched that though, so I'm not sure what the current state is
IIRC it was in production at least at Embark; we never turned it on by default because of vague concerns carried over from Rust's trail of incremental-compilation bugs. But this infra is designed much differently (it uses types and struct nesting to separate state so there can't be cache-invalidation bugs, more or less by construction), and honestly I think we should consider turning it on by default (we even have a pretty thorough fuzz target for it).
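For anyone curious what wiring it up looks like, here's a rough sketch from memory (not checked against the current wasmtime API; the `CacheStore` trait lives behind a cargo feature, and the exact signatures may have drifted):

```rust
use std::borrow::Cow;
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Toy in-memory cache store for illustration only; a real embedding would
// persist entries to disk so compilation results survive across processes.
#[derive(Debug, Default)]
struct InMemoryCache(Mutex<HashMap<Vec<u8>, Vec<u8>>>);

impl wasmtime::CacheStore for InMemoryCache {
    fn get(&self, key: &[u8]) -> Option<Cow<'_, [u8]>> {
        self.0.lock().unwrap().get(key).map(|v| Cow::Owned(v.clone()))
    }
    fn insert(&self, key: &[u8], value: Vec<u8>) -> bool {
        self.0.lock().unwrap().insert(key.to_vec(), value);
        true
    }
}

fn main() -> anyhow::Result<()> {
    let mut config = wasmtime::Config::new();
    // Functions unchanged since a previous compilation hit the cache
    // instead of being recompiled from scratch.
    config.enable_incremental_compilation(Arc::new(InMemoryCache::default()))?;
    let _engine = wasmtime::Engine::new(&config)?;
    Ok(())
}
```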
A follow-up question about this whole-module compilation strategy: how does Wasmtime address startup latency the first time a Wasm module is executed, especially when the module contains many functions? This strategy seems to differ from those adopted in JavaScript engines like V8 and SpiderMonkey.
The short answer is that we don't: it's a different use-case. We aren't primarily targeting a Web-engine world where time to first frame / interaction / execution / ... is important; we're targeting use-cases where start-up time is ok (e.g., server-side modules that process requests can tolerate some latency in compiling and switching to a new version of a module, and then the module is resident while it processes many requests).
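And when startup latency does matter, the usual answer is to pay the compile cost ahead of time and cache the artifact, via `Module::serialize`/`Module::deserialize`. A sketch (the trivial wat module is just a stand-in):

```rust
fn main() -> anyhow::Result<()> {
    let engine = wasmtime::Engine::default();

    // Pay the full compile cost once, e.g. at deploy time...
    let module = wasmtime::Module::new(&engine, r#"(module (func (export "f")))"#)?;
    let bytes = module.serialize()?;

    // ...then instantiate from the precompiled artifact at startup.
    // `deserialize` is unsafe because the bytes must come from a trusted
    // `serialize` call made by a compatible engine.
    let fast = unsafe { wasmtime::Module::deserialize(&engine, &bytes)? };
    let mut store = wasmtime::Store::new(&engine, ());
    wasmtime::Instance::new(&mut store, &fast, &[])?;
    Ok(())
}
```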
I'll note that Cranelift's compile time is still ~10x faster than LLVM's, so things aren't as bad as one might naively expect from a "compile the whole module" workflow. We also have Winch as an alternative backend that is a few times faster still.
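Opting into Winch is a one-line config change, roughly like this (a sketch; Winch's availability depends on the target architecture and enabled features):

```rust
fn main() -> anyhow::Result<()> {
    let mut config = wasmtime::Config::new();
    // Trade some run-time code quality for much faster compilation.
    config.strategy(wasmtime::Strategy::Winch);
    let _engine = wasmtime::Engine::new(&config)?;
    Ok(())
}
```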
Thanks for the clarification. That makes sense. Another thing is that separating compilation from execution may also miss dynamic optimization opportunities, e.g., optimizations driven by execution profiling, like the tiered compilation in V8, which allows more optimized code to be generated over time.
That's true, but keep in mind that there is much more opportunity for that in JS (for example) than in Wasm: no dynamic types, no virtual dispatch, no hairy language semantics with happy paths you'd want to optimize, etc. I've heard that V8 uses ICs for some parts of Wasm GC, but MVP-ish Wasm can get away without them entirely. Profiling info could inform inlining, but for single modules that's marginal as well. Wasm really is more like a conventional binary for a conventional ISA than it is like JavaScript (for these purposes, at least!), and (absent far-out research like the DynamoRIO work) "JIT optimization for conventional binaries" is not much of a thing.
Said another way: tiering for Wasm in Web engines lets one trade off compile time and run-time by not optimizing some functions as much (or at all). We optimize all functions all the way, so we don't have any deficits in run-time from not tiering. We just take longer to compile.
Thanks very much for the explanation. It is great to learn the design rationale behind Wasmtime. Indeed, Wasmtime is a great runtime system, giving us unprecedented opportunities to explore new design choices and to rethink conventional strategies in different usage scenarios. Again, thank you!
Regarding start-up latency, see:
Last updated: Mar 23 2026 at 16:19 UTC