Stream: git-wasmtime

Topic: wasmtime / issue #6517 Eventual API support for instantia...


view this post on Zulip Wasmtime GitHub notifications bot (Jun 04 2023 at 09:26):

dubrowgn opened issue #6517:

I'm currently writing infrastructure for a toy language where the design was to stream source from disk -> tokenizer -> parser -> compiler -> wasmtime, avoiding the need to hold entire various forms of the program in memory at any point. Unfortunately, I don't see API's in wasmtime compatible with this design (to be fair, this seems to be common among non-browser wasm runtimes).

I may have missed something, but it appears wasmtime expects all wasm source to be collected into memory before submission to the Module compiler. For example Module::from_file() seems to just spool the entire file into memory, and then give it to Module::new(). What I was hoping to find was something like instantiateStreaming or compileStreaming from the JS WebAssembly API, which takes a stream of wasm bytecode instead.

Is this kind of API something wasmtime might implement in the future, or is there a reason/rationale for not wanting to support it?

Thanks!

view this post on Zulip Wasmtime GitHub notifications bot (Jun 06 2023 at 17:05):

jameysharp commented on issue #6517:

This is an interesting question, thanks for asking! I assume you want this for ahead-of-time compilation, where the generated machine code is written to disk, rather than just-in-time compilation where the entire program has to be in memory anyway before you can run it. Is that right?

You're right that Wasmtime does not currently have an interface for streaming in WebAssembly bytecode. As far as I know, we could implement that, in theory.

There are a couple of things we get by loading the entire module up-front. One is that we can compile all the functions in parallel, for a nice compile-time speedup on multi-core systems. Another is that we can put the compiled output in a more efficient order: for example, Wasmtime needs to generate extra trampoline functions in some cases, and clustering those together turns out to reduce total code size a little bit (#6302).

There are probably other assumptions I'm not aware of that are affected by this. For example, I don't know off-hand whether wasm-parser can provide all the information we need for both validation and compilation without parsing to the end of the module.

That said, the Cranelift compiler doesn't care whether all the functions are known when it starts compiling. I think that making Wasmtime feed functions to Cranelift incrementally would be a somewhat self-contained change, confined mostly to the existing parallel-compilation driver. I don't expect it to be particularly easy to figure out exactly where the rest of the changes need to happen though.

So, in short, I think this would be an interesting experiment! It's not something that I think existing Wasmtime contributors are likely to work on, but we could certainly discuss how to do it if you, or someone else, wants to give it a shot. We can get started discussing that in this issue, or in our Zulip chat (https://bytecodealliance.zulipchat.com/#narrow/stream/217126-wasmtime), whichever is more convenient.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 06 2023 at 17:18):

cfallin commented on issue #6517:

One design axis here that would be good to discuss is "early execution": can we start to run code in the module before all functions are even compiled? This is I assume one of the benefits one would want a streaming API for, even when JIT'ing.

The main technical challenges are "hot-patching", wherein we asynchronously fill in pointers to code as it is compiled, and finding a way to block when a called function hasn't been loaded/compiled yet (perhaps by initializing all calls to call into a libcall stub that async-yields). The patching could work either by indirecting through a function-pointer table (a PLT / "procedure linkage table" of sorts) or by patching code directly; the latter is a bit trickier on non-x86 platforms because of icache coherence requirements.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 10 2023 at 08:26):

dubrowgn commented on issue #6517:

I assume you want this for ahead-of-time compilation, where the generated machine code is written to disk, rather than just-in-time compilation where the entire program has to be in memory anyway before you can run it. Is that right?

My use case is a shell-like language, so I was primarily thinking about JIT. Rather than invent my own bytecode VM and/or JIT, I thought I would see if I could leverage an existing wasm runtime somehow.

Streaming has two potential benefits: 1) reduced memory consumption. You have to have the whole program in memory, as you way, but not streaming actually forces you to have 2 copies of the program in memory at some point. The first of the entire emitted byte code, and the second with the entire compiled byte code. This benefit is probably quite small in general though. 2) is reduced execution latency. For short lived programs you could imagine having different threads for the compilation and execution pipelines such that by the time the compilation pipeline is done streaming, the execution pipeline can be nearly done with the results. This seems to be more along the lines of what @cfallin was thinking about.

But, another use case for "early execution" is supporting REPL's. For a shell like language this is probably the most important aspect. If a user is typing and evaluating one expression at a time, you never have the whole module to compile.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 10 2023 at 08:28):

dubrowgn edited a comment on issue #6517:

I assume you want this for ahead-of-time compilation, where the generated machine code is written to disk, rather than just-in-time compilation where the entire program has to be in memory anyway before you can run it. Is that right?

My use case is a shell-like language, so I was primarily thinking about JIT. Rather than invent my own bytecode VM and/or JIT, I thought I would see if I could leverage an existing wasm runtime somehow.

Streaming has two potential benefits: 1) reduced memory consumption. You have to have the whole program in memory, as you say, but not streaming actually forces you to have 2 copies of the program in memory at some point. The first of the entire emitted byte code, and the second with the entire compiled byte code. This benefit is probably quite small in general though. 2) is reduced execution latency. For short lived programs you could imagine having different threads for the compilation and execution pipelines such that by the time the compilation pipeline is done streaming, the execution pipeline can be nearly done with the results. This seems to be more along the lines of what @cfallin was thinking about.

But, another use case for "early execution" is supporting REPL's. For a shell-like language this is probably the most important aspect. If a user is typing and evaluating one expression at a time, you never have the whole module to compile.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 10 2023 at 09:47):

bjorn3 commented on issue #6517:

But, another use case for "early execution" is supporting REPL's. For a shell-like language this is probably the most important aspect. If a user is typing and evaluating one expression at a time, you never have the whole module to compile.

Part of the information necessary to actually instantiate a module (the data section) comes after the code section and to add new functions you have to modify the function section which cones before the code section, so you have to finish the module even in a REPL. You can keep creating new modules and instantiate them on the same linear memory with later modules importing functions from earlier modules though.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 12 2023 at 17:37):

alexcrichton commented on issue #6517:

One thing to note perhaps is that instantiateStreaming I don't think is necessarily what we'd want to model here for @dubrowgn's use case. That API is intended for compiling and instantiating in a non-blocking way to get a Promise on the web, but it still requires that the entire module is resident in memory at one point basically. While it's compiled as it's downloaded the final artifact of a WebAssembly.Instance still points to all the compiled machine code ready to run.

Additionally the wasm format isn't really suitable to a "partial module" of sorts. For example you can't really dynamically add functions to a module later in a streaming fashion by simply appending more data. What might work for your use case, however, is to generate smaller wasm modules and then link the modules together via the embedding API.

This isn't to say that we couldn't add something like instantiateStreaming to Wasmtime, however. All the pieces are theoretically there in wasmparser and in Wasmtime more-or-less to manage all this state. I'm not sure that such an API would necessarily solve your use case though.


Last updated: Jan 24 2025 at 00:11 UTC