Is there a timeline for being able to use wasmtime in a Rust/wasm32-unknown-unknown project?
The use case is as follows:
(1) I am trying to use wasm as a JIT for generating code that operates on float32* on the fly (think signal-processing-type work)
(2) I want this app to run both on desktop (appears fine for now) and in browser (wasmtime does not appear to build on arch=wasm32-unknown-unknown)
I don't need the full power of cranelift. I am just looking for a "simple" wasm API that works on both desktop & in browser (see the sketch after this list), where by simple I mean:
(*) create a module (from either *.wat or *.wasm)
(*) allocate memory
(*) call functions
(*) write to / read from wasm-memory from Rust
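For concreteness, here is roughly what those four operations look like with wasmtime's Rust API on desktop today (recent wasmtime versions; the WAT module and the `scale` function are made up for illustration):

```rust
use wasmtime::{Engine, Instance, Module, Store};

fn main() -> anyhow::Result<()> {
    let engine = Engine::default();
    // (1) Create a module from WAT (Module::new also accepts .wasm bytes).
    let module = Module::new(&engine, r#"
        (module
          (memory (export "mem") 1)   ;; (2) one 64 KiB page of linear memory
          (func (export "scale") (param $ptr i32) (param $k f32)
            (f32.store (local.get $ptr)
              (f32.mul (f32.load (local.get $ptr)) (local.get $k)))))
    "#)?;
    let mut store = Store::new(&engine, ());
    let instance = Instance::new(&mut store, &module, &[])?;
    let mem = instance.get_memory(&mut store, "mem").unwrap();

    // (4) Write an f32 into wasm memory from Rust...
    mem.write(&mut store, 0, &2.5f32.to_le_bytes())?;
    // (3) ...call a function on it...
    let scale = instance.get_typed_func::<(i32, f32), ()>(&mut store, "scale")?;
    scale.call(&mut store, (0, 4.0))?;
    // (4) ...and read the result back.
    let mut buf = [0u8; 4];
    mem.read(&store, 0, &mut buf)?;
    assert_eq!(f32::from_le_bytes(buf), 10.0);
    Ok(())
}
```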
Can you say a bit more about the use-case of using wasmtime-on-wasm to run a wasm module vs. simply running the wasm module itself on that Wasm platform?
The reason I ask is that at least the JIT portion of Wasmtime becomes sort of a no-op if we were to host Wasmtime on a Wasm platform (with some hypothetical future ability to load and run dynamically generated code; one could polyfill this with JS today). The WASI API implementation still has value, I suppose, but most of Wasmtime's work would simply be to pass through your Wasm module to the underlying Wasm platform, I would think.
That said, perhaps there's something I'm missing here; or perhaps the right answer is an API layer at the top level that can delegate either to wasmtime (on native) or to a "call into JS to load another Wasm module" shim when running in a browser?
@zeroexcuses I recently asked a similar question about interpreter support in wasmtime, but your question seems different in that you apparently need the performance that JIT provides. JIT is not supported on wasm32-unknown-unknown and requires the hypothetical future ability that Chris already mentioned, because compiled code is immutable and not observable at runtime: https://webassembly.org/docs/security/
If you need the performance of dynamic code generation, I assume that you also need the performance of JIT all the way to the host instruction set. If not, directly interpreting something optimised for your use-case probably has better performance than compiling to wasm first, and interpreting the result. Do you agree, or do you think that wasmtime interpreter support could help your use-case?
@Chris Fallin @Morteza Milani
In reading your responses, I can definitely see how my original question was ambiguous. I'm now going to take a few steps back and try to motivate the problem.
*** back story
Suppose we are working on an APL / J / K interpreter or something numpy-like, where we perform arithmetic operations over scalars and vectors. The user types in a mathematical expression at runtime and we evaluate it. For this example, suppose X and Y are vectors of length n.
Consider the following expression:
1.0 + 2.0 * sqrt(X + Y) + 3.0 * sqrt(X + 2 * Y) + 4.0 * sqrt(X + 3 * Y).
Evaluating this is going to be memory bound. A dumb / naive approach would have us (1) traversing X & Y multiple times, (2) constructing intermediate arrays of length n, and (3) traversing those intermediate arrays.
A better approach would be to compile this to:
```
for i in 0..n {
    let x = X[i];
    let y = Y[i];
    out[i] = 1.0 + 2.0 * sqrt(x + y) + 3.0 * sqrt(x + 2.0 * y) + 4.0 * sqrt(x + 3.0 * y);
}
```
Now, to do this, we need to do dynamic code generation.
To have this work both on desktop & in browser, I am considering doing this in wasm.
In particular, I will be writing a function fn compile(expr: MathExpr) -> WebAssemblyText, which I then want to evaluate on some *mut f32's.
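To make that compile step concrete, here is a toy sketch (my own simplified MathExpr and WAT shape, not the real types) of lowering an expression tree into one fused loop over f32 arrays in linear memory:

```rust
/// Hypothetical stand-in for the real MathExpr; X and Y denote X[i], Y[i].
enum MathExpr {
    Const(f32),
    X,
    Y,
    Add(Box<MathExpr>, Box<MathExpr>),
    Mul(Box<MathExpr>, Box<MathExpr>),
    Sqrt(Box<MathExpr>),
}

/// Emit the expression as a folded WAT f32 expression over locals $x and $y.
fn emit(e: &MathExpr, out: &mut String) {
    match e {
        // {:?} keeps a decimal point; NaN/inf would need special casing.
        MathExpr::Const(c) => out.push_str(&format!(" (f32.const {:?})", c)),
        MathExpr::X => out.push_str(" (local.get $x)"),
        MathExpr::Y => out.push_str(" (local.get $y)"),
        MathExpr::Add(a, b) => {
            out.push_str(" (f32.add");
            emit(a, out);
            emit(b, out);
            out.push(')');
        }
        MathExpr::Mul(a, b) => {
            out.push_str(" (f32.mul");
            emit(a, out);
            emit(b, out);
            out.push(')');
        }
        MathExpr::Sqrt(a) => {
            out.push_str(" (f32.sqrt");
            emit(a, out);
            out.push(')');
        }
    }
}

/// Wrap the expression in one fused loop; $px/$py/$pout are byte offsets
/// of the f32 arrays X, Y, out in linear memory, $n the element count.
fn compile(expr: &MathExpr) -> String {
    let mut body = String::new();
    emit(expr, &mut body);
    format!(
        r#"(module
  (memory (export "mem") 16)
  (func (export "kernel") (param $px i32) (param $py i32) (param $pout i32) (param $n i32)
    (local $i i32) (local $x f32) (local $y f32)
    (block $done
      (loop $next
        (br_if $done (i32.ge_u (local.get $i) (local.get $n)))
        (local.set $x (f32.load (i32.add (local.get $px) (i32.mul (local.get $i) (i32.const 4)))))
        (local.set $y (f32.load (i32.add (local.get $py) (i32.mul (local.get $i) (i32.const 4)))))
        (f32.store
          (i32.add (local.get $pout) (i32.mul (local.get $i) (i32.const 4))){body})
        (local.set $i (i32.add (local.get $i) (i32.const 1)))
        (br $next)))))
"#
    )
}
```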
*** back to original question
Perhaps wasmtime is overkill for this. What I am looking for is a common (both desktop & browser) API where I can:
(1) define a new webassembly 'vm'
(2) allocate some space on this vm (all the f32's will be stored in wasm memory space)
(3) execute arbitrary dynamically generated WebAssemblyText, passing the *mut f32 of (2) as arguments
(4) read/write to the *mut f32 in webassembly 'vm'
I believe that https://github.com/paritytech/wasmi satisfies all the above requirements. However, I suspect wasmtime (being more 'native' compared to wasmi being an 'interpreter') is faster -- and performance does matter to me.
If there is a way to do this in wasmtime -- great! If not, I would appreciate pointers to solving this problem.
Having written the above, I do admit it is a bit weird that the original problem of:
"I want large blocks of dynamically allocated *f32's, and to execute dynamically generated assembly code on them"
has somehow translated into
"Is there some simple API that lets me execute wasm/desktop and wasm/wasm"
This is getting more into your problem domain rather than wasm, but it would be interesting to at least have baseline data for "small tight interpreter loop that runs arithmetic ops on f32s". It's probably ... 10x?... 100x?... slower than native SIMD-vectorized high-performance math, but that's something you could do on wasm today.
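For reference, the baseline being suggested would look something like this (a sketch; the bytecode layout is mine): a per-element stack-machine pass over the expression's ops.

```rust
/// Hypothetical bytecode for the baseline measurement.
enum Op {
    Const(f32),
    LoadX,
    LoadY,
    Add,
    Mul,
    Sqrt,
}

fn eval(ops: &[Op], x: &[f32], y: &[f32], out: &mut [f32]) {
    let mut stack: Vec<f32> = Vec::with_capacity(16);
    for i in 0..out.len() {
        stack.clear();
        for op in ops {
            match op {
                Op::Const(c) => stack.push(*c),
                Op::LoadX => stack.push(x[i]),
                Op::LoadY => stack.push(y[i]),
                Op::Add => {
                    let (b, a) = (stack.pop().unwrap(), stack.pop().unwrap());
                    stack.push(a + b);
                }
                Op::Mul => {
                    let (b, a) = (stack.pop().unwrap(), stack.pop().unwrap());
                    stack.push(a * b);
                }
                Op::Sqrt => {
                    let a = stack.pop().unwrap();
                    stack.push(a.sqrt());
                }
            }
        }
        out[i] = stack.pop().unwrap();
    }
}
```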
Alternately, coarser-grained kernels that are compiled (Wasm or even Wasm/SIMD when that's widely available) and invoked from your expression bytecode. I guess that's the numpy approach, basically. If you need arbitrary expressions per loop iter then maybe not so much though.
Ah, there is one more thing I was not clear on. We have:
MathExpr -> Desktop/Cuda
MathExpr -> Desktop/Wasm
MathExpr -> Browser/Wasm
I care about the performance of Desktop/Cuda (fastest on Desktop) and Browser/Wasm (fastest in browser).
I don't care about the performance of Desktop/Wasm. It exists mainly so that I can check Desktop/Cuda ==up-to-floating-point-numerical-errors== Desktop/Wasm, and just assume that Desktop/Wasm == Browser/Wasm.
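That check can be as simple as an elementwise relative-tolerance comparison (a sketch; the tolerance policy is a placeholder):

```rust
/// Compare two backends' outputs up to floating-point error.
fn approx_eq(a: &[f32], b: &[f32], rel_tol: f32) -> bool {
    a.len() == b.len()
        && a.iter().zip(b).all(|(p, q)| {
            (p - q).abs() <= rel_tol * p.abs().max(q.abs()).max(1.0)
        })
}
```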
Oh, GPUs, right... also out of Wasm's domain, but is GPU-based compute using shaders with WebGL an option?
Otherwise I would say that dynamically generating a Wasm module is probably your best bet. If you know you're in Wasm in a browser you can bounce out to JS to call WebAssembly.compile; doesn't help your desktop case directly though, without some more work.
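That "bounce out to JS" step looks roughly like this with js-sys (a sketch, mirroring the wasm-in-wasm example linked below; assumes wasm-bindgen-futures and that the WAT has already been assembled to binary):

```rust
use js_sys::{Object, Reflect, WebAssembly};
use wasm_bindgen::{JsCast, JsValue};
use wasm_bindgen_futures::JsFuture;

/// Instantiate dynamically generated wasm bytes via the browser's native
/// engine, from inside a wasm32-unknown-unknown module.
async fn instantiate(wasm_bytes: &[u8]) -> Result<WebAssembly::Instance, JsValue> {
    let promise = WebAssembly::instantiate_buffer(wasm_bytes, &Object::new());
    let result = JsFuture::from(promise).await?;
    // The promise resolves to a {module, instance} pair; pull out the instance.
    Reflect::get(&result, &"instance".into())?.dyn_into::<WebAssembly::Instance>()
}
```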
Shaders w/ WebGL is definitely in theory possible. However, I control the Desktop/Server, so I know I can do Cuda there. On the other hand, the Browser runs on the user's device and afaik I can't rely on more than just Wasm.
I guess refining the original question a bit more, what I need is the four operations above (create a vm, allocate memory, execute dynamically generated WAT, read/write *f32) on both Desktop/wasm and Browser/wasm, and preferably the same API, so I don't have #cfg's everywhere and can have faith that Desktop/wasm codepath == Browser/wasm codepath.
Now that we have refined the question to this point -- this actually throws out most of the advantage that wasmtime provides?
Well, I would say that wasmtime-compiled-to-wasm is out, at least; and for near-zero overhead in the browser, you'll want to go directly to the native wasm compiler. Then probably what you'd want to do is paper over this with an abstraction that has two backends: one that goes directly to the browser's native wasm compiler, and another that plugs into whatever desktop-Wasm engine you're using via some support code outside the wasm module (missed this above; is it wasmtime also, or Node, or...?).
Afaik, I'm not using Node on Rust/x86. Are you suggesting:
browser: https://rustwasm.github.io/wasm-bindgen/examples/wasm-in-wasm.html
desktop: wasmi or wasmtime
then write a very thin abstraction layer myself over the two, abstracting out: creating a vm, malloc/free, read/write *f32, executing arbitrary WAT?
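That thin layer could be as small as one trait with two cfg-gated impls (a sketch; all names here are hypothetical):

```rust
use anyhow::Result;

/// The common surface over wasmi/wasmtime (desktop) and the browser's
/// native engine via js-sys (wasm32): create vm, malloc/free, read/write
/// f32s, run dynamically generated code.
trait WasmVm {
    /// Compile + instantiate a dynamically generated module from WAT.
    fn instantiate(&mut self, wat: &str) -> Result<()>;
    /// Reserve `len` f32 slots in linear memory; returns a byte offset.
    fn alloc_f32(&mut self, len: usize) -> Result<u32>;
    fn free_f32(&mut self, ptr: u32) -> Result<()>;
    fn write_f32(&mut self, ptr: u32, data: &[f32]) -> Result<()>;
    fn read_f32(&mut self, ptr: u32, out: &mut [f32]) -> Result<()>;
    /// Call the generated kernel on X, Y -> out (byte offsets, element count).
    fn call_kernel(&mut self, px: u32, py: u32, pout: u32, n: u32) -> Result<()>;
}
```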