Is there a timeline for being able to use wasmtime in a Rust/wasm32-unknown-unknown project?
The use case is as follows:
(1) I am trying to use wasm as a JIT for generating code that operates on float32* on the fly (think signal processing type work)
(2) I want this app to run both on desktop (appears fine for now) and in the browser (wasmtime does not appear to build for arch=wasm32-unknown-unknown)
I don't need the full power of cranelift. I am just looking for a "simple" wasm API that works on both desktop & in browser, where by simple I mean the following (sketched in code after this list):
(*) create a module (from either *.wat or *.wasm)
(*) allocate memory
(*) call functions
(*) write to / read from wasm-memory from Rust
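To make that concrete, here is roughly the level of API I mean, sketched against wasmtime's current Rust API (a minimal sketch; the `add_one` module is just a placeholder):

```rust
use wasmtime::{Engine, Instance, Module, Store};

fn main() -> anyhow::Result<()> {
    let engine = Engine::default();

    // (1) create a module -- Module::new accepts both *.wat text and *.wasm bytes
    let module = Module::new(&engine, r#"
        (module
          (memory (export "memory") 1)
          (func (export "add_one") (param f32) (result f32)
            local.get 0
            f32.const 1
            f32.add))
    "#)?;
    let mut store = Store::new(&engine, ());
    let instance = Instance::new(&mut store, &module, &[])?;

    // (2) the exported linear memory serves as the allocation
    let memory = instance.get_memory(&mut store, "memory").expect("memory export");

    // (3) call functions
    let add_one = instance.get_typed_func::<f32, f32>(&mut store, "add_one")?;
    assert_eq!(add_one.call(&mut store, 1.5)?, 2.5);

    // (4) write to / read from wasm memory from Rust
    memory.write(&mut store, 0, &2.0f32.to_le_bytes())?;
    let mut buf = [0u8; 4];
    memory.read(&store, 0, &mut buf)?;
    assert_eq!(f32::from_le_bytes(buf), 2.0);
    Ok(())
}
```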
Can you say a bit more about the use-case of using wasmtime-on-wasm to run a wasm module vs. simply running the wasm module itself on that Wasm platform?
The reason I ask is that at least the JIT portion of Wasmtime becomes sort of a no-op if we were to host Wasmtime on a Wasm platform (with some hypothetical future ability to load and run dynamically generated code; one could polyfill this with JS today). The WASI API implementation still has value, I suppose, but most of Wasmtime's work would simply be to pass through your Wasm module to the underlying Wasm platform, I would think.
That said, perhaps there's something I'm missing here; or perhaps the right answer is an API layer at the top level that can delegate either to wasmtime (on native) or to a "call into JS to load another Wasm module" shim when running in a browser?
@zeroexcuses I recently asked a similar question about interpreter support in wasmtime, but your question seems different in that you apparently need the performance that JIT provides. JIT is not supported on wasm32-unknown-unknown and requires the hypothetical future ability that Chris already mentioned, because compiled code is immutable and not observable at runtime: https://webassembly.org/docs/security/
If you need the performance of dynamic code generation, I assume that you also need the performance of JIT all the way to the host instruction set. If not, directly interpreting something optimised for your use-case probably has better performance than compiling to wasm first, and interpreting the result. Do you agree, or do you think that wasmtime interpreter support could help your use-case?
@Chris Fallin @Morteza Milani
In reading your responses, I can definitely see how my original question was ambiguous. I'm now going to take a few steps back and try to motivate the problem.
*** back story
Suppose we are working on an APL / J / K interpreter or something numpy-like, where we perform arithmetic operations over scalars and vectors. The user types in a mathematical expression at runtime and we evaluate it. For this example, suppose X and Y are vectors of length n.
Consider the following expression:
1.0 + 2.0 * sqrt(X + Y) + 3.0 * sqrt(X + 2 * Y) + 4.0 * sqrt(X + 3 * Y).
Evaluating this is going to be memory bound. A dumb / naive approach would have us (1) traversing X & Y multiple times, (2) constructing intermediate arrays of length n, and (3) traversing those intermediate arrays.
A better approach would be to compile this to:
```rust
// xs, ys: &[f32] of length n; out: &mut [f32] of length n
for i in 0..n {
    let x = xs[i];
    let y = ys[i];
    out[i] = 1.0 + 2.0 * (x + y).sqrt() + 3.0 * (x + 2.0 * y).sqrt() + 4.0 * (x + 3.0 * y).sqrt();
}
```
Now, to do this, we need to do dynamic code generation.
To have this work both on desktop & in browser, I am considering doing this in wasm.
In particular, I will be writing a function fn compile(expr: MathExpr) -> WebAssemblyText, which I then want to evaluate on some *mut f32's.
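As a purely hypothetical sketch (the MathExpr shape and the WAT emission strategy are invented for illustration), compile could bottom out in a recursive emitter along these lines:

```rust
// Hypothetical AST for the expression language; not a real crate.
enum MathExpr {
    Const(f32),
    Var(usize), // index of an input vector, pre-loaded into local $v{i}
    Add(Box<MathExpr>, Box<MathExpr>),
    Mul(Box<MathExpr>, Box<MathExpr>),
    Sqrt(Box<MathExpr>),
}

/// Emit WAT instructions that leave one f32 on the wasm stack.
/// The surrounding loop over i = 0..n would be emitted separately.
fn emit(e: &MathExpr, out: &mut String) {
    match e {
        MathExpr::Const(c) => out.push_str(&format!("f32.const {c}\n")),
        MathExpr::Var(i) => out.push_str(&format!("local.get $v{i}\n")),
        MathExpr::Add(a, b) => {
            emit(a, out);
            emit(b, out);
            out.push_str("f32.add\n");
        }
        MathExpr::Mul(a, b) => {
            emit(a, out);
            emit(b, out);
            out.push_str("f32.mul\n");
        }
        MathExpr::Sqrt(a) => {
            emit(a, out);
            out.push_str("f32.sqrt\n");
        }
    }
}
```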
*** back to original question
Perhaps wasmtime is overkill for this. What I am looking for is a common (both desktop & browser) API where I can:
(1) define a new webassembly 'vm'
(2) allocate some space on this vm (all the f32's will be stored in wasm memory space)
(3) execute arbitrary dynamically generated WebAssembly Text (WAT), passing the *mut f32 of (2) as arguments
(4) read/write to the *mut f32 in webassembly 'vm'
I believe that https://github.com/paritytech/wasmi satisfies all the above requirements. However, I suspect wasmtime (being more 'native', compared to wasmi being an 'interpreter') is faster -- and performance does matter to me.
If there is a way to do this in wasmtime -- great! If not, I would appreciate pointers to solving this problem.
Having written the above, I do admit it is a bit weird that the original problem of:
"I want large blocks of dynamically allocated *f32's, and to execute dynamically generated assembly code on them"
has somehow translated into
"Is there some simple API that lets me execute wasm/desktop and wasm/wasm"
This is getting more into your problem domain rather than wasm, but it would be interesting to at least have baseline data for a "small tight interpreter loop that runs arithmetic ops on f32s". It's probably ... 10x? ... 100x? ... slower than native SIMD-vectorized high-performance math, but that's something you could do on wasm today.
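By that I mean something like the following (a made-up postfix bytecode, purely illustrative), run once per element and measured against a compiled kernel:

```rust
// A tiny stack-based interpreter over f32 ops; `vars` holds the
// current element of each input vector (e.g. x, y).
enum Op {
    Push(f32),
    Load(usize),
    Add,
    Mul,
    Sqrt,
}

fn eval(ops: &[Op], vars: &[f32]) -> f32 {
    let mut stack: Vec<f32> = Vec::with_capacity(8);
    for op in ops {
        match *op {
            Op::Push(c) => stack.push(c),
            Op::Load(i) => stack.push(vars[i]),
            Op::Add => {
                let b = stack.pop().unwrap();
                let a = stack.pop().unwrap();
                stack.push(a + b);
            }
            Op::Mul => {
                let b = stack.pop().unwrap();
                let a = stack.pop().unwrap();
                stack.push(a * b);
            }
            Op::Sqrt => {
                let a = stack.pop().unwrap();
                stack.push(a.sqrt());
            }
        }
    }
    stack.pop().unwrap()
}
```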
Alternatively, coarser-grained kernels that are compiled (Wasm, or even Wasm/SIMD when that's widely available) and invoked from your expression bytecode. I guess that's the numpy approach, basically. If you need arbitrary expressions per loop iteration then maybe not so much, though.
Ah, there is one more thing I was not clear on. We have:
MathExpr -> Desktop/Cuda
MathExpr -> Desktop/Wasm
MathExpr -> Browser/Wasm
I care about the performance of Desktop/Cuda (fastest on Desktop) and Browser/Wasm (fastest in browser).
I don't care about the performance of Desktop/Wasm. It exists mainly so that I can check Desktop/Cuda ==up-to-floating-point-numerical-errors== Desktop/Wasm, and just assume that Desktop/Wasm == Browser/Wasm.
Oh, GPUs, right... also out of Wasm's domain, but is GPU-based compute using shaders with WebGL an option?
Otherwise I would say that dynamically generating a Wasm module is probably your best bet. If you know you're running as Wasm in a browser you can bounce out to JS to call WebAssembly.compile; that doesn't help your desktop case directly though, without some more work.
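The browser half of that looks roughly like the following with js-sys and wasm-bindgen-futures (a sketch modeled on the wasm-bindgen guide's wasm-in-wasm example; import objects and error handling elided):

```rust
use js_sys::{Object, Reflect, WebAssembly};
use wasm_bindgen::{JsCast, JsValue};
use wasm_bindgen_futures::JsFuture;

/// Hand dynamically generated wasm bytes to the browser's own compiler.
async fn instantiate(wasm_bytes: &[u8]) -> Result<WebAssembly::Instance, JsValue> {
    // Equivalent to calling WebAssembly.instantiate(bytes, {}) from JS.
    let promise = WebAssembly::instantiate_buffer(wasm_bytes, &Object::new());
    let result = JsFuture::from(promise).await?;
    // The resolved value is { module, instance }; pull out the instance.
    let instance = Reflect::get(&result, &"instance".into())?
        .dyn_into::<WebAssembly::Instance>()?;
    Ok(instance)
}
```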
Shaders w/ WebGL are definitely possible in theory. However, I control the Desktop/Server, so I know I can do Cuda there. On the other hand, the Browser runs on the user's device and, afaik, I can't rely on more than just Wasm.
I guess refining the original question a bit more, what I need is the four operations above (create a vm, allocate wasm memory, execute dynamically generated WAT, read/write *mut f32) on both desktop and browser, and preferably behind the same API, so I don't have #cfg's everywhere and can have faith that the Desktop/wasm codepath == Browser/wasm codepath.
Now that we have refined the question to this point -- this actually throws out most of the advantage that wasmtime provides?
Well, I would say that wasmtime-compiled-to-wasm is out, at least; and for near-zero overhead in the browser, you'll want to go directly to the native wasm compiler. Then probably what you'd want to do is paper over this with an abstraction that has two backends: the browser's native compiler on one side, and whatever desktop-Wasm engine you're using on the other, plugged in via some support code outside the wasm module (missed this above; is it wasmtime also, or Node, or...?).
To be clear, I'm not using Node; the desktop side is native Rust/x86. Are you suggesting:
browser: https://rustwasm.github.io/wasm-bindgen/examples/wasm-in-wasm.html
desktop: wasmi or wasmtime
then writing a very thin abstraction layer myself over the two, abstracting out: creating a vm, malloc/free, read/write of *f32, and executing arbitrary WAT?
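Roughly, the thin layer I have in mind would be a trait like this (entirely hypothetical; the method names are invented), with one impl per backend:

```rust
/// Hypothetical abstraction over a wasm engine: one impl backed by
/// wasmtime/wasmi on desktop, another backed by js-sys in the browser.
trait WasmVm: Sized {
    type Error;

    /// (1) create a vm from dynamically generated WAT
    fn from_wat(wat: &str) -> Result<Self, Self::Error>;

    /// (2) allocate `len` f32s in wasm memory, returning a wasm-side pointer
    fn alloc_f32s(&mut self, len: usize) -> Result<u32, Self::Error>;
    fn free(&mut self, ptr: u32) -> Result<(), Self::Error>;

    /// (4) read/write those buffers from the host side
    fn write_f32s(&mut self, ptr: u32, data: &[f32]) -> Result<(), Self::Error>;
    fn read_f32s(&self, ptr: u32, out: &mut [f32]) -> Result<(), Self::Error>;

    /// (3) invoke an exported function on previously allocated buffers
    fn call(&mut self, func: &str, args: &[u32]) -> Result<(), Self::Error>;
}
```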