wasmtime cranelift speed · general

Rom Grk (Sep 24 2021 at 20:45):

I've benchmarked a very simple C++ VM (branch, compare and addition instructions) against a Wasmtime/Cranelift-generated function that simply adds two numbers and returns the result. In the benchmark, the Cranelift function is substantially slower than this very simple VM. I was expecting Cranelift to generate a directly callable machine-code function; if that were the case, it would be faster than the VM, but it isn't. Am I missing something?
The function is called as a predicate in a filtering operation. Is there a big cost to calling Wasmtime-generated functions? Would it be faster to loop inside wasm?

Alex Crichton (Sep 24 2021 at 20:50):

Are you benchmarking the overhead of calling a wasm function from Wasmtime's C/C++ API? Or the Rust API?

Alex Crichton (Sep 24 2021 at 20:52):

If you're looking to benchmark the generated code, then I'd recommend a loop, yeah; you'll want to stay in the VM. Otherwise you're mostly benchmarking the VM enter/exit overhead, which has little to do with Cranelift and more to do with Wasmtime's embedding of Cranelift (which we of course still want to be fast too).
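To make the loop-inside-wasm suggestion concrete, here is a minimal sketch in C. It does not use Wasmtime at all: `boundary_crossings` is a hypothetical counter standing in for the fixed host-to-guest enter/exit cost, and `guest_predicate` is a hypothetical predicate built on the add-two-numbers body. It only counts how often that fixed cost would be paid; it does not measure time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical counter standing in for the fixed cost a real embedding pays on
 * every host->guest transition (trampoline, stack-limit setup, trap setup). */
static size_t boundary_crossings = 0;

/* Hypothetical guest predicate: the "add two numbers" body used as a filter. */
static int guest_predicate(int a, int b) {
    return a + b > 10;
}

/* Guest-side loop: runs entirely "inside wasm" once entered. */
static size_t guest_filter_all(const int *xs, const int *ys, size_t n, int *keep) {
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        keep[i] = guest_predicate(xs[i], ys[i]);
        kept += (size_t)keep[i];
    }
    return kept;
}

/* Host-side loop: enters the guest once per element. */
static size_t filter_per_call(const int *xs, const int *ys, size_t n, int *keep) {
    assert(xs != NULL && ys != NULL && keep != NULL);
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        boundary_crossings++; /* one enter/exit per element */
        keep[i] = guest_predicate(xs[i], ys[i]);
        kept += (size_t)keep[i];
    }
    return kept;
}

/* Batched: enters the guest once and lets the guest loop over all elements. */
static size_t filter_batched(const int *xs, const int *ys, size_t n, int *keep) {
    assert(xs != NULL && ys != NULL && keep != NULL);
    boundary_crossings++; /* a single enter/exit for the whole batch */
    return guest_filter_all(xs, ys, n, keep);
}
```

With `xs = {1, 5, 9, 2, 8}` and `ys = {2, 6, 3, 1, 7}`, both functions keep the same three elements (sums 11, 12 and 15), but `filter_per_call` records five crossings and `filter_batched` records one. With a real per-call overhead, the batched version pays it once per batch instead of once per element.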

Rom Grk (Sep 24 2021 at 20:54):

From C/C++. Alright, so there's no way to go from wasm to a machine-code function I can call directly, right? It absolutely needs to go through the enter/exit code?

Alex Crichton (Sep 24 2021 at 20:55):

Add `*_unchecked` variants of `Func` APIs for the C API by alexcrichton · Pull Request #3350 · bytecodealliance/wasmtime

This commit is what is hopefully going to be my last installment within the saga of optimizing function calls in/out of WebAssembly modules in the C API. This is yet another alternative approach to...

Bind the new `*_unchecked` function APIs by alexcrichton · Pull Request #17 · bytecodealliance/wasmtime-cpp

This commit binds two new APIs added to wasmtime's C API recently: `wasmtime_func_new_unchecked`, `wasmtime_func_call_unchecked`. These two functions are a more accelerated path of invoking a WebAs...
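The PR previews above are cut off, so this is only a sketch of the general idea behind an "unchecked" call path as we understand it: skip per-call arity and type validation, and pass untagged values through a single buffer that holds the parameters on the way in and the results on the way out. Every name here (`tagged_val`, `raw_val`, `call_add_checked`, `call_add_unchecked`) is hypothetical and is not Wasmtime's C API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tagged value for the checked path (not Wasmtime's type). */
typedef enum { KIND_I32, KIND_I64 } val_kind;
typedef struct {
    val_kind kind;
    union { int32_t i32; int64_t i64; } of;
} tagged_val;

/* Hypothetical untagged slot for the unchecked path (not Wasmtime's type). */
typedef union {
    int32_t i32;
    int64_t i64;
} raw_val;

/* The guest body: wrapping i32 addition, as wasm's i32.add does. */
static int32_t guest_add(int32_t a, int32_t b) {
    return (int32_t)((uint32_t)a + (uint32_t)b);
}

/* Checked: validate arity and every argument's type on each call, then write
 * a separate tagged result. Returns 0 on success, -1 on a mismatch. */
static int call_add_checked(const tagged_val *args, size_t nargs, tagged_val *result) {
    assert(args != NULL && result != NULL);
    if (nargs != 2 || args[0].kind != KIND_I32 || args[1].kind != KIND_I32)
        return -1;
    result->kind = KIND_I32;
    result->of.i32 = guest_add(args[0].of.i32, args[1].of.i32);
    return 0;
}

/* Unchecked: no validation. The caller promises slots 0 and 1 hold i32
 * params; the result overwrites slot 0, so params and results share a buffer. */
static void call_add_unchecked(raw_val *params_and_results) {
    params_and_results[0].i32 =
        guest_add(params_and_results[0].i32, params_and_results[1].i32);
}
```

The trade-off is that the unchecked call does strictly less work per call; in exchange, passing the wrong types or an undersized buffer becomes the caller's bug instead of a reported error.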

Alex Crichton (Sep 24 2021 at 20:55):

there's no way to go directly to the machine code because that's a private implementation detail of Wasmtime and we need to do other things like set up stack overflow checks and such

Rom Grk (Sep 24 2021 at 20:59):

Cool stuff! Do you think there are technical limitations that would prevent compiling wasm directly to callable machine code eventually? I'm looking into various JIT options and was hoping to find some portable assembly so I don't need to do x86/ARM codegen myself :|

Alex Crichton (Sep 24 2021 at 21:01):

we effectively have that as-is: if you run `wasmtime compile`, it'll generate something you can feed to objdump to look at the disassembly. These functions are callable directly, but not in a way that's really feasible from general applications; if you did that, it would break how we implement traps, interrupts, stack overflow, etc. Implementing all of wasm's semantics requires careful planning, especially if this is all supposed to be stable API-wise

Alex Crichton (Sep 24 2021 at 21:02):

We try to minimize these overheads as much as possible, though, and look for ways to improve them over time. Theoretically a call instruction is all that's necessary, but we'd have to figure out other ways to implement things like traps
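Why not simply expose a raw `call` into the compiled code? A toy model in C makes the dependency visible. It assumes a `setjmp`/`longjmp` landing pad for traps and a depth counter in place of a real stack-pointer limit check; all names are hypothetical, and this is not a description of how Wasmtime actually implements traps. The point is that compiled code which traps needs somewhere to unwind to, and only the entry trampoline installs that landing pad and the stack budget.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical trap codes returned to the host. */
enum { TRAP_NONE = 0, TRAP_DIV_BY_ZERO, TRAP_INT_OVERFLOW, TRAP_STACK_OVERFLOW };

/* Hypothetical runtime state that only the entry trampoline sets up. */
static jmp_buf *trap_landing_pad = NULL;
static int pending_trap = TRAP_NONE;
static unsigned depth_remaining = 0;

/* Called by "compiled" code on a trap: unwind straight to the trampoline. */
static void raise_trap(int code) {
    /* A raw call that skipped the trampoline has nowhere to unwind to. */
    assert(trap_landing_pad != NULL);
    pending_trap = code;
    longjmp(*trap_landing_pad, 1);
}

/* Stand-in for compiled guest code implementing wasm's i32.div_s, with an
 * optional recursion to exercise the stack check. */
static int32_t guest_div(int32_t a, int32_t b, unsigned recurse) {
    /* Prologue check standing in for a stack-pointer-vs-limit comparison. */
    if (depth_remaining == 0) raise_trap(TRAP_STACK_OVERFLOW);
    depth_remaining--;
    int32_t r;
    if (recurse > 0) {
        r = guest_div(a, b, recurse - 1);
    } else {
        if (b == 0) raise_trap(TRAP_DIV_BY_ZERO);
        if (a == INT32_MIN && b == -1) raise_trap(TRAP_INT_OVERFLOW);
        r = a / b;
    }
    depth_remaining++;
    return r;
}

/* Hypothetical entry trampoline: install a landing pad and a stack budget,
 * call the guest, and turn any trap into a return code for the host. */
static int call_guest_div(int32_t a, int32_t b, unsigned recurse,
                          unsigned stack_budget, int32_t *out) {
    jmp_buf pad;
    jmp_buf *saved_pad = trap_landing_pad;
    unsigned saved_depth = depth_remaining;
    if (setjmp(pad) != 0) {
        /* Arrived here via longjmp from raise_trap. */
        trap_landing_pad = saved_pad;
        depth_remaining = saved_depth;
        return pending_trap;
    }
    trap_landing_pad = &pad;
    depth_remaining = stack_budget;
    *out = guest_div(a, b, recurse);
    trap_landing_pad = saved_pad;
    depth_remaining = saved_depth;
    return TRAP_NONE;
}
```

`call_guest_div(84, 2, 0, 16, &out)` returns `TRAP_NONE` with `out == 42`, while dividing by zero, dividing `INT32_MIN` by `-1`, or recursing deeper than the 16-frame budget each come back as a trap code. Calling `guest_div` directly, without the trampoline, would reach the `assert` in `raise_trap` immediately, since neither the landing pad nor the stack budget has been installed.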

Rom Grk (Sep 24 2021 at 21:04):

Ok ok. Well then, I'll test those unchecked variants and see if they're good enough. Thanks for the assistance!

Chris Fallin (Sep 24 2021 at 21:07):

@Rom Grk if the alternative you're considering is direct machine-code generation, it's worth considering whether Cranelift itself (without the Wasmtime runtime) would still help you out: you could generate the IR (CLIF) and compile a function that you truly could call directly, since Cranelift knows about, e.g., the normal System V calling convention

Rom Grk (Sep 24 2021 at 21:07):

Chris Fallin (Sep 24 2021 at 21:08):

the advantage that the Wasmtime runtime gives you is that it provides all of the Wasm-specific concepts, like heap memories, tables, imports, and the like; but if you're considering doing direct codegen instead, it sounds like you're not too attached to using Wasm specifically

Chris Fallin (Sep 24 2021 at 21:09):

(Cranelift is used in this way by, e.g., cg_clif, a Rust compiler backend, so it's semi-regularly tested to be able to interact with native code in a direct way)

Rom Grk (Sep 24 2021 at 21:09):

No, indeed, I really just need a portable assembly-like target that can be compiled to x86 and ARM. Will look into that :]
