I've benchmarked a very simple C++ VM (branch, compare and addition instructions) against a function generated by Wasmtime's Cranelift backend that simply adds two numbers and returns the result. The result is that the Cranelift function is substantially slower than this very simple VM. I was expecting Cranelift to generate a callable machine-code function directly; if that were the case, it would be faster than the VM, but it's not. Am I missing something?
The function is called as a predicate in a filtering operation. Is there a big cost to calling Wasmtime-generated functions? Would it be faster to loop inside wasm?
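For readers following along, here is a minimal sketch of the kind of VM described above (branch, compare and addition instructions). The thread never shows the actual VM, so the opcode names, the `Instr` layout, the register count and the `run_vm`/`vm_add` helpers are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical instruction set; the VM in the thread is not shown.
enum class Op : std::uint8_t { Add, CmpLt, BranchIf, Ret };

struct Instr {
    Op op;
    std::uint8_t dst;     // destination register (Add, CmpLt)
    std::uint8_t a;       // first source register; condition for BranchIf; result for Ret
    std::uint8_t b;       // second source register (Add, CmpLt)
    std::int32_t target;  // instruction index to jump to (BranchIf)
};

constexpr std::size_t kNumRegs = 8;  // assumed register file size

// Interprets `code` with the two arguments preloaded into r0 and r1.
// Falls off the end of the program with a result of 0.
std::int64_t run_vm(const std::vector<Instr>& code, std::int64_t x, std::int64_t y) {
    std::int64_t regs[kNumRegs] = {x, y};
    std::size_t pc = 0;
    while (pc < code.size()) {
        const Instr& in = code[pc];
        assert(in.dst < kNumRegs && in.a < kNumRegs && in.b < kNumRegs);
        switch (in.op) {
        case Op::Add:
            regs[in.dst] = regs[in.a] + regs[in.b];
            ++pc;
            break;
        case Op::CmpLt:
            regs[in.dst] = regs[in.a] < regs[in.b] ? 1 : 0;
            ++pc;
            break;
        case Op::BranchIf:
            assert(in.target >= 0);
            pc = regs[in.a] != 0 ? static_cast<std::size_t>(in.target) : pc + 1;
            break;
        case Op::Ret:
            return regs[in.a];
        }
    }
    return 0;
}

// The benchmarked workload expressed for this VM: add two numbers and return.
std::int64_t vm_add(std::int64_t x, std::int64_t y) {
    static const std::vector<Instr> program = {
        {Op::Add, 2, 0, 1, 0},  // r2 = r0 + r1
        {Op::Ret, 0, 2, 0, 0},  // return r2
    };
    return run_vm(program, x, y);
}
```

A call such as `vm_add(2, 3)` costs two dispatch steps and an ordinary C++ call, which is a very low bar for any mechanism that adds fixed per-call overhead.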
Are you benchmarking the overhead of calling a wasm function from Wasmtime's C/C++ API? Or the Rust API?
If you're looking to benchmark the generated code, then yeah, I'd recommend a loop; you'll want to stay in the VM. Otherwise you're mostly benchmarking the VM enter/exit path, which has less to do with Cranelift and more to do with Wasmtime's embedding of Cranelift (which we of course still want to be fast too)
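To make the "stay in the VM" advice concrete, here is a toy model of the two call shapes. It does not use the Wasmtime API: `Guest`, `keep`, `keep_all`, `filter_per_call` and `filter_batched` are hypothetical names, and the `crossings` counter merely stands in for the fixed enter/exit cost paid each time the host calls into the guest.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stand-in for a guest module. `crossings` counts host-to-guest entries;
// in a real runtime each entry carries a fixed enter/exit cost.
struct Guest {
    std::size_t crossings = 0;

    // One entry per element: the predicate body is trivial compared
    // with the cost of getting in and out.
    bool keep(std::int64_t a, std::int64_t b) {
        ++crossings;
        return a + b > 0;
    }

    // One entry per batch: the loop runs on the guest side.
    std::vector<std::size_t> keep_all(const std::vector<std::int64_t>& a,
                                      const std::vector<std::int64_t>& b) {
        ++crossings;
        std::vector<std::size_t> kept;
        for (std::size_t i = 0; i < a.size() && i < b.size(); ++i) {
            if (a[i] + b[i] > 0) kept.push_back(i);
        }
        return kept;
    }
};

// Host-side loop: crosses the boundary once per element.
std::vector<std::size_t> filter_per_call(Guest& g,
                                         const std::vector<std::int64_t>& a,
                                         const std::vector<std::int64_t>& b) {
    std::vector<std::size_t> kept;
    for (std::size_t i = 0; i < a.size() && i < b.size(); ++i) {
        if (g.keep(a[i], b[i])) kept.push_back(i);
    }
    return kept;
}

// Guest-side loop: crosses the boundary once for the whole input.
std::vector<std::size_t> filter_batched(Guest& g,
                                        const std::vector<std::int64_t>& a,
                                        const std::vector<std::int64_t>& b) {
    return g.keep_all(a, b);
}
```

Both functions return the same indices; only the number of crossings differs (one per element versus one per batch). A benchmark shaped like `filter_per_call` mostly measures the boundary, while one shaped like `filter_batched` mostly measures the generated code.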
From C/C++. Alright, so there's no way to go from wasm -> machine-code callable function, right? It absolutely needs to go through the enter/exit code?
we actually just landed https://github.com/bytecodealliance/wasmtime/pull/3350, and the corresponding change for the C++ bindings is https://github.com/bytecodealliance/wasmtime-cpp/pull/17; those should be much lower overhead than the existing versions
there's no way to go directly to the machine code because that's a private implementation detail of Wasmtime and we need to do other things like set up stack overflow checks and such
Cool stuff! Do you think there are technical limitations that would prevent compiling wasm directly to callable machine code eventually? Looking into various jit options, was hoping to find some portable assembly so I don't need to do x86/arm codegen myself :|
we do have that effectively as-is: if you run wasmtime compile, it'll generate something you can feed to objdump and look at the disassembly. These functions are callable directly, but not in a way that's really feasible from general applications; if you did that, it would break how we implement traps, interrupts, stack overflow, etc. Implementing all of wasm's semantics requires careful planning, especially if this is all supposed to be stable API-wise
We try to minimize these overheads as much as possible though and look for ways to improve them over time, and theoretically a call instruction is all that's necessary, but we'd have to figure out other ways to implement things like traps and such
Ok ok. Well then, I'll test those unchecked variants and see if they're good enough. Thanks for the assistance
@Rom Grk if the alternative you're considering is direct machine-code generation, it's worth considering whether Cranelift itself (without the Wasmtime runtime) would still help you out; then you could generate the IR (CLIF) and compile a function that you truly could call directly, as Cranelift knows about e.g. the normal System V calling convention
Oh sure, very interesting option. I'll look into that!
the advantage that the Wasmtime runtime gives you is that it provides all of the Wasm-specific concepts, like heap memories, tables, imports, and the like; but if you're considering doing direct codegen instead, it sounds like you're not too attached to using Wasm specifically
(Cranelift is used in this way by e.g. cg_clif, a rust compiler backend, so it's semi-regularly-tested to be able to interact with native code in a direct way)
No indeed I really just need a portable assembly-like target that can be compiled to x86 & arm. Will look into that :]
Last updated: Jan 24 2025 at 00:11 UTC