Robbepop added the bug label to Issue #9690.
Robbepop opened issue #9690:
In Wasmi's benchmark suite I have the following Wasm test case:
(module
  (func $fib (param $N i64) (param $a i64) (param $b i64) (result i64)
    (if (i64.eqz (local.get $N))
      (then (return (local.get $a)))
    )
    (if (i64.eq (local.get $N) (i64.const 1))
      (then (return (local.get $b)))
    )
    (return_call $fib
      (i64.sub (local.get $N) (i64.const 1))
      (local.get $b)
      (i64.add (local.get $a) (local.get $b))
    )
  )
  (func (export "run") (param $N i64) (result i64)
    (return_call $fib (local.get $N) (i64.const 0) (i64.const 1))
  )
)
It is a simple Fibonacci routine based on Wasm's return_call tail calls. When I ran those benchmarks on my MacBook M2 Pro, I saw that Wasmi is usually roughly 8-12x slower than Wasmtime. However, for this particular test case it is only ~4x slower than Wasmtime. Back then I found this suspicious, which is why I didn't mention it in the article I wrote about Wasmi.
After a short discussion, @alexcrichton told me to open an issue, since this kind of performance gap is considered a bug by Wasmtime maintainers.
Feel free to clone the Wasmi benchmarks and test it out on your own hardware. Unfortunately I only have a MacBook M2 Pro and nothing else, so I cannot rerun those benchmarks on different hardware for this issue.
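For readers less familiar with Wasm tail calls, here is a minimal Python sketch of what the $fib function in the test case computes. It is not part of the Wasmi benchmark suite: the names fib_tailrec and to_i64 are made up for illustration, each return_call is modelled as one loop iteration (since a tail call reuses the caller's frame), and N is assumed to be non-negative.

```python
MASK = (1 << 64) - 1  # i64.add wraps modulo 2**64


def to_i64(x: int) -> int:
    """Interpret the low 64 bits of x as a signed two's-complement value."""
    x &= MASK
    return x - (1 << 64) if x >= (1 << 63) else x


def fib_tailrec(n: int, a: int = 0, b: int = 1) -> int:
    """Loop form of $fib; assumes n >= 0 (hypothetical helper, not Wasmi code)."""
    while True:
        if n == 0:          # (if (i64.eqz $N) (then (return $a)))
            return to_i64(a)
        if n == 1:          # (if (i64.eq $N 1) (then (return $b)))
            return to_i64(b)
        # (return_call $fib (N - 1) b (a + b)): same frame, new arguments
        n, a, b = n - 1, b, (a + b) & MASK


print(fib_tailrec(10))  # 55
```

Because the arithmetic wraps like i64.add, results for N above 92 overflow and are not true Fibonacci numbers; the benchmark only measures how fast the calls execute, so this does not matter there.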
Robbepop edited issue #9690.
Robbepop edited issue #9690.
Robbepop edited issue #9690.
Robbepop edited issue #9690:
In Wasmi's benchmark suite I have the following Wasm test case:
cc @alexcrichton
(module
  (func $fib (param $N i64) (param $a i64) (param $b i64) (result i64)
    (if (i64.eqz (local.get $N))
      (then (return (local.get $a)))
    )
    (if (i64.eq (local.get $N) (i64.const 1))
      (then (return (local.get $b)))
    )
    (return_call $fib
      (i64.sub (local.get $N) (i64.const 1))
      (local.get $b)
      (i64.add (local.get $a) (local.get $b))
    )
  )
  (func (export "run") (param $N i64) (result i64)
    (return_call $fib (local.get $N) (i64.const 0) (i64.const 1))
  )
)
It is a simple Fibonacci routine based on Wasm's return_call tail calls. When I ran those benchmarks on my MacBook M2 Pro, I saw that Wasmi is usually roughly 10-15x slower than Wasmtime on aarch64. However, for this particular test case it is only ~4x slower than Wasmtime. Back then I found this suspicious, which is why I didn't mention it in the article I wrote about Wasmi.
After a short discussion, @alexcrichton told me to open an issue, since this kind of performance gap is considered a bug by Wasmtime maintainers.
Feel free to clone the Wasmi benchmarks and test it out on your own hardware. Unfortunately I only have a MacBook M2 Pro and nothing else, so I cannot rerun those benchmarks on different hardware for this issue.
Benchmarks from my machine:
execute/fib.tailrec/wasmi-old/1000000                 time: [22.361 ms 22.367 ms 22.381 ms]
execute/fib.tailrec/wasmi-new.eager.checked/1000000   time: [15.106 ms 15.123 ms 15.144 ms]
execute/fib.tailrec/wasmi-new.lazy.checked/1000000    time: [15.062 ms 15.081 ms 15.102 ms]
execute/fib.tailrec/wasmtime.cranelift/1000000        time: [4.0465 ms 4.0740 ms 4.1016 ms]
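To make the "~4x" figure concrete, the short Python sketch below derives slowdown factors from the timings above. It takes the middle value of each bracketed triple as the point estimate (assuming Criterion-style output) and assumes the trailing 1000000 in each benchmark name is the input N; the variable names are illustrative only.

```python
# Middle (point-estimate) times in milliseconds, copied from the timings above.
estimates_ms = {
    "wasmi-old": 22.367,
    "wasmi-new.eager.checked": 15.123,
    "wasmi-new.lazy.checked": 15.081,
    "wasmtime.cranelift": 4.0740,
}

baseline_ms = estimates_ms["wasmtime.cranelift"]

# Slowdown of each configuration relative to Wasmtime with Cranelift.
slowdown = {name: t / baseline_ms for name, t in estimates_ms.items()}

# Assumption: the "/1000000" suffix is N, so each run performs about N tail calls.
N = 1_000_000
ns_per_call = {name: t * 1e6 / N for name, t in estimates_ms.items()}

for name in estimates_ms:
    print(f"{name}: {slowdown[name]:.2f}x, {ns_per_call[name]:.2f} ns per call")
```

Under these assumptions the new Wasmi configurations come out at roughly 3.7x Wasmtime's time and the old one at roughly 5.5x, with Wasmtime spending about 4 ns per tail call.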
alexcrichton closed issue #9690.
alexcrichton commented on issue #9690:
Inspecting the disassemblies, nothing looks awry to me. The x64 and aarch64 outputs are basically 1:1 here. My guess is that the differences in timing are CPU-specific. I'm going to close this because I think it's as expected from the Wasmtime side at least, but thanks for opening this, as it's still good to investigate!
Last updated: Jan 24 2025 at 00:11 UTC