wasmtime / issue #9690 Performance of Wasm tail calls is ... · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / issue #9690 Performance of Wasm tail calls is ...

Wasmtime GitHub notifications bot (Nov 28 2024 at 12:34):

Robbepop added the bug label to Issue #9690.

Wasmtime GitHub notifications bot (Nov 28 2024 at 12:34):

Robbepop opened issue #9690:

In Wasmi's benchmark suite I have the following Wasm test case:
(module
    (func $fib (param $N i64) (param $a i64) (param $b i64) (result i64)
        (if (i64.eqz (local.get $N))
            (then
                (return (local.get $a))
            )
        )
        (if (i64.eq (local.get $N) (i64.const 1))
            (then
                (return (local.get $b))
            )
        )
        (return_call $fib
            (i64.sub (local.get $N) (i64.const 1))
            (local.get $b)
            (i64.add (local.get $a) (local.get $b))
        )
    )

    (func (export "run") (param $N i64) (result i64)
        (return_call $fib (local.get $N) (i64.const 0) (i64.const 1))
    )
)
It is a simple Fibonacci routine based on Wasm's return_call tail calls.
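For readers less used to the text format, here is a minimal Rust sketch of what the module above computes. It is not code from the Wasmi benchmark suite: the names `fib_tailrec` and `run` are chosen here for illustration, and the loop form assumes that each `return_call` reuses the current frame, so one tail call corresponds to one loop iteration. Wrapping arithmetic is used because Wasm's i64.add and i64.sub wrap on overflow.

```rust
/// Loop form of the `$fib` function in the module above (illustrative sketch).
/// Each iteration stands in for one `return_call $fib`.
fn fib_tailrec(mut n: i64, mut a: i64, mut b: i64) -> i64 {
    loop {
        // (if (i64.eqz $N) (then (return $a)))
        if n == 0 {
            return a;
        }
        // (if (i64.eq $N 1) (then (return $b)))
        if n == 1 {
            return b;
        }
        // return_call $fib (N - 1) b (a + b), with Wasm's wrapping semantics
        let next = a.wrapping_add(b);
        a = b;
        b = next;
        n = n.wrapping_sub(1);
    }
}

/// Mirrors the exported "run" function: starts the recursion at (N, 0, 1).
fn run(n: i64) -> i64 {
    fib_tailrec(n, 0, 1)
}
```

With an input as large as the 1000000 used in the benchmark names, the result has long since wrapped around, so the benchmark measures call overhead rather than a meaningful Fibonacci number.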

When I ran those benchmarks on my Macbook M2 Pro I saw that Wasmi is roughly 8-12x slower than Wasmtime usually. However, for this particular test-case it is just ~4x slower than Wasmtime. Back then I found this suspicious which is why I didn't mention this in the article I wrote about Wasmi.

After having had a short discussion with @alexcrichton he told me to open an issue since this kind of performance gap is considered a bug for Wasmtime maintainers.

Feel free to clone Wasmi benchmarks and test it out on your own hardware. Unfortunately I only have a Macbook M2 Pro and nothing else, so I cannot rerun those benchmarks on different hardware for this issue.

Wasmtime GitHub notifications bot (Nov 28 2024 at 12:35):

Robbepop edited issue #9690:

In Wasmi's benchmark suite I have the following Wasm test case:

cc @alexcrichton
(module
    (func $fib (param $N i64) (param $a i64) (param $b i64) (result i64)
        (if (i64.eqz (local.get $N))
            (then
                (return (local.get $a))
            )
        )
        (if (i64.eq (local.get $N) (i64.const 1))
            (then
                (return (local.get $b))
            )
        )
        (return_call $fib
            (i64.sub (local.get $N) (i64.const 1))
            (local.get $b)
            (i64.add (local.get $a) (local.get $b))
        )
    )

    (func (export "run") (param $N i64) (result i64)
        (return_call $fib (local.get $N) (i64.const 0) (i64.const 1))
    )
)
It is a simple Fibonacci routine based on Wasm's return_call tail calls.

When I ran those benchmarks on my Macbook M2 Pro I saw that Wasmi is roughly 8-12x slower than Wasmtime usually. However, for this particular test-case it is just ~4x slower than Wasmtime. Back then I found this suspicious which is why I didn't mention this in the article I wrote about Wasmi.

After having had a short discussion with @alexcrichton he told me to open an issue since this kind of performance gap is considered a bug for Wasmtime maintainers.

Feel free to clone Wasmi benchmarks and test it out on your own hardware. Unfortunately I only have a Macbook M2 Pro and nothing else, so I cannot rerun those benchmarks on different hardware for this issue.

Wasmtime GitHub notifications bot (Nov 28 2024 at 12:35):

Robbepop edited issue #9690:

In Wasmi's benchmark suite I have the following Wasm test case:

cc @alexcrichton
(module
    (func $fib (param $N i64) (param $a i64) (param $b i64) (result i64)
        (if (i64.eqz (local.get $N))
            (then
                (return (local.get $a))
            )
        )
        (if (i64.eq (local.get $N) (i64.const 1))
            (then
                (return (local.get $b))
            )
        )
        (return_call $fib
            (i64.sub (local.get $N) (i64.const 1))
            (local.get $b)
            (i64.add (local.get $a) (local.get $b))
        )
    )

    (func (export "run") (param $N i64) (result i64)
        (return_call $fib (local.get $N) (i64.const 0) (i64.const 1))
    )
)
It is a simple Fibonacci routine based on Wasm's return_call tail calls.

When I ran those benchmarks on my Macbook M2 Pro I saw that Wasmi is roughly 10-15x slower than Wasmtime usually. However, for this particular test-case it is just ~4x slower than Wasmtime. Back then I found this suspicious which is why I didn't mention this in the article I wrote about Wasmi.

After having had a short discussion with @alexcrichton he told me to open an issue since this kind of performance gap is considered a bug for Wasmtime maintainers.

Feel free to clone Wasmi benchmarks and test it out on your own hardware. Unfortunately I only have a Macbook M2 Pro and nothing else, so I cannot rerun those benchmarks on different hardware for this issue.

Wasmtime GitHub notifications bot (Nov 28 2024 at 12:36):

Robbepop edited issue #9690:

In Wasmi's benchmark suite I have the following Wasm test case:

cc @alexcrichton
(module
    (func $fib (param $N i64) (param $a i64) (param $b i64) (result i64)
        (if (i64.eqz (local.get $N))
            (then
                (return (local.get $a))
            )
        )
        (if (i64.eq (local.get $N) (i64.const 1))
            (then
                (return (local.get $b))
            )
        )
        (return_call $fib
            (i64.sub (local.get $N) (i64.const 1))
            (local.get $b)
            (i64.add (local.get $a) (local.get $b))
        )
    )

    (func (export "run") (param $N i64) (result i64)
        (return_call $fib (local.get $N) (i64.const 0) (i64.const 1))
    )
)
It is a simple Fibonacci routine based on Wasm's return_call tail calls.

When I ran those benchmarks on my Macbook M2 Pro I saw that Wasmi is roughly 10-15x slower than Wasmtime on aarch64 usually. However, for this particular test-case it is just ~4x slower than Wasmtime. Back then I found this suspicious which is why I didn't mention this in the article I wrote about Wasmi.

After having had a short discussion with @alexcrichton he told me to open an issue since this kind of performance gap is considered a bug for Wasmtime maintainers.

Feel free to clone Wasmi benchmarks and test it out on your own hardware. Unfortunately I only have a Macbook M2 Pro and nothing else, so I cannot rerun those benchmarks on different hardware for this issue.

Wasmtime GitHub notifications bot (Nov 28 2024 at 13:16):

Robbepop edited issue #9690:

In Wasmi's benchmark suite I have the following Wasm test case:

cc @alexcrichton
(module
    (func $fib (param $N i64) (param $a i64) (param $b i64) (result i64)
        (if (i64.eqz (local.get $N))
            (then
                (return (local.get $a))
            )
        )
        (if (i64.eq (local.get $N) (i64.const 1))
            (then
                (return (local.get $b))
            )
        )
        (return_call $fib
            (i64.sub (local.get $N) (i64.const 1))
            (local.get $b)
            (i64.add (local.get $a) (local.get $b))
        )
    )

    (func (export "run") (param $N i64) (result i64)
        (return_call $fib (local.get $N) (i64.const 0) (i64.const 1))
    )
)
It is a simple Fibonacci routine based on Wasm's return_call tail calls.

When I ran those benchmarks on my Macbook M2 Pro I saw that Wasmi is roughly 10-15x slower than Wasmtime on aarch64 usually. However, for this particular test-case it is just ~4x slower than Wasmtime. Back then I found this suspicious which is why I didn't mention this in the article I wrote about Wasmi.

After having had a short discussion with @alexcrichton he told me to open an issue since this kind of performance gap is considered a bug for Wasmtime maintainers.

Feel free to clone Wasmi benchmarks and test it out on your own hardware. Unfortunately I only have a Macbook M2 Pro and nothing else, so I cannot rerun those benchmarks on different hardware for this issue.

Benchmarks from my machine:
execute/fib.tailrec/wasmi-old/1000000
                        time:   [22.361 ms 22.367 ms 22.381 ms]
execute/fib.tailrec/wasmi-new.eager.checked/1000000
                        time:   [15.106 ms 15.123 ms 15.144 ms]
execute/fib.tailrec/wasmi-new.lazy.checked/1000000
                        time:   [15.062 ms 15.081 ms 15.102 ms]
execute/fib.tailrec/wasmtime.cranelift/1000000
                        time:   [4.0465 ms 4.0740 ms 4.1016 ms]
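To make the "~4x" figure concrete, the slowdown factors can be worked out from the numbers above. The following is a small sketch, assuming the output is in Criterion's usual `[lower estimate upper]` format so that the middle value of each triple is the point estimate; the constant `MEDIANS_MS` and the function `slowdown_vs` are names made up for this example.

```rust
/// Middle value of each timing triple above, in milliseconds
/// (assumed to be the benchmark harness's point estimate).
const MEDIANS_MS: [(&str, f64); 4] = [
    ("wasmi-old", 22.367),
    ("wasmi-new.eager.checked", 15.123),
    ("wasmi-new.lazy.checked", 15.081),
    ("wasmtime.cranelift", 4.0740),
];

/// Returns each engine's time divided by the baseline engine's time.
/// Panics if `baseline` is not one of the listed names.
fn slowdown_vs(baseline: &str) -> Vec<(&'static str, f64)> {
    let base = MEDIANS_MS
        .iter()
        .find(|(name, _)| *name == baseline)
        .map(|(_, ms)| *ms)
        .expect("baseline must be listed");
    MEDIANS_MS
        .iter()
        .map(|(name, ms)| (*name, ms / base))
        .collect()
}
```

Calling `slowdown_vs("wasmtime.cranelift")` gives roughly 5.5x for wasmi-old and roughly 3.7x for both wasmi-new configurations, which matches the "~4x" quoted in the issue and sits well below the 10-15x reported for the other test cases.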

Wasmtime GitHub notifications bot (Dec 03 2024 at 00:11):

alexcrichton closed issue #9690:

In Wasmi's benchmark suite I have the following Wasm test case:

cc @alexcrichton
(module
    (func $fib (param $N i64) (param $a i64) (param $b i64) (result i64)
        (if (i64.eqz (local.get $N))
            (then
                (return (local.get $a))
            )
        )
        (if (i64.eq (local.get $N) (i64.const 1))
            (then
                (return (local.get $b))
            )
        )
        (return_call $fib
            (i64.sub (local.get $N) (i64.const 1))
            (local.get $b)
            (i64.add (local.get $a) (local.get $b))
        )
    )

    (func (export "run") (param $N i64) (result i64)
        (return_call $fib (local.get $N) (i64.const 0) (i64.const 1))
    )
)
It is a simple Fibonacci routine based on Wasm's return_call tail calls.

When I ran those benchmarks on my Macbook M2 Pro I saw that Wasmi is roughly 10-15x slower than Wasmtime on aarch64 usually. However, for this particular test-case it is just ~4x slower than Wasmtime. Back then I found this suspicious which is why I didn't mention this in the article I wrote about Wasmi.

After having had a short discussion with @alexcrichton he told me to open an issue since this kind of performance gap is considered a bug for Wasmtime maintainers.

Feel free to clone Wasmi benchmarks and test it out on your own hardware. Unfortunately I only have a Macbook M2 Pro and nothing else, so I cannot rerun those benchmarks on different hardware for this issue.

Benchmarks from my machine:
execute/fib.tailrec/wasmi-old/1000000
                        time:   [22.361 ms 22.367 ms 22.381 ms]
execute/fib.tailrec/wasmi-new.eager.checked/1000000
                        time:   [15.106 ms 15.123 ms 15.144 ms]
execute/fib.tailrec/wasmi-new.lazy.checked/1000000
                        time:   [15.062 ms 15.081 ms 15.102 ms]
execute/fib.tailrec/wasmtime.cranelift/1000000
                        time:   [4.0465 ms 4.0740 ms 4.1016 ms]

Wasmtime GitHub notifications bot (Dec 03 2024 at 00:11):

alexcrichton commented on issue #9690:

Inspecting the disassemblies, nothing looks awry to me. The x64 and aarch64 outputs are basically 1:1 here. My guess is that the differences in timing are probably CPU-specific. I'm going to close this because I think it's as-expected from the Wasmtime side at least, but thanks for opening this as it's still good to investigate!

Last updated: Apr 18 2025 at 13:08 UTC