zhuxiujia edited Issue #2644:
Runtime invocation overhead 800ns/op
Hi, I'm trying to use the example code to perform a JIT operation, but the performance is very slow.
- toml
default = ["jitdump", "wasmtime/wat", "wasmtime/parallel-compilation","experimental_x64"]
- wat
(module
  (func $sum_f (param $x i32) (param $y i32) (result i32)
    local.get $x
    local.get $y
    i32.add)
  (export "run" (func $sum_f)))
- example/hello.rs
println!("Instantiating module..."); let instance = Instance::new(&store, &module, &[])?; // Next we poke around a bit to extract the `run` function from the module. println!("Extracting export..."); let run = instance .get_func("run") .ok_or(anyhow::format_err!("failed to find `run` function export"))? .get2::<i32,i32,i32>()?; let now=std::time::Instant::now(); let total=1000000; for _ in 0..total{ run(1,1)?; } let time = now.elapsed(); println!( "use Time: {:?} ,each:{} ns/op", &time, time.as_nanos() / (total as u128) );
- cargo run result (this is very slow even though I'm using --release; it should be about 1 ns/op)
cargo run --release --example hello  // use Time: 852.04292ms ,each:852 ns/op
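For comparison, a hypothetical native-Rust baseline (a sketch, not part of the original report) shows roughly what 1 ns/op corresponds to; `std::hint::black_box` keeps the optimizer from deleting the loop:

fn native_sum(x: i32, y: i32) -> i32 {
    x + y
}

fn main() {
    let total: u64 = 1_000_000;
    let now = std::time::Instant::now();
    let mut acc: i32 = 0;
    for _ in 0..total {
        // black_box prevents the compiler from folding the whole loop away
        acc = acc.wrapping_add(std::hint::black_box(native_sum(1, 1)));
    }
    let time = now.elapsed();
    println!(
        "native: {} ns/op (acc = {})",
        time.as_nanos() / (total as u128),
        acc
    );
}

Anything measured above that baseline for the wasm call is per-call entry/exit overhead rather than the `i32.add` itself.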
alexcrichton commented on Issue #2644:
Thanks for the report! Can you clarify what platform you're using?
Entry/exit into wasm isn't entirely trivial because we need to set up infrastructure to catch traps and such. Locally on x86_64 macOS I also get ~700ns overhead, but some time profiling shows that ~80% of that time is spent in `setjmp`, which is how we implement traps in WebAssembly (using `longjmp` back to the start). I posted https://github.com/bytecodealliance/wasmtime/pull/2645, which helps there, but there's possibly other low-hanging fruit here too. In any case it'd be good to see what platform you're running on!
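Wasmtime's traps really do use setjmp/longjmp, which safe Rust can't express directly; as a loose analogy only (not Wasmtime's mechanism), a sketch with `std::panic::catch_unwind` shows the shape of paying for a per-call landing pad:

fn main() {
    let total: u64 = 1_000_000;
    let now = std::time::Instant::now();
    let mut acc: i32 = 0;
    for _ in 0..total {
        // Install an unwind "landing pad" around every call, analogous to
        // setting up trap-catching state before entering wasm.
        let result = std::panic::catch_unwind(|| {
            std::hint::black_box(1i32) + std::hint::black_box(1i32)
        });
        acc = acc.wrapping_add(result.unwrap());
    }
    let time = now.elapsed();
    println!(
        "guarded: {} ns/op (acc = {})",
        time.as_nanos() / (total as u128),
        acc
    );
}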
zhuxiujia edited a comment on Issue #2644:
Thanks for the report! Can you clarify what platform you're using?
Entry/exit into wasm isn't entirely trivial because we need to set up infrastructure to catch traps and such. Locally on x86_64 macOS I also get ~700ns overhead, but some time profiling shows that ~80% of that time is spent in `setjmp`, which is how we implement traps in WebAssembly (using `longjmp` back to the start). I posted #2645, which helps there, but there's possibly other low-hanging fruit here too. In any case it'd be good to see what platform you're running on!
Hi:
Locally on x86_64 macOS, but it's fast on Windows 10. I tried to use WASM to implement an interpreter crate (for example: '1+1' = 2, "'1'+'1'" = "11"), so both WASM and the host were called frequently.
Crossing in and out of WASM that frequently takes a long time in total.
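When the workload allows it, the usual way around this overhead is to batch iterations inside wasm so the boundary is only crossed once. Below is a hypothetical sketch in the same (older) wasmtime API style as the example above; the `run_n` export and its internal loop are inventions for illustration, not part of the original module:

use wasmtime::*;

fn main() -> anyhow::Result<()> {
    let engine = Engine::default();
    let store = Store::new(&engine);
    // Hypothetical module: the same addition, but looped n times inside wasm.
    let module = Module::new(
        &engine,
        r#"(module
             (func $run_n (param $n i32) (result i32)
               (local $i i32) (local $acc i32)
               (block $done
                 (loop $next
                   (br_if $done (i32.ge_u (local.get $i) (local.get $n)))
                   (local.set $acc (i32.add (local.get $acc) (i32.const 2)))
                   (local.set $i (i32.add (local.get $i) (i32.const 1)))
                   (br $next)))
               (local.get $acc))
             (export "run_n" (func $run_n)))"#,
    )?;
    let instance = Instance::new(&store, &module, &[])?;
    let run_n = instance
        .get_func("run_n")
        .ok_or(anyhow::format_err!("failed to find `run_n` function export"))?
        .get1::<i32, i32>()?;

    let total: i32 = 1_000_000;
    let now = std::time::Instant::now();
    let sum = run_n(total)?; // one host->wasm crossing instead of `total` crossings
    let time = now.elapsed();
    println!(
        "batched: {} ns/op (sum = {})",
        time.as_nanos() / (total as u128),
        sum
    );
    Ok(())
}

This doesn't help an interpreter that genuinely must call back and forth per operation, but it shows how much of the cost is the boundary itself.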
alexcrichton commented on Issue #2644:
Oh great! Then we're running on the same platform :)
Is the 55ns overhead I recorded in #2645 still too large for your use case?
zhuxiujia edited a comment on Issue #2644:
Oh great! Then we're running on the same platform :)
Is the 55ns overhead I recorded in #2645 still too large for your use case?
Maybe that's why; anyway, it's on my MacBook.
The same issue arose with the Wasmer crate.
zhuxiujia commented on Issue #2644:
Could this have something to do with Cranelift?
alexcrichton commented on Issue #2644:
Sorry but to clarify, can you benchmark with #2645 applied? Is wasmtime with that patch fast enough for your use case or is it still too slow?
Also, are you saying that Windows is fast locally for you? If so, what is the overhead you're seeing on Windows?
As for other sources of overhead, the main one at this point (after #2645) seems to be accessing thread locals. I don't think Cranelift needs to be improved in any regard here.
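To get a rough feel for what a thread-local access costs on a particular machine, here is a standalone sketch (not Wasmtime's code; the thread-local `Cell` is just a stand-in for per-thread runtime state):

use std::cell::Cell;

thread_local! {
    // Stand-in for runtime state that must be looked up on each wasm entry.
    static SLOT: Cell<u64> = Cell::new(0);
}

fn main() {
    let total: u64 = 1_000_000;
    let now = std::time::Instant::now();
    for i in 0..total {
        SLOT.with(|c| c.set(c.get().wrapping_add(i)));
    }
    let time = now.elapsed();
    println!(
        "tls access: {} ns/op (final = {})",
        time.as_nanos() / (total as u128),
        SLOT.with(|c| c.get())
    );
}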
alexcrichton commented on Issue #2644:
I believe the original issue has been fixed so I'm going to close this.
alexcrichton closed Issue #2644:
Runtime invocation overhead 800ns/op