Stream: git-wasmtime

Topic: wasmtime / issue #7340 riscv64: Refactor float-to-int tra...


view this post on Zulip Wasmtime GitHub notifications bot (Oct 23 2023 at 21:53):

alexcrichton commented on issue #7340:

@afonso360 this is what I was thinking in terms of frobbing fflags around float-to-int conversions that need to trap. I'm not sure whether this is actually beneficial though in terms of perf, although it does look like a bit of a code size win. I'd be curious to test this on hardware you have in terms of perf, and I can try to whip up something later to test that.
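
For readers unfamiliar with the idea, here is a minimal sketch of "convert first, then check the flag". It is a software model only, not the lowering from the PR: `fcvt_w_s_model` and `trunc_or_trap` are hypothetical names, and the `invalid` field stands in for the NV bit that a RISC-V `fcvt` sets in `fflags` when the input is NaN or out of range.

```rust
/// Modeled result of `fcvt.w.s` with round-toward-zero: the clipped integer
/// plus whether the invalid-operation (NV) flag would have been raised.
struct Fcvt {
    value: i32,
    invalid: bool,
}

/// Software model of the hardware conversion (no real `fflags` access).
fn fcvt_w_s_model(x: f32) -> Fcvt {
    // The truncated value fits in i32 iff -2^31 <= x < 2^31.
    // NaN fails both comparisons, so it is flagged as invalid too.
    let in_range = x >= -2147483648.0_f32 && x < 2147483648.0_f32;
    // RISC-V clips out-of-range inputs and returns i32::MAX for NaN.
    // Rust's `as` also saturates, but maps NaN to 0, hence the special case.
    let value = if x.is_nan() { i32::MAX } else { x as i32 };
    Fcvt { value, invalid: !in_range }
}

/// Hypothetical trapping conversion: convert unconditionally, then test the
/// flag instead of comparing the input against bounds up front.
fn trunc_or_trap(x: f32) -> Result<i32, &'static str> {
    let r = fcvt_w_s_model(x);
    if r.invalid {
        Err("invalid conversion to integer")
    } else {
        Ok(r.value)
    }
}

fn main() {
    for x in [2.0_f32, -1.9, 3.0e9, f32::NAN] {
        println!("{x:?} -> {:?}", trunc_or_trap(x));
    }
}
```

On real hardware the flag would presumably need to be cleared before the conversion and read back afterwards, which is the extra cost being weighed against the shorter code sequence.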

view this post on Zulip Wasmtime GitHub notifications bot (Oct 24 2023 at 08:53):

afonso360 commented on issue #7340:

Super curious to see how this pans out! I'm going to run sightglass to check if we see any difference there, but we might need something more targeted.

> I'd be curious to test this on hardware you have in terms of perf, and I can try to whip up something later to test that.

Yes, that'd be helpful, I'm not too familiar with writing these sorts of things.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 24 2023 at 14:27):

alexcrichton commented on issue #7340:

I'm not sure how representative it is, but another idea is bench.wasm.gz which is:

use criterion::{criterion_group, criterion_main, Criterion};
use std::hint::black_box;

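// Each benchmark casts a constant float to an integer. `black_box` on the
// input and the result keeps the compiler from constant-folding the cast.
fn fcvt_to_uint(c: &mut Criterion) {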
    c.bench_function("f32->u32", |b| {
        b.iter(|| black_box(black_box(2.0f32) as u32))
    });
    c.bench_function("f32->u64", |b| {
        b.iter(|| black_box(black_box(2.0f32) as u64))
    });
    c.bench_function("f64->u32", |b| {
        b.iter(|| black_box(black_box(2.0f64) as u32))
    });
    c.bench_function("f64->u64", |b| {
        b.iter(|| black_box(black_box(2.0f64) as u64))
    });
}

fn fcvt_to_sint(c: &mut Criterion) {
    c.bench_function("f32->i32", |b| {
        b.iter(|| black_box(black_box(2.0f32) as i32))
    });
    c.bench_function("f32->i64", |b| {
        b.iter(|| black_box(black_box(2.0f32) as i64))
    });
    c.bench_function("f64->i32", |b| {
        b.iter(|| black_box(black_box(2.0f64) as i32))
    });
    c.bench_function("f64->i64", |b| {
        b.iter(|| black_box(black_box(2.0f64) as i64))
    });
}

criterion_group!(benches, fcvt_to_uint, fcvt_to_sint);
criterion_main!(benches);

view this post on Zulip Wasmtime GitHub notifications bot (Oct 24 2023 at 16:29):

afonso360 commented on issue #7340:

I have some interesting results, and some confusing results.

I first ran sightglass with a smaller iteration count, and it showed main about 3% faster than fflags on spidermonkey. That surprised me, so I ran it again with the default iteration count and got the same result.

<details>
<summary>Sightglass Run 1</summary>

afonso@starfive:~/git/sightglass$ cargo run -- benchmark --processes 5 --iterations-per-process 5 --engine ./engine-fflags.so --engine ./engine-main.so -- ./benchmarks/blake3-scalar/benchmark.wasm benchmarks/bz2/benchmark.wasm benchmarks/regex/benchmark.wasm benchmarks/spidermonkey/benchmark.wasm

execution :: cycles :: benchmarks/spidermonkey/benchmark.wasm

  Δ = 611139.96 ± 69194.06 (confidence = 99%)

  main.so is 1.03x to 1.03x faster than fflags.so!

  [22003870 22177803.84 22296431] fflags.so
  [21400346 21566663.88 21700502] main.so

execution :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm

  No difference in performance.

  [6516 7450.52 14467] fflags.so
  [6500 6845.04 13012] main.so

instantiation :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm

  No difference in performance.

  [950 1073.44 1500] fflags.so
  [939 1039.24 1814] main.so

instantiation :: cycles :: benchmarks/bz2/benchmark.wasm

  No difference in performance.

  [991 1109.80 1603] fflags.so
  [1018 1129.44 1811] main.so

instantiation :: cycles :: benchmarks/spidermonkey/benchmark.wasm

  No difference in performance.

  [3600 4488.00 8483] fflags.so
  [3660 4418.44 9216] main.so

execution :: cycles :: benchmarks/bz2/benchmark.wasm

  No difference in performance.

  [1025843 1035182.96 1079490] fflags.so
  [1026383 1044678.80 1099227] main.so

instantiation :: cycles :: benchmarks/regex/benchmark.wasm

  No difference in performance.

  [2930 3202.92 3354] fflags.so
  [2983 3231.48 3832] main.so

compilation :: cycles :: benchmarks/regex/benchmark.wasm

  No difference in performance.

  [10667840 10819084.56 10947259] fflags.so
  [10725380 10845151.60 10951025] main.so

compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm

  No difference in performance.

  [111451448 113120311.48 115050440] fflags.so
  [111395436 112890160.52 115262339] main.so

compilation :: cycles :: benchmarks/bz2/benchmark.wasm

  No difference in performance.

  [2033630 2237015.88 2365106] fflags.so
  [2032061 2233455.48 2332395] main.so

execution :: cycles :: benchmarks/regex/benchmark.wasm

  No difference in performance.

  [3845964 3852698.64 3870163] fflags.so
  [3845116 3849177.08 3880686] main.so

compilation :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm

  No difference in performance.

  [2641545 2668781.48 2727010] fflags.so
  [2621339 2667785.92 2797660] main.so

</details>

<details>
<summary>Sightglass Run 2</summary>

afonso@starfive:~/git/sightglass$ cargo run -- benchmark --engine ./engine-fflags.so --engine ./engine-main.so -- ./benchmarks/blake3-scalar/benchmark.wasm benchmarks/bz2/benchmark.wasm benchmarks/regex/benchmark.wasm benchmarks/spidermonkey/benchmark.wasm

execution :: cycles :: benchmarks/spidermonkey/benchmark.wasm

  Δ = 528886.85 ± 30356.42 (confidence = 99%)

  main.so is 1.02x to 1.03x faster than fflags.so!

  [21985977 22155294.58 22282664] fflags.so
  [21399233 21626407.73 21778731] main.so

execution :: cycles :: benchmarks/regex/benchmark.wasm

  Δ = 4771.68 ± 2965.39 (confidence = 99%)

  main.so is 1.00x to 1.00x faster than fflags.so!

  [3847607 3856010.05 3909326] fflags.so
  [3845022 3851238.37 3912075] main.so

execution :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm

  No difference in performance.

  [6515 6877.80 14550] fflags.so
  [6489 7310.42 13787] main.so

instantiation :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm

  No difference in performance.

  [927 1029.46 1697] fflags.so
  [923 1067.02 2110] main.so

instantiation :: cycles :: benchmarks/spidermonkey/benchmark.wasm

  No difference in performance.

  [3555 4316.19 8421] fflags.so
  [3588 4375.00 8089] main.so

instantiation :: cycles :: benchmarks/regex/benchmark.wasm

  No difference in performance.

  [2981 3225.85 3511] fflags.so
  [2986 3242.21 3598] main.so

instantiation :: cycles :: benchmarks/bz2/benchmark.wasm

  No difference in performance.

  [990 1147.83 1970] fflags.so
  [1000 1153.04 2124] main.so

compilation :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm

  No difference in performance.

  [2607381 2665940.83 2808423] fflags.so
  [2599335 2655289.42 2750262] main.so

compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm

  No difference in performance.

  [111352544 113374288.89 115552774] fflags.so
  [111363591 113073838.73 115549133] main.so

execution :: cycles :: benchmarks/bz2/benchmark.wasm

  No difference in performance.

  [1024902 1040201.83 1092704] fflags.so
  [1025677 1041126.17 1101387] main.so

compilation :: cycles :: benchmarks/bz2/benchmark.wasm

  No difference in performance.

  [2020133 2223545.22 2354573] fflags.so
  [2067731 2224627.35 2372602] main.so

compilation :: cycles :: benchmarks/regex/benchmark.wasm

  No difference in performance.

  [10651375 10816177.71 10994312] fflags.so
  [10625726 10814925.59 11001537] main.so

</details>
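
As a sanity check on the spidermonkey numbers, the sketch below recomputes Δ and the "1.0Nx to 1.0Nx faster" range from the means printed in the two runs. The formula in `speedup_range` (a hypothetical helper) is an assumption: it reproduces the printed ranges, but it is not taken from sightglass's source.

```rust
/// Speedup range implied by a mean difference `delta` ± `err`, relative to
/// the faster engine's mean. Assumed formula, chosen because it matches the
/// ranges printed in the output above.
fn speedup_range(delta: f64, err: f64, faster_mean: f64) -> (f64, f64) {
    (
        1.0 + (delta - err) / faster_mean,
        1.0 + (delta + err) / faster_mean,
    )
}

fn main() {
    // (label, fflags mean, main mean, ± on delta) for spidermonkey execution
    // cycles, copied from the two sightglass runs.
    let runs = [
        ("run 1", 22177803.84, 21566663.88, 69194.06),
        ("run 2", 22155294.58, 21626407.73, 30356.42),
    ];
    for (name, fflags_mean, main_mean, err) in runs {
        let delta = fflags_mean - main_mean;
        let (lo, hi) = speedup_range(delta, err, main_mean);
        println!("{name}: delta = {delta:.2}, main is {lo:.2}x to {hi:.2}x faster");
    }
}
```

Both runs put the slowdown between roughly 2.3% and 3.2%, so the difference is small but consistent across iteration counts.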


I also tried to run the above benchmark via sightglass but couldn't get it to execute, so I ran it as a precompiled module with the wasmtime CLI instead. The results have quite a large uncertainty, though. Do you have any suggestions on how I can do this better?

afonso@starfive:~$ hyperfine     --warmup 1000     --runs 5000     '~/wasmtime-main --allow-precompiled ~/bench-main.cwasm'     '~/wasmtime-fflags --allow-precompiled ~/bench-fflags.cwasm'
Benchmark 1: ~/wasmtime-main --allow-precompiled ~/bench-main.cwasm
  Time (mean ± σ):      24.5 ms ±   4.0 ms    [User: 13.3 ms, System: 9.9 ms]
  Range (min … max):    21.3 ms …  37.0 ms    5000 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 2: ~/wasmtime-fflags --allow-precompiled ~/bench-fflags.cwasm
  Time (mean ± σ):      23.8 ms ±   3.6 ms    [User: 13.3 ms, System: 9.2 ms]
  Range (min … max):    20.7 ms …  40.1 ms    5000 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Summary
  ~/wasmtime-fflags --allow-precompiled ~/bench-fflags.cwasm ran
    1.03 ± 0.23 times faster than ~/wasmtime-main --allow-precompiled ~/bench-main.cwasm
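
To see why this summary is inconclusive, the sketch below recomputes the ratio and its uncertainty from the two reported means and standard deviations. It assumes standard first-order error propagation for a quotient, which reproduces the printed 1.03 ± 0.23; `ratio_with_uncertainty` is a hypothetical helper, not part of hyperfine.

```rust
/// Ratio of two means with first-order error propagation:
/// sigma_ratio = ratio * sqrt((sd_a / mean_a)^2 + (sd_b / mean_b)^2).
/// Each argument is a (mean, standard deviation) pair in the same unit.
fn ratio_with_uncertainty(slow: (f64, f64), fast: (f64, f64)) -> (f64, f64) {
    let (slow_mean, slow_sd) = slow;
    let (fast_mean, fast_sd) = fast;
    let ratio = slow_mean / fast_mean;
    let relative = ((slow_sd / slow_mean).powi(2) + (fast_sd / fast_mean).powi(2)).sqrt();
    (ratio, ratio * relative)
}

fn main() {
    // Mean and sigma in milliseconds, copied from the hyperfine output.
    let main_build = (24.5, 4.0);
    let fflags_build = (23.8, 3.6);
    let (ratio, sigma) = ratio_with_uncertainty(main_build, fflags_build);
    println!("fflags ran {ratio:.2} ± {sigma:.2} times faster than main");
}
```

The uncertainty (0.23) is several times larger than the measured advantage (0.03), and each run takes only about 24 ms, most of which is likely process startup rather than the conversions themselves.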

view this post on Zulip Wasmtime GitHub notifications bot (Oct 24 2023 at 16:34):

afonso360 edited a comment on issue #7340:

I have some interesting results, and some confusing results.

I first ran sightglass with a smaller iteration count, and it showed main about 3% faster than fflags on spidermonkey. That surprised me, so I ran it again with the default iteration count and got the same result.

<details>
<summary>Sightglass Run 1</summary>

afonso@starfive:~/git/sightglass$ cargo run -- benchmark --processes 5 --iterations-per-process 5 --engine ./engine-fflags.so --engine ./engine-main.so -- ./benchmarks/blake3-scalar/benchmark.wasm benchmarks/bz2/benchmark.wasm benchmarks/regex/benchmark.wasm benchmarks/spidermonkey/benchmark.wasm

execution :: cycles :: benchmarks/spidermonkey/benchmark.wasm

  Δ = 611139.96 ± 69194.06 (confidence = 99%)

  main.so is 1.03x to 1.03x faster than fflags.so!

  [22003870 22177803.84 22296431] fflags.so
  [21400346 21566663.88 21700502] main.so

execution :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm

  No difference in performance.

  [6516 7450.52 14467] fflags.so
  [6500 6845.04 13012] main.so

instantiation :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm

  No difference in performance.

  [950 1073.44 1500] fflags.so
  [939 1039.24 1814] main.so

instantiation :: cycles :: benchmarks/bz2/benchmark.wasm

  No difference in performance.

  [991 1109.80 1603] fflags.so
  [1018 1129.44 1811] main.so

instantiation :: cycles :: benchmarks/spidermonkey/benchmark.wasm

  No difference in performance.

  [3600 4488.00 8483] fflags.so
  [3660 4418.44 9216] main.so

execution :: cycles :: benchmarks/bz2/benchmark.wasm

  No difference in performance.

  [1025843 1035182.96 1079490] fflags.so
  [1026383 1044678.80 1099227] main.so

instantiation :: cycles :: benchmarks/regex/benchmark.wasm

  No difference in performance.

  [2930 3202.92 3354] fflags.so
  [2983 3231.48 3832] main.so

compilation :: cycles :: benchmarks/regex/benchmark.wasm

  No difference in performance.

  [10667840 10819084.56 10947259] fflags.so
  [10725380 10845151.60 10951025] main.so

compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm

  No difference in performance.

  [111451448 113120311.48 115050440] fflags.so
  [111395436 112890160.52 115262339] main.so

compilation :: cycles :: benchmarks/bz2/benchmark.wasm

  No difference in performance.

  [2033630 2237015.88 2365106] fflags.so
  [2032061 2233455.48 2332395] main.so

execution :: cycles :: benchmarks/regex/benchmark.wasm

  No difference in performance.

  [3845964 3852698.64 3870163] fflags.so
  [3845116 3849177.08 3880686] main.so

compilation :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm

  No difference in performance.

  [2641545 2668781.48 2727010] fflags.so
  [2621339 2667785.92 2797660] main.so

</details>

<details>
<summary>Sightglass Run 2</summary>

afonso@starfive:~/git/sightglass$ cargo run -- benchmark --engine ./engine-fflags.so --engine ./engine-main.so  -- ./benchmarks/blake3-scalar/benchmark.wasm  benchmarks/bz2/benchmark.wasm benchmarks/regex/benchmark.wasm benchmarks/spidermonkey/benchmark.wasm
execution :: cycles :: benchmarks/spidermonkey/benchmark.wasm

  Δ = 528886.85 ± 30356.42 (confidence = 99%)

  main.so is 1.02x to 1.03x faster than fflags.so!

  [21985977 22155294.58 22282664] fflags.so
  [21399233 21626407.73 21778731] main.so

execution :: cycles :: benchmarks/regex/benchmark.wasm

  Δ = 4771.68 ± 2965.39 (confidence = 99%)

  main.so is 1.00x to 1.00x faster than fflags.so!

  [3847607 3856010.05 3909326] fflags.so
  [3845022 3851238.37 3912075] main.so

execution :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm

  No difference in performance.

  [6515 6877.80 14550] fflags.so
  [6489 7310.42 13787] main.so

instantiation :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm

  No difference in performance.

  [927 1029.46 1697] fflags.so
  [923 1067.02 2110] main.so

instantiation :: cycles :: benchmarks/spidermonkey/benchmark.wasm

  No difference in performance.

  [3555 4316.19 8421] fflags.so
  [3588 4375.00 8089] main.so

instantiation :: cycles :: benchmarks/regex/benchmark.wasm

  No difference in performance.

  [2981 3225.85 3511] fflags.so
  [2986 3242.21 3598] main.so

instantiation :: cycles :: benchmarks/bz2/benchmark.wasm

  No difference in performance.

  [990 1147.83 1970] fflags.so
  [1000 1153.04 2124] main.so

compilation :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm

  No difference in performance.

  [2607381 2665940.83 2808423] fflags.so
  [2599335 2655289.42 2750262] main.so

compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm

  No difference in performance.

  [111352544 113374288.89 115552774] fflags.so
  [111363591 113073838.73 115549133] main.so

execution :: cycles :: benchmarks/bz2/benchmark.wasm

  No difference in performance.

  [1024902 1040201.83 1092704] fflags.so
  [1025677 1041126.17 1101387] main.so

compilation :: cycles :: benchmarks/bz2/benchmark.wasm

  No difference in performance.

  [2020133 2223545.22 2354573] fflags.so
  [2067731 2224627.35 2372602] main.so

compilation :: cycles :: benchmarks/regex/benchmark.wasm

  No difference in performance.

  [10651375 10816177.71 10994312] fflags.so
  [10625726 10814925.59 11001537] main.so

</details>
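
For reference, the "1.03x to 1.03x" and "1.02x to 1.03x" ranges for spidermonkey can be reproduced from the reported means and the ± term. This is only a sketch: the formula below is an assumption that happens to match the printed numbers, not a confirmed description of how sightglass computes them, and `Comparison` and `speedup_range` are illustrative names.

```rust
/// Summary of one sightglass comparison, copied from the output above.
struct Comparison {
    slower_mean: f64, // mean cycles for fflags.so
    faster_mean: f64, // mean cycles for main.so
    half_width: f64,  // the "±" term printed next to Δ
}

/// Returns the (low, high) speedup factors of the faster engine.
/// Assumption: the range is (faster_mean + Δ ∓ half_width) / faster_mean,
/// where Δ is the difference of the two means.
fn speedup_range(c: &Comparison) -> (f64, f64) {
    let delta = c.slower_mean - c.faster_mean;
    let low = (c.faster_mean + delta - c.half_width) / c.faster_mean;
    let high = (c.faster_mean + delta + c.half_width) / c.faster_mean;
    (low, high)
}

fn main() {
    let run1 = Comparison {
        slower_mean: 22177803.84,
        faster_mean: 21566663.88,
        half_width: 69194.06,
    };
    let run2 = Comparison {
        slower_mean: 22155294.58,
        faster_mean: 21626407.73,
        half_width: 30356.42,
    };
    for (name, run) in [("run 1", &run1), ("run 2", &run2)] {
        let (low, high) = speedup_range(run);
        println!("{}: main.so is {:.2}x to {:.2}x faster than fflags.so", name, low, high);
    }
}
```

Unrounded, the slowdown from fflags is roughly 2.3% to 3.2% across the two runs, which is small but consistent.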


I also tried to run the above benchmark via sightglass but couldn't get it to execute, so I ran it using a precompiled module and the wasmtime CLI instead. The results have quite a large uncertainty, though. Do you have any suggestions on how I can do this better?

afonso@starfive:~$ hyperfine     --warmup 1000     --runs 5000     '~/wasmtime-main --allow-precompiled ~/bench-main.cwasm'     '~/wasmtime-fflags --allow-precompiled ~/bench-fflags.cwasm'
Benchmark 1: ~/wasmtime-main --allow-precompiled ~/bench-main.cwasm
  Time (mean ± σ):      24.5 ms ±   4.0 ms    [User: 13.3 ms, System: 9.9 ms]
  Range (min … max):    21.3 ms …  37.0 ms    5000 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 2: ~/wasmtime-fflags --allow-precompiled ~/bench-fflags.cwasm
  Time (mean ± σ):      23.8 ms ±   3.6 ms    [User: 13.3 ms, System: 9.2 ms]
  Range (min … max):    20.7 ms …  40.1 ms    5000 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Summary
  ~/wasmtime-fflags --allow-precompiled ~/bench-fflags.cwasm ran
    1.03 ± 0.23 times faster than ~/wasmtime-main --allow-precompiled ~/bench-main.cwasm

Edit: I should note, here fflags is this PR and main is 9d8ca828d1888013b45570a94267778962846ad6 which does include the previous FRM changes.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 24 2023 at 16:39):

afonso360 edited a comment on issue #7340:

I have some interesting results, and some confusing results.

I first ran sightglass with a smaller iteration count and found that main was about 3% faster than fflags on spidermonkey. That surprised me, so I ran it again with the default iteration count and got the same result.

<details>
<summary>Sightglass Run 1</summary>

afonso@starfive:~/git/sightglass$ cargo run -- benchmark --processes 5 --iterations-per-process 5 --engine ./engine-fflags.so --engine ./engine-main.so  -- ./benchmarks/blake3-scalar/benchmark.wasm  benchmarks/bz2/benchmark.wasm benchmarks/regex/benchmark.wasm benchmarks/spidermonkey/benchmark.wasm

execution :: cycles :: benchmarks/spidermonkey/benchmark.wasm

  Δ = 611139.96 ± 69194.06 (confidence = 99%)

  main.so is 1.03x to 1.03x faster than fflags.so!

  [22003870 22177803.84 22296431] fflags.so
  [21400346 21566663.88 21700502] main.so

execution :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm

  No difference in performance.

  [6516 7450.52 14467] fflags.so
  [6500 6845.04 13012] main.so

instantiation :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm

  No difference in performance.

  [950 1073.44 1500] fflags.so
  [939 1039.24 1814] main.so

instantiation :: cycles :: benchmarks/bz2/benchmark.wasm

  No difference in performance.

  [991 1109.80 1603] fflags.so
  [1018 1129.44 1811] main.so

instantiation :: cycles :: benchmarks/spidermonkey/benchmark.wasm

  No difference in performance.

  [3600 4488.00 8483] fflags.so
  [3660 4418.44 9216] main.so

execution :: cycles :: benchmarks/bz2/benchmark.wasm

  No difference in performance.

  [1025843 1035182.96 1079490] fflags.so
  [1026383 1044678.80 1099227] main.so

instantiation :: cycles :: benchmarks/regex/benchmark.wasm

  No difference in performance.

  [2930 3202.92 3354] fflags.so
  [2983 3231.48 3832] main.so

compilation :: cycles :: benchmarks/regex/benchmark.wasm

  No difference in performance.

  [10667840 10819084.56 10947259] fflags.so
  [10725380 10845151.60 10951025] main.so

compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm

  No difference in performance.

  [111451448 113120311.48 115050440] fflags.so
  [111395436 112890160.52 115262339] main.so

compilation :: cycles :: benchmarks/bz2/benchmark.wasm

  No difference in performance.

  [2033630 2237015.88 2365106] fflags.so
  [2032061 2233455.48 2332395] main.so

execution :: cycles :: benchmarks/regex/benchmark.wasm

  No difference in performance.

  [3845964 3852698.64 3870163] fflags.so
  [3845116 3849177.08 3880686] main.so

compilation :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm

  No difference in performance.

  [2641545 2668781.48 2727010] fflags.so
  [2621339 2667785.92 2797660] main.so

</details>

<details>
<summary>Sightglass Run 2</summary>

afonso@starfive:~/git/sightglass$ cargo run -- benchmark --engine ./engine-fflags.so --engine ./engine-main.so  -- ./benchmarks/blake3-scalar/benchmark.wasm  benchmarks/bz2/benchmark.wasm benchmarks/regex/benchmark.wasm benchmarks/spidermonkey/benchmark.wasm
execution :: cycles :: benchmarks/spidermonkey/benchmark.wasm

  Δ = 528886.85 ± 30356.42 (confidence = 99%)

  main.so is 1.02x to 1.03x faster than fflags.so!

  [21985977 22155294.58 22282664] fflags.so
  [21399233 21626407.73 21778731] main.so

execution :: cycles :: benchmarks/regex/benchmark.wasm

  Δ = 4771.68 ± 2965.39 (confidence = 99%)

  main.so is 1.00x to 1.00x faster than fflags.so!

  [3847607 3856010.05 3909326] fflags.so
  [3845022 3851238.37 3912075] main.so

execution :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm

  No difference in performance.

  [6515 6877.80 14550] fflags.so
  [6489 7310.42 13787] main.so

instantiation :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm

  No difference in performance.

  [927 1029.46 1697] fflags.so
  [923 1067.02 2110] main.so

instantiation :: cycles :: benchmarks/spidermonkey/benchmark.wasm

  No difference in performance.

  [3555 4316.19 8421] fflags.so
  [3588 4375.00 8089] main.so

instantiation :: cycles :: benchmarks/regex/benchmark.wasm

  No difference in performance.

  [2981 3225.85 3511] fflags.so
  [2986 3242.21 3598] main.so

instantiation :: cycles :: benchmarks/bz2/benchmark.wasm

  No difference in performance.

  [990 1147.83 1970] fflags.so
  [1000 1153.04 2124] main.so

compilation :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm

  No difference in performance.

  [2607381 2665940.83 2808423] fflags.so
  [2599335 2655289.42 2750262] main.so

compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm

  No difference in performance.

  [111352544 113374288.89 115552774] fflags.so
  [111363591 113073838.73 115549133] main.so

execution :: cycles :: benchmarks/bz2/benchmark.wasm

  No difference in performance.

  [1024902 1040201.83 1092704] fflags.so
  [1025677 1041126.17 1101387] main.so

compilation :: cycles :: benchmarks/bz2/benchmark.wasm

  No difference in performance.

  [2020133 2223545.22 2354573] fflags.so
  [2067731 2224627.35 2372602] main.so

compilation :: cycles :: benchmarks/regex/benchmark.wasm

  No difference in performance.

  [10651375 10816177.71 10994312] fflags.so
  [10625726 10814925.59 11001537] main.so

</details>


I also tried to run the above benchmark via sightglass but couldn't get it to execute, so I ran it using a precompiled module and the wasmtime CLI instead. The results have quite a large uncertainty, though. Do you have any suggestions on how I can do this better?

I think invoking via the CLI might be adding too much noise, so I'm going to try to build a criterion benchmark that uses the wasmtime API and benchmark it that way.

afonso@starfive:~$ hyperfine     --warmup 1000     --runs 5000     '~/wasmtime-main --allow-precompiled ~/bench-main.cwasm'     '~/wasmtime-fflags --allow-precompiled ~/bench-fflags.cwasm'
Benchmark 1: ~/wasmtime-main --allow-precompiled ~/bench-main.cwasm
  Time (mean ± σ):      24.5 ms ±   4.0 ms    [User: 13.3 ms, System: 9.9 ms]
  Range (min … max):    21.3 ms …  37.0 ms    5000 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 2: ~/wasmtime-fflags --allow-precompiled ~/bench-fflags.cwasm
  Time (mean ± σ):      23.8 ms ±   3.6 ms    [User: 13.3 ms, System: 9.2 ms]
  Range (min … max):    20.7 ms …  40.1 ms    5000 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Summary
  ~/wasmtime-fflags --allow-precompiled ~/bench-fflags.cwasm ran
    1.03 ± 0.23 times faster than ~/wasmtime-main --allow-precompiled ~/bench-main.cwasm

Edit: I should note, here fflags is this PR and main is 9d8ca828d1888013b45570a94267778962846ad6 which includes the previous FRM changes.
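
To make the "large uncertainty" concrete, the "1.03 ± 0.23" summary can be reproduced from the two means and standard deviations with first-order error propagation for a ratio. This is a sketch under the assumption that the summary line is derived this way; `relative_speed` is an illustrative name, and the inputs are the rounded values printed above.

```rust
/// Ratio of two mean runtimes with first-order error propagation.
/// Returns (ratio, uncertainty), where the uncertainty is
/// ratio * sqrt((sd_slow / mean_slow)^2 + (sd_fast / mean_fast)^2).
fn relative_speed(mean_slow: f64, sd_slow: f64, mean_fast: f64, sd_fast: f64) -> (f64, f64) {
    let ratio = mean_slow / mean_fast;
    let relative_error = ((sd_slow / mean_slow).powi(2) + (sd_fast / mean_fast).powi(2)).sqrt();
    (ratio, ratio * relative_error)
}

fn main() {
    // main:   24.5 ms ± 4.0 ms
    // fflags: 23.8 ms ± 3.6 ms
    let (ratio, err) = relative_speed(24.5, 4.0, 23.8, 3.6);
    println!("{:.2} ± {:.2}", ratio, err); // prints "1.03 ± 0.23"

    // The interval spans 1.0, so this run cannot tell the two builds apart.
    println!("interval contains 1.0: {}", ratio - err < 1.0 && 1.0 < ratio + err);
}
```

With roughly 15% run-to-run variation in a 24 ms process, a difference of a few percent is well inside the error bars, which fits the suspicion that process startup dominates the measurement.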

view this post on Zulip Wasmtime GitHub notifications bot (Oct 24 2023 at 16:54):

alexcrichton commented on issue #7340:

Oh, for the criterion program you'll need to pass --bench as an argument to get it to actually run benchmarks. In theory you should be able to run wasmtime --dir . ./foo.wasm --bench first with the "before" wasmtime and then with the "after" wasmtime, and criterion should self-report regressions/improvements between the two.

The sightglass run seems a bit damning though in that maybe it's not too useful to inspect the exception flags!

view this post on Zulip Wasmtime GitHub notifications bot (Oct 24 2023 at 17:11):

afonso360 commented on issue #7340:

Oh, that makes much more sense! Weirdly, none of these showed a significant performance difference.

<details>
<summary>Criterion output</summary>

afonso@starfive:~$ ./wasmtime-fflags --dir . ./bench.wasm --bench
f32->u32                time:   [23.730 ns 23.732 ns 23.734 ns]
                        change: [-0.2177% -0.1872% -0.1582%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  4 (4.00%) high severe

f32->u64                time:   [23.731 ns 23.732 ns 23.734 ns]
                        change: [-0.0650% -0.0122% +0.0275%] (p = 0.68 > 0.05)
                        No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) high mild
  6 (6.00%) high severe

f64->u32                time:   [25.399 ns 25.401 ns 25.403 ns]
                        change: [-0.0281% -0.0043% +0.0162%] (p = 0.72 > 0.05)
                        No change in performance detected.
Found 21 outliers among 100 measurements (21.00%)
  4 (4.00%) low mild
  9 (9.00%) high mild
  8 (8.00%) high severe

f64->u64                time:   [25.068 ns 25.069 ns 25.072 ns]
                        change: [-0.0540% -0.0158% +0.0182%] (p = 0.41 > 0.05)
                        No change in performance detected.
Found 13 outliers among 100 measurements (13.00%)
  1 (1.00%) low mild
  7 (7.00%) high mild
  5 (5.00%) high severe

f32->i32                time:   [30.727 ns 30.729 ns 30.732 ns]
                        change: [-0.0390% -0.0081% +0.0170%] (p = 0.60 > 0.05)
                        No change in performance detected.
Found 12 outliers among 100 measurements (12.00%)
  2 (2.00%) low mild
  4 (4.00%) high mild
  6 (6.00%) high severe

f32->i64                time:   [32.731 ns 32.734 ns 32.736 ns]
                        change: [-0.0523% -0.0069% +0.0358%] (p = 0.78 > 0.05)
                        No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  6 (6.00%) high severe

f64->i32                time:   [34.740 ns 34.742 ns 34.745 ns]
                        change: [-0.0349% -0.0039% +0.0254%] (p = 0.81 > 0.05)
                        No change in performance detected.
Found 15 outliers among 100 measurements (15.00%)
  1 (1.00%) low mild
  6 (6.00%) high mild
  8 (8.00%) high severe

f64->i64                time:   [34.067 ns 34.070 ns 34.073 ns]
                        change: [-0.0274% -0.0002% +0.0312%] (p = 0.99 > 0.05)
                        No change in performance detected.
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) low mild
  6 (6.00%) high mild
  5 (5.00%) high severe

</details>

view this post on Zulip Wasmtime GitHub notifications bot (Oct 24 2023 at 17:16):

afonso360 edited a comment on issue #7340:

Oh, that makes much more sense! Weirdly, none of these showed a significant performance difference.

<details>
<summary>Criterion output</summary>

afonso@starfive:~$ ./wasmtime-main --dir . ./bench.wasm --bench
...
afonso@starfive:~$ ./wasmtime-fflags --dir . ./bench.wasm --bench
f32->u32                time:   [23.730 ns 23.732 ns 23.734 ns]
                        change: [-0.2177% -0.1872% -0.1582%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  4 (4.00%) high severe

f32->u64                time:   [23.731 ns 23.732 ns 23.734 ns]
                        change: [-0.0650% -0.0122% +0.0275%] (p = 0.68 > 0.05)
                        No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) high mild
  6 (6.00%) high severe

f64->u32                time:   [25.399 ns 25.401 ns 25.403 ns]
                        change: [-0.0281% -0.0043% +0.0162%] (p = 0.72 > 0.05)
                        No change in performance detected.
Found 21 outliers among 100 measurements (21.00%)
  4 (4.00%) low mild
  9 (9.00%) high mild
  8 (8.00%) high severe

f64->u64                time:   [25.068 ns 25.069 ns 25.072 ns]
                        change: [-0.0540% -0.0158% +0.0182%] (p = 0.41 > 0.05)
                        No change in performance detected.
Found 13 outliers among 100 measurements (13.00%)
  1 (1.00%) low mild
  7 (7.00%) high mild
  5 (5.00%) high severe

f32->i32                time:   [30.727 ns 30.729 ns 30.732 ns]
                        change: [-0.0390% -0.0081% +0.0170%] (p = 0.60 > 0.05)
                        No change in performance detected.
Found 12 outliers among 100 measurements (12.00%)
  2 (2.00%) low mild
  4 (4.00%) high mild
  6 (6.00%) high severe

f32->i64                time:   [32.731 ns 32.734 ns 32.736 ns]
                        change: [-0.0523% -0.0069% +0.0358%] (p = 0.78 > 0.05)
                        No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  6 (6.00%) high severe

f64->i32                time:   [34.740 ns 34.742 ns 34.745 ns]
                        change: [-0.0349% -0.0039% +0.0254%] (p = 0.81 > 0.05)
                        No change in performance detected.
Found 15 outliers among 100 measurements (15.00%)
  1 (1.00%) low mild
  6 (6.00%) high mild
  8 (8.00%) high severe

f64->i64                time:   [34.067 ns 34.070 ns 34.073 ns]
                        change: [-0.0274% -0.0002% +0.0312%] (p = 0.99 > 0.05)
                        No change in performance detected.
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) low mild
  6 (6.00%) high mild
  5 (5.00%) high severe

</details>
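
The verdict lines in the criterion output ("Change within noise threshold." versus "No change in performance detected.") follow from the p-value and the confidence interval of the change. A minimal sketch of that decision, assuming criterion's default significance level of 0.05 and noise threshold of 1%; `Verdict` and `classify` are illustrative names rather than criterion API.

```rust
#[derive(Debug, PartialEq)]
enum Verdict {
    NoChange,    // "No change in performance detected."
    WithinNoise, // "Change within noise threshold."
    Improved,
    Regressed,
}

/// Classifies a relative change given its confidence interval (as fractions,
/// so -0.2177% is -0.002177) and the p-value of the comparison.
fn classify(ci_low: f64, ci_high: f64, p_value: f64) -> Verdict {
    const SIGNIFICANCE_LEVEL: f64 = 0.05; // assumed default
    const NOISE_THRESHOLD: f64 = 0.01; // assumed default (1%)

    if p_value >= SIGNIFICANCE_LEVEL {
        // The means are not statistically distinguishable.
        return Verdict::NoChange;
    }
    if ci_high < -NOISE_THRESHOLD {
        Verdict::Improved
    } else if ci_low > NOISE_THRESHOLD {
        Verdict::Regressed
    } else {
        // Statistically significant, but smaller than the noise threshold.
        Verdict::WithinNoise
    }
}

fn main() {
    // f32->u32: change [-0.2177% -0.1872% -0.1582%], p = 0.00
    println!("f32->u32: {:?}", classify(-0.002177, -0.001582, 0.00));
    // f32->u64: change [-0.0650% -0.0122% +0.0275%], p = 0.68
    println!("f32->u64: {:?}", classify(-0.000650, 0.000275, 0.68));
}
```

Under these assumptions only f32->u32 is statistically distinguishable at all, and its change of about 0.2% is far below the 1% threshold.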

view this post on Zulip Wasmtime GitHub notifications bot (Oct 24 2023 at 19:10):

alexcrichton commented on issue #7340:

Thanks for collecting those! Sounds like this isn't the best way to go at this time, so I'm going to close this.


Last updated: Jan 24 2025 at 00:11 UTC