alexcrichton commented on issue #7340:

@afonso360 this is what I was thinking in terms of frobbing `fflags` around float-to-int conversions that need to trap. I'm not sure whether this is actually beneficial in terms of perf, although it does look like a bit of a code-size win. I'd be curious to test this on hardware you have in terms of perf, and I can try to whip up something later to test that.
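The trade-off can be sketched in software terms (my own illustration, not Wasmtime's actual RISC-V lowering): a trapping `f64 -> u32` conversion can either branch on explicit range checks before converting, or convert unconditionally and validate afterwards, which is the analogue of letting `fcvt` run and then inspecting the accrued `fflags` bits.

```rust
/// Pre-check strategy: branch on NaN / out-of-range *before* converting.
/// 4294967296.0 is 2^32, the first f64 that no longer fits in a u32.
fn trunc_pre_check(x: f64) -> Option<u32> {
    if x.is_nan() || x <= -1.0 || x >= 4294967296.0 {
        return None; // would trap
    }
    Some(x as u32)
}

/// Post-check strategy: convert first (Rust's `as` saturates, much like
/// RISC-V's fcvt.wu.d result), then detect the invalid case afterwards.
fn trunc_post_check(x: f64) -> Option<u32> {
    let r = x as u32; // saturating conversion; NaN becomes 0
    // The input was in range iff its truncation round-trips through the
    // result: NaN, values <= -1.0, and values >= 2^32 all fail this test.
    if (r as f64) == x.trunc() {
        Some(r)
    } else {
        None // would trap
    }
}
```

Both return the same answer for every input; the difference is only whether the validity check sits before or after the conversion instruction, which is what moves branches off the hot path.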
afonso360 commented on issue #7340:

Super curious to see how this pans out! I'm going to run sightglass to check if we see any difference there, but we might need something more targeted.

> I'd be curious to test this on hardware you have in terms of perf, and I can try to whip up something later to test that.

Yes, that'd be helpful; I'm not too familiar with writing these sorts of things.
alexcrichton commented on issue #7340:
I'm not sure how representative it is, but another idea is bench.wasm.gz, which is:

```rust
use criterion::{criterion_group, criterion_main, Criterion};
use std::hint::black_box;

fn fcvt_to_uint(c: &mut Criterion) {
    c.bench_function("f32->u32", |b| {
        b.iter(|| black_box(black_box(2.0f32) as u32))
    });
    c.bench_function("f32->u64", |b| {
        b.iter(|| black_box(black_box(2.0f32) as u64))
    });
    c.bench_function("f64->u32", |b| {
        b.iter(|| black_box(black_box(2.0f64) as u32))
    });
    c.bench_function("f64->u64", |b| {
        b.iter(|| black_box(black_box(2.0f64) as u64))
    });
}

fn fcvt_to_sint(c: &mut Criterion) {
    c.bench_function("f32->i32", |b| {
        b.iter(|| black_box(black_box(2.0f32) as i32))
    });
    c.bench_function("f32->i64", |b| {
        b.iter(|| black_box(black_box(2.0f32) as i64))
    });
    c.bench_function("f64->i32", |b| {
        b.iter(|| black_box(black_box(2.0f64) as i32))
    });
    c.bench_function("f64->i64", |b| {
        b.iter(|| black_box(black_box(2.0f64) as i64))
    });
}

criterion_group!(benches, fcvt_to_uint, fcvt_to_sint);
criterion_main!(benches);
```
afonso360 commented on issue #7340:
I have some interesting results, and some confusing results.
I first ran sightglass with a smaller iteration count and got the result that `main` was faster than `fflags` by 3% in spidermonkey, which was surprising to me, so I ran it again with the default iteration count and got the same result.

<details>
<summary>Sightglass Run 1</summary>

```
afonso@starfive:~/git/sightglass$ cargo run -- benchmark --processes 5 --iterations-per-process 5 --engine ./engine-fflags.so --engine ./engine-main.so -- ./benchmarks/blake3-scalar/benchmark.wasm benchmarks/bz2/benchmark.wasm benchmarks/regex/benchmark.wasm benchmarks/spidermonkey/benchmark.wasm

execution :: cycles :: benchmarks/spidermonkey/benchmark.wasm
  Δ = 611139.96 ± 69194.06 (confidence = 99%)
  main.so is 1.03x to 1.03x faster than fflags.so!
  [22003870 22177803.84 22296431] fflags.so
  [21400346 21566663.88 21700502] main.so

execution :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm
  No difference in performance.
  [6516 7450.52 14467] fflags.so
  [6500 6845.04 13012] main.so

instantiation :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm
  No difference in performance.
  [950 1073.44 1500] fflags.so
  [939 1039.24 1814] main.so

instantiation :: cycles :: benchmarks/bz2/benchmark.wasm
  No difference in performance.
  [991 1109.80 1603] fflags.so
  [1018 1129.44 1811] main.so

instantiation :: cycles :: benchmarks/spidermonkey/benchmark.wasm
  No difference in performance.
  [3600 4488.00 8483] fflags.so
  [3660 4418.44 9216] main.so

execution :: cycles :: benchmarks/bz2/benchmark.wasm
  No difference in performance.
  [1025843 1035182.96 1079490] fflags.so
  [1026383 1044678.80 1099227] main.so

instantiation :: cycles :: benchmarks/regex/benchmark.wasm
  No difference in performance.
  [2930 3202.92 3354] fflags.so
  [2983 3231.48 3832] main.so

compilation :: cycles :: benchmarks/regex/benchmark.wasm
  No difference in performance.
  [10667840 10819084.56 10947259] fflags.so
  [10725380 10845151.60 10951025] main.so

compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm
  No difference in performance.
  [111451448 113120311.48 115050440] fflags.so
  [111395436 112890160.52 115262339] main.so

compilation :: cycles :: benchmarks/bz2/benchmark.wasm
  No difference in performance.
  [2033630 2237015.88 2365106] fflags.so
  [2032061 2233455.48 2332395] main.so

execution :: cycles :: benchmarks/regex/benchmark.wasm
  No difference in performance.
  [3845964 3852698.64 3870163] fflags.so
  [3845116 3849177.08 3880686] main.so

compilation :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm
  No difference in performance.
  [2641545 2668781.48 2727010] fflags.so
  [2621339 2667785.92 2797660] main.so
```
</details>
<details>
<summary>Sightglass Run 2</summary>

```
afonso@starfive:~/git/sightglass$ cargo run -- benchmark --engine ./engine-fflags.so --engine ./engine-main.so -- ./benchmarks/blake3-scalar/benchmark.wasm benchmarks/bz2/benchmark.wasm benchmarks/regex/benchmark.wasm benchmarks/spidermonkey/benchmark.wasm

execution :: cycles :: benchmarks/spidermonkey/benchmark.wasm
  Δ = 528886.85 ± 30356.42 (confidence = 99%)
  main.so is 1.02x to 1.03x faster than fflags.so!
  [21985977 22155294.58 22282664] fflags.so
  [21399233 21626407.73 21778731] main.so

execution :: cycles :: benchmarks/regex/benchmark.wasm
  Δ = 4771.68 ± 2965.39 (confidence = 99%)
  main.so is 1.00x to 1.00x faster than fflags.so!
  [3847607 3856010.05 3909326] fflags.so
  [3845022 3851238.37 3912075] main.so

execution :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm
  No difference in performance.
  [6515 6877.80 14550] fflags.so
  [6489 7310.42 13787] main.so

instantiation :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm
  No difference in performance.
  [927 1029.46 1697] fflags.so
  [923 1067.02 2110] main.so

instantiation :: cycles :: benchmarks/spidermonkey/benchmark.wasm
  No difference in performance.
  [3555 4316.19 8421] fflags.so
  [3588 4375.00 8089] main.so

instantiation :: cycles :: benchmarks/regex/benchmark.wasm
  No difference in performance.
  [2981 3225.85 3511] fflags.so
  [2986 3242.21 3598] main.so

instantiation :: cycles :: benchmarks/bz2/benchmark.wasm
  No difference in performance.
  [990 1147.83 1970] fflags.so
  [1000 1153.04 2124] main.so

compilation :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm
  No difference in performance.
  [2607381 2665940.83 2808423] fflags.so
  [2599335 2655289.42 2750262] main.so

compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm
  No difference in performance.
  [111352544 113374288.89 115552774] fflags.so
  [111363591 113073838.73 115549133] main.so

execution :: cycles :: benchmarks/bz2/benchmark.wasm
  No difference in performance.
  [1024902 1040201.83 1092704] fflags.so
  [1025677 1041126.17 1101387] main.so

compilation :: cycles :: benchmarks/bz2/benchmark.wasm
  No difference in performance.
  [2020133 2223545.22 2354573] fflags.so
  [2067731 2224627.35 2372602] main.so

compilation :: cycles :: benchmarks/regex/benchmark.wasm
  No difference in performance.
  [10651375 10816177.71 10994312] fflags.so
  [10625726 10814925.59 11001537] main.so
```
</details>
I also tried to run the above benchmark via sightglass but couldn't get it to execute, so I ran it as a precompiled module with the wasmtime CLI, but the results have quite a large uncertainty. Do you have any suggestions on how I can do this better?
```
afonso@starfive:~$ hyperfine --warmup 1000 --runs 5000 '~/wasmtime-main --allow-precompiled ~/bench-main.cwasm' '~/wasmtime-fflags --allow-precompiled ~/bench-fflags.cwasm'
Benchmark 1: ~/wasmtime-main --allow-precompiled ~/bench-main.cwasm
  Time (mean ± σ):     24.5 ms ± 4.0 ms    [User: 13.3 ms, System: 9.9 ms]
  Range (min … max):   21.3 ms … 37.0 ms    5000 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 2: ~/wasmtime-fflags --allow-precompiled ~/bench-fflags.cwasm
  Time (mean ± σ):     23.8 ms ± 3.6 ms    [User: 13.3 ms, System: 9.2 ms]
  Range (min … max):   20.7 ms … 40.1 ms    5000 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Summary
  ~/wasmtime-fflags --allow-precompiled ~/bench-fflags.cwasm ran
    1.03 ± 0.23 times faster than ~/wasmtime-main --allow-precompiled ~/bench-main.cwasm
```
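One way to get something more targeted (my own sketch, not something from this thread) is to time the conversions in a single tight loop inside one process, so that the ~20 ms of process startup and module loading measured by hyperfine above doesn't swamp the few-cycle difference per conversion. The loop count here is arbitrary; to measure the Cranelift lowering rather than native rustc output, one would compile this to wasm and run it under the two wasmtime builds.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Run `iters` f64 -> u64 conversions and return the accumulated results
/// (returned so the optimizer cannot delete the loop).
fn time_conversions(iters: u64) -> u64 {
    let mut acc: u64 = 0;
    for i in 0..iters {
        // black_box keeps the input opaque so the conversion isn't folded away.
        let x = black_box(i as f64 * 0.5);
        acc = acc.wrapping_add(black_box(x as u64));
    }
    acc
}

fn main() {
    const ITERS: u64 = 100_000_000;
    let start = Instant::now();
    let acc = time_conversions(ITERS);
    let elapsed = start.elapsed();
    println!(
        "{} conversions in {:?} ({:.2} ns/conversion), acc = {}",
        ITERS,
        elapsed,
        elapsed.as_nanos() as f64 / ITERS as f64,
        acc
    );
}
```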
afonso360 edited a comment on issue #7340:
I have some interesting results, and some confusing results.
I first ran sightglass with a smaller iteration count and got the results that
main
was faster thanfflags
by 3% in spidermonkey, which was surprising to me so I ran it again with the default iteration count and got the same result.<details>
<summary>Sightglass Run 1</summary>afonso@starfive:~/git/sightglass$ cargo run -- benchmark --processes 5 --iterations-per-process 5 --engine ./engine-ffla gs.so --engine ./engine-main.so -- ./benchmarks/blake3-scalar/benchmark.wasm benchmarks/bz2/benchmark.wasm benchmarks/ regex/benchmark.wasm benchmarks/spidermonkey/benchmark.wasm execution :: cycles :: benchmarks/spidermonkey/benchmark.wasm Δ = 611139.96 ± 69194.06 (confidence = 99%) main.so is 1.03x to 1.03x faster than fflags.so! [22003870 22177803.84 22296431] fflags.so [21400346 21566663.88 21700502] main.so execution :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm No difference in performance. [6516 7450.52 14467] fflags.so [6500 6845.04 13012] main.so instantiation :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm No difference in performance. [950 1073.44 1500] fflags.so [939 1039.24 1814] main.so instantiation :: cycles :: benchmarks/bz2/benchmark.wasm No difference in performance. [991 1109.80 1603] fflags.so [1018 1129.44 1811] main.so instantiation :: cycles :: benchmarks/spidermonkey/benchmark.wasm No difference in performance. [3600 4488.00 8483] fflags.so [3660 4418.44 9216] main.so execution :: cycles :: benchmarks/bz2/benchmark.wasm No difference in performance. [1025843 1035182.96 1079490] fflags.so [1026383 1044678.80 1099227] main.so instantiation :: cycles :: benchmarks/regex/benchmark.wasm No difference in performance. [2930 3202.92 3354] fflags.so [2983 3231.48 3832] main.so compilation :: cycles :: benchmarks/regex/benchmark.wasm No difference in performance. [10667840 10819084.56 10947259] fflags.so [10725380 10845151.60 10951025] main.so compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm No difference in performance. [111451448 113120311.48 115050440] fflags.so [111395436 112890160.52 115262339] main.so compilation :: cycles :: benchmarks/bz2/benchmark.wasm No difference in performance. 
[2033630 2237015.88 2365106] fflags.so [2032061 2233455.48 2332395] main.so execution :: cycles :: benchmarks/regex/benchmark.wasm No difference in performance. [3845964 3852698.64 3870163] fflags.so [3845116 3849177.08 3880686] main.so compilation :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm No difference in performance. [2641545 2668781.48 2727010] fflags.so [2621339 2667785.92 2797660] main.so
</details>
<details>
<summary>Sightglass Run 2</summary>afonso@starfive:~/git/sightglass$ cargo run -- benchmark --engine ./engine-fflags.so --engine ./engine-main.so -- ./ben chmarks/blake3-scalar/benchmark.wasm benchmarks/bz2/benchmark.wasm benchmarks/regex/benchmark.wasm benchmarks/spidermon key/benchmark.wasm execution :: cycles :: benchmarks/spidermonkey/benchmark.wasm Δ = 528886.85 ± 30356.42 (confidence = 99%) main.so is 1.02x to 1.03x faster than fflags.so! [21985977 22155294.58 22282664] fflags.so [21399233 21626407.73 21778731] main.so execution :: cycles :: benchmarks/regex/benchmark.wasm Δ = 4771.68 ± 2965.39 (confidence = 99%) main.so is 1.00x to 1.00x faster than fflags.so! [3847607 3856010.05 3909326] fflags.so [3845022 3851238.37 3912075] main.so execution :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm No difference in performance. [6515 6877.80 14550] fflags.so [6489 7310.42 13787] main.so instantiation :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm No difference in performance. [927 1029.46 1697] fflags.so [923 1067.02 2110] main.so instantiation :: cycles :: benchmarks/spidermonkey/benchmark.wasm No difference in performance. [3555 4316.19 8421] fflags.so [3588 4375.00 8089] main.so instantiation :: cycles :: benchmarks/regex/benchmark.wasm No difference in performance. [2981 3225.85 3511] fflags.so [2986 3242.21 3598] main.so instantiation :: cycles :: benchmarks/bz2/benchmark.wasm No difference in performance. [990 1147.83 1970] fflags.so [1000 1153.04 2124] main.so compilation :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm No difference in performance. [2607381 2665940.83 2808423] fflags.so [2599335 2655289.42 2750262] main.so compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm No difference in performance. [111352544 113374288.89 115552774] fflags.so [111363591 113073838.73 115549133] main.so execution :: cycles :: benchmarks/bz2/benchmark.wasm No difference in performance. 
[1024902 1040201.83 1092704] fflags.so [1025677 1041126.17 1101387] main.so compilation :: cycles :: benchmarks/bz2/benchmark.wasm No difference in performance. [2020133 2223545.22 2354573] fflags.so [2067731 2224627.35 2372602] main.so compilation :: cycles :: benchmarks/regex/benchmark.wasm No difference in performance. [10651375 10816177.71 10994312] fflags.so [10625726 10814925.59 11001537] main.so
</details>
I also tried to run the above benchmark via sightglass but couldn't get it to execute, so I ran it using a precompiled module using the wasmtime cli, but the results have quite a large uncertanty. Do you have any suggestions on how I can do this better?
afonso@starfive:~$ hyperfine --warmup 1000 --runs 5000 '~/wasmtime-main --allow-precompiled ~/bench-main.cwasm' '~/wasmtime-fflags --allow-precompiled ~/bench-fflags.cwasm' Benchmark 1: ~/wasmtime-main --allow-precompiled ~/bench-main.cwasm Time (mean ± σ): 24.5 ms ± 4.0 ms [User: 13.3 ms, System: 9.9 ms] Range (min … max): 21.3 ms … 37.0 ms 5000 runs Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options. Benchmark 2: ~/wasmtime-fflags --allow-precompiled ~/bench-fflags.cwasm Time (mean ± σ): 23.8 ms ± 3.6 ms [User: 13.3 ms, System: 9.2 ms] Range (min … max): 20.7 ms … 40.1 ms 5000 runs Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options. Summary ~/wasmtime-fflags --allow-precompiled ~/bench-fflags.cwasm ran 1.03 ± 0.23 times faster than ~/wasmtime-main --allow-precompiled ~/bench-main.cwasm
afonso360 edited a comment on issue #7340:
I have some interesting results, and some confusing results.
I first ran sightglass with a smaller iteration count and got the results that
main
was faster thanfflags
by 3% in spidermonkey, which was surprising to me so I ran it again with the default iteration count and got the same result.<details>
<summary>Sightglass Run 1</summary>afonso@starfive:~/git/sightglass$ cargo run -- benchmark --processes 5 --iterations-per-process 5 --engine ./engine-ffla gs.so --engine ./engine-main.so -- ./benchmarks/blake3-scalar/benchmark.wasm benchmarks/bz2/benchmark.wasm benchmarks/ regex/benchmark.wasm benchmarks/spidermonkey/benchmark.wasm execution :: cycles :: benchmarks/spidermonkey/benchmark.wasm Δ = 611139.96 ± 69194.06 (confidence = 99%) main.so is 1.03x to 1.03x faster than fflags.so! [22003870 22177803.84 22296431] fflags.so [21400346 21566663.88 21700502] main.so execution :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm No difference in performance. [6516 7450.52 14467] fflags.so [6500 6845.04 13012] main.so instantiation :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm No difference in performance. [950 1073.44 1500] fflags.so [939 1039.24 1814] main.so instantiation :: cycles :: benchmarks/bz2/benchmark.wasm No difference in performance. [991 1109.80 1603] fflags.so [1018 1129.44 1811] main.so instantiation :: cycles :: benchmarks/spidermonkey/benchmark.wasm No difference in performance. [3600 4488.00 8483] fflags.so [3660 4418.44 9216] main.so execution :: cycles :: benchmarks/bz2/benchmark.wasm No difference in performance. [1025843 1035182.96 1079490] fflags.so [1026383 1044678.80 1099227] main.so instantiation :: cycles :: benchmarks/regex/benchmark.wasm No difference in performance. [2930 3202.92 3354] fflags.so [2983 3231.48 3832] main.so compilation :: cycles :: benchmarks/regex/benchmark.wasm No difference in performance. [10667840 10819084.56 10947259] fflags.so [10725380 10845151.60 10951025] main.so compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm No difference in performance. [111451448 113120311.48 115050440] fflags.so [111395436 112890160.52 115262339] main.so compilation :: cycles :: benchmarks/bz2/benchmark.wasm No difference in performance. 
[2033630 2237015.88 2365106] fflags.so [2032061 2233455.48 2332395] main.so execution :: cycles :: benchmarks/regex/benchmark.wasm No difference in performance. [3845964 3852698.64 3870163] fflags.so [3845116 3849177.08 3880686] main.so compilation :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm No difference in performance. [2641545 2668781.48 2727010] fflags.so [2621339 2667785.92 2797660] main.so
</details>
<details>
<summary>Sightglass Run 2</summary>afonso@starfive:~/git/sightglass$ cargo run -- benchmark --engine ./engine-fflags.so --engine ./engine-main.so -- ./ben chmarks/blake3-scalar/benchmark.wasm benchmarks/bz2/benchmark.wasm benchmarks/regex/benchmark.wasm benchmarks/spidermon key/benchmark.wasm execution :: cycles :: benchmarks/spidermonkey/benchmark.wasm Δ = 528886.85 ± 30356.42 (confidence = 99%) main.so is 1.02x to 1.03x faster than fflags.so! [21985977 22155294.58 22282664] fflags.so [21399233 21626407.73 21778731] main.so execution :: cycles :: benchmarks/regex/benchmark.wasm Δ = 4771.68 ± 2965.39 (confidence = 99%) main.so is 1.00x to 1.00x faster than fflags.so! [3847607 3856010.05 3909326] fflags.so [3845022 3851238.37 3912075] main.so execution :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm No difference in performance. [6515 6877.80 14550] fflags.so [6489 7310.42 13787] main.so instantiation :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm No difference in performance. [927 1029.46 1697] fflags.so [923 1067.02 2110] main.so instantiation :: cycles :: benchmarks/spidermonkey/benchmark.wasm No difference in performance. [3555 4316.19 8421] fflags.so [3588 4375.00 8089] main.so instantiation :: cycles :: benchmarks/regex/benchmark.wasm No difference in performance. [2981 3225.85 3511] fflags.so [2986 3242.21 3598] main.so instantiation :: cycles :: benchmarks/bz2/benchmark.wasm No difference in performance. [990 1147.83 1970] fflags.so [1000 1153.04 2124] main.so compilation :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm No difference in performance. [2607381 2665940.83 2808423] fflags.so [2599335 2655289.42 2750262] main.so compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm No difference in performance. [111352544 113374288.89 115552774] fflags.so [111363591 113073838.73 115549133] main.so execution :: cycles :: benchmarks/bz2/benchmark.wasm No difference in performance. 
[1024902 1040201.83 1092704] fflags.so [1025677 1041126.17 1101387] main.so compilation :: cycles :: benchmarks/bz2/benchmark.wasm No difference in performance. [2020133 2223545.22 2354573] fflags.so [2067731 2224627.35 2372602] main.so compilation :: cycles :: benchmarks/regex/benchmark.wasm No difference in performance. [10651375 10816177.71 10994312] fflags.so [10625726 10814925.59 11001537] main.so
</details>
I also tried to run the above benchmark via sightglass but couldn't get it to execute, so I ran it using a precompiled module using the wasmtime cli, but the results have quite a large uncertainty. Do you have any suggestions on how I can do this better?
afonso@starfive:~$ hyperfine --warmup 1000 --runs 5000 '~/wasmtime-main --allow-precompiled ~/bench-main.cwasm' '~/wasmtime-fflags --allow-precompiled ~/bench-fflags.cwasm' Benchmark 1: ~/wasmtime-main --allow-precompiled ~/bench-main.cwasm Time (mean ± σ): 24.5 ms ± 4.0 ms [User: 13.3 ms, System: 9.9 ms] Range (min … max): 21.3 ms … 37.0 ms 5000 runs Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options. Benchmark 2: ~/wasmtime-fflags --allow-precompiled ~/bench-fflags.cwasm Time (mean ± σ): 23.8 ms ± 3.6 ms [User: 13.3 ms, System: 9.2 ms] Range (min … max): 20.7 ms … 40.1 ms 5000 runs Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options. Summary ~/wasmtime-fflags --allow-precompiled ~/bench-fflags.cwasm ran 1.03 ± 0.23 times faster than ~/wasmtime-main --allow-precompiled ~/bench-main.cwasm
afonso360 edited a comment on issue #7340:
I have some interesting results, and some confusing results.
I first ran sightglass with a smaller iteration count and got the results that
main
was faster thanfflags
by 3% in spidermonkey, which was surprising to me so I ran it again with the default iteration count and got the same result.<details>
<summary>Sightglass Run 1</summary>afonso@starfive:~/git/sightglass$ cargo run -- benchmark --processes 5 --iterations-per-process 5 --engine ./engine-ffla gs.so --engine ./engine-main.so -- ./benchmarks/blake3-scalar/benchmark.wasm benchmarks/bz2/benchmark.wasm benchmarks/ regex/benchmark.wasm benchmarks/spidermonkey/benchmark.wasm execution :: cycles :: benchmarks/spidermonkey/benchmark.wasm Δ = 611139.96 ± 69194.06 (confidence = 99%) main.so is 1.03x to 1.03x faster than fflags.so! [22003870 22177803.84 22296431] fflags.so [21400346 21566663.88 21700502] main.so execution :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm No difference in performance. [6516 7450.52 14467] fflags.so [6500 6845.04 13012] main.so instantiation :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm No difference in performance. [950 1073.44 1500] fflags.so [939 1039.24 1814] main.so instantiation :: cycles :: benchmarks/bz2/benchmark.wasm No difference in performance. [991 1109.80 1603] fflags.so [1018 1129.44 1811] main.so instantiation :: cycles :: benchmarks/spidermonkey/benchmark.wasm No difference in performance. [3600 4488.00 8483] fflags.so [3660 4418.44 9216] main.so execution :: cycles :: benchmarks/bz2/benchmark.wasm No difference in performance. [1025843 1035182.96 1079490] fflags.so [1026383 1044678.80 1099227] main.so instantiation :: cycles :: benchmarks/regex/benchmark.wasm No difference in performance. [2930 3202.92 3354] fflags.so [2983 3231.48 3832] main.so compilation :: cycles :: benchmarks/regex/benchmark.wasm No difference in performance. [10667840 10819084.56 10947259] fflags.so [10725380 10845151.60 10951025] main.so compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm No difference in performance. [111451448 113120311.48 115050440] fflags.so [111395436 112890160.52 115262339] main.so compilation :: cycles :: benchmarks/bz2/benchmark.wasm No difference in performance. 
[2033630 2237015.88 2365106] fflags.so [2032061 2233455.48 2332395] main.so execution :: cycles :: benchmarks/regex/benchmark.wasm No difference in performance. [3845964 3852698.64 3870163] fflags.so [3845116 3849177.08 3880686] main.so compilation :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm No difference in performance. [2641545 2668781.48 2727010] fflags.so [2621339 2667785.92 2797660] main.so
</details>
<details>
<summary>Sightglass Run 2</summary>afonso@starfive:~/git/sightglass$ cargo run -- benchmark --engine ./engine-fflags.so --engine ./engine-main.so -- ./ben chmarks/blake3-scalar/benchmark.wasm benchmarks/bz2/benchmark.wasm benchmarks/regex/benchmark.wasm benchmarks/spidermon key/benchmark.wasm execution :: cycles :: benchmarks/spidermonkey/benchmark.wasm Δ = 528886.85 ± 30356.42 (confidence = 99%) main.so is 1.02x to 1.03x faster than fflags.so! [21985977 22155294.58 22282664] fflags.so [21399233 21626407.73 21778731] main.so execution :: cycles :: benchmarks/regex/benchmark.wasm Δ = 4771.68 ± 2965.39 (confidence = 99%) main.so is 1.00x to 1.00x faster than fflags.so! [3847607 3856010.05 3909326] fflags.so [3845022 3851238.37 3912075] main.so execution :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm No difference in performance. [6515 6877.80 14550] fflags.so [6489 7310.42 13787] main.so instantiation :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm No difference in performance. [927 1029.46 1697] fflags.so [923 1067.02 2110] main.so instantiation :: cycles :: benchmarks/spidermonkey/benchmark.wasm No difference in performance. [3555 4316.19 8421] fflags.so [3588 4375.00 8089] main.so instantiation :: cycles :: benchmarks/regex/benchmark.wasm No difference in performance. [2981 3225.85 3511] fflags.so [2986 3242.21 3598] main.so instantiation :: cycles :: benchmarks/bz2/benchmark.wasm No difference in performance. [990 1147.83 1970] fflags.so [1000 1153.04 2124] main.so compilation :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm No difference in performance. [2607381 2665940.83 2808423] fflags.so [2599335 2655289.42 2750262] main.so compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm No difference in performance. [111352544 113374288.89 115552774] fflags.so [111363591 113073838.73 115549133] main.so execution :: cycles :: benchmarks/bz2/benchmark.wasm No difference in performance. 
[1024902 1040201.83 1092704] fflags.so [1025677 1041126.17 1101387] main.so compilation :: cycles :: benchmarks/bz2/benchmark.wasm No difference in performance. [2020133 2223545.22 2354573] fflags.so [2067731 2224627.35 2372602] main.so compilation :: cycles :: benchmarks/regex/benchmark.wasm No difference in performance. [10651375 10816177.71 10994312] fflags.so [10625726 10814925.59 11001537] main.so
</details>
I also tried to run the above benchmark via sightglass but couldn't get it to execute, so I ran it using a precompiled module and the wasmtime cli, but the results have quite a large uncertainty. Do you have any suggestions on how I can do this better?
afonso@starfive:~$ hyperfine --warmup 1000 --runs 5000 '~/wasmtime-main --allow-precompiled ~/bench-main.cwasm' '~/wasmtime-fflags --allow-precompiled ~/bench-fflags.cwasm' Benchmark 1: ~/wasmtime-main --allow-precompiled ~/bench-main.cwasm Time (mean ± σ): 24.5 ms ± 4.0 ms [User: 13.3 ms, System: 9.9 ms] Range (min … max): 21.3 ms … 37.0 ms 5000 runs Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options. Benchmark 2: ~/wasmtime-fflags --allow-precompiled ~/bench-fflags.cwasm Time (mean ± σ): 23.8 ms ± 3.6 ms [User: 13.3 ms, System: 9.2 ms] Range (min … max): 20.7 ms … 40.1 ms 5000 runs Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options. Summary ~/wasmtime-fflags --allow-precompiled ~/bench-fflags.cwasm ran 1.03 ± 0.23 times faster than ~/wasmtime-main --allow-precompiled ~/bench-main.cwasm
afonso360 edited a comment on issue #7340:
I have some interesting results, and some confusing results.
I first ran sightglass with a smaller iteration count and got the results that
main
was faster thanfflags
by 3% in spidermonkey, which was surprising to me so I ran it again with the default iteration count and got the same result.<details>
<summary>Sightglass Run 1</summary>afonso@starfive:~/git/sightglass$ cargo run -- benchmark --processes 5 --iterations-per-process 5 --engine ./engine-ffla gs.so --engine ./engine-main.so -- ./benchmarks/blake3-scalar/benchmark.wasm benchmarks/bz2/benchmark.wasm benchmarks/ regex/benchmark.wasm benchmarks/spidermonkey/benchmark.wasm execution :: cycles :: benchmarks/spidermonkey/benchmark.wasm Δ = 611139.96 ± 69194.06 (confidence = 99%) main.so is 1.03x to 1.03x faster than fflags.so! [22003870 22177803.84 22296431] fflags.so [21400346 21566663.88 21700502] main.so execution :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm No difference in performance. [6516 7450.52 14467] fflags.so [6500 6845.04 13012] main.so instantiation :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm No difference in performance. [950 1073.44 1500] fflags.so [939 1039.24 1814] main.so instantiation :: cycles :: benchmarks/bz2/benchmark.wasm No difference in performance. [991 1109.80 1603] fflags.so [1018 1129.44 1811] main.so instantiation :: cycles :: benchmarks/spidermonkey/benchmark.wasm No difference in performance. [3600 4488.00 8483] fflags.so [3660 4418.44 9216] main.so execution :: cycles :: benchmarks/bz2/benchmark.wasm No difference in performance. [1025843 1035182.96 1079490] fflags.so [1026383 1044678.80 1099227] main.so instantiation :: cycles :: benchmarks/regex/benchmark.wasm No difference in performance. [2930 3202.92 3354] fflags.so [2983 3231.48 3832] main.so compilation :: cycles :: benchmarks/regex/benchmark.wasm No difference in performance. [10667840 10819084.56 10947259] fflags.so [10725380 10845151.60 10951025] main.so compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm No difference in performance. [111451448 113120311.48 115050440] fflags.so [111395436 112890160.52 115262339] main.so compilation :: cycles :: benchmarks/bz2/benchmark.wasm No difference in performance. 
[2033630 2237015.88 2365106] fflags.so [2032061 2233455.48 2332395] main.so execution :: cycles :: benchmarks/regex/benchmark.wasm No difference in performance. [3845964 3852698.64 3870163] fflags.so [3845116 3849177.08 3880686] main.so compilation :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm No difference in performance. [2641545 2668781.48 2727010] fflags.so [2621339 2667785.92 2797660] main.so
</details>
<details>
<summary>Sightglass Run 2</summary>
afonso@starfive:~/git/sightglass$ cargo run -- benchmark --engine ./engine-fflags.so --engine ./engine-main.so -- ./benchmarks/blake3-scalar/benchmark.wasm benchmarks/bz2/benchmark.wasm benchmarks/regex/benchmark.wasm benchmarks/spidermonkey/benchmark.wasm

execution :: cycles :: benchmarks/spidermonkey/benchmark.wasm
  Δ = 528886.85 ± 30356.42 (confidence = 99%)
  main.so is 1.02x to 1.03x faster than fflags.so!
  [21985977 22155294.58 22282664] fflags.so
  [21399233 21626407.73 21778731] main.so

execution :: cycles :: benchmarks/regex/benchmark.wasm
  Δ = 4771.68 ± 2965.39 (confidence = 99%)
  main.so is 1.00x to 1.00x faster than fflags.so!
  [3847607 3856010.05 3909326] fflags.so
  [3845022 3851238.37 3912075] main.so

execution :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm
  No difference in performance.
  [6515 6877.80 14550] fflags.so
  [6489 7310.42 13787] main.so

instantiation :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm
  No difference in performance.
  [927 1029.46 1697] fflags.so
  [923 1067.02 2110] main.so

instantiation :: cycles :: benchmarks/spidermonkey/benchmark.wasm
  No difference in performance.
  [3555 4316.19 8421] fflags.so
  [3588 4375.00 8089] main.so

instantiation :: cycles :: benchmarks/regex/benchmark.wasm
  No difference in performance.
  [2981 3225.85 3511] fflags.so
  [2986 3242.21 3598] main.so

instantiation :: cycles :: benchmarks/bz2/benchmark.wasm
  No difference in performance.
  [990 1147.83 1970] fflags.so
  [1000 1153.04 2124] main.so

compilation :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm
  No difference in performance.
  [2607381 2665940.83 2808423] fflags.so
  [2599335 2655289.42 2750262] main.so

compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm
  No difference in performance.
  [111352544 113374288.89 115552774] fflags.so
  [111363591 113073838.73 115549133] main.so

execution :: cycles :: benchmarks/bz2/benchmark.wasm
  No difference in performance.
  [1024902 1040201.83 1092704] fflags.so
  [1025677 1041126.17 1101387] main.so

compilation :: cycles :: benchmarks/bz2/benchmark.wasm
  No difference in performance.
  [2020133 2223545.22 2354573] fflags.so
  [2067731 2224627.35 2372602] main.so

compilation :: cycles :: benchmarks/regex/benchmark.wasm
  No difference in performance.
  [10651375 10816177.71 10994312] fflags.so
  [10625726 10814925.59 11001537] main.so
</details>
I also tried to run the above benchmark via sightglass but couldn't get it to execute, so I ran it using a precompiled module and the wasmtime CLI; the results, however, have quite a large uncertainty. Do you have any suggestions on how I can do this better?
afonso@starfive:~$ hyperfine --warmup 1000 --runs 5000 '~/wasmtime-main --allow-precompiled ~/bench-main.cwasm' '~/wasmtime-fflags --allow-precompiled ~/bench-fflags.cwasm'

Benchmark 1: ~/wasmtime-main --allow-precompiled ~/bench-main.cwasm
  Time (mean ± σ): 24.5 ms ± 4.0 ms [User: 13.3 ms, System: 9.9 ms]
  Range (min … max): 21.3 ms … 37.0 ms    5000 runs
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 2: ~/wasmtime-fflags --allow-precompiled ~/bench-fflags.cwasm
  Time (mean ± σ): 23.8 ms ± 3.6 ms [User: 13.3 ms, System: 9.2 ms]
  Range (min … max): 20.7 ms … 40.1 ms    5000 runs
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Summary
  ~/wasmtime-fflags --allow-precompiled ~/bench-fflags.cwasm ran 1.03 ± 0.23 times faster than ~/wasmtime-main --allow-precompiled ~/bench-main.cwasm
Edit: I should note, here fflags is this PR and main is 9d8ca828d1888013b45570a94267778962846ad6, which does include the previous FRM changes.
afonso360 edited a comment on issue #7340:
I have some interesting results, and some confusing results.
I first ran sightglass with a smaller iteration count and found that main was faster than fflags by 3% in spidermonkey, which surprised me, so I ran it again with the default iteration count and got the same result.
<details>
<summary>Sightglass Run 1</summary>
afonso@starfive:~/git/sightglass$ cargo run -- benchmark --processes 5 --iterations-per-process 5 --engine ./engine-fflags.so --engine ./engine-main.so -- ./benchmarks/blake3-scalar/benchmark.wasm benchmarks/bz2/benchmark.wasm benchmarks/regex/benchmark.wasm benchmarks/spidermonkey/benchmark.wasm

execution :: cycles :: benchmarks/spidermonkey/benchmark.wasm
  Δ = 611139.96 ± 69194.06 (confidence = 99%)
  main.so is 1.03x to 1.03x faster than fflags.so!
  [22003870 22177803.84 22296431] fflags.so
  [21400346 21566663.88 21700502] main.so

execution :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm
  No difference in performance.
  [6516 7450.52 14467] fflags.so
  [6500 6845.04 13012] main.so

instantiation :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm
  No difference in performance.
  [950 1073.44 1500] fflags.so
  [939 1039.24 1814] main.so

instantiation :: cycles :: benchmarks/bz2/benchmark.wasm
  No difference in performance.
  [991 1109.80 1603] fflags.so
  [1018 1129.44 1811] main.so

instantiation :: cycles :: benchmarks/spidermonkey/benchmark.wasm
  No difference in performance.
  [3600 4488.00 8483] fflags.so
  [3660 4418.44 9216] main.so

execution :: cycles :: benchmarks/bz2/benchmark.wasm
  No difference in performance.
  [1025843 1035182.96 1079490] fflags.so
  [1026383 1044678.80 1099227] main.so

instantiation :: cycles :: benchmarks/regex/benchmark.wasm
  No difference in performance.
  [2930 3202.92 3354] fflags.so
  [2983 3231.48 3832] main.so

compilation :: cycles :: benchmarks/regex/benchmark.wasm
  No difference in performance.
  [10667840 10819084.56 10947259] fflags.so
  [10725380 10845151.60 10951025] main.so

compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm
  No difference in performance.
  [111451448 113120311.48 115050440] fflags.so
  [111395436 112890160.52 115262339] main.so

compilation :: cycles :: benchmarks/bz2/benchmark.wasm
  No difference in performance.
  [2033630 2237015.88 2365106] fflags.so
  [2032061 2233455.48 2332395] main.so

execution :: cycles :: benchmarks/regex/benchmark.wasm
  No difference in performance.
  [3845964 3852698.64 3870163] fflags.so
  [3845116 3849177.08 3880686] main.so

compilation :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm
  No difference in performance.
  [2641545 2668781.48 2727010] fflags.so
  [2621339 2667785.92 2797660] main.so
</details>
<details>
<summary>Sightglass Run 2</summary>
afonso@starfive:~/git/sightglass$ cargo run -- benchmark --engine ./engine-fflags.so --engine ./engine-main.so -- ./benchmarks/blake3-scalar/benchmark.wasm benchmarks/bz2/benchmark.wasm benchmarks/regex/benchmark.wasm benchmarks/spidermonkey/benchmark.wasm

execution :: cycles :: benchmarks/spidermonkey/benchmark.wasm
  Δ = 528886.85 ± 30356.42 (confidence = 99%)
  main.so is 1.02x to 1.03x faster than fflags.so!
  [21985977 22155294.58 22282664] fflags.so
  [21399233 21626407.73 21778731] main.so

execution :: cycles :: benchmarks/regex/benchmark.wasm
  Δ = 4771.68 ± 2965.39 (confidence = 99%)
  main.so is 1.00x to 1.00x faster than fflags.so!
  [3847607 3856010.05 3909326] fflags.so
  [3845022 3851238.37 3912075] main.so

execution :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm
  No difference in performance.
  [6515 6877.80 14550] fflags.so
  [6489 7310.42 13787] main.so

instantiation :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm
  No difference in performance.
  [927 1029.46 1697] fflags.so
  [923 1067.02 2110] main.so

instantiation :: cycles :: benchmarks/spidermonkey/benchmark.wasm
  No difference in performance.
  [3555 4316.19 8421] fflags.so
  [3588 4375.00 8089] main.so

instantiation :: cycles :: benchmarks/regex/benchmark.wasm
  No difference in performance.
  [2981 3225.85 3511] fflags.so
  [2986 3242.21 3598] main.so

instantiation :: cycles :: benchmarks/bz2/benchmark.wasm
  No difference in performance.
  [990 1147.83 1970] fflags.so
  [1000 1153.04 2124] main.so

compilation :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm
  No difference in performance.
  [2607381 2665940.83 2808423] fflags.so
  [2599335 2655289.42 2750262] main.so

compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm
  No difference in performance.
  [111352544 113374288.89 115552774] fflags.so
  [111363591 113073838.73 115549133] main.so

execution :: cycles :: benchmarks/bz2/benchmark.wasm
  No difference in performance.
  [1024902 1040201.83 1092704] fflags.so
  [1025677 1041126.17 1101387] main.so

compilation :: cycles :: benchmarks/bz2/benchmark.wasm
  No difference in performance.
  [2020133 2223545.22 2354573] fflags.so
  [2067731 2224627.35 2372602] main.so

compilation :: cycles :: benchmarks/regex/benchmark.wasm
  No difference in performance.
  [10651375 10816177.71 10994312] fflags.so
  [10625726 10814925.59 11001537] main.so
</details>
I also tried to run the above benchmark via sightglass but couldn't get it to execute, so I ran it using a precompiled module and the wasmtime CLI; the results, however, have quite a large uncertainty. Do you have any suggestions on how I can do this better?
afonso@starfive:~$ hyperfine --warmup 1000 --runs 5000 '~/wasmtime-main --allow-precompiled ~/bench-main.cwasm' '~/wasmtime-fflags --allow-precompiled ~/bench-fflags.cwasm'

Benchmark 1: ~/wasmtime-main --allow-precompiled ~/bench-main.cwasm
  Time (mean ± σ): 24.5 ms ± 4.0 ms [User: 13.3 ms, System: 9.9 ms]
  Range (min … max): 21.3 ms … 37.0 ms    5000 runs
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 2: ~/wasmtime-fflags --allow-precompiled ~/bench-fflags.cwasm
  Time (mean ± σ): 23.8 ms ± 3.6 ms [User: 13.3 ms, System: 9.2 ms]
  Range (min … max): 20.7 ms … 40.1 ms    5000 runs
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Summary
  ~/wasmtime-fflags --allow-precompiled ~/bench-fflags.cwasm ran 1.03 ± 0.23 times faster than ~/wasmtime-main --allow-precompiled ~/bench-main.cwasm
Edit: I should note, here fflags is this PR and main is 9d8ca828d1888013b45570a94267778962846ad6, which includes the previous FRM changes.
afonso360 edited a comment on issue #7340:
I have some interesting results, and some confusing results.
I first ran sightglass with a smaller iteration count and found that main was faster than fflags by 3% in spidermonkey, which surprised me, so I ran it again with the default iteration count and got the same result.
<details>
<summary>Sightglass Run 1</summary>
afonso@starfive:~/git/sightglass$ cargo run -- benchmark --processes 5 --iterations-per-process 5 --engine ./engine-fflags.so --engine ./engine-main.so -- ./benchmarks/blake3-scalar/benchmark.wasm benchmarks/bz2/benchmark.wasm benchmarks/regex/benchmark.wasm benchmarks/spidermonkey/benchmark.wasm

execution :: cycles :: benchmarks/spidermonkey/benchmark.wasm
  Δ = 611139.96 ± 69194.06 (confidence = 99%)
  main.so is 1.03x to 1.03x faster than fflags.so!
  [22003870 22177803.84 22296431] fflags.so
  [21400346 21566663.88 21700502] main.so

execution :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm
  No difference in performance.
  [6516 7450.52 14467] fflags.so
  [6500 6845.04 13012] main.so

instantiation :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm
  No difference in performance.
  [950 1073.44 1500] fflags.so
  [939 1039.24 1814] main.so

instantiation :: cycles :: benchmarks/bz2/benchmark.wasm
  No difference in performance.
  [991 1109.80 1603] fflags.so
  [1018 1129.44 1811] main.so

instantiation :: cycles :: benchmarks/spidermonkey/benchmark.wasm
  No difference in performance.
  [3600 4488.00 8483] fflags.so
  [3660 4418.44 9216] main.so

execution :: cycles :: benchmarks/bz2/benchmark.wasm
  No difference in performance.
  [1025843 1035182.96 1079490] fflags.so
  [1026383 1044678.80 1099227] main.so

instantiation :: cycles :: benchmarks/regex/benchmark.wasm
  No difference in performance.
  [2930 3202.92 3354] fflags.so
  [2983 3231.48 3832] main.so

compilation :: cycles :: benchmarks/regex/benchmark.wasm
  No difference in performance.
  [10667840 10819084.56 10947259] fflags.so
  [10725380 10845151.60 10951025] main.so

compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm
  No difference in performance.
  [111451448 113120311.48 115050440] fflags.so
  [111395436 112890160.52 115262339] main.so

compilation :: cycles :: benchmarks/bz2/benchmark.wasm
  No difference in performance.
  [2033630 2237015.88 2365106] fflags.so
  [2032061 2233455.48 2332395] main.so

execution :: cycles :: benchmarks/regex/benchmark.wasm
  No difference in performance.
  [3845964 3852698.64 3870163] fflags.so
  [3845116 3849177.08 3880686] main.so

compilation :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm
  No difference in performance.
  [2641545 2668781.48 2727010] fflags.so
  [2621339 2667785.92 2797660] main.so
</details>
<details>
<summary>Sightglass Run 2</summary>
afonso@starfive:~/git/sightglass$ cargo run -- benchmark --engine ./engine-fflags.so --engine ./engine-main.so -- ./benchmarks/blake3-scalar/benchmark.wasm benchmarks/bz2/benchmark.wasm benchmarks/regex/benchmark.wasm benchmarks/spidermonkey/benchmark.wasm

execution :: cycles :: benchmarks/spidermonkey/benchmark.wasm
  Δ = 528886.85 ± 30356.42 (confidence = 99%)
  main.so is 1.02x to 1.03x faster than fflags.so!
  [21985977 22155294.58 22282664] fflags.so
  [21399233 21626407.73 21778731] main.so

execution :: cycles :: benchmarks/regex/benchmark.wasm
  Δ = 4771.68 ± 2965.39 (confidence = 99%)
  main.so is 1.00x to 1.00x faster than fflags.so!
  [3847607 3856010.05 3909326] fflags.so
  [3845022 3851238.37 3912075] main.so

execution :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm
  No difference in performance.
  [6515 6877.80 14550] fflags.so
  [6489 7310.42 13787] main.so

instantiation :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm
  No difference in performance.
  [927 1029.46 1697] fflags.so
  [923 1067.02 2110] main.so

instantiation :: cycles :: benchmarks/spidermonkey/benchmark.wasm
  No difference in performance.
  [3555 4316.19 8421] fflags.so
  [3588 4375.00 8089] main.so

instantiation :: cycles :: benchmarks/regex/benchmark.wasm
  No difference in performance.
  [2981 3225.85 3511] fflags.so
  [2986 3242.21 3598] main.so

instantiation :: cycles :: benchmarks/bz2/benchmark.wasm
  No difference in performance.
  [990 1147.83 1970] fflags.so
  [1000 1153.04 2124] main.so

compilation :: cycles :: ./benchmarks/blake3-scalar/benchmark.wasm
  No difference in performance.
  [2607381 2665940.83 2808423] fflags.so
  [2599335 2655289.42 2750262] main.so

compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm
  No difference in performance.
  [111352544 113374288.89 115552774] fflags.so
  [111363591 113073838.73 115549133] main.so

execution :: cycles :: benchmarks/bz2/benchmark.wasm
  No difference in performance.
  [1024902 1040201.83 1092704] fflags.so
  [1025677 1041126.17 1101387] main.so

compilation :: cycles :: benchmarks/bz2/benchmark.wasm
  No difference in performance.
  [2020133 2223545.22 2354573] fflags.so
  [2067731 2224627.35 2372602] main.so

compilation :: cycles :: benchmarks/regex/benchmark.wasm
  No difference in performance.
  [10651375 10816177.71 10994312] fflags.so
  [10625726 10814925.59 11001537] main.so
</details>
I also tried to run the above benchmark via sightglass but couldn't get it to execute, so I ran it using a precompiled module and the wasmtime CLI; the results, however, have quite a large uncertainty. Do you have any suggestions on how I can do this better?
I think invoking via the CLI might be adding too much noise; I'm going to try to build a criterion benchmark that uses the wasmtime API and measure it that way.
afonso@starfive:~$ hyperfine --warmup 1000 --runs 5000 '~/wasmtime-main --allow-precompiled ~/bench-main.cwasm' '~/wasmtime-fflags --allow-precompiled ~/bench-fflags.cwasm'

Benchmark 1: ~/wasmtime-main --allow-precompiled ~/bench-main.cwasm
  Time (mean ± σ): 24.5 ms ± 4.0 ms [User: 13.3 ms, System: 9.9 ms]
  Range (min … max): 21.3 ms … 37.0 ms    5000 runs
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 2: ~/wasmtime-fflags --allow-precompiled ~/bench-fflags.cwasm
  Time (mean ± σ): 23.8 ms ± 3.6 ms [User: 13.3 ms, System: 9.2 ms]
  Range (min … max): 20.7 ms … 40.1 ms    5000 runs
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Summary
  ~/wasmtime-fflags --allow-precompiled ~/bench-fflags.cwasm ran 1.03 ± 0.23 times faster than ~/wasmtime-main --allow-precompiled ~/bench-main.cwasm
Edit: I should note, here fflags is this PR and main is 9d8ca828d1888013b45570a94267778962846ad6, which includes the previous FRM changes.
alexcrichton commented on issue #7340:
Oh, for the criterion program you'll need to pass --bench as an argument to get it to actually run benchmarks. In theory you should be able to do wasmtime --dir . ./foo.wasm --bench with a before-and-after wasmtime, and criterion should self-report regressions/improvements between the two.
The sightglass run seems a bit damning, though, in that maybe it's not too useful to inspect the exception flags!
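For context, the tradeoff under test can be modeled in plain Rust: wasm's float-to-int truncations must trap on NaN or out-of-range inputs, so the baseline lowering range-checks before converting, while the fflags approach converts first and inspects the RISC-V exception flags afterwards. Below is a minimal sketch of the range-check side only (illustrative; `trunc_f64_to_u32_checked` is a made-up helper, not the actual Cranelift lowering):

```rust
// Models wasm's `i32.trunc_f64_u` semantics: return the truncated value,
// or None where wasm would trap (NaN or out of range).
fn trunc_f64_to_u32_checked(x: f64) -> Option<u32> {
    // Values in (-1.0, 0.0) truncate to 0 and are valid per the wasm spec,
    // hence the open bound at -1.0. NaN fails both comparisons.
    if x > -1.0 && x < 4_294_967_296.0 {
        Some(x as u32) // `as` truncates toward zero
    } else {
        None // wasm would trap here
    }
}
```

The fflags variant discussed in this thread would instead issue the conversion unconditionally and branch on the invalid-operation flag afterwards, trading the up-front comparisons for a flag check.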
afonso360 commented on issue #7340:
Oh, that makes much more sense! Weirdly, none of these show a significant performance difference.
<details>
<summary>Criterion output</summary>
afonso@starfive:~$ ./wasmtime-fflags --dir . ./bench.wasm --bench

f32->u32  time: [23.730 ns 23.732 ns 23.734 ns]
          change: [-0.2177% -0.1872% -0.1582%] (p = 0.00 < 0.05)
          Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  4 (4.00%) high severe

f32->u64  time: [23.731 ns 23.732 ns 23.734 ns]
          change: [-0.0650% -0.0122% +0.0275%] (p = 0.68 > 0.05)
          No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) high mild
  6 (6.00%) high severe

f64->u32  time: [25.399 ns 25.401 ns 25.403 ns]
          change: [-0.0281% -0.0043% +0.0162%] (p = 0.72 > 0.05)
          No change in performance detected.
Found 21 outliers among 100 measurements (21.00%)
  4 (4.00%) low mild
  9 (9.00%) high mild
  8 (8.00%) high severe

f64->u64  time: [25.068 ns 25.069 ns 25.072 ns]
          change: [-0.0540% -0.0158% +0.0182%] (p = 0.41 > 0.05)
          No change in performance detected.
Found 13 outliers among 100 measurements (13.00%)
  1 (1.00%) low mild
  7 (7.00%) high mild
  5 (5.00%) high severe

f32->i32  time: [30.727 ns 30.729 ns 30.732 ns]
          change: [-0.0390% -0.0081% +0.0170%] (p = 0.60 > 0.05)
          No change in performance detected.
Found 12 outliers among 100 measurements (12.00%)
  2 (2.00%) low mild
  4 (4.00%) high mild
  6 (6.00%) high severe

f32->i64  time: [32.731 ns 32.734 ns 32.736 ns]
          change: [-0.0523% -0.0069% +0.0358%] (p = 0.78 > 0.05)
          No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  6 (6.00%) high severe

f64->i32  time: [34.740 ns 34.742 ns 34.745 ns]
          change: [-0.0349% -0.0039% +0.0254%] (p = 0.81 > 0.05)
          No change in performance detected.
Found 15 outliers among 100 measurements (15.00%)
  1 (1.00%) low mild
  6 (6.00%) high mild
  8 (8.00%) high severe

f64->i64  time: [34.067 ns 34.070 ns 34.073 ns]
          change: [-0.0274% -0.0002% +0.0312%] (p = 0.99 > 0.05)
          No change in performance detected.
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) low mild
  6 (6.00%) high mild
  5 (5.00%) high severe
</details>
afonso360 edited a comment on issue #7340:
Oh, that makes much more sense! Weirdly, none of these show a significant performance difference.
<details>
<summary>Criterion output</summary>
afonso@starfive:~$ ./wasmtime-main --dir . ./bench.wasm --bench
...
afonso@starfive:~$ ./wasmtime-fflags --dir . ./bench.wasm --bench

f32->u32  time: [23.730 ns 23.732 ns 23.734 ns]
          change: [-0.2177% -0.1872% -0.1582%] (p = 0.00 < 0.05)
          Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  4 (4.00%) high severe

f32->u64  time: [23.731 ns 23.732 ns 23.734 ns]
          change: [-0.0650% -0.0122% +0.0275%] (p = 0.68 > 0.05)
          No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) high mild
  6 (6.00%) high severe

f64->u32  time: [25.399 ns 25.401 ns 25.403 ns]
          change: [-0.0281% -0.0043% +0.0162%] (p = 0.72 > 0.05)
          No change in performance detected.
Found 21 outliers among 100 measurements (21.00%)
  4 (4.00%) low mild
  9 (9.00%) high mild
  8 (8.00%) high severe

f64->u64  time: [25.068 ns 25.069 ns 25.072 ns]
          change: [-0.0540% -0.0158% +0.0182%] (p = 0.41 > 0.05)
          No change in performance detected.
Found 13 outliers among 100 measurements (13.00%)
  1 (1.00%) low mild
  7 (7.00%) high mild
  5 (5.00%) high severe

f32->i32  time: [30.727 ns 30.729 ns 30.732 ns]
          change: [-0.0390% -0.0081% +0.0170%] (p = 0.60 > 0.05)
          No change in performance detected.
Found 12 outliers among 100 measurements (12.00%)
  2 (2.00%) low mild
  4 (4.00%) high mild
  6 (6.00%) high severe

f32->i64  time: [32.731 ns 32.734 ns 32.736 ns]
          change: [-0.0523% -0.0069% +0.0358%] (p = 0.78 > 0.05)
          No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  6 (6.00%) high severe

f64->i32  time: [34.740 ns 34.742 ns 34.745 ns]
          change: [-0.0349% -0.0039% +0.0254%] (p = 0.81 > 0.05)
          No change in performance detected.
Found 15 outliers among 100 measurements (15.00%)
  1 (1.00%) low mild
  6 (6.00%) high mild
  8 (8.00%) high severe

f64->i64  time: [34.067 ns 34.070 ns 34.073 ns]
          change: [-0.0274% -0.0002% +0.0312%] (p = 0.99 > 0.05)
          No change in performance detected.
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) low mild
  6 (6.00%) high mild
  5 (5.00%) high severe
</details>
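For reference, the effect the criterion benchmark isolates can be sanity-checked natively with a minimal stdlib-only loop. This is a rough sketch, not a replacement for criterion's statistics; the doubled `black_box`, mirroring the benchmark source in the thread, keeps the compiler from const-folding the cast away:

```rust
use std::hint::black_box;
use std::time::Instant;

// One conversion per iteration, matching the `f32->u32` criterion case.
fn f32_to_u32_once(x: f32) -> u32 {
    black_box(black_box(x) as u32)
}

fn main() {
    const ITERS: u32 = 1_000_000;
    let start = Instant::now();
    for _ in 0..ITERS {
        f32_to_u32_once(2.0);
    }
    let ns_per_iter = start.elapsed().as_nanos() as f64 / f64::from(ITERS);
    // Crude single-shot number; criterion's warmup and outlier handling
    // are far more robust than this.
    println!("f32->u32: {ns_per_iter:.2} ns/iter");
}
```

Note that native Rust `as` casts saturate rather than trap, so this measures the conversion itself, not the wasm trap-check overhead the PR is about.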
alexcrichton commented on issue #7340:
Thanks for collecting those! Sounds like this isn't the best way to go at this time, so I'm going to close this.
Last updated: Nov 22 2024 at 17:03 UTC