jlb6740 commented on issue #5064:
/bench_x64
jlb6740 commented on issue #5064:
/bench_x64
jlb6740 commented on issue #5064:
Change factor shows the patch's effect on x64, if merged, relative to the current head of main.
Results are based on clocktick (CT) event cycles: Change Factor = (Patched_CT - Main_CT) / Main_CT.
A negative change factor means the patch is expected to reduce clockticks.
|wasm|arch|phase|change_factor|
|-|-|-|-|
|benchmarks/blake3-scalar|x86_64|Compilation|0.001|
|benchmarks/blake3-simd|x86_64|Compilation|-0.008|
|benchmarks/bz2|x86_64|Compilation|-0.006|
|benchmarks/intgemm-simd|x86_64|Compilation|0.004|
|benchmarks/meshoptimizer|x86_64|Compilation|0.008|
|benchmarks/noop|x86_64|Compilation|-0.009|
|benchmarks/pulldown-cmark|x86_64|Compilation|-0.009|
|benchmarks/shootout-ackermann|x86_64|Compilation|0.007|
|benchmarks/shootout-base64|x86_64|Compilation|0.022|
|benchmarks/shootout-ctype|x86_64|Compilation|-0.006|
|benchmarks/shootout-ed25519|x86_64|Compilation|-0.001|
|benchmarks/shootout-fib2|x86_64|Compilation|0.018|
|benchmarks/shootout-gimli|x86_64|Compilation|0.002|
|benchmarks/shootout-heapsort|x86_64|Compilation|0.013|
|benchmarks/shootout-keccak|x86_64|Compilation|0.002|
|benchmarks/shootout-matrix|x86_64|Compilation|-0.007|
|benchmarks/shootout-memmove|x86_64|Compilation|-0.027|
|benchmarks/shootout-minicsv|x86_64|Compilation|-0.011|
|benchmarks/shootout-nestedloop|x86_64|Compilation|-0.043|
|benchmarks/shootout-random|x86_64|Compilation|0.003|
|benchmarks/shootout-ratelimit|x86_64|Compilation|0.013|
|benchmarks/shootout-seqhash|x86_64|Compilation|-0.004|
|benchmarks/shootout-sieve|x86_64|Compilation|0.011|
|benchmarks/shootout-switch|x86_64|Compilation|-0.006|
|benchmarks/shootout-xblabla20|x86_64|Compilation|0.051|
|benchmarks/shootout-xchacha20|x86_64|Compilation|0.018|
|benchmarks/spidermonkey|x86_64|Compilation|-0.002|

|wasm|arch|phase|change_factor|
|-|-|-|-|
|benchmarks/blake3-scalar|x86_64|Instantiation|0.013|
|benchmarks/blake3-simd|x86_64|Instantiation|0.035|
|benchmarks/bz2|x86_64|Instantiation|0.012|
|benchmarks/intgemm-simd|x86_64|Instantiation|-0.065|
|benchmarks/meshoptimizer|x86_64|Instantiation|0.017|
|benchmarks/noop|x86_64|Instantiation|0.020|
|benchmarks/pulldown-cmark|x86_64|Instantiation|0.062|
|benchmarks/shootout-ackermann|x86_64|Instantiation|0.022|
|benchmarks/shootout-base64|x86_64|Instantiation|-0.003|
|benchmarks/shootout-ctype|x86_64|Instantiation|-0.057|
|benchmarks/shootout-ed25519|x86_64|Instantiation|0.006|
|benchmarks/shootout-fib2|x86_64|Instantiation|0.030|
|benchmarks/shootout-gimli|x86_64|Instantiation|0.001|
|benchmarks/shootout-heapsort|x86_64|Instantiation|0.021|
|benchmarks/shootout-keccak|x86_64|Instantiation|-0.034|
|benchmarks/shootout-matrix|x86_64|Instantiation|-0.083|
|benchmarks/shootout-memmove|x86_64|Instantiation|0.036|
|benchmarks/shootout-minicsv|x86_64|Instantiation|-0.019|
|benchmarks/shootout-nestedloop|x86_64|Instantiation|-0.013|
|benchmarks/shootout-random|x86_64|Instantiation|0.007|
|benchmarks/shootout-ratelimit|x86_64|Instantiation|0.042|
|benchmarks/shootout-seqhash|x86_64|Instantiation|0.001|
|benchmarks/shootout-sieve|x86_64|Instantiation|-0.021|
|benchmarks/shootout-switch|x86_64|Instantiation|-0.040|
|benchmarks/shootout-xblabla20|x86_64|Instantiation|0.025|
|benchmarks/shootout-xchacha20|x86_64|Instantiation|0.023|
|benchmarks/spidermonkey|x86_64|Instantiation|-0.002|

|wasm|arch|phase|change_factor|
|-|-|-|-|
|benchmarks/blake3-scalar|x86_64|Execution|0.003|
|benchmarks/blake3-simd|x86_64|Execution|-0.014|
|benchmarks/bz2|x86_64|Execution|-0.003|
|benchmarks/intgemm-simd|x86_64|Execution|-0.000|
|benchmarks/meshoptimizer|x86_64|Execution|-0.001|
|benchmarks/noop|x86_64|Execution|0.008|
|benchmarks/pulldown-cmark|x86_64|Execution|-0.002|
|benchmarks/shootout-ackermann|x86_64|Execution|0.110|
|benchmarks/shootout-base64|x86_64|Execution|0.001|
|benchmarks/shootout-ctype|x86_64|Execution|-0.001|
|benchmarks/shootout-ed25519|x86_64|Execution|0.003|
|benchmarks/shootout-fib2|x86_64|Execution|0.000|
|benchmarks/shootout-gimli|x86_64|Execution|-0.014|
|benchmarks/shootout-heapsort|x86_64|Execution|0.000|
|benchmarks/shootout-keccak|x86_64|Execution|-0.001|
|benchmarks/shootout-matrix|x86_64|Execution|0.001|
|benchmarks/shootout-memmove|x86_64|Execution|0.001|
|benchmarks/shootout-minicsv|x86_64|Execution|0.000|
|benchmarks/shootout-nestedloop|x86_64|Execution|-0.003|
|benchmarks/shootout-random|x86_64|Execution|0.001|
|benchmarks/shootout-ratelimit|x86_64|Execution|0.003|
|benchmarks/shootout-seqhash|x86_64|Execution|-0.014|
|benchmarks/shootout-sieve|x86_64|Execution|0.000|
|benchmarks/shootout-switch|x86_64|Execution|-0.000|
|benchmarks/shootout-xblabla20|x86_64|Execution|0.037|
|benchmarks/shootout-xchacha20|x86_64|Execution|-0.015|
|benchmarks/spidermonkey|x86_64|Execution|0.000|

Averages (x64):
|phase|change_factor|
|-|-|
|Compilation|0.001|
|Execution|0.004|
|Instantiation|0.001|
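The change factor in the bot's header is a plain relative difference of clocktick counts. Below is a minimal Rust sketch of that formula, plus an average over benchmarks. The averaging step is an assumption (a simple arithmetic mean), since the bot output doesn't say how its "Averages (x64)" rows are computed, and the clocktick counts are hypothetical, not taken from the runs above.

```rust
/// Relative change in clockticks: (patched - main) / main.
/// Negative values mean the patch reduced clockticks.
fn change_factor(patched_ct: f64, main_ct: f64) -> f64 {
    (patched_ct - main_ct) / main_ct
}

/// Average of per-benchmark change factors for one phase.
/// Assumption: a plain arithmetic mean (returns NaN for an empty slice).
fn mean(factors: &[f64]) -> f64 {
    factors.iter().sum::<f64>() / factors.len() as f64
}

fn main() {
    // Hypothetical clocktick counts for a single benchmark/phase.
    let main_ct = 1_000_000.0;
    let patched_ct = 1_051_000.0;
    println!("{:.3}", change_factor(patched_ct, main_ct)); // prints 0.051

    // Hypothetical per-benchmark factors for one phase.
    let factors = [0.001, -0.008, -0.006];
    println!("{:.3}", mean(&factors)); // prints -0.004
}
```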
jlb6740 commented on issue #5064:
/bench_x64
jlb6740 commented on issue #5064:
/bench_x64
jlb6740 commented on issue #5064:
/bench_x64
cfallin commented on issue #5064:
Two ideas for bounding stability:
- We might want to exclude instantiation time altogether from these runs. I'd prefer not to, from first principles, but instantiation results seem to have significantly more variance than the other categories. I suspect this is because instantiation is usually much faster than compilation or execution. It may just be that the platform is not noise-free enough to measure instantiation accurately, and we'll need to benchmark it locally when working to improve it. Curious what others think, though (@fitzgen, @alexcrichton, @abrown?).
- Could we run a "no-change test" as a control on every run? Basically, run the baseline twice and show (i) the delta between the two baselines and (ii) the delta between the baseline (either one) and the PR's change. In a perfect world, we expect zero change in the control (baseline-to-baseline comparison) and whatever the actual change is in the diff run. If we see similar swings in both, we can conclude the swings are more likely noise. Thoughts?
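The control-run idea can be sketched as follows: compute the baseline-to-baseline swing and the baseline-to-PR swing, and treat the PR's change as likely noise when it is no larger than the control's. This is a hypothetical single-measurement heuristic (the names `classify` and `Verdict` are illustrative, not part of sightglass); a real analysis would compare distributions of samples rather than single means.

```rust
/// Outcome of comparing a PR's change against a baseline-vs-baseline control.
#[derive(Debug, PartialEq)]
enum Verdict {
    LikelyNoise,
    PossibleEffect,
}

/// Relative change of `new` with respect to `base`.
fn relative_change(new: f64, base: f64) -> f64 {
    (new - base) / base
}

/// Hypothetical helper: `baseline_a` and `baseline_b` are two runs of unmodified
/// main, `patched` is the PR build. Returns (control delta, diff delta, verdict).
fn classify(baseline_a: f64, baseline_b: f64, patched: f64) -> (f64, f64, Verdict) {
    let control = relative_change(baseline_b, baseline_a);
    let diff = relative_change(patched, baseline_a);
    // If the PR moves the number no more than rerunning main does, call it noise.
    let verdict = if diff.abs() <= control.abs() {
        Verdict::LikelyNoise
    } else {
        Verdict::PossibleEffect
    };
    (control, diff, verdict)
}

fn main() {
    // Hypothetical clocktick means for one benchmark/phase.
    let (control, diff, verdict) = classify(1_000_000.0, 1_040_000.0, 1_035_000.0);
    println!("control {:+.3}, diff {:+.3}: {:?}", control, diff, verdict);
}
```

Here the control swing (4%) exceeds the PR's swing (3.5%), so the sketch reports the PR's change as likely noise.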
jlb6740 commented on issue #5064:
Change factor shows the patch's effect on x64, if merged, relative to the current head of main.
Results are based on clocktick (CT) event cycles: Change Factor = (Patched_CT - Main_CT) / Main_CT.
A negative change factor means the patch is expected to reduce clockticks.
|wasm|arch|phase|change_factor|
|-|-|-|-|
|benchmarks/blake3-scalar|x86_64|Compilation|0.002|
|benchmarks/blake3-simd|x86_64|Compilation|0.002|
|benchmarks/bz2|x86_64|Compilation|0.001|
|benchmarks/intgemm-simd|x86_64|Compilation|0.002|
|benchmarks/meshoptimizer|x86_64|Compilation|0.002|
|benchmarks/noop|x86_64|Compilation|-0.013|
|benchmarks/pulldown-cmark|x86_64|Compilation|0.006|
|benchmarks/shootout-ackermann|x86_64|Compilation|0.002|
|benchmarks/shootout-base64|x86_64|Compilation|0.000|
|benchmarks/shootout-ctype|x86_64|Compilation|-0.001|
|benchmarks/shootout-ed25519|x86_64|Compilation|-0.004|
|benchmarks/shootout-fib2|x86_64|Compilation|0.004|
|benchmarks/shootout-gimli|x86_64|Compilation|-0.000|
|benchmarks/shootout-heapsort|x86_64|Compilation|0.016|
|benchmarks/shootout-keccak|x86_64|Compilation|-0.001|
|benchmarks/shootout-matrix|x86_64|Compilation|-0.010|
|benchmarks/shootout-memmove|x86_64|Compilation|0.007|
|benchmarks/shootout-minicsv|x86_64|Compilation|-0.005|
|benchmarks/shootout-nestedloop|x86_64|Compilation|0.006|
|benchmarks/shootout-random|x86_64|Compilation|-0.000|
|benchmarks/shootout-ratelimit|x86_64|Compilation|0.002|
|benchmarks/shootout-seqhash|x86_64|Compilation|0.023|
|benchmarks/shootout-sieve|x86_64|Compilation|0.005|
|benchmarks/shootout-switch|x86_64|Compilation|0.005|
|benchmarks/shootout-xblabla20|x86_64|Compilation|0.006|
|benchmarks/shootout-xchacha20|x86_64|Compilation|-0.015|
|benchmarks/spidermonkey|x86_64|Compilation|-0.003|

|wasm|arch|phase|change_factor|
|-|-|-|-|
|benchmarks/blake3-scalar|x86_64|Instantiation|0.002|
|benchmarks/blake3-simd|x86_64|Instantiation|-0.008|
|benchmarks/bz2|x86_64|Instantiation|-0.026|
|benchmarks/intgemm-simd|x86_64|Instantiation|-0.033|
|benchmarks/meshoptimizer|x86_64|Instantiation|-0.002|
|benchmarks/noop|x86_64|Instantiation|-0.014|
|benchmarks/pulldown-cmark|x86_64|Instantiation|0.011|
|benchmarks/shootout-ackermann|x86_64|Instantiation|0.017|
|benchmarks/shootout-base64|x86_64|Instantiation|-0.015|
|benchmarks/shootout-ctype|x86_64|Instantiation|-0.011|
|benchmarks/shootout-ed25519|x86_64|Instantiation|-0.017|
|benchmarks/shootout-fib2|x86_64|Instantiation|0.085|
|benchmarks/shootout-gimli|x86_64|Instantiation|-0.024|
|benchmarks/shootout-heapsort|x86_64|Instantiation|0.007|
|benchmarks/shootout-keccak|x86_64|Instantiation|0.029|
|benchmarks/shootout-matrix|x86_64|Instantiation|0.010|
|benchmarks/shootout-memmove|x86_64|Instantiation|0.071|
|benchmarks/shootout-minicsv|x86_64|Instantiation|0.044|
|benchmarks/shootout-nestedloop|x86_64|Instantiation|-0.017|
|benchmarks/shootout-random|x86_64|Instantiation|0.015|
|benchmarks/shootout-ratelimit|x86_64|Instantiation|0.022|
|benchmarks/shootout-seqhash|x86_64|Instantiation|-0.036|
|benchmarks/shootout-sieve|x86_64|Instantiation|0.030|
|benchmarks/shootout-switch|x86_64|Instantiation|-0.005|
|benchmarks/shootout-xblabla20|x86_64|Instantiation|-0.024|
|benchmarks/shootout-xchacha20|x86_64|Instantiation|0.035|
|benchmarks/spidermonkey|x86_64|Instantiation|0.059|

|wasm|arch|phase|change_factor|
|-|-|-|-|
|benchmarks/blake3-scalar|x86_64|Execution|-0.004|
|benchmarks/blake3-simd|x86_64|Execution|-0.023|
|benchmarks/bz2|x86_64|Execution|0.014|
|benchmarks/intgemm-simd|x86_64|Execution|0.001|
|benchmarks/meshoptimizer|x86_64|Execution|-0.000|
|benchmarks/noop|x86_64|Execution|0.049|
|benchmarks/pulldown-cmark|x86_64|Execution|0.005|
|benchmarks/shootout-ackermann|x86_64|Execution|0.026|
|benchmarks/shootout-base64|x86_64|Execution|-0.001|
|benchmarks/shootout-ctype|x86_64|Execution|-0.001|
|benchmarks/shootout-ed25519|x86_64|Execution|0.002|
|benchmarks/shootout-fib2|x86_64|Execution|-0.000|
|benchmarks/shootout-gimli|x86_64|Execution|-0.003|
|benchmarks/shootout-heapsort|x86_64|Execution|0.000|
|benchmarks/shootout-keccak|x86_64|Execution|0.001|
|benchmarks/shootout-matrix|x86_64|Execution|-0.004|
|benchmarks/shootout-memmove|x86_64|Execution|-0.000|
|benchmarks/shootout-minicsv|x86_64|Execution|-0.000|
|benchmarks/shootout-nestedloop|x86_64|Execution|0.005|
|benchmarks/shootout-random|x86_64|Execution|-0.001|
|benchmarks/shootout-ratelimit|x86_64|Execution|-0.009|
|benchmarks/shootout-seqhash|x86_64|Execution|-0.003|
|benchmarks/shootout-sieve|x86_64|Execution|0.000|
|benchmarks/shootout-switch|x86_64|Execution|0.000|
|benchmarks/shootout-xblabla20|x86_64|Execution|0.003|
|benchmarks/shootout-xchacha20|x86_64|Execution|-0.007|
|benchmarks/spidermonkey|x86_64|Execution|-0.003|

Averages (x64):
|phase|change_factor|
|-|-|
|Compilation|0.001|
|Execution|0.002|
|Instantiation|0.008|
alexcrichton commented on issue #5064:
In my experience, even with dedicated hardware I've always had a lot of noise in time-based measurements. For long-term regression testing, which is what this is intended for, would it be possible to measure instructions retired instead of wall time (which I think clock cycles are more or less equivalent to)? That's what rust-lang/rust uses by default, and instruction counts are typically quite stable (although still not 100%).
Also, as a minor thing, would it be possible to print the changes as %-based changes instead of factor-based changes?
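Converting the reported factors to percentages is just a scale by 100. A minimal sketch, assuming one decimal place and an explicit sign are the desired output style (the function name `as_percent` is hypothetical):

```rust
/// Formats a change factor (e.g. -0.191) as a signed percentage ("-19.1%").
fn as_percent(change_factor: f64) -> String {
    format!("{:+.1}%", change_factor * 100.0)
}

fn main() {
    // Factors taken from the shootout-ackermann Execution rows above, plus a small one.
    for cf in [0.110, -0.191, 0.001] {
        println!("{}", as_percent(cf));
    }
}
```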
jlb6740 commented on issue #5064:
Change factor shows the patch's effect on x64, if merged, relative to the current head of main.
Results are based on clocktick (CT) event cycles: Change Factor = (Patched_CT - Main_CT) / Main_CT.
A negative change factor means the patch is expected to reduce clockticks.
|wasm|arch|phase|change_factor|
|-|-|-|-|
|benchmarks/blake3-scalar|x86_64|Compilation|-0.001|
|benchmarks/blake3-simd|x86_64|Compilation|0.022|
|benchmarks/bz2|x86_64|Compilation|-0.008|
|benchmarks/intgemm-simd|x86_64|Compilation|-0.005|
|benchmarks/meshoptimizer|x86_64|Compilation|0.001|
|benchmarks/noop|x86_64|Compilation|0.027|
|benchmarks/pulldown-cmark|x86_64|Compilation|-0.000|
|benchmarks/shootout-ackermann|x86_64|Compilation|0.005|
|benchmarks/shootout-base64|x86_64|Compilation|-0.000|
|benchmarks/shootout-ctype|x86_64|Compilation|0.018|
|benchmarks/shootout-ed25519|x86_64|Compilation|-0.007|
|benchmarks/shootout-fib2|x86_64|Compilation|0.006|
|benchmarks/shootout-gimli|x86_64|Compilation|-0.012|
|benchmarks/shootout-heapsort|x86_64|Compilation|0.006|
|benchmarks/shootout-keccak|x86_64|Compilation|-0.000|
|benchmarks/shootout-matrix|x86_64|Compilation|-0.021|
|benchmarks/shootout-memmove|x86_64|Compilation|0.012|
|benchmarks/shootout-minicsv|x86_64|Compilation|-0.028|
|benchmarks/shootout-nestedloop|x86_64|Compilation|-0.007|
|benchmarks/shootout-random|x86_64|Compilation|0.007|
|benchmarks/shootout-ratelimit|x86_64|Compilation|0.011|
|benchmarks/shootout-seqhash|x86_64|Compilation|-0.011|
|benchmarks/shootout-sieve|x86_64|Compilation|0.005|
|benchmarks/shootout-switch|x86_64|Compilation|0.001|
|benchmarks/shootout-xblabla20|x86_64|Compilation|-0.005|
|benchmarks/shootout-xchacha20|x86_64|Compilation|0.004|
|benchmarks/spidermonkey|x86_64|Compilation|-0.005|

|wasm|arch|phase|change_factor|
|-|-|-|-|
|benchmarks/blake3-scalar|x86_64|Instantiation|-0.014|
|benchmarks/blake3-simd|x86_64|Instantiation|0.010|
|benchmarks/bz2|x86_64|Instantiation|0.038|
|benchmarks/intgemm-simd|x86_64|Instantiation|-0.035|
|benchmarks/meshoptimizer|x86_64|Instantiation|0.017|
|benchmarks/noop|x86_64|Instantiation|-0.013|
|benchmarks/pulldown-cmark|x86_64|Instantiation|0.039|
|benchmarks/shootout-ackermann|x86_64|Instantiation|-0.004|
|benchmarks/shootout-base64|x86_64|Instantiation|-0.010|
|benchmarks/shootout-ctype|x86_64|Instantiation|0.031|
|benchmarks/shootout-ed25519|x86_64|Instantiation|0.031|
|benchmarks/shootout-fib2|x86_64|Instantiation|0.028|
|benchmarks/shootout-gimli|x86_64|Instantiation|-0.102|
|benchmarks/shootout-heapsort|x86_64|Instantiation|-0.040|
|benchmarks/shootout-keccak|x86_64|Instantiation|0.012|
|benchmarks/shootout-matrix|x86_64|Instantiation|0.045|
|benchmarks/shootout-memmove|x86_64|Instantiation|-0.025|
|benchmarks/shootout-minicsv|x86_64|Instantiation|0.085|
|benchmarks/shootout-nestedloop|x86_64|Instantiation|0.042|
|benchmarks/shootout-random|x86_64|Instantiation|0.031|
|benchmarks/shootout-ratelimit|x86_64|Instantiation|0.037|
|benchmarks/shootout-seqhash|x86_64|Instantiation|0.008|
|benchmarks/shootout-sieve|x86_64|Instantiation|0.005|
|benchmarks/shootout-switch|x86_64|Instantiation|0.050|
|benchmarks/shootout-xblabla20|x86_64|Instantiation|-0.015|
|benchmarks/shootout-xchacha20|x86_64|Instantiation|-0.020|
|benchmarks/spidermonkey|x86_64|Instantiation|-0.033|

|wasm|arch|phase|change_factor|
|-|-|-|-|
|benchmarks/blake3-scalar|x86_64|Execution|-0.014|
|benchmarks/blake3-simd|x86_64|Execution|0.006|
|benchmarks/bz2|x86_64|Execution|0.017|
|benchmarks/intgemm-simd|x86_64|Execution|0.000|
|benchmarks/meshoptimizer|x86_64|Execution|0.002|
|benchmarks/noop|x86_64|Execution|0.067|
|benchmarks/pulldown-cmark|x86_64|Execution|0.003|
|benchmarks/shootout-ackermann|x86_64|Execution|-0.191|
|benchmarks/shootout-base64|x86_64|Execution|-0.002|
|benchmarks/shootout-ctype|x86_64|Execution|-0.000|
|benchmarks/shootout-ed25519|x86_64|Execution|-0.001|
|benchmarks/shootout-fib2|x86_64|Execution|0.000|
|benchmarks/shootout-gimli|x86_64|Execution|-0.057|
|benchmarks/shootout-heapsort|x86_64|Execution|0.000|
|benchmarks/shootout-keccak|x86_64|Execution|-0.001|
|benchmarks/shootout-matrix|x86_64|Execution|0.000|
|benchmarks/shootout-memmove|x86_64|Execution|-0.000|
|benchmarks/shootout-minicsv|x86_64|Execution|0.000|
|benchmarks/shootout-nestedloop|x86_64|Execution|-0.000|
|benchmarks/shootout-random|x86_64|Execution|0.000|
|benchmarks/shootout-ratelimit|x86_64|Execution|0.003|
|benchmarks/shootout-seqhash|x86_64|Execution|-0.000|
|benchmarks/shootout-sieve|x86_64|Execution|0.001|
|benchmarks/shootout-switch|x86_64|Execution|-0.000|
|benchmarks/shootout-xblabla20|x86_64|Execution|-0.009|
|benchmarks/shootout-xchacha20|x86_64|Execution|-0.003|
|benchmarks/spidermonkey|x86_64|Execution|-0.000|

Averages (x64):
|phase|change_factor|
|-|-|
|Compilation|0.001|
|Execution|-0.007|
|Instantiation|0.007|
jlb6740 commented on issue #5064:
Change factor shows the patch's effect on x64, if merged, relative to the current head of main.
Results are based on clocktick (CT) event cycles: Change Factor = (Patched_CT - Main_CT) / Main_CT.
A negative change factor means the patch is expected to reduce clockticks.
|wasm|arch|phase|change_factor|
|-|-|-|-|
|benchmarks/blake3-scalar|x86_64|Compilation|0.012|
|benchmarks/blake3-simd|x86_64|Compilation|-0.011|
|benchmarks/bz2|x86_64|Compilation|-0.008|
|benchmarks/intgemm-simd|x86_64|Compilation|0.001|
|benchmarks/meshoptimizer|x86_64|Compilation|0.008|
|benchmarks/noop|x86_64|Compilation|0.004|
|benchmarks/pulldown-cmark|x86_64|Compilation|0.022|
|benchmarks/shootout-ackermann|x86_64|Compilation|-0.029|
|benchmarks/shootout-base64|x86_64|Compilation|-0.009|
|benchmarks/shootout-ctype|x86_64|Compilation|-0.001|
|benchmarks/shootout-ed25519|x86_64|Compilation|-0.003|
|benchmarks/shootout-fib2|x86_64|Compilation|0.003|
|benchmarks/shootout-gimli|x86_64|Compilation|0.039|
|benchmarks/shootout-heapsort|x86_64|Compilation|-0.031|
|benchmarks/shootout-keccak|x86_64|Compilation|0.003|
|benchmarks/shootout-matrix|x86_64|Compilation|0.009|
|benchmarks/shootout-memmove|x86_64|Compilation|-0.003|
|benchmarks/shootout-minicsv|x86_64|Compilation|-0.002|
|benchmarks/shootout-nestedloop|x86_64|Compilation|0.000|
|benchmarks/shootout-random|x86_64|Compilation|0.006|
|benchmarks/shootout-ratelimit|x86_64|Compilation|0.001|
|benchmarks/shootout-seqhash|x86_64|Compilation|-0.006|
|benchmarks/shootout-sieve|x86_64|Compilation|-0.004|
|benchmarks/shootout-switch|x86_64|Compilation|0.012|
|benchmarks/shootout-xblabla20|x86_64|Compilation|0.006|
|benchmarks/shootout-xchacha20|x86_64|Compilation|-0.003|
|benchmarks/spidermonkey|x86_64|Compilation|-0.005|

|wasm|arch|phase|change_factor|
|-|-|-|-|
|benchmarks/blake3-scalar|x86_64|Instantiation|0.009|
|benchmarks/blake3-simd|x86_64|Instantiation|0.070|
|benchmarks/bz2|x86_64|Instantiation|0.006|
|benchmarks/intgemm-simd|x86_64|Instantiation|0.047|
|benchmarks/meshoptimizer|x86_64|Instantiation|0.031|
|benchmarks/noop|x86_64|Instantiation|0.030|
|benchmarks/pulldown-cmark|x86_64|Instantiation|-0.037|
|benchmarks/shootout-ackermann|x86_64|Instantiation|0.010|
|benchmarks/shootout-base64|x86_64|Instantiation|0.035|
|benchmarks/shootout-ctype|x86_64|Instantiation|-0.015|
|benchmarks/shootout-ed25519|x86_64|Instantiation|-0.025|
|benchmarks/shootout-fib2|x86_64|Instantiation|-0.027|
|benchmarks/shootout-gimli|x86_64|Instantiation|0.034|
|benchmarks/shootout-heapsort|x86_64|Instantiation|0.044|
|benchmarks/shootout-keccak|x86_64|Instantiation|-0.023|
|benchmarks/shootout-matrix|x86_64|Instantiation|0.036|
|benchmarks/shootout-memmove|x86_64|Instantiation|0.043|
|benchmarks/shootout-minicsv|x86_64|Instantiation|-0.064|
|benchmarks/shootout-nestedloop|x86_64|Instantiation|-0.015|
|benchmarks/shootout-random|x86_64|Instantiation|-0.025|
|benchmarks/shootout-ratelimit|x86_64|Instantiation|0.014|
|benchmarks/shootout-seqhash|x86_64|Instantiation|0.012|
|benchmarks/shootout-sieve|x86_64|Instantiation|0.025|
|benchmarks/shootout-switch|x86_64|Instantiation|-0.070|
|benchmarks/shootout-xblabla20|x86_64|Instantiation|-0.021|
|benchmarks/shootout-xchacha20|x86_64|Instantiation|-0.018|
|benchmarks/spidermonkey|x86_64|Instantiation|-0.059|

|wasm|arch|phase|change_factor|
|-|-|-|-|
|benchmarks/blake3-scalar|x86_64|Execution|-0.023|
|benchmarks/blake3-simd|x86_64|Execution|0.001|
|benchmarks/bz2|x86_64|Execution|-0.004|
|benchmarks/intgemm-simd|x86_64|Execution|-0.002|
|benchmarks/meshoptimizer|x86_64|Execution|0.001|
|benchmarks/noop|x86_64|Execution|0.100|
|benchmarks/pulldown-cmark|x86_64|Execution|-0.003|
|benchmarks/shootout-ackermann|x86_64|Execution|-0.080|
|benchmarks/shootout-base64|x86_64|Execution|-0.001|
|benchmarks/shootout-ctype|x86_64|Execution|-0.000|
|benchmarks/shootout-ed25519|x86_64|Execution|-0.000|
|benchmarks/shootout-fib2|x86_64|Execution|-0.000|
|benchmarks/shootout-gimli|x86_64|Execution|-0.015|
|benchmarks/shootout-heapsort|x86_64|Execution|-0.000|
|benchmarks/shootout-keccak|x86_64|Execution|-0.004|
|benchmarks/shootout-matrix|x86_64|Execution|-0.001|
|benchmarks/shootout-memmove|x86_64|Execution|0.000|
|benchmarks/shootout-minicsv|x86_64|Execution|-0.001|
|benchmarks/shootout-nestedloop|x86_64|Execution|0.003|
|benchmarks/shootout-random|x86_64|Execution|-0.000|
|benchmarks/shootout-ratelimit|x86_64|Execution|0.003|
|benchmarks/shootout-seqhash|x86_64|Execution|0.009|
|benchmarks/shootout-sieve|x86_64|Execution|0.000|
|benchmarks/shootout-switch|x86_64|Execution|-0.001|
|benchmarks/shootout-xblabla20|x86_64|Execution|0.003|
|benchmarks/shootout-xchacha20|x86_64|Execution|0.003|
|benchmarks/spidermonkey|x86_64|Execution|0.002|

Averages (x64):
|phase|change_factor|
|-|-|
|Compilation|0.001|
|Execution|-0.000|
|Instantiation|0.002|
fitzgen commented on issue #5064:
> Two ideas for bounding stability:
>
> * We might want to exclude instantiation time altogether from these runs. I'd prefer not to, from first principles, but instantiation results seem to have significantly more variance than the other categories. I suspect this is because instantiation is usually much faster than compilation or execution. It may just be that the platform is not noise-free enough to measure instantiation accurately, and we'll need to benchmark it locally when working to improve it. Curious what others think, though (@fitzgen, @alexcrichton, @abrown?).

Seems fine to exclude instantiation. We have decent instantiation benchmarks in criterion anyway.

> * Could we run a "no-change test" as a control on every run? Basically, run the baseline twice and show (i) the delta between the two baselines and (ii) the delta between the baseline (either one) and the PR's change. In a perfect world, we expect zero change in the control (baseline-to-baseline comparison) and whatever the actual change is in the diff run. If we see similar swings in both, we can conclude the swings are more likely noise. Thoughts?

This is more something for the `sightglass-analysis` crate than the GitHub bot, IMO. The GitHub bot shouldn't grow anything beyond what is needed to run `sightglass` on the server, authenticate who is allowed to do that, and report the results back. All the details of actually running benchmarks and analyzing them should live in `sightglass` itself.
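On the instantiation-noise point quoted above: if every phase picks up roughly the same absolute jitter in cycles (an assumption made only for illustration), the relative spread, and therefore the apparent change factor, is far larger for a short phase such as instantiation. A sketch using the coefficient of variation, with hypothetical samples:

```rust
/// Coefficient of variation (sample standard deviation / mean) of repeated
/// clocktick measurements for one benchmark phase. Needs at least 2 samples.
fn coefficient_of_variation(samples: &[f64]) -> f64 {
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    // Sample variance with Bessel's correction (divide by n - 1).
    let var = samples.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
    var.sqrt() / mean
}

fn main() {
    // Hypothetical samples: both phases see the same +/-20,000-cycle jitter,
    // but the short phase is 1000x shorter, so its relative spread is 1000x larger.
    let short_phase = [480_000.0, 500_000.0, 520_000.0];
    let long_phase = [499_980_000.0, 500_000_000.0, 500_020_000.0];
    println!("short phase CV = {:.5}", coefficient_of_variation(&short_phase));
    println!("long phase CV  = {:.8}", coefficient_of_variation(&long_phase));
}
```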
cfallin commented on issue #5064:
> This is more something for the sightglass-analysis crate than the github bot, IMO. The github bot shouldn't be growing anything other than what is needed to run sightglass on the server, authenticate who is allowed to do that, and report the results back. All the details of actually running benchmarks and doing analysis on them should be in sightglass itself.

Yeah, that's a good point actually; I agree. My main concern was that we have trustworthy results, and actually using the confidence-interval computation is the best way of doing that.
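As a rough illustration of how a confidence interval separates signal from noise, here is a sketch of an approximate 99% interval for the difference in mean clockticks. It swaps in a large-sample normal approximation (Welch-style standard error with z = 2.576 in place of a t quantile); sightglass's own analysis may compute its intervals differently, and the function names and samples are hypothetical. An interval that contains zero corresponds to "no statistical difference".

```rust
/// Mean and sample variance (divide by n - 1). Needs at least 2 samples.
fn mean_var(xs: &[f64]) -> (f64, f64) {
    let n = xs.len() as f64;
    let m = xs.iter().sum::<f64>() / n;
    let v = xs.iter().map(|x| (x - m).powi(2)).sum::<f64>() / (n - 1.0);
    (m, v)
}

/// Approximate 99% CI for mean(patched) - mean(baseline), using an
/// unequal-variance standard error and a normal critical value.
/// Assumption: enough samples that z is a reasonable stand-in for t.
fn diff_ci_99(baseline: &[f64], patched: &[f64]) -> (f64, f64) {
    const Z_99: f64 = 2.576;
    let (m1, v1) = mean_var(baseline);
    let (m2, v2) = mean_var(patched);
    let se = (v1 / baseline.len() as f64 + v2 / patched.len() as f64).sqrt();
    let d = m2 - m1;
    (d - Z_99 * se, d + Z_99 * se)
}

/// True when the interval contains zero, i.e. no statistically significant change.
fn no_significant_difference(baseline: &[f64], patched: &[f64]) -> bool {
    let (lo, hi) = diff_ci_99(baseline, patched);
    lo <= 0.0 && 0.0 <= hi
}

fn main() {
    // Hypothetical clocktick samples (arbitrary units).
    let baseline = [100.0, 102.0, 98.0, 101.0, 99.0];
    let patched = [100.5, 101.5, 98.5, 100.0, 99.5];
    let (lo, hi) = diff_ci_99(&baseline, &patched);
    println!("99% CI for the difference: [{lo:.2}, {hi:.2}]");
    println!("no statistical difference: {}", no_significant_difference(&baseline, &patched));
}
```

Benchmarking an empty PR this way should, most of the time, yield intervals that contain zero.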
cfallin commented on issue #5064:
(And following on that a bit more, I guess what I really want is to sort of build up trust in the tool from first principles -- that's what I was trying to get at with the null-diff control; so perhaps this is a way we can validate the confidence interval reporting, when we get it integrated. If we submit an empty PR and benchmark it, we should see "no statistical difference" everywhere, or else we have a stats bug)
cfallin edited a comment on issue #5064:
(And following on that a bit more, I guess what I really want is to sort of build up trust in the tool from first principles -- that's what I was trying to get at with the null-diff control; so perhaps this is a way we can validate the confidence interval reporting, when we get it integrated. If we submit an empty PR and benchmark it, we should see "no statistical difference" everywhere, or else we have a stats or configuration/setup bug)
fitzgen commented on issue #5064:
(Note that the probability of a false positive is 1%, due to our default significance level, but that is per test. We do 3 tests per Wasm input, so we only need ~33 Wasm inputs to expect one false positive per benchmark run. This is one of the many reasons to choose our Wasm inputs carefully.)
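The arithmetic behind that note: at significance level α over m tests, the expected number of false positives is α·m, and, if the tests are independent (an assumption that repeated phases of the same Wasm input may not satisfy), the probability of at least one is 1 - (1 - α)^m. A small sketch with the numbers from the note:

```rust
/// Expected number of false positives, and the probability of at least one,
/// for `tests` independent tests each run at significance level `alpha`.
fn false_positive_stats(alpha: f64, tests: u32) -> (f64, f64) {
    let expected = alpha * tests as f64;
    let at_least_one = 1.0 - (1.0 - alpha).powi(tests as i32);
    (expected, at_least_one)
}

fn main() {
    let alpha = 0.01; // 1% significance level
    let tests_per_input = 3; // compilation, instantiation, execution
    let inputs = 33; // Wasm inputs
    let (expected, at_least_one) = false_positive_stats(alpha, tests_per_input * inputs);
    println!("expected false positives: {expected:.2}"); // prints 0.99
    println!("P(at least one): {at_least_one:.2}"); // prints 0.63
}
```

So with ~33 inputs, a run is more likely than not (about 63%) to report at least one spurious change.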
Last updated: Jan 24 2025 at 00:11 UTC