Stream: git-wasmtime

Topic: wasmtime / issue #5060 Update format of benchmark results


view this post on Zulip Wasmtime GitHub notifications bot (Oct 17 2022 at 18:33):

jlb6740 commented on issue #5060:

/bench_x64

view this post on Zulip Wasmtime GitHub notifications bot (Oct 17 2022 at 18:41):

jlb6740 commented on issue #5060:

Change factor shows patch effect on x64 if merged compared to current head for main.

Results are based on clocktick (CT) event cycles. Change Factor = (Patched_CT - Main_CT) / (Main_CT)
A negative change factor means clockticks are expected to be reduced by the patch.

|wasm|arch|phase|change_factor|
|-|-|-|-|
|benchmarks/blake3-scalar|x86_64|Compilation|-0.055|
|benchmarks/blake3-simd|x86_64|Compilation|-0.018|
|benchmarks/bz2|x86_64|Compilation|-0.017|
|benchmarks/intgemm-simd|x86_64|Compilation|0.000|
|benchmarks/meshoptimizer|x86_64|Compilation|-0.004|
|benchmarks/noop|x86_64|Compilation|-0.001|
|benchmarks/pulldown-cmark|x86_64|Compilation|-0.062|
|benchmarks/shootout-ackermann|x86_64|Compilation|0.019|
|benchmarks/shootout-base64|x86_64|Compilation|0.004|
|benchmarks/shootout-ctype|x86_64|Compilation|-0.018|
|benchmarks/shootout-ed25519|x86_64|Compilation|0.000|
|benchmarks/shootout-fib2|x86_64|Compilation|-0.014|
|benchmarks/shootout-gimli|x86_64|Compilation|0.051|
|benchmarks/shootout-heapsort|x86_64|Compilation|0.012|
|benchmarks/shootout-keccak|x86_64|Compilation|0.027|
|benchmarks/shootout-matrix|x86_64|Compilation|-0.017|
|benchmarks/shootout-memmove|x86_64|Compilation|-0.022|
|benchmarks/shootout-minicsv|x86_64|Compilation|0.016|
|benchmarks/shootout-nestedloop|x86_64|Compilation|-0.026|
|benchmarks/shootout-random|x86_64|Compilation|0.027|
|benchmarks/shootout-ratelimit|x86_64|Compilation|-0.013|
|benchmarks/shootout-seqhash|x86_64|Compilation|0.306|
|benchmarks/shootout-sieve|x86_64|Compilation|0.015|
|benchmarks/shootout-switch|x86_64|Compilation|0.054|
|benchmarks/shootout-xblabla20|x86_64|Compilation|0.033|
|benchmarks/shootout-xchacha20|x86_64|Compilation|-0.086|
|benchmarks/spidermonkey|x86_64|Compilation|-0.013|

|wasm|arch|phase|change_factor|
|-|-|-|-|
|benchmarks/blake3-scalar|x86_64|Instantiation|0.071|
|benchmarks/blake3-simd|x86_64|Instantiation|-0.037|
|benchmarks/bz2|x86_64|Instantiation|-0.019|
|benchmarks/intgemm-simd|x86_64|Instantiation|0.322|
|benchmarks/meshoptimizer|x86_64|Instantiation|0.075|
|benchmarks/noop|x86_64|Instantiation|-0.066|
|benchmarks/pulldown-cmark|x86_64|Instantiation|-0.043|
|benchmarks/shootout-ackermann|x86_64|Instantiation|0.003|
|benchmarks/shootout-base64|x86_64|Instantiation|0.065|
|benchmarks/shootout-ctype|x86_64|Instantiation|0.049|
|benchmarks/shootout-ed25519|x86_64|Instantiation|0.010|
|benchmarks/shootout-fib2|x86_64|Instantiation|-0.060|
|benchmarks/shootout-gimli|x86_64|Instantiation|0.025|
|benchmarks/shootout-heapsort|x86_64|Instantiation|0.008|
|benchmarks/shootout-keccak|x86_64|Instantiation|-0.030|
|benchmarks/shootout-matrix|x86_64|Instantiation|0.039|
|benchmarks/shootout-memmove|x86_64|Instantiation|0.005|
|benchmarks/shootout-minicsv|x86_64|Instantiation|-0.003|
|benchmarks/shootout-nestedloop|x86_64|Instantiation|0.072|
|benchmarks/shootout-random|x86_64|Instantiation|-0.003|
|benchmarks/shootout-ratelimit|x86_64|Instantiation|0.021|
|benchmarks/shootout-seqhash|x86_64|Instantiation|-0.111|
|benchmarks/shootout-sieve|x86_64|Instantiation|-0.036|
|benchmarks/shootout-switch|x86_64|Instantiation|0.017|
|benchmarks/shootout-xblabla20|x86_64|Instantiation|0.022|
|benchmarks/shootout-xchacha20|x86_64|Instantiation|-0.049|
|benchmarks/spidermonkey|x86_64|Instantiation|-0.147|

|wasm|arch|phase|change_factor|
|-|-|-|-|
|benchmarks/blake3-scalar|x86_64|Execution|0.003|
|benchmarks/blake3-simd|x86_64|Execution|-0.011|
|benchmarks/bz2|x86_64|Execution|-0.033|
|benchmarks/intgemm-simd|x86_64|Execution|-0.007|
|benchmarks/meshoptimizer|x86_64|Execution|0.000|
|benchmarks/noop|x86_64|Execution|0.454|
|benchmarks/pulldown-cmark|x86_64|Execution|-0.002|
|benchmarks/shootout-ackermann|x86_64|Execution|-0.350|
|benchmarks/shootout-base64|x86_64|Execution|0.003|
|benchmarks/shootout-ctype|x86_64|Execution|0.003|
|benchmarks/shootout-ed25519|x86_64|Execution|-0.000|
|benchmarks/shootout-fib2|x86_64|Execution|-0.000|
|benchmarks/shootout-gimli|x86_64|Execution|-0.017|
|benchmarks/shootout-heapsort|x86_64|Execution|0.000|
|benchmarks/shootout-keccak|x86_64|Execution|-0.006|
|benchmarks/shootout-matrix|x86_64|Execution|0.001|
|benchmarks/shootout-memmove|x86_64|Execution|0.002|
|benchmarks/shootout-minicsv|x86_64|Execution|-0.001|
|benchmarks/shootout-nestedloop|x86_64|Execution|0.003|
|benchmarks/shootout-random|x86_64|Execution|0.003|
|benchmarks/shootout-ratelimit|x86_64|Execution|-0.000|
|benchmarks/shootout-seqhash|x86_64|Execution|0.004|
|benchmarks/shootout-sieve|x86_64|Execution|-0.001|
|benchmarks/shootout-switch|x86_64|Execution|-0.001|
|benchmarks/shootout-xblabla20|x86_64|Execution|-0.025|
|benchmarks/shootout-xchacha20|x86_64|Execution|0.010|
|benchmarks/spidermonkey|x86_64|Execution|-0.000|

Averages (x64):
|phase|change_factor|
|-|-|
|Compilation|0.007|
|Execution|0.001|
|Instantiation|0.007|
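A minimal sketch of how the numbers in a post like this relate, assuming per-benchmark mean clockticks for main and for the patch are available as plain tuples (the row shape and the values below are hypothetical, not taken from these runs): compute the change factor per benchmark, then take the arithmetic mean per phase, which is what the Averages table reports.

```python
from collections import defaultdict
from statistics import fmean


def change_factor(main_ct: float, patched_ct: float) -> float:
    """Change Factor = (Patched_CT - Main_CT) / Main_CT; negative means fewer clockticks."""
    return (patched_ct - main_ct) / main_ct


def phase_averages(rows):
    """Arithmetic mean of the change factor for each phase.

    `rows` holds hypothetical (benchmark, phase, main_ct, patched_ct) tuples.
    """
    by_phase = defaultdict(list)
    for _benchmark, phase, main_ct, patched_ct in rows:
        by_phase[phase].append(change_factor(main_ct, patched_ct))
    return {phase: fmean(factors) for phase, factors in by_phase.items()}


# Hypothetical clocktick counts, for illustration only.
rows = [
    ("benchmarks/noop", "Execution", 1000.0, 1010.0),
    ("benchmarks/bz2", "Execution", 2000.0, 1980.0),
    ("benchmarks/noop", "Compilation", 500.0, 525.0),
]
averages = phase_averages(rows)
for phase, avg in sorted(averages.items()):
    print(f"{phase}: {avg:.3f}")
```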

view this post on Zulip Wasmtime GitHub notifications bot (Oct 17 2022 at 19:47):

jlb6740 commented on issue #5060:

@cfallin, I think this is the one that addresses some of your previous comments. Note that https://github.com/bytecodealliance/wasmtime/pull/5064 is a separate PR that looks at stabilizing the results.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 17 2022 at 20:04):

cfallin commented on issue #5060:

@jlb6740 a few thoughts:

view this post on Zulip Wasmtime GitHub notifications bot (Oct 17 2022 at 20:26):

jlb6740 commented on issue #5060:

@cfallin, these are averages of what we decided to call the change factor (pandas calls it percent change). I definitely don't think we want to take the geomean of these numbers. The numbers are already a percentage (and not based on a diff), so I think taking the arithmetic average is appropriate there. Note that I updated the comments to include the formula that is used:

Some Factor = (Patched_CT - Main_CT) / (Main_CT)
or
Patched Clock Ticks = Main Clock Ticks + (Main Clock Ticks * (Some Factor))

This makes sense to me, but multiplying "Some Factor" by 100 would break this formula.
0.003 factor is not the same as .3%.
1.003 factor is equivalent to .3%
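The second formula above is the first one solved for the patched count. A small round-trip check, using made-up values for Main_CT and the factor (the helper names are hypothetical):

```python
def change_factor(main_ct: float, patched_ct: float) -> float:
    """Some Factor = (Patched_CT - Main_CT) / Main_CT."""
    return (patched_ct - main_ct) / main_ct


def patched_from_factor(main_ct: float, factor: float) -> float:
    """Patched Clock Ticks = Main Clock Ticks + Main Clock Ticks * factor."""
    return main_ct + main_ct * factor


main_ct = 100.0   # hypothetical Main_CT
factor = -0.003   # hypothetical Some Factor
patched_ct = patched_from_factor(main_ct, factor)
print(f"{patched_ct:.1f}")                          # 99.7
print(f"{change_factor(main_ct, patched_ct):.3f}")  # -0.003
```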

view this post on Zulip Wasmtime GitHub notifications bot (Oct 17 2022 at 20:33):

jlb6740 commented on issue #5060:

As for the effect size, I think that is already calculated in sightglass, so it should be doable. That said, iterating on these changes takes a while: each run is slow, and until it finishes you don't know whether you've broken something or whether the format isn't quite what you want. I don't want this to languish for a week while I attend to other things. Is it OK to do these other updates in a later iteration? I'm of course happy to close on the labels and formula here.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 17 2022 at 20:33):

cfallin commented on issue #5060:

These are averages of what we decided to call the change factor (python pandas calls it percent change). I definitely don't think we want to take the geomean of these numbers. The numbers are already a percentage (and not based on a diff) .. so I think taking the arithmetic average is appropriate there.

Ah, right: a geomean is warranted for raw runtime ratios, sorry; I had been thinking in those terms and not in fractional-change terms. (Can we call it "arithmetic mean" in the header, then, to tell the reader which it is?)
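The two summaries discussed here can disagree on the same data. A small illustration with hypothetical ratios: a +10% and a -10% change average to zero arithmetically, while the geometric mean of the corresponding ratios (1.10 and 0.90) is slightly below 1. They answer different questions, which is one reason to say in the header which mean is shown.

```python
from statistics import fmean, geometric_mean

# Hypothetical Patched_CT / Main_CT ratios for three benchmarks (1.0 = no change).
ratios = [1.10, 0.90, 1.00]

# Summary 1: geometric mean of the raw ratios, then expressed as a change.
geo = geometric_mean(ratios)

# Summary 2: arithmetic mean of the fractional changes (ratio - 1), as the bot reports.
arith = fmean([r - 1.0 for r in ratios])

print(f"geomean of ratios:               {geo - 1.0:+.3%}")
print(f"arithmetic mean of fractional Δ: {arith:+.3%}")
```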

Some Factor = (Patched_CT - Main_CT) / (Main_CT)

This makes sense to me, but multiplying "Some Factor" by 100 would break this formula.
0.003 factor is not the same as .3%.
1.003 factor is equivalent to .3%

Isn't this just the definition of fractional change? The - Main_CT / Main_CT term is effectively subtracting one (the above formula can be rearranged to (Patched_CT / Main_CT) - 1). So if this formula reports a result of 0.003, then that is a multiplicative factor of 1.003 from old to new, and that is a 0.3% shift. In other words, if I have a runtime ratio of 1.003, then the run got 0.3% slower; to go from 1.003 to 0.3%, I subtract one and multiply by 100.
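Written out, with $M$ = Main_CT and $P$ = Patched_CT, the rearrangement is:

$$
\frac{P - M}{M} = \frac{P}{M} - \frac{M}{M} = \frac{P}{M} - 1,
\qquad
\text{percent change} = 100 \times \left(\frac{P}{M} - 1\right)
$$

For example, $P/M = 1.003$ gives a fractional change of $0.003$, i.e. $+0.3\%$.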

view this post on Zulip Wasmtime GitHub notifications bot (Oct 17 2022 at 20:44):

jlb6740 commented on issue #5060:

Ok .. I may be confusing something here but this is how I see it:

Patch = Main × (1 + Factor) is what the current formula states.
So, let's say Main is 100 and Factor is -0.003; then the patch clockticks would decrease to (100 - 0.3) = 99.7.

If we take the same thing with %:
So, let's say Main is 100 and Percentage is -0.3%; then the patch clockticks would decrease to (100 - 100 × 0.3%) = 99.7.

Ok .. this is why I don't like percentages :grinning_face_with_smiling_eyes:. I think you are right. Hopefully I can easily change this factor to a percentage in pandas. I'll just make that change, as I do think it is easier to read.
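For the display change, note that pandas' `Series.pct_change()` returns a fraction (0.003), not a value already multiplied by 100, so the conversion to a percent usually happens at formatting time. A minimal stdlib sketch using Python's `%` format specifier, which multiplies by 100 and appends a percent sign (the helper name `format_pct` is hypothetical):

```python
def format_pct(fraction: float, digits: int = 3) -> str:
    """Render a fractional change such as -0.003 as a percent string such as '-0.300%'."""
    return f"{fraction:.{digits}%}"


main_ct = 100.0
for fraction in (-0.003, 0.003):
    patched_ct = main_ct * (1 + fraction)  # Patch = Main * (1 + Factor)
    print(f"{format_pct(fraction)}: {main_ct:.1f} -> {patched_ct:.1f}")
```

The same fraction drives both the clocktick arithmetic and the label; only the rendering multiplies by 100.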

view this post on Zulip Wasmtime GitHub notifications bot (Oct 17 2022 at 20:49):

jlb6740 commented on issue #5060:

/bench_x64

view this post on Zulip Wasmtime GitHub notifications bot (Oct 18 2022 at 02:32):

jlb6740 commented on issue #5060:

/bench_x64

view this post on Zulip Wasmtime GitHub notifications bot (Oct 18 2022 at 05:11):

jlb6740 commented on issue #5060:

/bench_x64

view this post on Zulip Wasmtime GitHub notifications bot (Oct 18 2022 at 05:25):

jlb6740 commented on issue #5060:

Change factor shows patch effect on x64 if merged compared to current head for main.

Results are based on clocktick (CT) event cycles. Change Factor = (Patched_CT - Main_CT) / (Main_CT)
A negative change factor means clockticks are expected to be reduced by the patch.

|wasm|arch|phase|change_factor|
|-|-|-|-|
|benchmarks/blake3-scalar|x86_64|Compilation|-2.067%|
|benchmarks/blake3-simd|x86_64|Compilation|-9.731%|
|benchmarks/bz2|x86_64|Compilation|5.358%|
|benchmarks/intgemm-simd|x86_64|Compilation|0.691%|
|benchmarks/meshoptimizer|x86_64|Compilation|-1.587%|
|benchmarks/noop|x86_64|Compilation|-1.157%|
|benchmarks/pulldown-cmark|x86_64|Compilation|-7.943%|
|benchmarks/shootout-ackermann|x86_64|Compilation|-3.513%|
|benchmarks/shootout-base64|x86_64|Compilation|4.014%|
|benchmarks/shootout-ctype|x86_64|Compilation|-1.046%|
|benchmarks/shootout-ed25519|x86_64|Compilation|0.191%|
|benchmarks/shootout-fib2|x86_64|Compilation|-1.799%|
|benchmarks/shootout-gimli|x86_64|Compilation|-1.431%|
|benchmarks/shootout-heapsort|x86_64|Compilation|2.065%|
|benchmarks/shootout-keccak|x86_64|Compilation|-2.097%|
|benchmarks/shootout-matrix|x86_64|Compilation|-5.134%|
|benchmarks/shootout-memmove|x86_64|Compilation|-2.702%|
|benchmarks/shootout-minicsv|x86_64|Compilation|-2.506%|
|benchmarks/shootout-nestedloop|x86_64|Compilation|-2.062%|
|benchmarks/shootout-random|x86_64|Compilation|-0.815%|
|benchmarks/shootout-ratelimit|x86_64|Compilation|-0.740%|
|benchmarks/shootout-seqhash|x86_64|Compilation|9.665%|
|benchmarks/shootout-sieve|x86_64|Compilation|0.483%|
|benchmarks/shootout-switch|x86_64|Compilation|-0.386%|
|benchmarks/shootout-xblabla20|x86_64|Compilation|-3.094%|
|benchmarks/shootout-xchacha20|x86_64|Compilation|-5.356%|
|benchmarks/spidermonkey|x86_64|Compilation|-0.783%|

|wasm|arch|phase|change_factor|
|-|-|-|-|
|benchmarks/blake3-scalar|x86_64|Instantiation|1.770%|
|benchmarks/blake3-simd|x86_64|Instantiation|-21.610%|
|benchmarks/bz2|x86_64|Instantiation|6.219%|
|benchmarks/intgemm-simd|x86_64|Instantiation|-2.431%|
|benchmarks/meshoptimizer|x86_64|Instantiation|-1.874%|
|benchmarks/noop|x86_64|Instantiation|-4.099%|
|benchmarks/pulldown-cmark|x86_64|Instantiation|-7.633%|
|benchmarks/shootout-ackermann|x86_64|Instantiation|0.835%|
|benchmarks/shootout-base64|x86_64|Instantiation|4.207%|
|benchmarks/shootout-ctype|x86_64|Instantiation|0.682%|
|benchmarks/shootout-ed25519|x86_64|Instantiation|-1.963%|
|benchmarks/shootout-fib2|x86_64|Instantiation|1.302%|
|benchmarks/shootout-gimli|x86_64|Instantiation|11.633%|
|benchmarks/shootout-heapsort|x86_64|Instantiation|5.706%|
|benchmarks/shootout-keccak|x86_64|Instantiation|11.342%|
|benchmarks/shootout-matrix|x86_64|Instantiation|25.500%|
|benchmarks/shootout-memmove|x86_64|Instantiation|9.481%|
|benchmarks/shootout-minicsv|x86_64|Instantiation|0.009%|
|benchmarks/shootout-nestedloop|x86_64|Instantiation|1.024%|
|benchmarks/shootout-random|x86_64|Instantiation|-2.402%|
|benchmarks/shootout-ratelimit|x86_64|Instantiation|-4.735%|
|benchmarks/shootout-seqhash|x86_64|Instantiation|1.122%|
|benchmarks/shootout-sieve|x86_64|Instantiation|3.037%|
|benchmarks/shootout-switch|x86_64|Instantiation|-2.586%|
|benchmarks/shootout-xblabla20|x86_64|Instantiation|2.925%|
|benchmarks/shootout-xchacha20|x86_64|Instantiation|5.166%|
|benchmarks/spidermonkey|x86_64|Instantiation|-2.497%|

|wasm|arch|phase|change_factor|
|-|-|-|-|
|benchmarks/blake3-scalar|x86_64|Execution|-0.293%|
|benchmarks/blake3-simd|x86_64|Execution|-18.240%|
|benchmarks/bz2|x86_64|Execution|-0.483%|
|benchmarks/intgemm-simd|x86_64|Execution|-0.143%|
|benchmarks/meshoptimizer|x86_64|Execution|-0.128%|
|benchmarks/noop|x86_64|Execution|-31.053%|
|benchmarks/pulldown-cmark|x86_64|Execution|0.558%|
|benchmarks/shootout-ackermann|x86_64|Execution|-38.342%|
|benchmarks/shootout-base64|x86_64|Execution|-0.241%|
|benchmarks/shootout-ctype|x86_64|Execution|0.106%|
|benchmarks/shootout-ed25519|x86_64|Execution|-0.239%|
|benchmarks/shootout-fib2|x86_64|Execution|-0.129%|
|benchmarks/shootout-gimli|x86_64|Execution|2.898%|
|benchmarks/shootout-heapsort|x86_64|Execution|0.041%|
|benchmarks/shootout-keccak|x86_64|Execution|0.485%|
|benchmarks/shootout-matrix|x86_64|Execution|-0.063%|
|benchmarks/shootout-memmove|x86_64|Execution|0.026%|
|benchmarks/shootout-minicsv|x86_64|Execution|-0.130%|
|benchmarks/shootout-nestedloop|x86_64|Execution|-0.345%|
|benchmarks/shootout-random|x86_64|Execution|0.014%|
|benchmarks/shootout-ratelimit|x86_64|Execution|0.223%|
|benchmarks/shootout-seqhash|x86_64|Execution|-0.153%|
|benchmarks/shootout-sieve|x86_64|Execution|0.215%|
|benchmarks/shootout-switch|x86_64|Execution|0.132%|
|benchmarks/shootout-xblabla20|x86_64|Execution|-3.346%|
|benchmarks/shootout-xchacha20|x86_64|Execution|-1.062%|
|benchmarks/spidermonkey|x86_64|Execution|0.034%|

Averages (x64):
|phase|change_factor|
|-|-|
|Compilation|-1.240%|
|Execution|-3.321%|
|Instantiation|1.486%|

view this post on Zulip Wasmtime GitHub notifications bot (Oct 18 2022 at 05:35):

jlb6740 commented on issue #5060:

/bench_x64

view this post on Zulip Wasmtime GitHub notifications bot (Oct 18 2022 at 05:42):

jlb6740 commented on issue #5060:

Change factor shows patch effect on x64 if merged compared to current head for main.

Results are based on clocktick (CT) event cycles. Change Factor = (Patched_CT - Main_CT) / (Main_CT)
A negative change factor means clockticks are expected to be reduced by the patch.

|wasm|arch|phase|change_factor|
|-|:-:|:-:|:-:|
|benchmarks/blake3-scalar|x86_64|Compilation|-1.983%|
|benchmarks/blake3-simd|x86_64|Compilation|4.863%|
|benchmarks/bz2|x86_64|Compilation|0.741%|
|benchmarks/intgemm-simd|x86_64|Compilation|0.432%|
|benchmarks/meshoptimizer|x86_64|Compilation|1.810%|
|benchmarks/noop|x86_64|Compilation|0.412%|
|benchmarks/pulldown-cmark|x86_64|Compilation|-3.803%|
|benchmarks/shootout-ackermann|x86_64|Compilation|-1.363%|
|benchmarks/shootout-base64|x86_64|Compilation|2.021%|
|benchmarks/shootout-ctype|x86_64|Compilation|5.066%|
|benchmarks/shootout-ed25519|x86_64|Compilation|-1.554%|
|benchmarks/shootout-fib2|x86_64|Compilation|1.328%|
|benchmarks/shootout-gimli|x86_64|Compilation|-3.708%|
|benchmarks/shootout-heapsort|x86_64|Compilation|-2.336%|
|benchmarks/shootout-keccak|x86_64|Compilation|-0.795%|
|benchmarks/shootout-matrix|x86_64|Compilation|-1.155%|
|benchmarks/shootout-memmove|x86_64|Compilation|6.049%|
|benchmarks/shootout-minicsv|x86_64|Compilation|0.892%|
|benchmarks/shootout-nestedloop|x86_64|Compilation|2.154%|
|benchmarks/shootout-random|x86_64|Compilation|-1.150%|
|benchmarks/shootout-ratelimit|x86_64|Compilation|-5.721%|
|benchmarks/shootout-seqhash|x86_64|Compilation|-0.942%|
|benchmarks/shootout-sieve|x86_64|Compilation|-1.220%|
|benchmarks/shootout-switch|x86_64|Compilation|-0.195%|
|benchmarks/shootout-xblabla20|x86_64|Compilation|-1.808%|
|benchmarks/shootout-xchacha20|x86_64|Compilation|-3.225%|
|benchmarks/spidermonkey|x86_64|Compilation|1.566%|

|wasm|arch|phase|change_factor|
|-|:-:|:-:|:-:|
|benchmarks/blake3-scalar|x86_64|Instantiation|14.789%|
|benchmarks/blake3-simd|x86_64|Instantiation|-6.792%|
|benchmarks/bz2|x86_64|Instantiation|7.797%|
|benchmarks/intgemm-simd|x86_64|Instantiation|54.875%|
|benchmarks/meshoptimizer|x86_64|Instantiation|19.314%|
|benchmarks/noop|x86_64|Instantiation|6.291%|
|benchmarks/pulldown-cmark|x86_64|Instantiation|-0.683%|
|benchmarks/shootout-ackermann|x86_64|Instantiation|9.086%|
|benchmarks/shootout-base64|x86_64|Instantiation|8.598%|
|benchmarks/shootout-ctype|x86_64|Instantiation|5.321%|
|benchmarks/shootout-ed25519|x86_64|Instantiation|-1.979%|
|benchmarks/shootout-fib2|x86_64|Instantiation|2.882%|
|benchmarks/shootout-gimli|x86_64|Instantiation|14.625%|
|benchmarks/shootout-heapsort|x86_64|Instantiation|6.266%|
|benchmarks/shootout-keccak|x86_64|Instantiation|-2.256%|
|benchmarks/shootout-matrix|x86_64|Instantiation|-1.146%|
|benchmarks/shootout-memmove|x86_64|Instantiation|2.591%|
|benchmarks/shootout-minicsv|x86_64|Instantiation|-15.002%|
|benchmarks/shootout-nestedloop|x86_64|Instantiation|2.356%|
|benchmarks/shootout-random|x86_64|Instantiation|-0.031%|
|benchmarks/shootout-ratelimit|x86_64|Instantiation|-0.988%|
|benchmarks/shootout-seqhash|x86_64|Instantiation|22.805%|
|benchmarks/shootout-sieve|x86_64|Instantiation|5.798%|
|benchmarks/shootout-switch|x86_64|Instantiation|5.490%|
|benchmarks/shootout-xblabla20|x86_64|Instantiation|3.263%|
|benchmarks/shootout-xchacha20|x86_64|Instantiation|4.046%|
|benchmarks/spidermonkey|x86_64|Instantiation|0.671%|

|wasm|arch|phase|change_factor|
|-|:-:|:-:|:-:|
|benchmarks/blake3-scalar|x86_64|Execution|-0.843%|
|benchmarks/blake3-simd|x86_64|Execution|0.142%|
|benchmarks/bz2|x86_64|Execution|2.205%|
|benchmarks/intgemm-simd|x86_64|Execution|-0.128%|
|benchmarks/meshoptimizer|x86_64|Execution|-0.003%|
|benchmarks/noop|x86_64|Execution|9.814%|
|benchmarks/pulldown-cmark|x86_64|Execution|-1.530%|
|benchmarks/shootout-ackermann|x86_64|Execution|4.071%|
|benchmarks/shootout-base64|x86_64|Execution|0.085%|
|benchmarks/shootout-ctype|x86_64|Execution|0.046%|
|benchmarks/shootout-ed25519|x86_64|Execution|-0.558%|
|benchmarks/shootout-fib2|x86_64|Execution|0.010%|
|benchmarks/shootout-gimli|x86_64|Execution|0.266%|
|benchmarks/shootout-heapsort|x86_64|Execution|0.027%|
|benchmarks/shootout-keccak|x86_64|Execution|-0.302%|
|benchmarks/shootout-matrix|x86_64|Execution|-0.601%|
|benchmarks/shootout-memmove|x86_64|Execution|-0.028%|
|benchmarks/shootout-minicsv|x86_64|Execution|1.168%|
|benchmarks/shootout-nestedloop|x86_64|Execution|0.272%|
|benchmarks/shootout-random|x86_64|Execution|0.189%|
|benchmarks/shootout-ratelimit|x86_64|Execution|-0.064%|
|benchmarks/shootout-seqhash|x86_64|Execution|0.254%|
|benchmarks/shootout-sieve|x86_64|Execution|0.099%|
|benchmarks/shootout-switch|x86_64|Execution|-0.192%|
|benchmarks/shootout-xblabla20|x86_64|Execution|-2.433%|
|benchmarks/shootout-xchacha20|x86_64|Execution|0.044%|
|benchmarks/spidermonkey|x86_64|Execution|-0.012%|

Averages (x64):
|phase|change_factor|
|-|:-:|
|Compilation|-0.134%|
|Execution|0.444%|
|Instantiation|6.222%|

view this post on Zulip Wasmtime GitHub notifications bot (Oct 18 2022 at 05:52):

jlb6740 commented on issue #5060:

@cfallin, OK. I think this is heading toward the format we want. Is it OK to merge this as is, and add an effect-size column (and possibly sort by it) in a follow-up patch? On sorting: I kind of like the benchmarks appearing in the same predictable alphabetical order in every table, so you don't have to search for a specific benchmark's result. Maybe we add the column but don't sort?

view this post on Zulip Wasmtime GitHub notifications bot (Oct 18 2022 at 17:27):

cfallin commented on issue #5060:

One final request (sorry!). "Change factor" makes sense to me if it's a pure ratio where 1.000 is no change; but since we're presenting only the delta (subtracting 1, or subtracting Main_CT in the numerator, or subtracting 100%, all equivalent), this is a delta, not a factor. It's also now a percent. So, ironically I guess, the original pct_change title (can we spell it out as "Percent change" though?) is actually accurate now that we've adjusted the formula. Could we go to that? Happy to approve+merge after that!

view this post on Zulip Wasmtime GitHub notifications bot (Oct 18 2022 at 18:24):

fitzgen commented on issue #5060:

Can we mark whether the change is statistically significant or not? Otherwise there is no way to know whether to trust it or not.
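One way to mark significance, sketched under stated assumptions: compute the difference of means with a confidence interval and flag the result as significant when the interval excludes zero. The normal approximation with unpooled (Welch-style) variances below is an assumption for illustration; with few samples a Student's t critical value would be more appropriate, and this is not necessarily how sightglass computes it. The sample values are made up.

```python
import math
from statistics import fmean, stdev

# Two-sided 99% critical value of the standard normal distribution (assumption:
# normal approximation; a Student's t value would be wider for small samples).
Z_99 = 2.5758


def diff_interval(main, patched, z=Z_99):
    """Return (delta, half_width) for mean(patched) - mean(main), unpooled variances."""
    delta = fmean(patched) - fmean(main)
    std_err = math.sqrt(stdev(main) ** 2 / len(main) + stdev(patched) ** 2 / len(patched))
    return delta, z * std_err


def is_significant(main, patched, z=Z_99):
    """Significant when the confidence interval for the difference excludes zero."""
    delta, half_width = diff_interval(main, patched, z)
    return abs(delta) > half_width


# Hypothetical clocktick samples.
main = [100, 101, 99, 100, 100]
patched = [110, 111, 109, 110, 110]
noisy_main = [100, 120, 80, 100, 100]
noisy_patched = [101, 119, 82, 99, 104]

for a, b in ((main, patched), (noisy_main, noisy_patched)):
    delta, hw = diff_interval(a, b)
    print(f"delta = {delta:.2f} ± {hw:.2f}, significant: {is_significant(a, b)}")
```

The noisy pair shows a 1% shift in the mean that would still appear in a change-factor table, even though the interval is far wider than the shift.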

view this post on Zulip Wasmtime GitHub notifications bot (Oct 18 2022 at 18:45):

fitzgen commented on issue #5060:

I think I've communicated this in various one-off meetings over the years, but for posterity, my ideal output would be something like:

# Benchmark Results

<details>

<summary>Methods and Configuration</summary>

* Baseline: `main` at commit a1b2c3
* Comparison: `feature-branch` at commit d4e5f6
* Significance level: 0.01
* Processes: N
* Iterations per process: M
* Engine flags: ...
* Etc...

</details>

## Statistically Significant Results

<table>
  <thead>
    <tr> <th>Wasm Input</th> <th>Architecture</th> <th>Phase</th> <th>Effect Size</th> </tr>
  </thead>
  <tbody>
    <tr> <td><code>spidermonkey.wasm</code></td> <td>x64</td> <td>Compilation</td> <td>1.03 ± 0.01</td> </tr>
    <!-- etc... sorted by largest absolute effect size --->
  </tbody>
</table>

## Statistically Insignificant Results

<details>

<summary>Statistically insignificant results; hidden by default</summary>

<!-- same type of table as above -->

</details>

The important bits being:

FWIW, we designed the sightglass-analysis crate to expose reusable functions to compute statistical significance, effect size, and confidence interval: https://github.com/bytecodealliance/sightglass/blob/main/crates/analysis/src/effect_size.rs#L16
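A sketch of the report layout described above: split results by a precomputed significance flag, sort each group by the distance of the effect size from 1.0 (one reading of "largest absolute effect size", assuming the effect is a runtime ratio), and emit the HTML tables. The input dictionaries and their keys are hypothetical; in practice the effect size and interval would come from a statistical step such as the sightglass-analysis functions linked above.

```python
from html import escape

HEADER = (
    "    <tr> <th>Wasm Input</th> <th>Architecture</th> "
    "<th>Phase</th> <th>Effect Size</th> </tr>"
)


def render_table(results):
    """HTML table of results, largest |effect - 1.0| first (effect is a runtime ratio)."""
    ordered = sorted(results, key=lambda r: abs(r["effect"] - 1.0), reverse=True)
    lines = ["<table>", "  <thead>", HEADER, "  </thead>", "  <tbody>"]
    for r in ordered:
        lines.append(
            f"    <tr> <td><code>{escape(r['wasm'])}</code></td> "
            f"<td>{escape(r['arch'])}</td> <td>{escape(r['phase'])}</td> "
            f"<td>{r['effect']:.2f} ± {r['ci']:.2f}</td> </tr>"
        )
    lines += ["  </tbody>", "</table>"]
    return "\n".join(lines)


def render_report(results):
    """Significant results up front; insignificant ones collapsed in <details>."""
    significant = [r for r in results if r["significant"]]
    insignificant = [r for r in results if not r["significant"]]
    return "\n".join([
        "## Statistically Significant Results",
        "",
        render_table(significant),
        "",
        "## Statistically Insignificant Results",
        "",
        "<details>",
        "",
        "<summary>Statistically insignificant results; hidden by default</summary>",
        "",
        render_table(insignificant),
        "",
        "</details>",
    ])


# Hypothetical inputs; real values would come from a statistical analysis step.
results = [
    {"wasm": "bz2.wasm", "arch": "x64", "phase": "Execution",
     "effect": 1.01, "ci": 0.004, "significant": True},
    {"wasm": "spidermonkey.wasm", "arch": "x64", "phase": "Compilation",
     "effect": 0.95, "ci": 0.01, "significant": True},
    {"wasm": "noop.wasm", "arch": "x64", "phase": "Instantiation",
     "effect": 1.02, "ci": 0.05, "significant": False},
]
report = render_report(results)
print(report)
```

Using `r["wasm"]` as the sort key instead keeps the fixed alphabetical order jlb6740 preferred; the effect-size column can be added either way.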

view this post on Zulip Wasmtime GitHub notifications bot (Oct 24 2022 at 03:21):

jlb6740 commented on issue #5060:

/bench_x64

view this post on Zulip Wasmtime GitHub notifications bot (Oct 24 2022 at 03:28):

jlb6740 commented on issue #5060:

Change factor shows patch effect on x64 if merged compared to current head for main.

Results are based on clocktick (CT) event cycles. Change Factor = (Patched_CT - Main_CT) / (Main_CT)
A negative change factor means clockticks are expected to be reduced by the patch.

|wasm|arch|phase|change_factor|
|-|:-:|:-:|:-:|
|benchmarks/blake3-scalar|x86_64|Compilation|-4.631%|
|benchmarks/blake3-simd|x86_64|Compilation|2.186%|
|benchmarks/bz2|x86_64|Compilation|-0.093%|
|benchmarks/intgemm-simd|x86_64|Compilation|-1.441%|
|benchmarks/meshoptimizer|x86_64|Compilation|-0.102%|
|benchmarks/noop|x86_64|Compilation|-1.682%|
|benchmarks/pulldown-cmark|x86_64|Compilation|2.070%|
|benchmarks/shootout-ackermann|x86_64|Compilation|-4.602%|
|benchmarks/shootout-base64|x86_64|Compilation|-1.885%|
|benchmarks/shootout-ctype|x86_64|Compilation|-2.564%|
|benchmarks/shootout-ed25519|x86_64|Compilation|-3.418%|
|benchmarks/shootout-fib2|x86_64|Compilation|-1.847%|
|benchmarks/shootout-gimli|x86_64|Compilation|-3.495%|
|benchmarks/shootout-heapsort|x86_64|Compilation|6.073%|
|benchmarks/shootout-keccak|x86_64|Compilation|0.739%|
|benchmarks/shootout-matrix|x86_64|Compilation|-8.418%|
|benchmarks/shootout-memmove|x86_64|Compilation|-4.909%|
|benchmarks/shootout-minicsv|x86_64|Compilation|0.607%|
|benchmarks/shootout-nestedloop|x86_64|Compilation|-2.425%|
|benchmarks/shootout-random|x86_64|Compilation|-2.287%|
|benchmarks/shootout-ratelimit|x86_64|Compilation|3.886%|
|benchmarks/shootout-seqhash|x86_64|Compilation|-2.022%|
|benchmarks/shootout-sieve|x86_64|Compilation|-1.459%|
|benchmarks/shootout-switch|x86_64|Compilation|-4.815%|
|benchmarks/shootout-xblabla20|x86_64|Compilation|3.422%|
|benchmarks/shootout-xchacha20|x86_64|Compilation|-3.053%|
|benchmarks/spidermonkey|x86_64|Compilation|-0.316%|

|wasm|arch|phase|change_factor|
|-|:-:|:-:|:-:|
|benchmarks/blake3-scalar|x86_64|Instantiation|-30.403%|
|benchmarks/blake3-simd|x86_64|Instantiation|-3.762%|
|benchmarks/bz2|x86_64|Instantiation|7.500%|
|benchmarks/intgemm-simd|x86_64|Instantiation|-1.067%|
|benchmarks/meshoptimizer|x86_64|Instantiation|-2.640%|
|benchmarks/noop|x86_64|Instantiation|-7.303%|
|benchmarks/pulldown-cmark|x86_64|Instantiation|7.822%|
|benchmarks/shootout-ackermann|x86_64|Instantiation|1.992%|
|benchmarks/shootout-base64|x86_64|Instantiation|7.792%|
|benchmarks/shootout-ctype|x86_64|Instantiation|2.504%|
|benchmarks/shootout-ed25519|x86_64|Instantiation|-4.106%|
|benchmarks/shootout-fib2|x86_64|Instantiation|0.998%|
|benchmarks/shootout-gimli|x86_64|Instantiation|-0.532%|
|benchmarks/shootout-heapsort|x86_64|Instantiation|2.495%|
|benchmarks/shootout-keccak|x86_64|Instantiation|-0.245%|
|benchmarks/shootout-matrix|x86_64|Instantiation|3.993%|
|benchmarks/shootout-memmove|x86_64|Instantiation|-10.591%|
|benchmarks/shootout-minicsv|x86_64|Instantiation|-7.414%|
|benchmarks/shootout-nestedloop|x86_64|Instantiation|2.388%|
|benchmarks/shootout-random|x86_64|Instantiation|-2.975%|
|benchmarks/shootout-ratelimit|x86_64|Instantiation|4.045%|
|benchmarks/shootout-seqhash|x86_64|Instantiation|6.400%|
|benchmarks/shootout-sieve|x86_64|Instantiation|-0.312%|
|benchmarks/shootout-switch|x86_64|Instantiation|-0.480%|
|benchmarks/shootout-xblabla20|x86_64|Instantiation|-2.197%|
|benchmarks/shootout-xchacha20|x86_64|Instantiation|-12.450%|
|benchmarks/spidermonkey|x86_64|Instantiation|1.493%|

|wasm|arch|phase|change_factor|
|-|:-:|:-:|:-:|
|benchmarks/blake3-scalar|x86_64|Execution|0.093%|
|benchmarks/blake3-simd|x86_64|Execution|-0.428%|
|benchmarks/bz2|x86_64|Execution|1.562%|
|benchmarks/intgemm-simd|x86_64|Execution|0.053%|
|benchmarks/meshoptimizer|x86_64|Execution|0.030%|
|benchmarks/noop|x86_64|Execution|4.042%|
|benchmarks/pulldown-cmark|x86_64|Execution|0.612%|
|benchmarks/shootout-ackermann|x86_64|Execution|18.860%|
|benchmarks/shootout-base64|x86_64|Execution|0.292%|
|benchmarks/shootout-ctype|x86_64|Execution|-0.073%|
|benchmarks/shootout-ed25519|x86_64|Execution|0.082%|
|benchmarks/shootout-fib2|x86_64|Execution|-0.094%|
|benchmarks/shootout-gimli|x86_64|Execution|2.252%|
|benchmarks/shootout-heapsort|x86_64|Execution|-0.083%|
|benchmarks/shootout-keccak|x86_64|Execution|-1.394%|
|benchmarks/shootout-matrix|x86_64|Execution|0.033%|
|benchmarks/shootout-memmove|x86_64|Execution|-0.033%|
|benchmarks/shootout-minicsv|x86_64|Execution|-0.293%|
|benchmarks/shootout-nestedloop|x86_64|Execution|0.112%|
|benchmarks/shootout-random|x86_64|Execution|-0.454%|
|benchmarks/shootout-ratelimit|x86_64|Execution|-0.319%|
|benchmarks/shootout-seqhash|x86_64|Execution|-5.423%|
|benchmarks/shootout-sieve|x86_64|Execution|-0.227%|
|benchmarks/shootout-switch|x86_64|Execution|-0.083%|
|benchmarks/shootout-xblabla20|x86_64|Execution|2.013%|
|benchmarks/shootout-xchacha20|x86_64|Execution|-3.499%|
|benchmarks/spidermonkey|x86_64|Execution|0.748%|

Averages (x64):
|phase|change_factor|
|-|:-:|
|Compilation|-1.351%|
|Execution|0.681%|
|Instantiation|-1.372%|

view this post on Zulip Wasmtime GitHub notifications bot (Oct 24 2022 at 20:40):

jlb6740 commented on issue #5060:

/bench_x64

view this post on Zulip Wasmtime GitHub notifications bot (Oct 24 2022 at 20:46):

jlb6740 commented on issue #5060:

Change factor shows patch effect on x64 if merged compared to current head for main.

Results are based on clocktick (CT) event cycles. Change Factor = (Patched_CT - Main_CT) / (Main_CT)
A negative change factor means clockticks are expected to be reduced by the patch.

|wasm|arch|phase|change_factor|
|-|:-:|:-:|:-:|
|benchmarks/blake3-scalar|x86_64|Compilation|-2.264%|
|benchmarks/blake3-simd|x86_64|Compilation|-0.952%|
|benchmarks/bz2|x86_64|Compilation|-1.169%|
|benchmarks/intgemm-simd|x86_64|Compilation|-0.073%|
|benchmarks/meshoptimizer|x86_64|Compilation|-0.470%|
|benchmarks/noop|x86_64|Compilation|1.484%|
|benchmarks/pulldown-cmark|x86_64|Compilation|2.920%|
|benchmarks/shootout-ackermann|x86_64|Compilation|-1.525%|
|benchmarks/shootout-base64|x86_64|Compilation|3.301%|
|benchmarks/shootout-ctype|x86_64|Compilation|-2.706%|
|benchmarks/shootout-ed25519|x86_64|Compilation|1.641%|
|benchmarks/shootout-fib2|x86_64|Compilation|-0.268%|
|benchmarks/shootout-gimli|x86_64|Compilation|6.551%|
|benchmarks/shootout-heapsort|x86_64|Compilation|1.015%|
|benchmarks/shootout-keccak|x86_64|Compilation|-1.059%|
|benchmarks/shootout-matrix|x86_64|Compilation|1.295%|
|benchmarks/shootout-memmove|x86_64|Compilation|-5.877%|
|benchmarks/shootout-minicsv|x86_64|Compilation|1.779%|
|benchmarks/shootout-nestedloop|x86_64|Compilation|0.122%|
|benchmarks/shootout-random|x86_64|Compilation|-0.916%|
|benchmarks/shootout-ratelimit|x86_64|Compilation|2.141%|
|benchmarks/shootout-seqhash|x86_64|Compilation|1.004%|
|benchmarks/shootout-sieve|x86_64|Compilation|0.260%|
|benchmarks/shootout-switch|x86_64|Compilation|3.673%|
|benchmarks/shootout-xblabla20|x86_64|Compilation|-2.125%|
|benchmarks/shootout-xchacha20|x86_64|Compilation|0.693%|
|benchmarks/spidermonkey|x86_64|Compilation|-0.094%|

|wasm|arch|phase|change_factor|
|-|:-:|:-:|:-:|
|benchmarks/blake3-scalar|x86_64|Instantiation|-8.503%|
|benchmarks/blake3-simd|x86_64|Instantiation|11.889%|
|benchmarks/bz2|x86_64|Instantiation|10.633%|
|benchmarks/intgemm-simd|x86_64|Instantiation|-20.765%|
|benchmarks/meshoptimizer|x86_64|Instantiation|15.852%|
|benchmarks/noop|x86_64|Instantiation|3.321%|
|benchmarks/pulldown-cmark|x86_64|Instantiation|-3.284%|
|benchmarks/shootout-ackermann|x86_64|Instantiation|-0.097%|
|benchmarks/shootout-base64|x86_64|Instantiation|-1.808%|
|benchmarks/shootout-ctype|x86_64|Instantiation|-15.002%|
|benchmarks/shootout-ed25519|x86_64|Instantiation|0.630%|
|benchmarks/shootout-fib2|x86_64|Instantiation|-0.610%|
|benchmarks/shootout-gimli|x86_64|Instantiation|15.031%|
|benchmarks/shootout-heapsort|x86_64|Instantiation|25.119%|
|benchmarks/shootout-keccak|x86_64|Instantiation|5.948%|
|benchmarks/shootout-matrix|x86_64|Instantiation|5.849%|
|benchmarks/shootout-memmove|x86_64|Instantiation|2.700%|
|benchmarks/shootout-minicsv|x86_64|Instantiation|-25.233%|
|benchmarks/shootout-nestedloop|x86_64|Instantiation|3.077%|
|benchmarks/shootout-random|x86_64|Instantiation|1.315%|
|benchmarks/shootout-ratelimit|x86_64|Instantiation|0.327%|
|benchmarks/shootout-seqhash|x86_64|Instantiation|-4.867%|
|benchmarks/shootout-sieve|x86_64|Instantiation|3.642%|
|benchmarks/shootout-switch|x86_64|Instantiation|-0.317%|
|benchmarks/shootout-xblabla20|x86_64|Instantiation|3.272%|
|benchmarks/shootout-xchacha20|x86_64|Instantiation|11.555%|
|benchmarks/spidermonkey|x86_64|Instantiation|-2.845%|

|wasm|arch|phase|change_factor|
|-|:-:|:-:|:-:|
|benchmarks/blake3-scalar|x86_64|Execution|-1.544%|
|benchmarks/blake3-simd|x86_64|Execution|-3.132%|
|benchmarks/bz2|x86_64|Execution|2.717%|
|benchmarks/intgemm-simd|x86_64|Execution|0.179%|
|benchmarks/meshoptimizer|x86_64|Execution|0.007%|
|benchmarks/noop|x86_64|Execution|-4.482%|
|benchmarks/pulldown-cmark|x86_64|Execution|-0.157%|
|benchmarks/shootout-ackermann|x86_64|Execution|-19.695%|
|benchmarks/shootout-base64|x86_64|Execution|-0.069%|
|benchmarks/shootout-ctype|x86_64|Execution|0.295%|
|benchmarks/shootout-ed25519|x86_64|Execution|0.082%|
|benchmarks/shootout-fib2|x86_64|Execution|0.060%|
|benchmarks/shootout-gimli|x86_64|Execution|-0.897%|
|benchmarks/shootout-heapsort|x86_64|Execution|0.105%|
|benchmarks/shootout-keccak|x86_64|Execution|-0.574%|
|benchmarks/shootout-matrix|x86_64|Execution|-0.213%|
|benchmarks/shootout-memmove|x86_64|Execution|0.211%|
|benchmarks/shootout-minicsv|x86_64|Execution|0.063%|
|benchmarks/shootout-nestedloop|x86_64|Execution|-0.075%|
|benchmarks/shootout-random|x86_64|Execution|0.101%|
|benchmarks/shootout-ratelimit|x86_64|Execution|0.270%|
|benchmarks/shootout-seqhash|x86_64|Execution|0.039%|
|benchmarks/shootout-sieve|x86_64|Execution|0.001%|
|benchmarks/shootout-switch|x86_64|Execution|0.447%|
|benchmarks/shootout-xblabla20|x86_64|Execution|1.642%|
|benchmarks/shootout-xchacha20|x86_64|Execution|1.066%|
|benchmarks/spidermonkey|x86_64|Execution|0.598%|

Averages (x64):
|phase|change_factor|
|-|:-:|
|Compilation|0.310%|
|Execution|-0.850%|
|Instantiation|1.364%|

view this post on Zulip Wasmtime GitHub notifications bot (Oct 24 2022 at 21:48):

jlb6740 commented on issue #5060:

/bench_x64

view this post on Zulip Wasmtime GitHub notifications bot (Oct 24 2022 at 22:36):

jlb6740 commented on issue #5060:

execution :: cycles :: benchmarks/bz2/benchmark.wasm

Δ = 1513143.28 ± 1231561.44 (confidence = 99%)

main.so is 1.00x to 1.02x faster than commit.so!

[129798904 131817054.80 138038694] commit.so
[127515208 130303911.52 132506288] main.so

instantiation :: cycles :: benchmarks/pulldown-cmark/benchmark.wasm

No difference in performance.

[170034 196676.08 218436] commit.so
[173046 202642.40 373914] main.so

instantiation :: cycles :: benchmarks/bz2/benchmark.wasm

No difference in performance.

[101652 121869.28 137242] commit.so
[106258 118855.28 133740] main.so

compilation :: cycles :: benchmarks/pulldown-cmark/benchmark.wasm

No difference in performance.

[309698748 346607655.52 374942674] commit.so
[301942394 349979492.32 394267312] main.so

compilation :: cycles :: benchmarks/bz2/benchmark.wasm

No difference in performance.

[215154062 234460165.44 267407768] commit.so
[217859454 236701668.96 272852560] main.so

execution :: cycles :: benchmarks/pulldown-cmark/benchmark.wasm

No difference in performance.

[9344088 9469022.72 9651890] commit.so
[9343350 9504960.00 9678308] main.so

execution :: cycles :: benchmarks/spidermonkey/benchmark.wasm

No difference in performance.

[1090353462 1104235121.84 1117828124] commit.so
[1091742374 1107584374.72 1126788480] main.so

compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm

No difference in performance.

[7228229446 7366056318.80 7478912292] commit.so
[7210593826 7359829007.52 7506138438] main.so

instantiation :: cycles :: benchmarks/spidermonkey/benchmark.wasm

No difference in performance.

[543920 582452.72 736118] commit.so
[548450 582405.60 752954] main.so
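In the bz2 execution result of the comment above, Δ equals the difference between the middle bracketed values (131817054.80 - 130303911.52 = 1513143.28), which suggests each bracket lists [min mean max] cycles per engine. The "1.00x to 1.02x faster" range is also consistent with taking the ratio of the two means and widening it by the ± half-width divided by the faster mean. The sketch below reproduces the printed numbers under that assumption. It is a reading of the output, not sightglass's actual code, and `speedup_range` is a hypothetical name:

```python
def speedup_range(slow_mean: float, fast_mean: float,
                  half_width: float) -> tuple[float, float, float]:
    """Return (delta, low, high) for "fast is low x to high x faster than slow".

    Assumption: delta is the difference of the two means, and the speedup
    interval is the mean ratio plus/minus the confidence half-width scaled
    by the faster engine's mean.
    """
    delta = slow_mean - fast_mean
    ratio = slow_mean / fast_mean
    spread = half_width / fast_mean
    return delta, ratio - spread, ratio + spread


if __name__ == "__main__":
    # Means and the +/- half-width from the bz2 execution result above.
    commit_mean = 131817054.80  # commit.so
    main_mean = 130303911.52    # main.so
    delta, low, high = speedup_range(commit_mean, main_mean, 1231561.44)
    print(f"delta = {delta:.2f}")
    print(f"main.so is {low:.2f}x to {high:.2f}x faster than commit.so")
```

The same reading explains the "No difference in performance." lines: when the interval around Δ includes zero, no speedup is reported.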

view this post on Zulip Wasmtime GitHub notifications bot (Oct 27 2022 at 22:53):

jlb6740 commented on issue #5060:

/bench_x64

view this post on Zulip Wasmtime GitHub notifications bot (Oct 27 2022 at 23:35):

jlb6740 commented on issue #5060:

instantiation :: cycles :: benchmarks/bz2/benchmark.wasm

No difference in performance.

[103568 119207.92 135522] commit.so
[106822 126276.88 272912] main.so

instantiation :: cycles :: benchmarks/pulldown-cmark/benchmark.wasm

No difference in performance.

[170928 190962.88 212534] commit.so
[173412 200371.76 304700] main.so

instantiation :: cycles :: benchmarks/spidermonkey/benchmark.wasm

No difference in performance.

[542924 569039.60 605244] commit.so
[539128 581279.04 757404] main.so

compilation :: cycles :: benchmarks/pulldown-cmark/benchmark.wasm

No difference in performance.

[318068526 344151444.88 407384632] commit.so
[306264764 350683370.00 383642122] main.so

compilation :: cycles :: benchmarks/bz2/benchmark.wasm

No difference in performance.

[215556738 235741942.72 275455752] commit.so
[217886770 233020488.80 280170694] main.so

execution :: cycles :: benchmarks/bz2/benchmark.wasm

No difference in performance.

[127738454 131527589.52 137189426] commit.so
[127496598 130751974.40 133060306] main.so

execution :: cycles :: benchmarks/spidermonkey/benchmark.wasm

No difference in performance.

[1091080444 1106099439.76 1164561264] commit.so
[1089452668 1099986181.04 1121721354] main.so

compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm

No difference in performance.

[7177863870 7295968763.20 7438418366] commit.so
[7180327178 7320723278.88 7475606826] main.so

execution :: cycles :: benchmarks/pulldown-cmark/benchmark.wasm

No difference in performance.

[9359830 9498456.32 9801272] commit.so
[9336598 9483072.80 9601184] main.so

view this post on Zulip Wasmtime GitHub notifications bot (Oct 27 2022 at 23:51):

jlb6740 commented on issue #5060:

Hi @cfallin @fitzgen, instead of having the current table, how about we just highlight a few of the default benchmarks and print the default message? As has already been suggested, I think any formatting of the output printed in the message should be done in sightglass itself, so I can work on GitHub markdown formatting there. Also, this patch increases the number of iterations and parallel processes to help stabilize results, which allows us to close https://github.com/bytecodealliance/wasmtime/pull/5064
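More iterations and processes help stabilize results for a statistical reason: for a fixed run-to-run spread, the half-width of a confidence interval around a mean shrinks with the square root of the sample count, so quadrupling the samples roughly halves the interval and a real Δ becomes more likely to clear it. A minimal sketch using a normal approximation (z ≈ 2.576 for two-sided 99% confidence); the standard deviation and sample counts are illustrative values, not taken from the benchmarking CI configuration, and sightglass's own statistics may differ:

```python
import math

# Two-sided 99% critical value of the standard normal distribution.
Z_99 = 2.576


def half_width(stddev: float, samples: int, z: float = Z_99) -> float:
    """Approximate confidence-interval half-width for a sample mean."""
    return z * stddev / math.sqrt(samples)


if __name__ == "__main__":
    stddev = 2_000_000.0  # illustrative per-run spread, in cycles
    for n in (10, 40, 160):
        print(f"n={n:4d}  +/- {half_width(stddev, n):,.0f} cycles")
```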

view this post on Zulip Wasmtime GitHub notifications bot (Oct 31 2022 at 20:30):

fitzgen commented on issue #5060:

Instead of having the current table, how about we just highlight a few of the default benchmarks and print the default message?

Happy with using the default output (and, separately, with the sightglass tool growing a "markdown" output format for the GitHub action).
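As an illustration of what such a "markdown" output format might emit, here is a hypothetical helper that renders rows as a GitHub-flavored markdown table with the same columns and separator row (`|-|:-:|:-:|:-:|`) as the earlier change-factor tables. It is not part of sightglass; the function name and row shape are assumptions:

```python
def to_markdown_table(headers: list[str], rows: list[list[str]]) -> str:
    """Render rows as a GitHub-flavored markdown table (hypothetical helper)."""
    lines = ["| " + " | ".join(headers) + " |"]
    # Same separator style as the original tables: default alignment for the
    # first column, centered alignment for the remaining columns.
    aligns = ["-"] + [":-:"] * (len(headers) - 1)
    lines.append("|" + "|".join(aligns) + "|")
    for row in rows:
        if len(row) != len(headers):
            raise ValueError(f"expected {len(headers)} cells, got {len(row)}")
        lines.append("| " + " | ".join(row) + " |")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_markdown_table(
        ["wasm", "arch", "phase", "change_factor"],
        [["benchmarks/bz2", "x86_64", "Execution", "2.717%"],
         ["benchmarks/noop", "x86_64", "Execution", "-4.482%"]],
    ))
```

Emitting real newlines (rather than pre-encoded `%0A` sequences) leaves any URL encoding to whatever posts the comment, which avoids the kind of encoded residue seen in some of the results above.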

Not sure what you mean about "just highlight a few of the default benchmarks". Do you mean just run the default.suite set of benchmarks? If so, fine by me.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 01 2022 at 18:30):

cfallin commented on issue #5060:

Will wait for @cfallin approval

Ah, sorry, didn't realize you were waiting for me here too. Yes, it seems fine to me.

One thing that I just realized is that the results come as a comment via your personal GitHub account. I think that we should change that -- we shouldn't have a dependence on one person's account (it's liable to break if you change or delete your account, it's problematic if one day you aren't working on Wasmtime/Cranelift any more, etc). We don't have to do it in this PR but would you be able to create a dedicated bot account for posting these comments, and give the infra a token for that instead (and somehow share the appropriate details with various folks across BA so we always have access)?

view this post on Zulip Wasmtime GitHub notifications bot (Nov 01 2022 at 19:28):

jlb6740 edited a comment on issue #5060:

Will wait for @cfallin approval

Ah, sorry, didn't realize you were waiting for me here too. Yes, it seems fine to me.

@cfallin, it says it is still waiting for the +1. Will use a bot account to send results back. May need you to create that account here, though; not sure I have permission.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 03 2022 at 20:53):

jlb6740 edited a comment on issue #5060:

Will wait for @cfallin approval

Ah, sorry, didn't realize you were waiting for me here too. Yes, it seems fine to me.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 03 2022 at 20:55):

cfallin edited a comment on issue #5060:

Will wait for @cfallin approval

Ah, sorry, didn't realize you were waiting for me here too. Yes, it seems fine to me.

One thing that I just realized is that the results come as a comment via your personal GitHub account. I think that we should change that -- we shouldn't have a dependence on one person's account (it's liable to break if you change or delete your account, it's problematic if one day you aren't working on Wasmtime/Cranelift any more, etc). We don't have to do it in this PR but would you be able to create a dedicated bot account for posting these comments, and give the infra a token for that instead (and somehow share the appropriate details with various folks across BA so we always have access)?


Last updated: Dec 23 2024 at 12:05 UTC