Stream: git-wasmtime

Topic: wasmtime / PR #4619 x64: Migrate selectif and selectif_sp...


view this post on Zulip Wasmtime GitHub notifications bot (Aug 04 2022 at 22:21):

elliottt edited PR #4619 from trevor/x64-isle-select to main:

<!--

Please ensure that the following steps are all taken care of before submitting
the PR.

Please ensure all communication adheres to the code of conduct.
-->

view this post on Zulip Wasmtime GitHub notifications bot (Aug 04 2022 at 22:24):

elliottt edited PR #4619 from trevor/x64-isle-select to main:

Lower selectif and selectif_spectre_guard in ISLE, which allows a lot of code related to emit_cmp to be removed from the x64 lower.rs module.

<!--

Please ensure that the following steps are all taken care of before submitting
the PR.

Please ensure all communication adheres to the code of conduct.
-->

view this post on Zulip Wasmtime GitHub notifications bot (Aug 04 2022 at 22:36):

elliottt has marked PR #4619 as ready for review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 04 2022 at 22:52):

elliottt submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 04 2022 at 22:52):

elliottt created PR review comment:

This might not be the most reusable signature, should I rework this to take the type as an argument and ValueRegs arguments instead of Value?

view this post on Zulip Wasmtime GitHub notifications bot (Aug 04 2022 at 23:17):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 04 2022 at 23:17):

cfallin created PR review comment:

That might be slightly more idiomatic for the "inst helpers" layer, though I don't think it's too important either way; strong types mean we can always refactor easily and relatively safely...

view this post on Zulip Wasmtime GitHub notifications bot (Aug 04 2022 at 23:17):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 04 2022 at 23:17):

cfallin created PR review comment:

Here is I think a good reason to force the inputs into registers: otherwise the heap legalization is going to have a load in the hotpath.

Although, after saying that, I am a bit curious whether the reduced register pressure (no rdx needed here anymore) has an effect in the opposite direction... if it's not too much trouble, would you be willing to run a quick set of Sightglass benchmarks (on at least say bz2, spidermonkey, pulldown-cmark)?

view this post on Zulip Wasmtime GitHub notifications bot (Aug 04 2022 at 23:51):

elliottt updated PR #4619 from trevor/x64-isle-select to main.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 04 2022 at 23:54):

elliottt created PR review comment:

Sure! I have a fix that forces the value to registers now so this PR shouldn't change performance, but I'll run the sightglass benchmarks without that fix just to see what the difference is.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 04 2022 at 23:54):

elliottt submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 05 2022 at 02:09):

elliottt submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 05 2022 at 02:09):

elliottt created PR review comment:

Here are the results, branch.so is loading a 0 from memory, main.so is with xor r, r, r: https://gist.github.com/elliottt/67be59da8bbdc3a9f946837efc81dedf

Overall, it looks like using xor r, r, r is still a win, so I'll merge the version of this PR that preserves that behavior :+1:

view this post on Zulip Wasmtime GitHub notifications bot (Aug 05 2022 at 02:14):

elliottt edited PR review comment.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 05 2022 at 06:09):

elliottt created PR review comment:

My naming of the two outputs was bothering me, so I ran another set of benchmarks with the sha of the commit as the name for the shared object. Here are the results: https://gist.github.com/elliottt/cd2ad5c97d53c57a315b76b4b674af46.

25c436f4480b5e20f832eab4d165734692c298a0 is the head of this branch, loading 0 via xor
643f53789890f2a776839fe4f5a662d0ec7f891b is the previous commit that would load 0 via memory

view this post on Zulip Wasmtime GitHub notifications bot (Aug 05 2022 at 06:09):

elliottt submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 05 2022 at 06:13):

elliottt edited PR review comment.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 05 2022 at 16:36):

elliottt merged PR #4619.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 05 2022 at 16:43):

cfallin created PR review comment:

A bit late but I realized this morning that we probably aren't actually benchmarking the thing with the default Sightglass config: it should be using static memories with a large enough guard region (4+2GiB by default I think?) that bounds checks don't actually happen. We would need to configure it to use dynamic bounds to see a delta; we may just be reading into noise here.

If you're up for it, --static-memory-maximum-size 0 --static-memory-guard-size 0 will I think give a completely dynamic (bounds-checked) configuration; running the two different wasmtimes and timing them (with e.g. hyperfine, running a cwasm with --allow-precompiled) is probably easier than hacking the default config to make Sightglass use it.

I still remain a little skeptical that loads of 0 from memory are faster, but I'd love to be proven wrong...

view this post on Zulip Wasmtime GitHub notifications bot (Aug 05 2022 at 16:43):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 05 2022 at 16:59):

elliottt submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 05 2022 at 16:59):

elliottt created PR review comment:

Happy to run some more tests in the background -- I've still got the shared objects hanging around. I figured it was better to merge the version that didn't introduce the loads, as it would be pretty easy to revert the last commit on this PR if we decided that it was worthwhile :+1:

view this post on Zulip Wasmtime GitHub notifications bot (Aug 05 2022 at 17:37):

elliottt submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 05 2022 at 17:37):

elliottt created PR review comment:

The results for bz2 are pretty interesting: initialization really takes a hit from the load.

instantiation :: nanoseconds :: benchmarks/bz2/benchmark.wasm

  Δ = 26233.52 ± 5394.98 (confidence = 99%)

  25c436f4480b5e20f832eab4d165734692c298a0.so is 1.43x to 1.65x faster than 643f53789890f2a776839fe4f5a662d0ec7f891b.so!
  643f53789890f2a776839fe4f5a662d0ec7f891b.so is 0.58x to 0.72x faster than 25c436f4480b5e20f832eab4d165734692c298a0.so!

  [43591 48321.01 90806] 25c436f4480b5e20f832eab4d165734692c298a0.so
  [43783 74554.53 122088] 643f53789890f2a776839fe4f5a662d0ec7f891b.so

instantiation :: cycles :: benchmarks/bz2/benchmark.wasm

  Δ = 94444.02 ± 19422.58 (confidence = 99%)

  25c436f4480b5e20f832eab4d165734692c298a0.so is 1.43x to 1.65x faster than 643f53789890f2a776839fe4f5a662d0ec7f891b.so!
  643f53789890f2a776839fe4f5a662d0ec7f891b.so is 0.58x to 0.72x faster than 25c436f4480b5e20f832eab4d165734692c298a0.so!

  [156936 173963.32 326914] 25c436f4480b5e20f832eab4d165734692c298a0.so
  [157626 268407.34 439534] 643f53789890f2a776839fe4f5a662d0ec7f891b.so

execution :: nanoseconds :: benchmarks/bz2/benchmark.wasm

  Δ = 269493.01 ± 178023.21 (confidence = 99%)

  25c436f4480b5e20f832eab4d165734692c298a0.so is 0.99x to 1.00x faster than 643f53789890f2a776839fe4f5a662d0ec7f891b.so!
  643f53789890f2a776839fe4f5a662d0ec7f891b.so is 1.00x to 1.01x faster than 25c436f4480b5e20f832eab4d165734692c298a0.so!

  [30236428 30873377.89 31490866] 25c436f4480b5e20f832eab4d165734692c298a0.so
  [30184192 30603884.88 31578144] 643f53789890f2a776839fe4f5a662d0ec7f891b.so

execution :: cycles :: benchmarks/bz2/benchmark.wasm

  Δ = 970207.46 ± 640905.23 (confidence = 99%)

  25c436f4480b5e20f832eab4d165734692c298a0.so is 0.99x to 1.00x faster than 643f53789890f2a776839fe4f5a662d0ec7f891b.so!
  643f53789890f2a776839fe4f5a662d0ec7f891b.so is 1.00x to 1.01x faster than 25c436f4480b5e20f832eab4d165734692c298a0.so!

  [108854820 111147916.48 113370948] 25c436f4480b5e20f832eab4d165734692c298a0.so
  [108666764 110177709.02 113685162] 643f53789890f2a776839fe4f5a662d0ec7f891b.so

compilation :: cycles :: benchmarks/bz2/benchmark.wasm

  No difference in performance.

  [235950166 248161159.74 266311074] 25c436f4480b5e20f832eab4d165734692c298a0.so
  [237357286 247903675.98 301295484] 643f53789890f2a776839fe4f5a662d0ec7f891b.so

compilation :: nanoseconds :: benchmarks/bz2/benchmark.wasm

  No difference in performance.

  [65539498 68931326.53 73972799] 25c436f4480b5e20f832eab4d165734692c298a0.so
  [65930352 68859805.71 83690362] 643f53789890f2a776839fe4f5a662d0ec7f891b.so

As a note about how I set these flags: I modified crates/bench-api/src/lib.rs to initialize both of the fields @cfallin mentioned to 0 so that I could continue using sightglass for generating the benchmarks.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 05 2022 at 17:40):

cfallin created PR review comment:

Interesting, so load-from-zero is still 0-1% faster, if I'm reading this right? Great if so; perhaps the register pressure is the deciding factor!

view this post on Zulip Wasmtime GitHub notifications bot (Aug 05 2022 at 17:40):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 05 2022 at 17:44):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 05 2022 at 17:44):

cfallin created PR review comment:

Actually after discussing offline with @elliottt it seems that what's on main now is the xor-based approach; the gap is small enough and xor is otherwise the idiomatic way to zero a register (optimized as such by microarchitectures etc) that I think I'd prefer to keep that approach. Thanks for running the benchmarks here!


Last updated: Jan 24 2025 at 00:11 UTC