wasmtime / issue #4555 Cranelift AArch64: Harden the Spec... · git-wasmtime · Zulip Chat Archive

Stream: git-wasmtime

Topic: wasmtime / issue #4555 Cranelift AArch64: Harden the Spec...

Wasmtime GitHub notifications bot (Jul 29 2022 at 16:54):

akirilov-arm commented on issue #4555:

... this should not result in a large pipeline bubble (hence large performance penalty) on current microarchitectures, is that right?

Yes, microarchitectures that do not perform any value speculation should treat it as NOP.

... would you be willing to run a quick test (any reasonably complex benchmark that uses the heap will do -- bz2 or spidermonkey from Sightglass perhaps)?

Sure, I can give it a try.

... should there be a test that shows csdb appearing in br_table lowerings as well?

Is the test I updated in cranelift/codegen/src/isa/aarch64/mod.rs insufficient?

Wasmtime GitHub notifications bot (Jul 29 2022 at 17:01):

akirilov-arm edited a comment on issue #4555:

... this should not result in a large pipeline bubble (hence large performance penalty) on current microarchitectures, is that right?

Yes, microarchitectures that do not perform any value speculation should treat it as NOP (hence the choice of a particular encoding).

... would you be willing to run a quick test (any reasonably complex benchmark that uses the heap will do -- bz2 or spidermonkey from Sightglass perhaps)?

Sure, I can give it a try.

... should there be a test that shows csdb appearing in br_table lowerings as well?

Is the test I updated in cranelift/codegen/src/isa/aarch64/mod.rs insufficient? Edit - sorry, this sounds a bit rude, but I didn't mean it that way; I believe that test does the job.
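A note on the "particular encoding" remark: in the A64 instruction set, CSDB is allocated in the hint space as HINT #20, so a core that does not implement it executes it as a NOP. The following is a minimal sketch of that relationship; the helper names are made up for illustration and are not Cranelift functions.

```python
# Illustrative helpers only (not Cranelift code): A64 hint-space encodings.
NOP_ENCODING = 0xD503201F  # HINT #0, i.e. NOP


def hint_encoding(imm: int) -> int:
    """Return the 32-bit A64 encoding of HINT #imm (imm is a 7-bit value)."""
    if not 0 <= imm < 128:
        raise ValueError("HINT immediate must fit in 7 bits")
    # The immediate occupies bits 11..5 (the CRm:op2 fields).
    return NOP_ENCODING | (imm << 5)


def is_hint(word: int) -> bool:
    """True if only bits 11..5 differ from the NOP encoding."""
    return ((word & ~(0x7F << 5)) & 0xFFFFFFFF) == NOP_ENCODING


CSDB_ENCODING = hint_encoding(20)  # CSDB is HINT #20
print(hex(CSDB_ENCODING))  # 0xd503229f
```

Because the word differs from NOP only in the hint immediate, hardware without value speculation has nothing extra to do when it decodes it.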

Wasmtime GitHub notifications bot (Jul 29 2022 at 17:14):

cfallin commented on issue #4555:

Ah, yes, the smoke test does technically cover br_table. I guess I was a bit surprised we don't have a filetest that exercises it that changed as a result of this; if you'd like to add one, that'd be great, as an opportunistic expansion of our test suite :-)

Wasmtime GitHub notifications bot (Jul 29 2022 at 18:00):

akirilov-arm commented on issue #4555:

Hah, it turns out that there are in fact tests: cranelift/filetests/filetests/isa/aarch64/jumptable.clif and, unsurprisingly, the cranelift/filetests/filetests/runtests/br_table.clif runtest. I think the issue is that for br_table we don't emit the Csdb VCode instruction explicitly, but the JTSequence compound operation instead, so a CLIF test can't verify end-to-end the precise machine code that has been generated. Note that the JTSequence pretty-printing logic hasn't been updated to include the csel instruction added by the initial Spectre mitigation either; I will remedy both issues.
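To make the role of the csel and csdb inside the jump-table sequence more concrete, here is a toy model of a bounds-checked dispatch. It is a sketch under stated assumptions, not the actual JTSequence emission code: the function name, the `mispredict_bounds_check` flag, and the choice of zero as the clamped index are all illustrative, and Python can only imitate a mispredicted branch, not real speculation.

```python
def jt_sequence_model(index, targets, default, mispredict_bounds_check=False):
    """Toy model of a bounds-checked jump-table dispatch (hypothetical helper).

    mispredict_bounds_check=True imitates a core that speculatively falls
    through the bounds-check branch although the index is out of range.
    """
    if index < 0:
        raise ValueError("index is modeled as an unsigned value")
    out_of_bounds = index >= len(targets)  # compare index with the table size
    if out_of_bounds and not mispredict_bounds_check:
        return default  # conditional branch to the default target
    # csel (assumed behavior): pick zero instead of the index when the
    # comparison says "out of bounds", so the table load stays in range
    # even on the mispredicted path.
    safe_index = 0 if out_of_bounds else index
    # csdb would sit here: it stops later instructions from consuming a
    # speculatively predicted value of the csel result.
    return targets[safe_index]  # load the table entry, then branch to it


print(jt_sequence_model(1, ["block1", "block2"], "default"))
print(jt_sequence_model(5, ["block1", "block2"], "default"))
print(jt_sequence_model(5, ["block1", "block2"], "default", mispredict_bounds_check=True))
```

In the model, the third call lands on the first table entry rather than reading past the end of the table, which is the property the csel and csdb pair is meant to preserve under speculation.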

Wasmtime GitHub notifications bot (Aug 01 2022 at 14:22):

akirilov-arm commented on issue #4555:

Here are the Sightglass results on an Ampere Altra machine (wasmtime/target/release/libwasmtime_bench_api.so being the build that generates CSDB):

execution :: cycles :: benchmarks/spidermonkey/benchmark.wasm

  Δ = 442011.60 ± 101171.36 (confidence = 99%)

  libwasmtime_bench_api.so is 1.00x to 1.00x faster than wasmtime/target/release/libwasmtime_bench_api.so!
  wasmtime/target/release/libwasmtime_bench_api.so is 1.00x to 1.00x faster than libwasmtime_bench_api.so!

  [115783264 116247975.57 116880177] libwasmtime_bench_api.so
  [115445187 115805963.97 118081119] wasmtime/target/release/libwasmtime_bench_api.so

execution :: nanoseconds :: benchmarks/spidermonkey/benchmark.wasm

  Δ = 17636030.01 ± 4052032.02 (confidence = 99%)

  libwasmtime_bench_api.so is 1.00x to 1.00x faster than wasmtime/target/release/libwasmtime_bench_api.so!
  wasmtime/target/release/libwasmtime_bench_api.so is 1.00x to 1.00x faster than libwasmtime_bench_api.so!

  [4631075480 4649708462.75 4675100300] libwasmtime_bench_api.so
  [4617673936 4632072432.74 4723056593] wasmtime/target/release/libwasmtime_bench_api.so

[...]

execution :: cycles :: benchmarks/bz2/benchmark.wasm

  No difference in performance.

  [836103 890682.87 1035450] libwasmtime_bench_api.so
  [839938 903084.69 1030489] wasmtime/target/release/libwasmtime_bench_api.so

execution :: nanoseconds :: benchmarks/bz2/benchmark.wasm

  No difference in performance.

  [33442481 35625535.03 41416669] libwasmtime_bench_api.so
  [33595869 36121314.02 41218165] wasmtime/target/release/libwasmtime_bench_api.so

There is no real difference, as expected.
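For readers wondering how a statistically significant Δ squares with "1.00x to 1.00x", the arithmetic can be checked from the means reported above. This is a small sketch using only those numbers; the variable names are illustrative, and it assumes libwasmtime_bench_api.so is the comparison build without CSDB.

```python
def relative_delta(mean_a: float, mean_b: float) -> float:
    """Difference of the two means as a fraction of mean_a."""
    return (mean_a - mean_b) / mean_a


# Means copied from the spidermonkey results above.
# "baseline" = libwasmtime_bench_api.so (assumed to be the build without CSDB),
# "csdb"     = wasmtime/target/release/libwasmtime_bench_api.so.
cycles_baseline, cycles_csdb = 116247975.57, 115805963.97
nanos_baseline, nanos_csdb = 4649708462.75, 4632072432.74

cycles_rel = relative_delta(cycles_baseline, cycles_csdb)
nanos_rel = relative_delta(nanos_baseline, nanos_csdb)
print(f"cycles: {cycles_rel:.2%}, nanoseconds: {nanos_rel:.2%}")
# cycles: 0.38%, nanoseconds: 0.38%
```

The differences of the means reproduce the reported Δ values (442011.60 cycles and 17636030.01 ns), and both amount to roughly 0.4% of the run, which rounds to 1.00x.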

Last updated: Apr 18 2025 at 14:03 UTC