akirilov-arm commented on issue #4555:
... this should not result in a large pipeline bubble (hence large performance penalty) on current microarchitectures, is that right?
Yes, microarchitectures that do not perform any value speculation should treat it as
NOP
.... would you be willing to run a quick test (any reasonably complex benchmark that uses the heap will do --
bz2
orspidermonkey
from Sightglass perhaps)?Sure, I can give it a try.
... should there be a test that shows
csdb
appearing in br_table lowerings as well?Is the test I updated in
cranelift/codegen/src/isa/aarch64/mod.rs
insufficient?
akirilov-arm edited a comment on issue #4555:
... this should not result in a large pipeline bubble (hence large performance penalty) on current microarchitectures, is that right?
Yes, microarchitectures that do not perform any value speculation should treat it as
NOP
(hence the choice of a particular encoding).... would you be willing to run a quick test (any reasonably complex benchmark that uses the heap will do --
bz2
orspidermonkey
from Sightglass perhaps)?Sure, I can give it a try.
... should there be a test that shows
csdb
appearing in br_table lowerings as well?Is the test I updated in
cranelift/codegen/src/isa/aarch64/mod.rs
insufficient?
akirilov-arm edited a comment on issue #4555:
... this should not result in a large pipeline bubble (hence large performance penalty) on current microarchitectures, is that right?
Yes, microarchitectures that do not perform any value speculation should treat it as
NOP
(hence the choice of a particular encoding).... would you be willing to run a quick test (any reasonably complex benchmark that uses the heap will do --
bz2
orspidermonkey
from Sightglass perhaps)?Sure, I can give it a try.
... should there be a test that shows
csdb
appearing in br_table lowerings as well?Is the test I updated in
cranelift/codegen/src/isa/aarch64/mod.rs
insufficient? Edit - sorry, this sounds a bit rude, but I didn't mean it that way; I believe that test does the job.
cfallin commented on issue #4555:
Ah, yes, the smoke test does technically cover
br_table
. I guess I was a bit surprised we don't have a filetest that exercises it that changed as a result of this; if you'd like to add one, that'd be great, as an opportunistic expansion of our test suite :-)
akirilov-arm commented on issue #4555:
Hah, it turns out that there are in fact tests -
cranelift/filetests/filetests/isa/aarch64/jumptable.clif
and, unsurprisingly, thecranelift/filetests/filetests/runtests/br_table.clif
runtest. I think the issue is that we don't emit theCsdb
VCode instruction explicitly forbr_table
, but theJTSequence
compound operation instead, so actually we can't use a CLIF test to verify end-to-end the precise machine code that has been generated. Note that theJTSequence
pretty-printing logic hasn't been updated to include thecsel
instruction added by the initial Spectre mitigation either; I will remedy both issues.
akirilov-arm commented on issue #4555:
Here are the Sightglass results on an Ampere Altra machine (
wasmtime/target/release/libwasmtime_bench_api.so
being the build that generatesCSDB
):execution :: cycles :: benchmarks/spidermonkey/benchmark.wasm Δ = 442011.60 ± 101171.36 (confidence = 99%) libwasmtime_bench_api.so is 1.00x to 1.00x faster than wasmtime/target/release/libwasmtime_bench_api.so! wasmtime/target/release/libwasmtime_bench_api.so is 1.00x to 1.00x faster than libwasmtime_bench_api.so! [115783264 116247975.57 116880177] libwasmtime_bench_api.so [115445187 115805963.97 118081119] wasmtime/target/release/libwasmtime_bench_api.so execution :: nanoseconds :: benchmarks/spidermonkey/benchmark.wasm Δ = 17636030.01 ± 4052032.02 (confidence = 99%) libwasmtime_bench_api.so is 1.00x to 1.00x faster than wasmtime/target/release/libwasmtime_bench_api.so! wasmtime/target/release/libwasmtime_bench_api.so is 1.00x to 1.00x faster than libwasmtime_bench_api.so! [4631075480 4649708462.75 4675100300] libwasmtime_bench_api.so [4617673936 4632072432.74 4723056593] wasmtime/target/release/libwasmtime_bench_api.so [...] execution :: cycles :: benchmarks/bz2/benchmark.wasm No difference in performance. [836103 890682.87 1035450] libwasmtime_bench_api.so [839938 903084.69 1030489] wasmtime/target/release/libwasmtime_bench_api.so execution :: nanoseconds :: benchmarks/bz2/benchmark.wasm No difference in performance. [33442481 35625535.03 41416669] libwasmtime_bench_api.so [33595869 36121314.02 41218165] wasmtime/target/release/libwasmtime_bench_api.so
There is no real difference, as expected.
Last updated: Jan 24 2025 at 00:11 UTC