cfallin commented on issue #7352:
A performance measurement, also:
cfallin@fastly2:~/work/wasmtime% hyperfine -L pcc no,yes "target/release/wasmtime compile -C pcc={pcc} --target x86_64 ../wasm-tests/spidermonkey.wasm"
Benchmark 1: target/release/wasmtime compile -C pcc=no --target x86_64 ../wasm-tests/spidermonkey.wasm
  Time (mean ± σ):     995.9 ms ± 13.1 ms    [User: 7685.1 ms, System: 347.6 ms]
  Range (min … max):   981.5 ms … 1015.4 ms    10 runs

Benchmark 2: target/release/wasmtime compile -C pcc=yes --target x86_64 ../wasm-tests/spidermonkey.wasm
  Time (mean ± σ):     1.009 s ± 0.008 s    [User: 7.828 s, System: 0.349 s]
  Range (min … max):   0.998 s … 1.026 s    10 runs

Summary
  target/release/wasmtime compile -C pcc=no --target x86_64 ../wasm-tests/spidermonkey.wasm ran
    1.01 ± 0.02 times faster than target/release/wasmtime compile -C pcc=yes --target x86_64 ../wasm-tests/spidermonkey.wasm
or in other words, ~1% overhead.
Prior to this PR, turning on PCC automatically enabled the regalloc checker as well; I found this to have much higher overhead:
cfallin@fastly2:~/work/wasmtime% hyperfine -L checker no,yes "target/release/wasmtime compile -C pcc=yes -C cranelift-regalloc_checker={checker} --target x86_64 ../wasm-tests/spidermonkey.wasm"
Benchmark 1: target/release/wasmtime compile -C pcc=yes -C cranelift-regalloc_checker=no --target x86_64 ../wasm-tests/spidermonkey.wasm
  Time (mean ± σ):     1.034 s ± 0.010 s    [User: 7.798 s, System: 0.362 s]
  Range (min … max):   1.018 s … 1.055 s    10 runs

Benchmark 2: target/release/wasmtime compile -C pcc=yes -C cranelift-regalloc_checker=yes --target x86_64 ../wasm-tests/spidermonkey.wasm
  Time (mean ± σ):     1.741 s ± 0.033 s    [User: 15.546 s, System: 0.393 s]
  Range (min … max):   1.710 s … 1.820 s    10 runs

Summary
  target/release/wasmtime compile -C pcc=yes -C cranelift-regalloc_checker=no --target x86_64 ../wasm-tests/spidermonkey.wasm ran
    1.68 ± 0.04 times faster than target/release/wasmtime compile -C pcc=yes -C cranelift-regalloc_checker=yes --target x86_64 ../wasm-tests/spidermonkey.wasm
or about 68% above just PCC. Given that, IMHO a good design tradeoff point is to run PCC in production, but not the regalloc checker; we already fuzz continuously with the latter. It can always be turned on explicitly.
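For anyone wanting to double-check the percentages, here is a minimal sketch of the arithmetic, using only the mean wall-clock times reported by the two hyperfine runs above. The helper name `overhead_percent` is made up for illustration and is not part of Wasmtime or hyperfine.

```rust
/// Relative slowdown of `slow_secs` over `base_secs`, as a percentage.
/// (Hypothetical helper, for illustration only.)
fn overhead_percent(base_secs: f64, slow_secs: f64) -> f64 {
    (slow_secs / base_secs - 1.0) * 100.0
}

fn main() {
    // Mean times copied from the hyperfine output quoted above.
    // First run: pcc=no (995.9 ms) vs. pcc=yes (1.009 s).
    let pcc = overhead_percent(0.9959, 1.009);
    // Second run: PCC on, regalloc checker off (1.034 s) vs. on (1.741 s).
    let checker = overhead_percent(1.034, 1.741);

    println!("PCC overhead: {pcc:.1}%");
    println!("regalloc checker overhead on top of PCC: {checker:.1}%");
}
```

Both figures come from ratios of means over 10 runs each, so they should be read together with the ± ranges hyperfine reports rather than as exact values.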
jeffparsons commented on issue #7352:
An inquisitive member of the peanut gallery would like to know if you have written anything high-level about your goals for this work? I've seen the PRs flying by and it sounds really cool, but I don't understand any of the context. In particular, I've been wondering:
- Is this primarily aimed at having another layer of safety without compromising performance, or does it unlock opportunities for increased performance without having to compromise on existing safety by replacing blunter mechanisms?
- Are there particular workloads that you expect this work to benefit?
- If increased performance is a goal, do you have any targets/estimates/hopes in mind?
cfallin commented on issue #7352:
@jeffparsons great questions! I haven't written anything beyond the initial issue proposing this work in #6090 -- the last section of that issue writeup describes the proof-carrying code / "memory capabilities" model. I plan to write more eventually.
The goal doesn't have anything to do with perf -- the generated code doesn't change, and this doesn't allow any more aggressive strategies to be used -- but rather, risk mitigation. We've had a few CVEs that have allowed sandbox escapes from Wasmtime due to miscompiles, and so I want to build infrastructure that does translation validation to prove a given compilation artifact doesn't have such an issue. Long-term, it could also be used to verify other invariants (e.g., @fitzgen and I have talked a bit about how it could be used to provide additional safety in the implementation of Wasm GC).
cfallin commented on issue #7352:
I reworked the whole PCC implementation for x64 based on the above feedback -- removing the ability to pattern-match into `Gpr`/`Xmm` types forced a transpose of the whole thing, but as a side-effect, I think the explicit case breakdown is kind of nice in its thoroughness. I was able to actually remove the `_` catch-all and list every instruction kind explicitly, so we'll be forced to think about semantics (and catch memory accesses, etc.) whenever we add a new instruction kind. Let me know what you think!
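As a rough illustration of why dropping the catch-all arm helps, here is a minimal sketch of an exhaustive `match`. The `Inst` enum and `accessed_addr` function are hypothetical and heavily simplified; they are not Cranelift's actual x64 instruction type or PCC checker.

```rust
// Hypothetical, heavily simplified instruction set (not Cranelift's real `Inst`).
#[derive(Debug)]
enum Inst {
    Nop,
    AddRR,
    Load { addr: u64 },
    Store { addr: u64 },
}

/// Returns the address an instruction touches, if any.
///
/// There is deliberately no `_` arm: adding a new variant to `Inst`
/// makes this function fail to compile until the new case is handled,
/// so a new memory access cannot silently skip the check.
fn accessed_addr(inst: &Inst) -> Option<u64> {
    match inst {
        Inst::Nop => None,
        Inst::AddRR => None,
        Inst::Load { addr } => Some(*addr),
        Inst::Store { addr } => Some(*addr),
    }
}

fn main() {
    let insts = [
        Inst::Nop,
        Inst::AddRR,
        Inst::Load { addr: 0x1000 },
        Inst::Store { addr: 0x2000 },
    ];
    for inst in &insts {
        println!("{:?} -> {:?}", inst, accessed_addr(inst));
    }
}
```

With a `_ => None` arm instead, a newly added memory-touching variant would compile cleanly and be treated as touching no memory, which is the kind of gap the explicit listing is meant to rule out.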
Last updated: Jan 24 2025 at 00:11 UTC