Stream: git-wasmtime

Topic: wasmtime / issue #7352 PCC: support x86-64.


Wasmtime GitHub notifications bot (Oct 24 2023 at 21:56):

cfallin commented on issue #7352:

A performance measurement, also:

cfallin@fastly2:~/work/wasmtime% hyperfine -L pcc no,yes "target/release/wasmtime compile -C pcc={pcc} --target x86_64 ../wasm-tests/spidermonkey.wasm"
Benchmark 1: target/release/wasmtime compile -C pcc=no --target x86_64 ../wasm-tests/spidermonkey.wasm
  Time (mean ± σ):     995.9 ms ±  13.1 ms    [User: 7685.1 ms, System: 347.6 ms]
  Range (min … max):   981.5 ms … 1015.4 ms    10 runs

Benchmark 2: target/release/wasmtime compile -C pcc=yes --target x86_64 ../wasm-tests/spidermonkey.wasm
  Time (mean ± σ):      1.009 s ±  0.008 s    [User: 7.828 s, System: 0.349 s]
  Range (min … max):    0.998 s …  1.026 s    10 runs

Summary
  target/release/wasmtime compile -C pcc=no --target x86_64 ../wasm-tests/spidermonkey.wasm ran
    1.01 ± 0.02 times faster than target/release/wasmtime compile -C pcc=yes --target x86_64 ../wasm-tests/spidermonkey.wasm

or in other words, ~1% overhead.

Prior to this PR, turning on PCC automatically enabled the regalloc checker as well; I found this to have much higher overhead:

cfallin@fastly2:~/work/wasmtime% hyperfine -L checker no,yes "target/release/wasmtime compile -C pcc=yes -C cranelift-regalloc_checker={checker} --target x86_64 ../wasm-tests/spidermonkey.wasm"
Benchmark 1: target/release/wasmtime compile -C pcc=yes -C cranelift-regalloc_checker=no --target x86_64 ../wasm-tests/spidermonkey.wasm
  Time (mean ± σ):      1.034 s ±  0.010 s    [User: 7.798 s, System: 0.362 s]
  Range (min … max):    1.018 s …  1.055 s    10 runs

Benchmark 2: target/release/wasmtime compile -C pcc=yes -C cranelift-regalloc_checker=yes --target x86_64 ../wasm-tests/spidermonkey.wasm
  Time (mean ± σ):      1.741 s ±  0.033 s    [User: 15.546 s, System: 0.393 s]
  Range (min … max):    1.710 s …  1.820 s    10 runs

Summary
  target/release/wasmtime compile -C pcc=yes -C cranelift-regalloc_checker=no --target x86_64 ../wasm-tests/spidermonkey.wasm ran
    1.68 ± 0.04 times faster than target/release/wasmtime compile -C pcc=yes -C cranelift-regalloc_checker=yes --target x86_64 ../wasm-tests/spidermonkey.wasm

or about 68% above just PCC. Given that, IMHO a good design tradeoff point is to run PCC in production, but not the regalloc checker; we already fuzz continuously with the latter. It can always be turned on explicitly.
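
For readers who want to check the percentages, here is a minimal sketch of the arithmetic, using the mean wall-clock times from the two hyperfine runs above. The function name `overhead_percent` is made up for illustration; nothing here is part of Wasmtime or hyperfine.

```rust
/// Relative overhead of `slow_secs` over `fast_secs`, as a percentage.
/// (Hypothetical helper, for illustration only.)
fn overhead_percent(fast_secs: f64, slow_secs: f64) -> f64 {
    (slow_secs / fast_secs - 1.0) * 100.0
}

fn main() {
    // Mean times copied from the hyperfine output above, in seconds.
    // First run: pcc=no (995.9 ms) vs. pcc=yes (1.009 s).
    let pcc = overhead_percent(0.9959, 1.009);
    // Second run: regalloc checker off (1.034 s) vs. on (1.741 s), PCC enabled in both.
    let checker = overhead_percent(1.034, 1.741);

    println!("PCC overhead: {:.1}%", pcc);
    println!("regalloc checker overhead on top of PCC: {:.1}%", checker);
}
```

The means give roughly 1.3% for PCC alone and roughly 68% for the regalloc checker, in line with the ratios hyperfine reports in its summaries.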

Wasmtime GitHub notifications bot (Oct 24 2023 at 22:09):

jeffparsons commented on issue #7352:

An inquisitive member of the peanut gallery would like to know if you have written anything high-level about your goals for this work? I've seen the PRs flying by and it sounds really cool, but I don't understand any of the context. In particular, I've been wondering:

Wasmtime GitHub notifications bot (Oct 24 2023 at 22:31):

cfallin commented on issue #7352:

@jeffparsons great questions! I haven't written anything beyond the initial issue proposing this work in #6090 -- the last section of that issue writeup describes the proof-carrying code / "memory capabilities" model. I plan to write more eventually.

The goal doesn't have anything to do with perf -- the generated code doesn't change, and this doesn't allow any more aggressive strategies to be used -- but rather, risk mitigation. We've had a few CVEs that have allowed sandbox escapes from Wasmtime due to miscompiles, and so I want to build infrastructure that does translation validation to prove a given compilation artifact doesn't have such an issue. Long-term, it could also be used to verify other invariants (e.g., @fitzgen and I have talked a bit about how it could be used to provide additional safety in the implementation of Wasm GC).

Wasmtime GitHub notifications bot (Oct 26 2023 at 01:47):

cfallin commented on issue #7352:

I reworked the whole PCC implementation for x64 based on the above feedback -- removing the ability to pattern-match into Gpr / Xmm types forced a transpose of the whole thing, but as a side-effect, I think the explicit case breakdown is kind of nice in its thoroughness. I was able to actually remove the _ catch-all and list every instruction kind explicitly, so we'll be forced to think about semantics (and catch memory accesses, etc.) whenever we add a new instruction kind. Let me know what you think!
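
To illustrate why dropping the `_` catch-all matters, here is a rough sketch in Rust. The `Inst` enum and `accesses_memory` function are hypothetical and heavily simplified; they are not Cranelift's actual x64 instruction type or PCC code. The point is only that a `match` with no wildcard arm stops compiling when a new variant is added, so the new instruction's semantics have to be considered explicitly.

```rust
// Hypothetical, heavily simplified instruction set for illustration;
// NOT Cranelift's real x64 `Inst` type.
#[allow(dead_code)]
#[derive(Debug)]
enum Inst {
    AluRR { dst: u8, src: u8 },
    Load { dst: u8, addr: u64 },
    Store { src: u8, addr: u64 },
    Ret,
}

/// Returns true if the instruction touches memory, i.e. a checker
/// would need to validate its address. (Hypothetical helper.)
fn accesses_memory(inst: &Inst) -> bool {
    // No `_` arm: adding a new variant to `Inst` makes this match
    // non-exhaustive, which is a compile error until it is handled here.
    match inst {
        Inst::AluRR { .. } => false,
        Inst::Load { .. } => true,
        Inst::Store { .. } => true,
        Inst::Ret => false,
    }
}

fn main() {
    let insts = [
        Inst::AluRR { dst: 0, src: 1 },
        Inst::Load { dst: 0, addr: 0x1000 },
        Inst::Store { src: 1, addr: 0x1008 },
        Inst::Ret,
    ];
    for inst in &insts {
        println!("{:?}: accesses memory = {}", inst, accesses_memory(inst));
    }
}
```

With a `_ => false` arm instead, a newly added memory-accessing variant would silently be treated as not touching memory, which is the kind of gap the explicit listing is meant to rule out.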


Last updated: Nov 22 2024 at 16:03 UTC