erikrose opened PR #10183 from erikrose:canonical-gvn
to bytecodealliance:main
:
This stands on the shoulders of #6135 but reorders only while computing GVN map lookup keys. It doesn't affect the codegenned instructions themselves, which will hopefully dodge the performance regression that led to that PR's reversion.
cfallin commented on PR #10183:
@erikrose interesting, thanks for this PR. I'm curious if you've observed a benefit (e.g. on our Sightglass suite) in runtime, and also what the impact on compile time is?
My general thought is I'm a little leery of adding extra magic to the egraph optimizer -- it's already a carefully-tuned combo stage of all of the core optimizations, and I'd rather make it simpler than more complex, all other things being equal -- but we can certainly think about ways to clean this up if it shows a benefit.
github-actions[bot] commented on PR #10183:
Subscribe to Label Action
cc @cfallin, @fitzgen
<details>
This issue or pull request has been labeled: "cranelift", "isle"Thus the following users have been cc'd because of the following labels:
- cfallin: isle
- fitzgen: isle
To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.
Learn more.
</details>
erikrose commented on PR #10183:
Yes, this is definitely a drafty draft at the moment (hence the Draft classification!). :-) Dan thought it would be a good acclimatization task for me, and indeed, it was a great tour of many parts of Cranelift. It seems like it can also grant me a good little tour of Sightglass, and then we'll see if it's worth pursuing further. Thanks for having a look!
erikrose updated PR #10183.
erikrose closed without merge PR #10183.
erikrose commented on PR #10183:
I benched this patch last night (and it took all night, because I ran the whole "all" suite). It didn't have a huge impact on execution speed either way: 7 benchmarks were faster, 9 slower, and 94 no-difference. Caveats: I ran this on macOS, so it's ARM-only and cycle-count-only—no perf counters. Here are the trimmed-down results, to get a sense of impact:
Ones where the PR was faster than
main
:target/release/libwasmtime_bench_api.dylib is 1.02x to 1.03x faster than tmp/wasmtime_main.dylib! target/release/libwasmtime_bench_api.dylib is 1.02x to 1.02x faster than tmp/wasmtime_main.dylib! target/release/libwasmtime_bench_api.dylib is 1.00x to 1.02x faster than tmp/wasmtime_main.dylib! target/release/libwasmtime_bench_api.dylib is 1.01x to 1.02x faster than tmp/wasmtime_main.dylib! target/release/libwasmtime_bench_api.dylib is 1.00x to 1.02x faster than tmp/wasmtime_main.dylib! target/release/libwasmtime_bench_api.dylib is 1.00x to 1.01x faster than tmp/wasmtime_main.dylib! target/release/libwasmtime_bench_api.dylib is 1.00x to 1.01x faster than tmp/wasmtime_main.dylib!
Ones where the PR was slower:
tmp/wasmtime_main.dylib is 1.00x to 1.05x faster than target/release/libwasmtime_bench_api.dylib! tmp/wasmtime_main.dylib is 1.00x to 1.04x faster than target/release/libwasmtime_bench_api.dylib! tmp/wasmtime_main.dylib is 1.02x to 1.02x faster than target/release/libwasmtime_bench_api.dylib! tmp/wasmtime_main.dylib is 1.00x to 1.04x faster than target/release/libwasmtime_bench_api.dylib! tmp/wasmtime_main.dylib is 1.01x to 1.02x faster than target/release/libwasmtime_bench_api.dylib! tmp/wasmtime_main.dylib is 1.00x to 1.01x faster than target/release/libwasmtime_bench_api.dylib! tmp/wasmtime_main.dylib is 1.00x to 1.01x faster than target/release/libwasmtime_bench_api.dylib! tmp/wasmtime_main.dylib is 1.00x to 1.01x faster than target/release/libwasmtime_bench_api.dylib! tmp/wasmtime_main.dylib is 1.00x to 1.00x faster than target/release/libwasmtime_bench_api.dylib!
I expect most of the suite, embodied in .wasm files, already made a trip through LLVM's optimization pipeline and thus had all the GVN opportunities used up already. It may be worth waking this up again if ever we encounter a use case where that's not true. In the meantime, I'll close this. Here are the full benchmark results for posterity: Full Sightglass Results.txt.
If we ever pick it up again, I would…
- [ ] Push GVN key generation to someplace where we don't have to look at it in the main egraph loop: perhaps in the GVN hash itself.
- [ ] Remove the
IntCompare
andFloatCompare
==
arms of thematch
, since they'd be more at home as ISLE rules (and are probably already implemented as such).
Last updated: Feb 28 2025 at 02:27 UTC