cfallin requested elliottt for a review on PR #5391.
cfallin opened PR #5391 from egraph-opts
to main
:
It turns out that there was a decent amount of low-hanging fruit:
Save elaborated results by canonical value, not latest value (union value). Previously we were artificially skipping and re-elaborating some values we already had because we were not finding them in the map.
Make some changes to handling of icmp results: when icmp became I8-typed (when bools went away), many uses became
(uextend $I32 (icmp $I8 ...))
, and so patterns in lowering backends were no longer matching.This PR includes an x64-specific change to match
(brz (uextend (icmp ...)))
and similarly forbrnz
, but it also takes advantage of the ability to write rules easily in the egraph mid-end to rewrite selects with icmp inputs appropriately.Extend constprop to understand selects in the egraph mid-end.
With these changes, bz2.wasm sees a ~1% speedup, and spidermonkey.wasm with a fib.js input sees a ~16% speedup (measured over 5 runs; CPU frequency scaling disabled, pinned to 1 core), with methodology:
$ time taskset 1 target/release/wasmtime run --allow-precompiled --dir=. ./spidermonkey.base.cwasm ./fib.js 1346269 taskset 1 target/release/wasmtime run --allow-precompiled --dir=. ./fib.js 2.14s user 0.01s system 99% cpu 2.148 total $ time taskset 1 target/release/wasmtime run --allow-precompiled --dir=. ./spidermonkey.egraphs.cwasm ./fib.js 1346269 taskset 1 target/release/wasmtime run --allow-precompiled --dir=. ./fib.js 1.78s user 0.01s system 99% cpu 1.788 total
My basic strategy here was to stare at disassemblies of the hottest basic blocks when running the above, and pick out what looked like suboptimal code. I suspect doing the same for other benchmarks would possibly yield more benefits.
<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
cfallin requested jameysharp for a review on PR #5391.
cfallin requested fitzgen for a review on PR #5391.
cfallin updated PR #5391 from egraph-opts
to main
.
jameysharp submitted PR review.
jameysharp submitted PR review.
jameysharp created PR review comment:
There's already an
intcc_inverse
inprelude_lower.isle
, implemented inmachinst/isle.rs
. Can you move that to the prelude instead of introducing a new copy? Maybe move the otherintcc
andfloatcc
constructors at the same time, just so they're all in one place.
cfallin updated PR #5391 from egraph-opts
to main
.
cfallin submitted PR review.
cfallin created PR review comment:
Ah, I hadn't seen that; updated, thanks!
jameysharp submitted PR review.
cfallin merged PR #5391.
Last updated: Nov 22 2024 at 16:03 UTC