jameysharp opened PR #4966 from resolve-aliases
to main
:
At control-flow join points, cranelift-frontend's SSA builder currently checks to see if only one definition of a variable reaches the current block. If so, it can eliminate the corresponding block parameter and use the original def directly. It implements this by turning the block parameter into an alias for the original value.
However, it didn't resolve aliases during this check, except after it had already determined that there was only one definition.
Resolving aliases first instead allows it to detect that more block parameters are redundant. And as more block parameters get converted to aliases, later blocks can see common definitions from further away, so this has a compounding effect.
According to
valgrind --tool=dhat
, this is a significant memory savings. On the pulldown-cmark benchmark from Sightglass:
- 15.3% (1.9MiB) less memory allocated at maximum heap
- 4.1% (6.7MiB) less memory allocated in total
- 9.8% (57MiB) fewer bytes read
- 12.6% (36MiB) fewer bytes written
- 5.4% fewer instructions retired
- 1.04x faster by instructions retired (per Sightglass/perf)
- 1.03x to 1.04x faster by CPU cycles (per Sightglass/perf)
- 1.03 ± 0.01 times faster by CPU time (per hyperfine)
- 1.04x faster by cache accesses (per Sightglass/perf)
On the bz2 benchmark:
- 1.06x faster by instructions retired (per Sightglass/perf)
- 1.05x faster by CPU cycles (per Sightglass/perf)
- 1.04 ± 0.01 times faster by CPU time (per hyperfine)
- 1.02x to 1.03x faster by cache accesses (per Sightglass/perf)
Even on the largest benchmark in Sightglass (spidermonkey.wasm), this is a measurable improvement:
- 1.03x faster by instructions retired (per Sightglass/perf)
- 1.02x faster by CPU cycles (per Sightglass/perf)
- 1.02 ± 0.00 times faster by CPU time (per hyperfine)
There was no significant difference in cache misses for any benchmark, according to Sightglass/perf.
<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
jameysharp requested cfallin for a review on PR #4966.
jameysharp requested fitzgen for a review on PR #4966.
bjorn3 submitted PR review.
bjorn3 created PR review comment:
Why isn't this reachable anymore?
jameysharp submitted PR review.
jameysharp created PR review comment:
Good question. Two pieces:
ZeroOneOrMore::One(pred_val)
comes from the abovefor pred_val in self
loop, which already resolved aliases. So it's guaranteed at this point that resolving aliases is a no-op.- The above loop filters out any
pred_val
which is equal tosentinel
.So instead of being a special case of
ZeroOneOrMore::One
, this is now just another case ofZeroOneOrMore::Zero
. And that already had theemit_zero
.But that case also takes care of two other things that weren't handled here: appending the current block to the layout if necessary, and adding this block to the side effects. So this does change behavior.
I _think_ it was a bug that this case didn't also do those things. I also think it doesn't matter very much: this can only happen in unreachable blocks. As far as I understand, cranelift-wasm never generates unreachable blocks; does cg-clif?
bjorn3 submitted PR review.
bjorn3 created PR review comment:
Thanks for the explanation! cg_clif does generate unreachable blocks in some cases.
jameysharp has enabled auto merge for PR #4966.
jameysharp submitted PR review.
jameysharp created PR review comment:
I think this is worth noting in the git history so I'm adding this paragraph to the commit message:
This also merges a special case, where there's exactly one unique non-sentinel definition but it's actually an alias for the sentinel, into the general case where all definitions are from the sentinel. As a result there's only one case that has to introduce a definition of the variable to zero.
fitzgen submitted PR review.
jameysharp merged PR #4966.
Last updated: Dec 23 2024 at 12:05 UTC