fitzgen requested cfallin for a review on PR #4710.
fitzgen opened PR #4710 from bump-alloc-in-remove-constant-phis
to main
:
This makes compilation 2-6% faster for Sightglass's bz2 benchmark:
compilation :: cycles :: benchmarks/bz2/benchmark.wasm Δ = 7290648.36 ± 4245152.07 (confidence = 99%) bump.so is 1.02x to 1.06x faster than main.so! [166388177 183238542.98 214732518] bump.so [172836648 190529191.34 217514271] main.so compilation :: cycles :: benchmarks/pulldown-cmark/benchmark.wasm No difference in performance. [182220055 225793551.12 277857575] bump.so [193212613 227784078.61 277175335] main.so compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm No difference in performance. [3848442474 4295214144.37 4665127241] bump.so [3969505457 4262415290.10 4563869974] main.so
cfallin created PR review comment:
While we're here, is
blocks_reverse_postorder.len()
the only remaining reason we need to collect RPO into aVec
above? Or can we iterate on the iterator chain (postorder ....rev()
) below, and get the length from that somehow too?
cfallin submitted PR review.
cfallin submitted PR review.
cfallin created PR review comment:
Does
summaries
still need to be a hashmap? It seems to me like it should be dense. If we can have a default state (eitherOption<BlockSummary<'a>>
or a zero-alloc and cloneable emptyBlockSummary
) we could use aSecondaryMap
here I think.
bjorn3 created PR review comment:
You can get the length of an iterator using
.count()
. This avoids the intermediate allocation of aVec
.
bjorn3 submitted PR review.
bjorn3 submitted PR review.
bjorn3 created PR review comment:
However we do iterate over it below:
for &&b in &blocks_reverse_postorder {
.
cfallin submitted PR review.
cfallin created PR review comment:
Right, the idea is that we could iterate over
domtree.cfg_postorder().iter().rev()
(with that expression directly in thefor
). And I seecfg_postorder
returns a slice so we can dodomtree.cfg_postorder().len()
here I think.
fitzgen submitted PR review.
fitzgen created PR review comment:
About 30-35% of blocks don't get a block summary (no parameters themselves and no successors have params either) so it isn't a clear win since the hash map miiiiight end up using less space. I'll do a quick test and see if there is anything I can measure.
fitzgen created PR review comment:
Did this and wasn't able to measure any difference in perf, but its a nice change (good eye!) so I'll push it up soon.
fitzgen submitted PR review.
cfallin submitted PR review.
cfallin created PR review comment:
Ah OK, in that case it seems totally fine to stick with the hashmap, unless we're surprised by your results!
fitzgen updated PR #4710 from bump-alloc-in-remove-constant-phis
to main
.
fitzgen submitted PR review.
fitzgen created PR review comment:
I didn't get any measurable perf benefits from switching to a
SecondaryMap
but it did actually clean up a bit ofunwrap
s and such, so I do think I'll keep it.
fitzgen has enabled auto merge for PR #4710.
fitzgen merged PR #4710.
Last updated: Dec 23 2024 at 12:05 UTC