fitzgen requested cfallin for a review on PR #4710.
fitzgen opened PR #4710 from bump-alloc-in-remove-constant-phis to main:
This makes compilation 2-6% faster for Sightglass's bz2 benchmark:
compilation :: cycles :: benchmarks/bz2/benchmark.wasm Δ = 7290648.36 ± 4245152.07 (confidence = 99%) bump.so is 1.02x to 1.06x faster than main.so! [166388177 183238542.98 214732518] bump.so [172836648 190529191.34 217514271] main.so compilation :: cycles :: benchmarks/pulldown-cmark/benchmark.wasm No difference in performance. [182220055 225793551.12 277857575] bump.so [193212613 227784078.61 277175335] main.so compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm No difference in performance. [3848442474 4295214144.37 4665127241] bump.so [3969505457 4262415290.10 4563869974] main.so
cfallin created PR review comment:
While we're here, is
blocks_reverse_postorder.len()the only remaining reason we need to collect RPO into aVecabove? Or can we iterate on the iterator chain (postorder ....rev()) below, and get the length from that somehow too?
cfallin submitted PR review.
cfallin submitted PR review.
cfallin created PR review comment:
Does
summariesstill need to be a hashmap? It seems to me like it should be dense. If we can have a default state (eitherOption<BlockSummary<'a>>or a zero-alloc and cloneable emptyBlockSummary) we could use aSecondaryMaphere I think.
bjorn3 created PR review comment:
You can get the length of an iterator using
.count(). This avoids the intermediate allocation of aVec.
bjorn3 submitted PR review.
bjorn3 submitted PR review.
bjorn3 created PR review comment:
However we do iterate over it below:
for &&b in &blocks_reverse_postorder {.
cfallin submitted PR review.
cfallin created PR review comment:
Right, the idea is that we could iterate over
domtree.cfg_postorder().iter().rev()(with that expression directly in thefor). And I seecfg_postorderreturns a slice so we can dodomtree.cfg_postorder().len()here I think.
fitzgen submitted PR review.
fitzgen created PR review comment:
About 30-35% of blocks don't get a block summary (no parameters themselves and no successors have params either) so it isn't a clear win since the hash map miiiiight end up using less space. I'll do a quick test and see if there is anything I can measure.
fitzgen created PR review comment:
Did this and wasn't able to measure any difference in perf, but its a nice change (good eye!) so I'll push it up soon.
fitzgen submitted PR review.
cfallin submitted PR review.
cfallin created PR review comment:
Ah OK, in that case it seems totally fine to stick with the hashmap, unless we're surprised by your results!
fitzgen updated PR #4710 from bump-alloc-in-remove-constant-phis to main.
fitzgen submitted PR review.
fitzgen created PR review comment:
I didn't get any measurable perf benefits from switching to a
SecondaryMapbut it did actually clean up a bit ofunwraps and such, so I do think I'll keep it.
fitzgen has enabled auto merge for PR #4710.
fitzgen merged PR #4710.
Last updated: Dec 13 2025 at 19:03 UTC