bnjbvr opened PR #1798 from memory-enhancements to master:
This series of commits (which will have to land with a version bump of regalloc.rs to accommodate the API changes) reduces the number of memory allocations during compilation.
- First commit changes the return type of `mem_finalize` so it's a `SmallVec`, avoiding the short-lived allocation that happened when a `SmallVec` was converted into a `Vec`.
- Second commit adapts to changes in regalloc.rs brought by https://github.com/bytecodealliance/regalloc.rs/pull/71, reducing the need for short-lived `Set`s. It also avoids some hashing computations, etc.
- Third commit makes an effort to reduce the size of the `Inst` enum down to 32 bytes. Unfortunately, this means more pointer chasing for Call/JTSequence instructions (their members now live in `*Info` data structures), so I'm not sure it's something we'd take as is. Maybe we could mitigate the indirection cost by reverting some of the field moves, putting those fields back next to the box; it's unclear which ones are accessed most often and thus should be prioritized.

All in all, with all the changes from https://github.com/bytecodealliance/regalloc.rs/pull/71 too, the results are the following:

| benchmark | bytes allocated | blocks allocated | instruction count |
|---|---|---|---|
| big.clif (before) | 389M | 458K | 4966M |
| big.clif (after) | 317M | 424K | 4798M |
| medium.clif (before) | 17.9M | 33K | 173M |
| medium.clif (after) | 14.8M | 22K | 155M |
| regex-rs.wasm (before) | 700M | 1.942M | 5030M |
| regex-rs.wasm (after) | 586M | 1.619M | 4582M |

It's hard to measure the wall-clock effect, because my machine is quite noisy (despite setting the CPU governor to performance, etc.) and the benchmarks are very short-running. Instruction counts are quite stable, though, and for the final measurements I collected them with Cachegrind, just to be sure.
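The second commit's point about avoiding short-lived `Set`s and their hashing can be illustrated with a small std-only Rust sketch. All names here (`Reg`, `ToyInst`, `uses_as_set`, `collect_uses`, `count_uses`) are hypothetical; this is not the regalloc.rs API introduced by PR #71, only the general pattern of appending into one reused, caller-owned `Vec` instead of building and hashing a fresh `HashSet` per instruction.

```rust
use std::collections::HashSet;

/// Hypothetical virtual-register handle (not regalloc.rs's `Reg`).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct Reg(u32);

/// Hypothetical instruction that reads two registers and writes one.
struct ToyInst {
    uses: [Reg; 2],
    def: Reg,
}

/// "Before" pattern: each query allocates and hashes into a new set,
/// which is dropped as soon as the caller is done with it.
fn uses_as_set(inst: &ToyInst) -> HashSet<Reg> {
    inst.uses.iter().copied().collect()
}

/// "After" pattern: append into a caller-owned buffer. No hashing, and
/// no allocation once the buffer's capacity is large enough.
/// Duplicates are kept, so consumers must tolerate them.
fn collect_uses(inst: &ToyInst, out: &mut Vec<Reg>) {
    out.extend_from_slice(&inst.uses);
}

/// Counts register reads across a sequence of instructions, reusing one buffer.
fn count_uses(insts: &[ToyInst]) -> usize {
    let mut buf: Vec<Reg> = Vec::with_capacity(4);
    let mut total = 0;
    for inst in insts {
        buf.clear(); // keeps the capacity, so no new allocation per instruction
        collect_uses(inst, &mut buf);
        total += buf.len();
    }
    total
}

fn main() {
    let insts = vec![
        ToyInst { uses: [Reg(1), Reg(2)], def: Reg(3) },
        ToyInst { uses: [Reg(3), Reg(3)], def: Reg(4) },
    ];
    // The set collapses the duplicate read of Reg(3); the buffer does not.
    println!("set-based uses of inst 1: {}", uses_as_set(&insts[1]).len());
    println!("buffer-based total uses: {}", count_uses(&insts));
    println!("defs: {:?}", insts.iter().map(|i| i.def).collect::<Vec<_>>());
}
```

The trade-off is that a `Vec` keeps duplicate registers that a set would have collapsed, so whatever consumes the buffer has to tolerate (or explicitly remove) them.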
cc @julian-seward1
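The size reduction in the third commit relies on a common Rust layout trick: an enum is as large as its largest variant (plus any tag), so moving a bulky, less common variant's payload behind a `Box` shrinks every instruction. The sketch below is a minimal illustration using hypothetical types (`Reg`, `CallInfo`, `InstInline`, `InstBoxed`, `call_use_count`); it does not reproduce Cranelift's actual `Inst` or `*Info` definitions, and the exact sizes printed depend on the target.

```rust
use std::mem::size_of;

/// Hypothetical register handle (a stand-in, not Cranelift's `Reg`).
#[derive(Clone, Copy, Debug)]
struct Reg(u32);

/// Bulky payload of a call, moved out of line in the boxed layout.
#[derive(Debug)]
struct CallInfo {
    callee: String,
    uses: Vec<Reg>,
    defs: Vec<Reg>,
}

/// Call payload stored inline: every variant pays for the largest one,
/// so even a simple three-register ALU op occupies this many bytes.
#[derive(Debug)]
enum InstInline {
    AluRRR { rd: Reg, rn: Reg, rm: Reg },
    Call { callee: String, uses: Vec<Reg>, defs: Vec<Reg> },
}

/// Call payload behind a `Box`: the enum shrinks, at the cost of one
/// extra pointer dereference whenever a call's data is inspected.
#[derive(Debug)]
enum InstBoxed {
    AluRRR { rd: Reg, rn: Reg, rm: Reg },
    Call { info: Box<CallInfo> },
}

/// Reading call data now goes through the box (the "pointer chasing").
fn call_use_count(inst: &InstBoxed) -> Option<usize> {
    match inst {
        InstBoxed::Call { info } => Some(info.uses.len()),
        InstBoxed::AluRRR { .. } => None,
    }
}

fn main() {
    println!("inline layout: {} bytes", size_of::<InstInline>());
    println!("boxed layout:  {} bytes", size_of::<InstBoxed>());

    let call = InstBoxed::Call {
        info: Box::new(CallInfo {
            callee: String::from("f"),
            uses: vec![Reg(0), Reg(1)],
            defs: vec![Reg(0)],
        }),
    };
    let alu = InstBoxed::AluRRR { rd: Reg(2), rn: Reg(0), rm: Reg(1) };
    println!("{:?} {:?}", call_use_count(&call), call_use_count(&alu));
}
```

The `call_use_count` match shows the cost mentioned in the PR description: call data sits one dereference away, which is why moving the most frequently accessed fields back next to the box was floated as a mitigation.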
bnjbvr requested cfallin for a review on PR #1798.
cfallin submitted PR Review.
cfallin submitted PR Review.
cfallin created PR Review Comment:
Pre-existing, but I just realized that we don't document this side effect of `emit_call` on the `ABICall` trait. Perhaps add a note to the doc comment (in the trait definition) that `emit_call()` should only be called once, because it is allowed to reuse parts of the `ABICall` object when emitting instructions?
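The review suggestion can be sketched with a hypothetical trait. `ToyAbiCall`, `ToyCall`, and `ToyInst` are invented for illustration and do not match Cranelift's real `ABICall` signature; the sketch only shows why an `emit_call()` that moves parts of `self` into the emitted instructions (here via `std::mem::take`, to avoid a clone) should be documented as call-once.

```rust
/// Hypothetical emitted instruction.
#[derive(Debug, PartialEq)]
enum ToyInst {
    Call { callee: String, uses: Vec<u32> },
}

/// Hypothetical trait mirroring the suggested documentation, not
/// Cranelift's actual `ABICall` definition.
trait ToyAbiCall {
    /// Emits the call instruction into `out`.
    ///
    /// This should only be called once: implementations are allowed to
    /// move parts of `self` (such as the argument list) into the emitted
    /// instructions instead of cloning them.
    fn emit_call(&mut self, out: &mut Vec<ToyInst>);
}

/// Hypothetical call-site description.
struct ToyCall {
    callee: String,
    uses: Vec<u32>,
}

impl ToyAbiCall for ToyCall {
    fn emit_call(&mut self, out: &mut Vec<ToyInst>) {
        // `mem::take` moves the fields out and leaves empty values behind,
        // avoiding a clone (and its allocation).
        out.push(ToyInst::Call {
            callee: std::mem::take(&mut self.callee),
            uses: std::mem::take(&mut self.uses),
        });
    }
}

fn main() {
    let mut call = ToyCall { callee: String::from("f"), uses: vec![1, 2] };
    let mut out = Vec::new();
    call.emit_call(&mut out);
    call.emit_call(&mut out); // misuse: the second emission sees the emptied state
    println!("{:?}", out);
}
```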
bnjbvr updated PR #1798 from memory-enhancements to master.
bnjbvr updated PR #1798 from memory-enhancements to master.
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
Good point, added!
bnjbvr merged PR #1798.
Last updated: Jan 24 2025 at 00:11 UTC