wasmtime / PR #1798 MachInst: a few memory enhancements · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / PR #1798 MachInst: a few memory enhancements

Wasmtime GitHub notifications bot (Jun 01 2020 at 16:48):

bnjbvr opened PR #1798 from memory-enhancements to master:

This series of commits (which will have to land with a version bump of regalloc.rs to accommodate the API changes) reduces the number of memory allocations during compilation.

First commit changes the return type of mem_finalize to a SmallVec, avoiding the short-lived allocation that came from converting a SmallVec into a Vec.
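This is a minimal, std-only sketch of why the change saves an allocation. It assumes a hypothetical mem_finalize-like helper that expands an address into at most two instructions. The `InlineInsts` struct (a fixed array plus a length) stands in for the smallvec crate's `SmallVec<[Inst; 2]>`, and the counting allocator only exists to make the difference observable. None of these names come from Cranelift.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::cell::Cell;

thread_local! {
    // Per-thread allocation counter; const-initialized so it never allocates.
    static ALLOCS: Cell<usize> = const { Cell::new(0) };
}

/// Wraps the system allocator and counts allocations made on the current thread.
struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let _ = ALLOCS.try_with(|c| c.set(c.get() + 1));
        unsafe { System.alloc(layout) }
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Runs `f` and returns its result plus the number of heap allocations it made.
fn allocs_during<T>(f: impl FnOnce() -> T) -> (T, usize) {
    let before = ALLOCS.with(|c| c.get());
    let out = f();
    (out, ALLOCS.with(|c| c.get()) - before)
}

const SCRATCH: u8 = 16; // hypothetical scratch register

#[derive(Clone, Copy, Debug, PartialEq)]
enum Inst {
    Nop,
    MovImm { rd: u8, imm: i64 },
    LoadImmOff { rd: u8, base: u8, off: i64 },
    LoadRegOff { rd: u8, base: u8, index: u8 },
}

/// Stand-in for `SmallVec<[Inst; 2]>`: inline storage plus a length, no heap.
struct InlineInsts {
    buf: [Inst; 2],
    len: usize,
}

impl InlineInsts {
    fn as_slice(&self) -> &[Inst] {
        &self.buf[..self.len]
    }
}

/// Hypothetical finalization: small offsets fit in the load itself, while
/// larger ones are first materialized into a scratch register.
fn mem_finalize_inline(rd: u8, base: u8, off: i64) -> InlineInsts {
    if (0..4096).contains(&off) {
        InlineInsts { buf: [Inst::LoadImmOff { rd, base, off }, Inst::Nop], len: 1 }
    } else {
        InlineInsts {
            buf: [Inst::MovImm { rd: SCRATCH, imm: off }, Inst::LoadRegOff { rd, base, index: SCRATCH }],
            len: 2,
        }
    }
}

/// The old shape: same instructions, but copied into a freshly allocated Vec.
fn mem_finalize_vec(rd: u8, base: u8, off: i64) -> Vec<Inst> {
    mem_finalize_inline(rd, base, off).as_slice().to_vec()
}
```

With a real SmallVec the same holds only while the result fits in the inline capacity. Beyond that, it spills to the heap just like a Vec.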

Second commit adapts to changes in regalloc.rs brought by https://github.com/bytecodealliance/regalloc.rs/pull/71, which reduce the need for short-lived Sets and avoid some hashing computations.
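The general idea behind dropping short-lived sets can be sketched as follows. Instead of building fresh hash sets of used and defined registers for every instruction, the caller owns one collector whose vectors are cleared and refilled. Their capacity is reused and nothing is hashed. This is a hypothetical illustration: `UsageCollector`, `get_regs` and this `Inst` shape are made up here and are not the regalloc.rs API introduced by that PR.

```rust
use std::collections::HashSet;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct Reg(u32);

// Hypothetical two-instruction ISA, just enough to have uses and defs.
enum Inst {
    Add { rd: Reg, rn: Reg, rm: Reg },
    Mov { rd: Reg, rn: Reg },
}

/// Before: every query allocates two sets and hashes every register.
fn reg_usage_sets(inst: &Inst) -> (HashSet<Reg>, HashSet<Reg>) {
    let (mut uses, mut defs) = (HashSet::new(), HashSet::new());
    match *inst {
        Inst::Add { rd, rn, rm } => {
            uses.insert(rn);
            uses.insert(rm);
            defs.insert(rd);
        }
        Inst::Mov { rd, rn } => {
            uses.insert(rn);
            defs.insert(rd);
        }
    }
    (uses, defs)
}

/// After: a caller-owned collector backed by plain vectors. Unlike a set it
/// may hold duplicates (e.g. `add r1, r2, r2`), so consumers must tolerate them.
#[derive(Default)]
struct UsageCollector {
    uses: Vec<Reg>,
    defs: Vec<Reg>,
}

impl UsageCollector {
    /// Empties both lists but keeps their capacity for the next instruction.
    fn clear(&mut self) {
        self.uses.clear();
        self.defs.clear();
    }
}

/// Appends the registers read and written by `inst` to the collector.
fn get_regs(inst: &Inst, c: &mut UsageCollector) {
    match *inst {
        Inst::Add { rd, rn, rm } => {
            c.uses.extend([rn, rm]);
            c.defs.push(rd);
        }
        Inst::Mov { rd, rn } => {
            c.uses.push(rn);
            c.defs.push(rd);
        }
    }
}

/// Walks a block with a single collector, reusing its buffers throughout.
fn total_uses(block: &[Inst]) -> usize {
    let mut c = UsageCollector::default();
    let mut n = 0;
    for inst in block {
        c.clear();
        get_regs(inst, &mut c);
        n += c.uses.len();
    }
    n
}
```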

Third commit tries to shrink the Inst enum down to 32 bytes. Unfortunately, this means more pointer chasing for Call/JTSequence instructions (their members now live in separate *Info data structures), so I'm not sure it's something we'd take as is. Maybe we could mitigate the indirection by reverting some of the field moves, putting those fields back next to the box; it's unclear which ones are accessed most often, and thus which should be prioritized.
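Here is a sketch of the trade-off, using made-up instruction shapes rather than the real aarch64 Inst. Moving a large variant's payload behind a Box shrinks every instruction in the enum, at the cost of one extra pointer dereference whenever that payload is read. `HybridInst` illustrates the mitigation mentioned above, keeping one field (assumed here to be hot) inline next to the box. Exact sizes depend on the target and on rustc's layout choices, so the checks only bound them.

```rust
use std::mem::size_of;

#[derive(Clone, Copy, Debug, PartialEq)]
struct Reg(u32);

/// Hypothetical call payload; the fields are illustrative only.
struct CallInfo {
    dest: u64,
    uses: Vec<Reg>,
    defs: Vec<Reg>,
    srcloc: u32,
}

/// Everything inline: every instruction pays for the largest variant.
enum FatInst {
    AluRRR { rd: Reg, rn: Reg, rm: Reg },
    Call { dest: u64, uses: Vec<Reg>, defs: Vec<Reg>, srcloc: u32 },
}

/// Payload boxed: small instructions stay small, calls add an indirection.
enum SlimInst {
    AluRRR { rd: Reg, rn: Reg, rm: Reg },
    Call { info: Box<CallInfo> },
}

/// Cold call fields, boxed separately in the hybrid layout.
struct CallRest {
    uses: Vec<Reg>,
    defs: Vec<Reg>,
    srcloc: u32,
}

/// Hybrid: the (assumed) hot field `dest` stays next to the box.
enum HybridInst {
    AluRRR { rd: Reg, rn: Reg, rm: Reg },
    Call { dest: u64, rest: Box<CallRest> },
}

/// Reading the call target from `SlimInst` goes through the heap pointer...
fn slim_call_dest(inst: &SlimInst) -> Option<u64> {
    match inst {
        SlimInst::Call { info } => Some(info.dest),
        SlimInst::AluRRR { .. } => None,
    }
}

/// ...while `HybridInst` reads it straight from the enum.
fn hybrid_call_dest(inst: &HybridInst) -> Option<u64> {
    match inst {
        HybridInst::Call { dest, .. } => Some(*dest),
        HybridInst::AluRRR { .. } => None,
    }
}
```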

All in all, including the changes from https://github.com/bytecodealliance/regalloc.rs/pull/71, the results are as follows:

benchmark       variant   bytes allocated   blocks allocated   instruction count
big.clif        before    389M              458K               4966M
big.clif        after     317M              424K               4798M
medium.clif     before    17.9M             33K                173M
medium.clif     after     14.8M             22K                155M
regex-rs.wasm   before    700M              1.942M             5030M
regex-rs.wasm   after     586M              1.619M             4582M
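To put the table in relative terms, each reduction is (before - after) / before. The small helper below computes this for every row, using the numbers exactly as listed. The M and K suffixes are dropped because they cancel within a row.

```rust
/// Relative reduction from `before` to `after`, in percent.
fn pct_reduction(before: f64, after: f64) -> f64 {
    (before - after) / before * 100.0
}

/// (benchmark, metric, before, after), taken from the table; units cancel per row.
const ROWS: &[(&str, &str, f64, f64)] = &[
    ("big.clif", "bytes allocated", 389.0, 317.0),
    ("big.clif", "blocks allocated", 458.0, 424.0),
    ("big.clif", "instructions", 4966.0, 4798.0),
    ("medium.clif", "bytes allocated", 17.9, 14.8),
    ("medium.clif", "blocks allocated", 33.0, 22.0),
    ("medium.clif", "instructions", 173.0, 155.0),
    ("regex-rs.wasm", "bytes allocated", 700.0, 586.0),
    ("regex-rs.wasm", "blocks allocated", 1.942, 1.619),
    ("regex-rs.wasm", "instructions", 5030.0, 4582.0),
];

/// One formatted line per row, with the reduction rounded to 0.1%.
fn report() -> Vec<String> {
    ROWS.iter()
        .map(|&(bench, metric, before, after)| {
            format!("{bench:<14} {metric:<17} -{:.1}%", pct_reduction(before, after))
        })
        .collect()
}
```

By this arithmetic, bytes allocated fall by about 16 to 19%, allocated blocks by about 7 to 33%, and instruction counts by about 3 to 10%.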

It's hard to measure the wall-clock effect, because my machine is quite noisy (despite setting the CPU governor to performance, etc.) and the benchmarks are very short-running. Instruction counts are pretty stable, though, and for the final numbers I measured them with Cachegrind, just to be sure.

cc @julian-seward1

Wasmtime GitHub notifications bot (Jun 01 2020 at 16:48):

bnjbvr requested cfallin for a review on PR #1798.

Wasmtime GitHub notifications bot (Jun 01 2020 at 16:57):

cfallin submitted PR Review.

Wasmtime GitHub notifications bot (Jun 01 2020 at 16:57):

cfallin created PR Review Comment:

Pre-existing, but I just realized that we don't document this side effect of emit_call on the ABICall trait. Perhaps add a note to the doc comment (in the trait definition) that emit_call() should only be called once, because it is allowed to reuse parts of the ABICall object when emitting instructions?
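Below is a minimal sketch of the kind of doc comment being suggested. It uses a hypothetical, simplified trait (`CallLowering`, `SimpleCall` and this `Inst` are made up; the real ABICall trait has a different signature). The implementation moves its use/def lists into the emitted instruction with `std::mem::take` rather than cloning them. That is exactly why a second call would produce a wrong instruction.

```rust
use std::mem;

#[derive(Clone, Copy, Debug, PartialEq)]
struct Reg(u32);

#[derive(Debug, PartialEq)]
enum Inst {
    Call { dest: u64, uses: Vec<Reg>, defs: Vec<Reg> },
}

/// Hypothetical stand-in for an ABI call-lowering object.
trait CallLowering {
    /// Emits the call instruction into `out`.
    ///
    /// This must be called at most once per object: implementations may move
    /// parts of `self` (such as the use/def lists) into the emitted
    /// instruction instead of copying them, leaving `self` depleted.
    fn emit_call(&mut self, out: &mut Vec<Inst>);
}

struct SimpleCall {
    dest: u64,
    uses: Vec<Reg>,
    defs: Vec<Reg>,
}

impl CallLowering for SimpleCall {
    fn emit_call(&mut self, out: &mut Vec<Inst>) {
        // `mem::take` leaves empty Vecs behind, avoiding a clone and its allocation.
        out.push(Inst::Call {
            dest: self.dest,
            uses: mem::take(&mut self.uses),
            defs: mem::take(&mut self.defs),
        });
    }
}
```

A stricter alternative is to take `self` by value (or `self: Box<Self>` for trait objects) so the compiler rejects a second call, at the cost of consuming the object.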

Wasmtime GitHub notifications bot (Jun 02 2020 at 13:30):

bnjbvr updated PR #1798 from memory-enhancements to master.

Wasmtime GitHub notifications bot (Jun 02 2020 at 13:32):

bnjbvr updated PR #1798 from memory-enhancements to master.

Wasmtime GitHub notifications bot (Jun 02 2020 at 13:32):

bnjbvr submitted PR Review.

Wasmtime GitHub notifications bot (Jun 02 2020 at 13:32):

bnjbvr created PR Review Comment:

Good point, added!

Wasmtime GitHub notifications bot (Jun 02 2020 at 14:29):

bnjbvr merged PR #1798.

Last updated: Apr 17 2025 at 15:03 UTC