fitzgen opened issue #5183:
This could maybe give us some small perf gains due to a smaller working set that better fits in cache.
SigData
is defined here: https://cs.github.com/bytecodealliance/wasmtime/blob/348f962d23df0a598ea80629ca6d8e4a158fe153/cranelift/codegen/src/machinst/abi.rs?q=SigData#L601-L627The two changes we could make to shrink its size are:
We have implementation limits on the number of arguments and returns from a function, so we don't need a fulle
Range<u32>
to represent the subslice inSigSet::abi_args
for this signature's arguments and returns. Instead we could have au32
start and au16
length. Doing this for both args and returns would save a total of 4 bytes.
SigData::stack_reg_arg
is currently anOption<usize>
but, again because of implementation limits on the number of returns, could beOption<u16>
. This would save 12 bytes for this field, but I think we might end up actually saving only 8 on the struct because of alignment.For anyone who picks this up, we can measure the impact this change has via
Building the bench API on
main
:
$ git checkout main $ cargo build --release -p wasmtime-bench-api $ cp target/release/libwasmtime_bench_api.so /tmp/main.so
Building the bench API on the feature branch for this patch:
$ git checkout my-feature-branch $ cargo build --release -p wasmtime-bench-api $ cp target/release/libwasmtime_bench_api.so /tmp/smaller-sig-data.so
Running the sightglass benchmarks:
$ cd ~/path/to/sightglass $ cargo run --release -- benchmark -e /tmp/main.so -e /tmp/smaller-sig-data.so --stop-after compilation -- benchmarks/default.suite
(You can also try passing--measure perf-counters
to see the effects on cache accesses/misses if you're on linux.)
fitzgen labeled issue #5183:
This could maybe give us some small perf gains due to a smaller working set that better fits in cache.
SigData
is defined here: https://cs.github.com/bytecodealliance/wasmtime/blob/348f962d23df0a598ea80629ca6d8e4a158fe153/cranelift/codegen/src/machinst/abi.rs?q=SigData#L601-L627The two changes we could make to shrink its size are:
We have implementation limits on the number of arguments and returns from a function, so we don't need a fulle
Range<u32>
to represent the subslice inSigSet::abi_args
for this signature's arguments and returns. Instead we could have au32
start and au16
length. Doing this for both args and returns would save a total of 4 bytes.
SigData::stack_reg_arg
is currently anOption<usize>
but, again because of implementation limits on the number of returns, could beOption<u16>
. This would save 12 bytes for this field, but I think we might end up actually saving only 8 on the struct because of alignment.For anyone who picks this up, we can measure the impact this change has via
Building the bench API on
main
:
$ git checkout main $ cargo build --release -p wasmtime-bench-api $ cp target/release/libwasmtime_bench_api.so /tmp/main.so
Building the bench API on the feature branch for this patch:
$ git checkout my-feature-branch $ cargo build --release -p wasmtime-bench-api $ cp target/release/libwasmtime_bench_api.so /tmp/smaller-sig-data.so
Running the sightglass benchmarks:
$ cd ~/path/to/sightglass $ cargo run --release -- benchmark -e /tmp/main.so -e /tmp/smaller-sig-data.so --stop-after compilation -- benchmarks/default.suite
(You can also try passing--measure perf-counters
to see the effects on cache accesses/misses if you're on linux.)
fitzgen labeled issue #5183:
This could maybe give us some small perf gains due to a smaller working set that better fits in cache.
SigData
is defined here: https://cs.github.com/bytecodealliance/wasmtime/blob/348f962d23df0a598ea80629ca6d8e4a158fe153/cranelift/codegen/src/machinst/abi.rs?q=SigData#L601-L627The two changes we could make to shrink its size are:
We have implementation limits on the number of arguments and returns from a function, so we don't need a fulle
Range<u32>
to represent the subslice inSigSet::abi_args
for this signature's arguments and returns. Instead we could have au32
start and au16
length. Doing this for both args and returns would save a total of 4 bytes.
SigData::stack_reg_arg
is currently anOption<usize>
but, again because of implementation limits on the number of returns, could beOption<u16>
. This would save 12 bytes for this field, but I think we might end up actually saving only 8 on the struct because of alignment.For anyone who picks this up, we can measure the impact this change has via
Building the bench API on
main
:
$ git checkout main $ cargo build --release -p wasmtime-bench-api $ cp target/release/libwasmtime_bench_api.so /tmp/main.so
Building the bench API on the feature branch for this patch:
$ git checkout my-feature-branch $ cargo build --release -p wasmtime-bench-api $ cp target/release/libwasmtime_bench_api.so /tmp/smaller-sig-data.so
Running the sightglass benchmarks:
$ cd ~/path/to/sightglass $ cargo run --release -- benchmark -e /tmp/main.so -e /tmp/smaller-sig-data.so --stop-after compilation -- benchmarks/default.suite
(You can also try passing--measure perf-counters
to see the effects on cache accesses/misses if you're on linux.)
fitzgen labeled issue #5183:
This could maybe give us some small perf gains due to a smaller working set that better fits in cache.
SigData
is defined here: https://cs.github.com/bytecodealliance/wasmtime/blob/348f962d23df0a598ea80629ca6d8e4a158fe153/cranelift/codegen/src/machinst/abi.rs?q=SigData#L601-L627The two changes we could make to shrink its size are:
We have implementation limits on the number of arguments and returns from a function, so we don't need a fulle
Range<u32>
to represent the subslice inSigSet::abi_args
for this signature's arguments and returns. Instead we could have au32
start and au16
length. Doing this for both args and returns would save a total of 4 bytes.
SigData::stack_reg_arg
is currently anOption<usize>
but, again because of implementation limits on the number of returns, could beOption<u16>
. This would save 12 bytes for this field, but I think we might end up actually saving only 8 on the struct because of alignment.For anyone who picks this up, we can measure the impact this change has via
Building the bench API on
main
:
$ git checkout main $ cargo build --release -p wasmtime-bench-api $ cp target/release/libwasmtime_bench_api.so /tmp/main.so
Building the bench API on the feature branch for this patch:
$ git checkout my-feature-branch $ cargo build --release -p wasmtime-bench-api $ cp target/release/libwasmtime_bench_api.so /tmp/smaller-sig-data.so
Running the sightglass benchmarks:
$ cd ~/path/to/sightglass $ cargo run --release -- benchmark -e /tmp/main.so -e /tmp/smaller-sig-data.so --stop-after compilation -- benchmarks/default.suite
(You can also try passing--measure perf-counters
to see the effects on cache accesses/misses if you're on linux.)
jameysharp commented on issue #5183:
It might also be possible to entirely remove the end-points of the argument and return ranges. I haven't checked carefully but I think the end-point in one
SigData
is always equal to the start-point in the nextSigData
, and the lastSigData
's end-point is the length of the corresponding backing array.If we move all the
SigData
methods that take aSigSet
to be onSigSet
instead and take aSig
index—which I think is a good idea for API design reasons anyway—then it's easy to access theSig+1
index.Are there any constraints on where the stack-ret arg can appear in the list? For example, if it's always the first argument, we could replace the
Option<usize>
with abool
. If there are no constraints, we could still cut the field's size in half by getting rid of theOption
, if we can pick a reserved value likeu16::MAX
to indicate that there's no stack-ret arg.Also, do the
sized_stack_*_space
fields need to bei64
? For one thing, what does it mean to reserve a negative amount of stack space? For another, surely 32 bits is enough for the size of a stack frame...?All together, I think we can shrink
SigData
from 7u64
to 5u32
, taking padding and alignment into account, for a savings of 64%.
fitzgen commented on issue #5183:
It might also be possible to entirely remove the end-points of the argument and return ranges. I haven't checked carefully but I think the end-point in one
SigData
is always equal to the start-point in the nextSigData
, and the lastSigData
's end-point is the length of the corresponding backing array.If we move all the
SigData
methods that take aSigSet
to be onSigSet
instead and take aSig
index—which I think is a good idea for API design reasons anyway—then it's easy to access theSig+1
index.Yeah, this is possible but a larger change and probably not a
good first issue
anymore.Are there any constraints on where the stack-ret arg can appear in the list? For example, if it's always the first argument, we could replace the
Option<usize>
with abool
. If there are no constraints, we could still cut the field's size in half by getting rid of theOption
, if we can pick a reserved value likeu16::MAX
to indicate that there's no stack-ret arg.There aren't, but in practice I think it is always the first. Maybe we could just start enforcing that? Again, probably not in the realm of a
good first issue
though.Also, do the
sized_stack_*_space
fields need to bei64
? For one thing, what does it mean to reserve a negative amount of stack space? For another, surely 32 bits is enough for the size of a stack frame...?Yeah I was kinda wondering that too. Haven't investigated it, so didn't want to include it in a
good first issue
.
tnachen commented on issue #5183:
I'll try out changing the SigData methods as suggested with Sigset
jameysharp commented on issue #5183:
Let us know if you have any questions! I'm excited that you're digging into this.
TornaxO7 commented on issue #5183:
May I ask what's still missing for this issue?
jameysharp closed issue #5183:
This could maybe give us some small perf gains due to a smaller working set that better fits in cache.
SigData
is defined here: https://cs.github.com/bytecodealliance/wasmtime/blob/348f962d23df0a598ea80629ca6d8e4a158fe153/cranelift/codegen/src/machinst/abi.rs?q=SigData#L601-L627The two changes we could make to shrink its size are:
We have implementation limits on the number of arguments and returns from a function, so we don't need a fulle
Range<u32>
to represent the subslice inSigSet::abi_args
for this signature's arguments and returns. Instead we could have au32
start and au16
length. Doing this for both args and returns would save a total of 4 bytes.
SigData::stack_reg_arg
is currently anOption<usize>
but, again because of implementation limits on the number of returns, could beOption<u16>
. This would save 12 bytes for this field, but I think we might end up actually saving only 8 on the struct because of alignment.For anyone who picks this up, we can measure the impact this change has via
Building the bench API on
main
:
$ git checkout main $ cargo build --release -p wasmtime-bench-api $ cp target/release/libwasmtime_bench_api.so /tmp/main.so
Building the bench API on the feature branch for this patch:
$ git checkout my-feature-branch $ cargo build --release -p wasmtime-bench-api $ cp target/release/libwasmtime_bench_api.so /tmp/smaller-sig-data.so
Running the sightglass benchmarks:
$ cd ~/path/to/sightglass $ cargo run --release -- benchmark -e /tmp/main.so -e /tmp/smaller-sig-data.so --stop-after compilation -- benchmarks/default.suite
(You can also try passing--measure perf-counters
to see the effects on cache accesses/misses if you're on linux.)
jameysharp commented on issue #5183:
Good question! I don't remember for sure.
I think the only remaining possibility for shrinking this struct further is the
Option<u16>
storingstack_ret_arg
, which is four bytes with one byte of padding in the middle. I think theisa::CallConv
type is one byte, so with all the other fields being four bytes, thecall_conv
field has three bytes of padding. Therefore if we can pack the calling convention in with the return argument index, we can save four bytes.However, I don't think that's a "good first issue". I'm not even sure it's worth doing.
So I'm going to go ahead and close this because I think it's basically done. Thanks for checking!
Last updated: Dec 23 2024 at 12:05 UTC