Stream: git-wasmtime

Topic: wasmtime / issue #5183 Cranelift: shrink the size of `Sig...


view this post on Zulip Wasmtime GitHub notifications bot (Nov 02 2022 at 21:00):

fitzgen opened issue #5183:

This could maybe give us some small perf gains due to a smaller working set that better fits in cache.

SigData is defined here: https://cs.github.com/bytecodealliance/wasmtime/blob/348f962d23df0a598ea80629ca6d8e4a158fe153/cranelift/codegen/src/machinst/abi.rs?q=SigData#L601-L627

The two changes we could make to shrink its size are:

  1. We have implementation limits on the number of arguments and returns from a function, so we don't need a fulle Range<u32> to represent the subslice in SigSet::abi_args for this signature's arguments and returns. Instead we could have a u32 start and a u16 length. Doing this for both args and returns would save a total of 4 bytes.

  2. SigData::stack_reg_arg is currently an Option<usize> but, again because of implementation limits on the number of returns, could be Option<u16>. This would save 12 bytes for this field, but I think we might end up actually saving only 8 on the struct because of alignment.

For anyone who picks this up, we can measure the impact this change has via

  1. Building the bench API on main:
    $ git checkout main $ cargo build --release -p wasmtime-bench-api $ cp target/release/libwasmtime_bench_api.so /tmp/main.so

  2. Building the bench API on the feature branch for this patch:
    $ git checkout my-feature-branch $ cargo build --release -p wasmtime-bench-api $ cp target/release/libwasmtime_bench_api.so /tmp/smaller-sig-data.so

  3. Running the sightglass benchmarks:
    $ cd ~/path/to/sightglass $ cargo run --release -- benchmark -e /tmp/main.so -e /tmp/smaller-sig-data.so --stop-after compilation -- benchmarks/default.suite
    (You can also try passing --measure perf-counters to see the effects on cache accesses/misses if you're on linux.)

view this post on Zulip Wasmtime GitHub notifications bot (Nov 02 2022 at 21:00):

fitzgen labeled issue #5183:

This could maybe give us some small perf gains due to a smaller working set that better fits in cache.

SigData is defined here: https://cs.github.com/bytecodealliance/wasmtime/blob/348f962d23df0a598ea80629ca6d8e4a158fe153/cranelift/codegen/src/machinst/abi.rs?q=SigData#L601-L627

The two changes we could make to shrink its size are:

  1. We have implementation limits on the number of arguments and returns from a function, so we don't need a fulle Range<u32> to represent the subslice in SigSet::abi_args for this signature's arguments and returns. Instead we could have a u32 start and a u16 length. Doing this for both args and returns would save a total of 4 bytes.

  2. SigData::stack_reg_arg is currently an Option<usize> but, again because of implementation limits on the number of returns, could be Option<u16>. This would save 12 bytes for this field, but I think we might end up actually saving only 8 on the struct because of alignment.

For anyone who picks this up, we can measure the impact this change has via

  1. Building the bench API on main:
    $ git checkout main $ cargo build --release -p wasmtime-bench-api $ cp target/release/libwasmtime_bench_api.so /tmp/main.so

  2. Building the bench API on the feature branch for this patch:
    $ git checkout my-feature-branch $ cargo build --release -p wasmtime-bench-api $ cp target/release/libwasmtime_bench_api.so /tmp/smaller-sig-data.so

  3. Running the sightglass benchmarks:
    $ cd ~/path/to/sightglass $ cargo run --release -- benchmark -e /tmp/main.so -e /tmp/smaller-sig-data.so --stop-after compilation -- benchmarks/default.suite
    (You can also try passing --measure perf-counters to see the effects on cache accesses/misses if you're on linux.)

view this post on Zulip Wasmtime GitHub notifications bot (Nov 02 2022 at 21:00):

fitzgen labeled issue #5183:

This could maybe give us some small perf gains due to a smaller working set that better fits in cache.

SigData is defined here: https://cs.github.com/bytecodealliance/wasmtime/blob/348f962d23df0a598ea80629ca6d8e4a158fe153/cranelift/codegen/src/machinst/abi.rs?q=SigData#L601-L627

The two changes we could make to shrink its size are:

  1. We have implementation limits on the number of arguments and returns from a function, so we don't need a fulle Range<u32> to represent the subslice in SigSet::abi_args for this signature's arguments and returns. Instead we could have a u32 start and a u16 length. Doing this for both args and returns would save a total of 4 bytes.

  2. SigData::stack_reg_arg is currently an Option<usize> but, again because of implementation limits on the number of returns, could be Option<u16>. This would save 12 bytes for this field, but I think we might end up actually saving only 8 on the struct because of alignment.

For anyone who picks this up, we can measure the impact this change has via

  1. Building the bench API on main:
    $ git checkout main $ cargo build --release -p wasmtime-bench-api $ cp target/release/libwasmtime_bench_api.so /tmp/main.so

  2. Building the bench API on the feature branch for this patch:
    $ git checkout my-feature-branch $ cargo build --release -p wasmtime-bench-api $ cp target/release/libwasmtime_bench_api.so /tmp/smaller-sig-data.so

  3. Running the sightglass benchmarks:
    $ cd ~/path/to/sightglass $ cargo run --release -- benchmark -e /tmp/main.so -e /tmp/smaller-sig-data.so --stop-after compilation -- benchmarks/default.suite
    (You can also try passing --measure perf-counters to see the effects on cache accesses/misses if you're on linux.)

view this post on Zulip Wasmtime GitHub notifications bot (Nov 02 2022 at 21:00):

fitzgen labeled issue #5183:

This could maybe give us some small perf gains due to a smaller working set that better fits in cache.

SigData is defined here: https://cs.github.com/bytecodealliance/wasmtime/blob/348f962d23df0a598ea80629ca6d8e4a158fe153/cranelift/codegen/src/machinst/abi.rs?q=SigData#L601-L627

The two changes we could make to shrink its size are:

  1. We have implementation limits on the number of arguments and returns from a function, so we don't need a fulle Range<u32> to represent the subslice in SigSet::abi_args for this signature's arguments and returns. Instead we could have a u32 start and a u16 length. Doing this for both args and returns would save a total of 4 bytes.

  2. SigData::stack_reg_arg is currently an Option<usize> but, again because of implementation limits on the number of returns, could be Option<u16>. This would save 12 bytes for this field, but I think we might end up actually saving only 8 on the struct because of alignment.

For anyone who picks this up, we can measure the impact this change has via

  1. Building the bench API on main:
    $ git checkout main $ cargo build --release -p wasmtime-bench-api $ cp target/release/libwasmtime_bench_api.so /tmp/main.so

  2. Building the bench API on the feature branch for this patch:
    $ git checkout my-feature-branch $ cargo build --release -p wasmtime-bench-api $ cp target/release/libwasmtime_bench_api.so /tmp/smaller-sig-data.so

  3. Running the sightglass benchmarks:
    $ cd ~/path/to/sightglass $ cargo run --release -- benchmark -e /tmp/main.so -e /tmp/smaller-sig-data.so --stop-after compilation -- benchmarks/default.suite
    (You can also try passing --measure perf-counters to see the effects on cache accesses/misses if you're on linux.)

view this post on Zulip Wasmtime GitHub notifications bot (Nov 02 2022 at 21:43):

jameysharp commented on issue #5183:

It might also be possible to entirely remove the end-points of the argument and return ranges. I haven't checked carefully but I think the end-point in one SigData is always equal to the start-point in the next SigData, and the last SigData's end-point is the length of the corresponding backing array.

If we move all the SigData methods that take a SigSet to be on SigSet instead and take a Sig index—which I think is a good idea for API design reasons anyway—then it's easy to access the Sig+1 index.

Are there any constraints on where the stack-ret arg can appear in the list? For example, if it's always the first argument, we could replace the Option<usize> with a bool. If there are no constraints, we could still cut the field's size in half by getting rid of the Option, if we can pick a reserved value like u16::MAX to indicate that there's no stack-ret arg.

Also, do the sized_stack_*_space fields need to be i64? For one thing, what does it mean to reserve a negative amount of stack space? For another, surely 32 bits is enough for the size of a stack frame...?

All together, I think we can shrink SigData from 7 u64 to 5 u32, taking padding and alignment into account, for a savings of 64%.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 03 2022 at 15:37):

fitzgen commented on issue #5183:

It might also be possible to entirely remove the end-points of the argument and return ranges. I haven't checked carefully but I think the end-point in one SigData is always equal to the start-point in the next SigData, and the last SigData's end-point is the length of the corresponding backing array.

If we move all the SigData methods that take a SigSet to be on SigSet instead and take a Sig index—which I think is a good idea for API design reasons anyway—then it's easy to access the Sig+1 index.

Yeah, this is possible but a larger change and probably not a good first issue anymore.

Are there any constraints on where the stack-ret arg can appear in the list? For example, if it's always the first argument, we could replace the Option<usize> with a bool. If there are no constraints, we could still cut the field's size in half by getting rid of the Option, if we can pick a reserved value like u16::MAX to indicate that there's no stack-ret arg.

There aren't, but in practice I think it is always the first. Maybe we could just start enforcing that? Again, probably not in the realm of a good first issue though.

Also, do the sized_stack_*_space fields need to be i64? For one thing, what does it mean to reserve a negative amount of stack space? For another, surely 32 bits is enough for the size of a stack frame...?

Yeah I was kinda wondering that too. Haven't investigated it, so didn't want to include it in a good first issue.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 18 2022 at 07:37):

tnachen commented on issue #5183:

I'll try out changing the SigData methods as suggested with Sigset

view this post on Zulip Wasmtime GitHub notifications bot (Nov 18 2022 at 17:19):

jameysharp commented on issue #5183:

Let us know if you have any questions! I'm excited that you're digging into this.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 30 2023 at 22:41):

TornaxO7 commented on issue #5183:

May I ask what's still missing for this issue?

view this post on Zulip Wasmtime GitHub notifications bot (Dec 01 2023 at 00:31):

jameysharp closed issue #5183:

This could maybe give us some small perf gains due to a smaller working set that better fits in cache.

SigData is defined here: https://cs.github.com/bytecodealliance/wasmtime/blob/348f962d23df0a598ea80629ca6d8e4a158fe153/cranelift/codegen/src/machinst/abi.rs?q=SigData#L601-L627

The two changes we could make to shrink its size are:

  1. We have implementation limits on the number of arguments and returns from a function, so we don't need a fulle Range<u32> to represent the subslice in SigSet::abi_args for this signature's arguments and returns. Instead we could have a u32 start and a u16 length. Doing this for both args and returns would save a total of 4 bytes.

  2. SigData::stack_reg_arg is currently an Option<usize> but, again because of implementation limits on the number of returns, could be Option<u16>. This would save 12 bytes for this field, but I think we might end up actually saving only 8 on the struct because of alignment.

For anyone who picks this up, we can measure the impact this change has via

  1. Building the bench API on main:
    $ git checkout main $ cargo build --release -p wasmtime-bench-api $ cp target/release/libwasmtime_bench_api.so /tmp/main.so

  2. Building the bench API on the feature branch for this patch:
    $ git checkout my-feature-branch $ cargo build --release -p wasmtime-bench-api $ cp target/release/libwasmtime_bench_api.so /tmp/smaller-sig-data.so

  3. Running the sightglass benchmarks:
    $ cd ~/path/to/sightglass $ cargo run --release -- benchmark -e /tmp/main.so -e /tmp/smaller-sig-data.so --stop-after compilation -- benchmarks/default.suite
    (You can also try passing --measure perf-counters to see the effects on cache accesses/misses if you're on linux.)

view this post on Zulip Wasmtime GitHub notifications bot (Dec 01 2023 at 00:31):

jameysharp commented on issue #5183:

Good question! I don't remember for sure.

I think the only remaining possibility for shrinking this struct further is the Option<u16> storing stack_ret_arg, which is four bytes with one byte of padding in the middle. I think the isa::CallConv type is one byte, so with all the other fields being four bytes, the call_conv field has three bytes of padding. Therefore if we can pack the calling convention in with the return argument index, we can save four bytes.

However, I don't think that's a "good first issue". I'm not even sure it's worth doing.

So I'm going to go ahead and close this because I think it's basically done. Thanks for checking!


Last updated: Nov 22 2024 at 16:03 UTC