fitzgen opened issue #5183:
This could maybe give us some small perf gains due to a smaller working set that better fits in cache.

`SigData` is defined here: https://cs.github.com/bytecodealliance/wasmtime/blob/348f962d23df0a598ea80629ca6d8e4a158fe153/cranelift/codegen/src/machinst/abi.rs?q=SigData#L601-L627

The two changes we could make to shrink its size are:

- We have implementation limits on the number of arguments and returns from a function, so we don't need a full `Range<u32>` to represent the subslice in `SigSet::abi_args` for this signature's arguments and returns. Instead we could have a `u32` start and a `u16` length. Doing this for both args and returns would save a total of 4 bytes.
- `SigData::stack_ret_arg` is currently an `Option<usize>` but, again because of implementation limits on the number of returns, could be `Option<u16>`. This would save 12 bytes for this field, but I think we might end up actually saving only 8 on the struct because of alignment.

For anyone who picks this up, we can measure the impact this change has via

Building the bench API on `main`:

    $ git checkout main
    $ cargo build --release -p wasmtime-bench-api
    $ cp target/release/libwasmtime_bench_api.so /tmp/main.so

Building the bench API on the feature branch for this patch:

    $ git checkout my-feature-branch
    $ cargo build --release -p wasmtime-bench-api
    $ cp target/release/libwasmtime_bench_api.so /tmp/smaller-sig-data.so

Running the sightglass benchmarks:

    $ cd ~/path/to/sightglass
    $ cargo run --release -- benchmark -e /tmp/main.so -e /tmp/smaller-sig-data.so --stop-after compilation -- benchmarks/default.suite

(You can also try passing `--measure perf-counters` to see the effects on cache accesses/misses if you're on Linux.)
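The byte counts quoted in the issue can be checked with `std::mem::size_of`. What follows is only a rough sketch, not the actual Cranelift definition: `Before` and `After` are hypothetical structs that hold just the three fields the issue talks about, their field names are made up for illustration, and the 12-byte figure assumes a 64-bit target.

```rust
use std::mem::size_of;
use std::ops::Range;

/// Hypothetical stand-in for the fields discussed in the issue (not the real `SigData`).
#[allow(dead_code)]
struct Before {
    args: Range<u32>,
    rets: Range<u32>,
    stack_ret_arg: Option<usize>,
}

/// The same fields after both proposed changes (field names are assumptions).
#[allow(dead_code)]
struct After {
    args_start: u32,
    args_len: u16,
    rets_start: u32,
    rets_len: u16,
    stack_ret_arg: Option<u16>,
}

/// Bytes saved by replacing two `Range<u32>` with two (`u32` start, `u16` length) pairs.
fn range_savings() -> usize {
    2 * size_of::<Range<u32>>() - 2 * (size_of::<u32>() + size_of::<u16>())
}

/// Bytes saved on the field itself by narrowing `Option<usize>` to `Option<u16>`.
fn stack_ret_arg_savings() -> usize {
    size_of::<Option<usize>>() - size_of::<Option<u16>>()
}

fn main() {
    println!("range savings: {} bytes", range_savings());
    println!("stack_ret_arg savings: {} bytes", stack_ret_arg_savings());
    println!(
        "Before: {} bytes, After: {} bytes",
        size_of::<Before>(),
        size_of::<After>()
    );
}
```

The whole-struct sizes are printed rather than hard-coded because the compiler is free to reorder fields and insert padding, which is the alignment caveat the issue mentions.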
fitzgen labeled issue #5183:
fitzgen labeled issue #5183:
fitzgen labeled issue #5183:
jameysharp commented on issue #5183:
It might also be possible to entirely remove the end-points of the argument and return ranges. I haven't checked carefully but I think the end-point in one `SigData` is always equal to the start-point in the next `SigData`, and the last `SigData`'s end-point is the length of the corresponding backing array.

If we move all the `SigData` methods that take a `SigSet` to be on `SigSet` instead and take a `Sig` index (which I think is a good idea for API design reasons anyway) then it's easy to access the `Sig+1` index.

Are there any constraints on where the stack-ret arg can appear in the list? For example, if it's always the first argument, we could replace the `Option<usize>` with a `bool`. If there are no constraints, we could still cut the field's size in half by getting rid of the `Option`, if we can pick a reserved value like `u16::MAX` to indicate that there's no stack-ret arg.

Also, do the `sized_stack_*_space` fields need to be `i64`? For one thing, what does it mean to reserve a negative amount of stack space? For another, surely 32 bits is enough for the size of a stack frame...?

All together, I think we can shrink `SigData` from 7 `u64` to 5 `u32`, taking padding and alignment into account, for a savings of 64%.
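The idea of dropping the range end-points could look roughly like the sketch below. It assumes the invariant that the comment itself says has not been carefully checked (each signature's arguments sit contiguously, directly before the next signature's). `Sig`, `SigData`, and `SigSet` here are simplified, hypothetical versions, and `push_sig`, `arg_range`, and `args` are illustrative names rather than Cranelift's API.

```rust
use std::ops::Range;

/// Hypothetical index of a signature within the set.
#[derive(Clone, Copy)]
struct Sig(u32);

/// Per-signature data that stores only where its arguments start.
struct SigData {
    args_start: u32,
}

/// Simplified, hypothetical `SigSet`; `abi_args` is the shared backing array.
struct SigSet {
    abi_args: Vec<&'static str>,
    sigs: Vec<SigData>,
}

impl SigSet {
    /// Appends a signature whose arguments are pushed contiguously onto `abi_args`.
    fn push_sig(&mut self, args: &[&'static str]) -> Sig {
        let sig = Sig(self.sigs.len() as u32);
        self.sigs.push(SigData {
            args_start: self.abi_args.len() as u32,
        });
        self.abi_args.extend_from_slice(args);
        sig
    }

    /// The end-point is the next signature's start, or the array length for the last one.
    fn arg_range(&self, sig: Sig) -> Range<usize> {
        let i = sig.0 as usize;
        let start = self.sigs[i].args_start as usize;
        let end = match self.sigs.get(i + 1) {
            Some(next) => next.args_start as usize,
            None => self.abi_args.len(),
        };
        start..end
    }

    /// Returns the argument slice for `sig`.
    fn args(&self, sig: Sig) -> &[&'static str] {
        &self.abi_args[self.arg_range(sig)]
    }
}

fn main() {
    let mut set = SigSet {
        abi_args: Vec::new(),
        sigs: Vec::new(),
    };
    let a = set.push_sig(&["i32", "i64"]);
    let b = set.push_sig(&[]);
    let c = set.push_sig(&["f32"]);
    println!("{:?} {:?} {:?}", set.args(a), set.args(b), set.args(c));
}
```

Putting the accessor on the set rather than on the per-signature data is what makes the `Sig+1` lookup possible, since a single `SigData` cannot see its neighbour.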
fitzgen commented on issue #5183:
> It might also be possible to entirely remove the end-points of the argument and return ranges. I haven't checked carefully but I think the end-point in one `SigData` is always equal to the start-point in the next `SigData`, and the last `SigData`'s end-point is the length of the corresponding backing array.
>
> If we move all the `SigData` methods that take a `SigSet` to be on `SigSet` instead and take a `Sig` index (which I think is a good idea for API design reasons anyway) then it's easy to access the `Sig+1` index.

Yeah, this is possible but a larger change and probably not a `good first issue` anymore.

> Are there any constraints on where the stack-ret arg can appear in the list? For example, if it's always the first argument, we could replace the `Option<usize>` with a `bool`. If there are no constraints, we could still cut the field's size in half by getting rid of the `Option`, if we can pick a reserved value like `u16::MAX` to indicate that there's no stack-ret arg.

There aren't, but in practice I think it is always the first. Maybe we could just start enforcing that? Again, probably not in the realm of a `good first issue` though.

> Also, do the `sized_stack_*_space` fields need to be `i64`? For one thing, what does it mean to reserve a negative amount of stack space? For another, surely 32 bits is enough for the size of a stack frame...?

Yeah I was kinda wondering that too. Haven't investigated it, so didn't want to include it in a `good first issue`.
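For the reserved-value idea discussed above, a minimal sketch might look like this. `StackRetArg` and `NO_STACK_RET_ARG` are hypothetical names invented for illustration; nothing here is taken from the Cranelift source. The wrapper keeps the `Option` form at the API boundary while storing a bare `u16`.

```rust
/// Reserved value meaning "no stack-ret arg" (an assumption for illustration).
const NO_STACK_RET_ARG: u16 = u16::MAX;

/// Hypothetical compact field: a bare `u16` instead of an `Option<u16>`.
#[derive(Clone, Copy, Debug, PartialEq)]
struct StackRetArg(u16);

impl StackRetArg {
    /// Encodes an optional index; returns `None` if the index collides with the sentinel.
    fn new(index: Option<u16>) -> Option<Self> {
        match index {
            Some(NO_STACK_RET_ARG) => None,
            Some(i) => Some(StackRetArg(i)),
            None => Some(StackRetArg(NO_STACK_RET_ARG)),
        }
    }

    /// Decodes back to the `Option` form that callers would use.
    fn get(self) -> Option<u16> {
        if self.0 == NO_STACK_RET_ARG {
            None
        } else {
            Some(self.0)
        }
    }
}

fn main() {
    let none = StackRetArg::new(None).unwrap();
    let first = StackRetArg::new(Some(0)).unwrap();
    println!("{:?} {:?}", none.get(), first.get());
    println!("size: {} bytes", std::mem::size_of::<StackRetArg>());
}
```

The cost of the sentinel is that index `u16::MAX` can no longer be represented, which only works if the implementation limit on arguments stays below that value.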
tnachen commented on issue #5183:
I'll try out changing the `SigData` methods as suggested with `SigSet`.
jameysharp commented on issue #5183:
Let us know if you have any questions! I'm excited that you're digging into this.
TornaxO7 commented on issue #5183:
May I ask what's still missing for this issue?
jameysharp closed issue #5183:
jameysharp commented on issue #5183:
Good question! I don't remember for sure.
I think the only remaining possibility for shrinking this struct further is the `Option<u16>` storing `stack_ret_arg`, which is four bytes with one byte of padding in the middle. I think the `isa::CallConv` type is one byte, so with all the other fields being four bytes, the `call_conv` field has three bytes of padding. Therefore if we can pack the calling convention in with the return argument index, we can save four bytes.

However, I don't think that's a "good first issue". I'm not even sure it's worth doing.
So I'm going to go ahead and close this because I think it's basically done. Thanks for checking!
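The padding arithmetic in this comment can be reproduced with a small layout experiment. This is a sketch under several assumptions: the struct and field names are hypothetical, the count of four 4-byte fields is a guess consistent with the earlier "5 `u32`" estimate, `u8` stands in for the one-byte `isa::CallConv`, and `#[repr(C)]` is used only to make the layout predictable (the real struct uses Rust's default layout).

```rust
use std::mem::size_of;

/// Roughly the layout described above: 4-byte fields, an `Option<u16>`, a one-byte call conv.
#[allow(dead_code)]
#[repr(C)]
struct Current {
    args_start: u32,
    rets_start: u32,
    sized_stack_arg_space: u32,
    sized_stack_ret_space: u32,
    stack_ret_arg: Option<u16>, // 4 bytes: tag, padding, value
    call_conv: u8,              // 1 byte, followed by 3 bytes of padding
}

/// The return-argument index and calling convention sharing one 4-byte slot.
#[allow(dead_code)]
#[repr(C)]
struct Packed {
    args_start: u32,
    rets_start: u32,
    sized_stack_arg_space: u32,
    sized_stack_ret_space: u32,
    stack_ret_arg: u16, // hypothetical encoding: u16::MAX means "none"
    call_conv: u8,      // 1 byte, followed by 1 byte of padding
}

/// Bytes saved by the packed layout.
fn packing_savings() -> usize {
    size_of::<Current>() - size_of::<Packed>()
}

fn main() {
    println!("Current: {} bytes", size_of::<Current>());
    println!("Packed:  {} bytes", size_of::<Packed>());
    println!("Saved:   {} bytes", packing_savings());
}
```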
Last updated: Dec 06 2025 at 06:05 UTC