alexcrichton opened PR #6303 from alexcrichton:save-less
to bytecodealliance:main
:
The AArch64 AAPCS calling convention, which all our backends use for register allocation, indicates that the low 64-bits of the v8-v15 registers are all caller-saved. Cranelift doesn't track precisely which registers are used, however, so it says the entire register is a caller-saved register. This currently interacts poorly where a call from one function to another where the ABIs differ forces the caller to save all of v8-v15 in the prologue and restore it in the epilogue.
Currently in all cases, however, this isn't actually necessary. The AArch64 backend also has an optimization where if both the caller and the callee are using the same ABI then the clobbers of a
call
instruction are not counted in the clobber set for the function since nothing new can be clobbered. This way ifv8
is never used, for example, it's not considered clobbered and it's not saved. This logic, however, is comparing ABIs exactly which means that different names for the same ABI, which don't differ in register allocation, force saves to happen.This comes up with trampolines generated by Wasmtime where the calling convention of the trampoline is
WasmtimeSystemV
, for example, where the callee (a wasm function) isFast
. Because these differ it means that all trampolines are generating saves/restores of registers, despite the actual underlying calling convention being the same.This commit updates the optimization that skips including a
call
instruction in the clobber set by comparing the caller and callee's ABI clobber sets. If both clobber the same registers then for the purposes of clobbers it's as-if they were the same ABI, so thecall
can be skipped.Overall this removes unnecessary saves/restores in trampolines generated by Cranelift and shrinks the size of spidermonkey.wasm by 2% after #6262.
<!--
Please make sure you include the following information:
If this work has been discussed elsewhere, please include a link to that
conversation. If it was discussed in an issue, just mention "issue #...".Explain why this change is needed. If the details are in an issue already,
this can be brief.Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.htmlPlease ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->
alexcrichton requested cfallin for a review on PR #6303.
alexcrichton requested wasmtime-compiler-reviewers for a review on PR #6303.
cfallin submitted PR review:
Looks good, thanks!
Strictly speaking the check could be even looser, I think: all that's required is that the clobbers allowed by the caller's ABI contain the clobbers allowed by the callee's ABI.
PRegSet
is a bitset (u128
) internally, so we could implement this withcaller_clobbers & callee_clobbers == callee_clobbers
(which testscallee_clobbers
⊂caller_clobbers
). Unfortunately this is an opaque newtype wrapper and I didn't think to add.into_bits()
at the time, and the definition is in regalloc2. If you feel super-motivated I'm happy to review a PR for that otherwise I think this is totally fine!
alexcrichton updated PR #6303.
cfallin submitted PR review.
alexcrichton merged PR #6303.
Last updated: Jan 24 2025 at 00:11 UTC