Stream: git-wasmtime

Topic: wasmtime / PR #6303 aarch64: Remove unnecessary saves on ...


view this post on Zulip Wasmtime GitHub notifications bot (Apr 28 2023 at 00:09):

alexcrichton opened PR #6303 from alexcrichton:save-less to bytecodealliance:main:

The AArch64 AAPCS calling convention, which all our backends use for register allocation, indicates that the low 64-bits of the v8-v15 registers are all caller-saved. Cranelift doesn't track precisely which registers are used, however, so it says the entire register is a caller-saved register. This currently interacts poorly where a call from one function to another where the ABIs differ forces the caller to save all of v8-v15 in the prologue and restore it in the epilogue.

Currently in all cases, however, this isn't actually necessary. The AArch64 backend also has an optimization where if both the caller and the callee are using the same ABI then the clobbers of a call instruction are not counted in the clobber set for the function since nothing new can be clobbered. This way if v8 is never used, for example, it's not considered clobbered and it's not saved. This logic, however, is comparing ABIs exactly which means that different names for the same ABI, which don't differ in register allocation, force saves to happen.

This comes up with trampolines generated by Wasmtime where the calling convention of the trampoline is WasmtimeSystemV, for example, where the callee (a wasm function) is Fast. Because these differ it means that all trampolines are generating saves/restores of registers, despite the actual underlying calling convention being the same.

This commit updates the optimization that skips including a call instruction in the clobber set by comparing the caller and callee's ABI clobber sets. If both clobber the same registers then for the purposes of clobbers it's as-if they were the same ABI, so the call can be skipped.

Overall this removes unnecessary saves/restores in trampolines generated by Cranelift and shrinks the size of spidermonkey.wasm by 2% after #6262.

<!--
Please make sure you include the following information:

Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.html

Please ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->

view this post on Zulip Wasmtime GitHub notifications bot (Apr 28 2023 at 00:09):

alexcrichton requested cfallin for a review on PR #6303.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 28 2023 at 00:09):

alexcrichton requested wasmtime-compiler-reviewers for a review on PR #6303.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 28 2023 at 00:17):

cfallin submitted PR review:

Looks good, thanks!

Strictly speaking the check could be even looser, I think: all that's required is that the clobbers allowed by the caller's ABI contain the clobbers allowed by the callee's ABI. PRegSet is a bitset (u128) internally, so we could implement this with caller_clobbers & callee_clobbers == callee_clobbers (which tests callee_clobberscaller_clobbers). Unfortunately this is an opaque newtype wrapper and I didn't think to add .into_bits() at the time, and the definition is in regalloc2. If you feel super-motivated I'm happy to review a PR for that otherwise I think this is totally fine!

view this post on Zulip Wasmtime GitHub notifications bot (Apr 28 2023 at 15:43):

alexcrichton updated PR #6303.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 28 2023 at 17:45):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 28 2023 at 18:24):

alexcrichton merged PR #6303.


Last updated: Jan 24 2025 at 00:11 UTC