Stream: git-wasmtime

Topic: wasmtime / PR #13055 perf(aarch64): use lr-only linkage f...

Wasmtime GitHub notifications bot (Apr 11 2026 at 10:13):

pnodet opened PR #13055 from pnodet:aarch64-opt2-lr-only to bytecodealliance:main.

Wasmtime GitHub notifications bot (Apr 11 2026 at 10:16):

pnodet edited PR #13055:

Wasmtime GitHub notifications bot (Apr 11 2026 at 10:17):

pnodet edited PR #13055:

This updates AArch64 prologue/epilogue generation to use an LR-only linkage frame for a narrow set of simple regular-call functions. Instead of always doing `stp fp, lr` + `mov fp, sp` and restoring both registers, we now use `str`/`ldr lr` with the same 16-byte stack adjustment when it’s safe to do so.
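For concreteness, the two linkage sequences can be sketched as plain instruction strings. This is a minimal illustration assuming pre-indexed `stp`/`str` and post-indexed `ldp`/`ldr` for the 16-byte adjustment; the `Linkage` enum and the `prologue`/`epilogue` helpers are hypothetical (not Cranelift's API), and the exact instructions the PR emits may differ.

```rust
/// Which linkage frame to build (hypothetical enum, not a Cranelift type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Linkage {
    /// Existing path: save FP and LR as a pair and set up the frame pointer.
    FpLr,
    /// LR-only path: save just LR, still moving SP by 16 bytes.
    LrOnly,
}

/// Prologue, assuming a pre-indexed store that pushes a 16-byte slot.
fn prologue(linkage: Linkage) -> Vec<&'static str> {
    match linkage {
        Linkage::FpLr => vec!["stp fp, lr, [sp, #-16]!", "mov fp, sp"],
        Linkage::LrOnly => vec!["str lr, [sp, #-16]!"],
    }
}

/// Epilogue, assuming a post-indexed load that pops the same 16-byte slot.
fn epilogue(linkage: Linkage) -> Vec<&'static str> {
    match linkage {
        Linkage::FpLr => vec!["ldp fp, lr, [sp], #16", "ret"],
        Linkage::LrOnly => vec!["ldr lr, [sp], #16", "ret"],
    }
}

fn main() {
    for linkage in [Linkage::FpLr, Linkage::LrOnly] {
        // Concatenate prologue and epilogue for display.
        let mut seq = prologue(linkage);
        seq.extend(epilogue(linkage));
        println!("{:?}:", linkage);
        for insn in &seq {
            println!("    {}", insn);
        }
    }
}
```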

The optimization is intentionally conservative. It does not apply when frame pointers are required, when unwind info is enabled, when return-address signing is enabled, or when the frame layout needs full FP-based setup. In those cases we keep the existing FP/LR path unchanged.
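The eligibility rule above amounts to a conjunction of exclusions. The sketch below assumes each condition is already available as a boolean; the `FrameFacts` struct and its field names are hypothetical stand-ins for whatever the backend actually consults, not real Cranelift settings or types.

```rust
/// Per-function facts the backend would consult (hypothetical names that
/// mirror the conditions listed in the PR description).
#[derive(Clone, Copy, Debug, Default)]
struct FrameFacts {
    simple_regular_call: bool,
    frame_pointers_required: bool,
    unwind_info_enabled: bool,
    return_address_signing: bool,
    needs_fp_based_frame: bool,
}

/// True only when the function is a simple regular-call function and none
/// of the exclusions apply; otherwise the existing FP/LR path is kept.
fn lr_only_eligible(f: &FrameFacts) -> bool {
    f.simple_regular_call
        && !f.frame_pointers_required
        && !f.unwind_info_enabled
        && !f.return_address_signing
        && !f.needs_fp_based_frame
}

fn main() {
    let simple = FrameFacts { simple_regular_call: true, ..Default::default() };
    println!("eligible function: {}", lr_only_eligible(&simple));

    // Any single exclusion falls back to the FP/LR frame.
    let with_unwind = FrameFacts { unwind_info_enabled: true, ..simple };
    println!("with unwind info: {}", lr_only_eligible(&with_unwind));
}
```

Writing it as a single predicate keeps the conservative default visible: every new condition has to be opted into explicitly, and anything unknown falls through to the unchanged path.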

The goal is to trim unnecessary frame setup/teardown work in the common eligible cases while preserving ABI alignment and keeping behavior identical outside that narrow window.
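The alignment argument is simple arithmetic: AAPCS64 requires SP to be 16-byte aligned, and both paths move SP by the same 16 bytes, so only the contents of the slot differ. A small sketch, assuming 8-byte FP and LR registers; the constant and function names are illustrative only.

```rust
/// Size of the linkage slot pushed by either path (from the PR text).
const LINKAGE_SLOT: u64 = 16;
/// Width of one 64-bit register (FP or LR).
const REG_BYTES: u64 = 8;

#[derive(Clone, Copy, Debug)]
enum Linkage {
    FpLr,
    LrOnly,
}

/// Bytes actually written into the 16-byte slot.
fn bytes_saved(l: Linkage) -> u64 {
    match l {
        Linkage::FpLr => 2 * REG_BYTES,
        Linkage::LrOnly => REG_BYTES,
    }
}

/// SP after the prologue push; deliberately independent of the linkage kind.
fn sp_after_push(sp: u64) -> u64 {
    sp - LINKAGE_SLOT
}

fn main() {
    let entry_sp: u64 = 0x7fff_ffff_e000; // arbitrary 16-byte-aligned value
    assert_eq!(entry_sp % 16, 0, "SP must be 16-byte aligned on entry");
    for l in [Linkage::FpLr, Linkage::LrOnly] {
        let sp = sp_after_push(entry_sp);
        let saved = bytes_saved(l);
        println!(
            "{:?}: sp={:#x} aligned={} saved={}B padding={}B",
            l,
            sp,
            sp % 16 == 0,
            saved,
            LINKAGE_SLOT - saved
        );
    }
}
```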

Wasmtime GitHub notifications bot (Apr 11 2026 at 13:03):

github-actions[bot] added the label cranelift on PR #13055.

Wasmtime GitHub notifications bot (Apr 11 2026 at 13:03):

github-actions[bot] added the label cranelift:area:aarch64 on PR #13055.

Wasmtime GitHub notifications bot (Apr 23 2026 at 17:34):

cfallin commented on PR #13055:

@pnodet I see this is still a draft, but could you clarify what performance changes, if any, you've measured with this change?

Naively, at least, I would expect the store-pair of fp/lr and the single store of lr, both to a 16-byte slot on the stack, to have almost equal performance on modern CPUs: the hardware does the store in a single action in either case (single store-buffer slot, single instruction issue), just with a different datapath width. There may be different execution ports, small differences in ILP-heavy workloads, etc. Have you measured a speedup with this?
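Tallying static costs under assumed sequences (`stp fp, lr` + `mov fp, sp` / `ldp fp, lr` versus `str lr` / `ldr lr`) makes the comparison in this comment concrete: the number of memory operations is identical, and the differences are the store width and one dropped `mov`. These are counts, not measurements; whether they translate into a speedup is exactly the open question. The `FrameCost` struct and its constants are hypothetical.

```rust
/// Static per-call frame costs of one linkage variant, derived from the
/// assumed instruction sequences (not measured numbers).
#[derive(Clone, Copy, Debug)]
struct FrameCost {
    stores: u32,      // store instructions in the prologue
    store_bytes: u32, // bytes written by those stores
    loads: u32,       // load instructions in the epilogue
    other_insns: u32, // non-memory frame setup, e.g. `mov fp, sp`
}

const FP_LR: FrameCost = FrameCost { stores: 1, store_bytes: 16, loads: 1, other_insns: 1 };
const LR_ONLY: FrameCost = FrameCost { stores: 1, store_bytes: 8, loads: 1, other_insns: 0 };

/// Frame-related instructions, excluding the shared `ret`.
fn total_insns(c: FrameCost) -> u32 {
    c.stores + c.loads + c.other_insns
}

fn main() {
    println!(
        "memory ops: {} vs {}",
        FP_LR.stores + FP_LR.loads,
        LR_ONLY.stores + LR_ONLY.loads
    );
    println!("store width: {} vs {} bytes", FP_LR.store_bytes, LR_ONLY.store_bytes);
    println!("frame insns saved: {}", total_insns(FP_LR) - total_insns(LR_ONLY));
}
```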

Last updated: Jun 01 2026 at 09:49 UTC