Stream: git-wasmtime

Topic: wasmtime / PR #13055 perf(aarch64): use lr-only linkage f...


Wasmtime GitHub notifications bot (Apr 11 2026 at 10:13):

pnodet opened PR #13055 from pnodet:aarch64-opt2-lr-only to bytecodealliance:main.

Wasmtime GitHub notifications bot (Apr 11 2026 at 10:16):

pnodet edited PR #13055:

This updates AArch64 prologue/epilogue generation to use an LR-only linkage frame for a narrow set of simple regular-call functions. Instead of always doing stp fp, lr + mov fp, sp and restoring both registers, we now use str/ldr lr with the same 16-byte stack adjustment when it’s safe to do so.

The optimization is intentionally conservative. It does not apply when frame pointers are required, when unwind info is enabled, when return-address signing is enabled, or when the frame layout needs full FP-based setup. In those cases we keep the existing FP/LR path unchanged.

The goal is to trim unnecessary frame setup/teardown work in the common eligible cases while preserving ABI alignment and keeping behavior identical outside that narrow window.

Wasmtime GitHub notifications bot (Apr 11 2026 at 10:17):

pnodet edited PR #13055:

This updates AArch64 prologue/epilogue generation to use an LR-only linkage frame for a narrow set of simple regular-call functions. Instead of always doing stp fp, lr + mov fp, sp and restoring both registers, we now use str/ldr lr with the same 16-byte stack adjustment when it’s safe to do so.

The optimization is intentionally conservative. It does not apply when frame pointers are required, when unwind info is enabled, when return-address signing is enabled, or when the frame layout needs full FP-based setup. In those cases we keep the existing FP/LR path unchanged.

The goal is to trim unnecessary frame setup/teardown work in the common eligible cases while preserving ABI alignment and keeping behavior identical outside that narrow window.
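The eligibility rule and the two frame shapes described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `FrameConstraints`, `can_use_lr_only_frame`, `prologue`, and `epilogue` are hypothetical names, and the exact operand forms (pre-indexed store, post-indexed load) are an assumption; the description only states that `str`/`ldr lr` is used with the same 16-byte stack adjustment.

```rust
/// Hypothetical summary of the conditions the PR description lists as
/// disabling the optimization. These are not Cranelift types.
#[derive(Clone, Copy, Debug, Default)]
struct FrameConstraints {
    frame_pointers_required: bool,
    unwind_info_enabled: bool,
    return_address_signing: bool,
    needs_fp_based_layout: bool,
}

/// The LR-only frame is allowed only when none of the listed conditions hold.
fn can_use_lr_only_frame(c: FrameConstraints) -> bool {
    !(c.frame_pointers_required
        || c.unwind_info_enabled
        || c.return_address_signing
        || c.needs_fp_based_layout)
}

/// Instruction text for the frame setup. Both paths move sp by 16 bytes,
/// so stack alignment is the same in either case.
/// Assumption: operand forms are illustrative, not taken from the PR.
fn prologue(c: FrameConstraints) -> Vec<&'static str> {
    if can_use_lr_only_frame(c) {
        vec!["str lr, [sp, #-16]!"]
    } else {
        vec!["stp fp, lr, [sp, #-16]!", "mov fp, sp"]
    }
}

/// Matching teardown for each prologue shape.
fn epilogue(c: FrameConstraints) -> Vec<&'static str> {
    if can_use_lr_only_frame(c) {
        vec!["ldr lr, [sp], #16", "ret"]
    } else {
        vec!["ldp fp, lr, [sp], #16", "ret"]
    }
}
```

With all fields left at their defaults, `prologue` yields the single `str` instruction; setting any one field (for example `unwind_info_enabled`) falls back to the existing FP/LR pair plus `mov fp, sp`.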

Wasmtime GitHub notifications bot (Apr 11 2026 at 13:03):

github-actions[bot] added the label cranelift on PR #13055.

Wasmtime GitHub notifications bot (Apr 11 2026 at 13:03):

github-actions[bot] added the label cranelift:area:aarch64 on PR #13055.

Wasmtime GitHub notifications bot (Apr 23 2026 at 17:34):

cfallin commented on PR #13055:

@pnodet I see this is still a draft but could you clarify what performance changes, if any, you've measured with this change?

Naively at least, I would expect the store-pair of fp/lr and the single store of lr, both to a 16-byte slot on the stack, to have almost equal performance on modern CPUs -- the hardware does the store in a single action in either case (single store-buffer slot, single instruction issue), just with a different datapath width. There may be different execution ports, small differences in ILP-heavy workloads, etc. Have you measured a speedup with this?


Last updated: May 03 2026 at 23:15 UTC