cfallin opened PR #4644 from `fix-branch-veneer-threshold` to `main`:
To determine whether we need to insert a "veneer island" of
branch-range extension veneers, we need to know ahead of emitting a
basic block the worst-case size of that block. This is because veneers
only go between blocks (we could plop one in the middle of a block but
that would require another jump around it and would probably pessimize
some code significantly), and we can't back up once we emit a block.

To compute this worst-case size, we take the number of instructions
and multiply by the largest possible size of one pseudoinst (e.g., on
aarch64, this is 44 bytes; it explicitly excludes the `EmitIsland`
pseudo-op which is used before large jumptable inline offset tables
are emitted). This is conservative, but it always works, and veneers
are somewhat rare in practice (function body >1MiB on aarch64 for
example).

Unfortunately this logic didn't account for the spill/reload/move
instructions inserted by the register allocator, and in one example in
issue #4629, a block had only one instruction but 482
edge-moves (!). This came at just the wrong time as we were
approaching the 1MiB limit on aarch64.

This PR fixes that issue, and fixes the logic to actually look at the
correct next block (next in `final_order` rather than numerically
next), as a bonus correctness fix.

Fixes #4629.
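
As a rough illustration of the two fixes described above, here is a minimal sketch in Rust. It is not the code from this PR: `BlockInfo`, `worst_case_block_size`, `needs_island_before`, and `next_block_to_emit` are hypothetical names chosen for the example, and the "deadline" is a simplified stand-in for the point at which a pending branch would go out of range. Only the 44-byte figure, the 1MiB range, and the 1-instruction/482-move block come from the description.

```rust
/// Assumed upper bound on the size of one pseudo-instruction, in bytes
/// (the description cites 44 bytes for aarch64).
const WORST_CASE_INST_SIZE: u32 = 44;

/// Illustrative per-block counts; not Cranelift's real data structures.
struct BlockInfo {
    /// Instructions in the block before register allocation.
    insts: u32,
    /// Spill/reload/move instructions inserted by the register allocator.
    regalloc_moves: u32,
}

/// Conservative bound: every instruction, including the ones inserted by
/// the register allocator, is assumed to have the maximum size.
fn worst_case_block_size(block: &BlockInfo) -> u32 {
    (block.insts + block.regalloc_moves).saturating_mul(WORST_CASE_INST_SIZE)
}

/// Decide, before emitting `next`, whether an island is needed: if the
/// block could push the offset past `deadline`, the island must go first,
/// because it cannot be placed once the block has been emitted.
fn needs_island_before(cur_offset: u32, next: &BlockInfo, deadline: u32) -> bool {
    cur_offset.saturating_add(worst_case_block_size(next)) > deadline
}

/// The block emitted after position `pos` is the next entry in the final
/// emission order, not the block with the numerically next index.
fn next_block_to_emit(final_order: &[usize], pos: usize) -> Option<usize> {
    final_order.get(pos + 1).copied()
}
```

With a 1MiB deadline and a current offset of 1,040,000 bytes, a block counted as a single instruction looks safe (44 bytes), while the same block counted with its 482 edge-moves has a worst case of 483 × 44 = 21,252 bytes and would require an island first. Similarly, with a final order of `[0, 2, 1]`, the block that follows block 0 is block 2, not block 1.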
cfallin requested alexcrichton for a review on PR #4644.
alexcrichton submitted PR review.
cfallin merged PR #4644.
Last updated: Jan 24 2025 at 00:11 UTC