cfallin edited PR #6804:
Currently,
MachBuffer
is used to resolve forward and backward references to labels (basic-block entry points and other branch targets) within function bodies as well as calls to functions within a whole module. It implements a single-pass algorithm that tracks "pending fixups" -- references to some other place -- and fills in the offsets when known, also handling upgrading ranges by emitting veneers on architectures that need that behavior.In #6798 it was reported that on aarch64, module emission is slow for a certain large module. This module ends up forcing many islands of veneers but has branches whose ranges cross those islands and need not be extended with a veneer yet. This case exposes quadratic behavior: the island-emission logic puts the fixup back in a list and replaces the fixup list with that leftover-fixup list after processing all fixups.
This PR instead does the following:
- When an island is emitted, all unresolved forward references get a veneer, even if their current label kind would still be in range. This prevents fixups from hanging around arbitrarily long when they have longer range kinds (e.g., aarch64's
Branch26
) while lots of islands for shorter-range references (e.g.Branch19
) are emitted.- When a label fixup reaches its "longest-range" kind (
PCRel32
for branches, for example), we put it in a separate list that we don't process again until the very last "last-chance island" after all emission.These two rules together will bound the amount of processing to
O(|fixups| * |label kinds|)
rather thanO(|fixups|^2)
.The first new rule theoretically causes more veneers to be emitted, but in practice is not too likely to make a difference, I think. Unfortunately my aarch64 laptop is unavailable at the moment; if someone on that platform could benchmark the impact that would be quite appreciated!
Fixes #6798.
cfallin has marked PR #6804 as ready for review.
cfallin requested wasmtime-compiler-reviewers for a review on PR #6804.
cfallin requested fitzgen for a review on PR #6804.
alexcrichton submitted PR review:
I'm a bit on the fence about this, but I think we should probably go with it. The main part that doesn't sit well with me is that I don't think there's a great way to describe this algorithm and its consequences. Any branch can impact any other branch at any time, so what a branch is doing can't be explained or understood with any local reasoning.
Running this through sightglass shows no perf difference, but that's expected. What I'd be worried about is that a critical branch in a hot loop could go awry and cause performance issues. That's a very precise situation to be in though and additionally in some sense if an island is emitted in the middle of a loop the you've already lost perf-wise anyway.
This also feels a bit weird in the sense that we do veneer promotion a few times but there's not really a great way to describe the consequences of it. For example there's not really a great reason to force all pending fixups to get a veneer every time an island is emitted other than "it was the smallest patch at the time to fix quadratic behavior". The comments indicate only in passing that the purpose of this is to avoid quadratic behavior, but I think it would be worth expanding on that. The quadratic-ness isn't necessarily inherent because as I described by representing
fixup_records
with aBinaryHeap
it's possible to restore local reasoning about branches without quadratic behavior. That solution isn't viable, however, with the branch optimizations happening in the buffer.I left a few coments as well for stylistic things, but given how this is expected to have little-to-no impact on runtime and clearly has a large impact on compile time I think it's good to land this. I do think the comments should be updated though to indicate that the purpose for this is purely to avoid larger changes and the performance implications of this are uncertain and likely to cause issues at some point, but that "point" is far enough away that we're buying ourselves runway.
alexcrichton submitted PR review:
I'm a bit on the fence about this, but I think we should probably go with it. The main part that doesn't sit well with me is that I don't think there's a great way to describe this algorithm and its consequences. Any branch can impact any other branch at any time, so what a branch is doing can't be explained or understood with any local reasoning.
Running this through sightglass shows no perf difference, but that's expected. What I'd be worried about is that a critical branch in a hot loop could go awry and cause performance issues. That's a very precise situation to be in though and additionally in some sense if an island is emitted in the middle of a loop the you've already lost perf-wise anyway.
This also feels a bit weird in the sense that we do veneer promotion a few times but there's not really a great way to describe the consequences of it. For example there's not really a great reason to force all pending fixups to get a veneer every time an island is emitted other than "it was the smallest patch at the time to fix quadratic behavior". The comments indicate only in passing that the purpose of this is to avoid quadratic behavior, but I think it would be worth expanding on that. The quadratic-ness isn't necessarily inherent because as I described by representing
fixup_records
with aBinaryHeap
it's possible to restore local reasoning about branches without quadratic behavior. That solution isn't viable, however, with the branch optimizations happening in the buffer.I left a few coments as well for stylistic things, but given how this is expected to have little-to-no impact on runtime and clearly has a large impact on compile time I think it's good to land this. I do think the comments should be updated though to indicate that the purpose for this is purely to avoid larger changes and the performance implications of this are uncertain and likely to cause issues at some point, but that "point" is far enough away that we're buying ourselves runway.
alexcrichton created PR review comment:
I know I've said this before but if call sites require
/* foo = */ false
for disambiguation then I think it's better to useFoo::Yes
(or w/e as appropriate) instead because then it requires it to be human readable rather than only optionally with this style being readable.
alexcrichton created PR review comment:
Instead of having
supports_veneer
both here and above could the logic to test for veneer support be pushed intouse_label_at_offset
? There it would determine whether to push onto eitherfixup_records
orfixup_records_max_range
, and would enable having an invariant where all entries infixup_records
support veneers
cfallin submitted PR review.
cfallin created PR review comment:
Unfortunately this one is not quite as simple as it seems: the fixup needs to go into
fixup_records
even if it doesn't support veneers if it is added as a result of a branch, because of the invariant tying recent-branches to fixups. It would have been nice though!
cfallin submitted PR review.
cfallin created PR review comment:
Yep, an old C-ism that dies hard -- thanks, fixed.
cfallin updated PR #6804.
cfallin updated PR #6804.
cfallin has enabled auto merge for PR #6804.
cfallin merged PR #6804.
Last updated: Jan 24 2025 at 00:11 UTC