Stream: git-wasmtime

Topic: wasmtime / PR #6804 Cranelift: avoid quadratic behavior i...


view this post on Zulip Wasmtime GitHub notifications bot (Aug 04 2023 at 22:54):

cfallin edited PR #6804:

Currently, MachBuffer is used to resolve forward and backward references to labels (basic-block entry points and other branch targets) within function bodies as well as calls to functions within a whole module. It implements a single-pass algorithm that tracks "pending fixups" -- references to some other place -- and fills in the offsets when known, also handling upgrading ranges by emitting veneers on architectures that need that behavior.

In #6798 it was reported that on aarch64, module emission is slow for a certain large module. This module ends up forcing many islands of veneers but has branches whose ranges cross those islands and need not be extended with a veneer yet. This case exposes quadratic behavior: the island-emission logic puts the fixup back in a list and replaces the fixup list with that leftover-fixup list after processing all fixups.

This PR instead does the following:

These two rules together will bound the amount of processing to O(|fixups| * |label kinds|) rather than O(|fixups|^2).

The first new rule theoretically causes more veneers to be emitted, but in practice is not too likely to make a difference, I think. Unfortunately my aarch64 laptop is unavailable at the moment; if someone on that platform could benchmark the impact that would be quite appreciated!

Fixes #6798.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 04 2023 at 22:54):

cfallin has marked PR #6804 as ready for review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 04 2023 at 22:54):

cfallin requested wasmtime-compiler-reviewers for a review on PR #6804.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 04 2023 at 22:54):

cfallin requested fitzgen for a review on PR #6804.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 05 2023 at 17:48):

alexcrichton submitted PR review:

I'm a bit on the fence about this, but I think we should probably go with it. The main part that doesn't sit well with me is that I don't think there's a great way to describe this algorithm and its consequences. Any branch can impact any other branch at any time, so what a branch is doing can't be explained or understood with any local reasoning.

Running this through sightglass shows no perf difference, but that's expected. What I'd be worried about is that a critical branch in a hot loop could go awry and cause performance issues. That's a very precise situation to be in though and additionally in some sense if an island is emitted in the middle of a loop the you've already lost perf-wise anyway.

This also feels a bit weird in the sense that we do veneer promotion a few times but there's not really a great way to describe the consequences of it. For example there's not really a great reason to force all pending fixups to get a veneer every time an island is emitted other than "it was the smallest patch at the time to fix quadratic behavior". The comments indicate only in passing that the purpose of this is to avoid quadratic behavior, but I think it would be worth expanding on that. The quadratic-ness isn't necessarily inherent because as I described by representing fixup_records with a BinaryHeap it's possible to restore local reasoning about branches without quadratic behavior. That solution isn't viable, however, with the branch optimizations happening in the buffer.

I left a few coments as well for stylistic things, but given how this is expected to have little-to-no impact on runtime and clearly has a large impact on compile time I think it's good to land this. I do think the comments should be updated though to indicate that the purpose for this is purely to avoid larger changes and the performance implications of this are uncertain and likely to cause issues at some point, but that "point" is far enough away that we're buying ourselves runway.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 05 2023 at 17:48):

alexcrichton submitted PR review:

I'm a bit on the fence about this, but I think we should probably go with it. The main part that doesn't sit well with me is that I don't think there's a great way to describe this algorithm and its consequences. Any branch can impact any other branch at any time, so what a branch is doing can't be explained or understood with any local reasoning.

Running this through sightglass shows no perf difference, but that's expected. What I'd be worried about is that a critical branch in a hot loop could go awry and cause performance issues. That's a very precise situation to be in though and additionally in some sense if an island is emitted in the middle of a loop the you've already lost perf-wise anyway.

This also feels a bit weird in the sense that we do veneer promotion a few times but there's not really a great way to describe the consequences of it. For example there's not really a great reason to force all pending fixups to get a veneer every time an island is emitted other than "it was the smallest patch at the time to fix quadratic behavior". The comments indicate only in passing that the purpose of this is to avoid quadratic behavior, but I think it would be worth expanding on that. The quadratic-ness isn't necessarily inherent because as I described by representing fixup_records with a BinaryHeap it's possible to restore local reasoning about branches without quadratic behavior. That solution isn't viable, however, with the branch optimizations happening in the buffer.

I left a few coments as well for stylistic things, but given how this is expected to have little-to-no impact on runtime and clearly has a large impact on compile time I think it's good to land this. I do think the comments should be updated though to indicate that the purpose for this is purely to avoid larger changes and the performance implications of this are uncertain and likely to cause issues at some point, but that "point" is far enough away that we're buying ourselves runway.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 05 2023 at 17:48):

alexcrichton created PR review comment:

I know I've said this before but if call sites require /* foo = */ false for disambiguation then I think it's better to use Foo::Yes (or w/e as appropriate) instead because then it requires it to be human readable rather than only optionally with this style being readable.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 05 2023 at 17:48):

alexcrichton created PR review comment:

Instead of having supports_veneer both here and above could the logic to test for veneer support be pushed into use_label_at_offset? There it would determine whether to push onto either fixup_records or fixup_records_max_range, and would enable having an invariant where all entries in fixup_records support veneers

view this post on Zulip Wasmtime GitHub notifications bot (Aug 07 2023 at 19:41):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 07 2023 at 19:41):

cfallin created PR review comment:

Unfortunately this one is not quite as simple as it seems: the fixup needs to go into fixup_records even if it doesn't support veneers if it is added as a result of a branch, because of the invariant tying recent-branches to fixups. It would have been nice though!

view this post on Zulip Wasmtime GitHub notifications bot (Aug 07 2023 at 19:43):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 07 2023 at 19:43):

cfallin created PR review comment:

Yep, an old C-ism that dies hard -- thanks, fixed.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 07 2023 at 19:43):

cfallin updated PR #6804.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 07 2023 at 19:54):

cfallin updated PR #6804.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 07 2023 at 19:55):

cfallin has enabled auto merge for PR #6804.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 07 2023 at 21:07):

cfallin merged PR #6804.


Last updated: Jan 24 2025 at 00:11 UTC