cfallin opened PR #4902 from isle-trie-build
to main
:
Previously, the ISLE trie-building process (which constructs the tree of match operations lowered into the generated Rust code) generated code for each priority level separately, for simplicity and to avoid subtle bugs in edge-ordering (see e.g. #4093, #4102 and #4117).
This PR implements a "priority frontier" approach that allows sharing a single match edge across rules of different priority levels, but in a fairly simple way: it builds the trie in rule-priority order (highest to lowest), and at each decision node, tracks the last edge that got an insertion at the previous priority level. Then a rule at the current priority level can be inserted into the subtree of any edge at or beyond that one, but not before it. This requires constant space per node and allows for significant flexibility. Within the range of possible insertion locations, we respect the sort order over trie symbols, as currently, so that matches on the same enum get placed adjacently to each other and can merge into
match
statements.The advantage of this approach is that as we use more priority levels to break ties when rules overlap (as we are planning to do, for more semantic clarity), we do not regress compile-time performance by the use of more priority levels.
The generated code at least on AArch64 is actually almost unchanged (a few
let ... = arg0;
trivial arg-value match ops at the top become shared now), which indicates that our priority order already locked down the important tie-breaks and that this new trie-building mechanism otherwise respects the same ordering that we currently do.<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
cfallin requested elliottt for a review on PR #4902.
cfallin updated PR #4902 from isle-trie-build
to main
.
elliottt created PR review comment:
Worth using a cached key here?
elliottt created PR review comment:
What do you think about adding a struct to name this pair? Something like
InsertPointer
maybe?
elliottt submitted PR review.
elliottt created PR review comment:
Is the priority value in
last_prio
tracked only for this assert?
elliottt submitted PR review.
jameysharp submitted PR review.
jameysharp created PR review comment:
I haven't looked at the surrounding context, but I'd guess that this expression is a single memory access and maybe a couple of arithmetic instructions. So I'd guess that copying the cached keys into another array wouldn't save any time and would waste some space.
If we're micro-optimizing I'd go the other direction and use
sort_unstable_by_key
. If you want a stable sort, in this case you can get it by making the key be(prio, index)
.Unrelated: you might consider using
std::cmp::Reverse
instead of negating the priority. IMO that serves as better documentation, enough that you could maybe delete the comment. I also wouldn't be surprised if it generates better code (by swapping the order of comparison rather than doing any negation) and it avoids worries about integer overflow for negating the most-negative signed value.
cfallin updated PR #4902 from isle-trie-build
to main
.
cfallin submitted PR review.
cfallin created PR review comment:
It was, yeah, and I guess as a side-effect of using the same type for
cur_prio
andlast_prio
(it's actually needed forcur_prio
). Thislet start =
expression got much simpler in the refactor though so it's no longer here.
cfallin submitted PR review.
cfallin created PR review comment:
Done!
cfallin created PR review comment:
Oh, neat, I didn't know about
std::cmp::Reverse
-- thanks! Updated to use that. (And agreed that caching likely doesn't make much sense here: it would just be avoiding theunwrap_or(0)
but that's something like 3 instructions (cmp, generate 0, conditional-move) at the cost of more allocation and memory traffic.
cfallin submitted PR review.
cfallin updated PR #4902 from isle-trie-build
to main
.
cfallin updated PR #4902 from isle-trie-build
to main
.
cfallin updated PR #4902 from isle-trie-build
to main
.
elliottt submitted PR review.
elliottt created PR review comment:
I think this is the same, but avoids testing
start
at each iteration..skip(start.or_else(0)) .position(|edge| edge.symbol == op)
elliottt edited PR review comment.
cfallin updated PR #4902 from isle-trie-build
to main
.
cfallin submitted PR review.
cfallin created PR review comment:
Not quite I think, as then
position
will return an index shifted down by the skipped amount, but we can add that back. Updated with that extra bit as another.map()
in the chain.
elliottt created PR review comment:
Ah right, inserting between
start
andcur-prio.edge_idx
means thatcur_prio.edge_idx
needs to be incremented to point at the same element :+1:
elliottt submitted PR review.
elliottt submitted PR review.
elliottt created PR review comment:
This is also the case that currently requires that
cur_prio.edge_idx
be set toNone
on line 251. Is it worth calling that connection out with a comment?
elliottt edited PR review comment.
elliottt edited PR review comment.
cfallin updated PR #4902 from isle-trie-build
to main
.
cfallin submitted PR review.
cfallin created PR review comment:
Yup, added (actually at the comment above this where
None
is set).
elliottt submitted PR review.
elliottt created PR review comment:
Yep, I was hoping to add another comment here referencing that one to tie the two together.
cfallin updated PR #4902 from isle-trie-build
to main
.
cfallin submitted PR review.
cfallin created PR review comment:
Ah, sure thing -- added one here too!
elliottt submitted PR review.
elliottt created PR review comment:
There's a problem here with removing the edge at
start
from consideration when inserting a new node: if none of the rules at previous priorities had an op inserted that would sort less than the op we're inserting, the node atstart
may shadow it. Here's theextractor_ordering_bug.isle
test with the additional rule(rule 3 (test (identity 3)) 3)
that shows the bug:(type u32 (primitive u32)) (decl identity (u32) u32) (extern extractor infallible identity identity) (decl is_zero (u32) u32) (extern extractor is_zero is_zero) (decl test (u32) u32) (rule 3 (test (identity 3)) 3) ;; This exposes a bug where infallible extractors were running before fallible ;; ones, as the derived ordering for the `Extract` type was ordering them ahead ;; of the fallible ones. The result is that the fallible `is_zero` extractor ;; never runs, as the `identity` extractor will always succeed before it's ;; called. (rule (test (identity x)) x) (rule (test (is_zero x)) 2)
thread 'main' panicked at 'assertion failed: `(left == right)` left: `Some(0)`, right: `Some(2)`', /tmp/.tmpSbDsCf/extractor_ordering_bug_main.rs:21:5 note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace test test_run_run_extractor_ordering_bug ... FAILED
cfallin submitted PR review.
cfallin created PR review comment:
So, the issue with that is that removing the "last frontier plus one" logic for the starting-point also creates a bug: it allows a lower-priority rule to insert an edge prior to an edge that leads to higher-priority leaves.
The fundamental issue is that the sorting of match ops within a priority is at odds with the sharing of the match edges between priorities. The only way for sharing to occur is for the match op in a given prio range to be last, and that same match op to be first in the next prio range.
So there are two possible design regimes:
- Sort order within a prio is absolutely needed for correctness. This is where we were before: the "more specific rule wins" idea. The test case above relies on this: a fallible extractor is more specific than an infallible extractor, and so it must come first.
- Sort order within a prio is arbitrary, and priorities break ties.
My understanding of this whole situation is that by deciding to disallow overlapping rules not disambiguated by priorities, we have eliminated the need for cases like this: we would not allow the ISLE rules above to exist at the same priority, because they overlap.
It was in this context that we realized the need for the "sharing across priority levels" scheme, and this is how this PR arose. In other words, this PR relies on the fact that we are allowed to reorder rules arbitrarily within a priority, and does so in order to maximize sharing.
However, if we also still want to have stricter ordering within a priority, and hold this ordering as semantically binding (as the test case above does), then we must abandon this PR, and this optimization scheme generally. This is because we cannot both strictly sort ops within a priority, and share between priorities, unless the last op of a given priority (in sort order) happens to be the same as the first op of the next priority (in sort order).
So: do we ban overlaps, and then use the resulting freedom to merge edges, getting our efficiency back after adding prios to disambiguate overlaps? Or do we go back to the system that we had before, and strictly respect op ordering?
cfallin closed without merge PR #4902.
Last updated: Nov 22 2024 at 17:03 UTC