elliottt edited PR #4906 from trevor/isle-overlap-checker
to main
:
Introduce overlap checking for ISLE as the start of addressing #4717.
The overlap checker detects cases where ISLE rules could fire for the same input. Consider the following example from the x64 lowerings (line numbers added for clarity):
13: (rule (sse_xor_op $F32X4) (SseOpcode.Xorps)) 14: (rule (sse_xor_op $F64X2) (SseOpcode.Xorpd)) 15: (rule (sse_xor_op (multi_lane _bits _lanes)) (SseOpcode.Pxor))
Rule 15 overlaps separately with rules 13 and 14, but rules 13 and 14 don't overlap with each other. With overlap checking enabled this will be reported as an overlap of all three rules, where the rule on line 15 will be identified as the source of the overlap (error messages could definitely be improved here). In this case the overlap can be resolved by setting a lower priority for rule 3.
Error: × overlap error: rules are overlapping │ ../x64.isle:15:0 │ ../x64.isle:13:0 │ ../x64.isle:14:0
For another example, consider the following:
1: (rule (example $F32 25) ...) 2: (rule (example $F32 (fallible_extractor y)) ...) 3: (rule (example $F64 _) ...)
In this case, only the rules on lines 1 and 2 overlap, as we don't know that the fallible extractor won't fire successfully on the input
25
. As the rule on line 3 has a different leading pattern ($F64
) it won't be part of the overlap group generated.However, if the example looks instead like the following:
1: (rule (example $F32 (fallible_extractor 25)) ...) 2: (rule (example $F32 (fallible_extractor 31)) ...) 3: (rule (example $F64 _) ...)
We consider there to be no overlap in the rules as fallible extractors are expected to be pure.
<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
- [ ] This has been discussed in issue #..., or if not, please tell us why
here.- [ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.- [ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
elliottt updated PR #4906 from trevor/isle-overlap-checker
to main
.
jameysharp submitted PR review.
jameysharp created PR review comment:
How much trouble would it be to make this a runtime option of some sort?
jameysharp created PR review comment:
How would you feel about a
HashSet<RuleId>
instead ofVec<bool>
? Or, if you want the edges sorted by rule ID, at least aBTreeSet
.As written, this uses O(n^2) space even if there aren't any errors.
A cute side benefit is that then you can tag this struct with
#[derive(Default)]
and useNode::default
instead of having to hand-implementNode::new
.
jameysharp submitted PR review.
jameysharp created PR review comment:
There's a neat idiom if
self.edges
were aHashSet
orBTreeSet
:if self.edges.insert(other.0) { self.degree += 1; }
The same pattern works for calling
edges.remove
inremove_edge
below.
jameysharp created PR review comment:
/// Check for overlapping rules within a single priority group.
jameysharp created PR review comment:
I'm not sure whether it's an improvement here, but there's a handy macro in the standard library for this pattern:
matches!( self.get_term(id).kind, TermKind::Decl { constructor_kind: Some(ConstructorKind::InternalConstructor), .. }, )
jameysharp created PR review comment:
Assuming that you switch both
Node
andErrors
to using hash-tables, you can implementadd_edge
concisely like this:self.nodes.entry(a.0).or_insert_with(Node::new).add_edge(b); self.nodes.entry(b.0).or_insert_with(Node::new).add_edge(a);
jameysharp created PR review comment:
Is the comment that "the edges are normalized by ordering the rule ids" still true? I'm guessing that's left over from an earlier version.
Like for
Node
, I'd preferHashMap
overVec
so there's no heap allocation unless an error is actually found. And like forNode
that means you can use#[derive(Default)]
.
jameysharp created PR review comment:
I'd suggest making
Errors::report
consumeself
, instead of taking it by reference. You don't need it any more afterward, and anyway there's nothing useful in it: all the edges have been deleted.
jameysharp created PR review comment:
Does this guarantee that the degree is monotonically decreasing? I can't quite pin it down but I feel like there's some situation where higher-degree rules can reduce some other rule's degree to the point that by the time you get there, it has lower degree than later rules (but still non-zero degree).
Here's someplace I'd actually be happier with an O(n^2) algorithm: Iterate over the rules to find one that currently has maximum degree (perhaps with
Iterator::max_by_key
), process it, and repeat. As long asn
here is the number of rules that actually have overlaps, rather than the total number of rules, a quadratic algorithm is free in the expected case where we've already fixed all the overlaps.
jameysharp created PR review comment:
The
Errors::remove_edges
method is only called byreport
. I'd suggest inlining it there, and here's why:If you change
Errors
to contain aHashMap
, then you could establish an invariant that every node in the map has non-zero degree. That's useful becauseErrors::is_empty
can just callHashMap::is_empty
, andErrors::report
doesn't need to filter the set of nodes to find non-empty ones.That invariant automatically holds initially and after any sequence of calls to
add_edge
. Butremove_edges
breaks that invariant unless you're careful. Still, you don't need the invariant once you start looping over the overlaps inreport
, so if you inline this there then you don't have to be careful about the invariant.
jameysharp created PR review comment:
I don't think there's a good idiom for "loop over all pairs in this slice", but I found this version of it confusing for a moment. Maybe this instead?
let mut rows = &matrix.rows[..]; while let Some((rule, rest)) = rows.split_first() { for other in rest { self.add_edge(rule.rule, other.rule); } rows = others; }
jameysharp created PR review comment:
I think
errors.report
will correctly return an empty vector if there are no errors, so I don't think you need theis_empty
check up front. Also I think Rust will let you match directly on the returned vector:match errors.report(&env) { [] => Ok(()), [e] => Err(e), errors => Err(Error::Errors(errors)), }
jameysharp created PR review comment:
I think
sort_unstable_by_key
should do fine here, and it lets you avoid temporary heap allocations for the cached keys. You can impose a total order for the sort by using the pair of(degree, id)
as the sort key, so stable sort isn't important. Neither is the order that you iterate over nodes when building therules
vector, so it's okay forself.nodes
to be aHashMap
.
elliottt submitted PR review.
elliottt created PR review comment:
I used a HashSet to begin with, and it was considerably slower when building without
--release
. The switch to this dense representation made the runtime go from 55s for the x64 lowering down to about 10s. One thought I had was that the vector could be anOption<Vec<Bool>>
where it would only allocate the bool vector if there was overlap present involving that rule. Another thought I had was that something like c++'s specialization ofstd::vector<bool>
would be really nice here.
elliottt submitted PR review.
elliottt created PR review comment:
The motivation for improving performance when building without
--release
is that cranelift currently builds the isle source without--release
in the cranelift-codegen build.rs.
jameysharp created PR review comment:
You don't need to wrap the
Vec
in anOption
;Vec::new()
already doesn't heap-allocate, so you can just useVec::is_empty
as your flag instead ofSome
/None
. And yeah, I keep wanting to use thebitvec
crate for bit-packed boolean vectors, but we don't use it anywhere else and we don't add dependencies if we can help it.But also: wat? How could HashSet be that much slower? I feel like there must have been something else going on. (While trying to come up with hypotheses that might explain the difference, I worked out that the HashSet is smaller than the
Vec<bool>
as long as a rule overlaps with less than 11% of the other rules, but that doesn't seem likely to answer the question.)Also, in the long run I don't want to optimize for the case where there are a bunch of overlap errors if doing that makes things worse when there aren't any errors. While we're fixing the current backlog of overlaps, of course, it's reasonable to try to keep the debug loop short. But on the other hand, we can use
--release
builds of islec for that, right?
jameysharp submitted PR review.
elliottt submitted PR review.
elliottt created PR review comment:
My assumption about why the vector is faster is that all the allocation is done up front. I don't know the specifics of how
HashSet
is implemented under the hood, but a lot of insertions to different rows could cause an allocation for each insertion, right?I agree that it's not worth over-optimizing for the case where we have lots of overlap, but we're talking about 1MB of memory to track the overlaps in the x64 lowerings. Paired with the one-off allocation strategy when we need to start adding overlap edges, it feels like this is a reasonable strategy to keep, but if you'd rather go back to the
HashSet
I'm not opposed.
jameysharp submitted PR review.
jameysharp created PR review comment:
For anyone else following this: After a bunch of profiling with Trevor, we've learned that the
Errors::overlap_error
method is getting called super frequently, to the point that many of the x64 rules are getting marked millions of times each. So tiny differences in how long that method takes get multiplied out to big runtime differences, which is why two data structures show significantly different performance despite both supporting O(1) access. Even the bounds-checks in the vector version have a measurable impact, if I remember theperf
report correctly.It looks like wildcard handling is currently exponential. We investigated ways to reduce that and Trevor has some things he's planning to try. I think fixing this may not only improve runtime but also fix some false positives in the overlap check.
elliottt updated PR #4906 from trevor/isle-overlap-checker
to main
.
elliottt updated PR #4906 from trevor/isle-overlap-checker
to main
.
elliottt created PR review comment:
I've changed it so that the feature controls whether or not the overlaps are considered errors. It currently only prints the number of overlaps when the feature is disabled, would you prefer that it always prints the errors out even if they're not going to fail the build?
elliottt submitted PR review.
elliottt submitted PR review.
elliottt created PR review comment:
I reworked the overlap checking to work over pairs of related rules instead, which is much faster and reported more accurate overlap information.
jameysharp submitted PR review.
jameysharp created PR review comment:
Nah, printing just the overlap count is fine, I think.
elliottt updated PR #4906 from trevor/isle-overlap-checker
to main
.
elliottt updated PR #4906 from trevor/isle-overlap-checker
to main
.
elliottt updated PR #4906 from trevor/isle-overlap-checker
to main
.
elliottt submitted PR review.
elliottt created PR review comment:
Great idea!
elliottt submitted PR review.
elliottt created PR review comment:
I could go either way on this, do you feel strongly about switching to the macro?
elliottt submitted PR review.
elliottt created PR review comment:
I think this is a bad name for this type now, how about
Patterns
?
elliottt has marked PR #4906 as ready for review.
jameysharp submitted PR review.
jameysharp submitted PR review.
jameysharp created PR review comment:
As far as I can tell, this field is never read and can be removed, along with the
is_single_constructor_enum
method. I'm puzzled why you didn't get a build warning about this...Once you do that I believe
Variant
andExtractor
are treated the same way everywhere so you can merge them. I'm assuming that the sameTermId
is never used both for an enum variant and for an extractor.
jameysharp created PR review comment:
I do not feel strongly about that, nope!
jameysharp created PR review comment:
This is immediately converted back to an iterator by its only caller, so I'd return
impl Iterator<Item = RuleId>
instead of calling.collect()
.
elliottt submitted PR review.
elliottt created PR review comment:
As far as I can tell, this field is never read and can be removed, along with the
is_single_constructor_enum
method. I'm puzzled why you didn't get a build warning about this...This was originally when I was going to try to reason about the interaction of single-constructor enums to know that no other case was ever possible. I'm not sure there's anything we can do with that information now, so it should be removed.
Once you do that I believe
Variant
andExtractor
are treated the same way everywhere so you can merge them. I'm assuming that the sameTermId
is never used both for an enum variant and for an extractor.We'd still need to be able to differentiate between a Variant and an Extractor for the case where the match fails: in that case the variant means that the two don't overlap, while the extractor could still potentially fire.
elliottt updated PR #4906 from trevor/isle-overlap-checker
to main
.
jameysharp submitted PR review.
jameysharp created PR review comment:
Yeah, I figured that out after writing this comment. :sweat_smile:
jameysharp updated PR #4906 from trevor/isle-overlap-checker
to main
.
jameysharp updated PR #4906 from trevor/isle-overlap-checker
to main
.
jameysharp updated PR #4906 from trevor/isle-overlap-checker
to main
.
jameysharp updated PR #4906 from trevor/isle-overlap-checker
to main
.
cfallin submitted PR review.
cfallin submitted PR review.
cfallin created PR review comment:
Do we need to flatten the
and
s here? E.g. what happens if I have a left-hand side like(rule (f (and (and (g _) (h _)) (k _))) rhs)
?
jameysharp submitted PR review.
jameysharp created PR review comment:
We would need to do that, except we decided to flatten those earlier in the pipeline, and merged that in #4915.
cfallin created PR review comment:
Oh, excellent! I somehow missed that. Thanks!
cfallin submitted PR review.
cfallin submitted PR review.
elliottt merged PR #4906.
bjorn3 submitted PR review.
bjorn3 created PR review comment:
This adds a lot of dependencies to Cranelift. Is the overlap checking really so expensive that paralleling it makes a lot of difference?
bjorn3 submitted PR review.
bjorn3 created PR review comment:
I just checked it. Takes less than half a second total even when not parallelizing. Will open a PR to remove rayon.
elliottt created PR review comment:
We depend on rayon elsewhere in cranelift, which is why we added it to the overlap checker. Were you removing it completely from cranelift?
elliottt submitted PR review.
bjorn3 submitted PR review.
bjorn3 created PR review comment:
wasmtime-cranelift depends on rayon, but cranelift itself doesn't apart from cranelift-isle because of this PR. https://github.com/bjorn3/rustc_codegen_cranelift/commit/266e96785ab71834b917bf474f130a6d8fdecd4b reverted cg_clif back to cranelift 0.88.1 from before this PR. This commit removes rayon from cg_clif's Cargo.lock.
elliottt created PR review comment:
Ah I see. My assumption had been that since it was in
cranelift/Cargo.toml
that it would be okay to rely on it in isle.
elliottt submitted PR review.
cfallin submitted PR review.
cfallin created PR review comment:
FWIW
cranelift/Cargo.toml
defines the crate forcranelift-tools
, the CLI utility; we do want to keep dependencies ofcranelift-codegen
as minimal as possible, and sincecranelift-isle
is a build-dep of that I'd consider the same policy to apply. Sorry for not catching this earlier!
jameysharp submitted PR review.
jameysharp created PR review comment:
If I remember correctly, when we added the rayon dependency, overlap checking was dramatically slower, so parallelism shaved a lot of time off the wasmtime build. But by the time we actually merged overlap checking to
main
, we'd fixed its performance, so rayon didn't matter any more; we just didn't re-evaluate it at that point.So thank you for #5101: it's a good change.
Last updated: Jan 24 2025 at 00:11 UTC