Stream: git-wasmtime

Topic: wasmtime / PR #4906 Add an overlap checker to ISLE


view this post on Zulip Wasmtime GitHub notifications bot (Sep 13 2022 at 17:18):

elliottt edited PR #4906 from trevor/isle-overlap-checker to main:

Introduce overlap checking for ISLE as the start of addressing #4717.

The overlap checker detects cases where ISLE rules could fire for the same input. Consider the following example from the x64 lowerings (line numbers added for clarity):

13: (rule (sse_xor_op $F32X4) (SseOpcode.Xorps))
14: (rule (sse_xor_op $F64X2) (SseOpcode.Xorpd))
15: (rule (sse_xor_op (multi_lane _bits _lanes)) (SseOpcode.Pxor))

Rule 15 overlaps separately with rules 13 and 14, but rules 13 and 14 don't overlap with each other. With overlap checking enabled this will be reported as an overlap of all three rules, where the rule on line 15 will be identified as the source of the overlap (error messages could definitely be improved here). In this case the overlap can be resolved by setting a lower priority for rule 3.

Error:
  × overlap error: rules are overlapping
     ../x64.isle:15:0
     ../x64.isle:13:0
     ../x64.isle:14:0

For another example, consider the following:

1: (rule (example $F32 25) ...)
2: (rule (example $F32 (fallible_extractor y)) ...)
3: (rule (example $F64 _) ...)

In this case, only the rules on lines 1 and 2 overlap, as we don't know that the fallible extractor won't fire successfully on the input 25. As the rule on line 3 has a different leading pattern ($F64) it won't be part of the overlap group generated.

However, if the example looks instead like the following:

1: (rule (example $F32 (fallible_extractor 25)) ...)
2: (rule (example $F32 (fallible_extractor 31)) ...)
3: (rule (example $F64 _) ...)

We consider there to be no overlap in the rules as fallible extractors are expected to be pure.

<!--

Please ensure that the following steps are all taken care of before submitting
the PR.

Please ensure all communication adheres to the code of conduct.
-->

view this post on Zulip Wasmtime GitHub notifications bot (Sep 13 2022 at 17:38):

elliottt updated PR #4906 from trevor/isle-overlap-checker to main.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 13 2022 at 18:57):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 13 2022 at 18:57):

jameysharp created PR review comment:

How much trouble would it be to make this a runtime option of some sort?

view this post on Zulip Wasmtime GitHub notifications bot (Sep 13 2022 at 18:57):

jameysharp created PR review comment:

How would you feel about a HashSet<RuleId> instead of Vec<bool>? Or, if you want the edges sorted by rule ID, at least a BTreeSet.

As written, this uses O(n^2) space even if there aren't any errors.

A cute side benefit is that then you can tag this struct with #[derive(Default)] and use Node::default instead of having to hand-implement Node::new.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 13 2022 at 18:57):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 13 2022 at 18:57):

jameysharp created PR review comment:

There's a neat idiom if self.edges were a HashSet or BTreeSet:

        if self.edges.insert(other.0) {
            self.degree += 1;
        }

The same pattern works for calling edges.remove in remove_edge below.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 13 2022 at 18:57):

jameysharp created PR review comment:

/// Check for overlapping rules within a single priority group.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 13 2022 at 18:57):

jameysharp created PR review comment:

I'm not sure whether it's an improvement here, but there's a handy macro in the standard library for this pattern:

        matches!(
            self.get_term(id).kind,
            TermKind::Decl {
                constructor_kind: Some(ConstructorKind::InternalConstructor),
                ..
            },
        )

view this post on Zulip Wasmtime GitHub notifications bot (Sep 13 2022 at 18:57):

jameysharp created PR review comment:

Assuming that you switch both Node and Errors to using hash-tables, you can implement add_edge concisely like this:

        self.nodes.entry(a.0).or_insert_with(Node::new).add_edge(b);
        self.nodes.entry(b.0).or_insert_with(Node::new).add_edge(a);

view this post on Zulip Wasmtime GitHub notifications bot (Sep 13 2022 at 18:57):

jameysharp created PR review comment:

Is the comment that "the edges are normalized by ordering the rule ids" still true? I'm guessing that's left over from an earlier version.

Like for Node, I'd prefer HashMap over Vec so there's no heap allocation unless an error is actually found. And like for Node that means you can use #[derive(Default)].

view this post on Zulip Wasmtime GitHub notifications bot (Sep 13 2022 at 18:57):

jameysharp created PR review comment:

I'd suggest making Errors::report consume self, instead of taking it by reference. You don't need it any more afterward, and anyway there's nothing useful in it: all the edges have been deleted.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 13 2022 at 18:57):

jameysharp created PR review comment:

Does this guarantee that the degree is monotonically decreasing? I can't quite pin it down but I feel like there's some situation where higher-degree rules can reduce some other rule's degree to the point that by the time you get there, it has lower degree than later rules (but still non-zero degree).

Here's someplace I'd actually be happier with an O(n^2) algorithm: Iterate over the rules to find one that currently has maximum degree (perhaps with Iterator::max_by_key), process it, and repeat. As long as n here is the number of rules that actually have overlaps, rather than the total number of rules, a quadratic algorithm is free in the expected case where we've already fixed all the overlaps.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 13 2022 at 18:57):

jameysharp created PR review comment:

The Errors::remove_edges method is only called by report. I'd suggest inlining it there, and here's why:

If you change Errors to contain a HashMap, then you could establish an invariant that every node in the map has non-zero degree. That's useful because Errors::is_empty can just call HashMap::is_empty, and Errors::report doesn't need to filter the set of nodes to find non-empty ones.

That invariant automatically holds initially and after any sequence of calls to add_edge. But remove_edges breaks that invariant unless you're careful. Still, you don't need the invariant once you start looping over the overlaps in report, so if you inline this there then you don't have to be careful about the invariant.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 13 2022 at 18:57):

jameysharp created PR review comment:

I don't think there's a good idiom for "loop over all pairs in this slice", but I found this version of it confusing for a moment. Maybe this instead?

        let mut rows = &matrix.rows[..];
        while let Some((rule, rest)) = rows.split_first() {
            for other in rest {
                self.add_edge(rule.rule, other.rule);
            }
            rows = others;
        }

view this post on Zulip Wasmtime GitHub notifications bot (Sep 13 2022 at 18:57):

jameysharp created PR review comment:

I think errors.report will correctly return an empty vector if there are no errors, so I don't think you need the is_empty check up front. Also I think Rust will let you match directly on the returned vector:

    match errors.report(&env) {
        [] => Ok(()),
        [e] => Err(e),
        errors => Err(Error::Errors(errors)),
    }

view this post on Zulip Wasmtime GitHub notifications bot (Sep 13 2022 at 18:57):

jameysharp created PR review comment:

I think sort_unstable_by_key should do fine here, and it lets you avoid temporary heap allocations for the cached keys. You can impose a total order for the sort by using the pair of (degree, id) as the sort key, so stable sort isn't important. Neither is the order that you iterate over nodes when building the rules vector, so it's okay for self.nodes to be a HashMap.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 13 2022 at 19:23):

elliottt submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 13 2022 at 19:23):

elliottt created PR review comment:

I used a HashSet to begin with, and it was considerably slower when building without --release. The switch to this dense representation made the runtime go from 55s for the x64 lowering down to about 10s. One thought I had was that the vector could be an Option<Vec<Bool>> where it would only allocate the bool vector if there was overlap present involving that rule. Another thought I had was that something like c++'s specialization of std::vector<bool> would be really nice here.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 13 2022 at 19:28):

elliottt submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 13 2022 at 19:28):

elliottt created PR review comment:

The motivation for improving performance when building without --release is that cranelift currently builds the isle source without --release in the cranelift-codegen build.rs.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 13 2022 at 20:07):

jameysharp created PR review comment:

You don't need to wrap the Vec in an Option; Vec::new() already doesn't heap-allocate, so you can just use Vec::is_empty as your flag instead of Some/None. And yeah, I keep wanting to use the bitvec crate for bit-packed boolean vectors, but we don't use it anywhere else and we don't add dependencies if we can help it.

But also: wat? How could HashSet be that much slower? I feel like there must have been something else going on. (While trying to come up with hypotheses that might explain the difference, I worked out that the HashSet is smaller than the Vec<bool> as long as a rule overlaps with less than 11% of the other rules, but that doesn't seem likely to answer the question.)

Also, in the long run I don't want to optimize for the case where there are a bunch of overlap errors if doing that makes things worse when there aren't any errors. While we're fixing the current backlog of overlaps, of course, it's reasonable to try to keep the debug loop short. But on the other hand, we can use --release builds of islec for that, right?

view this post on Zulip Wasmtime GitHub notifications bot (Sep 13 2022 at 20:07):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 13 2022 at 20:22):

elliottt submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 13 2022 at 20:22):

elliottt created PR review comment:

My assumption about why the vector is faster is that all the allocation is done up front. I don't know the specifics of how HashSet is implemented under the hood, but a lot of insertions to different rows could cause an allocation for each insertion, right?

I agree that it's not worth over-optimizing for the case where we have lots of overlap, but we're talking about 1MB of memory to track the overlaps in the x64 lowerings. Paired with the one-off allocation strategy when we need to start adding overlap edges, it feels like this is a reasonable strategy to keep, but if you'd rather go back to the HashSet I'm not opposed.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 14 2022 at 01:22):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 14 2022 at 01:22):

jameysharp created PR review comment:

For anyone else following this: After a bunch of profiling with Trevor, we've learned that the Errors::overlap_error method is getting called super frequently, to the point that many of the x64 rules are getting marked millions of times each. So tiny differences in how long that method takes get multiplied out to big runtime differences, which is why two data structures show significantly different performance despite both supporting O(1) access. Even the bounds-checks in the vector version have a measurable impact, if I remember the perf report correctly.

It looks like wildcard handling is currently exponential. We investigated ways to reduce that and Trevor has some things he's planning to try. I think fixing this may not only improve runtime but also fix some false positives in the overlap check.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 15 2022 at 01:06):

elliottt updated PR #4906 from trevor/isle-overlap-checker to main.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 15 2022 at 18:41):

elliottt updated PR #4906 from trevor/isle-overlap-checker to main.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 15 2022 at 19:10):

elliottt created PR review comment:

I've changed it so that the feature controls whether or not the overlaps are considered errors. It currently only prints the number of overlaps when the feature is disabled, would you prefer that it always prints the errors out even if they're not going to fail the build?

view this post on Zulip Wasmtime GitHub notifications bot (Sep 15 2022 at 19:10):

elliottt submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 15 2022 at 19:11):

elliottt submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 15 2022 at 19:11):

elliottt created PR review comment:

I reworked the overlap checking to work over pairs of related rules instead, which is much faster and reported more accurate overlap information.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 15 2022 at 19:16):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 15 2022 at 19:16):

jameysharp created PR review comment:

Nah, printing just the overlap count is fine, I think.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 15 2022 at 19:26):

elliottt updated PR #4906 from trevor/isle-overlap-checker to main.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 15 2022 at 19:29):

elliottt updated PR #4906 from trevor/isle-overlap-checker to main.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 15 2022 at 19:34):

elliottt updated PR #4906 from trevor/isle-overlap-checker to main.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 15 2022 at 19:34):

elliottt submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 15 2022 at 19:34):

elliottt created PR review comment:

Great idea!

view this post on Zulip Wasmtime GitHub notifications bot (Sep 15 2022 at 19:35):

elliottt submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 15 2022 at 19:35):

elliottt created PR review comment:

I could go either way on this, do you feel strongly about switching to the macro?

view this post on Zulip Wasmtime GitHub notifications bot (Sep 15 2022 at 19:58):

elliottt submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 15 2022 at 19:58):

elliottt created PR review comment:

I think this is a bad name for this type now, how about Patterns?

view this post on Zulip Wasmtime GitHub notifications bot (Sep 15 2022 at 21:19):

elliottt has marked PR #4906 as ready for review.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 16 2022 at 17:58):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 16 2022 at 17:58):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 16 2022 at 17:58):

jameysharp created PR review comment:

As far as I can tell, this field is never read and can be removed, along with the is_single_constructor_enum method. I'm puzzled why you didn't get a build warning about this...

Once you do that I believe Variant and Extractor are treated the same way everywhere so you can merge them. I'm assuming that the same TermId is never used both for an enum variant and for an extractor.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 16 2022 at 17:58):

jameysharp created PR review comment:

I do not feel strongly about that, nope!

view this post on Zulip Wasmtime GitHub notifications bot (Sep 16 2022 at 17:58):

jameysharp created PR review comment:

This is immediately converted back to an iterator by its only caller, so I'd return impl Iterator<Item = RuleId> instead of calling .collect().

view this post on Zulip Wasmtime GitHub notifications bot (Sep 16 2022 at 21:00):

elliottt submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 16 2022 at 21:00):

elliottt created PR review comment:

As far as I can tell, this field is never read and can be removed, along with the is_single_constructor_enum method. I'm puzzled why you didn't get a build warning about this...

This was originally when I was going to try to reason about the interaction of single-constructor enums to know that no other case was ever possible. I'm not sure there's anything we can do with that information now, so it should be removed.

Once you do that I believe Variant and Extractor are treated the same way everywhere so you can merge them. I'm assuming that the same TermId is never used both for an enum variant and for an extractor.

We'd still need to be able to differentiate between a Variant and an Extractor for the case where the match fails: in that case the variant means that the two don't overlap, while the extractor could still potentially fire.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 16 2022 at 21:02):

elliottt updated PR #4906 from trevor/isle-overlap-checker to main.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 16 2022 at 21:23):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 16 2022 at 21:23):

jameysharp created PR review comment:

Yeah, I figured that out after writing this comment. :sweat_smile:

view this post on Zulip Wasmtime GitHub notifications bot (Sep 19 2022 at 20:20):

jameysharp updated PR #4906 from trevor/isle-overlap-checker to main.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 19 2022 at 23:00):

jameysharp updated PR #4906 from trevor/isle-overlap-checker to main.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 19 2022 at 23:25):

jameysharp updated PR #4906 from trevor/isle-overlap-checker to main.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 19 2022 at 23:57):

jameysharp updated PR #4906 from trevor/isle-overlap-checker to main.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 21 2022 at 20:05):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 21 2022 at 20:05):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 21 2022 at 20:05):

cfallin created PR review comment:

Do we need to flatten the ands here? E.g. what happens if I have a left-hand side like

(rule
  (f
    (and
      (and (g _) (h _))
      (k _)))
  rhs)

?

view this post on Zulip Wasmtime GitHub notifications bot (Sep 21 2022 at 20:07):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 21 2022 at 20:07):

jameysharp created PR review comment:

We would need to do that, except we decided to flatten those earlier in the pipeline, and merged that in #4915.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 21 2022 at 20:30):

cfallin created PR review comment:

Oh, excellent! I somehow missed that. Thanks!

view this post on Zulip Wasmtime GitHub notifications bot (Sep 21 2022 at 20:30):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 21 2022 at 20:30):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 21 2022 at 21:08):

elliottt merged PR #4906.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 23 2022 at 11:53):

bjorn3 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 23 2022 at 11:53):

bjorn3 created PR review comment:

This adds a lot of dependencies to Cranelift. Is the overlap checking really so expensive that paralleling it makes a lot of difference?

view this post on Zulip Wasmtime GitHub notifications bot (Oct 23 2022 at 11:57):

bjorn3 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 23 2022 at 11:57):

bjorn3 created PR review comment:

I just checked it. Takes less than half a second total even when not parallelizing. Will open a PR to remove rayon.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 23 2022 at 17:15):

elliottt created PR review comment:

We depend on rayon elsewhere in cranelift, which is why we added it to the overlap checker. Were you removing it completely from cranelift?

view this post on Zulip Wasmtime GitHub notifications bot (Oct 23 2022 at 17:15):

elliottt submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 23 2022 at 17:21):

bjorn3 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 23 2022 at 17:21):

bjorn3 created PR review comment:

wasmtime-cranelift depends on rayon, but cranelift itself doesn't apart from cranelift-isle because of this PR. https://github.com/bjorn3/rustc_codegen_cranelift/commit/266e96785ab71834b917bf474f130a6d8fdecd4b reverted cg_clif back to cranelift 0.88.1 from before this PR. This commit removes rayon from cg_clif's Cargo.lock.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 23 2022 at 18:46):

elliottt created PR review comment:

Ah I see. My assumption had been that since it was in cranelift/Cargo.toml that it would be okay to rely on it in isle.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 23 2022 at 18:46):

elliottt submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 23 2022 at 19:06):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 23 2022 at 19:06):

cfallin created PR review comment:

FWIW cranelift/Cargo.toml defines the crate for cranelift-tools, the CLI utility; we do want to keep dependencies of cranelift-codegen as minimal as possible, and since cranelift-isle is a build-dep of that I'd consider the same policy to apply. Sorry for not catching this earlier!

view this post on Zulip Wasmtime GitHub notifications bot (Oct 24 2022 at 20:04):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 24 2022 at 20:04):

jameysharp created PR review comment:

If I remember correctly, when we added the rayon dependency, overlap checking was dramatically slower, so parallelism shaved a lot of time off the wasmtime build. But by the time we actually merged overlap checking to main, we'd fixed its performance, so rayon didn't matter any more; we just didn't re-evaluate it at that point.

So thank you for #5101: it's a good change.


Last updated: Oct 23 2024 at 20:03 UTC