jameysharp opened issue #6128:
Feature
An ISLE pattern like
(iadd (pattern_a) (pattern_b))
should also match if(iadd (pattern_b) (pattern_a))
would match, becauseiadd
is a commutative operation.Benefit
We currently have many rules, both in mid-end optimization and in backend lowering, where we have to write out all possible rules that are equivalent up to commutativity. If there are N commutative patterns in a rule's left-hand side, that requires writing up to 2^N rules. This is tedious and error-prone.
It's worthwhile having ISLE generate code for each distinct form of these rules because then the ISLE compiler can build an efficient pattern-matching tree that evaluates each part of the pattern as few times as possible.
In the mid-end optimization rules, we've considered rewriting commutative instructions to their flipped equivalents, so that all commutative alternatives are available in the same e-class (see #5546). But it's cheaper to have more ISLE rules than to double the number of instructions in e-classes, the GVN map, and the dataflow graph. And this doesn't help backends because the egraph cost function can pick either version, so there's no guarantee the backend will see the operands in the order it wants (see #5709).
If we can declare once that a term like
iadd
is commutative and have ISLE generate all the equivalent rules, then we can get the performance advantages of an optimized pattern-matching tree while writing a minimum set of rules.Challenges
These commutative alternative rules often overlap with each other. If the sub-patterns overlap each other, then swapping them will produce a rule which overlaps with the original. The simplest example is that
(iadd x y)
overlaps with(iadd y x)
sincex
andy
are wildcard patterns which only bind variables.In some cases, swapping the operands produces exactly the same rule. For example,
(iadd x x)
only matches if both operands are the same SSA value or e-class, and matches exactly the same inputs when flipped. In these cases, generating a duplicate lowering rule will lead to either an error during overlap checking or an assertion failure during code generation, depending on where we expand the rule. Generating a duplicate egraph rule will produce correct results but do more work than necessary.In other cases, swapping the operands matches the same inputs when flipped but the right-hand side could evaluate to different results depending on which version of the rule is selected. A rule which lowers
(iadd x y)
to(x64_add x y)
would, when flipped, lower to(x64_add y x)
instead. The result should be equivalent but is not the same machine instruction. We need to ensure that we choose one of these alternatives deterministically.Implementation
We've discussed several alternative implementation plans, and I'm probably missing some. None have yet emerged as the "right" answer.
One idea is a new flag when declaring a term.
(decl commutative iadd (Value Value) Inst)
would mean that when the term is used as an extractor, then implicitly two versions of the enclosing rule are generated, with the subpatterns swapped.Another approach is to add some kind of support for or-patterns to the ISLE language. For example, if we allow a term to be defined by multiple internal extractors, then we might create a helper term for matching 2-element value arrays in commutative instructions:
(decl value_array_commutative (Value Value) ValueArray2) (extractor (value_array_commutative x y) (value_array_2 x y)) (extractor (value_array_commutative y x) (value_array_2 x y))
An alternative syntax is to add a special
(or ...)
pattern combinator, as proposed by @Kmeakin in https://bytecodealliance.zulipchat.com/#narrow/stream/217117-cranelift/topic/Commutative.20rewrites/near/344151148:(decl bor (Type Value Value) Value) (extractor (bor ty x y) (or (inst_data ty (InstructionData.Binary (Opcode.Bor) (value_array_2 x y))) (inst_data ty (InstructionData.Binary (Opcode.Bor) (value_array_2 y x)))) )
I prefer these versions over the
commutative
flag because it allows us to be explicit about how to match the alternatives. For example, we could write explicit alternatives foricmp
which reverse the condition code if matching the operands flipped (hand-waving thatintcc_reverse
is currently a constructor, but would need to be an extractor):(decl icmp (IntCC Value Value) Inst) (extractor (icmp Cond x y) (inst_data (InstructionData.IntCompare (Opcode.Icmp) (value_array_2 x y) Cond)) ) (extractor (icmp Cond x y) (inst_data (InstructionData.IntCompare (Opcode.Icmp) (value_array_2 y x) (intcc_reverse Cond))) )
We might add priorities to internal extractors to declare how they should interact with overlap checking. But that leaves priorities ambiguous in a pattern like
(isub (iadd a b) (iadd c d))
: swappinga
withb
would apparently have the same priority as swappingc
withd
.It might also be useful to declare some constructors as commutative: If we've produced two versions of a rule with left-hand patterns swapped due to commutativity, and find that they match the same inputs and produce the same right-hand side up to constructor commutativity, then we don't need the duplicate rule. Currently we can just avoid writing such rules but if all uses of
iadd
match an or-pattern, then ISLE needs to figure out how to avoid unnecessary work.As an extension, it would be nice if ISLE could emit Rust
match
or-patterns to group match arms which bind the same variables in different orders or from different enum variants. I suspect that optimization would never be relevant for the commutative-operator use case though: if we can tell the right-hand side is the same, I think that always means that the left-hand patterns are equivalent and the rules are just duplicates. I have other uses in mind for or-patterns which might benefit from this optimization though.A bunch of people have expressed interest in this topic: @cfallin @fitzgen @elliottt @afonso360 @alexcrichton
jameysharp labeled issue #6128:
Feature
An ISLE pattern like
(iadd (pattern_a) (pattern_b))
should also match if(iadd (pattern_b) (pattern_a))
would match, becauseiadd
is a commutative operation.Benefit
We currently have many rules, both in mid-end optimization and in backend lowering, where we have to write out all possible rules that are equivalent up to commutativity. If there are N commutative patterns in a rule's left-hand side, that requires writing up to 2^N rules. This is tedious and error-prone.
It's worthwhile having ISLE generate code for each distinct form of these rules because then the ISLE compiler can build an efficient pattern-matching tree that evaluates each part of the pattern as few times as possible.
In the mid-end optimization rules, we've considered rewriting commutative instructions to their flipped equivalents, so that all commutative alternatives are available in the same e-class (see #5546). But it's cheaper to have more ISLE rules than to double the number of instructions in e-classes, the GVN map, and the dataflow graph. And this doesn't help backends because the egraph cost function can pick either version, so there's no guarantee the backend will see the operands in the order it wants (see #5709).
If we can declare once that a term like
iadd
is commutative and have ISLE generate all the equivalent rules, then we can get the performance advantages of an optimized pattern-matching tree while writing a minimum set of rules.Challenges
These commutative alternative rules often overlap with each other. If the sub-patterns overlap each other, then swapping them will produce a rule which overlaps with the original. The simplest example is that
(iadd x y)
overlaps with(iadd y x)
sincex
andy
are wildcard patterns which only bind variables.In some cases, swapping the operands produces exactly the same rule. For example,
(iadd x x)
only matches if both operands are the same SSA value or e-class, and matches exactly the same inputs when flipped. In these cases, generating a duplicate lowering rule will lead to either an error during overlap checking or an assertion failure during code generation, depending on where we expand the rule. Generating a duplicate egraph rule will produce correct results but do more work than necessary.In other cases, swapping the operands matches the same inputs when flipped but the right-hand side could evaluate to different results depending on which version of the rule is selected. A rule which lowers
(iadd x y)
to(x64_add x y)
would, when flipped, lower to(x64_add y x)
instead. The result should be equivalent but is not the same machine instruction. We need to ensure that we choose one of these alternatives deterministically.Implementation
We've discussed several alternative implementation plans, and I'm probably missing some. None have yet emerged as the "right" answer.
One idea is a new flag when declaring a term.
(decl commutative iadd (Value Value) Inst)
would mean that when the term is used as an extractor, then implicitly two versions of the enclosing rule are generated, with the subpatterns swapped.Another approach is to add some kind of support for or-patterns to the ISLE language. For example, if we allow a term to be defined by multiple internal extractors, then we might create a helper term for matching 2-element value arrays in commutative instructions:
(decl value_array_commutative (Value Value) ValueArray2) (extractor (value_array_commutative x y) (value_array_2 x y)) (extractor (value_array_commutative y x) (value_array_2 x y))
An alternative syntax is to add a special
(or ...)
pattern combinator, as proposed by @Kmeakin in https://bytecodealliance.zulipchat.com/#narrow/stream/217117-cranelift/topic/Commutative.20rewrites/near/344151148:(decl bor (Type Value Value) Value) (extractor (bor ty x y) (or (inst_data ty (InstructionData.Binary (Opcode.Bor) (value_array_2 x y))) (inst_data ty (InstructionData.Binary (Opcode.Bor) (value_array_2 y x)))) )
I prefer these versions over the
commutative
flag because it allows us to be explicit about how to match the alternatives. For example, we could write explicit alternatives foricmp
which reverse the condition code if matching the operands flipped (hand-waving thatintcc_reverse
is currently a constructor, but would need to be an extractor):(decl icmp (IntCC Value Value) Inst) (extractor (icmp Cond x y) (inst_data (InstructionData.IntCompare (Opcode.Icmp) (value_array_2 x y) Cond)) ) (extractor (icmp Cond x y) (inst_data (InstructionData.IntCompare (Opcode.Icmp) (value_array_2 y x) (intcc_reverse Cond))) )
We might add priorities to internal extractors to declare how they should interact with overlap checking. But that leaves priorities ambiguous in a pattern like
(isub (iadd a b) (iadd c d))
: swappinga
withb
would apparently have the same priority as swappingc
withd
.It might also be useful to declare some constructors as commutative: If we've produced two versions of a rule with left-hand patterns swapped due to commutativity, and find that they match the same inputs and produce the same right-hand side up to constructor commutativity, then we don't need the duplicate rule. Currently we can just avoid writing such rules but if all uses of
iadd
match an or-pattern, then ISLE needs to figure out how to avoid unnecessary work.As an extension, it would be nice if ISLE could emit Rust
match
or-patterns to group match arms which bind the same variables in different orders or from different enum variants. I suspect that optimization would never be relevant for the commutative-operator use case though: if we can tell the right-hand side is the same, I think that always means that the left-hand patterns are equivalent and the rules are just duplicates. I have other uses in mind for or-patterns which might benefit from this optimization though.A bunch of people have expressed interest in this topic: @cfallin @fitzgen @elliottt @afonso360 @alexcrichton
github-actions[bot] commented on issue #6128:
Subscribe to Label Action
cc @cfallin, @fitzgen
<details>
This issue or pull request has been labeled: "isle"Thus the following users have been cc'd because of the following labels:
- cfallin: isle
- fitzgen: isle
To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.
Learn more.
</details>
jameysharp commented on issue #6128:
@cfallin, @fitzgen, @elliottt and I discussed this today and clarified some of the design constraints.
We prefer the explicit
(or ...)
pattern approach.We don't want to try to hide commutative patterns inside terms generated by cranelift-codegen-meta. Doing that would require ISLE to analyze or-patterns to identify when multiple alternatives are equivalent. Instead, if you match on, for example,
(bor x y)
then you'll always matchx
andy
in that order. We can easily hand-write terms for when you explicitly want a commutative match, like so:(extractor (bor_commute x y) (or (bor x y) (bor y x)))
We have open questions about semantics of or-patterns when used outside of multi-terms. So we've decided to initially implement them only for use in the mid-end
simplify
multi-constructor, and figure out how they should work for backends later.When a rule author uses an or-pattern, they're declaring that all the alternatives are equivalent to the same right-hand side. So for the purposes of overlap checking, none of the alternatives within the same pattern should be treated as overlapping. However, for backends, the right-hand side may expand differently depending on which alternative we selected, so the match order is observable and maybe should be explicitly specified. For the mid-end, we're going to select all the alternatives anyway, so we don't need to think about order, and multi-terms are already excluded from overlap checking.
I had hoped to preserve or-patterns until ISLE generates Rust source, to allow emitting Rust
match
or-patterns when possible. However that seems hard so we're going to implement this as an early rewrite, probably in thesema
module. Once we have something working maybe we can do better later. Anyway, this will generate code that is no worse than the current state of things, where we're writing a separate rule for each alternative by hand.Something I forgot to mention: I've also wanted or-patterns to replace external extractors like
fits_in_64
but there's an awkward interaction with error checking. At least initially, I think the following won't work:(extractor (fits_in_32 ty) (and ty (or $I8 $I16 $I32 $F32))) (extractor (ty_float ty) (and ty (or $F32 $F64))) ... (fits_in_32 (ty_float ty)) ...
That pattern will expand to sub-rules such as
(and $I8 $F32)
, which we'll reject as an unsatisfiable rule. I kind of wanted this to be accepted as long as at least one of the sub-rules is valid (in this case,(and $F32 $F32)
). But I think that's going to hide bugs, and also it's hard to report a useful error message for the case where no sub-rules are valid.Huh. An or-pattern which doesn't bind any variables can't expand to a different right-hand side based on which case matched, so those could be allowed in backends without introducing overlap or worrying about match order. Maybe that's a useful next step after mid-end usage.
jameysharp commented on issue #6128:
Implementing or-patterns in ISLE is turning out to be surprisingly complicated but I've figured out some things.
There's no need to explicitly limit where or-patterns may be used (e.g. only in multi-constructors). Instead let's implement them as syntactic sugar for writing out a set of rules which all have the same priority. Later maybe we'll figure out a priority scheme, and until then, if you use them to write a pair of rules which overlap and aren't in a multi-constructor, your pattern will be rejected by the overlap checker. Similarly, the
(fits_int_32 (ty_float _))
case will be rejected since the equivalent Cartesian product of rules would be rejected.Where should this syntactic sugar be expanded?
- Parsing is too early.
- I haven't been able to come up with a way to express this in the "sea of nodes" construction in
trie_again
, so it should be expanded before that or during its construction.- Type-checking is complicated enough without building a Cartesian product of rules. Also the typing and name-binding rules for or-patterns require that all branches bind the same names to the same types (but possibly in a different order) so expanding this there just makes it more complicated.
- We could have an explicit pass that rewrites patterns with or-patterns into sets of patterns without or-patterns, but I don't like it.
So my current plan is to expand this syntax in the
Pattern::visit
method insema
, which is used during construction of thetrie_again
representation. The only way I can see to do that is to CPS-transform that method. In particular, the map of variable bindings may be different in each expansion, and the expressions and patterns from if-lets and the right-hand side need to be interpreted with every distinct map of variable bindings.
fitzgen commented on issue #6128:
Why don’t you like the explicit pass? That seems more reasonable than CPS converting parts of the compiler to me.
jameysharp commented on issue #6128:
That is a fair question!
The plain answer is I just don't like walking the pattern and building an exponential number of clones of the pattern tree while searching for the or-pattern nodes in it. It makes me feel gross.
A somewhat more defensible answer is I can avoid duplicating a bunch of the work to build these rules if I do the expansion during the traversal that builds the
trie_again
representation. With a separate earlier pass, I have to lean harder on the hash-cons map to deduplicate binding sites that are common across sub-rules. Sincebuild.rs
is compiled in debug mode, that could potentially have a significant impact.It's also worth noting that I don't think the CPS-conversion I need is a big deal. It just affects one function, which recursively walks a pattern and calls caller-provided visitor methods, so it's only a few short match arms.
That said, continuation-passing style is kind of a big hammer to pull out in Rust, so I understand your concern!
fitzgen commented on issue #6128:
Ah yeah if it is just one function, that isn't a big deal.
scottmcm commented on issue #6128:
Hmm, would the
(or …)
approach also let us handle equivalenticmp
s?Like instead of
(rule (ult ty x y) (icmp ty (IntCC.UnsignedLessThan) x y))
would we be able to have this?
(rule (ult ty x y) (or (icmp ty (IntCC.UnsignedLessThan) x y) (icmp ty (IntCC.UnsignedGreaterThan) y x) ))
jameysharp commented on issue #6128:
Your example places the
(or …)
on the right-hand side of a rule, which wouldn't work, but I think what you meant was this extractor fromprelude.isle
instead:(extractor (ult ty x y) (icmp ty (IntCC.UnsignedLessThan) x y))
And yes, I would expect the equivalent extractor pattern to work:
(extractor (ult ty x y) (or (icmp ty (IntCC.UnsignedLessThan) x y) (icmp ty (IntCC.UnsignedGreaterThan) y x) )
Which would sure be handy, right?
scottmcm commented on issue #6128:
Oops, you're right, I definitely meant the extractor.
And yes, that sounds extremely handy!
For example, right now all the spaceship optimizations look for
(a > b) - (a < b)
, and improving thelt
/gt
extractors like that would make them immediately start working for things like(b < a) - (a < b)
too.So or-patterns sounds great, as it would help for more than just commutative scenarios.
Last updated: Nov 22 2024 at 17:03 UTC