jameysharp opened PR #5195 from isle-new-ir
to main
:
I've replaced the overlap checker with one that uses my new IR for ISLE. This is a step toward using the new IR for codegen too. This step gives me some confidence in the correctness of the IR construction without having to get all the codegen details right yet.
This isn't quite ready to merge yet, but it's at a milestone where I think it should pass CI, so I thought folks like @elliottt or @fitzgen might want to take a look.
<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
elliottt submitted PR review.
elliottt created PR review comment:
:rofl:
jameysharp updated PR #5195 from isle-new-ir
to main
.
jameysharp has marked PR #5195 as ready for review.
jameysharp updated PR #5195 from isle-new-ir
to main
.
jameysharp updated PR #5195 from isle-new-ir
to main
.
cfallin submitted PR review.
cfallin created PR review comment:
Can we call this an "unreachable rule"? I agree that "unmatchable" is a little more specific to a term-rewriting system and may otherwise be a better descriptor, but I think that "unreachable" has more immediate and intuitive connotations for someone trying to learn how to write ISLE rules.
cfallin created PR review comment:
Could we make these actual newtype wrappers, like
pub struct TupleIndex(u8);
with all the necessary derives?I used to define index types like this as well (and islec may have artifacts of that still) but I tend to prefer the actual-newtype style now to avoid the silent assignments of one type to another that's possible with shallow aliases.
cfallin created PR review comment:
This is a neat little data-structure -- perhaps it could be factored out? I guess it's semantically equivalent to a union-find with a set of all elements stored at the UF root as well, but it's nicer than that because it's constant-space (per binding).
EquivClasses<T>
or something like that...(Similar comment as above re: ensuring the
HashMap
nondeterminism doesn't leak to users of this IR, as well)
cfallin created PR review comment:
In general I've tried to use
BTreeMap
where possible in IR to maintain dete rministic codegen. It probably doesn't matter now for overlap checking but this may become an issue later, if we don't remember to sort. Is it reasonable to use aBTreeMap
here to ensure that at the top level? Or if you'd prefer to keep aHashMap
, make this non-pub
and either just the per-keyget
/get_mut
as methods, or a "return sorted constrained binding IDs" method if needed?
cfallin created PR review comment:
Can you say why this is needed for correctness in this comment, for clarity?
(Because the iteration pattern below catches (present in small, not present in big) and we would miss that case if iterating over
big
in the outermost loop, right?)
cfallin created PR review comment:
Probably doesn't matter too much in practice right now (rule bodies should have O(a dozen) or so bindings at most?) but to avoid potential quadratic behavior here we could use a doubly-linked circular list. Genuine question: I wonder if this will become an ISLE compilations-speed issue in the future if we have very large machine-generated rule bodies of some sort?
cfallin created PR review comment:
Can you expand this comment a bit more, explaining the overall scheme and why it works? Copy the constraints, meaning... push all constraints to one member of the equivalence class? Or...?
cfallin created PR review comment:
Can we have a comment here for now stating that we're ignoring
equals
and that this is fine (albeit imprecise) and why?
cfallin created PR review comment:
This one is a bit mystifying too, sorry -- leaves a few questions for me: why remove an equivalence class after processing just one link? What do you mean by "all the constraints are equal"? Where did that check happen?
cfallin created PR review comment:
what are the "bindings" here -- can you name this variable more descriptively? They're sub-bindings (bindings whose value must be computed first), or...? Can we factor out the logic here into a helper?
cfallin created PR review comment:
OK so we're following an equivalence link, somehow rebuilding any Variants (ie renormalizing the
source
), and then pushing that constraint back on the workqueue?
cfallin created PR review comment:
And after reading the body of this function -- it's extremely confusing. Variable names are actively trying to melt my brain (
constraints
is a workqueue of sorts,constraint
is a local thing;binding
is something that comes fromconstraints
, butbindings
is something pulled out ofconstraint
). It sort of reads like it's meant to normalize variant extractions.I have to ask -- would it be better to take a more immutable approach to normalization here, where we (i) hashcons expression nodes, (ii) normalize all args (using e.g. a union-find) before creating a node on those args, and (iii) eagerly process equalities to create the union-find ahead of time (or as we go, at least)? Trying to reprocess post-facto here seems fraught with complexity and in any case is quite hard to understand (and thus maintain)!
cfallin submitted PR review.
bjorn3 submitted PR review.
bjorn3 created PR review comment:
BTreeMap
has the disadvantage of being slower to compile thanHashMap
due to generating a lot more LLVM ir. cranelift-isle is decently fast to compule, so it doesn't matter that much right now, but it might be something to keep in mind.
jameysharp updated PR #5195 from isle-new-ir
to main
.
jameysharp submitted PR review.
jameysharp created PR review comment:
I originally tried to maintain an invariant during construction that every binding site could have a concrete constraint, or could participate in an equality constraint, but not both. I couldn't reason through making that work. My head hurt less when I just established that invariant after construction instead.
I've renamed many of the variables and added about 60 lines of comments to this function. Could you take another look and see if it makes more sense now?
jameysharp created PR review comment:
Yeah, I thought about using union-find until I realized I needed to visit every element of the equivalence class starting from an arbitrary element of the class. Although I'm not 100% sure that's as important as I thought it was, so union-find might still be an option.
I'm amused to realize that union-find plus a set for each root is _also_ O(1) space per binding, it just has larger constants... I think you could maintain first-child/next-sibling pointers in a union-find data structure, which would only triple the constant factors over my version.
I'm not immediately convinced it's worth factoring out a separate type for this though. I'm inclined to see if it gets more complex in the rest of review first.
jameysharp submitted PR review.
jameysharp submitted PR review.
jameysharp created PR review comment:
The way I'm expecting to use this is to iterate over
RuleSet::bindings
and then look up constraints accordingly. I guess a publicget
method with a privateHashMap
should be fine for that...
cfallin submitted PR review.
cfallin submitted PR review.
cfallin created PR review comment:
Hmm -- now that I think about it more, I'm not sure this is always a valid transform as a general IR normalization/canonicalization (as opposed to something useful just for overlap-checking). Specifically, I think it loses information -- consider
(term x @ (T.A y _) x) --> (term (T.A y _) (T.A y _))
if we then match against the input
(term (T.A 1 2) (T.A 1 3))
, the original form will not match, but the rewritten form will match (incorrectly, wrt the original pattern).I suppose this is fine for overlap checking -- the rule always "expands" (matches more) under this transform, I think, since we can lose equality constraints -- but I think it is only exact (my inner thermodynamicist is going to invent the word "isoconstraining" for this) if the constraint on the constrained-and-equals site fully captures the value (if a variant, all fields have patterns that, recursively, fully capture the value, or are just a variable binding).
So if we hope to use this IR as a basis for compilation, I think we need to avoid this transform, though I suppose we will want to find a way to get the same effect for overlap checking, unless we have a more direct way of handling the equality constraints. (It feels like maybe there should be a more direct way to handle the equality constraints, but I don't have any good ideas right now, sorry!)
cfallin edited PR review comment.
cfallin edited PR review comment.
cfallin submitted PR review.
cfallin created PR review comment:
Ah, and now that I write all that, I see the
y
term in the above, and I've looked at the code again and am understanding the bit about handling all bindings within the variant -- I think I've convinced myself of the need for what you are actually doing already. So, nevermind all of the above :-)
fitzgen created PR review comment:
fitzgen submitted PR review.
jameysharp updated PR #5195 from isle-new-ir
to main
.
jameysharp updated PR #5195 from isle-new-ir
to main
.
jameysharp submitted PR review.
jameysharp created PR review comment:
I went ahead and used union-find, factored out to a separate type since that's a little more complex.
It's a good change, except that now removing a set from the data structure requires visiting every member of every set to figure out whether it's part of the desired set.
I'm inclined to not try to improve that, but I have some ideas if we find it's a performance problem later.
jameysharp created PR review comment:
I've fixed this potential quadratic behavior, using union-find.
jameysharp submitted PR review.
jameysharp created PR review comment:
I've tried making the equality normalization incremental, in commit 0b9fe9040. It turns out to be pretty much exactly the same amount of code as doing it in a post-processing pass, unless I've missed a trick somewhere.
Also, I think it will do more work if three or more binding sites with enum-variant constraints get set equal to each other. In that case it runs n-1 times, and each time it builds all fields for two of the binding sites, instead of running once and building the fields for n binding sites all together. Then, if any field on the first binding site had a constraint, the same reasoning applies recursively.
So I'm thinking about dropping that change before merging. Would you take a look and see what you think?
jameysharp submitted PR review.
cfallin created PR review comment:
Let's make a specific note of the
y
here -- this is why it is still exactly equivalent, and not a coarsening
cfallin created PR review comment:
can we say "entry in
equals
" as a parenthetical to reminder the reader here?
cfallin submitted PR review.
cfallin submitted PR review.
cfallin created PR review comment:
Add a comment here why convert to expr-land and dedup -- so that we strongly normalize external-extractor-of-a-given-binding-site
cfallin created PR review comment:
Add a comment here describing why safe to only check for constraint on
b
if we didn't invokeset_constraint_on_class(a, ...)
above -- because the above will otherwise see any constraints onb
because it's part of the same equivalence class now (due to merge above).
jameysharp updated PR #5195 from isle-new-ir
to main
.
jameysharp submitted PR review.
jameysharp created PR review comment:
Strongly normalizing is a red herring in this case. External extractor calls would be normalized whether we represent them as an
Expr
or as aBinding
here. The entire reason for converting from pattern to expression and back around external extractor calls is because I designed this representation to closely resemble what we can easily implement with Rust as our codegen target, and you can't do a function call in a Rust pattern match. I've added comments in a variety of places which I hope would have clarified this for you if I'd written them before your code review.
jameysharp has enabled auto merge for PR #5195.
jameysharp merged PR #5195.
Last updated: Jan 24 2025 at 00:11 UTC