Stream: git-wasmtime

Topic: wasmtime / PR #5195 cranelift-isle: New IR and revised ov...


view this post on Zulip Wasmtime GitHub notifications bot (Nov 04 2022 at 01:17):

jameysharp opened PR #5195 from isle-new-ir to main:

I've replaced the overlap checker with one that uses my new IR for ISLE. This is a step toward using the new IR for codegen too. This step gives me some confidence in the correctness of the IR construction without having to get all the codegen details right yet.

This isn't quite ready to merge yet, but it's at a milestone where I think it should pass CI, so I thought folks like @elliottt or @fitzgen might want to take a look.

<!--

Please ensure that the following steps are all taken care of before submitting
the PR.

Please ensure all communication adheres to the code of conduct.
-->

view this post on Zulip Wasmtime GitHub notifications bot (Nov 04 2022 at 04:18):

elliottt submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 04 2022 at 04:18):

elliottt created PR review comment:

:rofl:

view this post on Zulip Wasmtime GitHub notifications bot (Nov 04 2022 at 21:26):

jameysharp updated PR #5195 from isle-new-ir to main.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 04 2022 at 21:27):

jameysharp has marked PR #5195 as ready for review.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 05 2022 at 01:41):

jameysharp updated PR #5195 from isle-new-ir to main.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 06 2022 at 03:25):

jameysharp updated PR #5195 from isle-new-ir to main.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 07 2022 at 23:21):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 07 2022 at 23:21):

cfallin created PR review comment:

Can we call this an "unreachable rule"? I agree that "unmatchable" is a little more specific to a term-rewriting system and may otherwise be a better descriptor, but I think that "unreachable" has more immediate and intuitive connotations for someone trying to learn how to write ISLE rules.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 07 2022 at 23:21):

cfallin created PR review comment:

Could we make these actual newtype wrappers, like pub struct TupleIndex(u8); with all the necessary derives?

I used to define index types like this as well (and islec may have artifacts of that still) but I tend to prefer the actual-newtype style now to avoid the silent assignments of one type to another that's possible with shallow aliases.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 07 2022 at 23:21):

cfallin created PR review comment:

This is a neat little data-structure -- perhaps it could be factored out? I guess it's semantically equivalent to a union-find with a set of all elements stored at the UF root as well, but it's nicer than that because it's constant-space (per binding). EquivClasses<T> or something like that...

(Similar comment as above re: ensuring the HashMap nondeterminism doesn't leak to users of this IR, as well)

view this post on Zulip Wasmtime GitHub notifications bot (Nov 07 2022 at 23:21):

cfallin created PR review comment:

In general I've tried to use BTreeMap where possible in IR to maintain dete rministic codegen. It probably doesn't matter now for overlap checking but this may become an issue later, if we don't remember to sort. Is it reasonable to use a BTreeMap here to ensure that at the top level? Or if you'd prefer to keep a HashMap, make this non-pub and either just the per-key get/get_mut as methods, or a "return sorted constrained binding IDs" method if needed?

view this post on Zulip Wasmtime GitHub notifications bot (Nov 07 2022 at 23:21):

cfallin created PR review comment:

Can you say why this is needed for correctness in this comment, for clarity?

(Because the iteration pattern below catches (present in small, not present in big) and we would miss that case if iterating over big in the outermost loop, right?)

view this post on Zulip Wasmtime GitHub notifications bot (Nov 07 2022 at 23:21):

cfallin created PR review comment:

Probably doesn't matter too much in practice right now (rule bodies should have O(a dozen) or so bindings at most?) but to avoid potential quadratic behavior here we could use a doubly-linked circular list. Genuine question: I wonder if this will become an ISLE compilations-speed issue in the future if we have very large machine-generated rule bodies of some sort?

view this post on Zulip Wasmtime GitHub notifications bot (Nov 07 2022 at 23:21):

cfallin created PR review comment:

Can you expand this comment a bit more, explaining the overall scheme and why it works? Copy the constraints, meaning... push all constraints to one member of the equivalence class? Or...?

view this post on Zulip Wasmtime GitHub notifications bot (Nov 07 2022 at 23:21):

cfallin created PR review comment:

Can we have a comment here for now stating that we're ignoring equals and that this is fine (albeit imprecise) and why?

view this post on Zulip Wasmtime GitHub notifications bot (Nov 07 2022 at 23:21):

cfallin created PR review comment:

This one is a bit mystifying too, sorry -- leaves a few questions for me: why remove an equivalence class after processing just one link? What do you mean by "all the constraints are equal"? Where did that check happen?

view this post on Zulip Wasmtime GitHub notifications bot (Nov 07 2022 at 23:21):

cfallin created PR review comment:

what are the "bindings" here -- can you name this variable more descriptively? They're sub-bindings (bindings whose value must be computed first), or...? Can we factor out the logic here into a helper?

view this post on Zulip Wasmtime GitHub notifications bot (Nov 07 2022 at 23:21):

cfallin created PR review comment:

OK so we're following an equivalence link, somehow rebuilding any Variants (ie renormalizing the source), and then pushing that constraint back on the workqueue?

view this post on Zulip Wasmtime GitHub notifications bot (Nov 07 2022 at 23:21):

cfallin created PR review comment:

And after reading the body of this function -- it's extremely confusing. Variable names are actively trying to melt my brain (constraints is a workqueue of sorts, constraint is a local thing; binding is something that comes from constraints, but bindings is something pulled out of constraint). It sort of reads like it's meant to normalize variant extractions.

I have to ask -- would it be better to take a more immutable approach to normalization here, where we (i) hashcons expression nodes, (ii) normalize all args (using e.g. a union-find) before creating a node on those args, and (iii) eagerly process equalities to create the union-find ahead of time (or as we go, at least)? Trying to reprocess post-facto here seems fraught with complexity and in any case is quite hard to understand (and thus maintain)!

view this post on Zulip Wasmtime GitHub notifications bot (Nov 07 2022 at 23:21):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 07 2022 at 23:31):

bjorn3 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 07 2022 at 23:31):

bjorn3 created PR review comment:

BTreeMap has the disadvantage of being slower to compile than HashMap due to generating a lot more LLVM ir. cranelift-isle is decently fast to compule, so it doesn't matter that much right now, but it might be something to keep in mind.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 08 2022 at 02:18):

jameysharp updated PR #5195 from isle-new-ir to main.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 08 2022 at 02:22):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 08 2022 at 02:22):

jameysharp created PR review comment:

I originally tried to maintain an invariant during construction that every binding site could have a concrete constraint, or could participate in an equality constraint, but not both. I couldn't reason through making that work. My head hurt less when I just established that invariant after construction instead.

I've renamed many of the variables and added about 60 lines of comments to this function. Could you take another look and see if it makes more sense now?

view this post on Zulip Wasmtime GitHub notifications bot (Nov 08 2022 at 02:35):

jameysharp created PR review comment:

Yeah, I thought about using union-find until I realized I needed to visit every element of the equivalence class starting from an arbitrary element of the class. Although I'm not 100% sure that's as important as I thought it was, so union-find might still be an option.

I'm amused to realize that union-find plus a set for each root is _also_ O(1) space per binding, it just has larger constants... I think you could maintain first-child/next-sibling pointers in a union-find data structure, which would only triple the constant factors over my version.

I'm not immediately convinced it's worth factoring out a separate type for this though. I'm inclined to see if it gets more complex in the rest of review first.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 08 2022 at 02:35):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 08 2022 at 02:38):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 08 2022 at 02:38):

jameysharp created PR review comment:

The way I'm expecting to use this is to iterate over RuleSet::bindings and then look up constraints accordingly. I guess a public get method with a private HashMap should be fine for that...

view this post on Zulip Wasmtime GitHub notifications bot (Nov 08 2022 at 03:20):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 08 2022 at 03:20):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 08 2022 at 03:20):

cfallin created PR review comment:

Hmm -- now that I think about it more, I'm not sure this is always a valid transform as a general IR normalization/canonicalization (as opposed to something useful just for overlap-checking). Specifically, I think it loses information -- consider

(term x @ (T.A y _) x)  --> (term (T.A y _) (T.A y _))

if we then match against the input (term (T.A 1 2) (T.A 1 3)), the original form will not match, but the rewritten form will match (incorrectly, wrt the original pattern).

I suppose this is fine for overlap checking -- the rule always "expands" (matches more) under this transform, I think, since we can lose equality constraints -- but I think it is only exact (my inner thermodynamicist is going to invent the word "isoconstraining" for this) if the constraint on the constrained-and-equals site fully captures the value (if a variant, all fields have patterns that, recursively, fully capture the value, or are just a variable binding).

So if we hope to use this IR as a basis for compilation, I think we need to avoid this transform, though I suppose we will want to find a way to get the same effect for overlap checking, unless we have a more direct way of handling the equality constraints. (It feels like maybe there should be a more direct way to handle the equality constraints, but I don't have any good ideas right now, sorry!)

view this post on Zulip Wasmtime GitHub notifications bot (Nov 08 2022 at 03:21):

cfallin edited PR review comment.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 08 2022 at 03:21):

cfallin edited PR review comment.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 08 2022 at 03:27):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 08 2022 at 03:27):

cfallin created PR review comment:

Ah, and now that I write all that, I see the y term in the above, and I've looked at the code again and am understanding the bit about handling all bindings within the variant -- I think I've convinced myself of the need for what you are actually doing already. So, nevermind all of the above :-)

view this post on Zulip Wasmtime GitHub notifications bot (Nov 08 2022 at 17:57):

fitzgen created PR review comment:

https://www.youtube.com/watch?v=qTA0RuZoIxM

view this post on Zulip Wasmtime GitHub notifications bot (Nov 08 2022 at 17:57):

fitzgen submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 08 2022 at 20:17):

jameysharp updated PR #5195 from isle-new-ir to main.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 08 2022 at 21:13):

jameysharp updated PR #5195 from isle-new-ir to main.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 08 2022 at 21:17):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 08 2022 at 21:17):

jameysharp created PR review comment:

I went ahead and used union-find, factored out to a separate type since that's a little more complex.

It's a good change, except that now removing a set from the data structure requires visiting every member of every set to figure out whether it's part of the desired set.

I'm inclined to not try to improve that, but I have some ideas if we find it's a performance problem later.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 08 2022 at 21:17):

jameysharp created PR review comment:

I've fixed this potential quadratic behavior, using union-find.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 08 2022 at 21:17):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 08 2022 at 21:29):

jameysharp created PR review comment:

I've tried making the equality normalization incremental, in commit 0b9fe9040. It turns out to be pretty much exactly the same amount of code as doing it in a post-processing pass, unless I've missed a trick somewhere.

Also, I think it will do more work if three or more binding sites with enum-variant constraints get set equal to each other. In that case it runs n-1 times, and each time it builds all fields for two of the binding sites, instead of running once and building the fields for n binding sites all together. Then, if any field on the first binding site had a constraint, the same reasoning applies recursively.

So I'm thinking about dropping that change before merging. Would you take a look and see what you think?

view this post on Zulip Wasmtime GitHub notifications bot (Nov 08 2022 at 21:29):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 10 2022 at 00:08):

cfallin created PR review comment:

Let's make a specific note of the y here -- this is why it is still exactly equivalent, and not a coarsening

view this post on Zulip Wasmtime GitHub notifications bot (Nov 10 2022 at 00:08):

cfallin created PR review comment:

can we say "entry in equals" as a parenthetical to reminder the reader here?

view this post on Zulip Wasmtime GitHub notifications bot (Nov 10 2022 at 00:08):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 10 2022 at 00:08):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 10 2022 at 00:08):

cfallin created PR review comment:

Add a comment here why convert to expr-land and dedup -- so that we strongly normalize external-extractor-of-a-given-binding-site

view this post on Zulip Wasmtime GitHub notifications bot (Nov 10 2022 at 00:08):

cfallin created PR review comment:

Add a comment here describing why safe to only check for constraint on b if we didn't invoke set_constraint_on_class(a, ...) above -- because the above will otherwise see any constraints on b because it's part of the same equivalence class now (due to merge above).

view this post on Zulip Wasmtime GitHub notifications bot (Nov 14 2022 at 01:41):

jameysharp updated PR #5195 from isle-new-ir to main.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 14 2022 at 01:46):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 14 2022 at 01:46):

jameysharp created PR review comment:

Strongly normalizing is a red herring in this case. External extractor calls would be normalized whether we represent them as an Expr or as a Binding here. The entire reason for converting from pattern to expression and back around external extractor calls is because I designed this representation to closely resemble what we can easily implement with Rust as our codegen target, and you can't do a function call in a Rust pattern match. I've added comments in a variety of places which I hope would have clarified this for you if I'd written them before your code review.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 14 2022 at 01:48):

jameysharp has enabled auto merge for PR #5195.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 14 2022 at 02:29):

jameysharp merged PR #5195.


Last updated: Jan 24 2025 at 00:11 UTC