Stream: cranelift

Topic: ELI5: What is an acyclic egraph?


view this post on Zulip Olivier FAURE (Sep 22 2022 at 20:28):

I've been looking at the egraph RFC and the surrounding work lately; I think I have an okay understanding of most concepts involved, but one thing I'm not clear on is what acyclic-egraphs are, and how exactly they're supposed to be useful.

I get the general idea (we make the graph thing acyclic somehow, do constant-folding more aggressively, and generally apply equivalences "in-flight" during the merging passes), but I still have trouble wrapping my head around it.

Some questions I have:

I'm not sure whether I'm hitting the mark with those questions. The whole concept of egraphs is fascinating and I'm especially interested in the idea of solving a bunch of optimizations at once with a forward pass by structuring your data really well.

view this post on Zulip Chris Fallin (Sep 22 2022 at 20:30):

@Olivier FAURE I just merged the initial implementation of our egraph crate, whose toplevel doc-comment hopefully answers more of your questions: https://github.com/bytecodealliance/wasmtime/blob/main/cranelift/egraph/src/lib.rs

A fast and secure runtime for WebAssembly. Contribute to bytecodealliance/wasmtime development by creating an account on GitHub.

view this post on Zulip Olivier FAURE (Sep 22 2022 at 20:31):

Well that was fast!

view this post on Zulip Chris Fallin (Sep 22 2022 at 20:31):

the next step, i.e. building the aegraph, rewriting it with rules, and elaborating out of it, exists in the draft PR that is linked from the RFC thread, and I'm (still) working on rebasing to the latest and cleaning it up, but it will be merged soon :-)

view this post on Zulip Chris Fallin (Sep 22 2022 at 20:32):

Well that was fast!

not really, the prototype was codeveloped with the RFC in my best effort at the "the proposed work is feasible because it's already been done" RFC style :-)

view this post on Zulip Olivier FAURE (Sep 22 2022 at 21:24):

So, looking at the code:

pub struct EClass {
    // formats:
    //
    // 00 | unused  (31 bits)         | NodeKey (31 bits)
    // 01 | eclass_parent   (31 bits) | NodeKey (31 bits)
    // 10 | eclass_parent_1 (31 bits) | eclass_parent_id_2 (31 bits)
    bits: u64,
}

What is a parent in that context? Also, if an eclass has more than two parents, what happens? Are they stored with the two-parents form as a binary tree?

view this post on Zulip Olivier FAURE (Sep 22 2022 at 21:25):

And how does an eclass store enodes in that implementation?

view this post on Zulip Olivier FAURE (Sep 22 2022 at 21:26):

(This might be explained in the documentation, sorry if it is; I'm mostly skimming the file and giving stream-of-thoughts questions)

view this post on Zulip Chris Fallin (Sep 22 2022 at 21:34):

This might be explained in the documentation, sorry if it is

Yes, I believe it should be; anything that's not, please feel free to submit PRs to update docs

view this post on Zulip Olivier FAURE (Sep 22 2022 at 22:00):

Each eclass id refers to a table entry that can be one of:

So, if I understand correctly, in this format eclasses are stored as cons-lists? With special nodes to mark equivalences?

I guess I'm a bit confused by the use of the word "parent". For instance, if you have an egraph that only consists of five unrelated constants, that egraph would be stored as

eclass1    node=iconst(1)
eclass2    node=iconst(2) parent=eclass1
eclass3    node=iconst(3) parent=eclass2

Right? But the world "parent" implies a hierarchy between these three eclasses, even though they're "parallel", they're just stored in a linked list.

Actually, I'm not sure the example above is right. When the documentation says "An enode and an earlier eclass id it is appended to;", it's not clear if that means "an enode being added to an eclass" or "a new eclass with a single enode being added to a linked list of eclasses". If it's the first, then my example above is wrong (but the use of the word "parent" is still confusing).

view this post on Zulip Olivier FAURE (Sep 22 2022 at 22:04):

    eclass5    iconst(2)
    eclass6    union(iadd(eclass2, eclass5), eclass4)

So in this snippet, is eclass6 supposed to be the second format of enode (01) or the third (10)? If it's the second, the use of the word "union" seems a bit confusing given that the explanation above uses "union node" to refer to the third type. If it's the third type, then shouldn't the snippet as follows?

    eclass5    iconst(2)
    eclass6    iadd(eclass2, eclass5)
    eclass7    union(eclass6, eclass4)

view this post on Zulip Chris Fallin (Sep 22 2022 at 22:12):

Not a cons-list but a tree. "Parent" comes from that view

view this post on Zulip Chris Fallin (Sep 22 2022 at 22:12):

I unfortunately don't have a lot of time to walk you through the code in detail now; the work is still in progress and the code and RFC will have to stand as they are; but if after reading and thinking you have specific questions I'm happy to answer some

view this post on Zulip Olivier FAURE (Sep 22 2022 at 22:19):

Right, no problem. Thanks for answering my questions!

view this post on Zulip Olivier FAURE (Sep 26 2022 at 08:39):

Okay, reading the egg crate, I think I understand better. "Parent" in the doc above means "parent in the unionfind structure", not "parent in the dataflow graph". That makes a lot more sense. (Though it's kinda confusing if you're not familiar with unionfind)


Last updated: Dec 23 2024 at 12:05 UTC