I've been looking at the egraph RFC and the surrounding work lately; I think I have an okay understanding of most concepts involved, but one thing I'm not clear on is what acyclic-egraphs are, and how exactly they're supposed to be useful.
I get the general idea (we make the graph thing acyclic somehow, do constant-folding more aggressively, and generally apply equivalences "in-flight" during the merging passes), but I still have trouble wrapping my head around it.
Some questions I have:
I'm not sure whether I'm hitting the mark with those questions. The whole concept of egraphs is fascinating and I'm especially interested in the idea of solving a bunch of optimizations at once with a forward pass by structuring your data really well.
@Olivier FAURE I just merged the initial implementation of our egraph crate, whose toplevel doc-comment hopefully answers more of your questions: https://github.com/bytecodealliance/wasmtime/blob/main/cranelift/egraph/src/lib.rs
Well that was fast!
the next step, i.e. building the aegraph, rewriting it with rules, and elaborating out of it, exists in the draft PR that is linked from the RFC thread, and I'm (still) working on rebasing to the latest and cleaning it up, but it will be merged soon :-)
Well that was fast!
not really, the prototype was codeveloped with the RFC in my best effort at the "the proposed work is feasible because it's already been done" RFC style :-)
So, looking at the code:
pub struct EClass {
// formats:
//
// 00 | unused (31 bits) | NodeKey (31 bits)
// 01 | eclass_parent (31 bits) | NodeKey (31 bits)
// 10 | eclass_parent_1 (31 bits) | eclass_parent_id_2 (31 bits)
bits: u64,
}
What is a parent in that context? Also, if an eclass has more than two parents, what happens? Are they stored with the two-parents form as a binary tree?
And how does an eclass store enodes in that implementation?
(This might be explained in the documentation, sorry if it is; I'm mostly skimming the file and giving stream-of-thoughts questions)
This might be explained in the documentation, sorry if it is
Yes, I believe it should be; anything that's not, please feel free to submit PRs to update docs
Each eclass id refers to a table entry that can be one of:
- A single enode;
- An enode and an earlier eclass id it is appended to;
- A "union node" with two earlier eclass ids.
So, if I understand correctly, in this format eclasses are stored as cons-lists? With special nodes to mark equivalences?
I guess I'm a bit confused by the use of the word "parent". For instance, if you have an egraph that only consists of five unrelated constants, that egraph would be stored as
eclass1 node=iconst(1)
eclass2 node=iconst(2) parent=eclass1
eclass3 node=iconst(3) parent=eclass2
Right? But the world "parent" implies a hierarchy between these three eclasses, even though they're "parallel", they're just stored in a linked list.
Actually, I'm not sure the example above is right. When the documentation says "An enode and an earlier eclass id it is appended to;", it's not clear if that means "an enode being added to an eclass" or "a new eclass with a single enode being added to a linked list of eclasses". If it's the first, then my example above is wrong (but the use of the word "parent" is still confusing).
eclass5 iconst(2) eclass6 union(iadd(eclass2, eclass5), eclass4)
So in this snippet, is eclass6 supposed to be the second format of enode (01) or the third (10)? If it's the second, the use of the word "union" seems a bit confusing given that the explanation above uses "union node" to refer to the third type. If it's the third type, then shouldn't the snippet as follows?
eclass5 iconst(2)
eclass6 iadd(eclass2, eclass5)
eclass7 union(eclass6, eclass4)
Not a cons-list but a tree. "Parent" comes from that view
I unfortunately don't have a lot of time to walk you through the code in detail now; the work is still in progress and the code and RFC will have to stand as they are; but if after reading and thinking you have specific questions I'm happy to answer some
Right, no problem. Thanks for answering my questions!
Okay, reading the egg crate, I think I understand better. "Parent" in the doc above means "parent in the unionfind structure", not "parent in the dataflow graph". That makes a lot more sense. (Though it's kinda confusing if you're not familiar with unionfind)
Last updated: Jan 24 2025 at 00:11 UTC