wasmtime / PR #5382 egraph support: rewrite to work in te... · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / PR #5382 egraph support: rewrite to work in te...

Wasmtime GitHub notifications bot (Dec 06 2022 at 05:21):

cfallin opened PR #5382 from egraph-in-dfg-rebase to main:

(Stacked on #5380.)

This work rewrites the "egraph"-based optimization framework in
Cranelift to operate on aegraphs (acyclic egraphs) represented in the
CLIF itself rather than as a separate data structure to which and from
which we translate the CLIF.

The basic idea is to add a new kind of value, a "union", that is like an
alias but refers to two other values rather than one. This allows us to
represent an eclass of enodes (values) as a tree. The union node allows
for a value to have multiple representations: either constituent value
could be used, and (in well-formed CLIF produced by correct
optimization rules) they must be equivalent.

Like the old egraph infrastructure, we take advantage of acyclicity and
eager rule application to do optimization in a single pass. Like before,
we integrate GVN (during the optimization pass) and LICM (during
elaboration).

Unlike the old egraph infrastructure, everything stays in the
DataFlowGraph. "Pure" enodes are represented as instructions that have
values attached, but that are not placed into the function layout. When
entering "egraph" form, we remove them from the layout while optimizing.
When leaving "egraph" form, during elaboration, we can place an
instruction back into the layout the first time we elaborate the enode;
if we elaborate it more than once, we clone the instruction.

The implementation performs two passes overall:

One, a forward pass in RPO (to see defs before uses), that (i) removes
"pure" instructions from the layout and (ii) optimizes as it goes. As
before, we eagerly optimize, so we form the entire union of optimized
forms of a value before we see any uses of that value. This lets us
rewrite uses to use the most "up-to-date" form of the value and
canonicalize and optimize that form.

The eager rewriting and acyclic representation make each other work
(we could not eagerly rewrite if there were cycles; and acyclicity
does not miss optimization opportunities only because the first time
we introduce a value, we immediately produce its "best" form). This
design choice is also what allows us to avoid the "parent pointers"
and fixpoint loop of traditional egraphs.

This forward optimization pass keeps a scoped hashmap to "intern"
nodes (thus performing GVN), and also interleaves on a per-instruction
level with alias analysis. The interleaving with alias analysis allows
alias analysis to see the most optimized form of each address (so it
can see equivalences), and allows the next value to see any
equivalences (reuses of loads or stored values) that alias analysis
uncovers.

Two, a forward pass in domtree preorder, that "elaborates" pure enodes
back into the layout, possibly in multiple places if needed. This
tracks the loop nest and hoists nodes as needed, performing LICM as it
goes. Note that by doing this in forward order, we avoid the
"fixpoint" that traditional LICM needs: we hoist a def before its
uses, so when we place a node, we place it in the right place the
first time rather than moving later.

This PR replaces the old (a)egraph implementation. It removes both the
cranelift-egraph crate and the logic in cranelift-codegen that uses it.

On spidermonkey.wasm running a simple recursive Fibonacci
microbenchmark, this work shows 5.5% compile-time reduction and 7.7%
runtime improvement (speedup). More benchmarks to come.

Most of this implementation was done in (very productive) pair
programming sessions with Jamey Sharp, thus:

Co-authored-by: Jamey Sharp <jsharp@fastly.com>

Wasmtime GitHub notifications bot (Dec 06 2022 at 05:21):

cfallin requested fitzgen for a review on PR #5382.

Wasmtime GitHub notifications bot (Dec 06 2022 at 05:56):

cfallin updated PR #5382 from egraph-in-dfg-rebase to main.

Wasmtime GitHub notifications bot (Dec 06 2022 at 10:48):

bjorn3 submitted PR review.

Wasmtime GitHub notifications bot (Dec 06 2022 at 10:48):

bjorn3 created PR review comment:

We already have an fxhash impl in the fx module.

Wasmtime GitHub notifications bot (Dec 06 2022 at 10:54):

bjorn3 submitted PR review.

Wasmtime GitHub notifications bot (Dec 06 2022 at 10:54):

bjorn3 created PR review comment:

What exactly is hashbrown necessary for? std::collection::HashMap is a wrapper around hashbrown.

Wasmtime GitHub notifications bot (Dec 06 2022 at 17:28):

cfallin created PR review comment:

It's necessary for crate::ctxhash::CtxHashMap, which is a hashmap that uses external context for hashing and equality. HashMap isn't sufficient for this.

Note that this was previously in the cranelift-egraph crate, which was unconditionally required by cranelift-codegen, and now has been moved into cranelift-codegen; so there's no change in the dependency footprint.

Wasmtime GitHub notifications bot (Dec 06 2022 at 17:28):

cfallin submitted PR review.

Wasmtime GitHub notifications bot (Dec 06 2022 at 17:29):

cfallin updated PR #5382 from egraph-in-dfg-rebase to main.

Wasmtime GitHub notifications bot (Dec 06 2022 at 17:30):

cfallin submitted PR review.

Wasmtime GitHub notifications bot (Dec 06 2022 at 17:30):

cfallin created PR review comment:

Ah, yes, I had forgotten that; thanks, updated.

Wasmtime GitHub notifications bot (Dec 06 2022 at 17:37):

cfallin updated PR #5382 from egraph-in-dfg-rebase to main.

Wasmtime GitHub notifications bot (Dec 06 2022 at 17:51):

cfallin updated PR #5382 from egraph-in-dfg-rebase to main.

Wasmtime GitHub notifications bot (Dec 06 2022 at 18:43):

cfallin updated PR #5382 from egraph-in-dfg-rebase to main.

Wasmtime GitHub notifications bot (Dec 06 2022 at 18:43):

cfallin edited PR #5382 from egraph-in-dfg-rebase to main:

This work rewrites the "egraph"-based optimization framework in
Cranelift to operate on aegraphs (acyclic egraphs) represented in the
CLIF itself rather than as a separate data structure to which and from
which we translate the CLIF.

The basic idea is to add a new kind of value, a "union", that is like an
alias but refers to two other values rather than one. This allows us to
represent an eclass of enodes (values) as a tree. The union node allows
for a value to have multiple representations: either constituent value
could be used, and (in well-formed CLIF produced by correct
optimization rules) they must be equivalent.

Like the old egraph infrastructure, we take advantage of acyclicity and
eager rule application to do optimization in a single pass. Like before,
we integrate GVN (during the optimization pass) and LICM (during
elaboration).

Unlike the old egraph infrastructure, everything stays in the
DataFlowGraph. "Pure" enodes are represented as instructions that have
values attached, but that are not placed into the function layout. When
entering "egraph" form, we remove them from the layout while optimizing.
When leaving "egraph" form, during elaboration, we can place an
instruction back into the layout the first time we elaborate the enode;
if we elaborate it more than once, we clone the instruction.

The implementation performs two passes overall:

One, a forward pass in RPO (to see defs before uses), that (i) removes
"pure" instructions from the layout and (ii) optimizes as it goes. As
before, we eagerly optimize, so we form the entire union of optimized
forms of a value before we see any uses of that value. This lets us
rewrite uses to use the most "up-to-date" form of the value and
canonicalize and optimize that form.

The eager rewriting and acyclic representation make each other work
(we could not eagerly rewrite if there were cycles; and acyclicity
does not miss optimization opportunities only because the first time
we introduce a value, we immediately produce its "best" form). This
design choice is also what allows us to avoid the "parent pointers"
and fixpoint loop of traditional egraphs.

This forward optimization pass keeps a scoped hashmap to "intern"
nodes (thus performing GVN), and also interleaves on a per-instruction
level with alias analysis. The interleaving with alias analysis allows
alias analysis to see the most optimized form of each address (so it
can see equivalences), and allows the next value to see any
equivalences (reuses of loads or stored values) that alias analysis
uncovers.

Two, a forward pass in domtree preorder, that "elaborates" pure enodes
back into the layout, possibly in multiple places if needed. This
tracks the loop nest and hoists nodes as needed, performing LICM as it
goes. Note that by doing this in forward order, we avoid the
"fixpoint" that traditional LICM needs: we hoist a def before its
uses, so when we place a node, we place it in the right place the
first time rather than moving later.

This PR replaces the old (a)egraph implementation. It removes both the
cranelift-egraph crate and the logic in cranelift-codegen that uses it.

On spidermonkey.wasm running a simple recursive Fibonacci
microbenchmark, this work shows 5.5% compile-time reduction and 7.7%
runtime improvement (speedup). More benchmarks to come.

Most of this implementation was done in (very productive) pair
programming sessions with Jamey Sharp, thus:

Co-authored-by: Jamey Sharp <jsharp@fastly.com>

Wasmtime GitHub notifications bot (Dec 06 2022 at 18:56):

fitzgen submitted PR review.

Wasmtime GitHub notifications bot (Dec 06 2022 at 18:56):

fitzgen submitted PR review.

Wasmtime GitHub notifications bot (Dec 06 2022 at 18:56):

fitzgen created PR review comment:

Nitpick: in general, I think a where clause would make these methods easier to read.

Wasmtime GitHub notifications bot (Dec 06 2022 at 18:56):

fitzgen created PR review comment:

Could we wrap this up into process_skeleton_inst or something like that, just to make the overall algorithm a little easier to read at a glance?

Wasmtime GitHub notifications bot (Dec 06 2022 at 18:56):

fitzgen created PR review comment:

InstructionData is Copy so this can be

            NewOrExistingInst::New(data, ty) => (*ty, *data),

Wasmtime GitHub notifications bot (Dec 06 2022 at 18:56):

fitzgen created PR review comment:

This is shorter and allows for reserving capacity for all children at once:
            block_stack.extend(self.domtree_children.children(block));

Wasmtime GitHub notifications bot (Dec 06 2022 at 18:56):

fitzgen created PR review comment:

Can we package this up in a self.egraph_pass() method, similar to other passes, rather than have all the bits of state exposed in our pass-ordering / mid end? I think that would be a bit nicer.

Wasmtime GitHub notifications bot (Dec 06 2022 at 18:56):

fitzgen created PR review comment:

What about something like iadd_carry, which is a pure instruction with two results, how does that fit in here? We used to return tuples and then project fields out of them, but now that we are using the DFG directly, how is this handled?

This could be simplified as
suggestion let orig_value = self.func.dfg.first_result(inst);

Wasmtime GitHub notifications bot (Dec 06 2022 at 18:56):

fitzgen created PR review comment:

Should this be behind if cfg!(feature = "trace-log") or something? Or are we pretty confident that LLVM will clean this up?

Wasmtime GitHub notifications bot (Dec 06 2022 at 18:56):

fitzgen created PR review comment:

I would expect this to more closely mirror std::hash::Hash::hash, like
fn ctx_hash<H>(&self, state: &mut H, value: &Value)
    where H: Hasher;
Is there a reason it doesn't?

Wasmtime GitHub notifications bot (Dec 06 2022 at 20:37):

fitzgen submitted PR review.

Wasmtime GitHub notifications bot (Dec 06 2022 at 20:37):

fitzgen submitted PR review.

Wasmtime GitHub notifications bot (Dec 06 2022 at 20:37):

fitzgen created PR review comment:

A comment at the top of Cost explaining what sentinel values there are would be useful here, especially since I had a comment about how the loop level max should be 3 not 2 until I realized that it was leaving room for sentinels.

Wasmtime GitHub notifications bot (Dec 06 2022 at 20:37):

fitzgen created PR review comment:

Do we really need u32s here? With the Cost we limit the loop level to 2 at most...

Wasmtime GitHub notifications bot (Dec 06 2022 at 20:37):

fitzgen created PR review comment:

Additionally, can we debug assert that there are no ValueDef::Unions in the layout after running this pass?

Wasmtime GitHub notifications bot (Dec 06 2022 at 21:30):

cfallin updated PR #5382 from egraph-in-dfg-rebase to main.

Wasmtime GitHub notifications bot (Dec 06 2022 at 21:30):

cfallin submitted PR review.

Wasmtime GitHub notifications bot (Dec 06 2022 at 21:30):

cfallin created PR review comment:

Done!

Wasmtime GitHub notifications bot (Dec 06 2022 at 21:30):

cfallin submitted PR review.

Wasmtime GitHub notifications bot (Dec 06 2022 at 21:30):

cfallin created PR review comment:

No particular reason, just expediency at the time (it was this way in the original egraph prototype then carried over). Updated as suggested!

Wasmtime GitHub notifications bot (Dec 06 2022 at 21:30):

cfallin submitted PR review.

Wasmtime GitHub notifications bot (Dec 06 2022 at 21:30):

cfallin created PR review comment:

Done!

Wasmtime GitHub notifications bot (Dec 06 2022 at 21:30):

cfallin created PR review comment:

Done!

Wasmtime GitHub notifications bot (Dec 06 2022 at 21:30):

cfallin submitted PR review.

Wasmtime GitHub notifications bot (Dec 06 2022 at 21:31):

cfallin submitted PR review.

Wasmtime GitHub notifications bot (Dec 06 2022 at 21:31):

cfallin created PR review comment:

Good point, added the conditional to be sure (it never hurts to make LLVM's life easier in this way either...).

Wasmtime GitHub notifications bot (Dec 06 2022 at 21:31):

cfallin submitted PR review.

Wasmtime GitHub notifications bot (Dec 06 2022 at 21:31):

cfallin created PR review comment:

Ah, the DomTreeChildIter doesn't give a size hint (it traverses a linked list internally) so we won't get that one-reservation benefit, but this form is still clearer, so updated, thanks!

Wasmtime GitHub notifications bot (Dec 06 2022 at 21:31):

cfallin submitted PR review.

Wasmtime GitHub notifications bot (Dec 06 2022 at 21:31):

cfallin created PR review comment:

This was a bit ugly because of AliasAnalysis carrying a lifetime parameter, and ended up propagating lifetimes outward to the point that the ISLE context has three lifetimes with two constraints between them. But now that it's done, the factoring is kind of nice, so I think I'll leave it...

Wasmtime GitHub notifications bot (Dec 06 2022 at 21:37):

cfallin submitted PR review.

Wasmtime GitHub notifications bot (Dec 06 2022 at 21:37):

cfallin created PR review comment:

Ah, great point with first_result, I hadn't remembered that method existed. Updated!

Regarding multi-output instructions in general: yes, this was a bit of a compromise for now. Jamey and I went back and forth a bit on an Inst-centric vs. Value-centric design in the elaborator and ultimately basing everything on Values was far, far simpler; so to keep everything straight, we need to deal only with single-result instructions.

We judged this to be acceptable-ish because the most common multi-result instructions are calls. There are indeed also adds-with-carries but, for now, not optimizing these probably will not have a material impact on speedups.

Ultimately I'm curious whether we might be able to adapt a tuple-types-and-projection approach to CLIF as well, though that's a much larger change.

Wasmtime GitHub notifications bot (Dec 06 2022 at 21:37):

cfallin updated PR #5382 from egraph-in-dfg-rebase to main.

Wasmtime GitHub notifications bot (Dec 06 2022 at 21:50):

cfallin updated PR #5382 from egraph-in-dfg-rebase to main.

Wasmtime GitHub notifications bot (Dec 06 2022 at 21:50):

cfallin submitted PR review.

Wasmtime GitHub notifications bot (Dec 06 2022 at 21:50):

cfallin created PR review comment:

Added! Hopefully the explanation helps.

Wasmtime GitHub notifications bot (Dec 06 2022 at 21:50):

cfallin submitted PR review.

Wasmtime GitHub notifications bot (Dec 06 2022 at 21:50):

cfallin created PR review comment:

Ah, reverted, thanks for catching this!

For the curious, this change came from a dead-end (since reverted) where we invented a notion of a "pseudo-loop nest" based on the domtree (basic idea is that we wanted the loop nest to fully be a subtree of the domtree; consider e.g. a loop exit dominating the rest of the function, that's where the loop tree and domtree differ) and loop nests could get much deeper. But we've reverted all that and LoopLevel should correspond only to true loop-nest level again, now.

Wasmtime GitHub notifications bot (Dec 06 2022 at 22:25):

cfallin updated PR #5382 from egraph-in-dfg-rebase to main.

Wasmtime GitHub notifications bot (Dec 06 2022 at 22:26):

cfallin updated PR #5382 from egraph-in-dfg-rebase to main.

Wasmtime GitHub notifications bot (Dec 06 2022 at 22:58):

cfallin merged PR #5382.

Last updated: Apr 17 2025 at 19:03 UTC