Stream: git-wasmtime

Topic: wasmtime / PR #5382 egraph support: rewrite to work in te...


view this post on Zulip Wasmtime GitHub notifications bot (Dec 06 2022 at 05:21):

cfallin opened PR #5382 from egraph-in-dfg-rebase to main:

(Stacked on #5380.)

This work rewrites the "egraph"-based optimization framework in
Cranelift to operate on aegraphs (acyclic egraphs) represented in the
CLIF itself rather than as a separate data structure to which and from
which we translate the CLIF.

The basic idea is to add a new kind of value, a "union", that is like an
alias but refers to two other values rather than one. This allows us to
represent an eclass of enodes (values) as a tree. The union node allows
for a value to have multiple representations: either constituent value
could be used, and (in well-formed CLIF produced by correct
optimization rules) they must be equivalent.

Like the old egraph infrastructure, we take advantage of acyclicity and
eager rule application to do optimization in a single pass. Like before,
we integrate GVN (during the optimization pass) and LICM (during
elaboration).

Unlike the old egraph infrastructure, everything stays in the
DataFlowGraph. "Pure" enodes are represented as instructions that have
values attached, but that are not placed into the function layout. When
entering "egraph" form, we remove them from the layout while optimizing.
When leaving "egraph" form, during elaboration, we can place an
instruction back into the layout the first time we elaborate the enode;
if we elaborate it more than once, we clone the instruction.

The implementation performs two passes overall:

This PR replaces the old (a)egraph implementation. It removes both the
cranelift-egraph crate and the logic in cranelift-codegen that uses it.

On spidermonkey.wasm running a simple recursive Fibonacci
microbenchmark, this work shows 5.5% compile-time reduction and 7.7%
runtime improvement (speedup). More benchmarks to come.

Most of this implementation was done in (very productive) pair
programming sessions with Jamey Sharp, thus:

Co-authored-by: Jamey Sharp <jsharp@fastly.com>

view this post on Zulip Wasmtime GitHub notifications bot (Dec 06 2022 at 05:21):

cfallin requested fitzgen for a review on PR #5382.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 06 2022 at 05:56):

cfallin updated PR #5382 from egraph-in-dfg-rebase to main.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 06 2022 at 10:48):

bjorn3 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 06 2022 at 10:48):

bjorn3 created PR review comment:

We already have an fxhash impl in the fx module.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 06 2022 at 10:54):

bjorn3 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 06 2022 at 10:54):

bjorn3 created PR review comment:

What exactly is hashbrown necessary for? std::collection::HashMap is a wrapper around hashbrown.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 06 2022 at 17:28):

cfallin created PR review comment:

It's necessary for crate::ctxhash::CtxHashMap, which is a hashmap that uses external context for hashing and equality. HashMap isn't sufficient for this.

Note that this was previously in the cranelift-egraph crate, which was unconditionally required by cranelift-codegen, and now has been moved into cranelift-codegen; so there's no change in the dependency footprint.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 06 2022 at 17:28):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 06 2022 at 17:29):

cfallin updated PR #5382 from egraph-in-dfg-rebase to main.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 06 2022 at 17:30):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 06 2022 at 17:30):

cfallin created PR review comment:

Ah, yes, I had forgotten that; thanks, updated.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 06 2022 at 17:37):

cfallin updated PR #5382 from egraph-in-dfg-rebase to main.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 06 2022 at 17:51):

cfallin updated PR #5382 from egraph-in-dfg-rebase to main.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 06 2022 at 18:43):

cfallin updated PR #5382 from egraph-in-dfg-rebase to main.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 06 2022 at 18:43):

cfallin edited PR #5382 from egraph-in-dfg-rebase to main:

This work rewrites the "egraph"-based optimization framework in
Cranelift to operate on aegraphs (acyclic egraphs) represented in the
CLIF itself rather than as a separate data structure to which and from
which we translate the CLIF.

The basic idea is to add a new kind of value, a "union", that is like an
alias but refers to two other values rather than one. This allows us to
represent an eclass of enodes (values) as a tree. The union node allows
for a value to have multiple representations: either constituent value
could be used, and (in well-formed CLIF produced by correct
optimization rules) they must be equivalent.

Like the old egraph infrastructure, we take advantage of acyclicity and
eager rule application to do optimization in a single pass. Like before,
we integrate GVN (during the optimization pass) and LICM (during
elaboration).

Unlike the old egraph infrastructure, everything stays in the
DataFlowGraph. "Pure" enodes are represented as instructions that have
values attached, but that are not placed into the function layout. When
entering "egraph" form, we remove them from the layout while optimizing.
When leaving "egraph" form, during elaboration, we can place an
instruction back into the layout the first time we elaborate the enode;
if we elaborate it more than once, we clone the instruction.

The implementation performs two passes overall:

This PR replaces the old (a)egraph implementation. It removes both the
cranelift-egraph crate and the logic in cranelift-codegen that uses it.

On spidermonkey.wasm running a simple recursive Fibonacci
microbenchmark, this work shows 5.5% compile-time reduction and 7.7%
runtime improvement (speedup). More benchmarks to come.

Most of this implementation was done in (very productive) pair
programming sessions with Jamey Sharp, thus:

Co-authored-by: Jamey Sharp <jsharp@fastly.com>

view this post on Zulip Wasmtime GitHub notifications bot (Dec 06 2022 at 18:56):

fitzgen submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 06 2022 at 18:56):

fitzgen submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 06 2022 at 18:56):

fitzgen created PR review comment:

Nitpick: in general, I think a where clause would make these methods easier to read.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 06 2022 at 18:56):

fitzgen created PR review comment:

Could we wrap this up into process_skeleton_inst or something like that, just to make the overall algorithm a little easier to read at a glance?

view this post on Zulip Wasmtime GitHub notifications bot (Dec 06 2022 at 18:56):

fitzgen created PR review comment:

InstructionData is Copy so this can be

            NewOrExistingInst::New(data, ty) => (*ty, *data),

view this post on Zulip Wasmtime GitHub notifications bot (Dec 06 2022 at 18:56):

fitzgen created PR review comment:

This is shorter and allows for reserving capacity for all children at once:

            block_stack.extend(self.domtree_children.children(block));

view this post on Zulip Wasmtime GitHub notifications bot (Dec 06 2022 at 18:56):

fitzgen created PR review comment:

Can we package this up in a self.egraph_pass() method, similar to other passes, rather than have all the bits of state exposed in our pass-ordering / mid end? I think that would be a bit nicer.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 06 2022 at 18:56):

fitzgen created PR review comment:

view this post on Zulip Wasmtime GitHub notifications bot (Dec 06 2022 at 18:56):

fitzgen created PR review comment:

Should this be behind if cfg!(feature = "trace-log") or something? Or are we pretty confident that LLVM will clean this up?

view this post on Zulip Wasmtime GitHub notifications bot (Dec 06 2022 at 18:56):

fitzgen created PR review comment:

I would expect this to more closely mirror std::hash::Hash::hash, like

fn ctx_hash<H>(&self, state: &mut H, value: &Value)
    where H: Hasher;

Is there a reason it doesn't?

view this post on Zulip Wasmtime GitHub notifications bot (Dec 06 2022 at 20:37):

fitzgen submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 06 2022 at 20:37):

fitzgen submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 06 2022 at 20:37):

fitzgen created PR review comment:

A comment at the top of Cost explaining what sentinel values there are would be useful here, especially since I had a comment about how the loop level max should be 3 not 2 until I realized that it was leaving room for sentinels.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 06 2022 at 20:37):

fitzgen created PR review comment:

Do we really need u32s here? With the Cost we limit the loop level to 2 at most...

view this post on Zulip Wasmtime GitHub notifications bot (Dec 06 2022 at 20:37):

fitzgen created PR review comment:

Additionally, can we debug assert that there are no ValueDef::Unions in the layout after running this pass?

view this post on Zulip Wasmtime GitHub notifications bot (Dec 06 2022 at 21:30):

cfallin updated PR #5382 from egraph-in-dfg-rebase to main.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 06 2022 at 21:30):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 06 2022 at 21:30):

cfallin created PR review comment:

Done!

view this post on Zulip Wasmtime GitHub notifications bot (Dec 06 2022 at 21:30):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 06 2022 at 21:30):

cfallin created PR review comment:

No particular reason, just expediency at the time (it was this way in the original egraph prototype then carried over). Updated as suggested!

view this post on Zulip Wasmtime GitHub notifications bot (Dec 06 2022 at 21:30):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 06 2022 at 21:30):

cfallin created PR review comment:

Done!

view this post on Zulip Wasmtime GitHub notifications bot (Dec 06 2022 at 21:30):

cfallin created PR review comment:

Done!

view this post on Zulip Wasmtime GitHub notifications bot (Dec 06 2022 at 21:30):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 06 2022 at 21:31):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 06 2022 at 21:31):

cfallin created PR review comment:

Good point, added the conditional to be sure (it never hurts to make LLVM's life easier in this way either...).

view this post on Zulip Wasmtime GitHub notifications bot (Dec 06 2022 at 21:31):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 06 2022 at 21:31):

cfallin created PR review comment:

Ah, the DomTreeChildIter doesn't give a size hint (it traverses a linked list internally) so we won't get that one-reservation benefit, but this form is still clearer, so updated, thanks!

view this post on Zulip Wasmtime GitHub notifications bot (Dec 06 2022 at 21:31):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 06 2022 at 21:31):

cfallin created PR review comment:

This was a bit ugly because of AliasAnalysis carrying a lifetime parameter, and ended up propagating lifetimes outward to the point that the ISLE context has three lifetimes with two constraints between them. But now that it's done, the factoring is kind of nice, so I think I'll leave it...

view this post on Zulip Wasmtime GitHub notifications bot (Dec 06 2022 at 21:37):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 06 2022 at 21:37):

cfallin created PR review comment:

Ah, great point with first_result, I hadn't remembered that method existed. Updated!

Regarding multi-output instructions in general: yes, this was a bit of a compromise for now. Jamey and I went back and forth a bit on an Inst-centric vs. Value-centric design in the elaborator and ultimately basing everything on Values was far, far simpler; so to keep everything straight, we need to deal only with single-result instructions.

We judged this to be acceptable-ish because the most common multi-result instructions are calls. There are indeed also adds-with-carries but, for now, not optimizing these probably will not have a material impact on speedups.

Ultimately I'm curious whether we might be able to adapt a tuple-types-and-projection approach to CLIF as well, though that's a much larger change.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 06 2022 at 21:37):

cfallin updated PR #5382 from egraph-in-dfg-rebase to main.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 06 2022 at 21:50):

cfallin updated PR #5382 from egraph-in-dfg-rebase to main.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 06 2022 at 21:50):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 06 2022 at 21:50):

cfallin created PR review comment:

Added! Hopefully the explanation helps.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 06 2022 at 21:50):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 06 2022 at 21:50):

cfallin created PR review comment:

Ah, reverted, thanks for catching this!

For the curious, this change came from a dead-end (since reverted) where we invented a notion of a "pseudo-loop nest" based on the domtree (basic idea is that we wanted the loop nest to fully be a subtree of the domtree; consider e.g. a loop exit dominating the rest of the function, that's where the loop tree and domtree differ) and loop nests could get much deeper. But we've reverted all that and LoopLevel should correspond only to true loop-nest level again, now.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 06 2022 at 22:25):

cfallin updated PR #5382 from egraph-in-dfg-rebase to main.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 06 2022 at 22:26):

cfallin updated PR #5382 from egraph-in-dfg-rebase to main.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 06 2022 at 22:58):

cfallin merged PR #5382.


Last updated: Oct 23 2024 at 20:03 UTC