Stream: git-wasmtime

Topic: wasmtime / issue #4128 Cranelift: develop a mid-end optim...


view this post on Zulip Wasmtime GitHub notifications bot (May 10 2022 at 23:15):

cfallin opened issue #4128:

As part of a general push to improve the quality of generated code, we want to extend the set of optimizations that we perform on the IR (CLIF) beyond the current set of GVN, LICM, DCE, etc passes, and develop some general analyses to support these and the CLIF-to-MachInst lowering pass. For example, we may want:

We should generally strive to write these analyses and transforms with the pass-specific bits in ISLE. This will bring both short- and long-term benefits:

The main steps of this work will be:

Possibly, if we can, it may be interesting to also:

Theese last two steps are intentionally vague and bring up questions of pass direction and pass ordering; but I suspect that we may be able to do something to get down to a handful of passes if we are careful. In any case, finding a way to make this work would be a bonus, and the main benefit provided by the work in this issue overall is ease of development, better likelihood of correctness, and compatibility with verification efforts.

view this post on Zulip Wasmtime GitHub notifications bot (May 10 2022 at 23:15):

cfallin labeled issue #4128:

As part of a general push to improve the quality of generated code, we want to extend the set of optimizations that we perform on the IR (CLIF) beyond the current set of GVN, LICM, DCE, etc passes, and develop some general analyses to support these and the CLIF-to-MachInst lowering pass. For example, we may want:

We should generally strive to write these analyses and transforms with the pass-specific bits in ISLE. This will bring both short- and long-term benefits:

The main steps of this work will be:

Possibly, if we can, it may be interesting to also:

Theese last two steps are intentionally vague and bring up questions of pass direction and pass ordering; but I suspect that we may be able to do something to get down to a handful of passes if we are careful. In any case, finding a way to make this work would be a bonus, and the main benefit provided by the work in this issue overall is ease of development, better likelihood of correctness, and compatibility with verification efforts.

view this post on Zulip Wasmtime GitHub notifications bot (May 10 2022 at 23:15):

cfallin labeled issue #4128:

As part of a general push to improve the quality of generated code, we want to extend the set of optimizations that we perform on the IR (CLIF) beyond the current set of GVN, LICM, DCE, etc passes, and develop some general analyses to support these and the CLIF-to-MachInst lowering pass. For example, we may want:

We should generally strive to write these analyses and transforms with the pass-specific bits in ISLE. This will bring both short- and long-term benefits:

The main steps of this work will be:

Possibly, if we can, it may be interesting to also:

Theese last two steps are intentionally vague and bring up questions of pass direction and pass ordering; but I suspect that we may be able to do something to get down to a handful of passes if we are careful. In any case, finding a way to make this work would be a bonus, and the main benefit provided by the work in this issue overall is ease of development, better likelihood of correctness, and compatibility with verification efforts.

view this post on Zulip Wasmtime GitHub notifications bot (May 10 2022 at 23:36):

cfallin edited issue #4128:

As part of a general push to improve the quality of generated code, we want to extend the set of optimizations that we perform on the IR (CLIF) beyond the current set of GVN, LICM, DCE, etc passes, and develop some general analyses to support these and the CLIF-to-MachInst lowering pass. For example, we may want:

We should generally strive to write these analyses and transforms with the pass-specific bits in ISLE. This will bring both short- and long-term benefits:

The main steps of this work will be:

Possibly, if we can, it may be interesting to also:

Theese last two steps are intentionally vague and bring up questions of pass direction and pass ordering; but I suspect that we may be able to do something to get down to a handful of passes if we are careful. In any case, finding a way to make this work would be a bonus, and the main benefit provided by the work in this issue overall is ease of development, better likelihood of correctness, and compatibility with verification efforts.

view this post on Zulip Wasmtime GitHub notifications bot (May 11 2022 at 05:59):

bjorn3 commented on issue #4128:

Investigate whether we can fuse the above with the backend lowering.

That would prevent optimizations that can't be streamed from running after these streamable optimizations, right?

view this post on Zulip Wasmtime GitHub notifications bot (May 11 2022 at 17:03):

cfallin commented on issue #4128:

Right, it would be a particular configuration when compilation speed is more important. I don't think we would unconditionally build the backend in this way, as long as we also want to support higher optimization levels.

view this post on Zulip Wasmtime GitHub notifications bot (May 27 2022 at 20:33):

cfallin commented on issue #4128:

For anyone curious, I've been doing a lot of thinking and reading trying to work out a good approach here, with several toplevel goals: (i) enable rewrites in ISLE, (ii) solve the phase-ordering problem in a better way.

Briefly, after introducing alias analysis in #4163, it became apparent to me that we need a more uniform approach to interleaving optimizations and allowing them to interact. That had already been the idea to a large degree with the desire to use ISLE to express transforms (the DSL compiler could combine rules from all kinds of analyses/transforms into a uniform rewrite step iterated until fixpoint); but it had become apparent that adding alias analysis (depending on GVN/alias/cprop/const-folding to prove equal addresses) sort of pushed the problem over the edge. Previously we had few enough rewrites that running simple_preopt once, then GVN once, pretty much was OK; not anymore.

So I've been looking seriously into e-graphs as an IR between CLIF and VCode lowering, with the idea that we could use the excellent egg library, probably with integration to ISLE (or rather with ISLE integration to egg: generating slightly different code that can match on the egraph).

The reason I've been thinking that rewriting from CLIF into an egraph, then the egraph straight to VCode, is that adapting the Lower infrastructure to query an egraph directly is actually probably simpler than trying to lower back to CLIF, in addition to the efficiency gained from removing a step. Furthermore our CLIF optimizations can then be shifted later in the pipeline, to become egraph rewrite rules instead; and we could subsume GVN and LICM entirely by implicit invariants and hashconsing/merging in the process of building the egraph and lowering out of it.

One can represent control flow by creating explicit nodes for "block predicates" and edges for "state flow", akin to a PDG or VSDG. The hard part is actually coming out of the egraph representation back to linearized code; in other words, deciding when to compute each node.

I've gone through a few iterations here and will go ahead and link my braindump/notes of my explorations to date, but in brief have discovered that there is a reason for multiple PhD theses to exist on just this topic, and at least my latest formulation reduces to a weighted set cover problem, which excitingly enough is one of the canonical NP-complete problems. So, I'm starting to think that there may be an intermediate design point where we retain some information about the original program order in CLIF in the egraph nodes, but introduce targetted rewrites/merging steps that explicitly do something like GVN. That would give us the best of both worlds, namely the expression visibility across the function body for rewrite rules to work, without exciting exponential blowups in compile time.

I'll note for completeness too that all of this I envision being used only in an "optimizations enabled" compilation; Cranelfit-without-opts would skip all of this, and continue to lower straight from CLIF to VCode. (Whether that implies two parameterizations/monomorphizations of each backend, I'm not sure yet; details TBD.)

Anyway, that's my update to keep folks posted; hopefully more soon!

view this post on Zulip Wasmtime GitHub notifications bot (Nov 02 2022 at 16:18):

fitzgen commented on issue #4128:

This is implemented in the recent e-graphs work. Closing.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 02 2022 at 16:18):

fitzgen closed issue #4128:

As part of a general push to improve the quality of generated code, we want to extend the set of optimizations that we perform on the IR (CLIF) beyond the current set of GVN, LICM, DCE, etc passes, and develop some general analyses to support these and the CLIF-to-MachInst lowering pass. For example, we may want:

We should generally strive to write these analyses and transforms with the pass-specific bits in ISLE. This will bring both short- and long-term benefits:

The main steps of this work will be:

Possibly, if we can, it may be interesting to also:

Theese last two steps are intentionally vague and bring up questions of pass direction and pass ordering; but I suspect that we may be able to do something to get down to a handful of passes if we are careful. In any case, finding a way to make this work would be a bonus, and the main benefit provided by the work in this issue overall is ease of development, better likelihood of correctness, and compatibility with verification efforts.


Last updated: Oct 23 2024 at 20:03 UTC