wasmtime / issue #4128 Cranelift: develop a mid-end optim... · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / issue #4128 Cranelift: develop a mid-end optim...

Wasmtime GitHub notifications bot (May 10 2022 at 23:15):

As part of a general push to improve the quality of generated code, we want to extend the set of optimizations that we perform on the IR (CLIF) beyond the current set of GVN, LICM, DCE, etc passes, and develop some general analyses to support these and the CLIF-to-MachInst lowering pass. For example, we may want:

Alias analysis, and redundant load elimination, store-to-load forwarding, and dead store elimination using its results;

More general constant propagation and folding (compile-time evaluation) using the CLIF interpreter;

Bounds-check elimination, finding when one check dominates another and makes it unnecessary;

Integer range analysis, to help with bounds-check elimination;

Demanded-bits and defined-bits analyses, either over the CLIF or in concert with the lowering (since at least defined-bits may depend on the instructions chosen);

and others.

We should generally strive to write these analyses and transforms with the pass-specific bits in ISLE. This will bring both short- and long-term benefits:

Pattern-matching is a very good fit for the sorts of transfer/meet functions that many analyses require, allowing for concise and less error-prone expression of the ideas;

The optimizations that allow for efficient merging of many different matching rules during lowering would also bring benefits to any complex transfer function in an analysis;

Building passes out of ISLE rules that analyze CLIF and transform CLIF to CLIF lets us eventually fuse this mid-end with the CLIF-to-MachInst lowering pass;

Putting everything into the DSL in which we express our backend lowering rewrites allows us to reuse whatever formal-verification machinery we build around it.

The main steps of this work will be:

[ ] Develop a lazy analysis framework over CLIF, with transfer/meet functions in ISLE;

[ ] Develop a toplevel generic transform pass driver that edits CLIF in-place, using logic written in ISLE with the same CLIF extractors and constructors as for backends;

[ ] Co-develop, with the above, several initial passes (e.g. alias analysis and redundant-load elimination).

Possibly, if we can, it may be interesting to also:

[ ] Find ways to fuse analyses and passes in the mid-end, possibly enabling a lightweight combination of mid-end passes that stream over CLIF once and emit new CLIF, rather than editing in place;

[ ] Investigate whether we can fuse the above with the backend lowering.

Theese last two steps are intentionally vague and bring up questions of pass direction and pass ordering; but I suspect that we may be able to do something to get down to a handful of passes if we are careful. In any case, finding a way to make this work would be a bonus, and the main benefit provided by the work in this issue overall is ease of development, better likelihood of correctness, and compatibility with verification efforts.

Wasmtime GitHub notifications bot (May 10 2022 at 23:15):

cfallin labeled issue #4128:

As part of a general push to improve the quality of generated code, we want to extend the set of optimizations that we perform on the IR (CLIF) beyond the current set of GVN, LICM, DCE, etc passes, and develop some general analyses to support these and the CLIF-to-MachInst lowering pass. For example, we may want:

Alias analysis, and redundant load elimination, store-to-load forwarding, and dead store elimination using its results;

More general constant propagation and folding (compile-time evaluation) using the CLIF interpreter;

Bounds-check elimination, finding when one check dominates another and makes it unnecessary;

Integer range analysis, to help with bounds-check elimination;

Demanded-bits and defined-bits analyses, either over the CLIF or in concert with the lowering (since at least defined-bits may depend on the instructions chosen);

and others.

We should generally strive to write these analyses and transforms with the pass-specific bits in ISLE. This will bring both short- and long-term benefits:

Pattern-matching is a very good fit for the sorts of transfer/meet functions that many analyses require, allowing for concise and less error-prone expression of the ideas;

The optimizations that allow for efficient merging of many different matching rules during lowering would also bring benefits to any complex transfer function in an analysis;

Building passes out of ISLE rules that analyze CLIF and transform CLIF to CLIF lets us eventually fuse this mid-end with the CLIF-to-MachInst lowering pass;

Putting everything into the DSL in which we express our backend lowering rewrites allows us to reuse whatever formal-verification machinery we build around it.

The main steps of this work will be:

[ ] Develop a lazy analysis framework over CLIF, with transfer/meet functions in ISLE;

[ ] Develop a toplevel generic transform pass driver that edits CLIF in-place, using logic written in ISLE with the same CLIF extractors and constructors as for backends;

[ ] Co-develop, with the above, several initial passes (e.g. alias analysis and redundant-load elimination).

Possibly, if we can, it may be interesting to also:

[ ] Find ways to fuse analyses and passes in the mid-end, possibly enabling a lightweight combination of mid-end passes that stream over CLIF once and emit new CLIF, rather than editing in place;

[ ] Investigate whether we can fuse the above with the backend lowering.

Theese last two steps are intentionally vague and bring up questions of pass direction and pass ordering; but I suspect that we may be able to do something to get down to a handful of passes if we are careful. In any case, finding a way to make this work would be a bonus, and the main benefit provided by the work in this issue overall is ease of development, better likelihood of correctness, and compatibility with verification efforts.

Wasmtime GitHub notifications bot (May 10 2022 at 23:15):

cfallin labeled issue #4128:

As part of a general push to improve the quality of generated code, we want to extend the set of optimizations that we perform on the IR (CLIF) beyond the current set of GVN, LICM, DCE, etc passes, and develop some general analyses to support these and the CLIF-to-MachInst lowering pass. For example, we may want:

Alias analysis, and redundant load elimination, store-to-load forwarding, and dead store elimination using its results;

More general constant propagation and folding (compile-time evaluation) using the CLIF interpreter;

Bounds-check elimination, finding when one check dominates another and makes it unnecessary;

Integer range analysis, to help with bounds-check elimination;

Demanded-bits and defined-bits analyses, either over the CLIF or in concert with the lowering (since at least defined-bits may depend on the instructions chosen);

and others.

We should generally strive to write these analyses and transforms with the pass-specific bits in ISLE. This will bring both short- and long-term benefits:

Pattern-matching is a very good fit for the sorts of transfer/meet functions that many analyses require, allowing for concise and less error-prone expression of the ideas;

The optimizations that allow for efficient merging of many different matching rules during lowering would also bring benefits to any complex transfer function in an analysis;

Building passes out of ISLE rules that analyze CLIF and transform CLIF to CLIF lets us eventually fuse this mid-end with the CLIF-to-MachInst lowering pass;

Putting everything into the DSL in which we express our backend lowering rewrites allows us to reuse whatever formal-verification machinery we build around it.

The main steps of this work will be:

[ ] Develop a lazy analysis framework over CLIF, with transfer/meet functions in ISLE;

[ ] Develop a toplevel generic transform pass driver that edits CLIF in-place, using logic written in ISLE with the same CLIF extractors and constructors as for backends;

[ ] Co-develop, with the above, several initial passes (e.g. alias analysis and redundant-load elimination).

Possibly, if we can, it may be interesting to also:

[ ] Find ways to fuse analyses and passes in the mid-end, possibly enabling a lightweight combination of mid-end passes that stream over CLIF once and emit new CLIF, rather than editing in place;

[ ] Investigate whether we can fuse the above with the backend lowering.

Theese last two steps are intentionally vague and bring up questions of pass direction and pass ordering; but I suspect that we may be able to do something to get down to a handful of passes if we are careful. In any case, finding a way to make this work would be a bonus, and the main benefit provided by the work in this issue overall is ease of development, better likelihood of correctness, and compatibility with verification efforts.

Wasmtime GitHub notifications bot (May 10 2022 at 23:36):

cfallin edited issue #4128:

As part of a general push to improve the quality of generated code, we want to extend the set of optimizations that we perform on the IR (CLIF) beyond the current set of GVN, LICM, DCE, etc passes, and develop some general analyses to support these and the CLIF-to-MachInst lowering pass. For example, we may want:

Alias analysis, and redundant load elimination, store-to-load forwarding, and dead store elimination using its results;

More general constant propagation and folding (compile-time evaluation) using the CLIF interpreter;

Bounds-check elimination, finding when one check dominates another and makes it unnecessary;

Integer range analysis, to help with bounds-check elimination;

Demanded-bits and defined-bits analyses, either over the CLIF or in concert with the lowering (since at least defined-bits may depend on the instructions chosen);

and others.

We should generally strive to write these analyses and transforms with the pass-specific bits in ISLE. This will bring both short- and long-term benefits:

Pattern-matching is a very good fit for the sorts of transfer/meet functions that many analyses require, allowing for concise and less error-prone expression of the ideas;

The optimizations that allow for efficient merging of many different matching rules during lowering would also bring benefits to any complex transfer function in an analysis;

Building passes out of ISLE rules that analyze CLIF and transform CLIF to CLIF lets us eventually fuse this mid-end with the CLIF-to-MachInst lowering pass;

Putting everything into the DSL in which we express our backend lowering rewrites allows us to reuse whatever formal-verification machinery we build around it.

The main steps of this work will be:

[ ] Develop a lazy analysis framework over CLIF, with transfer/meet functions in ISLE;

#4129

[ ] Develop a toplevel generic transform pass driver that edits CLIF in-place, using logic written in ISLE with the same CLIF extractors and constructors as for backends;

#4130

[ ] Co-develop, with the above, several initial passes (e.g. alias analysis and redundant-load elimination).

#4131

Possibly, if we can, it may be interesting to also:

[ ] Find ways to fuse analyses and passes in the mid-end, possibly enabling a lightweight combination of mid-end passes that stream over CLIF once and emit new CLIF, rather than editing in place;

[ ] Investigate whether we can fuse the above with the backend lowering.

Theese last two steps are intentionally vague and bring up questions of pass direction and pass ordering; but I suspect that we may be able to do something to get down to a handful of passes if we are careful. In any case, finding a way to make this work would be a bonus, and the main benefit provided by the work in this issue overall is ease of development, better likelihood of correctness, and compatibility with verification efforts.

Wasmtime GitHub notifications bot (May 11 2022 at 05:59):

bjorn3 commented on issue #4128:

Investigate whether we can fuse the above with the backend lowering.

That would prevent optimizations that can't be streamed from running after these streamable optimizations, right?

Wasmtime GitHub notifications bot (May 11 2022 at 17:03):

cfallin commented on issue #4128:

Right, it would be a particular configuration when compilation speed is more important. I don't think we would unconditionally build the backend in this way, as long as we also want to support higher optimization levels.

Wasmtime GitHub notifications bot (May 27 2022 at 20:33):

cfallin commented on issue #4128:

For anyone curious, I've been doing a lot of thinking and reading trying to work out a good approach here, with several toplevel goals: (i) enable rewrites in ISLE, (ii) solve the phase-ordering problem in a better way.

Briefly, after introducing alias analysis in #4163, it became apparent to me that we need a more uniform approach to interleaving optimizations and allowing them to interact. That had already been the idea to a large degree with the desire to use ISLE to express transforms (the DSL compiler could combine rules from all kinds of analyses/transforms into a uniform rewrite step iterated until fixpoint); but it had become apparent that adding alias analysis (depending on GVN/alias/cprop/const-folding to prove equal addresses) sort of pushed the problem over the edge. Previously we had few enough rewrites that running simple_preopt once, then GVN once, pretty much was OK; not anymore.

So I've been looking seriously into e-graphs as an IR between CLIF and VCode lowering, with the idea that we could use the excellent egg library, probably with integration to ISLE (or rather with ISLE integration to egg: generating slightly different code that can match on the egraph).

The reason I've been thinking that rewriting from CLIF into an egraph, then the egraph straight to VCode, is that adapting the Lower infrastructure to query an egraph directly is actually probably simpler than trying to lower back to CLIF, in addition to the efficiency gained from removing a step. Furthermore our CLIF optimizations can then be shifted later in the pipeline, to become egraph rewrite rules instead; and we could subsume GVN and LICM entirely by implicit invariants and hashconsing/merging in the process of building the egraph and lowering out of it.

One can represent control flow by creating explicit nodes for "block predicates" and edges for "state flow", akin to a PDG or VSDG. The hard part is actually coming out of the egraph representation back to linearized code; in other words, deciding when to compute each node.

I've gone through a few iterations here and will go ahead and link my braindump/notes of my explorations to date, but in brief have discovered that there is a reason for multiple PhD theses to exist on just this topic, and at least my latest formulation reduces to a weighted set cover problem, which excitingly enough is one of the canonical NP-complete problems. So, I'm starting to think that there may be an intermediate design point where we retain some information about the original program order in CLIF in the egraph nodes, but introduce targetted rewrites/merging steps that explicitly do something like GVN. That would give us the best of both worlds, namely the expression visibility across the function body for rewrite rules to work, without exciting exponential blowups in compile time.

I'll note for completeness too that all of this I envision being used only in an "optimizations enabled" compilation; Cranelfit-without-opts would skip all of this, and continue to lower straight from CLIF to VCode. (Whether that implies two parameterizations/monomorphizations of each backend, I'm not sure yet; details TBD.)

Anyway, that's my update to keep folks posted; hopefully more soon!

Wasmtime GitHub notifications bot (Nov 02 2022 at 16:18):

fitzgen commented on issue #4128:

This is implemented in the recent e-graphs work. Closing.

Wasmtime GitHub notifications bot (Nov 02 2022 at 16:18):

fitzgen closed issue #4128:

As part of a general push to improve the quality of generated code, we want to extend the set of optimizations that we perform on the IR (CLIF) beyond the current set of GVN, LICM, DCE, etc passes, and develop some general analyses to support these and the CLIF-to-MachInst lowering pass. For example, we may want:

Alias analysis, and redundant load elimination, store-to-load forwarding, and dead store elimination using its results;

More general constant propagation and folding (compile-time evaluation) using the CLIF interpreter;

Bounds-check elimination, finding when one check dominates another and makes it unnecessary;

Integer range analysis, to help with bounds-check elimination;

Demanded-bits and defined-bits analyses, either over the CLIF or in concert with the lowering (since at least defined-bits may depend on the instructions chosen);

and others.

We should generally strive to write these analyses and transforms with the pass-specific bits in ISLE. This will bring both short- and long-term benefits:

Pattern-matching is a very good fit for the sorts of transfer/meet functions that many analyses require, allowing for concise and less error-prone expression of the ideas;

The optimizations that allow for efficient merging of many different matching rules during lowering would also bring benefits to any complex transfer function in an analysis;

Building passes out of ISLE rules that analyze CLIF and transform CLIF to CLIF lets us eventually fuse this mid-end with the CLIF-to-MachInst lowering pass;

Putting everything into the DSL in which we express our backend lowering rewrites allows us to reuse whatever formal-verification machinery we build around it.

The main steps of this work will be:

[ ] Develop a lazy analysis framework over CLIF, with transfer/meet functions in ISLE;

#4129

[ ] Develop a toplevel generic transform pass driver that edits CLIF in-place, using logic written in ISLE with the same CLIF extractors and constructors as for backends;

#4130

[ ] Co-develop, with the above, several initial passes (e.g. alias analysis and redundant-load elimination).

#4131

Possibly, if we can, it may be interesting to also:

[ ] Find ways to fuse analyses and passes in the mid-end, possibly enabling a lightweight combination of mid-end passes that stream over CLIF once and emit new CLIF, rather than editing in place;

[ ] Investigate whether we can fuse the above with the backend lowering.

Theese last two steps are intentionally vague and bring up questions of pass direction and pass ordering; but I suspect that we may be able to do something to get down to a handful of passes if we are careful. In any case, finding a way to make this work would be a bonus, and the main benefit provided by the work in this issue overall is ease of development, better likelihood of correctness, and compatibility with verification efforts.

Last updated: Apr 18 2025 at 05:03 UTC