wasmtime / issue #7321 Move CLIF legalization into ISLE · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / issue #7321 Move CLIF legalization into ISLE

Wasmtime GitHub notifications bot (Oct 21 2023 at 23:26):

Adding myself as a reviewer in addition to Trevor as I definitely want to look this over... this is really exciting!

Wasmtime GitHub notifications bot (Oct 21 2023 at 23:38):

One of the more recent test failures highlights a weakness of this strategy. One of the concerns brought up was that if legalization removes a construct there's nothing preventing optimizations from recreating the construct. That in fact just happened. For example a fcvt_from_uint (splat x) was converted into splat (fcvt_from_uint x) which then failed to codegen on the x64 backend because it no longer implements a lowering for 64-bit fcvt_from_uint. I "fixed" the issue since it only showed up in a pattern where a uextend could be removed which downgraded it to a conversion-from-32-bit integer which works, but that's not a complete fix. The alternative would be to add support to optimizes to understand the current ISA and when appropriate a question could be asked "can I create this node?" and it might return false on some ISAs if it produces something that can't be lowered. Alternatively legalization could run both before and after egraphs, too.

This also reminds me that I have not done requisite performance work for this. I don't think it will have much of an impact on compile time since it's not much different than the previous legalization pass. That being said I should still confirm before landing of course.

Wasmtime GitHub notifications bot (Oct 22 2023 at 00:10):

cfallin commented on issue #7321:

One high-level thought: if legalization rewrites were egraph rules, that would allow us to use the cost mechanism to push away from "illegal" outputs for a given ISA: the op we want to legalize away has infinite cost. That would also address one concern I had skimming this just now, namely the ins_... vs. replace_... ctors, which I suppose is an efficiency vs. orthogonality question (egraph rewrites necessarily create new nodes rather than replacing inst-data in place). The other downside is that it would require us to run the egraphs infra even with opts disabled, since legalizations are necessary for compilation to succeed. However we could either exclude the usual body of opt rules and alias-based optimizations, or write a faster toplevel loop that avoids the egraph / elab roundtrip altogether, if that were a concern. The major upside of course is that it puts these rewrites into the one mechanism we have for CLIF-to-CLIF rewrites already (hence simplicity).

Wasmtime GitHub notifications bot (Oct 22 2023 at 00:17):

cfallin commented on issue #7321:

Ah, and I guess using the egraph pass in this way would also force the question of side-effecting ops -- we need to be able to create new ones, and replace/delete existing ones.

If we keep this as a separate pass for that reason then one possible tweak I'd suggest to the API is to have ctors the same as in the egraph ISLE environment (so (iadd ty x y) rather than ins_iadd / replace_iadd), make them insert before source inst, and have a separate delete; that would make a merging of the layers later a bit easier.

Wasmtime GitHub notifications bot (Oct 22 2023 at 02:45):

github-actions[bot] commented on issue #7321:

Subscribe to Label Action

cc @cfallin, @fitzgen

<details>
This issue or pull request has been labeled: "cranelift", "cranelift:area:aarch64", "cranelift:area:x64", "cranelift:meta", "isle"

Thus the following users have been cc'd because of the following labels:

cfallin: isle

fitzgen: isle

To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.

Learn more.
</details>

Wasmtime GitHub notifications bot (Oct 23 2023 at 15:28):

alexcrichton commented on issue #7321:

Yeah the ins_* and replace_* I wasn't super happy about, and it represents my best attempt to match what the existing legalizations do only in ISLE instead. I do agree it'd be best to have all this in one place rather than two, but one other thing we'd have to figure out for egraphs would be CFG modification which I don't think currently has infrastructure, but probably would be possible? Or I suppose I should clarify that I would not personally feel confident modifying egraphs for this purpose for either handling side-effecting-ops or updating control flow, although I suspect it's still possible.

I originally wanted to reuse the (iadd ...) extract as a constructor as well, similar to how egraphs does it. One problem with this though was that during if the ctor/etor have the same name then they also have to have the same arguments, and I think this only works when the instruction produces a single result (e.g. somewhere to get the type-to-match from). That didn't work for legalizing trapz for example which didn't produce any results.

I do wish there was something better than ins_* and replace_* though because it does feel error-prone. I had a tough time figuring something else out though so I ended up giving up and settling on translating what was preexisting faithfully. Mostly saying this to express my desire to find a better system as well.

Wasmtime GitHub notifications bot (Oct 24 2023 at 17:16):

cfallin commented on issue #7321:

CFG modification would be difficult at best in the egraph rewrite infra, because that's not really the paradigm the thing is built around -- the skeleton remains fixed and we deal in value representations. I think I'm settling on "edits to the skeleton are a fundamentally different pass"; that's a much clearer factoring, and points toward this basic design of three passes total -- legalize, opt, lower -- each with different possibilities, namely:

Legalize: 1-to-many expansion; can edit the CFG, can replace side-effecting and multivalue ops. Not meant to be the place for optimizations (identities, etc.); purely meant to be a lowering into a "core CLIF" subset that the selected backend can process. Rewrites are in-place mutations and rules are "directed" -- no multiple-representation possibility.

Optimize: many-to-many replacement, on pure ops only. Multiple representations supported, cost function drives the final choice. Meant to get the fixpoint goodness from combination of many identities. Should not re-introduce ops that legalize rewrites out; we may need some mild backend awareness for this to disable some rules (?). Cannot edit the "skeleton" of the CFG of side-effecting ops.

Lower: many-to-many expansion into instructions. Should encode whatever priorities and shortcuts are appropriate for the ISA, but should not embody general optimizations (e.g. algebraic rewrites). Cannot edit the CFG.

which is basically what you have in this PR (I just wanted to write it out explicitly here). So the main question IMHO, if this is OK with everyone, is the design around ins_ vs. replace_. I do feel that that's too big a footgun to take at the moment. Maybe one alternative could be: all ctors are ins_ variants (without the prefix), and we track whether any insts were inserted and delete the original if so? The return type of the legalize entry point could be the values that replace the original inst's values. I think the types of the terms would then be:
(decl bxor (Type Value Value) Inst)
(decl two_result_inst (Type Value Value) Inst)
(decl side_effect_no_result_inst (Type Value) Inst)
then to sketch some more
(decl inst_result (Inst u32) Value)
(extern constructor inst_result inst_result)

(type Values (enum
  (Zero)
  (One (val Value))
  (Two (val1 Value) (val2 Value))))

(decl legalize (Inst) Values)

(decl return_zero (Inst) Values)
(rule (return_zero inst) (Values.Zero))
(decl return_two (Inst) Values)
(rule (return_one inst)
  (let ((value Value (inst_result inst 0)))
    (Values.One value)))
(decl return_two (Inst) Values)
;; ...

;; convenience converter
(decl get_only_result (Inst) Value)
(extern constructor get_only_result get_only_result)
(convert Inst Value get_only_result)

;; now we can do:

(rule (legalize (bxor $I128 a b))
  (if-let (Values.Two a_lo a_hi) (return_two (isplit $I64 a)))
  (if-let (Values.Two b_lo b_hi) (return_two (isplit $I64 b)))
  (let ((lo Value (bxor $I64 a_lo a_hi)) ;; uses implicit converter
          (hi Value (bxor $I64 b_lo b_hi)))
    (Values.Two lo hi)))
and given the rule that emitting any inst deletes the original, and its results are aliased to the return values, this should work. Thoughts?

Wasmtime GitHub notifications bot (Oct 24 2023 at 17:17):

cfallin edited a comment on issue #7321:

CFG modification would be difficult at best in the egraph rewrite infra, because that's not really the paradigm the thing is built around -- the skeleton remains fixed and we deal in value representations. I think I'm settling on "edits to the skeleton are a fundamentally different pass"; that's a much clearer factoring, and points toward this basic design of three passes total -- legalize, opt, lower -- each with different possibilities, namely:

Legalize: 1-to-many expansion; can edit the CFG, can replace side-effecting and multivalue ops. Not meant to be the place for optimizations (identities, etc.); purely meant to be a lowering into a "core CLIF" subset that the selected backend can process. Rewrites are in-place mutations and rules are "directed" -- no multiple-representation possibility.

Optimize: many-to-many replacement, on pure ops only. Multiple representations supported, cost function drives the final choice. Meant to get the fixpoint goodness from combination of many identities. Should not re-introduce ops that legalize rewrites out; we may need some mild backend awareness for this to disable some rules (?). Cannot edit the "skeleton" of the CFG of side-effecting ops.

Lower: many-to-many expansion into instructions. Should encode whatever priorities and shortcuts are appropriate for the ISA, but should not embody general optimizations (e.g. algebraic rewrites). Cannot edit the CFG.

which is basically what you have in this PR (I just wanted to write it out explicitly here). So the main question IMHO, if this is OK with everyone, is the design around ins_ vs. replace_. I do feel that that's too big a footgun to take at the moment. Maybe one alternative could be: all ctors are ins_ variants (without the prefix), and we track whether any insts were inserted and delete the original if so? The return type of the legalize entry point could be the values that replace the original inst's values. I think the types of the terms would then be:
(decl bxor (Type Value Value) Inst)
then to sketch some more
(decl inst_result (Inst u32) Value)
(extern constructor inst_result inst_result)

(type Values (enum
  (Zero)
  (One (val Value))
  (Two (val1 Value) (val2 Value))))

(decl legalize (Inst) Values)

(decl return_zero (Inst) Values)
(rule (return_zero inst) (Values.Zero))
(decl return_two (Inst) Values)
(rule (return_one inst)
  (let ((value Value (inst_result inst 0)))
    (Values.One value)))
(decl return_two (Inst) Values)
;; ...

;; convenience converter
(decl get_only_result (Inst) Value)
(extern constructor get_only_result get_only_result)
(convert Inst Value get_only_result)

;; now we can do:

(rule (legalize (bxor $I128 a b))
  (if-let (Values.Two a_lo a_hi) (return_two (isplit $I64 a)))
  (if-let (Values.Two b_lo b_hi) (return_two (isplit $I64 b)))
  (let ((lo Value (bxor $I64 a_lo a_hi)) ;; uses implicit converter
          (hi Value (bxor $I64 b_lo b_hi)))
    (Values.Two lo hi)))
and given the rule that emitting any inst deletes the original, and its results are aliased to the return values, this should work. Thoughts?

Wasmtime GitHub notifications bot (Oct 24 2023 at 17:35):

cfallin commented on issue #7321:

(And on a little further thought, with some more autoconversions, the return_two bit could go away too)

Wasmtime GitHub notifications bot (Oct 24 2023 at 17:36):

afonso360 commented on issue #7321:

Optimize: Should not re-introduce ops that legalize rewrites out; we may need some mild backend awareness for this to disable some rules (?).

I'm slightly concerned about this, I think we are going to run into a lot of these situations. Not for i128, those are fairly hard operations to recognize.

But here's an example: (or (ishl x) (ushr x)) -> (rotl x) is a transformation that I think would be worth having at the midend, however that is exactly something that we also need to legalize away in RISC-V (if we don't have a certain extension).

So in this case we would also have to disable that rule for the RISC-V backend, if the x flag was not present.
We have a lot of these situation in the RISC-V backend, and my concern is that all of them are also good candidates to invert and say, hey this is also a good midend rule! (It's how I found most of the rule's I've added!)

Which worries me because, we now either have to have a lot of checks in the midend for X arch with Y extension. Or avoid adding rules. RISC-V is particularly bad here since we don't support a lot without extensions, so it might be that we end up adding a lot of legalization rules.

Wasmtime GitHub notifications bot (Oct 24 2023 at 17:37):

afonso360 edited a comment on issue #7321:

Optimize: Should not re-introduce ops that legalize rewrites out; we may need some mild backend awareness for this to disable some rules (?).

I'm slightly concerned about this, I think we are going to run into a lot of these situations. Not for i128, those are fairly hard operations to recognize.

But here's an example: (or (ishl x) (ushr x)) -> (rotl x) is a transformation that I think would be worth having at the midend, however that is exactly something that we also need to legalize away in RISC-V (if we don't have a certain extension).

So in this case we would also have to disable that rule for the RISC-V backend, if the x flag was not present.
We have a lot of these situation in the RISC-V backend, and my concern is that all of them are also good candidates to invert and say, hey this is also a good midend rule! (It's how I found most of the rule's I've added!)

Which worries me because, we now either have to have a lot of checks in the midend for X arch with Y extension. Or avoid adding rules. RISC-V is particularly bad here since we don't support a lot without extensions, so it might be that we end up adding a lot of legalization rules.

Wasmtime GitHub notifications bot (Oct 24 2023 at 17:50):

cfallin commented on issue #7321:

@afonso360 I agree, but I think that's pretty fundamental here: if we start making backends require a specific subset of total CLIF after opts, then we'll have to be aware of that when rewriting CLIF; the alternative I guess is doing legalization after optimization, which is effectively what we do today (in the sense that the legalization -- expansion of i128 ops etc -- is folded into the lowering) but loses optimization opportunities.

Said another way I think there are two separate choices here: what we legalize, and when we legalize. This discussion is about when we legalize (and carries the observation that if we legalize before rewrites, those rewrites need to preserve the legalized subset); but we haven't expanded what we legalize yet, and that's what causes the real issue. If we were to legalize rotates into something else in today's (pre-optimization) legalization phase we'd hit the same issues that you're describing, I think -- we're OK mainly because we legalize away fairly little (the _imm ops, table ops, global values) at the moment.

Wasmtime GitHub notifications bot (Oct 24 2023 at 17:59):

alexcrichton commented on issue #7321:

One option would perhaps be to start running legalization twice? Once before egraphs and once after. Legalization might not expand rotr in the pre-egraph-pass but only in the post-egraph pass. In the theory that legalization isn't too expensive (mostly just an iteration of the IR) it in theory shouldn't add too much cost.

Wasmtime GitHub notifications bot (Oct 25 2023 at 19:59):

alexcrichton commented on issue #7321:

Conclusions from today's meeting (please correct me if I'm wrong):

Replace ins_* and replace_* with a single constructor as (described above).

Add per-backend cost functions to assign a high cost to any instruction which can't be natively lowered to prevent egraphs from generating it.

Only run lowering once before egraphs.

Wasmtime GitHub notifications bot (Nov 13 2023 at 23:11):

alexcrichton commented on issue #7321:

Ok I've come back to this and I've implemented the first piece which is to unify the ctors/etors into one. The downside of that commit, however, is that there's quite a few changes in the test suite, not all of which are positive I think. For example I saw a number of x64 add instructions turn into lea.

Additionally as I've dug in more, I'm not actually sure how to implement the cost function. The nuances of what can and cannot be codegen'd cannot be represented currently in the cost model and what the inputs are. For example given the legalization implemented in this PR the x64 backend cannot codegen fcvt_from_uint but only when the input is a 64-bit type. This means that neither the opcode nor the result type of the instruction is sufficient to determine whether something has a high cost or not, only in combination in this opcode does it become high cost. I couldn't at least initially myself see how to slot that in.

Now one answer could perhaps be "well we just can't legalize that and the backend must support it". I'm not sure how to best handle that.

Wasmtime GitHub notifications bot (Nov 14 2023 at 17:43):

fitzgen commented on issue #7321:

This means that neither the opcode nor the result type of the instruction is sufficient to determine whether something has a high cost or not, only in combination in this opcode does it become high cost. I couldn't at least initially myself see how to slot that in.

We can't pass both of those to the cost function?

Wasmtime GitHub notifications bot (Nov 14 2023 at 17:57):

alexcrichton commented on issue #7321:

Ah sorry I should speak a bit more precisely. Yes both types can be passed in by I gave the wrong impression that this would be sufficient. Instead what actually needs to happen is that a fcvt_from_uint instruction can only fail to be generated on x64 if the input type is i64. The opcode plus the control type (either f32 or f64 in this case) doesn't contain the information about the operand.

Last updated: Apr 17 2025 at 17:03 UTC