cfallin commented on issue #7321:
Adding myself as a reviewer in addition to Trevor as I definitely want to look this over... this is really exciting!
alexcrichton commented on issue #7321:
One of the more recent test failures highlights a weakness of this strategy. One of the concerns brought up was that if legalization removes a construct there's nothing preventing optimizations from recreating the construct. That in fact just happened. For example a
fcvt_from_uint (splat x)
was converted intosplat (fcvt_from_uint x)
which then failed to codegen on the x64 backend because it no longer implements a lowering for 64-bitfcvt_from_uint
. I "fixed" the issue since it only showed up in a pattern where auextend
could be removed which downgraded it to a conversion-from-32-bit integer which works, but that's not a complete fix. The alternative would be to add support to optimizes to understand the current ISA and when appropriate a question could be asked "can I create this node?" and it might returnfalse
on some ISAs if it produces something that can't be lowered. Alternatively legalization could run both before and after egraphs, too.This also reminds me that I have not done requisite performance work for this. I don't think it will have much of an impact on compile time since it's not much different than the previous legalization pass. That being said I should still confirm before landing of course.
cfallin commented on issue #7321:
One high-level thought: if legalization rewrites were egraph rules, that would allow us to use the cost mechanism to push away from "illegal" outputs for a given ISA: the op we want to legalize away has infinite cost. That would also address one concern I had skimming this just now, namely the
ins_...
vs.replace_...
ctors, which I suppose is an efficiency vs. orthogonality question (egraph rewrites necessarily create new nodes rather than replacing inst-data in place). The other downside is that it would require us to run the egraphs infra even with opts disabled, since legalizations are necessary for compilation to succeed. However we could either exclude the usual body of opt rules and alias-based optimizations, or write a faster toplevel loop that avoids the egraph / elab roundtrip altogether, if that were a concern. The major upside of course is that it puts these rewrites into the one mechanism we have for CLIF-to-CLIF rewrites already (hence simplicity).
cfallin commented on issue #7321:
Ah, and I guess using the egraph pass in this way would also force the question of side-effecting ops -- we need to be able to create new ones, and replace/delete existing ones.
If we keep this as a separate pass for that reason then one possible tweak I'd suggest to the API is to have ctors the same as in the egraph ISLE environment (so
(iadd ty x y)
rather thanins_iadd
/replace_iadd
), make them insert before source inst, and have a separatedelete
; that would make a merging of the layers later a bit easier.
github-actions[bot] commented on issue #7321:
Subscribe to Label Action
cc @cfallin, @fitzgen
<details>
This issue or pull request has been labeled: "cranelift", "cranelift:area:aarch64", "cranelift:area:x64", "cranelift:meta", "isle"Thus the following users have been cc'd because of the following labels:
- cfallin: isle
- fitzgen: isle
To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.
Learn more.
</details>
alexcrichton commented on issue #7321:
Yeah the
ins_*
andreplace_*
I wasn't super happy about, and it represents my best attempt to match what the existing legalizations do only in ISLE instead. I do agree it'd be best to have all this in one place rather than two, but one other thing we'd have to figure out for egraphs would be CFG modification which I don't think currently has infrastructure, but probably would be possible? Or I suppose I should clarify that I would not personally feel confident modifying egraphs for this purpose for either handling side-effecting-ops or updating control flow, although I suspect it's still possible.I originally wanted to reuse the
(iadd ...)
extract as a constructor as well, similar to how egraphs does it. One problem with this though was that during if the ctor/etor have the same name then they also have to have the same arguments, and I think this only works when the instruction produces a single result (e.g. somewhere to get the type-to-match from). That didn't work for legalizingtrapz
for example which didn't produce any results.I do wish there was something better than
ins_*
andreplace_*
though because it does feel error-prone. I had a tough time figuring something else out though so I ended up giving up and settling on translating what was preexisting faithfully. Mostly saying this to express my desire to find a better system as well.
cfallin commented on issue #7321:
CFG modification would be difficult at best in the egraph rewrite infra, because that's not really the paradigm the thing is built around -- the skeleton remains fixed and we deal in value representations. I think I'm settling on "edits to the skeleton are a fundamentally different pass"; that's a much clearer factoring, and points toward this basic design of three passes total -- legalize, opt, lower -- each with different possibilities, namely:
Legalize: 1-to-many expansion; can edit the CFG, can replace side-effecting and multivalue ops. Not meant to be the place for optimizations (identities, etc.); purely meant to be a lowering into a "core CLIF" subset that the selected backend can process. Rewrites are in-place mutations and rules are "directed" -- no multiple-representation possibility.
Optimize: many-to-many replacement, on pure ops only. Multiple representations supported, cost function drives the final choice. Meant to get the fixpoint goodness from combination of many identities. Should not re-introduce ops that legalize rewrites out; we may need some mild backend awareness for this to disable some rules (?). Cannot edit the "skeleton" of the CFG of side-effecting ops.
Lower: many-to-many expansion into instructions. Should encode whatever priorities and shortcuts are appropriate for the ISA, but should not embody general optimizations (e.g. algebraic rewrites). Cannot edit the CFG.
which is basically what you have in this PR (I just wanted to write it out explicitly here). So the main question IMHO, if this is OK with everyone, is the design around
ins_
vs.replace_
. I do feel that that's too big a footgun to take at the moment. Maybe one alternative could be: all ctors areins_
variants (without the prefix), and we track whether any insts were inserted and delete the original if so? The return type of thelegalize
entry point could be the values that replace the original inst's values. I think the types of the terms would then be:(decl bxor (Type Value Value) Inst) (decl two_result_inst (Type Value Value) Inst) (decl side_effect_no_result_inst (Type Value) Inst)
then to sketch some more
(decl inst_result (Inst u32) Value) (extern constructor inst_result inst_result) (type Values (enum (Zero) (One (val Value)) (Two (val1 Value) (val2 Value)))) (decl legalize (Inst) Values) (decl return_zero (Inst) Values) (rule (return_zero inst) (Values.Zero)) (decl return_two (Inst) Values) (rule (return_one inst) (let ((value Value (inst_result inst 0))) (Values.One value))) (decl return_two (Inst) Values) ;; ... ;; convenience converter (decl get_only_result (Inst) Value) (extern constructor get_only_result get_only_result) (convert Inst Value get_only_result) ;; now we can do: (rule (legalize (bxor $I128 a b)) (if-let (Values.Two a_lo a_hi) (return_two (isplit $I64 a))) (if-let (Values.Two b_lo b_hi) (return_two (isplit $I64 b))) (let ((lo Value (bxor $I64 a_lo a_hi)) ;; uses implicit converter (hi Value (bxor $I64 b_lo b_hi))) (Values.Two lo hi)))
and given the rule that emitting any inst deletes the original, and its results are aliased to the return values, this should work. Thoughts?
cfallin edited a comment on issue #7321:
CFG modification would be difficult at best in the egraph rewrite infra, because that's not really the paradigm the thing is built around -- the skeleton remains fixed and we deal in value representations. I think I'm settling on "edits to the skeleton are a fundamentally different pass"; that's a much clearer factoring, and points toward this basic design of three passes total -- legalize, opt, lower -- each with different possibilities, namely:
Legalize: 1-to-many expansion; can edit the CFG, can replace side-effecting and multivalue ops. Not meant to be the place for optimizations (identities, etc.); purely meant to be a lowering into a "core CLIF" subset that the selected backend can process. Rewrites are in-place mutations and rules are "directed" -- no multiple-representation possibility.
Optimize: many-to-many replacement, on pure ops only. Multiple representations supported, cost function drives the final choice. Meant to get the fixpoint goodness from combination of many identities. Should not re-introduce ops that legalize rewrites out; we may need some mild backend awareness for this to disable some rules (?). Cannot edit the "skeleton" of the CFG of side-effecting ops.
Lower: many-to-many expansion into instructions. Should encode whatever priorities and shortcuts are appropriate for the ISA, but should not embody general optimizations (e.g. algebraic rewrites). Cannot edit the CFG.
which is basically what you have in this PR (I just wanted to write it out explicitly here). So the main question IMHO, if this is OK with everyone, is the design around
ins_
vs.replace_
. I do feel that that's too big a footgun to take at the moment. Maybe one alternative could be: all ctors areins_
variants (without the prefix), and we track whether any insts were inserted and delete the original if so? The return type of thelegalize
entry point could be the values that replace the original inst's values. I think the types of the terms would then be:(decl bxor (Type Value Value) Inst)
then to sketch some more
(decl inst_result (Inst u32) Value) (extern constructor inst_result inst_result) (type Values (enum (Zero) (One (val Value)) (Two (val1 Value) (val2 Value)))) (decl legalize (Inst) Values) (decl return_zero (Inst) Values) (rule (return_zero inst) (Values.Zero)) (decl return_two (Inst) Values) (rule (return_one inst) (let ((value Value (inst_result inst 0))) (Values.One value))) (decl return_two (Inst) Values) ;; ... ;; convenience converter (decl get_only_result (Inst) Value) (extern constructor get_only_result get_only_result) (convert Inst Value get_only_result) ;; now we can do: (rule (legalize (bxor $I128 a b)) (if-let (Values.Two a_lo a_hi) (return_two (isplit $I64 a))) (if-let (Values.Two b_lo b_hi) (return_two (isplit $I64 b))) (let ((lo Value (bxor $I64 a_lo a_hi)) ;; uses implicit converter (hi Value (bxor $I64 b_lo b_hi))) (Values.Two lo hi)))
and given the rule that emitting any inst deletes the original, and its results are aliased to the return values, this should work. Thoughts?
cfallin commented on issue #7321:
(And on a little further thought, with some more autoconversions, the
return_two
bit could go away too)
afonso360 commented on issue #7321:
Optimize: Should not re-introduce ops that legalize rewrites out; we may need some mild backend awareness for this to disable some rules (?).
I'm slightly concerned about this, I think we are going to run into a lot of these situations. Not for i128, those are fairly hard operations to recognize.
But here's an example: (or (ishl x) (ushr x)) -> (rotl x) is a transformation that I think would be worth having at the midend, however that is exactly something that we also need to legalize away in RISC-V (if we don't have a certain extension).
So in this case we would also have to disable that rule for the RISC-V backend, if the
x
flag was not present.
We have a lot of these situation in the RISC-V backend, and my concern is that all of them are also good candidates to invert and say, hey this is also a good midend rule! (It's how I found most of the rule's I've added!)Which worries me because, we now either have to have a lot of checks in the midend for X arch with Y extension. Or avoid adding rules. RISC-V is particularly bad here since we don't support a lot without extensions, so it might be that we end up adding a lot of legalization rules.
afonso360 edited a comment on issue #7321:
Optimize: Should not re-introduce ops that legalize rewrites out; we may need some mild backend awareness for this to disable some rules (?).
I'm slightly concerned about this, I think we are going to run into a lot of these situations. Not for i128, those are fairly hard operations to recognize.
But here's an example:
(or (ishl x) (ushr x)) -> (rotl x)
is a transformation that I think would be worth having at the midend, however that is exactly something that we also need to legalize away in RISC-V (if we don't have a certain extension).So in this case we would also have to disable that rule for the RISC-V backend, if the
x
flag was not present.
We have a lot of these situation in the RISC-V backend, and my concern is that all of them are also good candidates to invert and say, hey this is also a good midend rule! (It's how I found most of the rule's I've added!)Which worries me because, we now either have to have a lot of checks in the midend for X arch with Y extension. Or avoid adding rules. RISC-V is particularly bad here since we don't support a lot without extensions, so it might be that we end up adding a lot of legalization rules.
cfallin commented on issue #7321:
@afonso360 I agree, but I think that's pretty fundamental here: if we start making backends require a specific subset of total CLIF after opts, then we'll have to be aware of that when rewriting CLIF; the alternative I guess is doing legalization after optimization, which is effectively what we do today (in the sense that the legalization -- expansion of i128 ops etc -- is folded into the lowering) but loses optimization opportunities.
Said another way I think there are two separate choices here: what we legalize, and when we legalize. This discussion is about when we legalize (and carries the observation that if we legalize before rewrites, those rewrites need to preserve the legalized subset); but we haven't expanded what we legalize yet, and that's what causes the real issue. If we were to legalize rotates into something else in today's (pre-optimization) legalization phase we'd hit the same issues that you're describing, I think -- we're OK mainly because we legalize away fairly little (the
_imm
ops, table ops, global values) at the moment.
alexcrichton commented on issue #7321:
One option would perhaps be to start running legalization twice? Once before egraphs and once after. Legalization might not expand
rotr
in the pre-egraph-pass but only in the post-egraph pass. In the theory that legalization isn't too expensive (mostly just an iteration of the IR) it in theory shouldn't add too much cost.
alexcrichton commented on issue #7321:
Conclusions from today's meeting (please correct me if I'm wrong):
- Replace
ins_*
andreplace_*
with a single constructor as (described above).- Add per-backend cost functions to assign a high cost to any instruction which can't be natively lowered to prevent egraphs from generating it.
- Only run lowering once before egraphs.
alexcrichton commented on issue #7321:
Ok I've come back to this and I've implemented the first piece which is to unify the ctors/etors into one. The downside of that commit, however, is that there's quite a few changes in the test suite, not all of which are positive I think. For example I saw a number of x64
add
instructions turn intolea
.Additionally as I've dug in more, I'm not actually sure how to implement the cost function. The nuances of what can and cannot be codegen'd cannot be represented currently in the cost model and what the inputs are. For example given the legalization implemented in this PR the x64 backend cannot codegen
fcvt_from_uint
but only when the input is a 64-bit type. This means that neither the opcode nor the result type of the instruction is sufficient to determine whether something has a high cost or not, only in combination in this opcode does it become high cost. I couldn't at least initially myself see how to slot that in.Now one answer could perhaps be "well we just can't legalize that and the backend must support it". I'm not sure how to best handle that.
fitzgen commented on issue #7321:
This means that neither the opcode nor the result type of the instruction is sufficient to determine whether something has a high cost or not, only in combination in this opcode does it become high cost. I couldn't at least initially myself see how to slot that in.
We can't pass both of those to the cost function?
alexcrichton commented on issue #7321:
Ah sorry I should speak a bit more precisely. Yes both types can be passed in by I gave the wrong impression that this would be sufficient. Instead what actually needs to happen is that a
fcvt_from_uint
instruction can only fail to be generated on x64 if the input type is i64. The opcode plus the control type (either f32 or f64 in this case) doesn't contain the information about the operand.
Last updated: Nov 22 2024 at 16:03 UTC