Stream: cranelift

Topic: e-graph ISLE rules questions


view this post on Zulip Riccardo D'Ambrosio (Aug 04 2022 at 21:26):

:wave: @Chris Fallin

I'm trying to add some of the div/rem opts i talked about on github, i just had a few questions on things im a bit stuck on if you dont mind:

would this be the overall correct way to implement the basic transform?

(rule (simplify (udiv ty x c @ (iconst ty (power_of_two _))))
      (ushr ty x c))

I am mostly just confused on how to handle the immediate RHS value. I wrote this small extractor here:

    fn power_of_two(&mut self, imm: Imm64) -> Option<Imm64> {
        if let Some((false, pow)) = i64_is_power_of_two(imm.bits()) {
            Some(Imm64::new(pow as i64))
        } else {
            None
        }
    }

As part of prelude_opt.isle to extract the power of two imm. I then run cargo run -- test ./filetests/filetests/egraph/algebraic.clif for which i added another function below the existing one:

function %div_rem_pow_2(i32) -> i32 {
block0(v0: i32):
    v1 = iconst.i32 2
    v2 = udiv v0, v1
    ; check: v2 = ushr v0, v1
    return v2
}

It seems to fail however. I presume i am doing something wrong or missing something. I am also unsure if i should insert an upper bound for power_of_two, since the existing preopts seem to do that too.

view this post on Zulip Riccardo D'Ambrosio (Aug 04 2022 at 21:31):

in particular, im not sure how to match a const on the rhs with a specific extractor, and then also use the value in the rewrite

view this post on Zulip Chris Fallin (Aug 04 2022 at 21:31):

@Riccardo D'Ambrosio the general shape here looks correct. I'm not sure why the if-let on i64_is_power_of_two's result matches a false (what does the returned tuple mean? the function definition isn't here to check). Ah, and the rule as-written will use the original 2^x constant as the shift amount, rather than x

view this post on Zulip Chris Fallin (Aug 04 2022 at 21:31):

You'll probably want to write (iconst ty (power_of_two c)) and then use c as the shift-amount

view this post on Zulip Riccardo D'Ambrosio (Aug 04 2022 at 21:32):

its the function from simple_preopt.rs, the false is to check if the power of two is not negative

view this post on Zulip Riccardo D'Ambrosio (Aug 04 2022 at 21:32):

which it should not be because this is udiv, but im not sure

view this post on Zulip Riccardo D'Ambrosio (Aug 04 2022 at 21:34):

Chris Fallin said:

You'll probably want to write (iconst ty (power_of_two c)) and then use c as the shift-amount

iirc i tried that, but it complained that Variable 'c' has type Imm64 but we need Id in context. Which makes sense to me, but i wasn't quite sure how to put a new value into the graph and get back an ID

view this post on Zulip Chris Fallin (Aug 04 2022 at 21:34):

ah! for that you probably want (iconst ty c) in the right-hand side

view this post on Zulip Chris Fallin (Aug 04 2022 at 21:34):

(iconst along with the other opcodes work as both extractors and constructors in the mid-end)

view this post on Zulip Riccardo D'Ambrosio (Aug 04 2022 at 21:35):

I see

view this post on Zulip Riccardo D'Ambrosio (Aug 04 2022 at 21:35):

it compiles but it does not pass the test:

FAIL ./filetests/filetests/egraph/algebraic.clif: optimize

Caused by:
    filecheck failed for function on line 15:
    #0 check: v2 = ushr v0, v1
    > function %div_rem_pow_2(i32) -> i32 fast {
    Missed #0: \bv2 = ushr v0, v1\b
    >                                 block0(v0: i32):
    >                                     v1 = iconst.i32 2
    >                                     v2 = udiv v0, v1
    >                                     return v2
    > }

view this post on Zulip Chris Fallin (Aug 04 2022 at 21:36):

ok. I think I just need to look at this myself -- this stuff is literally a few days old and I'm still working out the kinks; debugging indirectly is an order of magnitude harder. But thank you for kicking the tires early :-)

view this post on Zulip Riccardo D'Ambrosio (Aug 04 2022 at 21:36):

no problem :+1:

view this post on Zulip Riccardo D'Ambrosio (Aug 04 2022 at 21:43):

oh wait, i just noticed, that should not be an iconst.i32 2, i need to check for v1 = iconst.i32 1, tho i dont think thats the root issue

view this post on Zulip Riccardo D'Ambrosio (Aug 04 2022 at 22:06):

seems like udiv stuff overall doesn't match, even with the existing x / 1 => x transform, odd

view this post on Zulip Riccardo D'Ambrosio (Aug 05 2022 at 16:39):

@Chris Fallin i figured it out, imul maps to a Pure node because it cannot trap, while udiv/sdiv map to Instr because it could trap. So it is not matching the Instr node.

view this post on Zulip Riccardo D'Ambrosio (Aug 05 2022 at 16:41):

a possible solution i noticed is adding Node::Instr to the ContextIter handler, though i am unsure if that's what you planned to do to handle such node types

view this post on Zulip Chris Fallin (Aug 05 2022 at 16:44):

Ah! That would do it. I suspect we might want to actually make the extractor iterate over all nodes instead (or Pure and Inst at least) for this reason, and perhaps have a way of marking a non-pure node as "deleted" when we can prove that it's safe, e.g. if we statically know the trap will never occur (this case)

view this post on Zulip Riccardo D'Ambrosio (Aug 05 2022 at 16:45):

Yeah that would make sense

view this post on Zulip Chris Fallin (Aug 05 2022 at 16:45):

thanks for tracking this down; I've been working on integrating the alias analysis then I'll likely make this change

view this post on Zulip Riccardo D'Ambrosio (Aug 05 2022 at 16:45):

:+1:

view this post on Zulip Riccardo D'Ambrosio (Aug 05 2022 at 16:46):

kinda surprised you can integrate alias analysis without this change tho haha

view this post on Zulip Riccardo D'Ambrosio (Aug 05 2022 at 16:47):

then again im not quite sure how an e-graph alias analysis would look like, since alias analysis requires tracking information in multiple passes and things like that

view this post on Zulip Chris Fallin (Aug 05 2022 at 16:52):

oh, I had been writing a special set of ctors/etors to make it work; and the analysis itself (last-store analysis) is separate

view this post on Zulip Chris Fallin (Aug 05 2022 at 16:52):

interestingly Redundant Load Elimination can actually just fall out of the node dedup, if one defines the node identity (Hash/Eq) carefully: based on addr eclass and offset (and load type), but not specific load instruction. Store-to-load forwarding is a bit harder :-)

view this post on Zulip Riccardo D'Ambrosio (Aug 05 2022 at 16:54):

Chris Fallin said:

oh, I had been writing a special set of ctors/etors to make it work; and the analysis itself (last-store analysis) is separate

I see, what does the analysis pass run on/when? my confusion is that, wouldn't you need to run the pass at every iteration of the graph?

view this post on Zulip Chris Fallin (Aug 05 2022 at 16:55):

No, because stores are side-effecting so always exist and will not move; the analysis happens on the CLIF before egraph build and computes a property for each load

view this post on Zulip Riccardo D'Ambrosio (Aug 05 2022 at 16:55):

Oh interesting

view this post on Zulip Riccardo D'Ambrosio (Aug 05 2022 at 16:56):

i imagine that might get a bit more complex though once things like unrolling are added

view this post on Zulip Chris Fallin (Aug 05 2022 at 16:56):

If/when we want dead-store elimination that gets more complex, but DSE is quite tricky for Wasm at least because we have to get the state at any given trap point right

view this post on Zulip Riccardo D'Ambrosio (Aug 05 2022 at 16:56):

I see

view this post on Zulip Chris Fallin (Aug 05 2022 at 16:56):

unrolling falls into "control-flow opt" which is outside the scope of the egraph-based work for now -- needs more thought!

view this post on Zulip Riccardo D'Ambrosio (Aug 05 2022 at 16:57):

Yeah absolutely

view this post on Zulip Chris Fallin (Aug 05 2022 at 16:57):

there is certainly a fixpoint between at least inlining and egraph value opts (specifically: constant prop can lead to constant func ptr can lead to inlining can lead to more constant prop; important for dynamic-dispatch cases) so I do want to think more about control flow in the context of the egraph work eventually

view this post on Zulip Riccardo D'Ambrosio (Aug 05 2022 at 16:59):

Yeah

view this post on Zulip Riccardo D'Ambrosio (Aug 05 2022 at 17:00):

i imagine inling in an e-graph may work fairly well if the whole program is a single e-graph, since it could select between inlining and not inlining much better because it knows that inlining allowed for better optimization that outweighed the cost

view this post on Zulip Chris Fallin (Aug 05 2022 at 17:01):

yep! and that was suggested by mwillsey over in the rfc thread actually too

view this post on Zulip Chris Fallin (Aug 05 2022 at 17:01):

one could in theory have a bridge from the callsite to the function body also in the egraph, and let elaboration go transparently across that bridge

view this post on Zulip Riccardo D'Ambrosio (Aug 05 2022 at 17:02):

Right, i noticed some talk about whether everything should be a single e-graph or not, but i forget what the final decision was/if there was one

view this post on Zulip Chris Fallin (Aug 05 2022 at 17:02):

it gets a bit trickier once one wants to optimize though: fundamentally the different inlined bodies want to diverge sometimes. so some sort of cloning is needed

view this post on Zulip Chris Fallin (Aug 05 2022 at 17:02):

for now it's a big architectural change (everything in Cranelift assumes per-function compilation) so it's more of a "future thoughts" sort of thing

view this post on Zulip Riccardo D'Ambrosio (Aug 05 2022 at 17:03):

everything being one graph sounds awesome for memory usage since e-graphs are great at sharing

view this post on Zulip Riccardo D'Ambrosio (Aug 05 2022 at 17:03):

but more complex as well

view this post on Zulip Riccardo D'Ambrosio (Aug 05 2022 at 17:03):

Chris Fallin said:

for now it's a big architectural change (everything in Cranelift assumes per-function compilation) so it's more of a "future thoughts" sort of thing

Ah ok

view this post on Zulip Riccardo D'Ambrosio (Aug 05 2022 at 17:08):

btw, does cranelift have a concept of NSW/NUW? i.e. ops which assume overflow does not happen. I know that condition is required for a significant amount of optimizations in llvm

view this post on Zulip Chris Fallin (Aug 05 2022 at 17:09):

nope; that actually came up somewhere here a bit ago when I was looking at address-computation code

view this post on Zulip Chris Fallin (Aug 05 2022 at 17:10):

the major issue with that is it introduces semantically undefined behavior, which is a notorious correctness footgun; it's better, all other things equal, to fully define the output for every set of inputs

view this post on Zulip Riccardo D'Ambrosio (Aug 05 2022 at 17:10):

Yeah it's required for things like optimizing out bounds checks

view this post on Zulip Riccardo D'Ambrosio (Aug 05 2022 at 17:10):

Yeah true

view this post on Zulip Chris Fallin (Aug 05 2022 at 17:10):

I suspect that there may be cases where we can instead encode the knowledge in the types, or ... basically I want to find a way around adding it

view this post on Zulip Chris Fallin (Aug 05 2022 at 17:11):

but this, too, needs more thought

view this post on Zulip Riccardo D'Ambrosio (Aug 05 2022 at 17:11):

Riccardo D'Ambrosio said:

Yeah it's required for things like optimizing out bounds checks

well, to some degree, llvm has to also prove the condition does not overflow before optimizing it out

view this post on Zulip Riccardo D'Ambrosio (Aug 05 2022 at 17:12):

This presentation by the llvm devs on how they optimize out math-based conditions was very interesting YouTube - 2021 LLVM Dev Mtg “A New Approach to Removing Redundant Conditions in LLVM”

view this post on Zulip Chris Fallin (Aug 05 2022 at 17:12):

right, so for example a range-analysis could drive a rule that widens adds or removes uextends or whatever when the wraparound behavior can't be hit

view this post on Zulip Chris Fallin (Aug 05 2022 at 17:12):

cool, I'll add that to my queue, thanks!

view this post on Zulip Riccardo D'Ambrosio (Aug 05 2022 at 17:12):

I was wondering at some point if its possible to do it using e-graphs by tracking invariants using egg::Analysis-like things, but i never explored it a lot

view this post on Zulip Chris Fallin (Aug 05 2022 at 17:13):

range analysis should definitely be possible to build using such a framework

view this post on Zulip Riccardo D'Ambrosio (Aug 05 2022 at 17:13):

because it requires storing a variable amount of invariants for each node/eclass, which is difficult to manage, allocation-optimization-wise

view this post on Zulip Riccardo D'Ambrosio (Aug 05 2022 at 17:13):

and it uses fourier-motzkin elimination to check the invariants, which is probably possible in a constructor, but it needs more thought

view this post on Zulip Riccardo D'Ambrosio (Aug 05 2022 at 17:15):

if it doesn't explode the number of invariants kept, it's probably possible to store just an id into a bump-allocator or something

view this post on Zulip Chris Fallin (Aug 05 2022 at 17:20):

hmm, I guess I've envisioned range analysis as something much simpler: if the analysis domain allows just a single integer range (open-ended or closed on both sides) then it's constant space per value, no?

view this post on Zulip Riccardo D'Ambrosio (Aug 05 2022 at 17:20):

my concern is mainly that it might widely increase memory usage as optimization proceeds, because data is associated with every eclass, so you may end up having to allocate a lot of invariants, especially if you support a variable amount of range invariants per class

view this post on Zulip Chris Fallin (Aug 05 2022 at 17:21):

this seems sufficient for something like eliminating overflow cases: if we can carry the fact that x \in [0, 4095) or whatever then we can use that, and we can propagate these through a number of binary operators too

view this post on Zulip Riccardo D'Ambrosio (Aug 05 2022 at 17:22):

Yeah that's true, a simple single-range system might work fairly well as a starting experiment

view this post on Zulip Chris Fallin (Aug 05 2022 at 17:22):

anything that involves solving more complex systems (variable number of constraints etc) I think would probably not be practical in our compilation-time envelope, though I would be curious to see what use-cases/applications of that you had in mind

view this post on Zulip Riccardo D'Ambrosio (Aug 05 2022 at 17:22):

Yeah i think it's probably out of scope for cranelift

view this post on Zulip Riccardo D'Ambrosio (Aug 05 2022 at 17:23):

Im writing a shader language, and naturally, the core of optimizing shaders is optimizing the math to hell and back, which is why im exploring e-graphs, since from simple experiments, they work unbelievably well for this task, i had this wild optimization my experiment did a while back:

2.0 - pow(x * 1.0 * -1.0, 2.0) + (y / z + 3.0) ----> fma(x, neg(x), fma(y, recip(z), 5))
Cost: 26.5 -> 12

view this post on Zulip Riccardo D'Ambrosio (Aug 05 2022 at 17:24):

However, one intricacy about shader optimization is that some transforms only work on normalized vectors/floats, so i must track that invariant somehow

view this post on Zulip Chris Fallin (Aug 05 2022 at 17:24):

interesting, that's pretty cool

view this post on Zulip Chris Fallin (Aug 05 2022 at 17:25):

this is definitely a case where more expensive analysis is warranted

view this post on Zulip Riccardo D'Ambrosio (Aug 05 2022 at 17:25):

i think a single range associated with each eclass may work, i probably won't have indexing conditions that require disjoint ranges

view this post on Zulip Riccardo D'Ambrosio (Aug 05 2022 at 17:26):

Chris Fallin said:

this is definitely a case where more expensive analysis is warranted

To some degree i actually think using e-graphs vastly simplifies my job, since the e-graph can propagate invariants for me really easily. I just have to store the invariants efficiently and write rules that check them

view this post on Zulip Riccardo D'Ambrosio (Aug 05 2022 at 17:28):

though i have an RVSDG-based IR also, so im not solely working off the e-graph, which allows me to do some analyses much easier, to then propagate those results into the egraph

view this post on Zulip Riccardo D'Ambrosio (Aug 05 2022 at 17:31):

another analysis which might work very well with egraphs is scalar evolution (thats what llvm calls it), which builds a "chain of recurrences" to describe how a loop calculates its next result, i.e. how it evolves. An e-graph that knows about such info might be able to make really good inferences about moving out or transforming things inside the loop

view this post on Zulip Chris Fallin (Aug 05 2022 at 17:31):

yup, that's a really good point

view this post on Zulip Riccardo D'Ambrosio (Aug 05 2022 at 17:33):

its probably also possible to piggyback off the CR for LICM, any recurrence not directly associated with an induction variable can be moved out

view this post on Zulip Riccardo D'Ambrosio (Aug 05 2022 at 17:34):

not sure if you are familiar with chains of recurrences, but this article breaks it down fairly well https://kristerw.blogspot.com/2019/04/how-llvm-optimizes-geometric-sums.html

view this post on Zulip Riccardo D'Ambrosio (Aug 05 2022 at 17:35):

i'm unsure of how expensive of an analysis it actually is however

view this post on Zulip Chris Fallin (Aug 05 2022 at 17:35):

yep, I've looked at the general idea before. there's certainly a lot we could do here

view this post on Zulip Chris Fallin (Aug 05 2022 at 17:36):

one thing I want to be a little careful of avoiding is actually introducing loops into the egraph; right now blockparams (hence part of the path for any recurrence) are terminals (nodes without children)

view this post on Zulip Riccardo D'Ambrosio (Aug 05 2022 at 17:36):

Yeah, i imagine they play a somewhat significant part in performance for some programs

view this post on Zulip Riccardo D'Ambrosio (Aug 05 2022 at 17:36):

I see

view this post on Zulip Chris Fallin (Aug 05 2022 at 17:36):

but perhaps there's a way of giving rules access to preds/inputs without actually wiring up the dataflow (a separate extractor, etc)

view this post on Zulip Riccardo D'Ambrosio (Aug 05 2022 at 17:36):

could loops perhaps be special Loop nodes?

view this post on Zulip Chris Fallin (Aug 05 2022 at 17:36):

so, this, too, "needs more thought" (sorry that's becoming my frequent refrain :-) )

view this post on Zulip Chris Fallin (Aug 05 2022 at 17:37):

that gets hairy really quickly. I looked at more structured forms of graph IR (RVSDGs etc) but the issue is going into and out of it from the CFG world on either end

view this post on Zulip Riccardo D'Ambrosio (Aug 05 2022 at 17:37):

hehe, yeah, i feel like this is mostly uncharted waters

view this post on Zulip Riccardo D'Ambrosio (Aug 05 2022 at 17:37):

hmm, yeah

view this post on Zulip Chris Fallin (Aug 05 2022 at 17:38):

so for now I keep the CFG in the "side-effect skeleton" and avoid completely any questions of reasoning about CFG-related patterns, but perhaps there's something we could do

view this post on Zulip Chris Fallin (Aug 05 2022 at 17:38):

at least an overlay (a node representing "this is a loop and these are the recurrent inputs") that doesn't feed into the lowering-back-out but can be used to match

view this post on Zulip Riccardo D'Ambrosio (Aug 05 2022 at 17:38):

couldn't loop outputs be treated as opaque variables? like variables used in more than one place

view this post on Zulip Chris Fallin (Aug 05 2022 at 17:39):

inputs you mean? or specifically outputs?

view this post on Zulip Riccardo D'Ambrosio (Aug 05 2022 at 17:40):

uhhh, mostly outputs, but maybe it could apply to inputs too? idk haha

view this post on Zulip Riccardo D'Ambrosio (Aug 05 2022 at 17:40):

kind of like how RVSDG has input and output edges for loop nodes

view this post on Zulip Riccardo D'Ambrosio (Aug 05 2022 at 17:41):

in general, what are the issues with treating structured control flow as special nodes if you know they fit a particular control flow pattern like looping or switching/branching?

view this post on Zulip Chris Fallin (Aug 05 2022 at 17:41):

I guess the main issue I was trying to avoid was fitting all control flow into that form; we have to be able to accept arbitrary CFGs as input and I don't want a Relooper-like algorithm

view this post on Zulip Riccardo D'Ambrosio (Aug 05 2022 at 17:42):

Yeah absolutely

view this post on Zulip Chris Fallin (Aug 05 2022 at 17:42):

so then there's a question of how to mix the two; an overlay node for informational purposes only seems totally reasonable, as long as the control flow also exists in the current blockparam nodes and side-effecting skeleton

view this post on Zulip Riccardo D'Ambrosio (Aug 05 2022 at 17:43):

Yeah it needs some way of combining both unstructured control flow, and structured control flow known to fit some pattern

view this post on Zulip Chris Fallin (Aug 05 2022 at 17:45):

ok, I am going to disappear from this conversation for a bit to get some hacking done :-) but thanks for all the ideas, I'll keep chewing on this

view this post on Zulip Riccardo D'Ambrosio (Aug 05 2022 at 17:45):

bye :wave:

view this post on Zulip Riccardo D'Ambrosio (Aug 08 2022 at 16:49):

noticed PR seems to now support inst nodes, i'll try to add the div/rem transforms now

view this post on Zulip Riccardo D'Ambrosio (Aug 08 2022 at 16:51):

most of the hard work seems to already be done from simple_preopt so i hope it won't be too bad

view this post on Zulip Riccardo D'Ambrosio (Aug 08 2022 at 16:51):

also, has there been any thought on being able to mark certain ISLE rules as bidirectional? it's helpful for some rules

view this post on Zulip Riccardo D'Ambrosio (Aug 08 2022 at 16:52):

for example, x * 2 <---> x >> 1 which could help in discovering other transforms based on the mul

view this post on Zulip Chris Fallin (Aug 08 2022 at 16:53):

the bidirectional thing is interesting but I would want to be a bit careful with this as it would be easy to produce a lot of blowup

view this post on Zulip Riccardo D'Ambrosio (Aug 08 2022 at 16:53):

Yes absolutely, from my experiments it seems that rules that allow the optimizer to create random terms are really bad, not sure what i expected tho...

view this post on Zulip Chris Fallin (Aug 08 2022 at 16:54):

in general I think I need to nail down some "rules for rules" more precisely -- something like layering for forward-progress guarantees

view this post on Zulip Riccardo D'Ambrosio (Aug 08 2022 at 16:54):

Yeah totally

view this post on Zulip Chris Fallin (Aug 08 2022 at 16:54):

I've found it not too hard to accidentally create a generative loop, and because we don't do full eq-sat we end up creating an infinite chain of union nodes instead

view this post on Zulip Riccardo D'Ambrosio (Aug 08 2022 at 16:54):

associativity/commutativity rules are an interesting case specifically

view this post on Zulip Chris Fallin (Aug 08 2022 at 16:55):

but in limited cases ("either of these two nodes creates the other automatically") I agree some sort of syntax sugar for one rule to become two might be useful

view this post on Zulip Riccardo D'Ambrosio (Aug 08 2022 at 16:55):

Yeah, and you dont do exponential backoff either, so it's a recipe for disaster haha

view this post on Zulip Chris Fallin (Aug 08 2022 at 16:55):

associativity/commutativity scare me a bit because it's easy to get combinatorial blowup

view this post on Zulip Riccardo D'Ambrosio (Aug 08 2022 at 16:55):

yeah i never really decided on an approach to standardize how they would work so they dont end up dominating rewrite time

view this post on Zulip Chris Fallin (Aug 08 2022 at 16:56):

so for now what I'm doing is having two forms of each rule (in cprop for example) for commutativity, and some rules that try to reassociate in certain cases for cprop and licm

view this post on Zulip Riccardo D'Ambrosio (Aug 08 2022 at 17:00):

makes sense, although i think such rewrite rules are very helpful once u have simplifications that can take multiple terms, such as FMA (a * b + c -> fma(a, b, c)). Though this might not be a problem for cranelift

view this post on Zulip Riccardo D'Ambrosio (Aug 08 2022 at 17:00):

maybe doing some sort of marking to avoid one associative rewrite from happening infinitely many times, idk

view this post on Zulip Riccardo D'Ambrosio (Aug 08 2022 at 17:05):

This is another problem with bidirectional rules, something like an FMA rewrite can result in infinite loops. And im not quite sure that its possible to avoid such a problem without having a way to make the rewriter realize that it already went through such a rewrite

view this post on Zulip Chris Fallin (Aug 08 2022 at 17:07):

right, the interning/dedup is important there. over the weekend I did some hacking on the aegraph implementation to do hash/eq on nodes based on a "canonical id" which is different from the "latest id" (pointing to the whole tree of union nodes) that should help in that regard...

view this post on Zulip Chris Fallin (Aug 08 2022 at 17:08):

one still has to be careful about truly generative rules of course

view this post on Zulip Riccardo D'Ambrosio (Aug 08 2022 at 17:08):

Yeah

view this post on Zulip Riccardo D'Ambrosio (Aug 08 2022 at 17:09):

it might also be possible to order the rules in a way where actually "meaningful" rewrites are tried first, but doing that automatically seems challenging

view this post on Zulip Chris Fallin (Aug 08 2022 at 17:13):

yeah, what I'm finding is that there's a good amount of tweaking/engineering that goes into this if one wants to make it performant on par with a handwritten pass

view this post on Zulip Chris Fallin (Aug 08 2022 at 17:13):

it's still better than actually writing a pass by hand! but I'm thinking we'll want to invest in tooling to show what's going on maybe, or at least write down some guidelines

view this post on Zulip Riccardo D'Ambrosio (Aug 08 2022 at 17:18):

yeah, especially with no full eq-sat it might be easy for a contributor to accidentally create hard to debug infinite loops

view this post on Zulip Riccardo D'Ambrosio (Aug 08 2022 at 20:03):

Another random thought i had, e-graphs might make it really easy to implement at least basic SLP vectorization, assuming the optimizer can figure out how to potentially reorder statements to allow this to happen

view this post on Zulip Riccardo D'Ambrosio (Aug 08 2022 at 20:20):

Hmm seems like inst nodes still don't match, i think the issue now is that there is only a single constructor, and that is for pure enodes. Though im unsure of how to solve this since constructing an inst node requires a sourceloc as well, so the sourceloc value has to be either passed transparently through the rewrite somehow, or matched in the rule

view this post on Zulip Chris Fallin (Aug 08 2022 at 20:25):

Yeah, you're hitting a very "under construction" bit of code again

view this post on Zulip Chris Fallin (Aug 08 2022 at 20:26):

right now my focus has been on pure nodes as reasoning about merging and code motion with side-effects is quite a bit more complex

view this post on Zulip Riccardo D'Ambrosio (Aug 08 2022 at 21:28):

makes sense

view this post on Zulip Riccardo D'Ambrosio (Aug 08 2022 at 21:29):

i kind of wonder if it might be possible to do something similar to what RVSDG does for solving the side effect problem with graphs

view this post on Zulip Riccardo D'Ambrosio (Aug 08 2022 at 21:29):

where you have edges for every side-effectful op and they connect together to form an ordering constraint for what ops must run before/after

view this post on Zulip Jamey Sharp (Aug 10 2022 at 01:32):

I'm late to the conversation but Riccardo, your project sounds cool! I did experiments a few months ago on manipulating RVSDGs in e-graphs (https://github.com/jameysharp/optir), if you want to see a way you can do full control-flow inside your e-graph. Cranelift isn't going that direction because the trade-offs are different but I bet it'd expose more opportunities in a shader compiler. regarding the earlier conversation about inlining: I found it's especially neat in e-graphs because you can optimize call sites based on information you learn from inlining, but then still decide not to inline; that doesn't change the fact that Cranelift isn't designed for that, but I think it makes the potential future gains from this work bigger.

Compiler optimizer for arbitrary control flow based on equality saturation - GitHub - jameysharp/optir: Compiler optimizer for arbitrary control flow based on equality saturation

view this post on Zulip Riccardo D'Ambrosio (Aug 10 2022 at 03:12):

@Jamey Sharp oh hey i read optir a lot to check out ideas! awesome. I was a little confused in some areas but i got the general gist and it inspired me to try RVSDG + egraphs even more. Im interested in e-graphs mostly for superoptimizing math, since shaders are just blobs of math, and float math is criminally underoptimized, especially trigonometry. Inlining is something i'd like to explore a bit more fine-grained, my current (and potentially final) approach is to just recursively inline everything and optimize a single graph for each shader entry point. Reason being that shaders don't exactly have true function calls, and literally every single gpu compiler on the planet inlines everything it can get away with, even CUDA compilers. Especially when it comes to math and optimizing math across functions. But maybe it would be cool to let e-graphs drive cross-function discovery to choose whether to inline or decide to treat it as "opaque" for optimization purposes

view this post on Zulip Riccardo D'Ambrosio (Aug 10 2022 at 03:15):

the timing of cranelift's e-graphs RFC could not have been more perfect :smile:

view this post on Zulip Riccardo D'Ambrosio (Aug 10 2022 at 03:17):

in particular i think RVSDG + e-graphs could help tremendously for cases where the ordering of actual operations doesnt matter, only how they are connected together

view this post on Zulip Riccardo D'Ambrosio (Aug 10 2022 at 03:18):

and once u think about it, this fits so many optimizations, even something pretty complex like superword level parallelism

view this post on Zulip Riccardo D'Ambrosio (Aug 10 2022 at 03:19):

so i think it's the perfect combination once it can be explored more in depth, RVSDG excels at the "large bits", analyzing how things connect, things like that. And e-graphs excel at discovering the smaller bits and how they can be potentially rewritten to be better

view this post on Zulip Jamey Sharp (Aug 10 2022 at 19:40):

I think there are also interesting opportunities with noticing that different expressions of control-flow are equivalent. I didn't previously have any intraprocedural examples where e-graphs help with that, but the link you shared last week about "How LLVM optimizes power sums" is a good one: for small loop counts, the closed-form loop-free version may take longer than the simpler loop. doing that transformation on an e-graph means you can defer the decision about whether it's profitable until you've reached saturation, at which point your cost function may have more information to work with. like with inlining, it also allows surrounding code to be improved based on information learned from the power-sum analysis, even if the later decision is to emit a loop after all.
and yeah, I agree that there are a ton of interesting directions to go, various kinds of parallelism definitely included! I think the general criterion for "would an e-graph help here" is: are you sure this transformation is an improvement in all cases? if not, an e-graph is great, and I think most optimizations have this kind of uncertainty. (although I think it'd be nice if egg provided some better support for pruning alternatives if you do know they're always worse; the way egg's examples do constant-folding seems like it's a foot-gun without some more restricted API.)
on the other hand I think you're right that for shaders you should just always inline everything.

view this post on Zulip Riccardo D'Ambrosio (Aug 10 2022 at 21:03):

Yeah, shaders in particular make it easier to do certain things, for example, its rarely not profitable to unroll a loop unless its a really big loop.


Last updated: Nov 22 2024 at 16:03 UTC