Stream: rfc-notifications

Topic: rfcs / issue #27 Cranelift: Using E-Graphs for Verified, ...


view this post on Zulip RFC notifications bot (Jun 30 2022 at 04:01):

cfallin commented on issue #27:

cc @fitzgen @jameysharp @elliottt @abrown @avanhatt @mlfbrown

view this post on Zulip RFC notifications bot (Jun 30 2022 at 04:01):

cfallin commented on issue #27:

Current prototype is bytecodealliance/wasmtime#4249; it implements the e-graph roundtripping but not (yet) the rewriting described in this RFC.

view this post on Zulip RFC notifications bot (Jul 01 2022 at 18:35):

fitzgen commented on issue #27:

For example, the https://github.com/bytecodealliance/wasmtime/pull/1647 tooling included tools to "harvest" potential patterns to rewrite from real programs, and then use the superoptimization engine Souper to find rewrite rules offline. We could revisit such tooling and use it to build a significantly more effective body of rules semi-automatically.

FWIW, the harvester still exists and is independent of Peepmatic, it just extracts Souper IR from CLIF.

The only tool we are missing would be a Souper IR to ISLE(CLIF) translator for Souper's synthesized optimizations back to ISLE/this new framework.

view this post on Zulip RFC notifications bot (Jul 01 2022 at 19:05):

philzook58 commented on issue #27:

This is incredibly cool work and I am very interested in following it! I also have appreciated your very thorough blog posts in the past. I'm seeing if it useful for people to have a place for egraph discussion, so I made a nascent but fairly quiet so far zulip. https://egraphs.zulipchat.com/join/q7bjayghbdw66xsnzgwc2hjb/

view this post on Zulip RFC notifications bot (Jul 01 2022 at 19:10):

fitzgen commented on issue #27:

Super happy with how simple and obvious-after-the-fact your proposed e-graph destruction algorithm is (cycle breaking + scoped elaboration).

view this post on Zulip RFC notifications bot (Jul 01 2022 at 19:13):

fitzgen commented on issue #27:

We will thus need a system of "optimization fuel" that limits rewrite steps overall and possibly per-rule or per-eclass. It is not yet certain what policies will work best; this is a question that we will need to experiment with. We simply wish to acknowledge now that it will be a problem that we will need to solve.

Note that egg already has a few knobs we can experiment with before trying to create our own custom thing here. In addition to perform-at-most-N-rewrite-iterations, you can also give it optional limits on the number of e-classes and e-nodes, and halt equality saturation once those are reached. I think the latter will be important for limiting the e-graph's memory overhead.

view this post on Zulip RFC notifications bot (Jul 01 2022 at 19:18):

mwillsey commented on issue #27:

Hey all, egg author here. Very, very exciting stuff ahead! I will dive into the RFC soon and try to grok the details. I'm happy to help out as well!

view this post on Zulip RFC notifications bot (Jul 01 2022 at 19:32):

fitzgen commented on issue #27:

How will e-graph-based optimization affect compile time?

Have you run spidermonke.wasm through sightglass with and without your prototype that just does round tripping through an e-graph? That could give us a tiny taste of what compile time overheads might look like (also I imagine you'd disable the old GVN when it is subsumed by the e-graph round trip). I'd also expect that, in theory, execution time of the generated code would be unchanged, but that would also be a good thing to empirically check.

view this post on Zulip RFC notifications bot (Jul 01 2022 at 19:32):

fitzgen edited a comment on issue #27:

How will e-graph-based optimization affect compile time?

Have you run spidermonkye.wasm through sightglass with and without your prototype that just does round tripping through an e-graph? That could give us a tiny taste of what compile time overheads might look like (also I imagine you'd disable the old GVN when it is subsumed by the e-graph round trip). I'd also expect that, in theory, execution time of the generated code would be unchanged, but that would also be a good thing to empirically check.

view this post on Zulip RFC notifications bot (Jul 01 2022 at 19:33):

fitzgen edited a comment on issue #27:

How will e-graph-based optimization affect compile time?

Have you run spidermonkey.wasm through sightglass with and without your prototype that just does round tripping through an e-graph? That could give us a tiny taste of what compile time overheads might look like (also I imagine you'd disable the old GVN when it is subsumed by the e-graph round trip). I'd also expect that, in theory, execution time of the generated code would be unchanged, but that would also be a good thing to empirically check.

view this post on Zulip RFC notifications bot (Jul 01 2022 at 19:41):

cfallin commented on issue #27:

How will e-graph-based optimization affect compile time?

Have you run spidermonkey.wasm through sightglass with and without your prototype that just does round tripping through an e-graph? That could give us a tiny taste of what compile time overheads might look like (also I imagine you'd disable the old GVN when it is subsumed by the e-graph round trip). I'd also expect that, in theory, execution time of the generated code would be unchanged, but that would also be a good thing to empirically check.

Nope, not yet, but that's next on my plate! Before doing measurements I want to do some basic optimizations to the scheme that I know how to do -- mainly, reuse existing CLIF insts rather than repopulating -- which should dramatically lower overhead. If with that we are "at parity" or "not much worse" than LICM (assuming LICM tweaks here) + GVN today, then I'd be very happy, but I guess how much overhead to accept there (in exchange for having an optimization framework) is another good discussion point.

view this post on Zulip RFC notifications bot (Jul 01 2022 at 20:35):

fitzgen commented on issue #27:

I feel like this RFC is a little different from many of our other RFCs. Usually
we know we can implement the thing, and the RFC process is more about API design
and determining whether to implement the thing as described or in some
alternative way. Merging most RFCs is essentially committing to "yes we will do
this". This RFC, on the other hand, has relatively large open questions around
compilation overhead and whether we will actually solve phase ordering the way
we believe we should in theory. I think merging this RFC doesn't necessarily
mean "yes we will do this" because we still aren't 100% sure we can do
this. We think we can, but we need to actually build more to prove it out. So
I think merging this RFC should mean something more like "we will start landing
(cfg'd off) code in main and when we (hopefully) reach X, Y, and Z
acceptance criteria, we will stop hiding the code behind cfgs and remove the
old, non-egraph code paths."

And I would love this RFC to define and document our ahead-of-time consensus on
what exactly those acceptance criteria are so that we don't have moving goal
posts later on down the line when we are deciding whether to actually, fully,
100% commit to the e-graphs mid-end framework or not.


Just to spark discussion, I'll throw some stuff out there:

What do other Cranelift stakeholders think our acceptance criteria should be?

Does compile time/space and code quality cover everything we need to keep an eye
on as far as acceptance quality goes or am I missing something else?

view this post on Zulip RFC notifications bot (Jul 01 2022 at 20:46):

cfallin commented on issue #27:

These are great questions, thanks @fitzgen -- I should've included more details around what I think the actual decisions should be (now and later)!

I actually am somewhat optimistic that we're close (a-few-weeks-to-a-month-ish?) to a point where we can evaluate the prototype plus basic rewriting rules that have functional parity to today's GVN, LICM, and simple_preopt, and make a decision with this RFC (merge or not) that means "we will actually do this". And to be clear I do want to get that far in the prototype before I personally advocate for merging; I want to ensure there are no unknown-unknowns in the rewrite step. I'd personally favor an acceptance criteria that is something like:

  1. Replicates GVN, LICM, and simple_preopt (and clear path for alias analysis);
  2. Is clearly an improvement in ease of contributing further optimizations;
  3. Meets some compile-time threshold.

Open discussion how we balance items 2 and 3 above, but I hope we will have minimal overhead when we have only today's functionality. That would mean that additional compile-time cost is only a factor when actually adding interesting additional rules/optimizations.

On the flipside, I don't know if it necessarily makes sense to have a "something must speed up on initial merge" criterion, though I'm open to that if a bunch of other folks disagree! The reason is that I think once we have (i) a framework and (ii) a replica of today's rewrites, the future upside is clear enough that the work to come up with those additional optimizations can happen as a second step. But, I can see how we might want to require such a speedup if the compile-time overhead is also non-negligible.

Anyway all this is open to a bunch of discussion and, on that note, I'll put an item on the agenda for our next biweekly :-)

view this post on Zulip RFC notifications bot (Jul 01 2022 at 21:13):

cfallin edited a comment on issue #27:

These are great questions, thanks @fitzgen -- I should've included more details around what I think the actual decisions should be (now and later)!

I actually am somewhat optimistic that we're close (a-few-weeks-to-a-month-ish?) to a point where we can evaluate the prototype plus basic rewriting rules that have functional parity to today's GVN, LICM, and simple_preopt, and make a decision with this RFC (merge or not) that means "we will actually do this". And to be clear I do want to get that far in the prototype before I personally advocate for merging; I want to ensure there are no unknown-unknowns in the rewrite step. I'd personally favor acceptance criteria that is something like:

  1. Replicates GVN, LICM, and simple_preopt (and clear path for alias analysis);
  2. Is clearly an improvement in ease of contributing further optimizations;
  3. Meets some compile-time threshold.

Open discussion how we balance items 2 and 3 above, but I hope we will have minimal overhead when we have only today's functionality. That would mean that additional compile-time cost is only a factor when actually adding interesting additional rules/optimizations.

On the flipside, I don't know if it necessarily makes sense to have a "something must speed up on initial merge" criterion, though I'm open to that if a bunch of other folks disagree! The reason is that I think once we have (i) a framework and (ii) a replica of today's rewrites, the future upside is clear enough that the work to come up with those additional optimizations can happen as a second step. But, I can see how we might want to require such a speedup if the compile-time overhead is also non-negligible.

Anyway all this is open to a bunch of discussion and, on that note, I'll put an item on the agenda for our next biweekly :-)

view this post on Zulip RFC notifications bot (Jul 01 2022 at 21:13):

cfallin edited a comment on issue #27:

These are great questions, thanks @fitzgen -- I should've included more details around what I think the actual decisions should be (now and later)!

I actually am somewhat optimistic that we're close (a-few-weeks-to-a-month-ish?) to a point where we can evaluate the prototype plus basic rewriting rules that have functional parity to today's GVN, LICM, and simple_preopt, and make a decision with this RFC (merge or not) that means "we will actually do this". And to be clear I do want to get that far in the prototype before I personally advocate for merging; I want to ensure there are no unknown-unknowns in the rewrite step. I'd personally favor acceptance criteria that are something like:

  1. Replicates GVN, LICM, and simple_preopt (and clear path for alias analysis);
  2. Is clearly an improvement in ease of contributing further optimizations;
  3. Meets some compile-time threshold.

Open discussion how we balance items 2 and 3 above, but I hope we will have minimal overhead when we have only today's functionality. That would mean that additional compile-time cost is only a factor when actually adding interesting additional rules/optimizations.

On the flipside, I don't know if it necessarily makes sense to have a "something must speed up on initial merge" criterion, though I'm open to that if a bunch of other folks disagree! The reason is that I think once we have (i) a framework and (ii) a replica of today's rewrites, the future upside is clear enough that the work to come up with those additional optimizations can happen as a second step. But, I can see how we might want to require such a speedup if the compile-time overhead is also non-negligible.

Anyway all this is open to a bunch of discussion and, on that note, I'll put an item on the agenda for our next biweekly :-)

view this post on Zulip RFC notifications bot (Jul 06 2022 at 20:28):

cfallin commented on issue #27:

So I did some quick-and-dirty compile-time measurements. On SpiderMonkey.wasm, the overhead of roundtripping through the egraph adds about 23% to compile time (when opts are enabled). All current handwritten opts account for about 10% of compile time.

That's not terrible (not a doubling, for example) but it's not great either, so I dug into which parts were slow. It turns out that the extraction and elaboration take very little time (less than 10% of the egraph addition); most of the time is in the initial egraph build.

I suspect that we'll need to do some careful data-structure engineering to get things within acceptable overheads, but fortunately it looks like there's some relatively low-hanging fruit. Right now in egg the EClass struct keeps two Vecs, one for nodes and one for parents; the Id-to-EClass map is a HashMap while I would expect the Id space to be somewhat dense (maybe this is to clear out eclasses that have been merged?); and the Node interface forces us to keep args internally in separate lists (I use SmallVec currently for this). All of this could be refactored to use central Vecs in a pseudo-arena way (like Cranelift does for CLIF args/results currently), at the cost of passing around the "backing Vec".

I'll work on that for a bit, then return to hacking on the ISLE integration; once I prove out both of these, I'm happy to make a bunch more measurements!

view this post on Zulip RFC notifications bot (Jul 06 2022 at 21:03):

mwillsey commented on issue #27:

@cfallin Awesome! At first glance, 23% isn't so bad, and I think there's you're right: a lot that could be done to get that down. Some quick points:

Also, the way the e-graph is being used in the prototype is basically a worst-case scenario for performance; for every Func, you build an e-graph, do e-graph stuff, extract and throw away. One big way we've improved performance in other egg applications is by batching many optimization queries at once. That would obviously require re-architecting the way egg and clif interact, which is probably something to do down the road.

Overall, I think this means that there are some serious "high-level" ways to improve performance as well as the lower-level ways you mentioned.

view this post on Zulip RFC notifications bot (Jul 06 2022 at 23:08):

cfallin commented on issue #27:

Thanks @mwillsey! By the way I'm planning to continue hacking on my fork of egg but I plan to upstream everything in due time, once plans are stable...

Re: this point:

Also, the way the e-graph is being used in the prototype is basically a worst-case scenario for performance; for every Func, you build an e-graph, do e-graph stuff, extract and throw away. One big way we've improved performance in other egg applications is by batching many optimization queries at once. That would obviously require re-architecting the way egg and clif interact, which is probably something to do down the road.

I want to dig a bit deeper: it seems you're saying that one way of tolerating overheads is to amortize the construction and extraction over more use of the egraph. I guess that's true in a "fraction of time spent in construction+extraction" sense, but I'm thinking in an absolute sense here: the time spent building the egraph and extracting from it, with no work between the two, is the bare minimum in additional overhead, no? At least as long as we have a separate egraph per function, and given that the overall problem we are solving is "compile this CLIF to machine code"? Or it's possible I'm missing some other way of batching or amortization that you mean to suggest...

view this post on Zulip RFC notifications bot (Jul 06 2022 at 23:12):

cfallin commented on issue #27:

Oh also a quick calibration point re:

At first glance, 23% isn't so bad

In Cranelift context this would be a huge deal actually: the whole regalloc rewrite (0.5 engineer-years) bought us about 20% compile-time improvement, and I fight to not give up 0.5% regularly. This is an open question above but I'll venture a ballpark number on this now just to calibrate (or at least give my own opinion): maybe 5% would be OK, but 20% would require a large increase in generated-code quality to justify, I think.

view this post on Zulip RFC notifications bot (Jul 06 2022 at 23:40):

mwillsey commented on issue #27:

I'm planning to continue hacking on my fork of egg

Makes perfect sense. I'm happy to help hack on the fork if that's useful.

as long as we have a separate egraph per function

As long the e-graph optimization workflow is per-function, you can't really amortize. The batching I'm talking about is across functions: there's nothing stopping you from adding 1000 functions to the same e-graph (hopefully there's some sharing in there!) and optimizing them simultaneously. I suspect that even just adding 1000 functions to one e-graph is faster than adding 1000 functions to 1000 e-graphs because there is likely some sharing, but I could be wrong here.

Obviously one e-graph per function is cleaner for how the optimization workflow is right now. It just comes with the unfortunate overhead of basically copying your whole program twice: once into the e-graph and once out. Even outside of batching, another way of improving performance would be to eliminate one or both of these copies by increasing the part of the pipeline where you use e-graphs, ideally to some other "translation" boundary where you'd have to lower to another IR anyway. This is clearly a big task, though.

view this post on Zulip RFC notifications bot (Jul 06 2022 at 23:49):

cfallin commented on issue #27:

That's a really interesting idea; we can certainly play with it at some point!

We currently have a hard boundary between each function compilation, in order to allow for parallel compilation, but we'll need to break that boundary (optionally) eventually for inlining as well. And, actually, now that these two ideas are next to each other: putting the inlining and inlined function in the same egraph, linking up the dataflow, and then elaborating out of that is a really nice clean way of doing inlining. So, you've spawned some thoughts here, thanks!

One potentially complicating factor toward deduplicating efficiency between function bodies is that the "roots" (terminal nodes) will differ: both f and g may contain complex_expr(x, y, z) but where x, y, and z are different block params, or side-effecting ops (calls, loads), or whatever, in the two functions. I wonder if there is some sort of canonicalization that could help (in the extreme case this feels like a graph isomorphism problem though). In the egraph literature I mostly see expressions in terms of variable terminals x, y, etc and so this doesn't seem to be as big a deal. Any thoughts?

view this post on Zulip RFC notifications bot (Jul 08 2022 at 06:36):

cfallin commented on issue #27:

A brief update: I hacked up a custom egraph implementation designed to be more memory-efficient and allocation-light, and got the overhead down to 9%.

The key differences are in how it avoids ever cloning Nodes, using indexmap to allow both node-to-ID dedup and ID-to-node access and using these IDs in eclasses' list of nodes (the use of IDs to name nodes is itself different from egg too); and using Cranelift's EntityList to store lists of node IDs and eclass IDs in pooled storage; and using bumpalo (an arena allocator) to allocate the argument lists of eclass IDs. All of this is to implement the principles (i) O(1) memory allocations, i.e. we should grow single large containers rather than have nested containers per node or eclass (entity-component-system style) and (ii) avoid copies and store nodes only once. There's a little more we could do in theory with a custom hashmap that doesn't require Hash+Eq (namely, allows us to provide context for those; that would allow us to provide the backing pool separately and not store a fat-pointer slice for args and types in every Node, but only a u32 index).

I haven't built the merging or rule-application bits yet but the roundtripping (without any rule applications) actually never merges eclasses so this is enough to evaluate the overhead of this part of the pipeline.

Given this, and given that GVN+LICM add around 10% to compile time, I'm actually cautiously optimistic that we can slip into the hole left by removing these stages and be almost neutral when no rewrite rules are applied. That's the ideal case in my mind. Or at least have small enough overhead that this roundtrip is always on when opts are on. Later we can remove half the roundtrip by generating egraph nodes straight from the frontend/builder API, as noted in the last section of the RFC (and as you note as well @mwillsey) for further savings, but for now anchoring both ends of the transform in CLIF (which can be printed, manipulated, used for testing, etc) has substantial pragmatic benefits I think.

(I'm not sure if the path forward is to actually use cranelift-egraph or egg with a bunch of hacks; as a general principle I don't like to go all NIH and hack up my own versions of things; but the data structures and tradeoffs are different enough that this may be the right way, I'm not sure. Right now I just want to pathfind wrt "is this performance profile possible".)

After dropping this tidbit I'm now promptly going to disappear for a week of PTO (...sorry!) but wanted to get that data point out first. More later!

view this post on Zulip RFC notifications bot (Jul 08 2022 at 08:17):

bjorn3 commented on issue #27:

There's a little more we could do in theory with a custom hashmap that doesn't require Hash+Eq (namely, allows us to provide context for those; that would allow us to provide the backing pool separately and not store a fat-pointer slice for args and types in every Node, but only a u32 index).

hashbrown's RawTable might help with that.

view this post on Zulip RFC notifications bot (Jul 18 2022 at 03:47):

cfallin commented on issue #27:

There's a little more we could do in theory with a custom hashmap that doesn't require Hash+Eq (namely, allows us to provide context for those; that would allow us to provide the backing pool separately and not store a fat-pointer slice for args and types in every Node, but only a u32 index).

hashbrown's RawTable might help with that.

Indeed! I pushed a new implementation that does just that over here. (That could probably use a lot of cleanup, and is a bit lifetime-heavy, but proves out what I wanted to!) Thanks for the pointer to that.

view this post on Zulip RFC notifications bot (Aug 01 2022 at 20:17):

RDambrosio016 commented on issue #27:

This RFC is very interesting to me, i plan to use e-graphs too for a large part of the optimizer for a math-heavy language im writing. I considered using ISLE as well for writing patterns, i think it would work pretty well. However ran into a couple of open questions i could not resolve:

But overall this approach seems promising, and it would be awesome to share rulesets/ideas with cranelift.

view this post on Zulip RFC notifications bot (Aug 01 2022 at 20:19):

RDambrosio016 edited a comment on issue #27:

This RFC is very interesting to me, i plan to use e-graphs too for a large part of the optimizer for a math-heavy language im writing. I considered using ISLE as well for writing patterns, i think it would work pretty well. I however ran into a couple of open questions i could not resolve:

But overall this approach seems promising, and it would be awesome to share rulesets/ideas with cranelift.

view this post on Zulip RFC notifications bot (Aug 01 2022 at 22:54):

mwillsey commented on issue #27:

Hi @RDambrosio016,

egg doesn't use trie-based pattern matching, so you'd essentially have to re-implement e-matching to support this. I believe Z3 does do something like this (Efficient E-matching for SMT Solvers, section 4). We prototyped it at some point, and found it was slower because the rules we were using didn't significantly "overlap". In a high-performance setting, what I would try before tries is just running rules using the SimpleScheduler, but with a fixed, small number of iterations.

As far as using Analysis for tracking invariants, sure! Just be careful, an analysis is per e-class, so it depends on your notion of equivalence.

view this post on Zulip RFC notifications bot (Aug 01 2022 at 23:14):

RDambrosio016 commented on issue #27:

Interesting, i did a small experiment generating rules with my own DSL, and, while decently fast, it definitely seems like egg spends a lot of its time on rules that could never match if you have a ruleset that is somewhat larger as a consequence of duplicating rules for numerous numeric types. In this particular ruleset i have ~61 unique rules (including bidirectionality), which after being duplicated for multiple numeric types, result in 240 total rules, and this is how long egg takes on each step:

Number of rules: 240
Time from runner creation: 2.6857ms

Runner report
=============
  Stop reason: Saturated
  Iterations: 7
  Egraph size: 148 nodes, 23 classes, 177 memo
  Rebuilds: 4
  Total time: 0.0026048
    Search:  (0.82) 0.0021298000000000003
    Apply:   (0.17) 0.0004338
    Rebuild: (0.02) 0.0000397

2.0 - pow(x * 1.0 * -1.0, 2.0) + (y / z + 3.0) ----> fma(x, neg(x), fma(y, recip(z), 5))
Cost: 26.5 -> 12

This is on release with 1 CGUs and LTO on, on an i9-10850k. Search time making up the bulk of the time is not unexpected, but to me it seemed a little bit high considering how small the expression is. So i re-tried it but without duplicating rules, and i got these results:

Number of rules: 50
Time from runner creation: 1.0181ms

Runner report
=============
  Stop reason: Saturated
  Iterations: 7
  Egraph size: 148 nodes, 23 classes, 177 memo
  Rebuilds: 4
  Total time: 0.0009838
    Search:  (0.66) 0.0006475000000000001
    Apply:   (0.30) 0.0002992
    Rebuild: (0.04) 0.000035999999999999994

2.0 - pow(x * 1.0 * -1.0, 2.0) + (y / z + 3.0) ----> fma(x, neg(x), fma(y, recip(z), 5))
Cost: 26.5 -> 12

So it definitely seems like hierarchical matching of some sort would be required to avoid spending a ton of time searching the graph. It's odd that your experiment in trie matching ended up being slower, was it using generated rust code like ISLE creates, or was it dynamically matched at runtime?

view this post on Zulip RFC notifications bot (Aug 01 2022 at 23:26):

cfallin commented on issue #27:

@RDambrosio016 I'd be happy to talk more about sharing efforts once the Cranelift work here is more mature; for now there is still exploration going on. I suspect that a lot of what we do is going to be fairly Cranelift-specific: there is a large semantic gap between "x*1 == x" at the conceptual level, and the particular matchers, node formats, and idiosyncrasies of the framework. My experience in the past is that trying to paper over such details to give a nice high-level view can sometimes be counterproductive because it either creates subtle bugs, or sweeps efficiency concerns under the rug. But once we've built more, perhaps there's something to be used for your effort too :-)

In the meantime perhaps it's best if we keep this discussion thread scoped to the Cranelift effort? It's likely to get much longer in time anyway (I have some more updates cooking at the moment) but side-discussions about other projects might pull us a bit too far off course... thanks!

view this post on Zulip RFC notifications bot (Aug 01 2022 at 23:28):

RDambrosio016 commented on issue #27:

Absolutely :+1: , didnt mean to steer it into more minute details, i was just wondering if you considered/discussed this problem in the past, since i looked at the RFC but it did not seem to mention it.

view this post on Zulip RFC notifications bot (Aug 01 2022 at 23:34):

RDambrosio016 edited a comment on issue #27:

Absolutely :+1: , didnt mean to steer it into more minute details, i was just wondering if you considered/discussed this problem in the past, since i looked at the RFC but it did not seem to mention it. Anyways, im excited about this RFC, and interested in maybe contributing to implementing it/making it better.

view this post on Zulip RFC notifications bot (Aug 01 2022 at 23:38):

cfallin commented on issue #27:

For sure, performance is my main concern right now and I'm working to make the whole thing competitive with handwritten optimizations, so you'll here more about that in a bit!

view this post on Zulip RFC notifications bot (Aug 01 2022 at 23:50):

RDambrosio016 commented on issue #27:

Beating hand-written optimizations is definitely an achievable goal, considering the e-graph approach would have a much easier time knowing when to stop optimizing vs a standard pass-based optimizer. It will also be an interesting problem trying to figure out a good cost metric that maximizes generated code quality. Having a complete e-graph to choose from is great, but trying to select the ideal version from that will definitely prove to be a challenge.

view this post on Zulip RFC notifications bot (Aug 03 2022 at 18:35):

cfallin commented on issue #27:

As an update here, on the running prototype/experimental branch here, I've gotten overhead of Cranelift-with-egraphs (including elaboration that subsumes GVN and LICM, and rewrite rules using ISLE that encompass basic const propagation and algebraic simplifications) vs Cranelift-with-handwritten-opts down to ~1% slower on parallel compile time (and actually slightly faster in terms of total CPU time across cores):

cfallin@min:~/work/wasmtime % hyperfine --show-output --  "target/release/wasmtime compile --cranelift-set use_egraphs=true ../wasm-tests/spidermonkey.wasm"
Benchmark 1: target/release/wasmtime compile --cranelift-set use_egraphs=true ../wasm-tests/spidermonkey.wasm
  Time (mean ± σ):      1.693 s ±  0.098 s    [User: 10.747 s, System: 0.234 s]
  Range (min  max):    1.621 s   1.951 s    10 runs

cfallin@min:~/work/wasmtime % hyperfine --show-output --  "target/release/wasmtime compile --cranelift-set use_egraphs=false ../wasm-tests/spidermonkey.wasm"
Benchmark 1: target/release/wasmtime compile --cranelift-set use_egraphs=false ../wasm-tests/spidermonkey.wasm
  Time (mean ± σ):      1.671 s ±  0.035 s    [User: 10.888 s, System: 0.236 s]
  Range (min  max):    1.606 s   1.707 s    10 runs

(That's measuring on an M1 laptop with SpiderMonkey.wasm, a large Wasm module testcase.)

This is using a new acyclic-egraph approach that I'm calling the "aegraph" (or ægraph for those who love digraphs). I got the conventional egraph rewriting to work last week, and measured it and found compilation time was inflating significantly once rewrites occur -- something like 5x. I then had a series of realizations that the way that we're using egraphs here is more limited than perhaps a conventional full-equality-saturation approach, and had a series of realizations about simplifications we could make.

Basically an ægraph contains eclasses as nodes where each node contains an enode, an enode and a parent-eclass pointer, or two parent-eclass pointers. The union operation joins two eclasses together only by adding nodes (or as an optimization, replacing an enode node with an enode-and-parent node). An eclass node can refer only to earlier eclasses. We maintain the acyclicity invariant when adding enodes and unioning eclasses. Rule application occurs immediately when a new node is added. In a sense, eclasses are split into "levels" and a given level of an eclass refers only to the levels of argument eclasses that existed when it was created. It's thus a persistent immutable data structure. An eclass is represented by the id of its latest version (highest id), whereas the traditional union-find approach tends to push ids downward to the lowest id.

This data structure has some really compelling properties:

The tradeoff is that it's not at all a full equality saturation (see above re: no recanonicalization). But I think that's OK for our use-case: we mainly (i) introduce new expressions then (ii) apply rules that introduce alternate forms of those expressions, and union them together. In this case the ægraph is pretty ideal: it's like a streaming journal of nodes and we add a new node from the build, then add a few more nodes that are equivalent to it and union them in-place, then keep going, referring to the just-built expression by the latest id. So the ægraph is sort of a "multi-version IR" where every value can have multiple representations, combined with GVN.

(The case it misses that a traditional egraph with parent pointers, recanonicalization, and the rebuild step would get is where we introduce some node A that gets a new eclass, a rewrite rule adds a node B and unions it with A, and B happens to dedup to an already-existing eclass. The new ID for A will point to a union node that joins A and B's existing eclass, but other users of B will not see A because it's "newer". This seems to be of limited use for our specific rules because our rewrite rules tend to go from more expensive/complex expressions to simpler ones, so A is likely not better than B. That's not always the case and I'm sure there are cases where a full egraph does do better. But the multiple-version-representation and ability of rules to match on all versions seems like the key thing that we want from the data structure.)

Anyway, I'll write this up more properly but that's where I'm at now: a prototype that works and is fast. I'll add a bit more to it (integration with alias analysis! and other rules I haven't added yet) and then start to look at generated code quality too.

view this post on Zulip RFC notifications bot (Aug 03 2022 at 18:42):

cfallin edited a comment on issue #27:

As an update here, on the running prototype/experimental branch here, I've gotten overhead of Cranelift-with-egraphs (including elaboration that subsumes GVN and LICM, and rewrite rules using ISLE that encompass basic const propagation and algebraic simplifications) vs Cranelift-with-handwritten-opts down to ~1% slower on parallel compile time (and actually slightly faster in terms of total CPU time across cores):

cfallin@min:~/work/wasmtime % hyperfine --show-output --  "target/release/wasmtime compile --cranelift-set use_egraphs=true ../wasm-tests/spidermonkey.wasm"
Benchmark 1: target/release/wasmtime compile --cranelift-set use_egraphs=true ../wasm-tests/spidermonkey.wasm
  Time (mean ± σ):      1.693 s ±  0.098 s    [User: 10.747 s, System: 0.234 s]
  Range (min  max):    1.621 s   1.951 s    10 runs

cfallin@min:~/work/wasmtime % hyperfine --show-output --  "target/release/wasmtime compile --cranelift-set use_egraphs=false ../wasm-tests/spidermonkey.wasm"
Benchmark 1: target/release/wasmtime compile --cranelift-set use_egraphs=false ../wasm-tests/spidermonkey.wasm
  Time (mean ± σ):      1.671 s ±  0.035 s    [User: 10.888 s, System: 0.236 s]
  Range (min  max):    1.606 s   1.707 s    10 runs

(That's measuring on an M1 laptop with SpiderMonkey.wasm, a large Wasm module testcase.)

This is using a new acyclic-egraph approach that I'm calling the "aegraph" (or ægraph for those who love digraphs). I got the conventional egraph rewriting to work last week, and measured it and found compilation time was inflating significantly once rewrites occur -- something like 5x. I then had a series of realizations that the way that we're using egraphs here is more limited than perhaps a conventional full-equality-saturation approach, and there are simplifications we could make.

Basically an ægraph contains eclasses as nodes where each node contains an enode, an enode and a parent-eclass pointer, or two parent-eclass pointers. The union operation joins two eclasses together only by adding nodes (or as an optimization, replacing an enode node with an enode-and-parent node). An eclass node can refer only to earlier eclasses. We maintain the acyclicity invariant when adding enodes and unioning eclasses. Rule application occurs immediately when a new node is added. In a sense, eclasses are split into "levels" and a given level of an eclass refers only to the levels of argument eclasses that existed when it was created. It's thus a persistent immutable data structure. An eclass is represented by the id of its latest version (highest id), whereas the traditional union-find approach tends to push ids downward to the lowest id.

This data structure has some really compelling properties:

The tradeoff is that it's not at all a full equality saturation (see above re: no recanonicalization). But I think that's OK for our use-case: we mainly (i) introduce new expressions then (ii) apply rules that introduce alternate forms of those expressions, and union them together. In this case the ægraph is pretty ideal: it's like a streaming journal of nodes and we add a new node from the build, then add a few more nodes that are equivalent to it and union them in-place, then keep going, referring to the just-built expression by the latest id. So the ægraph is sort of a "multi-version IR" where every value can have multiple representations, combined with GVN.

(The case it misses that a traditional egraph with parent pointers, recanonicalization, and the rebuild step would get is where we introduce some node A that gets a new eclass, a rewrite rule adds a node B and unions it with A, and B happens to dedup to an already-existing eclass. The new ID for A will point to a union node that joins A and B's existing eclass, but other users of B will not see A because it's "newer". This seems to be of limited use for our specific rules because our rewrite rules tend to go from more expensive/complex expressions to simpler ones, so A is likely not better than B. That's not always the case and I'm sure there are cases where a full egraph does do better. But the multiple-version-representation and ability of rules to match on all versions seems like the key thing that we want from the data structure.)

Anyway, I'll write this up more properly but that's where I'm at now: a prototype that works and is fast. I'll add a bit more to it (integration with alias analysis! and other rules I haven't added yet) and then start to look at generated code quality too.

view this post on Zulip RFC notifications bot (Aug 03 2022 at 19:26):

RDambrosio016 commented on issue #27:

Very interesting! im curious to see if that specific limitation will end up limiting any optimization potential in the future. It may be worth it to try running generated code benchmarks both without rebuilding, and with. Although it's definitely possible that associativity rules will hide that limitation. Nevertheless, the performance on something as large as the spidermonkey wasm file is impressive! Did you happen to measure how much memory the e-graphs/program overall took?

Excited to try out this approach, i was thinking about perhaps working on algebraic identities for mul/div/rem mostly based on this paper. Would you accept a PR to your experimental branch to add such rules?

view this post on Zulip RFC notifications bot (Aug 03 2022 at 23:01):

cfallin commented on issue #27:

Sure; more opts are definitely welcome in a PR, thanks!

I haven't done any more detailed measurements; I'm polishing things wrt integrations now and once I have it in a state where I'm happy, I'll do a full comparison and set of measurements (with Sightglass) on compile- and runtime performance.

view this post on Zulip RFC notifications bot (Aug 10 2022 at 01:10):

cfallin commented on issue #27:

Hi all -- I have some updates. It's kind of a mixed bag -- tl;dr, code quality matches existing Cranelift (all optimizations have been reproduced), compilation speed is ~1-5% slower. Theoretical benefits of everything-in-one-fixpoint (GVN and alias analysis and LICM and algebraic simplifications all cooperating) work in filetests, but don't seem to do much to Wasm-based benchmarks in Sightglass.

So the current status of the WIP branch is:

All this works -- and IMHO the data structures and algorithms are pretty beautiful actually! -- but it's not any faster. It is nicer to update with more rewrite rules -- see cprop.isle and algebraic.isle -- and if it were completely at parity performance-wise, this would be an interesting benefit still IMHO. But there is still a small compile-time penalty, around 5% on average, with everything above incorporated.

I have some more thoughts in my running TODO list but most of the remaining ones are for compile speed, not code quality (runtime speed).

Given all that, I think I'm running up against a wall that's actually mostly supported by some other limits:

The first is just a fact of our benchmarking conditions (but we're oriented this way because we care very much about Wasm performance -- we can't just choose to focus on poorly-optimized CLIF instead and say "see look opts work!"). But the second is where a lot of my thoughts are going lately. In other words, from first principles, if we start with highly-optimized Wasm bytecode, where do we expect inefficiencies and redundancies to creep in? Probably one of the three areas:

My forehead is already dented from banging against item 3 (regalloc quality) for a long time, and item 1 (Wasm to CLIF) is largely addressed already by designing CLIF semantics to be close to those of Wasm; but item 2 (lowering) presents unharvested opportunities IMHO.

In looking at disassemblies of "hot blocks" when profiling runtime I see a lot of not-great code, and it comes in several varieties: redundant flags computations (cmp / setnp / cmp / sete for float comparisons is a common one); redundant trap conditions that cprop could optimize away, or GVN could combine between multiple ops; address-mode junk produced by our "match up the operand tree and then regenerate an add tree for the parts we can't incorporate" logic in both x64 and aarch64; a bunch of "spill x+1, x+3, x+10, x+15 to separate spillslots for use later" (hence my attempt at remat above); and others. A lot of this comes from the inability to optimize code once we lower a CLIF op into multiple MachInsts, or one pseudo-MachInst.

(Those who have been around a while might be screaming: legalization and post-opt! Yes, you're right. I think we threw out some of the value of those in exchange for the speed and simplicity of the single-pass lowering, and the real value that we do get from a two-level IR. But we lost something too.)

So: I'm taking a little time to think about what might make sense as a way to optimize VCode. I don't think this means throwing out the aegraph and ISLE-based mid-end framework; I think instead there may be a way to write some mid-end legalizations that do a lot of expansion that we currently do in lowering, and then let lowering be simpler; or in the extreme, even have a Node::MachInst { ... } in the aegraph, though that has potential efficiency issues. More thought needed.

Anyway, I'd love to hear more thoughts on this, and we can talk about it more in the next Cranelift meeting, but there's my report on what I've learned in the last few weeks :-)

view this post on Zulip RFC notifications bot (Aug 10 2022 at 01:55):

cfallin commented on issue #27:

... and in a typical show of my luck, I discover just after writing the above that my remat was doing something wrong (LICM was overriding its decision). I now have the result:

More measurements later but IMHO if I can get the compile time overhead a bit lower this should be compelling on its own (then the post-lowering optimization ideas above become the next step).

view this post on Zulip RFC notifications bot (Aug 11 2022 at 08:05):

cfallin commented on issue #27:

I did some optimization on compiler runtime tonight -- both low-level things like inline pragmas, refactoring some data structures to allow holding borrows rather than lookups, using dense rather than sparse maps in places, preallocating with correct sizes, and similar, and high-level things like algorithmic improvements to memoize more, trim the extraction search space (running upper bound and early exit/pruning), "subsume" all nodes in an eclass with one "final" version (for e.g. cprop), and the like.

In the end I got SpiderMonkey.wasm down to a ~1% compile-time regression (relative to baseline Cranelift), and bz2.wasm to a... 15% compile-time speedup (yeah, I know; it's wild; I was focusing entirely on SpiderMonkey and then just checked bz2 now, and would have fallen off my chair if gravity were slightly stronger):

[cfallin@xap]~/work/wasmtime% hyperfine  -L egraphs false,true 'target/release/wasmtime compile --cranelift-set use_egraphs={egraphs} ../wasm-tests/spidermonkey.wasm'
Benchmark 1: target/release/wasmtime compile --cranelift-set use_egraphs=false ../wasm-tests/spidermonkey.wasm
  Time (mean ± σ):      1.106 s ±  0.051 s    [User: 15.034 s, System: 0.505 s]
  Range (min  max):    1.043 s   1.201 s    10 runs

Benchmark 2: target/release/wasmtime compile --cranelift-set use_egraphs=true ../wasm-tests/spidermonkey.wasm
  Time (mean ± σ):      1.115 s ±  0.033 s    [User: 15.915 s, System: 0.510 s]
  Range (min  max):    1.077 s   1.162 s    10 runs

Summary
  'target/release/wasmtime compile --cranelift-set use_egraphs=false ../wasm-tests/spidermonkey.wasm' ran
    1.01 ± 0.05 times faster than 'target/release/wasmtime compile --cranelift-set use_egraphs=true ../wasm-tests/spidermonkey.wasm'
[cfallin@xap]~/work/wasmtime% hyperfine -r 200  -L egraphs false,true 'target/release/wasmtime compile --cranelift-set use_egraphs={egraphs} ../wasm-tests/bz2.wasm'
Benchmark 1: target/release/wasmtime compile --cranelift-set use_egraphs=false ../wasm-tests/bz2.wasm
  Time (mean ± σ):      82.1 ms ±  10.8 ms    [User: 203.2 ms, System: 58.4 ms]
  Range (min  max):    71.4 ms  106.5 ms    200 runs

Benchmark 2: target/release/wasmtime compile --cranelift-set use_egraphs=true ../wasm-tests/bz2.wasm
  Time (mean ± σ):      71.4 ms ±  10.2 ms    [User: 191.0 ms, System: 58.7 ms]
  Range (min  max):    59.6 ms   96.3 ms    200 runs

Summary
  'target/release/wasmtime compile --cranelift-set use_egraphs=true ../wasm-tests/bz2.wasm' ran
    1.15 ± 0.22 times faster than 'target/release/wasmtime compile --cranelift-set use_egraphs=false ../wasm-tests/bz2.wasm'

And for completeness, yesterday's runtime numbers were on my M1 laptop, while today's work is on a Linux/x86-64 desktop (Ryzen 3900X); on this system, SpiderMonkey.wasm sees a 13% speedup, and bz2.wasm a 3% speedup:

[cfallin@xap]~/work/wasmtime% target/release/wasmtime compile --cranelift-set use_egraphs=false ../wasm-tests/spidermonkey.wasm -o base.cwasm
[cfallin@xap]~/work/wasmtime% target/release/wasmtime compile --cranelift-set use_egraphs=true ../wasm-tests/spidermonkey.wasm -o egraphs.cwasm
[cfallin@xap]~/work/wasmtime% hyperfine -L file base,egraphs 'target/release/wasmtime run --allow-precompiled --dir=. {file}.cwasm ./fib.js'
Benchmark 1: target/release/wasmtime run --allow-precompiled --dir=. base.cwasm ./fib.js
  Time (mean ± σ):      1.345 s ±  0.023 s    [User: 1.338 s, System: 0.005 s]
  Range (min  max):    1.317 s   1.384 s    10 runs

Benchmark 2: target/release/wasmtime run --allow-precompiled --dir=. egraphs.cwasm ./fib.js
  Time (mean ± σ):      1.195 s ±  0.010 s    [User: 1.189 s, System: 0.003 s]
  Range (min  max):    1.181 s   1.205 s    10 runs

Summary
  'target/release/wasmtime run --allow-precompiled --dir=. egraphs.cwasm ./fib.js' ran
    1.13 ± 0.02 times faster than 'target/release/wasmtime run --allow-precompiled --dir=. base.cwasm ./fib.js'
[cfallin@xap]~/work/wasmtime% target/release/wasmtime compile --cranelift-set use_egraphs=false ../wasm-tests/bz2.wasm -o base.cwasm
[cfallin@xap]~/work/wasmtime% target/release/wasmtime compile --cranelift-set use_egraphs=true ../wasm-tests/bz2.wasm -o egraphs.cwasm
[cfallin@xap]~/work/wasmtime% hyperfine -L file base,egraphs 'target/release/wasmtime run --allow-precompiled {file}.cwasm'
Benchmark 1: target/release/wasmtime run --allow-precompiled base.cwasm
  Time (mean ± σ):     428.3 ms ±   9.7 ms    [User: 425.3 ms, System: 1.7 ms]
  Range (min  max):   416.6 ms  441.6 ms    10 runs

Benchmark 2: target/release/wasmtime run --allow-precompiled egraphs.cwasm
  Time (mean ± σ):     414.5 ms ±  12.8 ms    [User: 411.5 ms, System: 1.9 ms]
  Range (min  max):   403.1 ms  440.7 ms    10 runs

Summary
  'target/release/wasmtime run --allow-precompiled egraphs.cwasm' ran
    1.03 ± 0.04 times faster than 'target/release/wasmtime run --allow-precompiled base.cwasm'

I think we're at the point where we can discuss this in the next Cranelift meeting, then hopefully, if consensus exists, I'll start cleaning this up and preparing it for merging (in small, more easily-digestible pieces!).

view this post on Zulip RFC notifications bot (Aug 11 2022 at 17:07):

mwillsey commented on issue #27:

@cfallin Amazing work! These are killer results! I think specializing to the ægraph (great name) is definitely the right decision for this use case; the trade-offs seem to be worth it. Although, it does make me wonder what it would take to make "normal" equality saturation usable in this setting. Exciting future work!

view this post on Zulip RFC notifications bot (Aug 11 2022 at 17:50):

oflatt commented on issue #27:

This is so cool!
Would be really interested to see how much better results are with different tweaks. Does full egraph-style matching do much good? I'm thinking about the case where an enode creates a new match in a parent eclass.

view this post on Zulip RFC notifications bot (Aug 11 2022 at 18:09):

cfallin commented on issue #27:

Thanks @mwillsey! I'm definitely curious what other design points may be possible here. I suspect that benefits of more strongly normalizing approaches (full recanonicalize/rebuild as in a traditional egraph or egg) will become more apparent the more we add interesting rules. Right now I have rules I wrote with my traditional-compiler-author brain, so mostly constant prop and the like where the right-hand side is unambiguously better. (There are a few cases of reassociation where I let the extractor do the right thing though, and also scoped elaboration can make things interesting when another expression forces evaluation of some otherwise-expensive node, shorting its cost to 0 for subsequent uses and altering the best choice. (This is why extraction is done lazily and memoized.) In those cases I suspect having multiple versions around actually helps.)

There are definitely some experiments I'd like to run, if I were to put on my old grad-student hat (it has collected some dust!) -- among them, putting together a full rebuild step as @oflatt suggests, and maybe also a "rewrites always subsume" mode (ie no multi-version capability) with the rules. My day-job means I need to focus on getting this "makes compiler faster and better" magic into production shape first, of course, but I do want to experiment some as time allows :-)

view this post on Zulip RFC notifications bot (Aug 11 2022 at 19:33):

remysucre commented on issue #27:

Hello! Yet another egg contributor here. @mwillsey and I had a great chat with Nico Bruno, who was a main engineer of SQLServer. It turns out one main difference between equality saturation and cascade-style DB query optimization is that the latter maintains the acyclicity of the e-graph, just like aegraph. Acyclicity enables many powerful optimizations like branch-and-bound, and usually runs very fast. I highly recommend checking out some of the literature there, as well as some modern implementations like cockroach and orca.

view this post on Zulip RFC notifications bot (Aug 11 2022 at 19:40):

cfallin commented on issue #27:

@remysucre thanks for the pointers! (As with many other topics in systems work, the database folks got there first... why am I not surprised?!). I'm happy to hear there are similar approaches in use and I'll read up more.

view this post on Zulip RFC notifications bot (Aug 15 2022 at 21:48):

cfallin commented on issue #27:

Hi all -- I've updated the RFC to refer to ægraphs and include the numbers above. A full writeup of how ægraphs work is IMHO out-of-scope here but I will include a bunch of detail somewhere in cranelift/codegen/docs/ if this merges. The important decision points here, IMHO, are:

I'd like to kick off a round of discussion and (hopefully!) consensus-building on that. Happy to talk more about this in the next Cranelift meeting but let's see what folks think here first. Thanks!

view this post on Zulip RFC notifications bot (Aug 17 2022 at 14:59):

cfallin commented on issue #27:

Thanks @avanhatt! Updated based on feedback.

view this post on Zulip RFC notifications bot (Aug 23 2022 at 16:49):

cfallin commented on issue #27:

Hi all -- after discussing this briefly yesterday it sounds like no one has any major objections; so given that, and the positive results we've found, I would like to make a

Motion to finalize with a disposition to merge

I've updated the list below to contain all stakeholders who have committed to Cranelift within the past three months, or participated in RFC discussions in the same timeframe.

Stakeholders sign-off

Arm

Cornell

Embark Studios

Fastly

IBM

Intel

Unaffiliated

view this post on Zulip RFC notifications bot (Aug 23 2022 at 16:49):

cfallin edited a comment on issue #27:

Hi all -- after discussing this briefly yesterday it sounds like no one has any major objections; so given that, and the positive results we've found, I would like to make a

Motion to finalize with a disposition to merge

I've updated the list below to contain all stakeholders who have committed to Cranelift within the past three months, or participated in RFC discussions in the same timeframe.

Stakeholders sign-off

Arm

Cornell

Embark Studios

Fastly

IBM

Intel

Unaffiliated

view this post on Zulip RFC notifications bot (Aug 23 2022 at 17:25):

cfallin edited a comment on issue #27:

Hi all -- after discussing this briefly yesterday it sounds like no one has any major objections; so given that, and the positive results we've found, I would like to make a

Motion to finalize with a disposition to merge

I've updated the list below to contain all stakeholders who have committed to Cranelift within the past three months, or participated in RFC discussions in the same timeframe.

Stakeholders sign-off

Arm

Cornell

Embark Studios

Fastly

IBM

Intel

Unaffiliated

view this post on Zulip RFC notifications bot (Aug 23 2022 at 17:45):

fitzgen edited a comment on issue #27:

Hi all -- after discussing this briefly yesterday it sounds like no one has any major objections; so given that, and the positive results we've found, I would like to make a

Motion to finalize with a disposition to merge

I've updated the list below to contain all stakeholders who have committed to Cranelift within the past three months, or participated in RFC discussions in the same timeframe.

Stakeholders sign-off

Arm

Cornell

Embark Studios

Fastly

IBM

Intel

Unaffiliated

view this post on Zulip RFC notifications bot (Aug 24 2022 at 17:31):

cfallin edited a comment on issue #27:

Hi all -- after discussing this briefly yesterday it sounds like no one has any major objections; so given that, and the positive results we've found, I would like to make a

Motion to finalize with a disposition to merge

I've updated the list below to contain all stakeholders who have committed to Cranelift within the past three months, or participated in RFC discussions in the same timeframe.

Stakeholders sign-off

Arm

Cornell

Embark Studios

Fastly

IBM

Intel

Unaffiliated

view this post on Zulip RFC notifications bot (Aug 24 2022 at 17:33):

cfallin commented on issue #27:

Given approvals from multiple stakeholder groups, we now enter the 10-day final comment period.

If no objections are registered by 2022-09-03, we will merge this RFC. Thanks all for the discussion!

view this post on Zulip RFC notifications bot (Aug 24 2022 at 18:11):

cfallin edited a comment on issue #27:

Hi all -- after discussing this briefly yesterday it sounds like no one has any major objections; so given that, and the positive results we've found, I would like to make a

Motion to finalize with a disposition to merge

I've updated the list below to contain all stakeholders who have committed to Cranelift within the past three months, or participated in RFC discussions in the same timeframe.

Stakeholders sign-off

Arm

Cornell

Embark Studios

Fastly

IBM

Intel

Unaffiliated

view this post on Zulip RFC notifications bot (Aug 24 2022 at 19:17):

jameysharp edited a comment on issue #27:

Hi all -- after discussing this briefly yesterday it sounds like no one has any major objections; so given that, and the positive results we've found, I would like to make a

Motion to finalize with a disposition to merge

I've updated the list below to contain all stakeholders who have committed to Cranelift within the past three months, or participated in RFC discussions in the same timeframe.

Stakeholders sign-off

Arm

Cornell

Embark Studios

Fastly

IBM

Intel

Unaffiliated

view this post on Zulip RFC notifications bot (Aug 25 2022 at 02:13):

elliottt edited a comment on issue #27:

Hi all -- after discussing this briefly yesterday it sounds like no one has any major objections; so given that, and the positive results we've found, I would like to make a

Motion to finalize with a disposition to merge

I've updated the list below to contain all stakeholders who have committed to Cranelift within the past three months, or participated in RFC discussions in the same timeframe.

Stakeholders sign-off

Arm

Cornell

Embark Studios

Fastly

IBM

Intel

Unaffiliated

view this post on Zulip RFC notifications bot (Aug 30 2022 at 12:09):

akirilov-arm edited a comment on issue #27:

Hi all -- after discussing this briefly yesterday it sounds like no one has any major objections; so given that, and the positive results we've found, I would like to make a

Motion to finalize with a disposition to merge

I've updated the list below to contain all stakeholders who have committed to Cranelift within the past three months, or participated in RFC discussions in the same timeframe.

Stakeholders sign-off

Arm

Cornell

Embark Studios

Fastly

IBM

Intel

Unaffiliated

view this post on Zulip RFC notifications bot (Aug 30 2022 at 14:08):

alexcrichton edited a comment on issue #27:

Hi all -- after discussing this briefly yesterday it sounds like no one has any major objections; so given that, and the positive results we've found, I would like to make a

Motion to finalize with a disposition to merge

I've updated the list below to contain all stakeholders who have committed to Cranelift within the past three months, or participated in RFC discussions in the same timeframe.

Stakeholders sign-off

Arm

Cornell

Embark Studios

Fastly

IBM

Intel

Unaffiliated

view this post on Zulip RFC notifications bot (Sep 07 2022 at 15:58):

cfallin commented on issue #27:

The FCP has now (more than) elapsed; with no objections, I will now merge this RFC! Thanks all for the spirited discussion and ideas. Hopefully soon now we can get a proper implementation into Cranelift; my next task is to clean up the prototype and start to submit pieces for review.


Last updated: Jan 24 2025 at 00:11 UTC