myunbin opened issue #13204:
The original problem was shown here: #13068.
Problem
After testing the fuzzer-generated input
foo.clif, we identified three rule groups introduced by #12926 (commit1578325) as candidates:
- A:
((x + y) - (x + z)) -> (y - z)- B:
((x - z) - (y - z)) -> (x - y)- C:
((x - y) - (x - z)) -> (z - y)Rule B and Rule C are referenced by nickname below.
The regression disappears as soon as both Rule B and Rule C are removed from
1578325.<details>
<summary>benchmark details</summary>
- Baseline commit:
1578325- Measurement command:
CRANELIFT_FILETESTS_THREADS=1 /usr/bin/time -p <clif-util> test <foo.clif>- Runs per scenario:
5
scenario removed groups median real (s) mean real (s) min max stdev stalled slow vs baseline minus_group_ab A + B 31.11 31.196 30.96 31.52 0.228 5 5 1.03x minus_group_ac A + C 31.21 31.230 31.02 31.63 0.218 5 5 1.03x minus_group_bc B + C 0.17 0.164 0.14 0.18 0.016 0 0 0.01x minus_group_abc A + B + C 0.15 0.156 0.14 0.18 0.016 0 0 0.00x Leaving either Rule B or Rule C active still blows up, so each of them is independently sufficient to trigger the regression. Group A has no measurable impact for this testcase.
</details>
Observation
foo.clifsummary;; --- foo.clif:124-136 (ishl chain sharing v214) --- v214 = ishl v211, v213 v215 = ishl v0, v214 v216 = ishl v215, v214 v217 = ishl v216, v214 v218 = ishl v217, v214 v219 = ishl v218, v214 v220 = ishl v219, v214 v221 = ishl v220, v214 v222 = ishl v221, v214 v223 = ishl v222, v214 v224 = ishl v223, v214 v225 = ishl v224, v214 v226 = ishl v225, v214 ;; --- foo.clif:155 (selects.isle:4 aliases v242 to v226) --- v242 = select v4, v226, v226 ;; --- foo.clif:186-242 (48-step isub(v, v) chain on i64x2) --- v247 = isub v242, v242 v248 = isub v247, v247 v249 = isub v248, v248 v250 = isub v249, v249 ... 43 more steps ... v293 = isub v292, v292 v294 = isub v293, v293The fuzzer combined two structural ingredients:
- an
ishlchain sharing a common shift amount (v214),- a long
isub.i64x2(v, v)chain where vector typing blocks the scalar-only short-circuit prior to #13063.Additional observations:
The regression was resolved after vector support in #13063 (
f2807a1), which makes this rule effectively work on vectors too:
isle ;; x - x --> 0 (rule (simplify (isub (ty_int ty) x x)) (subsume (iconst_u ty 0)))Once
isub.i64x2 v vcollapses to0at the root, Rule B/C lose their fuel.Under
1578325(Rule B/C active, pre-#13063 behavior), this rule fires ~3,000,000 times on a single rule:
isle ;; (x<<z) - (y<<z) --> (x-y)<<z (rule (simplify (isub ty (ishl ty x z) (ishl ty y z))) (ishl ty (isub ty x y) z))
Let us call Rule D.Causality
On the chain
v_{k+1} = isub(v_k, v_k)withv_k = isub(v_{k-1}, v_{k-1}), eachv_{k+1}unfolds to:
isub(isub(v_{k-1}, v_{k-1}), isub(v_{k-1}, v_{k-1}))That shape matches Rule B and Rule C, so each fire can union
v_k's leader intov_{k+1}'s e-class. Equivalently, each chain step absorbs its predecessor's e-class.Example
Take
v252 = v251 - v251after prior chain compression has already begun.Each
≡means "same e-class":v252 ≡ v251 - v251 ≡ (v250 - v250) - (v250 - v250) ; unfold v251 ≡ v250 - v250 ; Rule B (or C) ≡ v251 ; equivalent value already existsResult:
v251is added intov252's e-class. Repeating this at earlier steps pulls in more predecessors transitively.Because those absorbed e-classes already contain
ishl-form alternatives, Rule D can also fire on the same top-level:v252 ≡ v251 - v251 ≡ (A << v214) - (A << v214) ; pick ishl-form alternatives ≡ (A - A) << v214 ; Rule D ≡ ((v225 - v225) - (v225 - v225)) << v214 ≡ (v225 - v225) << v214 ; Rule B (or C) ≡ A << v214 ...This is the key feedback loop:
- Rule B/C at step
k+1absorbs stepk's e-class.- That gives Rule D more valid bindings at step
k+1.- Rule D creates more
ishl-form enodes, which are then absorbed at stepk+2by Rule B/C.So Rule B/C do not need many fires themselves; they expand downstream match pools for Rule D.
Without Rule B/C
Rule D can still fire, butv_{k+1} = isub(v_k, v_k)mostly stays single-path: stepk+1only sees alternatives created insidev_kitself, so each step quickly hits a local limit. Inductively,v_khas same single choice.With Rule B/C
Stepk+1can absorb stepk's e-class, sov_{k+1}carries not only its own forms but also alternatives already present inv_k(and then fromv_{k-1},v_{k-2}, ... through repeated absorption). That is why later nodes inherit earlier-node choices and must search a much larger space.Limits for e-graphs
REWRITE_LIMIT,ECLASS_ENODE_LIMIT, andMATCHES_LIMITare local caps (per recursion stack / per e-class / per ISLE call), not global per-function budgets.
ECLASS_ENODE_LIMITblocks additional unions for one e-class, but does not impose a global function-wide cap.MATCHES_LIMITtruncates returned values per call, but many calls still occur across 48 independent top-level chain nodes.We also noticed a off-by-one:
REWRITE_LIMIT = 5is checked asif rewrite_depth > 5before the increment and its initial value is0, so rewrites actually run up to6. If the intent is to cap at 5, the check should be>= 5.With no function-global or per-rule budget, each top-level can replay the same cascade and totals accumulate.
Per-top-level Rule D fire counts (first few chain steps):
top-level fires v247 6 v248 6 v249 44 v251 138 This growth pattern aligns with the observed 3.3M total fires.
Trace evidence
Trace evidence that
v247's leaderv830is absorbed into v248 and v249 e-classes (combined.log, trace-log build):1123: optimizing inst inst132 orig result v247 gave v830 1269: -> returned from ISLE: v248 -> [v830, v852] 1904: -> returned from ISLE: v249 -> [v830, v852, v932, v939, v944]
v830in both later return lists is direct evidence that later chain steps absorb prior chain e-classes.Open question
My current interpretation is that this blow-up is not only a bug in one rewrite rule, but may be an example of a more fundamental limitation of e-graph style rewriting: even if saturation would eventually be reached, the search space can become too large for compile-time use before that happens.
Rule B/C makes later
v_{k+1}nodes inherit alternatives from earlierv_knodes, and Rule D then explores those accumulated alternatives. The current limits bound local recursion, per-e-class growth, and per-call result count, but not the total number of rule applications across the function.Is this the right way to interpret the issue: an expected theoretical limitation of e-graph-based rewriting as applied in Cranelift, rather than only a bug in this specific rewrite rule?
myunbin added the bug label to Issue #13204.
myunbin added the cranelift label to Issue #13204.
cfallin commented on issue #13204:
Thanks, @myunbin, for the writeup.
A few high-level questions, as this writeup uses terminology I'm not familiar with:
- "Take v252 = v251 - v251 after prior chain compression has already begun." --> What does "chain compression" mean? This is not a term I'm familiar with nor one that we use in the aegraph implementation.
- You keep writing "absorb's X's e-class". This seems to suggest more of a classical or egg-style e-graph, where e-classes are merged and uses are rewritten (and a union-find redirects uses to the merged e-class). As you probably know, the aegraph implementation uses union nodes so we don't "absorb another e-class"; we define a new value that represents a merged e-class.
Then a meta-question: I haven't worked through all of the details above yet, but this an extremely verbose writeup, with unfamiliar terminology and some aspects above that sound like misunderstandings (or, at least, an odd way of describing things). I have to ask bluntly: did you use an LLM to produce this analysis? If so, could you please work through it as a human and try to condense it to something that shows understanding of our local terminology and addresses the above questions?
Thanks!
cfallin edited a comment on issue #13204:
Thanks, @myunbin, for the writeup.
A few high-level questions, as this writeup uses terminology I'm not familiar with:
- "Take v252 = v251 - v251 after prior chain compression has already begun." --> What does "chain compression" mean? This is not a term I'm familiar with nor one that we use in the aegraph implementation.
- You keep writing "absorbs X's e-class". This seems to suggest more of a classical or egg-style e-graph, where e-classes are merged and uses are rewritten (and a union-find redirects uses to the merged e-class). As you probably know, the aegraph implementation uses union nodes so we don't "absorb another e-class"; we define a new value that represents a merged e-class.
Then a meta-question: I haven't worked through all of the details above yet, but this an extremely verbose writeup, with unfamiliar terminology and some aspects above that sound like misunderstandings (or, at least, an odd way of describing things). I have to ask bluntly: did you use an LLM to produce this analysis? If so, could you please work through it as a human and try to condense it to something that shows understanding of our local terminology and addresses the above questions?
Thanks!
myunbin edited issue #13204:
The original problem was shown here: #13068.
Problem
After testing the fuzzer-generated input
foo.clif, we identified three rule groups introduced by #12926 (commit1578325) as candidates:
- rule A:
((x + y) - (x + z)) -> (y - z)- rule B:
((x - z) - (y - z)) -> (x - y)- rule C:
((x - y) - (x - z)) -> (z - y)The regression disappears as soon as both Rule B and Rule C are removed from
1578325.<details>
<summary>benchmark details</summary>
- Baseline commit:
1578325- Measurement command:
CRANELIFT_FILETESTS_THREADS=1 /usr/bin/time -p <clif-util> test <foo.clif>- Runs per scenario:
5
scenario removed groups median real (s) mean real (s) min max stdev stalled slow vs baseline minus_group_ab A + B 31.11 31.196 30.96 31.52 0.228 5 5 1.03x minus_group_ac A + C 31.21 31.230 31.02 31.63 0.218 5 5 1.03x minus_group_bc B + C 0.17 0.164 0.14 0.18 0.016 0 0 0.01x minus_group_abc A + B + C 0.15 0.156 0.14 0.18 0.016 0 0 0.00x Leaving either Rule B or Rule C active still blows up, so each of them is independently sufficient to trigger the regression. Group A has no measurable impact for this testcase.
</details>
Observation
foo.clifsummary;; --- foo.clif:124-136 (repeated ishl values sharing v214) --- v214 = ishl v211, v213 v215 = ishl v0, v214 v216 = ishl v215, v214 v217 = ishl v216, v214 v218 = ishl v217, v214 v219 = ishl v218, v214 v220 = ishl v219, v214 v221 = ishl v220, v214 v222 = ishl v221, v214 v223 = ishl v222, v214 v224 = ishl v223, v214 v225 = ishl v224, v214 v226 = ishl v225, v214 ;; --- foo.clif:155 (selects.isle:4 aliases v242 to v226) --- v242 = select v4, v226, v226 ;; --- foo.clif:186-242 (48 repeated isub(v, v) values on i64x2) --- v247 = isub v242, v242 v248 = isub v247, v247 v249 = isub v248, v248 v250 = isub v249, v249 ... 43 more steps ... v293 = isub v292, v292 v294 = isub v293, v293The fuzzer has two main structure:
- Repeated
ishlvalues sharing a common shift amount (v214),- Repeated
isub.i64x2(v, v)(48 times).Additional observations:
The regression was resolved after vector support in #13063 (
f2807a1):
;; x - x --> 0 (rule (simplify (isub (ty_int ty) x x)) (subsume (iconst_u ty 0)))
isub.i64x2 v vcollapses to0at the root.Under
1578325(Rule B/C active, pre-#13063 behavior), this rule fires ~3,000,000 times on a single rule:
;; (x<<z) - (y<<z) --> (x-y)<<z (rule (simplify (isub ty (ishl ty x z) (ishl ty y z))) (ishl ty (isub ty x y) z))
Let us call Rule D.Causality
From
v_{k+1} = isub(v_k, v_k), wherev_k = isub(v_{k-1}, v_{k-1}), one level of unfolding gives:v_{k+1} = isub(v_k, v_k) = isub(isub(v_{k-1}, v_{k-1}), isub(v_{k-1}, v_{k-1})) = isub(v_{k-1}, v_{k-1}) ; Rule BResult is the earlier expression,
v_k. This means that rules can return the earlier value as an equivalent value while optimizingv_{k+1}. This equivalence is represented by a Union value, unless the rewrite subsumes the original or hits the eclass-size limit.For example:
v252 = isub(v251, v251) v251 = isub(v250, v250) -- v252 = isub(v251, v251) = isub(isub(v250, v250), isub(v250, v250)) = isub(v250, v250) ; Rule B = v251Rule B/C can make a later repeated
isub(v, v)value equivalent to an earlier one.(rule (simplify (isub ty (ishl ty x z) (ishl ty y z))) (ishl ty (isub ty x y) z))Rule D matches when both operands can be viewed as
ishlexpressions with the same shift amount. For example,v247 = isub(v242, v242) = isub(v226, v226) = isub(ishl(v225, v214), ishl(v225, v214)) = ishl(isub(v225, v225), v214) ; Rule DB/C add the extra effect that a later value can become equivalent to an earlier repeated
isubvalue:v248 = isub(v247, v247) = isub(isub(v242, v242), isub(v242, v242)) = isub(v242, v242) ; Rule B = v247So later values can see both locally-created forms and forms already associated with earlier values such as
v247.
- Without Rule B/C, Rule D can still fire. But later repeated
isub(v, v)values do not get rewritten back to earlier repeatedisubvalues. Rule D is also bounded by the rewrite-depth limit.- With Rule B/C, later values can also be equivalent to earlier values, so Rule D gets many more valid bindings.
In short:
- Rule B/C can rewrite a later
isub(v, v)to a form already represented by an earlier value.- Rule D then has more equivalent forms to match against.
Limits for egraph rewriting
The existing limits are local:
REWRITE_LIMITbounds recursive optimization.ECLASS_ENODE_LIMITbounds the size of one tree.MATCHES_LIMITtruncates results from one ISLE call.Each of the 48
isub(v, v)values is optimized separately. Rule B can rewrite a later one back to the previousisub(v, v)value, so each local rewrite can match forms recorded for earlier values.Additional off-by-one:
REWRITE_LIMIT = 5is checked before incrementingrewrite_depth(which starts from0), so the maximum observed depth is6.Evidence
Per-top-level Rule D fire counts (first few repeated
isub(v, v)values):
top-level fires v247 6 v248 6 v249 44 v251 138 This growth pattern accumulates to observed 3.3M total fires.
Trace evidence that an earlier optimized value is returned again for later values(trace-log):
1123: optimizing inst inst132 orig result v247 gave v830 1269: -> returned from ISLE: v248 -> [v830, v852] 1904: -> returned from ISLE: v249 -> [v830, v852, v932, v939, v944]
v830appearing in later return lists shows that an earlier optimized value is returned again as an equivalent value for later inputs.Open question
Our interpretation is that this is not just a problem with one rewrite rule, but a compile-time risk in how e-graphs(and Cranelift's aegraph) expose accumulated equivalent alternatives during matching.
Does this interpretation sound reasonable, or would you think about this failure mode differently?
myunbin commented on issue #13204:
Hi @cfallin, first, I apologize for using unfamiliar terminology and vague concepts.
Some of the wording came from terms we use internally in my current project. Since English is not my first language, relying on an LLM for translation also seems to have made parts of the text less precise. I have now tried to make it as concise as possible.
Although I used an LLM while preparing this analysis, it was only for auxiliary tasks such as writing scripts and polishing text. The experiments and root-cause analysis were done by self.
If anything in my interpretation seems wrong, unclear, or incomplete, I would appreciate your feedback.
Thanks!
myunbin commented on issue #13204:
Oh, and regarding the questions:
- By "chain compression", I was referring to the repeated pattern
v_k = isub(v_{k-1}, v_{k-1})in the testcase.- By "absorbs", I meant that the newly created aegraph value can include the result of an earlier value. I have updated the wording to avoid that implication.
Again, sorry for the poor word choices.
Thanks!
myunbin edited a comment on issue #13204:
Hi @cfallin, first, I apologize for using unfamiliar terminology and vague concepts.
Some of the wording came from terms we use internally in my current project. Since English is not my first language, relying on an LLM for translation also seems to have made parts of the text less precise. I have now tried to make it as concise as possible.
Although I used an LLM while preparing this analysis, it was only for auxiliary tasks such as writing scripts and polishing text. The experiments and root-cause analysis were done by self.
I've updated the original writeup. If anything still seems wrong or worth discussing, I would appreciate your feedback.
Thanks!
cfallin commented on issue #13204:
I've read through this again and I think I understand the failure mode better now. Thanks for writing up the example.
My very-simplified view is: the recursive
x_k+1 = x_k - x_k, rewritingx_k+1as an alias ofx_kinstead of0, while valid, creates very large eclasses so the rules that match on these eclasses (such as the shift-rule) blow up their search time because they evaluate every combination of inputs.
subsumedoes its job once we introduce thex - x == 0simplification rule; but separately, this is a new growth dimension for eclasses. If I am reading this right, the chain of subtracts apparently creates an eclass larger than the eclass enode-count limit (ECLASS_ENODE_LIMIT)? Is that what you observed? Naively I would have expected the limit to kick in. I'll poke at this further.In any case, aside from use of
subsumewherever we have a clear "collapsing optimization" that deletes part of the program (e.g.x-x = 0), I wonder if we should implement some sort of fuel throughout the ISLE-generated matching logic. It's always valid not to optimize a given value, so we can quit at any point. That would put a hard bound on this kind of unexpected rule interaction.
Last updated: May 03 2026 at 22:13 UTC