Stream: git-wasmtime

Topic: wasmtime / issue #13204 Cranelift: follow-up analysis of ...


view this post on Zulip Wasmtime GitHub notifications bot (Apr 27 2026 at 04:26):

myunbin opened issue #13204:

The original problem was shown here: #13068.

Problem

After testing the fuzzer-generated input foo.clif, we identified three rule groups introduced by #12926 (commit 1578325) as candidates:

Rule B and Rule C are referenced by nickname below.

The regression disappears as soon as both Rule B and Rule C are removed from 1578325.

<details>
<summary>benchmark details</summary>

scenario removed groups median real (s) mean real (s) min max stdev stalled slow vs baseline
minus_group_ab A + B 31.11 31.196 30.96 31.52 0.228 5 5 1.03x
minus_group_ac A + C 31.21 31.230 31.02 31.63 0.218 5 5 1.03x
minus_group_bc B + C 0.17 0.164 0.14 0.18 0.016 0 0 0.01x
minus_group_abc A + B + C 0.15 0.156 0.14 0.18 0.016 0 0 0.00x

Leaving either Rule B or Rule C active still blows up, so each of them is independently sufficient to trigger the regression. Group A has no measurable impact for this testcase.

</details>

Observation

foo.clif summary

    ;; --- foo.clif:124-136  (ishl chain sharing v214) ---
    v214 = ishl v211, v213
    v215 = ishl v0,   v214
    v216 = ishl v215, v214
    v217 = ishl v216, v214
    v218 = ishl v217, v214
    v219 = ishl v218, v214
    v220 = ishl v219, v214
    v221 = ishl v220, v214
    v222 = ishl v221, v214
    v223 = ishl v222, v214
    v224 = ishl v223, v214
    v225 = ishl v224, v214
    v226 = ishl v225, v214

    ;; --- foo.clif:155  (selects.isle:4 aliases v242 to v226) ---
    v242 = select v4, v226, v226

    ;; --- foo.clif:186-242  (48-step isub(v, v) chain on i64x2) ---
    v247 = isub v242, v242
    v248 = isub v247, v247
    v249 = isub v248, v248
    v250 = isub v249, v249

    ... 43 more steps ...

    v293 = isub v292, v292
    v294 = isub v293, v293

The fuzzer combined two structural ingredients:

  1. an ishl chain sharing a common shift amount (v214),
  2. a long isub.i64x2(v, v) chain where vector typing blocks the scalar-only short-circuit prior to #13063.

Additional observations:

  1. The regression was resolved after vector support in #13063 (f2807a1), which makes this rule effectively work on vectors too:

    isle ;; x - x --> 0 (rule (simplify (isub (ty_int ty) x x)) (subsume (iconst_u ty 0)))

    Once isub.i64x2 v v collapses to 0 at the root, Rule B/C lose their fuel.

  2. Under 1578325 (Rule B/C active, pre-#13063 behavior), this rule fires ~3,000,000 times on a single rule:

    isle ;; (x<<z) - (y<<z) --> (x-y)<<z (rule (simplify (isub ty (ishl ty x z) (ishl ty y z))) (ishl ty (isub ty x y) z))
    Let us call Rule D.

Causality

On the chain v_{k+1} = isub(v_k, v_k) with v_k = isub(v_{k-1}, v_{k-1}), each v_{k+1} unfolds to:

isub(isub(v_{k-1}, v_{k-1}), isub(v_{k-1}, v_{k-1}))

That shape matches Rule B and Rule C, so each fire can union v_k's leader into v_{k+1}'s e-class. Equivalently, each chain step absorbs its predecessor's e-class.

Example

Take v252 = v251 - v251 after prior chain compression has already begun.

Each means "same e-class":

v252 ≡ v251 - v251
     ≡ (v250 - v250) - (v250 - v250)          ; unfold v251
     ≡ v250 - v250                            ; Rule B (or C)
     ≡ v251                                   ; equivalent value already exists

Result: v251 is added into v252's e-class. Repeating this at earlier steps pulls in more predecessors transitively.

Because those absorbed e-classes already contain ishl-form alternatives, Rule D can also fire on the same top-level:

v252 ≡ v251 - v251
     ≡ (A << v214) - (A << v214)               ; pick ishl-form alternatives
     ≡ (A - A) << v214                         ; Rule D
     ≡ ((v225 - v225) - (v225 - v225)) << v214
     ≡ (v225 - v225) << v214                   ; Rule B (or C)
     ≡ A << v214
     ...

This is the key feedback loop:

  1. Rule B/C at step k+1 absorbs step k's e-class.
  2. That gives Rule D more valid bindings at step k+1.
  3. Rule D creates more ishl-form enodes, which are then absorbed at step k+2 by Rule B/C.

So Rule B/C do not need many fires themselves; they expand downstream match pools for Rule D.

Limits for e-graphs

REWRITE_LIMIT, ECLASS_ENODE_LIMIT, and MATCHES_LIMIT are local caps (per recursion stack / per e-class / per ISLE call), not global per-function budgets.

We also noticed a off-by-one:

With no function-global or per-rule budget, each top-level can replay the same cascade and totals accumulate.

Per-top-level Rule D fire counts (first few chain steps):

top-level fires
v247 6
v248 6
v249 44
v251 138

This growth pattern aligns with the observed 3.3M total fires.

Trace evidence

Trace evidence that v247's leader v830 is absorbed into v248 and v249 e-classes (combined.log, trace-log build):

1123:  optimizing inst inst132 orig result v247 gave v830
1269:  -> returned from ISLE: v248 -> [v830, v852]
1904:  -> returned from ISLE: v249 -> [v830, v852, v932, v939, v944]

v830 in both later return lists is direct evidence that later chain steps absorb prior chain e-classes.

Open question

My current interpretation is that this blow-up is not only a bug in one rewrite rule, but may be an example of a more fundamental limitation of e-graph style rewriting: even if saturation would eventually be reached, the search space can become too large for compile-time use before that happens.

Rule B/C makes later v_{k+1} nodes inherit alternatives from earlier v_k nodes, and Rule D then explores those accumulated alternatives. The current limits bound local recursion, per-e-class growth, and per-call result count, but not the total number of rule applications across the function.

Is this the right way to interpret the issue: an expected theoretical limitation of e-graph-based rewriting as applied in Cranelift, rather than only a bug in this specific rewrite rule?

view this post on Zulip Wasmtime GitHub notifications bot (Apr 27 2026 at 04:26):

myunbin added the bug label to Issue #13204.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 27 2026 at 04:26):

myunbin added the cranelift label to Issue #13204.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 27 2026 at 04:59):

cfallin commented on issue #13204:

Thanks, @myunbin, for the writeup.

A few high-level questions, as this writeup uses terminology I'm not familiar with:

Then a meta-question: I haven't worked through all of the details above yet, but this an extremely verbose writeup, with unfamiliar terminology and some aspects above that sound like misunderstandings (or, at least, an odd way of describing things). I have to ask bluntly: did you use an LLM to produce this analysis? If so, could you please work through it as a human and try to condense it to something that shows understanding of our local terminology and addresses the above questions?

Thanks!

view this post on Zulip Wasmtime GitHub notifications bot (Apr 27 2026 at 04:59):

cfallin edited a comment on issue #13204:

Thanks, @myunbin, for the writeup.

A few high-level questions, as this writeup uses terminology I'm not familiar with:

Then a meta-question: I haven't worked through all of the details above yet, but this an extremely verbose writeup, with unfamiliar terminology and some aspects above that sound like misunderstandings (or, at least, an odd way of describing things). I have to ask bluntly: did you use an LLM to produce this analysis? If so, could you please work through it as a human and try to condense it to something that shows understanding of our local terminology and addresses the above questions?

Thanks!

view this post on Zulip Wasmtime GitHub notifications bot (Apr 27 2026 at 06:27):

myunbin edited issue #13204:

The original problem was shown here: #13068.

Problem

After testing the fuzzer-generated input foo.clif, we identified three rule groups introduced by #12926 (commit 1578325) as candidates:

The regression disappears as soon as both Rule B and Rule C are removed from 1578325.

<details>
<summary>benchmark details</summary>

scenario removed groups median real (s) mean real (s) min max stdev stalled slow vs baseline
minus_group_ab A + B 31.11 31.196 30.96 31.52 0.228 5 5 1.03x
minus_group_ac A + C 31.21 31.230 31.02 31.63 0.218 5 5 1.03x
minus_group_bc B + C 0.17 0.164 0.14 0.18 0.016 0 0 0.01x
minus_group_abc A + B + C 0.15 0.156 0.14 0.18 0.016 0 0 0.00x

Leaving either Rule B or Rule C active still blows up, so each of them is independently sufficient to trigger the regression. Group A has no measurable impact for this testcase.

</details>

Observation

foo.clif summary

    ;; --- foo.clif:124-136  (repeated ishl values sharing v214) ---
    v214 = ishl v211, v213
    v215 = ishl v0,   v214
    v216 = ishl v215, v214
    v217 = ishl v216, v214
    v218 = ishl v217, v214
    v219 = ishl v218, v214
    v220 = ishl v219, v214
    v221 = ishl v220, v214
    v222 = ishl v221, v214
    v223 = ishl v222, v214
    v224 = ishl v223, v214
    v225 = ishl v224, v214
    v226 = ishl v225, v214

    ;; --- foo.clif:155  (selects.isle:4 aliases v242 to v226) ---
    v242 = select v4, v226, v226

    ;; --- foo.clif:186-242  (48 repeated isub(v, v) values on i64x2) ---
    v247 = isub v242, v242
    v248 = isub v247, v247
    v249 = isub v248, v248
    v250 = isub v249, v249

    ... 43 more steps ...

    v293 = isub v292, v292
    v294 = isub v293, v293

The fuzzer has two main structure:

  1. Repeated ishl values sharing a common shift amount (v214),
  2. Repeated isub.i64x2(v, v) (48 times).

Additional observations:

  1. The regression was resolved after vector support in #13063 (f2807a1):

    ;; x - x --> 0 (rule (simplify (isub (ty_int ty) x x)) (subsume (iconst_u ty 0)))

    isub.i64x2 v v collapses to 0 at the root.

  2. Under 1578325 (Rule B/C active, pre-#13063 behavior), this rule fires ~3,000,000 times on a single rule:

    ;; (x<<z) - (y<<z) --> (x-y)<<z (rule (simplify (isub ty (ishl ty x z) (ishl ty y z))) (ishl ty (isub ty x y) z))
    Let us call Rule D.

Causality

From v_{k+1} = isub(v_k, v_k), where v_k = isub(v_{k-1}, v_{k-1}), one level of unfolding gives:

v_{k+1}
  = isub(v_k, v_k)
  = isub(isub(v_{k-1}, v_{k-1}), isub(v_{k-1}, v_{k-1}))
  = isub(v_{k-1}, v_{k-1})                    ; Rule B

Result is the earlier expression, v_k. This means that rules can return the earlier value as an equivalent value while optimizing v_{k+1}. This equivalence is represented by a Union value, unless the rewrite subsumes the original or hits the eclass-size limit.

For example:

v252 = isub(v251, v251)
v251 = isub(v250, v250)

--

v252 = isub(v251, v251)
     = isub(isub(v250, v250), isub(v250, v250))
     = isub(v250, v250)                           ; Rule B
     = v251

Rule B/C can make a later repeated isub(v, v) value equivalent to an earlier one.

(rule (simplify (isub ty (ishl ty x z) (ishl ty y z))) (ishl ty (isub ty x y) z))

Rule D matches when both operands can be viewed as ishl expressions with the same shift amount. For example,

v247 = isub(v242, v242)
     = isub(v226, v226)
     = isub(ishl(v225, v214), ishl(v225, v214))
     = ishl(isub(v225, v225), v214)              ; Rule D

B/C add the extra effect that a later value can become equivalent to an earlier repeated isub value:

v248 = isub(v247, v247)
     = isub(isub(v242, v242), isub(v242, v242))
     = isub(v242, v242)                          ; Rule B
     = v247

So later values can see both locally-created forms and forms already associated with earlier values such as v247.

In short:

  1. Rule B/C can rewrite a later isub(v, v) to a form already represented by an earlier value.
  2. Rule D then has more equivalent forms to match against.

Limits for egraph rewriting

The existing limits are local:

Each of the 48 isub(v, v) values is optimized separately. Rule B can rewrite a later one back to the previous isub(v, v) value, so each local rewrite can match forms recorded for earlier values.

Additional off-by-one: REWRITE_LIMIT = 5 is checked before incrementing rewrite_depth(which starts from 0), so the maximum observed depth is 6.

Evidence

Per-top-level Rule D fire counts (first few repeated isub(v, v) values):

top-level fires
v247 6
v248 6
v249 44
v251 138

This growth pattern accumulates to observed 3.3M total fires.

Trace evidence that an earlier optimized value is returned again for later values(trace-log):

1123:  optimizing inst inst132 orig result v247 gave v830
1269:  -> returned from ISLE: v248 -> [v830, v852]
1904:  -> returned from ISLE: v249 -> [v830, v852, v932, v939, v944]

v830 appearing in later return lists shows that an earlier optimized value is returned again as an equivalent value for later inputs.

Open question

Our interpretation is that this is not just a problem with one rewrite rule, but a compile-time risk in how e-graphs(and Cranelift's aegraph) expose accumulated equivalent alternatives during matching.

Does this interpretation sound reasonable, or would you think about this failure mode differently?

view this post on Zulip Wasmtime GitHub notifications bot (Apr 27 2026 at 06:34):

myunbin commented on issue #13204:

Hi @cfallin, first, I apologize for using unfamiliar terminology and vague concepts.

Some of the wording came from terms we use internally in my current project. Since English is not my first language, relying on an LLM for translation also seems to have made parts of the text less precise. I have now tried to make it as concise as possible.

Although I used an LLM while preparing this analysis, it was only for auxiliary tasks such as writing scripts and polishing text. The experiments and root-cause analysis were done by self.

If anything in my interpretation seems wrong, unclear, or incomplete, I would appreciate your feedback.

Thanks!

view this post on Zulip Wasmtime GitHub notifications bot (Apr 27 2026 at 06:43):

myunbin commented on issue #13204:

Oh, and regarding the questions:

Again, sorry for the poor word choices.
Thanks!

view this post on Zulip Wasmtime GitHub notifications bot (Apr 27 2026 at 06:45):

myunbin edited a comment on issue #13204:

Hi @cfallin, first, I apologize for using unfamiliar terminology and vague concepts.

Some of the wording came from terms we use internally in my current project. Since English is not my first language, relying on an LLM for translation also seems to have made parts of the text less precise. I have now tried to make it as concise as possible.

Although I used an LLM while preparing this analysis, it was only for auxiliary tasks such as writing scripts and polishing text. The experiments and root-cause analysis were done by self.

I've updated the original writeup. If anything still seems wrong or worth discussing, I would appreciate your feedback.

Thanks!

view this post on Zulip Wasmtime GitHub notifications bot (May 01 2026 at 19:25):

cfallin commented on issue #13204:

I've read through this again and I think I understand the failure mode better now. Thanks for writing up the example.

My very-simplified view is: the recursive x_k+1 = x_k - x_k, rewriting x_k+1 as an alias of x_k instead of 0, while valid, creates very large eclasses so the rules that match on these eclasses (such as the shift-rule) blow up their search time because they evaluate every combination of inputs.

subsume does its job once we introduce the x - x == 0 simplification rule; but separately, this is a new growth dimension for eclasses. If I am reading this right, the chain of subtracts apparently creates an eclass larger than the eclass enode-count limit (ECLASS_ENODE_LIMIT)? Is that what you observed? Naively I would have expected the limit to kick in. I'll poke at this further.

In any case, aside from use of subsume wherever we have a clear "collapsing optimization" that deletes part of the program (e.g. x-x = 0), I wonder if we should implement some sort of fuel throughout the ISLE-generated matching logic. It's always valid not to optimize a given value, so we can quit at any point. That would put a hard bound on this kind of unexpected rule interaction.


Last updated: May 03 2026 at 22:13 UTC