wasmtime / PR #10471 Simplify aegraphs by removing union-... · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / PR #10471 Simplify aegraphs by removing union-...

Wasmtime GitHub notifications bot (Mar 26 2025 at 04:54):

cfallin requested fitzgen for a review on PR #10471.

Wasmtime GitHub notifications bot (Mar 26 2025 at 04:54):

cfallin requested wasmtime-compiler-reviewers for a review on PR #10471.

Wasmtime GitHub notifications bot (Mar 26 2025 at 04:54):

cfallin requested wasmtime-core-reviewers for a review on PR #10471.

Wasmtime GitHub notifications bot (Mar 26 2025 at 04:54):

cfallin opened PR #10471 from cfallin:new-aegraphs to bytecodealliance:main:

I was recently re-thinking through some of the core data structure design in our aegraph implementation, and wondered: do we really need the union-find data structure, the notion of "canonical" ID for an eclass separate from its latest ID (root of union-node tree), and the hashcons-key canonicalization using all of this? It's an awful lot of complexity and has led to some fairly subtle bugs (e.g., #6126), and is generally unsatisfying.

I had the realization: the only case where the distinction between canonical and latest ID matters is when we expand an eclass after its initial (eager) rewriting, which happens before we see its uses. If we hypothesize that this happens rarely, then it should be fine to canonicalize based only on latest ID -- we shouldn't lose much (and we can measure this loss empirically).

The chief case where this kind of "late eclass expansion" still happens is: if we have some expression E1 that eventually rewrites to E2 via some simplification, and E2 already exists earlier in the program, then E1 will join the eclass. If we then have some E3 := E1 + 1, and later E4 := E2 + 1, we need the union-find canonicalization for E4 to GVN to E3. Otherwise, the latest ID for the eclass that eventually contains E1 and E2 is different at the time that E3 uses it (and is GVN'd and rewritten) and when E4 does. Put another way: E3 captures a snapshot of its operand's eclass before a new node joins it, and is never reprocessed when that happens, so E3 remains distinct.

But if the E2 -> E1 rewrite is truly "directional" toward a better representation that we will always want to choose -- say, x + 0 -> x, or any constant-propagation in general -- then if the eager rewriting for E2 produces E1's eclass ID directly without adding E2 to its nodes, then all users will still canonicalize as before. This "only return the rewrite target, don't union with it" before is exactly our subsume operator.

Put another way: subsumption prevents growing eclasses later, so snapshots in time remain the latest, and everyone canonicalizes with the same ID. We move to a true immutable data structure, with simple hashconsing with no magic.

The rewrite semantics are then much simpler too: if any value is marked "subsume", we pick it (pick an arbitrary one if multiple) as our rewrite result; otherwise, create an eclass with the original rewrite input and all rewrite outputs (by creating union-nodes in the DFG). No auto-subsume or factoring in availability -- that's it. "Subsume" means: always pick the rewritten value, don't keep the original, and we use it whenever that's unambiguously true for better canonicalization.

Given that, it turns out that we can remove the union-find mechanism entirely. It also turns out that we can unify the scoped-hashmap for "effectful/idempotent ops" with the ordinary hashmap for pure ops; everything can be scoped. To get working LICM we do need to retain our "available block" mechanism, but only to insert values at a higher scope level in the scoped hashmap -- it is now heuristic, not load-bearing for correctness.

I suspect that the fixpoint loop computing analysis results can go away too, now that we truly don't update arguments -- we have a simple immutable data structure with everything snapshotted at creation -- but I haven't made that change yet.

This change appears to be "in the noise" in both runtime and compile time -- a Sightglass run on the default suite shows only
compilation :: cycles :: benchmarks/pulldown-cmark/benchmark.wasm

  Δ = 551234.50 ± 514580.62 (confidence = 99%)

  new.so is 1.00x to 1.01x faster than old.so!

  [61669181 72513567.85 98139932] new.so
  [60991071 73064802.35 120044089] old.so

execution :: cycles :: benchmarks/bz2/benchmark.wasm

  Δ = 232827.80 ± 204621.12 (confidence = 99%)

  old.so is 1.00x to 1.01x faster than new.so!

  [67208140 72812782.32 89996076] new.so
  [69531172 72579954.52 80530142] old.so
which seem like suitably small swings that are fine. Spot-checking the aegraph stats on the same function before-and-after shows the same optimizations happening in all functions I examined, and we see the compile-tests showing no movement except for a value renumbering in one case. So: no effect objectively, but deletes code and significantly simplifies the core algorithm.

Wasmtime GitHub notifications bot (Mar 26 2025 at 04:57):

cfallin updated PR #10471.

Wasmtime GitHub notifications bot (Mar 26 2025 at 15:26):

fitzgen submitted PR review:

Very nice!

Wasmtime GitHub notifications bot (Mar 26 2025 at 15:26):

fitzgen created PR review comment:

If this is used only in unit tests within this crate, then this can be:
    #[cfg(test)]

Wasmtime GitHub notifications bot (Mar 26 2025 at 16:53):

cfallin updated PR #10471.

Wasmtime GitHub notifications bot (Mar 26 2025 at 16:54):

cfallin submitted PR review.

Wasmtime GitHub notifications bot (Mar 26 2025 at 16:54):

cfallin created PR review comment:

Ah it turns out a later refactor causes this to actually be used again, so I've just removed the allow(dead_code).

Wasmtime GitHub notifications bot (Mar 26 2025 at 17:40):

cfallin merged PR #10471.

Wasmtime GitHub notifications bot (Apr 14 2025 at 14:51):

alexcrichton commented on PR #10471:

Looking through some open fuzz bugs this morning we've got two timeouts which I think may want to be investigated as a result of this. Typically I just ignore timeouts since we expect those to happen occasionally, but both timeouts show the "regression range" as https://github.com/bytecodealliance/wasmtime/compare/eafe743901f20eab0c4da3ffaa4fac3783e51be8...7e5a7e521ae121e239a8ad9e927653b26e474ac0 which notably has this PR as the main change. Additionally both timeouts were in the egraph passes which we typically don't time out in. Given all that, @cfallin would you be up for digging into these and seeing if the timeout is relevant? I think it's reasonable if the conclusion is "yep that's expected" but I think it'd be good to confirm ideally.

https://oss-fuzz.com/testcase-detail/4806924172591104

https://oss-fuzz.com/testcase-detail/5226667534778368

Wasmtime GitHub notifications bot (Apr 14 2025 at 15:29):

cfallin commented on PR #10471:

@alexcrichton yep I'm happy to take a look!

Last updated: Apr 18 2025 at 19:03 UTC