fitzgen opened issue #12156:
Note: I am forking off a new issue from https://github.com/bytecodealliance/wasmtime/issues/12106. See that issue's discussion for a description of the issue of enode cost saturating to infinity due to extraction not "understanding" shared structure and how this problem is equivalent to weighted-set cover and therefore NP-hard. +cc @cfallin @bongjunj
Chris and I were talking about extraction again yesterday. A couple things came out of it, which I will note down for our future selves and posterity:
I kind of tried to explain this point earlier but don't think I was very clear so I will try again: The problem with this algorithm is that the "number of users" == "number of incoming edges in the (a)egraph", and for this to be a good algorithm in practice, it must be the case that the number of incoming edges in the (a)egraph approximates the number of incoming edges in the set of actually-extracted expressions in practice. It is not clear to me that property is generally true. In fact, I suspect that the more we do "egraph-y" things like add more canonicalizations and neutral rewrites with the hope that they will unlock further rewrites that are beneficial, then the less this property will hold!
I had the realization that if we topo-sort values before computing best costs, then we will reach a fixpoint in a single iteration. This has some nice implications for the "If we aren't maintaining phasing" algorithm I sketched above. Bear with me for a second.
Our current algorithm to compute the best enode (specific value) for each eclass (canonical value) is a fixpoint loop that looks roughly like this:

```rust
fn compute_best_values() {
    let mut keep_going = true;
    while keep_going {
        // Reset each pass; the `|=` below re-sets it whenever anything changed.
        keep_going = false;
        for value in dfg.all_values() {
            let orig_best_value = best[value];
            best[value] = best_value(value, dfg);
            keep_going |= best[value] != orig_best_value;
        }
    }
}
```

It will do
`n + 1` iterations before returning, where `n` is the length of the longest chain of `vNN -> vMM` edges in the dataflow graph where `NN < MM` (and the `+ 1` is to determine that we did actually reach the fixpoint and nothing else is changing anymore).[^typical] Note that at most `n == len(values)`, so we do a pass over all `n` values `n` times, leading to an `O(n^2)` worst-case runtime. But at minimum, in the best case, it will do at least two passes over all `n` values: once to compute the best values and then another to verify that we reached the fixpoint and nothing else is changing.

Topologically sorting the values based on the DFG is also one pass over all `n` values. And then we can do one iteration of the inner loop from the previous algorithm, which is also one pass over all `n` values, and in this case we know that we will have reached the fixpoint immediately because (a) the DFG is acyclic (we don't look through block params) and (b) we are processing defs before uses. So this gives us an `O(n)` worst-case (and best-case) runtime algorithm. Strictly better (in terms of worst-case runtime) than what we do today!

```rust
fn compute_best_values() {
let sorted = topo_sort_values(dfg);
compute_best_values_already_sorted(sorted);
}

fn compute_best_values_already_sorted(sorted) {
for value in sorted {
best[value] = best_value(value, dfg);
}
}
```

But this now plays nicely with incrementally updating costs to reflect certain sub-expressions getting cheaper because we've already extracted them! The following brings us back to `O(n^2)`, but I think it should give us a much better extraction algorithm that "understands" shared subexpressions:

```rust
let sorted = topo_sort_values(dfg);

for inst in skeleton {
    // Compute the best values. If this is the first iteration of the loop,
    // it is equivalent to what we had before. Otherwise, it will update the
    // best values based on the adjusted cost of values we have already
    // extracted.
    compute_best_values_already_sorted(sorted);

    // Elaborate this instruction, extracting the best values for its
    // operands and inserting their defining instructions back into the
    // CFG's basic blocks.
    elaborate_inst(inst);

    // Update the cost for each value we extracted. It is ~free (modulo
    // register pressure) to reuse this value, since we have already computed
    // it and it is now available. The next iteration of the outer loop will
    // recompute best values to reflect these updated costs.
    for value in transitive_values_used_by(inst) {
        // Or could do something very low but non-zero, like `1`, to reflect
        // the cost of additional register pressure. Worth experimenting
        // with, but is also somewhat orthogonal...
        let new_cost = 0;
        set_cost(value, new_cost);
    }
}
```

It could be that this is fast enough to just be what we do by default. If not, it might be something we only use at higher optimization levels.
[^typical]: Typically `n` is fairly small; it is affected by how many new instructions we created and spliced into the middle of the DFG after the CLIF producer gave us the initial CLIF. That is something we generally don't do a ton of, except in our legalization pass and NaN canonicalization, and neither of those does it much.
fitzgen added the cranelift label to Issue #12156.
fitzgen added the cranelift:goal:optimize-speed label to Issue #12156.
fitzgen commented on issue #12156:
Update: I implemented the topo sorting without incremental updates in #12177, which didn't have any statistically significant effect on compilation or execution times and has since been merged. Then I made two branches implementing the two main approaches we've been discussing:
**Incremental update and recompute:** In this branch, after we elaborate a value, we update its cost to zero and then recompute all values' costs again to reflect that values we have already elaborated are already available and therefore "free" to reuse.
This allows the cost function to understand shared substructure across operands to side-effectful skeleton instructions. It does not understand shared substructure within a single operand to a side-effectful skeleton instruction.
**Maintain the set of instructions required to compute a value:** In this branch, we track not just the best cost of each value, but the set of instructions that went into computing that best value. When computing new best costs based on operands, we only add the cost of instructions that aren't already present in the value's set.
This allows the cost function to understand shared substructure both across and within operands to side-effectful skeleton instructions (and within any expression or subexpression).
Illustrative CLIF Example
I've been using the following CLIF test case to see the effect of different cost functions:
```
function %f(i64) {
    fn0 = colocated %fn0(i64)

block0(v0: i64):
    v1 = iadd v0, v0
    v2 = iadd v1, v1
    v3 = iadd v2, v2
    v4 = iadd v3, v3
    v5 = iadd v4, v4
    v6 = iadd v5, v5
    v7 = iadd v6, v6
    v8 = iadd v7, v7
    v9 = iadd v8, v8
    v10 = iadd v9, v9
    v11 = iadd v10, v10
    v12 = iadd v11, v11
    v13 = iadd v12, v12
    v14 = iadd v13, v13
    v15 = iadd v14, v14
    v16 = iadd v15, v15
    v17 = iadd v16, v16
    v18 = iadd v17, v17
    v19 = iadd v18, v18
    v20 = iadd v19, v19
    v21 = iadd v20, v20
    v22 = iadd v21, v21
    v23 = iadd v22, v22
    v24 = iadd v23, v23
    v25 = iadd v24, v24
    v26 = iadd v25, v25
    v27 = iadd v26, v26
    v28 = iadd v27, v27
    v29 = iadd v28, v28
    v30 = iadd v29, v29
    v31 = iadd v30, v30
    v32 = iadd v31, v31
    v33 = iadd v32, v32

    ;; With the old cost function, we should have `cost(v33) == infinity` at
    ;; this point. Of course, we *should* replace `v35`'s definition with
    ;; `ishl v33, 1`. But the (old) cost function cannot tell us to do that
    ;; because `x + infinity == y + infinity`, even when `x < y`. So it must
    ;; arbitrarily choose one of the two. In practice, it happens to choose
    ;; the `ishl` (annoyingly for those of us who are trying to show the
    ;; benefits of a new cost function).
    v34 = iconst.i64 2
    v35 = imul v33, v34
    call fn0(v35)

    ;; And then this undoes the multiply, and we *should* see through that
    ;; via `x * 2 => x + x` and `(x + y) - y => x` and ultimately recognize
    ;; that `v36 == v33`.
    ;;
    ;; The old cost function fails to do this, however, because it considers
    ;; `x * 2 - x` and `x` to have equal cost when `cost(x) == infinity`.
    ;;
    ;; Both the proposed incremental-update and inst-set cost functions do
    ;; manage to simplify `x * 2 - x` to `x` in this case because they do
    ;; _not_ have `cost(v35) = infinity` at the time that we are elaborating
    ;; this second call's operands.
    ;;
    ;; However, if we comment out the first `call fn0(v35)`, then the
    ;; incremental-update cost function does no better than the old cost
    ;; function, and will lead us to extracting `(x << 1) - x` instead of
    ;; simply `x`, because it cannot "see" shared structure *within* a
    ;; side-effectful skeleton's operand, only *across* side-effectful
    ;; skeleton operands (because we only zero-and-update after elaborating
    ;; each side-effectful skeleton operand).
    ;;
    ;; The inst-set cost function, which does "see" shared structure
    ;; *within* a side-effectful skeleton instruction's operand, will still
    ;; report that `cost(x) < cost(x * 2 - x)` in this case.
    v36 = isub v35, v33
    call fn0(v36)
    return
}
```

Here is the post-optimization CLIF for the current (old) cost function:

```
...
v33 = iadd v32, v32
v38 = iconst.i64 1
v39 = ishl v33, v38  ; v38 = 1
call fn0(v39)
v36 = isub v39, v33
call fn0(v36)
return
```

Here it is with the incremental-update and inst-set cost functions:

```
...
v33 = iadd v32, v32
v38 = iconst.i64 1
v39 = ishl v33, v38  ; v38 = 1
call fn0(v39)
call fn0(v33)
return
```

And if we comment out the first `call`, here is the CLIF for both the old cost function and the incremental-update cost function:

```
...
v33 = iadd v32, v32
v38 = iconst.i64 1
v39 = ishl v33, v38  ; v38 = 1
v36 = isub v39, v33
call fn0(v36)
return
```

While the inst-set cost function can still see through `x * 2 - x`:

```
...
v33 = iadd v32, v32
call fn0(v33)
return
```

Will post some Sightglass benchmark results in a little bit.
fitzgen commented on issue #12156:
Initial Sightglass Results
I benchmarked compilation and execution times for
- bz2
- spidermonkey
- pulldown-cmark
- blake3-scalar
- blake3-simd
- shootout-keccak
- shootout-switch
The raw results are below.
Given that our benchmarks' execution isn't affected much by any of these options (though results lean slightly positive for the inst-set cost function), that we (surprisingly) get a compilation speedup with it, and that it accounts for the most substructure sharing in theory of the three options, I'm going to continue pursuing that approach.
Compilation
`incr-cost-update.so` vs. `inst-set.so`

<details>
compilation :: instructions-retired :: benchmarks/pulldown-cmark/benchmark.wasm Δ = 56736.35 ± 28906.20 (confidence = 95%) inst-set.so is 1.00x to 1.01x faster than incr-cost-update.so! [14348083 14424211.45 14508124] incr-cost-update.so [14285793 14367475.10 14484470] inst-set.so compilation :: instructions-retired :: benchmarks/shootout/shootout-switch.wasm Δ = 7695.45 ± 5606.84 (confidence = 95%) inst-set.so is 1.00x to 1.01x faster than incr-cost-update.so! [2100829 2115908.85 2129910] incr-cost-update.so [2094815 2108213.40 2120720] inst-set.so compilation :: instructions-retired :: benchmarks/blake3-scalar/benchmark.wasm Δ = 23876.40 ± 18942.83 (confidence = 95%) inst-set.so is 1.00x to 1.01x faster than incr-cost-update.so! [8074884 8159311.95 8217976] incr-cost-update.so [8068552 8135435.55 8181995] inst-set.so compilation :: instructions-retired :: benchmarks/bz2/benchmark.wasm Δ = 9228.95 ± 4846.06 (confidence = 95%) inst-set.so is 1.00x to 1.00x faster than incr-cost-update.so! [3544513 3560461.75 3578035] incr-cost-update.so [3538490 3551232.80 3566901] inst-set.so compilation :: instructions-retired :: benchmarks/blake3-simd/benchmark.wasm Δ = 3458.40 ± 2365.48 (confidence = 95%) inst-set.so is 1.00x to 1.00x faster than incr-cost-update.so! [1830921 1836239.40 1840748] incr-cost-update.so [1829117 1832781.00 1848816] inst-set.so compilation :: instructions-retired :: benchmarks/spidermonkey/benchmark.wasm Δ = 808809.85 ± 280559.15 (confidence = 95%) inst-set.so is 1.00x to 1.00x faster than incr-cost-update.so! [492921665 494078167.90 494535116] incr-cost-update.so [492325966 493269358.05 493749431] inst-set.so compilation :: instructions-retired :: benchmarks/shootout/shootout-keccak.wasm Δ = 615.45 ± 381.06 (confidence = 95%) inst-set.so is 1.00x to 1.00x faster than incr-cost-update.so! [680111 681389.50 683010] incr-cost-update.so [679956 680774.05 681474] inst-set.so</details>
`main.so` vs. `inst-set.so`

<details>
compilation :: instructions-retired :: benchmarks/shootout/shootout-switch.wasm Δ = 7484.40 ± 5754.15 (confidence = 95%) inst-set.so is 1.00x to 1.01x faster than main.so! [2094815 2108213.40 2120720] inst-set.so [2097021 2115697.80 2133060] main.so compilation :: instructions-retired :: benchmarks/blake3-scalar/benchmark.wasm Δ = 22612.80 ± 17853.16 (confidence = 95%) inst-set.so is 1.00x to 1.00x faster than main.so! [8068552 8135435.55 8181995] inst-set.so [8113305 8158048.35 8218755] main.so compilation :: instructions-retired :: benchmarks/blake3-simd/benchmark.wasm Δ = 4569.35 ± 2860.23 (confidence = 95%) inst-set.so is 1.00x to 1.00x faster than main.so! [1829117 1832781.00 1848816] inst-set.so [1831835 1837350.35 1853297] main.so compilation :: instructions-retired :: benchmarks/pulldown-cmark/benchmark.wasm Δ = 34711.95 ± 30446.18 (confidence = 95%) inst-set.so is 1.00x to 1.00x faster than main.so! [14285793 14367475.10 14484470] inst-set.so [14333253 14402187.05 14508930] main.so compilation :: instructions-retired :: benchmarks/spidermonkey/benchmark.wasm Δ = 872967.45 ± 245452.83 (confidence = 95%) inst-set.so is 1.00x to 1.00x faster than main.so! [492325966 493269358.05 493749431] inst-set.so [493441547 494142325.50 494562345] main.so compilation :: instructions-retired :: benchmarks/shootout/shootout-keccak.wasm Δ = 762.35 ± 366.77 (confidence = 95%) inst-set.so is 1.00x to 1.00x faster than main.so! [679956 680774.05 681474] inst-set.so [680707 681536.40 683187] main.so compilation :: instructions-retired :: benchmarks/bz2/benchmark.wasm No difference in performance. [3538490 3551232.80 3566901] inst-set.so [3533856 3556067.10 3583550] main.so</details>
`main.so` vs. `incr-cost-update.so`

<details>
compilation :: instructions-retired :: benchmarks/pulldown-cmark/benchmark.wasm No difference in performance. [14348083 14424211.45 14508124] incr-cost-update.so [14333253 14402187.05 14508930] main.so compilation :: instructions-retired :: benchmarks/bz2/benchmark.wasm No difference in performance. [3544513 3560461.75 3578035] incr-cost-update.so [3533856 3556067.10 3583550] main.so compilation :: instructions-retired :: benchmarks/blake3-simd/benchmark.wasm No difference in performance. [1830921 1836239.40 1840748] incr-cost-update.so [1831835 1837350.35 1853297] main.so compilation :: instructions-retired :: benchmarks/shootout/shootout-keccak.wasm No difference in performance. [680111 681389.50 683010] incr-cost-update.so [680707 681536.40 683187] main.so compilation :: instructions-retired :: benchmarks/blake3-scalar/benchmark.wasm No difference in performance. [8074884 8159311.95 8217976] incr-cost-update.so [8113305 8158048.35 8218755] main.so compilation :: instructions-retired :: benchmarks/spidermonkey/benchmark.wasm No difference in performance. [492921665 494078167.90 494535116] incr-cost-update.so [493441547 494142325.50 494562345] main.so compilation :: instructions-retired :: benchmarks/shootout/shootout-switch.wasm No difference in performance. [2100829 2115908.85 2129910] incr-cost-update.so [2097021 2115697.80 2133060] main.so</details>
Execution
`incr-cost-update.so` vs. `inst-set.so`

<details>
execution :: instructions-retired :: benchmarks/pulldown-cmark/benchmark.wasm Δ = 93581.75 ± 43.16 (confidence = 95%) inst-set.so is 1.00x to 1.00x faster than incr-cost-update.so! [20865519 20865542.75 20865749] incr-cost-update.so [20771935 20771961.00 20772154] inst-set.so execution :: instructions-retired :: benchmarks/bz2/benchmark.wasm Δ = 563843.20 ± 7.84 (confidence = 95%) incr-cost-update.so is 1.00x to 1.00x faster than inst-set.so! [221233136 221233145.30 221233181] incr-cost-update.so [221796977 221796988.50 221797031] inst-set.so execution :: instructions-retired :: benchmarks/spidermonkey/benchmark.wasm Δ = 2851242.00 ± 41511.20 (confidence = 95%) inst-set.so is 1.00x to 1.00x faster than incr-cost-update.so! [2112679982 2112745727.45 2112830546] incr-cost-update.so [2109813917 2109894485.45 2110054897] inst-set.so execution :: instructions-retired :: benchmarks/blake3-simd/benchmark.wasm Δ = 63.10 ± 0.30 (confidence = 95%) inst-set.so is 1.00x to 1.00x faster than incr-cost-update.so! [1432847 1432847.35 1432848] incr-cost-update.so [1432784 1432784.25 1432785] inst-set.so execution :: instructions-retired :: benchmarks/blake3-scalar/benchmark.wasm Δ = 12.05 ± 0.25 (confidence = 95%) inst-set.so is 1.00x to 1.00x faster than incr-cost-update.so! [1482967 1482967.20 1482968] incr-cost-update.so [1482955 1482955.15 1482956] inst-set.so execution :: instructions-retired :: benchmarks/shootout/shootout-keccak.wasm No difference in performance. [77020130 77020136.05 77020159] incr-cost-update.so [77020130 77020133.90 77020151] inst-set.so execution :: instructions-retired :: benchmarks/shootout/shootout-switch.wasm No difference in performance. [162345998 162346008.80 162346034] incr-cost-update.so [162345996 162346009.10 162346036] inst-set.so</details>
`main.so` vs. `inst-set.so`

<details>
execution :: instructions-retired :: benchmarks/bz2/benchmark.wasm Δ = 524604.40 ± 8.60 (confidence = 95%) main.so is 1.00x to 1.00x faster than inst-set.so! [221796977 221796988.50 221797031] inst-set.so [221272372 221272384.10 221272438] main.so execution :: instructions-retired :: benchmarks/spidermonkey/benchmark.wasm Δ = 1875739.10 ± 40441.13 (confidence = 95%) inst-set.so is 1.00x to 1.00x faster than main.so! [2109813917 2109894485.45 2110054897] inst-set.so [2111698119 2111770224.55 2111838249] main.so execution :: instructions-retired :: benchmarks/blake3-simd/benchmark.wasm Δ = 65.10 ± 0.30 (confidence = 95%) inst-set.so is 1.00x to 1.00x faster than main.so! [1432784 1432784.25 1432785] inst-set.so [1432849 1432849.35 1432850] main.so execution :: instructions-retired :: benchmarks/pulldown-cmark/benchmark.wasm Δ = 711.95 ± 33.91 (confidence = 95%) main.so is 1.00x to 1.00x faster than inst-set.so! [20771935 20771961.00 20772154] inst-set.so [20771239 20771249.05 20771390] main.so execution :: instructions-retired :: benchmarks/shootout/shootout-switch.wasm No difference in performance. [162345996 162346009.10 162346036] inst-set.so [162345994 162346015.55 162346081] main.so execution :: instructions-retired :: benchmarks/shootout/shootout-keccak.wasm No difference in performance. [77020130 77020133.90 77020151] inst-set.so [77020130 77020134.90 77020160] main.so execution :: instructions-retired :: benchmarks/blake3-scalar/benchmark.wasm No difference in performance. [1482955 1482955.15 1482956] inst-set.so [1482955 1482955.15 1482956] main.so</details>
`main.so` vs. `incr-cost-update.so`

<details>
execution :: instructions-retired :: benchmarks/pulldown-c [message truncated]
bongjunj commented on issue #12156:
The idea here is very interesting and the compilation speedup looks cool.
We need to check if the problem of cost saturation went away (https://github.com/bytecodealliance/wasmtime/issues/12106#issuecomment-3615043494).
If this can prevent the cost from saturating to infinity in the benchmark, we can bring back the removed canonicalizations.

I will look at some results with this strategy.
bongjunj commented on issue #12156:
The infinity cost problem went away in the inst-set version.
`shootout-keccak`:

```
// skipped the first part
[2025-12-28T06:19:25Z TRACE cranelift_codegen::egraph::elaborate] -> cost of value v6922 = Cost::Finite { op_cost: 13476, depth: 0 }
[2025-12-28T06:19:25Z TRACE cranelift_codegen::egraph::elaborate] -> cost of value v6923 = Cost::Finite { op_cost: 13555, depth: 0 }
[2025-12-28T06:19:25Z TRACE cranelift_codegen::egraph::elaborate] -> cost of value v6930 = Cost::Finite { op_cost: 13351, depth: 0 }
[2025-12-28T06:19:25Z TRACE cranelift_codegen::egraph::elaborate] -> cost of value v8131 = Cost::Finite { op_cost: 13350, depth: 0 }
[2025-12-28T06:19:25Z TRACE cranelift_codegen::egraph::elaborate] -> best of union(BestEntry(ImmCostMap { cost: Cost::Finite { op_cost: 13351, depth: 0 }, insts: {inst21, ... , inst5388} }, v8131)
[2025-12-28T06:19:25Z TRACE cranelift_codegen::egraph::elaborate] -> cost of value v6931 = Cost::Finite { op_cost: 13470, depth: 0 }
[2025-12-28T06:19:25Z TRACE cranelift_codegen::egraph::elaborate] -> cost of value v6932 = Cost::Finite { op_cost: 13548, depth: 0 }
```

When I brought back the De Morgan rules (the ones that had caused a performance regression because cost saturation led to poor extraction choices), the costs no longer saturated.
All that remains is to check that the perf regression relative to the no-opt version has been resolved by this.
Benchmark
| bench | inst-set | v-base | Speedup |
|---|---:|---:|---:|
| blake3-scalar | 316,666 | 319,718 | 0.96% |
| blake3-simd | 307,564 | 316,571 | 2.93% |
| bz2 | 86,561,625 | 87,697,336 | 1.31% |
| pulldown-cmark | 6,594,664 | 6,850,795 | 3.88% |
| regex | 210,685,003 | 210,205,862 | -0.23% |
| shootout-ackermann | 7,759,800 | 8,549,359 | 10.17% |
| shootout-base64 | 351,116,296 | 382,052,837 | 8.81% |
| shootout-ctype | 796,474,233 | 831,090,155 | 4.35% |
| shootout-ed25519 | 9,415,587,644 | 9,595,743,123 | 1.91% |
| shootout-fib2 | 3,012,767,106 | 3,015,971,015 | 0.11% |
| shootout-gimli | 5,404,306 | 5,334,580 | -1.29% |
| shootout-heapsort | 2,378,935,527 | 2,378,745,072 | -0.01% |
| shootout-keccak | 21,137,129 | 25,168,432 | 19.07% |
| shootout-matrix | 545,135,532 | 541,156,378 | -0.73% |
| shootout-memmove | 36,621,534 | 38,439,632 | 4.96% |
| shootout-minicsv | 1,294,436,335 | 1,498,544,688 | 15.77% |
| shootout-nestedloop | 673 | 956 | 41.94% |
| shootout-random | 439,752,867 | 630,876,317 | 43.46% |
| shootout-ratelimit | 39,836,311 | 39,279,479 | -1.40% |
| shootout-seqhash | 8,589,789,109 | 8,838,928,513 | 2.90% |
| shootout-sieve | 841,599,406 | 906,693,706 | 7.73% |
| shootout-switch | 153,847,745 | 139,731,817 | -9.18% |
| shootout-xblabla20 | 2,918,569 | 3,015,435 | 3.32% |
| shootout-xchacha20 | 4,478,767 | 4,398,827 | -1.78% |
| spidermonkey | 642,585,002 | 639,089,855 | -0.54% |
`shootout-keccak` tells us that the performance regression induced by the cost saturation went away.
I think we can bring back the rules.
fitzgen commented on issue #12156:
Cleaned up the inst-set branch and put up a PR: https://github.com/bytecodealliance/wasmtime/pull/12230
> `shootout-keccak` tells us that the performance regression induced by the cost saturation went away.
>
> I think we can bring back the rules.

Fantastic! Thanks for looking into that. Would you like to make the PR bringing the rules back?
bongjunj commented on issue #12156:
Sure thing! Thanks for the PR.
bongjunj edited a comment on issue #12156:
Sure thing! Thanks for the PR.
Will bring back the rules if your PR gets merged.
Last updated: Jan 09 2026 at 13:15 UTC