Stream: git-wasmtime

Topic: wasmtime / issue #13470 GC: grow-vs-collect heuristic (#1...


Wasmtime GitHub notifications bot (May 24 2026 at 10:30):

gfx opened issue #13470:

Upgrading Wado from wasmtime 44 to 45 (default drc collector) made our GC-heavy benchmarks ~2x slower. I bisected it to #12942.

workload (drc, -O2)   wasmtime 44   wasmtime 45   ratio (45/44)
json/canada           112 ms        229 ms        2.04
sqlite_parse          618 ms        1380 ms       2.23
syntax_highlight      971 ms        1962 ms       2.02

Pure-compute workloads (count_prime, mandelbrot, sieve) are unaffected.

The heuristic decides collect-vs-grow purely on live < capacity/2, with no notion of allocation rate. For high-allocation / small-live workloads the live set stays small, so a collection always frees enough and the heap never grows past its initial size — it collects on every heap-fill (GC thrashing). This is, ironically, close to the "lots of temporary garbage" case the PR cites as motivation. Collector::Null runs json/canada in ~15 ms vs ~148 ms for drc, so the cost is collection; gc_heap_reservation (even 1 GiB) has no effect.
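The thrashing pattern described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Wasmtime's implementation: the names `Action`, `on_heap_full`, and `simulate`, the 1 KiB allocation size, and the doubling growth policy are all hypothetical. The only part taken from the issue is the decision rule `live < capacity/2`.

```rust
/// What the heap does when an allocation does not fit (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Action {
    Collect,
    Grow,
}

/// Decision rule as described in the issue: collect when the live set is
/// below half of the current capacity, otherwise grow.
fn on_heap_full(live_bytes: u64, capacity_bytes: u64) -> Action {
    if live_bytes < capacity_bytes / 2 {
        Action::Collect
    } else {
        Action::Grow
    }
}

/// Simulate a workload that allocates `total_alloc_bytes` of short-lived
/// objects while `live_bytes` stay reachable throughout.
/// Returns (number of collections, final capacity).
fn simulate(total_alloc_bytes: u64, live_bytes: u64, initial_capacity: u64) -> (u64, u64) {
    const CHUNK: u64 = 1024; // assumed size of each allocation
    assert!(initial_capacity >= 2 * CHUNK, "sketch assumes a heap of at least 2 KiB");

    let mut capacity = initial_capacity;
    let mut used = live_bytes;
    let mut remaining = total_alloc_bytes;
    let mut collections = 0u64;

    while remaining > 0 {
        if used + CHUNK > capacity {
            match on_heap_full(live_bytes, capacity) {
                // A collection frees everything except the live set.
                Action::Collect => {
                    collections += 1;
                    used = live_bytes;
                }
                // Doubling is an assumed growth policy for this sketch.
                Action::Grow => capacity *= 2,
            }
            continue;
        }
        used += CHUNK;
        remaining = remaining.saturating_sub(CHUNK);
    }
    (collections, capacity)
}
```

With a 64 KiB live set, a 1 MiB initial heap, and 64 MiB of total allocation, `simulate` never grows the heap and instead collects each time it fills, which is the behavior reported for the allocation-heavy benchmarks.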

I'd suggest reverting it for now: the grow-to-the-limit behavior in 44 was a reasonable default (memory is still bounded by the GC heap size limit), and the new heuristic regresses a common workload class.

Then, before re-landing a heuristic, it might help to add a GC throughput benchmark to wasmtime's CI so this kind of regression is caught up front. Here is a self-contained reproducer (a Wado-compiled wasi:cli/command component that allocates many short-lived GC objects) that could serve as a reference: https://gist.github.com/gfx/133a82db2817c160da6cbc221b0a4329 — on it, wasmtime 44 ≈ 1005 ms vs 45 ≈ 1950 ms.

Wasmtime GitHub notifications bot (May 24 2026 at 14:08):

cfallin commented on issue #13470:

Thanks for filing an issue with this data!

I think there is a deeper tradeoff here that's missing in the discussion. tl;dr is that I don't think we should revert. But in more detail:

Hopefully that makes clear why the change was made. Contrary to the framing above it's not a single-dimensional performance metric with a straightforward regression; it's a tradeoff space and we bought a new asymptotic bound.

I do think we could entertain an alternative (non-default) option that grows unconditionally up to the max heap size then starts collecting; that configuration makes more sense when there is only one instance, and the user has memory to burn.

cc @fitzgen for more thoughts as the main owner of GC (and much more of an expert on these things than me!)

Wasmtime GitHub notifications bot (May 24 2026 at 15:21):

alexcrichton added the wasm-proposal:gc label to Issue #13470.

Wasmtime GitHub notifications bot (May 26 2026 at 00:07):

gfx commented on issue #13470:

Thanks, that explanation makes sense to me.

I understand the motivation for the new default. For high-concurrency or memory-constrained deployments, optimizing RSS relative to the live set seems important, and I can see why the previous grow-until-limit behavior may be undesirable there.

At the same time, I think changing this default is something to be careful about. For latency-sensitive embedders, this can be a clear regression rather than just a different point in the tradeoff space. In our case, the affected paths are roughly 2x slower, which is large enough to be production-impacting.

So my main request would be that this tradeoff should be configurable. The new default may be the right one for some environments, but embedders that have explicitly budgeted memory for an instance should have a supported way to prefer latency/throughput over minimizing RSS.

Longer-term, I also think it would be valuable to have continuous benchmarking around this area that tracks both sides of the tradeoff: RSS/heap growth and throughput/latency on allocation-heavy workloads. That would make changes like this easier to evaluate as intentional tradeoffs rather than surprising regressions after release.

Thanks again for the detailed context.

Wasmtime GitHub notifications bot (May 26 2026 at 02:31):

cfallin commented on issue #13470:

Sure, I think we'd be happy to review a PR to make the heuristic configurable.

Speaking philosophically for a second, re:

> regression rather than just a different point in the tradeoff space

I don't want us to get into the space where current performance is "locked in" forever on every single axis. There are projects that operate like that (e.g., V8 on performance matters, from what I understand), but we are still in the space where we are figuring out the best designs and tradeoffs. And this is absolutely a tradeoff space on both sides: for a workload with say 128KiB of real GC live-set size peak, and 128MiB heap, that is a 500x-reduction in memory requirement to have the adaptive grow-vs-collect heuristic (guaranteed 2x-live worst-case bound).
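The "500x" figure above can be checked with a short calculation. This is a sketch of the arithmetic only: the function names are hypothetical, and the assumption is that "2x-live worst-case bound" means the heap never exceeds twice the peak live set.

```rust
const KIB: u64 = 1024;
const MIB: u64 = 1024 * KIB;

/// Worst-case heap size under the assumed 2x-live bound.
fn bounded_heap_bytes(live_bytes: u64) -> u64 {
    2 * live_bytes
}

/// How many times smaller the bounded heap is than a heap that was
/// allowed to grow to `grown_heap_bytes`.
fn reduction_factor(grown_heap_bytes: u64, live_bytes: u64) -> u64 {
    grown_heap_bytes / bounded_heap_bytes(live_bytes)
}
```

For a 128 KiB live set and a 128 MiB heap, `reduction_factor(128 * MIB, 128 * KIB)` is 512, in line with the roughly 500x quoted above.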

For what it's worth, as well, GC is not yet tier-1, so anyone running it in production today does it "at their own risk"; we haven't yet committed to the kind of stability that might change expectations about "regressions after release". (We might soon, but all of this work comes before that change.) And e.g. we recently changed our default collector away from drc. It's great and valuable that you're running things in production and gaining experience + feeding it back, but just wanted to make sure that was explicitly said.

Wasmtime GitHub notifications bot (May 26 2026 at 18:37):

fitzgen commented on issue #13470:

+1 to everything Chris said about trade offs (and pretty much everything else).


@cfallin

> I do think we could entertain an alternative (non-default) option that grows unconditionally up to the max heap size then starts collecting; that configuration makes more sense when there is only one instance, and the user has memory to burn.

I think a nice way to do this would be to make the grow-vs-collect ratio's denominator (or the log2 of the denominator) a tunable. Right now the ratio's denominator is 2 (ie the ratio is 1/2 and collect if the previous heap size was less than that, grow otherwise), but you could effectively get the old behavior by changing the denominator to 1 << 31 (ie making the ratio 1/2147483648) which would basically always choose growth instead of collection.

This is a nice way to phrase the problem because it wouldn't actually create any new branches to our existing logic.
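A minimal sketch of the tunable-denominator idea follows. It assumes the decision is the `live < capacity / denominator` comparison described earlier in the thread; the function name `should_collect` and the parameter `log2_denominator` are hypothetical and are not existing Wasmtime identifiers.

```rust
/// Returns true when the heap should collect rather than grow.
///
/// `log2_denominator == 1` corresponds to the current 1/2 ratio;
/// a large value such as 31 makes the threshold so small that growth
/// is chosen in practically every case.
fn should_collect(live_bytes: u64, capacity_bytes: u64, log2_denominator: u32) -> bool {
    // `checked_shr` returns None for shifts of 64 or more; treat that as
    // a threshold of zero, meaning "never collect".
    let threshold = capacity_bytes
        .checked_shr(log2_denominator)
        .unwrap_or(0);
    live_bytes < threshold
}
```

The comparison is the same in both configurations; only the shift amount changes, which is why this phrasing adds no new branch to the decision itself. For example, with a 64 KiB live set in a 1 MiB heap, a `log2_denominator` of 1 chooses collection, while 31 chooses growth.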


@gfx fwiw you should probably experiment with using the copying collector instead of the DRC collector. It actually collects cycles, is much faster, and is now the default collector on main. It is also what we plan on using when enabling Wasm GC by default.

Wasmtime GitHub notifications bot (May 27 2026 at 06:21):

cfallin commented on issue #13470:

> I think a nice way to do this would be to make the grow-vs-collect ratio's denominator (or the log2 of the denominator) a tunable.

Ah, I really like that! @gfx if you want to send a PR for this I'm happy to review it. Otherwise I can throw it on my to-do list and get to it at some point...

Wasmtime GitHub notifications bot (May 28 2026 at 03:55):

gfx commented on issue #13470:

Thanks, that makes sense. I’ll try benchmarking our workloads with the copying collector on main and report back with numbers for both throughput and memory.

I’m also interested in sending a PR for the tunable denominator/log2-denominator approach. The default can remain as-is, while embedders that explicitly want to trade memory for throughput can configure a much smaller collect tendency / old grow-first behavior.

I’ll take a look at where this should be exposed in Wasmtime’s config/tunables.

Wasmtime GitHub notifications bot (May 28 2026 at 13:40):

gfx commented on issue #13470:

Following up with the copying-collector numbers, as promised.

Workload: a syntax highlighter written in Wado that allocates lots of short-lived GC objects per run — i.e. exactly the high-allocation / small-live-set pattern this heuristic regresses. Standalone wasi:cli/command component, -O2, 100 iterations/run, on the 45.0.0 CLI with -C collector=…. Best of 10 runs per metric.

collector               ms/iter   vs drc          peak RSS
drc                     13.45     1.0×            38.9 MB
copying                 2.34      ~5.7× faster    38.2 MB
null (never collects)   1.75      ~7.7× faster    205 MB (unbounded)

copying is ~5.7× faster than drc and nearly matches the null throughput ceiling — it removes almost all of the GC overhead the eager-collect heuristic imposes here — at the same peak RSS as drc. (The null figure confirms this is genuinely high-allocation / small-live-set.)

I see copying is already the default on main, so this is fully resolved from our side — we'll switch our embedding to copying. Thanks for the pointer, @fitzgen.

Wasmtime GitHub notifications bot (May 28 2026 at 15:13):

fitzgen commented on issue #13470:

Glad the copying collector works for you.

> I’m also interested in sending a PR for the tunable denominator/log2-denominator approach. The default can remain as-is, while embedders that explicitly want to trade memory for throughput can configure a much smaller collect tendency / old grow-first behavior.
>
> I’ll take a look at where this should be exposed in Wasmtime’s config/tunables.

The new tunable would be added somewhere around here:

https://github.com/bytecodealliance/wasmtime/blob/d22c2a01c65fdd839262f5e07259adb64dabce6d/crates/environ/src/tunables.rs#L154-L182

And then exposed as a Config method somewhere around here:

https://github.com/bytecodealliance/wasmtime/blob/d22c2a01c65fdd839262f5e07259adb64dabce6d/crates/wasmtime/src/config.rs#L2034-L2048

In general, if you just grep around for the existing GC heap tunables, you should see all the places you'd need to wire this up.
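As a rough illustration of the wiring described above, a tunable field plus a builder-style setter might look like the following. Everything here is a stand-in: `SketchTunables`, `SketchConfig`, and `gc_collect_log2_denominator` are hypothetical names invented for this sketch and are not the real contents of `tunables.rs` or `config.rs`.

```rust
/// Stand-in for the tunables struct; not Wasmtime's real type.
#[derive(Debug, Clone)]
struct SketchTunables {
    /// Hypothetical knob: log2 of the grow-vs-collect ratio's denominator.
    gc_collect_log2_denominator: u32,
}

impl Default for SketchTunables {
    fn default() -> Self {
        // 1 means a ratio of 1/2, matching the behavior described in the thread.
        SketchTunables {
            gc_collect_log2_denominator: 1,
        }
    }
}

/// Stand-in for the embedder-facing configuration; not Wasmtime's real type.
#[derive(Debug, Default)]
struct SketchConfig {
    tunables: SketchTunables,
}

impl SketchConfig {
    fn new() -> Self {
        Self::default()
    }

    /// Hypothetical setter, returning `&mut Self` so calls can be chained.
    fn gc_collect_log2_denominator(&mut self, log2: u32) -> &mut Self {
        self.tunables.gc_collect_log2_denominator = log2;
        self
    }
}
```

An embedder that has budgeted memory for a single instance could then call something like `config.gc_collect_log2_denominator(31)` to approximate the old grow-first behavior, while the default stays unchanged.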


Last updated: Jun 01 2026 at 09:49 UTC