Stream: git-wasmtime

Topic: wasmtime / issue #11141 binary-trees gc benchmark runs re...


view this post on Zulip Wasmtime GitHub notifications bot (Jun 26 2025 at 15:47):

jrmuizel opened issue #11141:

Test Case

binary-trees-wasi-opt.wasm.zip

Steps to Reproduce

Actual Results

stretch tree of depth 21 check: 4194303
1048576 trees of depth 4 check: 32505856
262144 trees of depth 6 check: 33292288
65536 trees of depth 8 check: 33488896
16384 trees of depth 10 check: 33538048
4096 trees of depth 12 check: 33550336
1024 trees of depth 14 check: 33553408
256 trees of depth 16 check: 33554176
64 trees of depth 18 check: 33554368
16 trees of depth 20 check: 33554416
long lived tree of depth 20 check: 2097151

real    2m13.350s
user    2m12.736s
sys 0m0.454s

Expected Results

It should run much faster.
V8: 1.2s
Firefox: 4.8s
Safari: 5.9s

Extra Info

The benchmark is:

class TreeNode {
    TreeNode left;
    TreeNode right;

    TreeNode bottomUpTree(int depth) {
        if (depth > 0) {
            TreeNode leftChild = this.bottomUpTree(depth - 1);
            TreeNode rightChild = this.bottomUpTree(depth - 1);
            TreeNode node = new TreeNode();
            node.left = leftChild;
            node.right = rightChild;
            return node;
        } else {
            TreeNode node = new TreeNode();
            node.left = null;
            node.right = null;
            return node;
        }
    }

    int itemCheck() {
        if (this.left == null) {
            return 1;
        } else {
            return 1 + this.left.itemCheck() + this.right.itemCheck();
        }
    }
}

void main() {
    int minDepth = 4;
    int n = 20;  // Increase n to make it more interesting

    int maxDepth;
    if (minDepth + 2 > n) {
        maxDepth = minDepth + 2;
    } else {
        maxDepth = n;
    }
    int stretchDepth = maxDepth + 1;

    TreeNode stretchTree = new TreeNode();
    TreeNode temp = stretchTree.bottomUpTree(stretchDepth);
    int check = temp.itemCheck();
    print("stretch tree of depth ");
    print(stretchDepth);
    print(" check: ");
    println(check);

    TreeNode longLivedTree = new TreeNode();
    longLivedTree = longLivedTree.bottomUpTree(maxDepth);

    int depth = minDepth;
    while (depth <= maxDepth) {
        int iterations = 1 << (maxDepth - depth + minDepth);
        check = 0;

        int i = 1;
        while (i <= iterations) {
            TreeNode tree = new TreeNode();
            tree = tree.bottomUpTree(depth);
            check = check + tree.itemCheck();
            i = i + 1;
        }
        print(iterations);
        print(" trees of depth ");
        print(depth);
        print(" check: ");
        println(check);
        depth = depth + 2;
    }

    print("long lived tree of depth ");
    print(maxDepth);
    print(" check: ");
    println(longLivedTree.itemCheck());
}

compiled with https://github.com/jrmuizel/wasm-lang1

There's a browser version here: https://zingy-sorbet-48e9a5.netlify.app/binary-trees-opt.html

view this post on Zulip Wasmtime GitHub notifications bot (Jun 26 2025 at 15:47):

jrmuizel added the bug label to Issue #11141.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 26 2025 at 15:56):

alexcrichton added the wasm-proposal:gc label to Issue #11141.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 26 2025 at 17:41):

fitzgen commented on issue #11141:

Thanks for filing this issue with a test case and STR, @jrmuizel! Very appreciated.

First off, in general, when benchmarking Wasm execution in Wasmtime, we want to make sure that we aren't measuring compile time. Unlike browsers, Wasmtime doesn't start with a baseline compiler or interpreter and then dynamically tier up to an optimized compiler; we just (depending on the configuration) compile with our optimizing compiler immediately and then when the whole module is compiled, we start running it. Wasmtime is really more like an AOT compiler than a browser JIT, even though it can technically compile just-in-time. So given all that, in the future, it would be good to do something like this:

$ wasmtime compile binary-trees-wasi-opt.wasm -o binary-trees-wasi-opt.cwasm
$ time wasmtime run --allow-precompiled binary-trees-wasi-opt.cwasm

(And maybe replace time with hyperfine as necessary.)

Second, just to set expectations, you should know that we have thus far focused only on correctness and haven't put much effort into optimizing the GC implementation yet. Not at the level of the details of codegen nor at the algorithmic level. For example, the default DRC collector uses a fixed size bump chunk in part of its implementation, and whenever that gets full, we trigger a GC. We don't do any doubling of that bump chunk or anything like that based on the GC heap's size, and I suspect this could lead to somewhat pathological behavior. See also https://github.com/bytecodealliance/wasmtime/issues/9351 for a collection of GC optimizations that we haven't investigated yet.

Finally, if the test case doesn't allocate too much, it is also worth investigating whether using the null collector has a different performance profile than the default DRC collector. If the test case is hitting the potential pathological behavior previously described, then using the null collector will avoid that. That's of course not necessarily a solution for users, but is very useful information to include in issues. Basically, this helps us diagnose whether the issue is in our codegen or the collector implementation itself.

Anyways, that's all just some hopefully helpful background and context for you. I will investigate what's going on here in more detail in a little bit.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 26 2025 at 18:32):

fitzgen commented on issue #11141:

The benchmark doesn't complete with the null collector, but the output before it OOMs is coming much quicker, so this is probably related to something in the DRC collector itself.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 26 2025 at 20:18):

jrmuizel commented on issue #11141:

Pre-compiling doesn't seem to help:

$ time wasmtime run --allow-precompiled -W gc binary-trees-wasi-opt.cwasm
stretch tree of depth 21 check: 4194303
1048576 trees of depth 4 check: 32505856
262144 trees of depth 6 check: 33292288
65536 trees of depth 8 check: 33488896
16384 trees of depth 10 check: 33538048
4096 trees of depth 12 check: 33550336
1024 trees of depth 14 check: 33553408
256 trees of depth 16 check: 33554176
64 trees of depth 18 check: 33554368
16 trees of depth 20 check: 33554416
long lived tree of depth 20 check: 2097151

real    2m12.869s
user    2m12.303s
sys 0m0.467s

The test case allocates a lot. It's basically just this: https://benchmarksgame-team.pages.debian.net/benchmarksgame/program/binarytrees-javavm-2.html

The other wasm implementations spend most of their time in GC as well.

Here's a profile of wasmtime: https://share.firefox.dev/4k8jCy0

and the Spidermonkey bug for the slowness compared to Chrome:
https://bugzilla.mozilla.org/show_bug.cgi?id=1971860

view this post on Zulip Wasmtime GitHub notifications bot (Jun 26 2025 at 21:52):

fitzgen commented on issue #11141:

https://github.com/bytecodealliance/wasmtime/pull/11144 fixes the bump chunk issue I was talking about, and cuts the test case's runtime in half for me

view this post on Zulip Wasmtime GitHub notifications bot (Jun 26 2025 at 22:09):

fitzgen commented on issue #11141:

15% of the remaining time is spent cloning Arcs inside MmapBase and MmapOffset just to get the base pointer for a VMMemoryDefinition :face_palm:

view this post on Zulip Wasmtime GitHub notifications bot (Jun 27 2025 at 17:28):

fitzgen commented on issue #11141:

15% of the remaining time is spent cloning Arcs inside MmapBase and MmapOffset just to get the base pointer for a VMMemoryDefinition :face_palm:

And another 15% of that time is spent dropping those Arcs that we just cloned.

WIP patch for this in https://github.com/bytecodealliance/wasmtime/pull/11148

view this post on Zulip Wasmtime GitHub notifications bot (Jun 27 2025 at 18:51):

alexcrichton commented on issue #11141:

In the end, @fitzgen this benchmark (or something similar?) might be a good addition to sightglass perhaps

view this post on Zulip Wasmtime GitHub notifications bot (Jun 27 2025 at 19:01):

fitzgen commented on issue #11141:

In the end, @fitzgen this benchmark (or something similar?) might be a good addition to sightglass perhaps

Yeah I was thinking the same, although we'd need to get its runtime a little lower probably for that to be reasonable.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 30 2025 at 18:50):

fitzgen commented on issue #11141:

See also https://github.com/bytecodealliance/wasmtime/issues/11159

view this post on Zulip Wasmtime GitHub notifications bot (Jun 30 2025 at 19:53):

fitzgen commented on issue #11141:

See also https://github.com/bytecodealliance/wasmtime/issues/11162

view this post on Zulip Wasmtime GitHub notifications bot (Jun 30 2025 at 20:05):

fitzgen commented on issue #11141:

See also https://github.com/bytecodealliance/wasmtime/issues/11163

view this post on Zulip Wasmtime GitHub notifications bot (Jun 30 2025 at 20:18):

fitzgen commented on issue #11141:

See also https://github.com/bytecodealliance/wasmtime/issues/11164


Last updated: Dec 06 2025 at 06:05 UTC