Stream: wasmtime

Topic: How to deal with "GC heap out of memory" errors?


Alex (Nov 18 2024 at 13:43):

I'm experimenting with Kotlin as a Wasm guest. It requires the runtime to implement the Wasm GC proposal. The only Wasmtime version that currently works is the prerelease one, so I apologize if this issue comes from code that is still WIP.

I'm embedding Wasmtime in Go (I manually updated the Wasmtime version in wasmtime-go) and running a benchmark that calls a simple guest function multiple times:

@OptIn(UnsafeWasmMemoryApi::class)
@WasmExport("test")
fun memSimple4IntsVoid(callId: ULong, dataSize: Int) {
    withScopedMemoryAllocator { allocator ->
        val ptr = allocator.allocate(dataSize)
        val result = requestHostToStoreArguments(callId, ptr.address)
        // ... error handling

        val x = ptr.loadInt()
        println("Received $x")
    }
}

After several successful calls of this function from the host, a "GC heap out of memory" error occurs, which I assume is a Wasmtime error. Kotlin's withScopedMemoryAllocator is supposed to free all the memory it allocates, and instance.GetExport(store, "memory").Memory().Size(store) shows that only 1 page is allocated at the time of the error, so linear memory doesn't grow during the test.

Memory section of the guest looks like this:

Memory[1]:
 - memory[0] pages: initial=1

Max memory is not limited in the module, nor do I use a Wasmtime limiter.

I'm not sure where exactly the error happens. I've attempted to catch(e: Throwable) in the guest and print it to stdout but it doesn't catch anything.

Is this a Kotlin issue, a Wasmtime issue, or some sort of bug/misconfiguration on my side? Thank you!

Alex (Nov 18 2024 at 16:00):

Ok, I see now that Wasm GC memory is supposed to be separate from linear memory according to the proposal. Is its size configurable?

Lann Martin (Nov 18 2024 at 16:05):

You might have to wait for @fitzgen (he/him) to get back from vacation to get much help on GC; there aren't many of us that understand Wasmtime GC yet as (afaik) it is still a work in progress.

Alex Crichton (Nov 18 2024 at 16:26):

Yes wasm gc memory is separate from linear memory. What'll need to happen here is either:

What you're running into is most likely an instance of the first here where a GC should be performed but it's not being performed automatically. (Nick would know more). The second point here isn't currently possible from Go I believe as the GC operation isn't exposed in the C API which is what wasmtime-go builds on.

Alex (Nov 18 2024 at 17:19):

Ok, got it, thank you!

Zalim Bashorov (Kotlin_, JetBrains) (Nov 19 2024 at 20:12):

GC heap out of memory

Looks like an issue on the Wasmtime side: it seems you are running out of GC heap memory.

Kotlin's withScopedMemoryAllocator uses Wasm linear memory, which is separate from the GC heap in terms of wasm.

I've attempted to catch(e: Throwable) in the guest and print it to stdout but it doesn't catch anything.

It's not an error from Kotlin, so you can't catch it.
Additionally, Wasmtime doesn't support the Exception Handling proposal yet, which Kotlin uses for exceptions. So if you throw an exception from Kotlin code, you will get a trap at that point at runtime, which means execution of the Wasm module stops.

Alex (Nov 20 2024 at 12:55):

Thanks, I'll retest when Wasm GC is officially released in Wasmtime.

CryZe (Nov 30 2024 at 10:07):

I'm also seeing these errors in the released wasmtime 27, so this is not resolved yet it seems. I'm running a small function about 120 times per second that creates like 3 small GC objects each time (not retained in any way). After about 2:15 minutes wasmtime runs out of memory. (To clarify, this is not the null GC and there are no cycles)
image.png

CryZe (Dec 01 2024 at 14:52):

So what seems to happen is that the GC heap runs out of memory, but before erroring out, it tries to collect all the garbage and tries allocating again. So it seems like the problem is that the GC (DRC) does not find all garbage properly (there's no cycles, so this should not happen).

fitzgen (he/him) (Dec 04 2024 at 17:19):

hey -- just got back from vacation, still working on catching up with my inbox and everything that's happened while I was off

fitzgen (he/him) (Dec 04 2024 at 17:20):

CryZe said:

So what seems to happen is that the GC heap runs out of memory, but before erroring out, it tries to collect all the garbage and tries allocating again. So it seems like the problem is that the GC (DRC) does not find all garbage properly (there's no cycles, so this should not happen).

FYI: the DRC collector still only does ref-counting operations shallowly (i.e. it will not transitively decrement reference counts) which means that even most acyclic garbage will lead to leaks (currently; this is not fundamental and not intended long term, this is all still just a very WIP implementation)
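To illustrate why shallow ref-counting leaks even acyclic garbage, here is a minimal toy model in Go. It is only a sketch of the general technique, not Wasmtime's DRC implementation; the `object` type and the `decRef` and `leakedAfterDrop` helpers are hypothetical names made up for this example.

```go
package main

import "fmt"

// object is a toy ref-counted heap cell. It is NOT Wasmtime's representation.
type object struct {
	refs     int
	children []*object
	freed    bool
}

// decRef drops one reference and frees the object when the count hits zero.
// With transitive=false the children's counts are left untouched (the shallow
// behaviour described above), so they can never reach zero afterwards.
func decRef(o *object, transitive bool) {
	o.refs--
	if o.refs > 0 {
		return
	}
	o.freed = true
	if transitive {
		for _, c := range o.children {
			decRef(c, true)
		}
	}
}

// leakedAfterDrop builds parent -> child (no cycle), drops the only root
// reference to the parent, and reports whether the child is still allocated.
func leakedAfterDrop(transitive bool) bool {
	child := &object{refs: 1} // referenced only by parent
	parent := &object{refs: 1, children: []*object{child}}
	decRef(parent, transitive)
	return !child.freed
}

func main() {
	fmt.Println("shallow decrement leaks child:   ", leakedAfterDrop(false))
	fmt.Println("transitive decrement leaks child:", leakedAfterDrop(true))
}
```

In this model the child stays allocated forever under the shallow policy, because the only reference to it lived inside an object that was freed without visiting its fields.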

CryZe (Dec 04 2024 at 18:14):

I'm not sure that is necessarily it, because afair these objects were all shallow to begin with. But I guess we can check again after that's addressed.

fitzgen (he/him) (Dec 04 2024 at 18:32):

CryZe said:

I'm not sure that is necessarily it, because afair these objects were all shallow to begin with. But I guess we can check again after that's addressed.

if you can construct a small .wat that shows memory exhaustion under the DRC collector despite not needing transitive decrementing, that would be super useful

CryZe (Dec 08 2024 at 21:04):

I've reduced it to a minimal WAT file now... and indeed shallow objects are all you need. In fact, I'm not even sure it's collecting anything at all considering I can't even really make the reproduction much smaller than that: https://gist.github.com/CryZe/951bc9fa0dc265adbac3d883484e43a0

After ~16500 calls to update it runs out of memory. Weirdly enough, it has consistently been exactly this number throughout the last 1.5 weeks of me using wasmtime with GC, almost regardless of how much I allocate on the GC heap. Could it be that there's some general per-call garbage left behind that is independent of the actual struct.new that I do?


CryZe (Dec 08 2024 at 21:09):

And yes it's the DRC GC as I'm also getting panics like these occasionally:
image.png

CryZe (Dec 08 2024 at 21:35):

Actually if the GC heap is 512 KiB then the ~16500 is probably actually 16384, which would mean each call to update leaks exactly 32 bytes of memory.

CryZe (Dec 08 2024 at 21:47):

I've just now tried different sizes and it seems like it always allocates in multiples of 32 bytes... and leaks them entirely. (The 32 bytes rounding also would explain why different types always all errored out after the same amount of update calls)
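The arithmetic in the last two messages can be checked with a small Go sketch. It assumes a 512 KiB GC heap (CryZe's guess, not a confirmed value), a 32-byte allocation granularity, and that nothing is ever reclaimed; `roundUp` and `callsUntilFull` are hypothetical helper names for this example.

```go
package main

import "fmt"

// blockSize is the allocation granularity observed in the thread (bytes).
const blockSize = 32

// roundUp rounds n up to the next multiple of blockSize.
func roundUp(n int) int {
	return (n + blockSize - 1) / blockSize * blockSize
}

// callsUntilFull returns how many allocations of objSize bytes fit into
// heapBytes when every allocation is rounded up and none is ever freed.
func callsUntilFull(heapBytes, objSize int) int {
	return heapBytes / roundUp(objSize)
}

func main() {
	heap := 512 * 1024 // assumed GC heap size in bytes
	for _, size := range []int{8, 16, 24, 32, 40} {
		fmt.Printf("object of %2d bytes -> block of %2d bytes -> %d calls\n",
			size, roundUp(size), callsUntilFull(heap, size))
	}
}
```

Under these assumptions every object of up to 32 bytes exhausts the heap after the same number of calls, which would match the observation that different types all fail at the same point.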

fitzgen (he/him) (Dec 10 2024 at 14:06):

Thanks for the reduction! these things help a ton

fwiw, it is probably 16383 since the first slot is always empty so that all GC refs are "non-null"

fyi, I probably won't get to this for a couple weeks, maybe not until the new year, since I need to write up Wasmtime's end-of-year recap blog post and all that kind of stuff

fitzgen (he/him) (Dec 10 2024 at 14:08):

and yes, we have minimum alignment and block size for all GC objects in the DRC: https://github.com/bytecodealliance/wasmtime/blob/main/crates/wasmtime/src/runtime/vm/gc/enabled/free_list.rs#L15-L21
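As a rough illustration of the "first slot is always empty" point, the following Go sketch counts how many fixed-size blocks a heap can hand out when index 0 is reserved so that a zero reference can mean null. It uses a simple bump allocator, not Wasmtime's free list, and the 512 KiB heap size is the assumption from earlier in the thread; `bumpHeap`, `alloc` and `countAllocs` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	heapBytes = 512 * 1024 // assumed GC heap size
	blockSize = 32         // minimum block size discussed above
)

var errOOM = errors.New("GC heap out of memory")

// bumpHeap hands out block indices and never frees them.
// Index 0 is reserved so that every valid reference is non-zero.
type bumpHeap struct {
	next, limit uint32
}

func newBumpHeap() *bumpHeap {
	return &bumpHeap{next: 1, limit: heapBytes / blockSize}
}

// alloc returns the next free block index, or errOOM when the heap is full.
func (h *bumpHeap) alloc() (uint32, error) {
	if h.next >= h.limit {
		return 0, errOOM
	}
	ref := h.next
	h.next++
	return ref, nil
}

// countAllocs allocates until the heap reports exhaustion and returns the
// number of successful allocations.
func countAllocs() int {
	h := newBumpHeap()
	n := 0
	for {
		if _, err := h.alloc(); err != nil {
			return n
		}
		n++
	}
}

func main() {
	fmt.Println("successful allocations:", countAllocs())
}
```

With one of the 16384 blocks reserved, this model allows 16383 allocations before the out-of-memory error.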


Last updated: Dec 23 2024 at 14:03 UTC