fitzgen opened issue #11159:
We use a very simple, first-fit free list for allocation in the DRC collector right now:
We should use something better, that has size classes and good heuristics and all that.
But also, we would ideally not write and maintain what is essentially our own `malloc` implementation ourselves, since getting all the details right and tuning them correctly for a variety of workloads is a lot of effort. The problem is that our `malloc` is outside the memory space that it is divvying up and parceling out: it is not inside the GC heap using and returning pointers within the GC heap, it is in the VM on the outside and is returning indices into the sandboxed region that makes up the GC heap, rather than pointers.
But... why not just compile `dlmalloc` (or something) to Wasm, have `dlmalloc` use the GC heap as its linear memory, and make GC allocations call `dlmalloc`'s allocation and deallocation routines? That would give us a battle-hardened allocator that is most certainly faster than our free list pretty much For Free and with trivial long-term maintenance burden. Compiling `dlmalloc` to Wasm, with the right flags and exported symbols, results in a ~5-6 KiB binary that doesn't use any Wasm globals or `funcref` tables and uses only ~500 bytes of static data. It seems very doable.
This will rely on a lot of the same kind of infrastructure we want to have for compile-time builtins.
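For readers unfamiliar with the setup being described, here is a minimal sketch of a first-fit free list that lives outside the heap and hands out `u32` indices rather than pointers. It is not the actual DRC collector code: the `FreeList` type, its methods, and the lack of alignment handling are all assumptions made for illustration.

```rust
/// Hypothetical illustration only; not the real Wasmtime free list.
/// The allocator state lives in the VM, outside the GC heap, and every
/// value it hands out is an index into the heap, never a pointer.
#[derive(Debug)]
pub struct FreeList {
    /// Free blocks as (start index, length), kept sorted by start index.
    blocks: Vec<(u32, u32)>,
}

impl FreeList {
    /// Create a free list covering indices `0..capacity`.
    pub fn new(capacity: u32) -> Self {
        FreeList { blocks: vec![(0, capacity)] }
    }

    /// First fit: take the first free block that is large enough and
    /// split off the unused tail.
    pub fn alloc(&mut self, size: u32) -> Option<u32> {
        if size == 0 {
            return None;
        }
        let i = self.blocks.iter().position(|&(_, len)| len >= size)?;
        let (start, len) = self.blocks[i];
        if len == size {
            self.blocks.remove(i);
        } else {
            self.blocks[i] = (start + size, len - size);
        }
        Some(start)
    }

    /// Return a block to the list and merge it with adjacent free blocks.
    pub fn dealloc(&mut self, start: u32, size: u32) {
        let i = self.blocks.partition_point(|&(s, _)| s < start);
        self.blocks.insert(i, (start, size));
        // Merge with the following block if the two are contiguous.
        if i + 1 < self.blocks.len()
            && self.blocks[i].0 + self.blocks[i].1 == self.blocks[i + 1].0
        {
            self.blocks[i].1 += self.blocks[i + 1].1;
            self.blocks.remove(i + 1);
        }
        // Merge with the preceding block if the two are contiguous.
        if i > 0 && self.blocks[i - 1].0 + self.blocks[i - 1].1 == self.blocks[i].0 {
            self.blocks[i - 1].1 += self.blocks[i].1;
            self.blocks.remove(i);
        }
    }
}
```

The linear scan in `alloc` and the absence of size classes are the kind of thing a mature allocator such as `dlmalloc` would improve on.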
Random kinks to work out:
- Do we compile `dlmalloc` into every GC-type-using module, similar to trampolines? (Including modules that just use `externref` and not full Wasm GC.)
  - This means there would be some duplicate copies of the code when you load multiple GC-type-using modules in the system, which is fine from a correctness POV, but is undesirable from a disk space usage POV, and a memory usage POV if you're on a system without virtual memory.
- Or do we try and have only a single copy of `dlmalloc` in a system that is running multiple modules?
  - We could compile `dlmalloc-gc-heap.wasm` on demand the first time a gc-type-using module is compiled, but that requires that wasmtime was built with a compiler itself.
  - We could compile `dlmalloc-gc-heap.wasm` at Wasmtime build time and `include_bytes!(...)` it into the Wasmtime binary, but that means additional, bootstrap-y, cyclic dependencies where you need a wasmtime to build wasmtime.
  - We could require that configuring Wasmtime to use the DRC collector means you have to give a `wasmtime::Drc` object to the `wasmtime::Config::collector` method, and you can either `wasmtime::Drc::new()` when wasmtime is built with a compiler or `wasmtime::Drc::from_serialized(...)` otherwise. However, this raises questions about `impl Default for Config` since `Config` contains the configured collector, which by default is the DRC collector. Does it no longer impl `Default` when there is no compiler? Do we change the default collector when there is no compiler? Do we just remove that `Default` impl?
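To make the last option concrete, here is a rough sketch of how a `Drc` object and the `Default` question might interact. None of these types are the real Wasmtime API: `Drc`, `Collector`, `Config`, and every method below are hypothetical stand-ins, and compiler availability is modeled as a runtime `bool` instead of a Cargo feature so that the example stays self-contained. It illustrates only one of the possible answers, namely changing the default collector when no compiler is present.

```rust
/// Hypothetical stand-in for a DRC collector that carries its compiled
/// allocator code. Not the real `wasmtime` type.
#[derive(Debug, Clone, PartialEq)]
pub struct Drc {
    allocator_code: Vec<u8>,
}

impl Drc {
    /// Compiling the allocator needs a compiler. A real build would
    /// gate this on a Cargo feature; a bool models that here.
    pub fn new(compiler_available: bool) -> Option<Self> {
        if compiler_available {
            // Placeholder bytes standing in for freshly compiled code.
            Some(Drc { allocator_code: b"compiled".to_vec() })
        } else {
            None
        }
    }

    /// Load precompiled allocator code, for builds without a compiler.
    pub fn from_serialized(bytes: &[u8]) -> Option<Self> {
        if bytes.is_empty() {
            None
        } else {
            Some(Drc { allocator_code: bytes.to_vec() })
        }
    }

    /// Size of the allocator code this collector carries.
    pub fn code_len(&self) -> usize {
        self.allocator_code.len()
    }
}

/// Hypothetical collector choice stored in the configuration.
#[derive(Debug, Clone, PartialEq)]
pub enum Collector {
    Drc(Drc),
    Null,
}

/// Hypothetical configuration holding the chosen collector.
#[derive(Debug)]
pub struct Config {
    collector: Collector,
}

impl Config {
    /// One possible answer to the `Default` question: fall back to a
    /// different collector when the DRC allocator cannot be compiled.
    pub fn with_default_collector(compiler_available: bool) -> Self {
        let collector = match Drc::new(compiler_available) {
            Some(drc) => Collector::Drc(drc),
            None => Collector::Null,
        };
        Config { collector }
    }

    /// Explicitly choose a collector, replacing the default.
    pub fn collector(&mut self, collector: Collector) -> &mut Self {
        self.collector = collector;
        self
    }

    /// Inspect the currently configured collector.
    pub fn current(&self) -> &Collector {
        &self.collector
    }
}
```

A compiler-less embedder would then call something like `cfg.collector(Collector::Drc(Drc::from_serialized(bytes)?))` to opt back in to DRC.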
fitzgen added the performance label to Issue #11159.
fitzgen added the wasm-proposal:gc label to Issue #11159.
alexcrichton commented on issue #11159:
> This will rely on a lot of the same kind of infrastructure we want to have for https://github.com/bytecodealliance/rfcs/pull/43.
I was thinking about this issue over the weekend and I'm actually no longer certain that this is true. I believe that this can be implemented entirely independently of host builtins via a scheme such as:
- Add a new API requirement for Wasmtime that there's something along the lines of `Engine::prepare_gc_implementation` which compiles `dlmalloc.wasm` or deserializes it or similar. Basically an intentional step executed once at startup, after you have an `Engine`, which enables use of the GC. Trying to do anything GC-related before this is called just fails at runtime saying "go call that method".
- The `GcRuntime` implementations would have a `Module` which corresponds to `dlmalloc.wasm`. The `new_gc_heap` method would take a `&mut StoreOpaque` which could be used to instantiate the `Module` into the store, allocating the GC heap and such.
- The `GcHeap` trait would either stay as-is or optionally change to returning `TypedFunc<..., ...>` for various accessors (e.g. alloc would return a `TypedFunc`, maybe gc-ing as well?)
- The `GcHeap` would store various `VM*` data structures in its "gc-specific data area" which is accessible from wasm, and the `GcCompiler` would use all of these.
Basically I don't think that any new trampolines are necessary, nor editing the compilation of any module. I think we could get away with this entirely with a `Module` stored in each GC implementation. Over time I think implementations like the current gc-or-maybe-grow logic would go entirely in `GcHeap` implementations to allow customizing that per-GC as necessary.
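The "prepare once, fail otherwise" flow from the first two bullets could look roughly like the sketch below. These are mock types, not Wasmtime's: `Engine`, `AllocatorModule`, `GcHeap`, `GcError`, and all method signatures are assumptions made for illustration, and the step that actually compiles or deserializes `dlmalloc.wasm` is replaced by storing raw bytes.

```rust
use std::sync::OnceLock;

/// Hypothetical stand-in for the compiled or deserialized allocator module.
#[derive(Debug)]
pub struct AllocatorModule {
    bytes: Vec<u8>,
}

/// Hypothetical errors for the GC preparation flow.
#[derive(Debug, PartialEq)]
pub enum GcError {
    /// GC was used before the preparation step ("go call that method").
    NotPrepared,
    /// The preparation step was run more than once.
    AlreadyPrepared,
}

/// Mock engine; the allocator module is set at most once, after creation.
#[derive(Debug, Default)]
pub struct Engine {
    gc_allocator: OnceLock<AllocatorModule>,
}

impl Engine {
    /// Intentional one-time startup step that enables use of the GC.
    /// A real version would compile or deserialize the module here.
    pub fn prepare_gc_implementation(&self, bytes: Vec<u8>) -> Result<(), GcError> {
        self.gc_allocator
            .set(AllocatorModule { bytes })
            .map_err(|_| GcError::AlreadyPrepared)
    }

    /// Creating a GC heap fails at runtime unless preparation has run.
    pub fn new_gc_heap(&self) -> Result<GcHeap<'_>, GcError> {
        let allocator = self.gc_allocator.get().ok_or(GcError::NotPrepared)?;
        Ok(GcHeap { allocator })
    }
}

/// Mock GC heap that borrows the engine's shared allocator module.
pub struct GcHeap<'e> {
    allocator: &'e AllocatorModule,
}

impl GcHeap<'_> {
    /// Size of the allocator module backing this heap.
    pub fn allocator_size(&self) -> usize {
        self.allocator.bytes.len()
    }
}
```

Storing the module behind a `OnceLock` keeps a single copy per engine, which also speaks to the duplicate-code concern raised in the original issue.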
Last updated: Dec 06 2025 at 06:05 UTC