Hey all! I'm having a problem with the pooling allocator for wasmtime. What it boils down to is that we can never get the pooling allocator to work on a device that has less than 2 GB-ish of memory. I tried using the defaults, then lowering the defaults even further for all the options, and every time it looked like it tried to allocate the same amount of memory, returning this error:
Error: failed to initialize host
Caused by:
0: failed to build runtime
1: failed to construct engine
2: failed to create stack pool mapping
3: mmap failed to allocate 0x7d3e8000 bytes
4: Cannot allocate memory (os error 12)
So no matter what I did, it tried to allocate 0x7d3e8000 bytes. So my question is: do any of the options actually help tweak this? And as a follow-up, should we just be using the dynamic allocator on small systems? I keep trying to dig into the code, but it is a bit too low level for me, and it's hard to follow what the actual impact on allocated memory is.
Just to be clear, here is what I've tried:
let mut pooling_config = PoolingAllocationConfig::default();
pooling_config
    .total_component_instances(500)
    .total_memories(500)
    .total_tables(500)
    .linear_memory_keep_resident((10 * MB) as usize)
    .table_keep_resident((10 * MB) as usize);
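(For context, a minimal sketch of how a pooling config like the one above typically gets handed to the engine; this assumes wasmtime's Config/Engine API rather than the asker's actual host code, and continues from the pooling_config builder in the snippet above:)

    // Tell the engine to use the pooling allocator with the config built above.
    let mut config = wasmtime::Config::new();
    config.allocation_strategy(wasmtime::InstanceAllocationStrategy::Pooling(pooling_config));
    let engine = wasmtime::Engine::new(&config).expect("failed to construct engine");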
Anyone have any ideas here?
Also, it has failed in both the TablePool allocation and the StackPool allocation, depending on the settings.
assuming this is a Linux machine -- have you tried tweaking the VM overcommit settings?
in general we're pretty free about mmap'ing large regions and they'll be only sparsely populated; the actual RSS should be close to the sum of all instances' heaps, tables, vmcontexts
Yeah, the failures have been on Linux. Let me try tweaking the overcommit settings
That did it
Setting vm.overcommit_memory to 1, that is
The default heuristic didn't work
it's worth doing the math on maximum actual RSS too to make sure the default heuristic wasn't "onto something" (i.e. was actually reasonable)
ballpark math can be as simple as number-of-slots * max-heap-size
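As a rough, illustrative worst case using the earlier 500-slot config and wasmtime's default 4 GiB static maximum per memory:

    500 memory slots × 4 GiB max heap ≈ 2 TiB of potential RSS

if every memory actually grew to its maximum, which is nowhere near fitting in under 2 GB of physical RAM.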
Yeah, the default overcommit (on _some_ distros) only allows overcommit up to the actual physical RAM size
So the follow-up here is why it had issues even when I lowered the maximums for the number of components and memories. Am I not tweaking it right?
the default guard region size is 2GiB, so each linear memory slot the pooling allocator creates will reserve 6GiB of address space
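Quick illustrative arithmetic, reusing the total_memories(500) value from the snippet earlier:

    500 memory slots × 6 GiB reserved per slot ≈ 3 TiB of virtual address space

for linear memories alone, before the table and stack pools are counted.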
Yeah, I knew that the guard region was configurable but that didn't help either without overcommit being changed
Would the proper thing be to lower the static_memory_maximum_size to be something like 2GB instead of 4GB?
Well, that seemed to do it for me. Lowered memory size to 2GB and guard to 1GB
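A minimal sketch of that combination, assuming a wasmtime version that still exposes Config::static_memory_maximum_size and Config::static_memory_guard_size (these may be renamed in newer releases), an anyhow dependency, and a hypothetical build_engine helper name:

    use wasmtime::{Config, Engine, InstanceAllocationStrategy, PoolingAllocationConfig};

    fn build_engine() -> anyhow::Result<Engine> {
        const GIB: u64 = 1 << 30;

        let mut pooling = PoolingAllocationConfig::default();
        pooling.total_memories(500).total_tables(500);

        let mut config = Config::new();
        config
            // Cap each linear-memory slot's reservation at 2 GiB (default is 4 GiB).
            .static_memory_maximum_size(2 * GIB)
            // Shrink the guard region from the 2 GiB default to 1 GiB.
            .static_memory_guard_size(GIB)
            .allocation_strategy(InstanceAllocationStrategy::Pooling(pooling));

        Engine::new(&config)
    }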
I'm curious about your other knobs, most particularly the number of slots in the pooling allocator -- if an individual module has a max memory size of 2GiB, and you have more than one slot in use, you'll exceed your system's 2GiB physical memory
(overcommit / optimistic underprovisioning is of course a thing, but at least in some contexts one wants to size for worst-case instead)
By slots do you mean "number of components" or something else? With overprovisioning here, I'd assume it doesn't actually consume all that memory until it is actually used (since it is just in the virtual address space)
related, i'm not sure if a guard page size > 0 makes sense for a static memory size < 4GB anyway, as bounds checks will always be emitted (i think the pooling allocator might still reserve those pages even though they'll never be hit by an out-of-bounds memory access)
and by slots, I think it would be total_memories in this case (there's also total_stacks to think about for async)
Oh yep you're right:
For 32-bit wasm memories a 4GB static memory is required to even start removing bounds checks.
but to Chris' point, I'd expect any static memory size < 4GB in a pooling allocator on a device with constrained memory to be closer to a value that one would expect to be able to support 500 concurrently used linear memories for; like 4 MB or something
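Illustrative arithmetic for that sizing: 500 memories × 4 MiB ≈ 2 GiB of linear memory in the worst case, versus 500 × 4 GiB ≈ 2 TiB with the default static memory size.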
Yep, this is all starting to make some sense. I might document this or write a blog post so there are some more examples out there
Thanks all for the help here. I think I have a semi-decent grasp on things now
Taylor Thomas has marked this topic as resolved.