fitzgen opened issue #6627:
We should use a struct-of-free-lists representation for the pooling allocator, rather than our current free-list-of-structs representation.
That is, right now we have a single free list where each slot has space for the vmctx, table(s), memory(s), etc. This is wasteful in a component-model world, where each instance in the component is going to reserve a full pooling allocator slot of tables/memories/etc even if it is not using all of that state (e.g. the instance only imports memories).
If, instead, we had separate free lists for each of vmctxs, tables, memories, etc., then we could allocate the precise number of things that the component will use, without any slop.
This does mean that we will do len(vmctxs) + len(tables) + len(memories) + ... free list allocations rather than only len(instances). This is probably fine, however, because we can have a single mutex protecting all free lists rather than per-free-list locks, and the actual free list allocation is (probably?) pretty cheap in comparison to the locking involved. Note also that only the free list for memories cares about affinity, so we can use a simplified and optimized free list implementation for all other entities.
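A minimal sketch may help make the proposal concrete. The types and fields below are hypothetical, not Wasmtime's actual pooling-allocator types; the point is only the shape of the data: one free list of bundled slots versus one free list per kind of entity.

```rust
// Hypothetical sketch; names do not correspond to Wasmtime's real types.

// Placeholder storage types so the sketch is self-contained.
struct VmctxStorage;
struct TableStorage;
struct MemoryStorage;

/// Current shape: a single free list whose slots each bundle every kind of
/// state, so an instance that e.g. only imports its memories still ties up a
/// full slot's worth of tables and memories.
struct FreeListOfStructs {
    free: Vec<usize>, // indices of unused entries in `slots`
    slots: Vec<InstanceSlot>,
}

struct InstanceSlot {
    vmctx: VmctxStorage,
    tables: Vec<TableStorage>,    // sized for the configured per-instance maximum
    memories: Vec<MemoryStorage>, // sized for the configured per-instance maximum
}

/// Proposed shape: one free list per kind of entity, so each instance in a
/// component claims exactly the vmctxs/tables/memories it actually defines.
struct StructOfFreeLists {
    vmctxs: FreeList<VmctxStorage>,
    tables: FreeList<TableStorage>,
    memories: FreeList<MemoryStorage>, // the only list that needs affinity
}

struct FreeList<T> {
    free: Vec<usize>, // indices of unused entries in `storage`
    storage: Vec<T>,
}

impl<T> FreeList<T> {
    /// Pop a free index; instance allocation now does one of these per entity
    /// the instance needs, rather than a single pop total.
    fn alloc(&mut self) -> Option<usize> {
        self.free.pop()
    }

    fn dealloc(&mut self, index: usize) {
        self.free.push(index);
    }
}
```

Under the second layout the cost is len(vmctxs) + len(tables) + len(memories) pops instead of one, as noted above, but all of them can happen under a single lock.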
fitzgen labeled issue #6627.
fitzgen labeled issue #6627.
fitzgen labeled issue #6627.
alexcrichton commented on issue #6627:
I think that concretely the resources to manage at this time are tables, memories, and stacks. The instances (vmctxs) were moved into malloc a while back and are no longer managed by the pooling allocator. Stacks are actually already in the struct-of-free-lists form, with a separate index allocator from tables/memories. Tables and memories, however, are bundled together as one allocator.
In terms of committed memory I don't think a struct-of-free-lists approach will help much, because deallocated slots have their memory decommitted. In terms of VM space I think it can improve, because if an embedder wants to allow a component to take up to M linear memories then supporting N concurrent instances may not require M*N slots of linear memory (assuming not all components take M linear memories).
In terms of performance I don't think this will be much of an issue. The mutex around the free list has not been a bottleneck historically, so splitting it into two mutexes shouldn't hurt. This'll allow the affinity strategy to be used just for linear memories, while tables could use a simpler allocation scheme.
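As a rough illustration of that split, here is a hedged sketch (hypothetical types, not Wasmtime's actual index allocators) of a plain free-list index allocator, which would suffice for tables and stacks, next to an affinity-aware one for linear memories that prefers a slot last used by the same module. The real implementation also has to deal with evicting other modules' affine slots when nothing else is free, which is omitted here.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Plain free-list index allocator: any free slot is as good as any other,
/// which is all tables and stacks need.
struct SimpleIndexAllocator {
    free: Mutex<Vec<u32>>,
}

impl SimpleIndexAllocator {
    fn new(capacity: u32) -> Self {
        SimpleIndexAllocator {
            free: Mutex::new((0..capacity).collect()),
        }
    }

    fn alloc(&self) -> Option<u32> {
        self.free.lock().unwrap().pop()
    }

    fn dealloc(&self, index: u32) {
        self.free.lock().unwrap().push(index);
    }
}

/// Affinity-aware allocator, only needed for linear memories: prefer a slot
/// previously used by the same module (keyed here by an opaque module id) so
/// its resident pages and mappings can be reused.
struct AffinityIndexAllocator {
    state: Mutex<AffinityState>,
}

struct AffinityState {
    per_module: HashMap<u64, Vec<u32>>, // free slots with affinity to a module
    unaffine: Vec<u32>,                 // free slots with no useful affinity
}

impl AffinityIndexAllocator {
    fn alloc(&self, module: u64) -> Option<u32> {
        let mut state = self.state.lock().unwrap();
        // First try a slot this module used before, then fall back to any
        // other free slot. (Eviction of other modules' affine slots omitted.)
        if let Some(index) = state.per_module.get_mut(&module).and_then(Vec::pop) {
            return Some(index);
        }
        state.unaffine.pop()
    }

    fn dealloc(&self, module: u64, index: u32) {
        let mut state = self.state.lock().unwrap();
        state.per_module.entry(module).or_default().push(index);
    }
}
```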
When we talked about this we were thinking that all we would need to add is a descriptive error explaining why instance allocation failed, but thinking on it more I think we'll also need accessors for how many resources each component/instance acquires. For example, let's say there's an embedding that wants to support 1000 concurrent instances where some components could take up to 4 linear memories. The embedding doesn't want to reserve VM space for 4000 linear memories, so instead it reserves 1500 linear memory slots in the VM address space. In this situation the embedder may want to handle components that actually require 4 linear memories a little differently than other components. For example, linear memories could be broken down as: 500 for 1-memory instances, 200 for 2-memory instances, 100 for 3-memory instances, and 75 for 4-memory instances. If a 4-memory component is being instantiated but all 75 slots are already taken, the embedder may wish to queue up the instantiation independently of whether there's actually space for it.
Basically that's just a long-winded way of saying I think this is a good idea to implement, but I think we'll want to be able to learn, from a compiled &Module or &Component, how many resources are going to be consumed. That plus a descriptive error when allocation fails I think should be good enough for now, at least for embedders to integrate components better.
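To make the bucketing example concrete: the breakdown above adds up to 500·1 + 200·2 + 100·3 + 75·4 = 1500 memory slots, matching the reserved VM space. A hedged sketch of the embedder-side admission logic might look like the following; ResourcesRequired and its field are stand-ins for whatever accessor ends up exposed on a compiled &Module/&Component, not an actual Wasmtime API.

```rust
use std::collections::VecDeque;

/// Hypothetical summary of what a compiled module/component will consume; in
/// practice this would come from an accessor on the compiled artifact.
#[derive(Clone, Copy)]
struct ResourcesRequired {
    num_memories: u32,
}

/// Embedder-side bookkeeping for the 500/200/100/75 example: each
/// instantiation is admitted against a budget keyed by how many memories it
/// needs, and requests that don't fit are queued instead of being attempted.
struct Admission {
    // free[i] = remaining budget for components that need i + 1 memories.
    free: [u32; 4],
    queued: VecDeque<ResourcesRequired>,
}

impl Admission {
    fn new() -> Self {
        Admission {
            free: [500, 200, 100, 75],
            queued: VecDeque::new(),
        }
    }

    /// Returns true if instantiation may proceed now; otherwise the request
    /// is queued until an instance with the same footprint is released.
    fn try_admit(&mut self, req: ResourcesRequired) -> bool {
        let bucket = (req.num_memories.max(1) as usize - 1).min(3);
        if self.free[bucket] > 0 {
            self.free[bucket] -= 1;
            true
        } else {
            self.queued.push_back(req);
            false
        }
    }

    fn release(&mut self, req: ResourcesRequired) {
        let bucket = (req.num_memories.max(1) as usize - 1).min(3);
        self.free[bucket] += 1;
    }
}
```

Pairing something like this with a descriptive error when the pooling allocator itself rejects an allocation, as suggested above, would cover the cases such accounting misses.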
fitzgen assigned issue #6627 to fitzgen.
fitzgen commented on issue #6627:
Gonna start poking at this.
fitzgen commented on issue #6627:
This was fixed in https://github.com/bytecodealliance/wasmtime/pull/6835/
fitzgen closed issue #6627.