jameysharp opened issue #5829:
Feature
The
EntityList
/ListPool
types declared incranelift/entity/src/list.rs
allocate arrays from a large pool of memory. Empty lists use no space in the pool. Otherwise, lists use a minimum of 4 elements of the pool. With a little cleverness, I believe we can arrange that lists of length 1 use no space in the pool either. Alternatively, we can at least reduce the minimum pool allocation to 2 elements instead of 4 and test whether that's an improvement.Benefit
If we have a lot of length-1
EntityList
s, then this could save a lot of memory. It might also save on memory bandwidth, reduce cache pressure, and avoid pipeline stalls from chasing indirections.Part of completing this issue is checking whether this change really does improve performance. It could be that the extra complexity of handling singleton lists specially takes more time than we save, or that most lists have at least two elements, or that I've misjudged the possible microarchitectural benefits.
That said, my guess is that we do have a lot of length-1 lists, especially now that
BlockCall
s are stored in theValueListPool
. (A branch to a block that takes no block-parameters is stored as a singleton list containing only theBlock
index.)Implementation
The basic idea is similar to smallvec.
EntityList
holds an index to the location of the list within theListPool
. For a singleton list, we can store the one list element inside theEntityList
, instead of interpreting it as an index into the pool.There are two details that make this worth considering:
All of our types that implement
EntityRef
areu32
internally, and so isEntityList
. If entities were larger this wouldn't be worth doing.Some bit-patterns in an
EntityList
are unused. A storage block within aListPool
always starts at an offset that is at least a multiple of 4, and the index stored inEntityList
is one past the start of the block. So currently, anEntityList
is either zero (representing an empty list), or its least-significant two bits are01
.And one detail that makes this tricky: An
EntityList
can return a slice, possibly mutable, of its contents. To make this work with singleton lists stored directly inside thestruct EntityList<T>
itself, we'd need to be able to borrow a&mut [T]
out of it without enabling callers to break our invariants.One way to do that is when someone asks for a mutable slice, promote a singleton list into a pool-backed list. This is bad if we need to do that for most lists, since it undoes any advantages we might get from making this change in the first place. For
BlockCall
specifically we can do better, because at any given time we only want to access either the first element or a slice of the remaining elements, and for singleton lists we can return&mut []
without needing any backing storage.Alternatives
The simplest alternative which might help is to reduce the minimum block allocation size from 4 to 2 in
ListPool
. If all our lists were singletons, that change would cut memory usage in these pools by 50%; counting the size of thestruct EntityList
it's a savings of 40%. That might be a better choice than all of the above due to its simplicity, but doesn't save as much memory.
jameysharp labeled issue #5829:
Feature
The
EntityList
/ListPool
types declared incranelift/entity/src/list.rs
allocate arrays from a large pool of memory. Empty lists use no space in the pool. Otherwise, lists use a minimum of 4 elements of the pool. With a little cleverness, I believe we can arrange that lists of length 1 use no space in the pool either. Alternatively, we can at least reduce the minimum pool allocation to 2 elements instead of 4 and test whether that's an improvement.Benefit
If we have a lot of length-1
EntityList
s, then this could save a lot of memory. It might also save on memory bandwidth, reduce cache pressure, and avoid pipeline stalls from chasing indirections.Part of completing this issue is checking whether this change really does improve performance. It could be that the extra complexity of handling singleton lists specially takes more time than we save, or that most lists have at least two elements, or that I've misjudged the possible microarchitectural benefits.
That said, my guess is that we do have a lot of length-1 lists, especially now that
BlockCall
s are stored in theValueListPool
. (A branch to a block that takes no block-parameters is stored as a singleton list containing only theBlock
index.)Implementation
The basic idea is similar to smallvec.
EntityList
holds an index to the location of the list within theListPool
. For a singleton list, we can store the one list element inside theEntityList
, instead of interpreting it as an index into the pool.There are two details that make this worth considering:
All of our types that implement
EntityRef
areu32
internally, and so isEntityList
. If entities were larger this wouldn't be worth doing.Some bit-patterns in an
EntityList
are unused. A storage block within aListPool
always starts at an offset that is at least a multiple of 4, and the index stored inEntityList
is one past the start of the block. So currently, anEntityList
is either zero (representing an empty list), or its least-significant two bits are01
.And one detail that makes this tricky: An
EntityList
can return a slice, possibly mutable, of its contents. To make this work with singleton lists stored directly inside thestruct EntityList<T>
itself, we'd need to be able to borrow a&mut [T]
out of it without enabling callers to break our invariants.One way to do that is when someone asks for a mutable slice, promote a singleton list into a pool-backed list. This is bad if we need to do that for most lists, since it undoes any advantages we might get from making this change in the first place. For
BlockCall
specifically we can do better, because at any given time we only want to access either the first element or a slice of the remaining elements, and for singleton lists we can return&mut []
without needing any backing storage.Alternatives
The simplest alternative which might help is to reduce the minimum block allocation size from 4 to 2 in
ListPool
. If all our lists were singletons, that change would cut memory usage in these pools by 50%; counting the size of thestruct EntityList
it's a savings of 40%. That might be a better choice than all of the above due to its simplicity, but doesn't save as much memory.
jameysharp labeled issue #5829:
Feature
The
EntityList
/ListPool
types declared incranelift/entity/src/list.rs
allocate arrays from a large pool of memory. Empty lists use no space in the pool. Otherwise, lists use a minimum of 4 elements of the pool. With a little cleverness, I believe we can arrange that lists of length 1 use no space in the pool either. Alternatively, we can at least reduce the minimum pool allocation to 2 elements instead of 4 and test whether that's an improvement.Benefit
If we have a lot of length-1
EntityList
s, then this could save a lot of memory. It might also save on memory bandwidth, reduce cache pressure, and avoid pipeline stalls from chasing indirections.Part of completing this issue is checking whether this change really does improve performance. It could be that the extra complexity of handling singleton lists specially takes more time than we save, or that most lists have at least two elements, or that I've misjudged the possible microarchitectural benefits.
That said, my guess is that we do have a lot of length-1 lists, especially now that
BlockCall
s are stored in theValueListPool
. (A branch to a block that takes no block-parameters is stored as a singleton list containing only theBlock
index.)Implementation
The basic idea is similar to smallvec.
EntityList
holds an index to the location of the list within theListPool
. For a singleton list, we can store the one list element inside theEntityList
, instead of interpreting it as an index into the pool.There are two details that make this worth considering:
All of our types that implement
EntityRef
areu32
internally, and so isEntityList
. If entities were larger this wouldn't be worth doing.Some bit-patterns in an
EntityList
are unused. A storage block within aListPool
always starts at an offset that is at least a multiple of 4, and the index stored inEntityList
is one past the start of the block. So currently, anEntityList
is either zero (representing an empty list), or its least-significant two bits are01
.And one detail that makes this tricky: An
EntityList
can return a slice, possibly mutable, of its contents. To make this work with singleton lists stored directly inside thestruct EntityList<T>
itself, we'd need to be able to borrow a&mut [T]
out of it without enabling callers to break our invariants.One way to do that is when someone asks for a mutable slice, promote a singleton list into a pool-backed list. This is bad if we need to do that for most lists, since it undoes any advantages we might get from making this change in the first place. For
BlockCall
specifically we can do better, because at any given time we only want to access either the first element or a slice of the remaining elements, and for singleton lists we can return&mut []
without needing any backing storage.Alternatives
The simplest alternative which might help is to reduce the minimum block allocation size from 4 to 2 in
ListPool
. If all our lists were singletons, that change would cut memory usage in these pools by 50%; counting the size of thestruct EntityList
it's a savings of 40%. That might be a better choice than all of the above due to its simplicity, but doesn't save as much memory.
Last updated: Dec 23 2024 at 12:05 UTC