wasmtime / issue #5829 cranelift-entity: Improve cost of ... · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / issue #5829 cranelift-entity: Improve cost of ...

Wasmtime GitHub notifications bot (Feb 18 2023 at 00:58):

jameysharp opened issue #5829:

Feature

The EntityList/ListPool types declared in cranelift/entity/src/list.rs allocate arrays from a large pool of memory. Empty lists use no space in the pool. Otherwise, lists use a minimum of 4 elements of the pool. With a little cleverness, I believe we can arrange that lists of length 1 use no space in the pool either. Alternatively, we can at least reduce the minimum pool allocation to 2 elements instead of 4 and test whether that's an improvement.

Benefit

If we have a lot of length-1 EntityLists, then this could save a lot of memory. It might also save on memory bandwidth, reduce cache pressure, and avoid pipeline stalls from chasing indirections.

Part of completing this issue is checking whether this change really does improve performance. It could be that the extra complexity of handling singleton lists specially takes more time than we save, or that most lists have at least two elements, or that I've misjudged the possible microarchitectural benefits.

That said, my guess is that we do have a lot of length-1 lists, especially now that BlockCalls are stored in the ValueListPool. (A branch to a block that takes no block-parameters is stored as a singleton list containing only the Block index.)

Implementation

The basic idea is similar to smallvec. EntityList holds an index to the location of the list within the ListPool. For a singleton list, we can store the one list element inside the EntityList, instead of interpreting it as an index into the pool.

There are two details that make this worth considering:

All of our types that implement EntityRef are u32 internally, and so is EntityList. If entities were larger this wouldn't be worth doing.

Some bit-patterns in an EntityList are unused. A storage block within a ListPool always starts at an offset that is at least a multiple of 4, and the index stored in EntityList is one past the start of the block. So currently, an EntityList is either zero (representing an empty list), or its least-significant two bits are 01.

And one detail that makes this tricky: An EntityList can return a slice, possibly mutable, of its contents. To make this work with singleton lists stored directly inside the struct EntityList<T> itself, we'd need to be able to borrow a &mut [T] out of it without enabling callers to break our invariants.

One way to do that is when someone asks for a mutable slice, promote a singleton list into a pool-backed list. This is bad if we need to do that for most lists, since it undoes any advantages we might get from making this change in the first place. For BlockCall specifically we can do better, because at any given time we only want to access either the first element or a slice of the remaining elements, and for singleton lists we can return &mut [] without needing any backing storage.

Alternatives

The simplest alternative which might help is to reduce the minimum block allocation size from 4 to 2 in ListPool. If all our lists were singletons, that change would cut memory usage in these pools by 50%; counting the size of the struct EntityList it's a savings of 40%. That might be a better choice than all of the above due to its simplicity, but doesn't save as much memory.

Wasmtime GitHub notifications bot (Mar 31 2023 at 00:46):

jameysharp labeled issue #5829:

Feature

The EntityList/ListPool types declared in cranelift/entity/src/list.rs allocate arrays from a large pool of memory. Empty lists use no space in the pool. Otherwise, lists use a minimum of 4 elements of the pool. With a little cleverness, I believe we can arrange that lists of length 1 use no space in the pool either. Alternatively, we can at least reduce the minimum pool allocation to 2 elements instead of 4 and test whether that's an improvement.

Benefit

If we have a lot of length-1 EntityLists, then this could save a lot of memory. It might also save on memory bandwidth, reduce cache pressure, and avoid pipeline stalls from chasing indirections.

Part of completing this issue is checking whether this change really does improve performance. It could be that the extra complexity of handling singleton lists specially takes more time than we save, or that most lists have at least two elements, or that I've misjudged the possible microarchitectural benefits.

That said, my guess is that we do have a lot of length-1 lists, especially now that BlockCalls are stored in the ValueListPool. (A branch to a block that takes no block-parameters is stored as a singleton list containing only the Block index.)

Implementation

The basic idea is similar to smallvec. EntityList holds an index to the location of the list within the ListPool. For a singleton list, we can store the one list element inside the EntityList, instead of interpreting it as an index into the pool.

There are two details that make this worth considering:

All of our types that implement EntityRef are u32 internally, and so is EntityList. If entities were larger this wouldn't be worth doing.

Some bit-patterns in an EntityList are unused. A storage block within a ListPool always starts at an offset that is at least a multiple of 4, and the index stored in EntityList is one past the start of the block. So currently, an EntityList is either zero (representing an empty list), or its least-significant two bits are 01.

And one detail that makes this tricky: An EntityList can return a slice, possibly mutable, of its contents. To make this work with singleton lists stored directly inside the struct EntityList<T> itself, we'd need to be able to borrow a &mut [T] out of it without enabling callers to break our invariants.

One way to do that is when someone asks for a mutable slice, promote a singleton list into a pool-backed list. This is bad if we need to do that for most lists, since it undoes any advantages we might get from making this change in the first place. For BlockCall specifically we can do better, because at any given time we only want to access either the first element or a slice of the remaining elements, and for singleton lists we can return &mut [] without needing any backing storage.

Alternatives

The simplest alternative which might help is to reduce the minimum block allocation size from 4 to 2 in ListPool. If all our lists were singletons, that change would cut memory usage in these pools by 50%; counting the size of the struct EntityList it's a savings of 40%. That might be a better choice than all of the above due to its simplicity, but doesn't save as much memory.

Wasmtime GitHub notifications bot (Mar 31 2023 at 00:46):

jameysharp labeled issue #5829:

Feature

The EntityList/ListPool types declared in cranelift/entity/src/list.rs allocate arrays from a large pool of memory. Empty lists use no space in the pool. Otherwise, lists use a minimum of 4 elements of the pool. With a little cleverness, I believe we can arrange that lists of length 1 use no space in the pool either. Alternatively, we can at least reduce the minimum pool allocation to 2 elements instead of 4 and test whether that's an improvement.

Benefit

If we have a lot of length-1 EntityLists, then this could save a lot of memory. It might also save on memory bandwidth, reduce cache pressure, and avoid pipeline stalls from chasing indirections.

Part of completing this issue is checking whether this change really does improve performance. It could be that the extra complexity of handling singleton lists specially takes more time than we save, or that most lists have at least two elements, or that I've misjudged the possible microarchitectural benefits.

That said, my guess is that we do have a lot of length-1 lists, especially now that BlockCalls are stored in the ValueListPool. (A branch to a block that takes no block-parameters is stored as a singleton list containing only the Block index.)

Implementation

The basic idea is similar to smallvec. EntityList holds an index to the location of the list within the ListPool. For a singleton list, we can store the one list element inside the EntityList, instead of interpreting it as an index into the pool.

There are two details that make this worth considering:

All of our types that implement EntityRef are u32 internally, and so is EntityList. If entities were larger this wouldn't be worth doing.

Some bit-patterns in an EntityList are unused. A storage block within a ListPool always starts at an offset that is at least a multiple of 4, and the index stored in EntityList is one past the start of the block. So currently, an EntityList is either zero (representing an empty list), or its least-significant two bits are 01.

And one detail that makes this tricky: An EntityList can return a slice, possibly mutable, of its contents. To make this work with singleton lists stored directly inside the struct EntityList<T> itself, we'd need to be able to borrow a &mut [T] out of it without enabling callers to break our invariants.

One way to do that is when someone asks for a mutable slice, promote a singleton list into a pool-backed list. This is bad if we need to do that for most lists, since it undoes any advantages we might get from making this change in the first place. For BlockCall specifically we can do better, because at any given time we only want to access either the first element or a slice of the remaining elements, and for singleton lists we can return &mut [] without needing any backing storage.

Alternatives

The simplest alternative which might help is to reduce the minimum block allocation size from 4 to 2 in ListPool. If all our lists were singletons, that change would cut memory usage in these pools by 50%; counting the size of the struct EntityList it's a savings of 40%. That might be a better choice than all of the above due to its simplicity, but doesn't save as much memory.

Last updated: Apr 18 2025 at 12:05 UTC