jameysharp opened issue #5879:
Feature
When we store a
Vec<T>
inside another data structure, but never change its length, we can instead store aBox<[T]>
.Benefit
The size of
Vec
in memory is three machine words, but a boxed slice is only two machine words long. This doesn't really matter if the array is only stored on the stack. But if it's part of another data structure, switching to boxed slices can reduce the amount of heap memory we allocate. That may improve cache locality and have other performance benefits.Implementation
I did a quick experiment along these lines several months ago in commit 68d2ac2a1471a096d021be7f2cb2734843648ab8. That commit has merge conflicts with
main
now, but it illustrates the basics of how to make this change for one type,Vec<ArgPair>
. I suspect there are many other uses ofVec
in Cranelift that could have equally simple changes. There may be some in Wasmtime as well, though I'm personally focused on cases in Cranelift.Pull requests addressing this issue should ideally include some measurements showing what effect the change has, because it's theoretically possible that this could make performance worse. I think it's more likely that there won't be much visible effect at all, but these changes shouldn't make the implementation any more complicated and could potentially improve performance. We won't know unless we try it and measure the results.
There are two kinds of measurements that would be great to do, with Wasmtime built in
--release
mode.
Run
valgrind --tool=dhat target/release/wasmtime compile
on some WebAssembly module, both with the version ofmain
you started from and with your patched version. I like to use our pulldown-cmark benchmark program for this because it's not too small but not too big, so even with the overhead of DHAT's instrumentation it doesn't take too long.Run our Sightglass benchmark suite to compare the speed of compiling some benchmark programs with and without your changes. See the instructions in that project's README. For testing this issue, the
--stop-after compilation
option is useful; these changes should have no effect on the generated code or execution time, so you don't need to measure those.Alternatives
It's fine to keep using
Vec
.There might be cases where either
SmallVec
, or ourListPool
type from cranelift-entity, are more appropriate than a boxed slice.SmallVec
is an especially good choice if most of the time the array would fit in about two machine words, but it occasionally needs to be bigger.ListPool
may be a good choice if we have a lot of the same type of array allocated at the same time, and the array elements implementEntityRef
, which generally means they're 32-bit indices into an array somewhere else.
jameysharp labeled issue #5879:
Feature
When we store a
Vec<T>
inside another data structure, but never change its length, we can instead store aBox<[T]>
.Benefit
The size of
Vec
in memory is three machine words, but a boxed slice is only two machine words long. This doesn't really matter if the array is only stored on the stack. But if it's part of another data structure, switching to boxed slices can reduce the amount of heap memory we allocate. That may improve cache locality and have other performance benefits.Implementation
I did a quick experiment along these lines several months ago in commit 68d2ac2a1471a096d021be7f2cb2734843648ab8. That commit has merge conflicts with
main
now, but it illustrates the basics of how to make this change for one type,Vec<ArgPair>
. I suspect there are many other uses ofVec
in Cranelift that could have equally simple changes. There may be some in Wasmtime as well, though I'm personally focused on cases in Cranelift.Pull requests addressing this issue should ideally include some measurements showing what effect the change has, because it's theoretically possible that this could make performance worse. I think it's more likely that there won't be much visible effect at all, but these changes shouldn't make the implementation any more complicated and could potentially improve performance. We won't know unless we try it and measure the results.
There are two kinds of measurements that would be great to do, with Wasmtime built in
--release
mode.
Run
valgrind --tool=dhat target/release/wasmtime compile
on some WebAssembly module, both with the version ofmain
you started from and with your patched version. I like to use our pulldown-cmark benchmark program for this because it's not too small but not too big, so even with the overhead of DHAT's instrumentation it doesn't take too long.Run our Sightglass benchmark suite to compare the speed of compiling some benchmark programs with and without your changes. See the instructions in that project's README. For testing this issue, the
--stop-after compilation
option is useful; these changes should have no effect on the generated code or execution time, so you don't need to measure those.Alternatives
It's fine to keep using
Vec
.There might be cases where either
SmallVec
, or ourListPool
type from cranelift-entity, are more appropriate than a boxed slice.SmallVec
is an especially good choice if most of the time the array would fit in about two machine words, but it occasionally needs to be bigger.ListPool
may be a good choice if we have a lot of the same type of array allocated at the same time, and the array elements implementEntityRef
, which generally means they're 32-bit indices into an array somewhere else.
bjorn3 commented on issue #5879:
Note that converting from Vec to boxed slice may reallocate to shrink capacity, which may dwarf the benefit.
jameysharp commented on issue #5879:
Yeah, that's one reason why measuring the results is important.
Whether the reallocation occurs depends on how they're constructed. I believe that using
Iterator::collect
to produce a boxed slice won't reallocate as long as the iterator gives exact results fromsize_hint
, which I think is fairly common in Cranelift. If someone picks up this issue I'll look at specific cases and help figure out whether boxed slices are likely to be a good fit on a case-by-case basis.Also, I don't know how expensive it is to reallocate to a smaller layout with Rust's default allocator. My mental model of general-purpose heap allocators is that realloc to a smaller size doesn't copy the memory, just updates metadata in O(1) time. But that's not based on looking at any modern allocators and could be completely wrong. Do you know off-hand?
bjorn3 commented on issue #5879:
Most allocators divide the heap in different size classes and use mmap directly for very large allocations. If resizing brings you to a smaller size class I wouldn't be surprised if it is necessary to copy the data to a chunk in this new size class. If it doesn't go to a smaller size class, it won't save any memory (except for the capacity field missing) at all as allocation sizes are rounded to the nearest size class. For mmap backed allocations shrinking will perform an expensive munmap syscall. When collecting an iterator directly into a boxed slice where the size hint is correct, collecting it into a vec instead doesn't save any space apart from the capacity field.
bjorn3 edited a comment on issue #5879:
Most allocators divide the heap in different size classes and use mmap directly for very large allocations. If resizing brings you to a smaller size class I wouldn't be surprised if it is necessary to copy the data to a chunk in this new size class. (At least one allocator I looked at didn't save any size information in the allocation header, but instead derives it from the size class of the chunk in which the allocation is stored) If it doesn't go to a smaller size class, it won't save any memory (except for the capacity field missing) at all as allocation sizes are rounded to the nearest size class. For mmap backed allocations shrinking will perform an expensive munmap syscall. When collecting an iterator directly into a boxed slice where the size hint is correct, collecting it into a vec instead doesn't save any space apart from the capacity field.
OLUWAMUYIWA commented on issue #5879:
Hi @jameysharp, I'd like to pick this up. First time here, but I'm up to it.
Last updated: Dec 23 2024 at 12:05 UTC