fitzgen opened PR #8590 from fitzgen:decommit-queue
to bytecodealliance:main
:
This introduces a
DecommitQueue
for batching decommits together in the pooling allocator:
Deallocating a memory/table/stack enqueues their associated regions of memory for decommit; it no longer immediately returns the associated slot to the pool's free list. If the queue's length has reached the configured batch size, then we flush the queue by running all the decommits, and finally returning the memory/table/stack slots to their respective pools and free lists.
Additionally, if allocating a new memory/table/stack fails because the free list is empty (aka we've reached the max concurrently-allocated limit for this entity) then we fall back to a slow path before propagating the error. This slow path flushes the decommit queue and then retries allocation, hoping that the queue flush reclaimed slots and made them available for this fallback allocation attempt. This involved defining a new
PoolConcurrencyLimitError
to match on, which is also exposed in the public embedder API.It is also worth noting that we always use this new decommit queue now. To keep the existing behavior, where e.g. a memory's decommits happen immediately on deallocation, you can use a batch size of one. This effectively disables queuing, forcing all decommits to be flushed immediately.
The default decommit batch size is one.
This commit, with batch size of one, consistently gives me an increase on
wasmtime serve
's requests-per-second versus its parent commit, as measured bybenches/wasmtime-serve-rps.sh
. I get ~39K RPS on this commit compared to ~35K RPS on the parent commit. This is quite puzzling to me. I was expecting no change, and hoping there wouldn't be a regression. I was not expecting a speed up. I cannot explain this result at this time.<!--
Please make sure you include the following information:
If this work has been discussed elsewhere, please include a link to that
conversation. If it was discussed in an issue, just mention "issue #...".Explain why this change is needed. If the details are in an issue already,
this can be brief.Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.htmlPlease ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->
fitzgen requested alexcrichton for a review on PR #8590.
fitzgen requested wasmtime-fuzz-reviewers for a review on PR #8590.
fitzgen requested wasmtime-core-reviewers for a review on PR #8590.
fitzgen requested wasmtime-default-reviewers for a review on PR #8590.
fitzgen updated PR #8590.
fitzgen updated PR #8590.
alexcrichton submitted PR review:
Thanks and looks good! I've got some thoughts on refactorings below, but otherwise can you additionally add some tests that exercise the if-empty-then-flush-and-try-again path? Basically a test that fails if that path isn't there but succeeds if it is.
This is quite puzzling to me. I was expecting no change, and hoping there wouldn't be a regression. I was not expecting a speed up. I cannot explain this result at this time.
I think I've seen things to this effect historically where very-close-together
madvise
s are slightly more optimal sometimes. The theory is that the kernel didn't have a chance to run any other threads between two calls tomadvise
so the second one can skip IPIs since the kernel dynamically knows that no other cores need to be shot down, but that's only a guess.
alexcrichton submitted PR review:
Thanks and looks good! I've got some thoughts on refactorings below, but otherwise can you additionally add some tests that exercise the if-empty-then-flush-and-try-again path? Basically a test that fails if that path isn't there but succeeds if it is.
This is quite puzzling to me. I was expecting no change, and hoping there wouldn't be a regression. I was not expecting a speed up. I cannot explain this result at this time.
I think I've seen things to this effect historically where very-close-together
madvise
s are slightly more optimal sometimes. The theory is that the kernel didn't have a chance to run any other threads between two calls tomadvise
so the second one can skip IPIs since the kernel dynamically knows that no other cores need to be shot down, but that's only a guess.
alexcrichton created PR review comment:
Given that this happens under a lock and needs to happen as quickly as possible could this avoid the
sysconf
on each call?
alexcrichton created PR review comment:
Given the size of
DecommitQueue
can a newtype be made forlibc::iovec
which has these impls?
alexcrichton created PR review comment:
Can this API fail? And if so could
batch_size
be used as-is?
alexcrichton created PR review comment:
Instead of passing around a closure here and in the
*_pool.rs
files below would you be ok adding a small abstraction which has&mut DecommitQueue
internally along with thebool
thatpooling.rs
is tracking?
alexcrichton created PR review comment:
This is probably going to need #[cfg] or move into the sys module to handle windows/macos/miri as well
alexcrichton created PR review comment:
Could you add a helper method for this? Something like:
fn allocate_and_maybe_decommit<T>(&self, f: impl Fn() -> Result<T>) -> Result<T> { ... }
to avoid copying this in a few places? Also it might be worth having
flush_decommit_queue
return whether it actually did anything and skip the re-allocate if the flush didn't flush.
alexcrichton created PR review comment:
Hm actually on second thought I have a different idea. The duplication in the three deallocate methods in
pooling.rs
is pretty subtle and feels sort of hard to get right. It also has the subtelty of theMutex<DeallocationQueue>
is held across function calls, such as this one, and it's getting to the point where it's easy to get into issues of lock ordering issues or deadlocks. For example indeallocate_fiber_stack
the deallocation queue lock may be held acrossself.stacks.deallocate
which internally acquires a different lock.This leads me to a different way to model this which might be a bit less error-prone. This
decommit
argument could be replaced by a structure withSmallVec
s inside of it where it's pushed onto. Once this step has completed there's a general method for "clear out this thing with small vectors or, if empty, do the deallocation immediately" or something like that. That would keep the critical section as small as possible while I think also making it a bit clearer how to reuse/shape code using it perhaps?
fitzgen updated PR #8590.
jameysharp submitted PR review.
jameysharp created PR review comment:
On glibc,
sysconf(_SC_IOV_MAX)
returns a compile-time constant, so I don't think we should worry about its speed: https://github.com/bminor/glibc/blob/dd5f891c1ad9f1b43b9db93afe2a55cbb7a6194e/sysdeps/posix/sysconf.c#L412-L418
github-actions[bot] commented on PR #8590:
Subscribe to Label Action
cc @fitzgen
<details>
This issue or pull request has been labeled: "fuzzing", "wasmtime:api", "wasmtime:config"Thus the following users have been cc'd because of the following labels:
- fitzgen: fuzzing
To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.
Learn more.
</details>
github-actions[bot] commented on PR #8590:
Label Messager: wasmtime:config
It looks like you are changing Wasmtime's configuration options. Make sure to
complete this check list:
[ ] If you added a new
Config
method, you wrote extensive documentation for
it.<details>
Our documentation should be of the following form:
```text
Short, simple summary sentence.More details. These details can be multiple paragraphs. There should be
information about not just the method, but its parameters and results as
well.Is this method fallible? If so, when can it return an error?
Can this method panic? If so, when does it panic?
Example
Optional example here.
```</details>
[ ] If you added a new
Config
method, or modified an existing one, you
ensured that this configuration is exercised by the fuzz targets.<details>
For example, if you expose a new strategy for allocating the next instance
slot inside the pooling allocator, you should ensure that at least one of our
fuzz targets exercises that new strategy.Often, all that is required of you is to ensure that there is a knob for this
configuration option in [wasmtime_fuzzing::Config
][fuzzing-config] (or one
of its nestedstruct
s).Rarely, this may require authoring a new fuzz target to specifically test this
configuration. See [our docs on fuzzing][fuzzing-docs] for more details.</details>
[ ] If you are enabling a configuration option by default, make sure that it
has been fuzzed for at least two weeks before turning it on by default.[fuzzing-config]: https://github.com/bytecodealliance/wasmtime/blob/ca0e8d0a1d8cefc0496dba2f77a670571d8fdcab/crates/fuzzing/src/generators.rs#L182-L194
[fuzzing-docs]: https://docs.wasmtime.dev/contributing-fuzzing.html
<details>
To modify this label's message, edit the <code>.github/label-messager/wasmtime-config.md</code> file.
To add new label messages or remove existing label messages, edit the
<code>.github/label-messager.json</code> configuration file.</details>
jameysharp submitted PR review.
jameysharp created PR review comment:
I think the least-code way to do what you're suggesting is to pass an empty
DecommitQueue
into e.g.image.clear_and_remain_ready
and unconditionally callpush_memory
on it; then pass it to a common method.That method should check if its raw iovec queue is empty, and if so, immediately flush it, which will call
self.memories.deallocate
on the only memory pushed into it. Otherwise, take the lock and append its contents to the globalDecommitQueue
, and then if that queue is full, immediately flush that.Having the ability to append one
DecommitQueue
into another might be nice for maintaining thread-local queues or something too, someday.Also it might make sense to move the
batch_size
field toPoolingInstanceAllocator
. We don't need to check it duringenqueue_raw
, especially with this proposed change, so we might as well store it alongside all the other data we need access to during flush.
jameysharp submitted PR review.
jameysharp created PR review comment:
I gather you're suggesting the closure parameter would be, for example:
|| self.memories.allocate(request, memory_plan, memory_index)
That makes sense to me!
fitzgen submitted PR review.
fitzgen created PR review comment:
We have to handle the case where the batch size is larger than
IOV_MAX
anyways, since a single wasm memory can push two memory regions to decommit and so we can always blow our target batch size. Therefore, I think we should just remove this bit, and allow the batch decommit to break things up into smaller chunks if it must, since it has to have that logic anyways.
fitzgen updated PR #8590.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Sounds good to me :+1:
fitzgen updated PR #8590.
fitzgen updated PR #8590.
fitzgen commented on PR #8590:
@alexcrichton I think this is ready for re-review!
fitzgen edited a comment on PR #8590:
Label Messager: wasmtime:config
It looks like you are changing Wasmtime's configuration options. Make sure to
complete this check list:
[x] If you added a new
Config
method, you wrote extensive documentation for
it.<details>
Our documentation should be of the following form:
```text
Short, simple summary sentence.More details. These details can be multiple paragraphs. There should be
information about not just the method, but its parameters and results as
well.Is this method fallible? If so, when can it return an error?
Can this method panic? If so, when does it panic?
Example
Optional example here.
```</details>
[ ] If you added a new
Config
method, or modified an existing one, you
ensured that this configuration is exercised by the fuzz targets.<details>
For example, if you expose a new strategy for allocating the next instance
slot inside the pooling allocator, you should ensure that at least one of our
fuzz targets exercises that new strategy.Often, all that is required of you is to ensure that there is a knob for this
configuration option in [wasmtime_fuzzing::Config
][fuzzing-config] (or one
of its nestedstruct
s).Rarely, this may require authoring a new fuzz target to specifically test this
configuration. See [our docs on fuzzing][fuzzing-docs] for more details.</details>
[ ] If you are enabling a configuration option by default, make sure that it
has been fuzzed for at least two weeks before turning it on by default.[fuzzing-config]: https://github.com/bytecodealliance/wasmtime/blob/ca0e8d0a1d8cefc0496dba2f77a670571d8fdcab/crates/fuzzing/src/generators.rs#L182-L194
[fuzzing-docs]: https://docs.wasmtime.dev/contributing-fuzzing.html
<details>
To modify this label's message, edit the <code>.github/label-messager/wasmtime-config.md</code> file.
To add new label messages or remove existing label messages, edit the
<code>.github/label-messager.json</code> configuration file.</details>
fitzgen edited a comment on PR #8590:
Label Messager: wasmtime:config
It looks like you are changing Wasmtime's configuration options. Make sure to
complete this check list:
[x] If you added a new
Config
method, you wrote extensive documentation for
it.<details>
Our documentation should be of the following form:
```text
Short, simple summary sentence.More details. These details can be multiple paragraphs. There should be
information about not just the method, but its parameters and results as
well.Is this method fallible? If so, when can it return an error?
Can this method panic? If so, when does it panic?
Example
Optional example here.
```</details>
[x] If you added a new
Config
method, or modified an existing one, you
ensured that this configuration is exercised by the fuzz targets.<details>
For example, if you expose a new strategy for allocating the next instance
slot inside the pooling allocator, you should ensure that at least one of our
fuzz targets exercises that new strategy.Often, all that is required of you is to ensure that there is a knob for this
configuration option in [wasmtime_fuzzing::Config
][fuzzing-config] (or one
of its nestedstruct
s).Rarely, this may require authoring a new fuzz target to specifically test this
configuration. See [our docs on fuzzing][fuzzing-docs] for more details.</details>
[ ] If you are enabling a configuration option by default, make sure that it
has been fuzzed for at least two weeks before turning it on by default.[fuzzing-config]: https://github.com/bytecodealliance/wasmtime/blob/ca0e8d0a1d8cefc0496dba2f77a670571d8fdcab/crates/fuzzing/src/generators.rs#L182-L194
[fuzzing-docs]: https://docs.wasmtime.dev/contributing-fuzzing.html
<details>
To modify this label's message, edit the <code>.github/label-messager/wasmtime-config.md</code> file.
To add new label messages or remove existing label messages, edit the
<code>.github/label-messager.json</code> configuration file.</details>
fitzgen edited a comment on PR #8590:
Label Messager: wasmtime:config
It looks like you are changing Wasmtime's configuration options. Make sure to
complete this check list:
[x] If you added a new
Config
method, you wrote extensive documentation for
it.<details>
Our documentation should be of the following form:
```text
Short, simple summary sentence.More details. These details can be multiple paragraphs. There should be
information about not just the method, but its parameters and results as
well.Is this method fallible? If so, when can it return an error?
Can this method panic? If so, when does it panic?
Example
Optional example here.
```</details>
[x] If you added a new
Config
method, or modified an existing one, you
ensured that this configuration is exercised by the fuzz targets.<details>
For example, if you expose a new strategy for allocating the next instance
slot inside the pooling allocator, you should ensure that at least one of our
fuzz targets exercises that new strategy.Often, all that is required of you is to ensure that there is a knob for this
configuration option in [wasmtime_fuzzing::Config
][fuzzing-config] (or one
of its nestedstruct
s).Rarely, this may require authoring a new fuzz target to specifically test this
configuration. See [our docs on fuzzing][fuzzing-docs] for more details.</details>
[x] If you are enabling a configuration option by default, make sure that it
has been fuzzed for at least two weeks before turning it on by default.[fuzzing-config]: https://github.com/bytecodealliance/wasmtime/blob/ca0e8d0a1d8cefc0496dba2f77a670571d8fdcab/crates/fuzzing/src/generators.rs#L182-L194
[fuzzing-docs]: https://docs.wasmtime.dev/contributing-fuzzing.html
<details>
To modify this label's message, edit the <code>.github/label-messager/wasmtime-config.md</code> file.
To add new label messages or remove existing label messages, edit the
<code>.github/label-messager.json</code> configuration file.</details>
alexcrichton submitted PR review:
Nice!
jameysharp commented on PR #8590:
Looks like the reason CI is failing is that we can only use
libc::iovec
oncfg(unix)
. Elsewhere I guess we should define an alternativestruct iovec
with the same fields? We need it to match system ABI if we use it in syscalls but on non-Unix (non-Linux, even) any pair of pointer and length will do.
fitzgen updated PR #8590.
fitzgen has enabled auto merge for PR #8590.
fitzgen updated PR #8590.
fitzgen has enabled auto merge for PR #8590.
fitzgen updated PR #8590.
fitzgen has enabled auto merge for PR #8590.
fitzgen updated PR #8590.
fitzgen has enabled auto merge for PR #8590.
fitzgen updated PR #8590.
fitzgen updated PR #8590.
fitzgen updated PR #8590.
fitzgen merged PR #8590.
Last updated: Jan 24 2025 at 00:11 UTC