alexcrichton opened PR #5207 from `keep-resident-options` to `main`:
When new wasm instances are created repeatedly in high-concurrency environments, one of the largest bottlenecks is contention on kernel-level locks related to virtual memory. Usage in this environment is expected to leverage the pooling instance allocator with the `memory-init-cow` feature enabled, which means that the kernel-level VM lock is acquired in operations such as:
- Growing a heap with `mprotect` (write lock)
- Faulting in memory during usage (read lock)
- Resetting a heap's contents with `madvise` (read lock)
- Shrinking a heap with `mprotect` when reusing a slot (write lock)

Rapid usage of these operations can lead to detrimental performance, especially on otherwise heavily loaded systems, and the effect worsens the more frequent the above operations are. This commit is aimed at addressing the second case above (faulting in memory), reducing the number of page faults that are fulfilled by the kernel.
Currently these page faults happen for three reasons:
- When memory is first accessed after the heap is grown.
- When the initial linear memory image is accessed for the first time.
- When the initial zero'd heap contents, not part of the linear memory image, are accessed.
This PR is attempting to address the last of these cases, and to a lesser extent the first case as well. Specifically, this PR provides the ability to partially reset a pooled linear memory with `memset` rather than `madvise`. This has the same effect of resetting contents to zero but a different effect on paging: the pages stay resident in memory rather than being returned to the kernel. This means that reuse of a linear memory slot on a page that was previously `memset` will not trigger a page fault, since everything remains paged into the process.

The end result is that any access to linear memory which has been touched by `memset` will no longer page fault on reuse. On more recent kernels (6.0+) this also means pages which were zeroed by `memset`, made inaccessible with `PROT_NONE`, and then made accessible again with `PROT_READ | PROT_WRITE` will not page fault. This can be common when a wasm instance grows its heap slightly, uses that memory, and the heap is then shrunk when the memory is reused for the next instance. Note that this kernel optimization requires a 6.0+ kernel.

This same optimization is furthermore applied to async stacks with the pooling memory allocator, as well as to table elements. The defaults of Wasmtime are not changing with this PR; instead, knobs are being exposed for embedders to turn if they so desire. This is currently being experimented with at Fastly, and I may come back and alter the defaults of Wasmtime if it seems suitable after our measurements.
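
To make the paging trade-off concrete, here is a rough sketch in Rust of a toy model, not Wasmtime's actual implementation. The names `Slot`, `split_reset`, `keep_resident`, and the fixed `PAGE_SIZE` are all hypothetical and chosen for illustration only. The model tracks which pages of a slot are resident, counts simulated page faults, and resets a slot by zeroing a leading region in place (standing in for `memset`) while releasing the remainder (standing in for `madvise`).

```rust
/// Page size used by this toy model (an assumption; real page sizes vary).
const PAGE_SIZE: usize = 4096;

/// Split a reset of `len` bytes into a leading region zeroed in place
/// (the `memset` part) and a trailing region released to the kernel
/// (the `madvise` part). `keep_resident` is a hypothetical knob and is
/// rounded down to a whole number of pages.
fn split_reset(len: usize, keep_resident: usize) -> (usize, usize) {
    let memset_len = keep_resident.min(len) / PAGE_SIZE * PAGE_SIZE;
    (memset_len, len - memset_len)
}

/// Toy model of one pooled linear-memory slot (hypothetical type).
struct Slot {
    /// One flag per page: is the page currently paged into the process?
    resident: Vec<bool>,
    /// Number of simulated page faults observed so far.
    faults: usize,
}

impl Slot {
    fn new(pages: usize) -> Self {
        Slot { resident: vec![false; pages], faults: 0 }
    }

    /// Simulate an access to `addr`: touching a non-resident page
    /// costs one page fault and makes the page resident.
    fn touch(&mut self, addr: usize) {
        let page = addr / PAGE_SIZE;
        if !self.resident[page] {
            self.faults += 1;
            self.resident[page] = true;
        }
    }

    /// Reset the slot for reuse by the next instance.
    fn reset(&mut self, keep_resident: usize) {
        let len = self.resident.len() * PAGE_SIZE;
        let (memset_len, _madvise_len) = split_reset(len, keep_resident);
        let kept_pages = memset_len / PAGE_SIZE;
        // Zeroing in place writes to every page in the region, so pages
        // that were not yet resident are faulted in by the reset itself.
        for page in 0..kept_pages {
            self.touch(page * PAGE_SIZE);
        }
        // Releasing the remainder drops residency; the next access to
        // any of these pages will fault again.
        for flag in self.resident.iter_mut().skip(kept_pages) {
            *flag = false;
        }
    }
}
```

In this model, a slot whose first two pages were touched and then reset with `keep_resident` set to two pages sees no new faults when the next user touches those same pages, whereas a reset with `keep_resident` set to zero makes every subsequent first touch fault again. The model ignores the cost of the zeroing itself, which is the other side of the trade-off an embedder would need to measure.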
peterhuene submitted PR review.
alexcrichton has enabled auto merge for PR #5207.
alexcrichton merged PR #5207.
Last updated: Jan 24 2025 at 00:11 UTC