abrown opened PR #7603 from abrown:pku-optimize
to bytecodealliance:main
:
While experimenting with the limits of MPK-protected memory pools, @alexcrichton and I discovered that the current slab layout calculations were too conservative. This meant that the memory pool could not pack in as many memories as it should have been able: we were expecting, but not seeing, ~15x more memory slots over non-MPK memory pools.
The fix ends up being simpler than the original: we must maintain the codegen constraints that expect a static memory to be inaccessible for OOB access within a
static_memory_maximum_size + static_memory_guard_size
region (calledexpected_slot_bytes + guard_bytes
inmemory_pool.rs
). By dividing up that region between the stripes, we still guarantee that the region is inaccessible by packing in other MPK-protected stripes. And we still need to make sure that thepost_slab_guard_bytes
add up to that region. These changes fix the memory inefficiency issues we were seeing.<!--
Please make sure you include the following information:
If this work has been discussed elsewhere, please include a link to that
conversation. If it was discussed in an issue, just mention "issue #...".Explain why this change is needed. If the details are in an issue already,
this can be brief.Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.htmlPlease ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->
abrown updated PR #7603.
abrown has marked PR #7603 as ready for review.
abrown requested pchickey for a review on PR #7603.
abrown requested wasmtime-core-reviewers for a review on PR #7603.
alexcrichton commented on PR #7603:
Looks good to me!
I'd be happy to approve and land this as-is, but we talked about some other somewhat-unrelated things to this PR which I think you're going to include here as well.
I'll also note that I have one point of note which I'll emphasize shouldn't be addressed in this PR because I think it's not worth it. That being said I think that this strategy is a tiny bit wasteful in terms of pages because the slot size is unlikely to be a multiple of the number of stripes meaning something will be rounded up to a page boundary. For example with a 4G memory, a 2G guard, and a 512M max memory size that'll support up to 12 stripes (6G/512M == 12). If however only 11 stripes were available then what should in theory happen is that one 6G slot would have 11 memories of max size 512M packed together with some leftover guard space at the end. That would mean 11 memories would be packed into a 6G region.
With this PR, however, what would happen is that the 6G region would be divided into 11 chunks of size 0x22e8ba2e. That'd then be rounded up to 0x22e8c000 for a single slot, which if that's multiplied back by 11 yields 0x180004000 bytes. This means that with "ideal logic" a 6G region could hold 11 memories but this calculation instead requires a 6G+16K region to hold the 11 memories.
This is a very minor loss, however (6G vs four pages) however and accounting for some guard regions interspersed with all the slots would require deeper refactoring. That's why I want to note this, but I don't think it's worth changing.
alexcrichton submitted PR review.
abrown merged PR #7603.
Last updated: Dec 23 2024 at 12:05 UTC