abrown opened PR #7072 from abrown:pku-rebased-again to bytecodealliance:main:
This change implements [ColorGuard] in Wasmtime's pooling allocator. The gory details are covered in the documentation in the code but, at a very high level, this allows fitting more memory slots into the same address space used by Wasmtime's pooling allocator. This is done by "striping" the memory slots with different memory protection keys (MPK) and using the faults (`SIGSEGV`) raised by OOB accesses to regions protected by other protection keys.

[ColorGuard]: https://plas2022.github.io/files/pdf/SegueColorGuard.pdf
This is still at an early stage and will require follow-on PRs. The key goal of this PR is to introduce stripes and refactor the `MemoryPool` logic to allow this. MPK will be off by default for now: turning it on can still cause issues, so use at your own risk! Eventually, once these issues are worked out (e.g., via fuzzing), the feature will be set to auto-enable so that the MPK part of this is only active if MPK is available on the system. This feature is mainly relevant to users on x86_64 Linux; a noop implementation fills in the API gaps for other environments.
abrown submitted PR review.
abrown created PR review comment:
Guess I should not upgrade this package. It results in a huge vet certification and the addition of `unarray`, which has unsafe code to audit. `wiggle` uses version 1.0.0, which is declared exempt as `safe-to-deploy`. I think we should only have to audit this as `safe-to-run` since it is a test dependency, but once declared `safe-to-deploy` it would seem that we need to vet at the higher level.
abrown updated PR #7072.
abrown updated PR #7072.
abrown updated PR #7072.
abrown updated PR #7072.
abrown updated PR #7072.
abrown updated PR #7072.
abrown updated PR #7072.
abrown updated PR #7072.
abrown updated PR #7072.
abrown updated PR #7072.
abrown updated PR #7072.
alexcrichton submitted PR review.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
s/allocated memory/allocated virtual memory/
alexcrichton created PR review comment:
Could this enum be `pub use`'d here in this module to avoid users from needing to depend on `wasmtime-runtime`?
alexcrichton created PR review comment:
If you're ok with it I'd prefer to leave out mention of `wasmtime_runtime` since that's ideally a crate no one depends on but Wasmtime.
alexcrichton created PR review comment:
Could the `.unwrap()` be removed to ensure that the "should panic" part is the internals of `pkey_alloc`?
alexcrichton created PR review comment:
I think this can be replaced with `io::Error::last_os_error()` perhaps? (and returning `io::Error` instead of `String`)
alexcrichton created PR review comment:
One thing I might recommend is to investigate the `call` benchmark with `perf` to see if there's a perf regression on x86_64. These single-instruction functions, for example, are good candidates for `#[inline]` and will likely be required to get good perf.
alexcrichton created PR review comment:
Here this would actually best be represented as `pub enum ProtectionKey {}`, which indicates that it's not only zero-sized but it actually can't exist at all. This would optimize `Option<ProtectionKey>` into a zero-size structure, for example, on unsupported platforms.
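The suggestion above can be sketched as follows; `ProtectionKey` here is a stand-in name, not Wasmtime's actual definition:

```rust
// An enum with no variants is "uninhabited": no value of this type can
// ever be constructed, so it is stronger than merely zero-sized.
pub enum ProtectionKey {}

fn main() {
    // The compiler exploits the uninhabited type as a "niche": since
    // `Some` is impossible, Option<ProtectionKey> needs no discriminant
    // byte at all on platforms where the key can't exist.
    assert_eq!(std::mem::size_of::<Option<ProtectionKey>>(), 0);
    assert_eq!(std::mem::size_of::<ProtectionKey>(), 0);
}
```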
alexcrichton created PR review comment:
s/0/ALLOW_ACCESS/
alexcrichton created PR review comment:
This is something I think that we'll need to watch out for perf-wise in the `call` benchmarks. In theory I was hoping that this would be `pkru::write(mask.0)`.
alexcrichton created PR review comment:
This may be best as `trace!` lest it otherwise be pretty verbose.
alexcrichton created PR review comment:
Could these two be `unreachable!()` to panic if they're accidentally called?
alexcrichton created PR review comment:
Should `Self` perhaps store the whole `layout` and maybe `constraints` while we're at it? (reducing duplication and keeping things close to their source of truth)
alexcrichton created PR review comment:
I think these docs are now duplicated with the memory pooling ones perhaps?
alexcrichton created PR review comment:
We talked about this in person as well, but I think that `limits.memory_pages` and `tunables.static_memory_bound` should be communicated separately to the `SlabConstraints`.
alexcrichton created PR review comment:
Generally I'm pretty wary of `as` casts related to memory code. It's not really an issue for 64-bit platforms, but it can possibly be an issue for 32-bit platforms. Could the fields in `SlabConstraints` have the same types as the inputs here to avoid casts, though?
alexcrichton created PR review comment:
This is one check which I think we'll want to change. Taking into account the two variables I mentioned above, the rough invariant we want to test here is that for every slot in the layout there are guard bytes before it, and `static_memory_bound + guard` bytes after it to the next slot in the same stripe. This I think may be best modeled with extra methods on the layout such as `fn slot_offset(&self, stripe: usize, slot: usize) -> usize` or something like that. This loop could then check, more-or-less:

```rust
for stripe in 0..stripes {
    for slot in 0..slots {
        assert_slot_is_valid(&layout, stripe, slot);
    }
}
```
alexcrichton created PR review comment:
Since there's a method `as_unstriped_slot_index`, could there perhaps be a sibling method for converting in this direction instead of inlining here? Mostly in the hopes that keeping the two near each other would be helpful for future modifications.
alexcrichton created PR review comment:
I might approach the calculation a bit differently here, along the lines of:

- There are two main invariants for each slot:
  - From the base pointer up to `limits.memory_pages` must be mappable as committed memory owned by the slot. AKA this is the minimum size of the slot.
  - The region from `limits.memory_pages * PAGE_SIZE .. static_memory_bound + guard` must all be guaranteed to be inaccessible to the current slot, AKA it must either be a guard region or a different stripe.
- The maximum number of possible stripes is then `(static_memory_bound + guard) / (limits.memory_pages * PAGE_SIZE)`. For example, with a 4G bound, a 2G guard, and a 1G limit of growth, we can only use 6 stripes.
- If we have more stripes than the maximum needed, then everything is adjacent.
- If we have fewer stripes than the maximum, then each slot is inflated. For example, with a 4G bound, a 2G guard, and a 256MB limit, that's 24 maximum stripes, but we can't ever have that many. If we got 8 stripes then I think each stripe is 6G/8 big, ~750M. This means each slot is 750M and will only ever use up to 256M of the beginning of it.

Whether or not that's simpler than this current calculation I'm not sure, but this is at least how I've been thinking of it and how the difference between the memory limit and memory bound should, I think, be factored in.
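The arithmetic in the bullets above can be checked with a small sketch; `max_stripes` and `inflated_slot_bytes` are illustrative names, not Wasmtime's actual code:

```rust
const GIB: u64 = 1 << 30;
const MIB: u64 = 1 << 20;

/// Maximum number of usable stripes: the region that must be inaccessible
/// after each slot (bound + guard), divided by the maximum committed
/// memory per slot.
fn max_stripes(static_memory_bound: u64, guard: u64, max_memory_bytes: u64) -> u64 {
    (static_memory_bound + guard) / max_memory_bytes
}

/// With fewer stripes than the maximum, each slot inflates so the stripes
/// together still cover the whole bound + guard region.
fn inflated_slot_bytes(static_memory_bound: u64, guard: u64, num_stripes: u64) -> u64 {
    (static_memory_bound + guard) / num_stripes
}

fn main() {
    // 4G bound, 2G guard, 1G growth limit => 6 stripes.
    assert_eq!(max_stripes(4 * GIB, 2 * GIB, GIB), 6);
    // 4G bound, 2G guard, 256M limit => 24 maximum stripes...
    assert_eq!(max_stripes(4 * GIB, 2 * GIB, 256 * MIB), 24);
    // ...but with only 8 stripes, each slot is 6G/8 = 768M (the "~750M"
    // above), of which at most 256M is ever committed.
    assert_eq!(inflated_slot_bytes(4 * GIB, 2 * GIB, 8) / MIB, 768);
}
```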
alexcrichton created PR review comment:
copy/pasted comment
alexcrichton created PR review comment:
I'd be a bit hesitant about doing some of the above calculations and then page-aligning afterwards. Could the inputs all be page-aligned at the beginning as appropriate instead?
abrown submitted PR review.
abrown created PR review comment:
Just to confirm: we believe these are impossible because an engine configured with an on-demand allocator will never assign a pkey to one of the stores it creates?
abrown updated PR #7072.
abrown updated PR #7072.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Correct yeah, or at least that's the theory
abrown updated PR #7072.
abrown submitted PR review.
abrown created PR review comment:
I was tempted to do that but then worried that it would take up a few extra bytes. If you're ok with that, I'll move those in and maybe remove some of the fields that they duplicate.
abrown submitted PR review.
abrown created PR review comment:
Well, it is very convenient to (1) have all these values use the same type so we can do the arithmetic directly without distracting conversions and (2) interact with all the `std` stuff that just uses `usize`. Since they at some point need to be converted to the same type to do the arithmetic, would you prefer if I turned them all into `u64` here and then made conversions to `usize` in all the places we interact with slices/ranges/etc.?
abrown submitted PR review.
abrown created PR review comment:
Alternately, I could just panic here with `.try_into().unwrap()`? (Or return an error... maybe 32-bit systems don't want to deal with memory sizes they can't address?)
abrown updated PR #7072.
abrown updated PR #7072.
abrown updated PR #7072.
abrown updated PR #7072.
abrown submitted PR review.
abrown created PR review comment:
The more I read your comment, the more I think this logic is essentially saying the same thing. Ignoring the fact that maybe `static_memory_bound` isn't handled yet, the other differences are minor:

- I'm using `guard_bytes` for `static_memory_bound + guard` and `max_memory_bytes` for `limits.memory_pages * PAGE_SIZE`; I think we need to settle on a single variable (bikeshed the name) to contain each of these values to avoid the mental juggling of "oh, we mean how big the memory is by this calculation"
- we need at least 2 stripes here, see `needed_num_stripes`; `guard_bytes / max_memory_bytes` is essentially what you wrote up above, but we need to handle the case where this doesn't divide evenly (`usize::from(...)`)
- we need to minimize the number of stripes, see `num_stripes`: we may have fewer pkeys or memory slots than stripes needed
- to handle the slot-inflation case, I calculate `needed_guard_bytes`; I'm considering this the entire region after a memory that needs to be inaccessible, a.k.a. the MPK-reduced original guard size
abrown submitted PR review.
abrown created PR review comment:
`max_memory_bytes` should be Wasm-page aligned, but we need to realign after all the division, subtraction, etc.
fitzgen submitted PR review.
fitzgen created PR review comment:
`usize::try_from(...).unwrap()` would be my preferred route, if conversions are necessary and we can't just make them go away.
fitzgen submitted PR review.
fitzgen created PR review comment:
Haven't finished reading through stuff but: will `pkeys.len()` and `layout.num_stripes` ever be different? If so, I find that pretty surprising. What's the motivation for that?
fitzgen created PR review comment:
If you want: `self.stripes().iter().all(|s| s.allocator.is_empty())`
fitzgen submitted PR review.
fitzgen submitted PR review.
fitzgen created PR review comment:
I like these typed conversions between index types.
fitzgen submitted PR review.
fitzgen created PR review comment:
"Comprehends" reads a little weird here?
/// The size of each slot in the memory pool; this contains the maximum
fitzgen submitted PR review.
fitzgen created PR review comment:
Proposed alternative API to mpk from our discussion the other day:

```rust
struct PKeys(u32); // mask of pkey bits we are allowed to use

impl PKeys {
    pub fn new() -> PKeys; // do this in a OnceLock, allocates pkeys from OS and creates the mask
    pub fn all_keys(&self) -> PKeySet; // Get the set of all of our pkeys

    // following methods have debug asserts that we only ever have pkeys that fit our mask
    pub fn activate(&self, pkey: PKey); // activate one pkey
    pub fn activate_many(&self, set: PKeySet); // activate many pkeys at the same time
    pub fn deactivate(&self, pkey: PKey); // deactivate one pkey
    pub fn deactivate_many(&self, set: PKeySet); // deactivate many pkeys at the same time
    pub fn protect(&self, pkey: PKey, region: ...); // do the pkey_mprotect thing on this region, associating it with the given pkey
}

impl Drop for PKeys {
    // return our allocated pkeys to the OS
}

struct PKey(u32); // just the bits for this pkey, must be within the PKeys mask

struct PKeySet(u32); // bit set of multiple pkeys, all within the owning PKeys mask

impl PKeySet {
    pub fn empty() -> Self; // no bits set
}
```
fitzgen submitted PR review.
fitzgen created PR review comment:
We might want to name this something a little more specific to MPK: `MpkEnabled`?

Alternatively, I can imagine other scenarios where we will want tri-state, should-we-enable enums like this, and we could keep this name but then make its documentation not mention MPK.
fitzgen submitted PR review.
fitzgen created PR review comment:
Can we export a `PoolingAllocationConfig::mpk_available() -> bool` static method? We have that internally anyway, and it seems like it would be useful to embedders.
fitzgen submitted PR review.
fitzgen created PR review comment:
```rust
/// By default this value is `disabled`, but may become `auto` in future releases.
```
fitzgen edited PR review comment.
abrown submitted PR review.
abrown created PR review comment:
Yes: imagine the case where we (for some odd reason) only want a pooling allocator to have five memory slots. We can only have a maximum of five stripes. If we have fifteen keys available, we need to pick a subset of them for striping.
abrown updated PR #7072.
abrown updated PR #7072.
abrown created PR review comment:
I agonized over this suggestion when I was refactoring this PR, trying to make it work. I _would_ like to have a clearer holding structure for the protection keys, especially in case we need to drop them at some point. But once we store things away in the `mpk` global state (`OnceLock`), there's no point to managing the lifetime with a separate struct; plus, it requires more code to iterate over the keys, etc. What I settled on was `ProtectionMask`, which incorporates part of this, and then I just kept top-level functions (`keys`, `allow`, etc.) to make it clear that the module itself contains the state, not some struct. I do think we could refactor this a bit as things move along; I left a `TODO` in there about this.
abrown submitted PR review.
abrown edited PR review comment.
abrown updated PR #7072.
abrown updated PR #7072.
abrown submitted PR review.
abrown created PR review comment:
I think `wasmtime_runtime::mpk::is_available()`, which is in the same crate, should be public already?
abrown updated PR #7072.
abrown submitted PR review.
abrown created PR review comment:
Renamed to `MpkEnabled`.
fitzgen submitted PR review.
fitzgen created PR review comment:
`wasmtime_runtime` is an implementation detail that isn't available to users of the public `wasmtime` API. So I am imagining that we would expose that functionality in the public `wasmtime` API, and a static method on `PoolingAllocationConfig` seemed like a nice place, but I am happy to bikeshed.

This is also something that can be added after the initial MPK support lands.
abrown submitted PR review.
abrown created PR review comment:
I realized as I went back over this that we don't ever need to iterate over slots: the layout has the sizes for every slot, so we just need to check that the layout is correct once: `assert!(s.slot_bytes * s.num_stripes >= c.expected_slot_bytes + c.guard_bytes)`.
abrown updated PR #7072.
abrown submitted PR review.
abrown created PR review comment:
Ok, I think I'm doing this more correctly in 52b245a7d. I'm internally using `expected_slot_bytes` as the name since it is more descriptive (to me) than `static_memory_bound`.
abrown updated PR #7072.
abrown updated PR #7072.
abrown updated PR #7072.
abrown updated PR #7072.
alexcrichton submitted PR review:
Nice I like how the calculation ended up :+1:
Could this still have a test though which is along the lines of "check the guard/inaccessible property for the first slot of each stripe, the last, and one in the middle"? Currently there's "Check that the memory-striping will not allow OOB access" but if you're ok I'd like to be a bit more pedantic and assert individually for each slot in the stripe (assuming all "middle slots" are the same and only the first/last/middle are the interesting ones)
alexcrichton created PR review comment:
as a "neat trick" you can replace the body of this method with
match *self {}
which is a way to code-document "this isn't reachable" and it's compiler-checked where if it becomes reachable that becomes an error one day. Same with the one below, and it avoids the question of "what if this is accidentally called?"
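A minimal sketch of this trick, using an illustrative stand-in type rather than Wasmtime's actual definition:

```rust
pub enum ProtectionKey {} // uninhabited: has no variants

impl ProtectionKey {
    /// The empty match type-checks precisely because `ProtectionKey`
    /// has no variants; if a variant is ever added, `match *self {}`
    /// becomes a compile error, flagging that this method is now
    /// reachable and needs a real body.
    pub fn stripe(&self) -> usize {
        match *self {}
    }
}

fn main() {
    // No `ProtectionKey` value can ever exist, so `stripe` can never be
    // called; this program only demonstrates that the pattern compiles.
    assert_eq!(std::mem::size_of::<ProtectionKey>(), 0);
}
```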
alexcrichton created PR review comment:
One thing we talked about which I wanted to write down here (but doesn't have to be handled in this PR) -- if a newly spawned thread disallows all but protection key 0 then that's a problem for us since we only frob the mask in/out of wasm and new threads may not do that. (e.g. think like a hostcall that gets a chunk of wasm memory and uses rayon to operate on it)
alexcrichton created PR review comment:
Mind updating these variable names with their `layout.$name` equivalents?
alexcrichton created PR review comment:
It might be worth expanding this to say that compressing and fitting things into the guard region is what's theoretically happening, but representation-wise we're shrinking the slot instead and allocating more slots.
alexcrichton created PR review comment:
This I don't think is quite right because the `bound` here is actually a function of the distance from one stripe's slot to the next slot in the same stripe. I think `slot_bytes` is the per-slot size, which doesn't account for this?
alexcrichton created PR review comment:
I saw the `.try_into()` above to avoid a lossy `as`, and sorry if you already mentioned this, but could the inputs here be `u64` instead of `usize`? (or whatever the original type was for each variable)

Otherwise this is a portability risk on 32-bit, where a 64-bit size is requested, causing the `.try_into().unwrap()` above to panic instead of returning an error. Since many calculations below already handle overflow, I figure it'd be easier to handle it there than above.
alexcrichton created PR review comment:
Instead of returning a `Result` here, could the `Result` be pushed to `calculate` instead? That way this function itself would be infallible.
alexcrichton created PR review comment:
Also, reading further I see now that this is infallible if `calculate` returns `Ok`. In that case I think it's fine to leave this as-is; I see how it's hard to deduplicate otherwise.
alexcrichton created PR review comment:
Could this return `(u32, Self)` to move inference of the stripe from `index` here too?
alexcrichton created PR review comment:
Could this additionally check that the guard sizes are also page aligned?
abrown submitted PR review.
abrown created PR review comment:
If some 32-bit user requests some kind of memory configuration requiring 64-bit-sized addresses, how does that even work on a 32-bit machine? Don't we want to fail them early? Perhaps with a more descriptive error than an `unwrap()`?
abrown submitted PR review.
abrown created PR review comment:
(Part of the reason this is readable is because of this "use `usize` everywhere" choice. When I switch to `u64` there are 40+ other places that will need to be fixed up with `try_into` or `as`.)
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Definitely, a request on a 32-bit machine that can't be satisfied should return an error, but IMO it should be a normal error instead of a panic. Also, to clarify, I don't think everything should change to `u64`, only this structure. The `SlabLayout` structure should definitely use `usize` as it's tailored for the current machine, and my hope would be that `calculate` is the only location where `u64` is translated to `usize`, fallibly.
abrown updated PR #7072.
abrown submitted PR review.
abrown created PR review comment:
Ok, I misunderstood... for some reason I thought we were trying to refactor `SlabLayout`. Let me try this...
abrown updated PR #7072.
abrown has marked PR #7072 as ready for review.
abrown requested fitzgen for a review on PR #7072.
abrown requested wasmtime-core-reviewers for a review on PR #7072.
abrown requested wasmtime-default-reviewers for a review on PR #7072.
abrown updated PR #7072.
abrown has enabled auto merge for PR #7072.
abrown merged PR #7072.
Last updated: Nov 22 2024 at 16:03 UTC