cfallin opened PR #3697 from memfd-cow
to main
:
Add a pooling allocator mode based on copy-on-write mappings of memfds.
As first suggested by Jan on the Zulip here [1], a cheap and effective
way to obtain copy-on-write semantics of a "backing image" for a Wasm
memory is to mmap a file withMAP_PRIVATE
. Thememfd
mechanism
provided by the Linux kernel allows us to create anonymous,
in-memory-only files that we can use for this mapping, so we can
construct the image contents on-the-fly then effectively create a CoW
overlay. Furthermore, and importantly,madvise(MADV_DONTNEED, ...)
will discard the CoW overlay, returning the mapping to its original
state.By itself this is almost enough for a very fast
instantiation-termination loop of the same image over and over,
without changing the address space mapping at all (which is
expensive). The only missing bit is how to implement
heap growth. But here memfds can help us again: if we create another
anonymous file and map it where the extended parts of the heap would
go, we can take advantage of the fact that ammap()
mapping can
be larger than the file itself, with accesses beyond the end
generating aSIGBUS
, and the fact that we can cheaply resize the
file withftruncate
, even after a mapping exists. So we can map the
"heap extension" file once with the maximum memory-slot size and grow
the memfd itself asmemory.grow
operations occur.The above CoW technique and heap-growth technique together allow us a
fastpath ofmadvise()
andftruncate()
only when we re-instantiate
the same module over and over, as long as we can reuse the same
slot. This fastpath avoids all whole-process address-space locks in
the Linux kernel, which should mean it is highly scalable. It also
avoids the cost of copying data on read, as theuffd
heap backend
does when servicing pagefaults; the kernel's own optimized CoW
logic (same as used by all file mmaps) is used instead.There are still a few loose ends in this PR, which I intend to tie up
before merging:
There is no
InstanceAllocationStrategy
yet that attempts to
actually reuse instance slots; that should be added ASAP. For testing
so far, I have just instantiated the same one module repeatedly (so
reuse naturally occurs).The guard-page strategy is slightly wrong; I need to implement the
pre-heap guard region as well. This will be done by performing
another mapping once, to reserve the whole address range, then
mmap'ing the image and extension file on top at appropriate
offsets (2GiB, 2GiB plus image size).Thanks to Jan on Zulip (are you also @koute from #3691?) for the initial
idea/inspiration! This PR is meant to demonstrate my thoughts on how
to build the feature and spawn discussion; now that we see both approaches
hopefully we can work out a way to meet the needs of both of our use-cases.
cfallin requested alexcrichton for a review on PR #3697.
cfallin requested peterhuene for a review on PR #3697.
cfallin updated PR #3697 from memfd-cow
to main
.
cfallin updated PR #3697 from memfd-cow
to main
.
cfallin updated PR #3697 from memfd-cow
to main
.
cfallin updated PR #3697 from memfd-cow
to main
.
cfallin updated PR #3697 from memfd-cow
to main
.
cfallin updated PR #3697 from memfd-cow
to main
.
cfallin updated PR #3697 from memfd-cow
to main
.
cfallin updated PR #3697 from memfd-cow
to main
.
cfallin updated PR #3697 from memfd-cow
to main
.
cfallin updated PR #3697 from memfd-cow
to main
.
cfallin updated PR #3697 from memfd-cow
to main
.
cfallin updated PR #3697 from memfd-cow
to main
.
cfallin updated PR #3697 from memfd-cow
to main
.
cfallin updated PR #3697 from memfd-cow
to main
.
cfallin updated PR #3697 from memfd-cow
to main
.
cfallin updated PR #3697 from memfd-cow
to main
.
cfallin updated PR #3697 from memfd-cow
to main
.
cfallin updated PR #3697 from memfd-cow
to main
.
cfallin updated PR #3697 from memfd-cow
to main
.
cfallin updated PR #3697 from memfd-cow
to main
.
cfallin updated PR #3697 from memfd-cow
to main
.
cfallin updated PR #3697 from memfd-cow
to main
.
cfallin updated PR #3697 from memfd-cow
to main
.
cfallin updated PR #3697 from memfd-cow
to main
.
cfallin has marked PR #3697 as ready for review.
alexcrichton submitted PR review.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Personally with conditional compilation things my goal would be to minimize the number of
#[cfg]
necessary, so to that end I think this might be a bit nicer have the body ofis_memfd_with_image
conditionally defined instead of consumers being conditionally defined.Otherwise afterwards if we're worried about performance here I think it's best to use
if cfg!(...)
to handle performance issues. Without#[cfg]
we can subvert lints like thedrop(module)
below and also ensure that this compiles all the time instead of in only one configuration.
alexcrichton created PR review comment:
This isn't applicable to this location in particular, but is there any reason to have this feature conditional? Naturally it's Linux-specific no matter what but I'm not sure why we wouldn't want to turn it on by default on Linux. Or otherwise this ideally seems like a
Config
option which is checked at runtime with compile-time abilities to compile it out if truly necessary. (I'm not sure if it's required we get this to an able-to-be-compiled-out-state though, we don't have really any restrictions on our runtime footprint at this time)
alexcrichton created PR review comment:
Would you be ok splitting the benchmark bits out to a separate PR?
alexcrichton created PR review comment:
Mind folding this into the block above with s/Test uffd functionality on Linux/Test Linux-specific functionality/
alexcrichton created PR review comment:
I think you'll want to reset the
info
field a line or two above this line as well.
alexcrichton created PR review comment:
For this with now multiple-methods of creating memories I think it might be best to have something like:
match &memory.kind { #{cfg(feature = "uffd")] MemoryKind::Uffd { something } => something.reset_guard_pages(), #{cfg(feature = "memfd-allocator")] MemoryKind::Memfd { something } => { something.clear_and_remain_ready(); continue; } MemoryKind::Mmap => /* ... */ }
(or something along those lines)
alexcrichton created PR review comment:
Could you add some docs to what these fields are?
alexcrichton created PR review comment:
I'm sure you were planning on already doing this, but to ensure there's a note here as well I think it'd be best to split out the
VMCallerCheckedAnyfunc
changes to a separate PR
alexcrichton created PR review comment:
I think Rust-convention-wise this would be
Fd
rather thanFD
(e.g.Http
instead ofHTTP
)
alexcrichton created PR review comment:
The
UnsafeCell
here and theunsafe impl Send
are a bit worrisome to be.How would you feel about a scheme where this was something like
AtomicPtr<MemFDSlot>
and we transferred ownership dynamically between a pool and theMemory
that holds it? That way we could have what I think would be pretty cheap runtime asserts that the ownership here all works out and no risk of accidental races on this.
alexcrichton created PR review comment:
I think a better name for this might be
unwrap_mmap
to signal the panic, but if this is dead code it's probably safe to remove to (or is the#[allow]
just for a#[cfg]
?)
alexcrichton created PR review comment:
FWIW it's methods like this that make me pretty uncomfortable about the
UnsafeCell
and how I think it would be better to have anAtomicPtr
or similar which is always used to transfer ownership to a single owner at a time. Promiting from a&self
to&mut MemFDSlot
is pretty scary here and easy to get lost when reading other code.
alexcrichton created PR review comment:
I think I've mentioned this before, but personally I feel like this would be better to not have so different between memfd-and-not. It seems like it might be simpler to still have one giant mmap for the entire memory pool and then memfd just does fancy stuff internally with everything as
MAP_FIXED
.If instead we think it's better to have an mmap-per-slot then I think that the pooling allocator today to switch to that strategy. Basically I think it might be a bit more consistent to have this be the same across memfd and not.
alexcrichton created PR review comment:
Personally I've found that it's a pretty annoying thing to keep perfect-dead-code-warnings with lots of platform-specific code. I typically solve it by having "esoteric" configurations have this at the top of the crate:
#![cfg_attr(feature = "memfd-allocator", allow(dead_code))]
to deal with it at the beginning and not worry about it later. Basically I don't think we care much about dead code in configured situations, only in the default case do we largely care about removing dead code.
alexcrichton created PR review comment:
I think this method is used to guard against initialization loops during instantiation, but this feels like they're asking/answering different questions. I believe that
image
is none if the heap is empty, right? I think this may want a different method name or the callers use a different scheme for calling this or something like that. I suspect that the end-result is the same (if you have an empty heap you probably have 0 initializers), but it feels a bit confusing reading this now seeing what might be a disconnect.
alexcrichton created PR review comment:
Personally I wouldn't mind the duplication in this method and
from_open_file
. I feel like seeing themmap
call is a bit more readable without having to have comments for all the parameters.
alexcrichton created PR review comment:
I think this may be able to be removed now?
alexcrichton created PR review comment:
Looking through this I don't think there's a method for removing from
MemFdRegistry
? I think it'd be pretty easy to add though and just needs a hook inDrop for ModuleInner
or w/e we have inModule
alexcrichton created PR review comment:
This method and logic feels really similar to the
to_paged
method onMemoryInitialization
, could some of this be offloaded toimpl MemoryInitialization { .. }
to try to share some bits and pieces if possible?
alexcrichton created PR review comment:
Personally I like to use:
let defined_memory = match ... { Some(e) => e, None => continue, }; // ...
to avoid an extra level of indentation.
alexcrichton created PR review comment:
I'm wary of casts like this but I feel like this sort of validation has already happened elsewhere when we loaded the module. I think though that if this were moved to
impl MemoryInitialization
it may make it easier to share this with other pieces and comment in one place that the cast is fine.
alexcrichton created PR review comment:
Given that this set is almost always 1 could this perhaps be
PrimaryMap<Index, bool>
to be a bit more efficient?
alexcrichton created PR review comment:
I think this ends up not being used nowadays, right? If so I think it'd be fine to remove it.
alexcrichton created PR review comment:
I've since forgotten what
base
is so reading this I'm sort of naively assuming that it's aglobal.get
or something like that. Could this loop have some more comments about why paraticular segments are skipped?
alexcrichton created PR review comment:
I think this might be a good helper method to add on
MemoryPlan
or similar to avoid casts/multiplies inline here
alexcrichton created PR review comment:
It seems like the purpose of this loop is to calculate what to pass in to
set_len
, but is that the only purpose of the loop? If so I thought thatwrite_at
would automatically resize the file for us? Otherwise could we keep track of the current size and automatically callset_len
before awrite_at
below to avoid having this double loop?
alexcrichton created PR review comment:
To confirm, this is cloexec-by-default, right?
alexcrichton created PR review comment:
Why do we deal with seals on the memfd?
alexcrichton created PR review comment:
Is this required? (I'm not sure of mmap's behavior of non-page-aligned files or if it's like more fast this way)
alexcrichton created PR review comment:
This actually brings up an interesting maintainability hazard though. We're sort of manually destroying things here but I think it might actually be best to
std::ptr::drop_in_place
if we can followed by astd::ptr::write
if you're up for trying out such a refactoring.
alexcrichton created PR review comment:
I realize though that some of my questions are equally applicable to uffd which is a compile time feature as well. I don't think we have the best compile-time story there either, so it's not necessarily required to solve this here and now.
cfallin submitted PR review.
cfallin created PR review comment:
The main reason for conditionals at all is that the memfd allocator essentially takes over the pooling allocator, and the pooling allocator also works (IIUC) at least on macOS, maybe on other platforms too.
We could definitely do all of the conditional switching dynamically, of course; that's probably the right answer long-term. I think I was just mainly following uffd's lead here. I'm happy to do this refactor as a followup if you'd like!
cfallin created PR review comment:
Done (#3733)
cfallin submitted PR review.
cfallin submitted PR review.
cfallin created PR review comment:
My thinking was mainly as an additional security mitigation layer: this image is the starting point for all instantiations, so if we somehow had a bug that e.g. allowed an instance to directly write to it, or if some other security vulnerability allowed an attacker to direct a write to an arbitrary open fd, then all new instances would be affected. As long as the API exists, why not use it to make the thing read-only?
(I'm happy to add a comment noting this reasoning!)
cfallin updated PR #3697 from memfd-cow
to main
.
cfallin submitted PR review.
cfallin created PR review comment:
Done! (#3736)
cfallin updated PR #3697 from memfd-cow
to main
.
cfallin submitted PR review.
cfallin created PR review comment:
Fixed, good point!
cfallin submitted PR review.
cfallin created PR review comment:
Ah, yes. I went looking for
info
and realized this was part of the lazy-anyfunc changes that have been split off. But that just proves the point more: it's entirely unobvious in the other PR that this would need to change, and a review of that PR could have easily missed it.I think the invariant of keeping
Instance
s initialized-but-empty and "hot-patching" the fields was probably intended to keep thePrimaryMap
s from being freed/allocated on each instantiation -- at least that's one benefit I see -- but I agree that the risk here is probably greater than the reward, and actually aligns with some of the points above re: instantiation from scratch vs. erasing state to reuse, just on a micro scale.I went ahead and refactored so that the instances remain uninitialized memory until instantiation, and are
drop_in_place()
'd on termination. Good catch!
cfallin submitted PR review.
cfallin created PR review comment:
Done!
cfallin submitted PR review.
cfallin created PR review comment:
Added!
cfallin submitted PR review.
cfallin created PR review comment:
Renamed.
cfallin updated PR #3697 from memfd-cow
to main
.
cfallin submitted PR review.
cfallin created PR review comment:
Done.
cfallin submitted PR review.
cfallin created PR review comment:
Yup, that's exactly it (content depending on globals, which could vary based in imported modules). Added a comment.
cfallin submitted PR review.
cfallin created PR review comment:
Done.
cfallin submitted PR review.
cfallin created PR review comment:
Yep! Re-enabled test.
cfallin submitted PR review.
cfallin created PR review comment:
The other purpose of the loop is to look for any reasons we can't use a memfd (dynamically-placed segments, out-of-bounds segments) before we commit to one. I suppose we could allocate one and then throw it away if we see an error... actually I'll try that, it seems like a sensible optimization.
cfallin updated PR #3697 from memfd-cow
to main
.
cfallin submitted PR review.
cfallin created PR review comment:
Added comment saying basically the above.
cfallin submitted PR review.
cfallin created PR review comment:
Indeed, it appears so.
cfallin updated PR #3697 from memfd-cow
to main
.
cfallin submitted PR review.
cfallin created PR review comment:
Just refactored to make this the case again -- much simpler now. Thanks for pushing in this direction!
cfallin submitted PR review.
cfallin created PR review comment:
Yup, I agree. FWIW my original thinking was that we already had the same mutable-access-to-slot thing going on implicitly, because the pooling and uffd allocators were making syscalls to mutate individual slots and relying on higher-level instance mutual exclusion for safety. The difference is just that we didn't have an actual Rust object embodying the slot and actions on it previously.
In any case, the "take ownership and return it" pattern is much nicer; I feel better with this additional level of checking/safety. I ended up doing this with a mutex per slot, just for simplicity and to avoid gratuitous unsafe code and manual box<->raw handling; that's probably acceptable overhead but I'm happy to refactor to actual atomic-exchange magic if not.
cfallin submitted PR review.
cfallin created PR review comment:
I added a comment here to explain what's going on -- I had actually already forgotten the subtleties and had to debug a wee bit, but it turns out that this is important for handling invalid data initializers (e.g. out-of-bounds) properly. The invariant is that the memfd-construction code may reject creating a memfd for certain reasons, e.g. dynamically-placed segments, or out-of-bounds segments, or whatever else we need to reject. The MemFD mechanism may still handle the memory, and it may be initialized normally by writing into the anon-mmap memory; so really what we want is exactly "are we using an already-prepared image" and if not, we do the initializers. Added a better doc-comment here to explain.
cfallin submitted PR review.
cfallin created PR review comment:
Good point, I had forgotten about that! That would be a nice surprise in a long-running process. Added a drop-handler that removes from the registry as suggested.
cfallin updated PR #3697 from memfd-cow
to main
.
cfallin submitted PR review.
cfallin created PR review comment:
I think it's probably a good idea; IIRC the Linux semantics are that an mmap of a non-page-aligned-size file sees zeroes for the remainder of the last page, but that may be weird/nonportable behavior.
cfallin submitted PR review.
cfallin created PR review comment:
Fixed as per comment below!
cfallin updated PR #3697 from memfd-cow
to main
.
cfallin submitted PR review.
cfallin created PR review comment:
I put some of the bounds checks in
MemoryInitializer
; a lot of the other logic is sort of just-different-enough to make it tricky to share. Maybe with more thought a "visitor" sort of interface to the image implied by memory initializers could be built; that feels slightly out of scope here though, to me at least (happy to push more on it if you'd like).
cfallin created PR review comment:
Yup, good idea, I moved the bounds checks to helpers on the MemoryInitializer.
cfallin submitted PR review.
cfallin submitted PR review.
cfallin created PR review comment:
OK, so I refactored this and it's not significantly simpler, but it is at least a little more straightforward.
cfallin submitted PR review.
cfallin created PR review comment:
Yup, good idea!
cfallin updated PR #3697 from memfd-cow
to main
.
cfallin submitted PR review.
cfallin created PR review comment:
Done!
cfallin submitted PR review.
cfallin created PR review comment:
I like that approach, but I think it's a significant refactor of the code, including uffd code, because it's often not just a simple "approach A does this block, approach B does that block", but e.g. uffd or memfd adds just one variation.
I think this might make sense to do in a followup "remove all the static config by feature flags" PR -- I could both put uffd and memfd under dynamic config flags, and clean up the conditional compilation stuff at the same time. Would you mind if I punted until that?
alexcrichton submitted PR review.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Could this be merged with
new_static
and it grows various arguments to match the contents ofMemory::Static
?IIRC the
let base = match maximum { ... }
was non-trivial and I forgot it the first time I wrote something awhile back, so having it not-duplicated I think would be best.If there's issues with
MemFdSlot
being conditionally defined I think it would be reasonable to have something like:#[cfg(not(feature = "memfd-allocator"))] enum MemFdSlot {}
which should make
Option<MemFdSlot>
a 0-sized struct and statically assert we don't try to actually create one.
alexcrichton created PR review comment:
Could this use
u64::try_from(...).unwrap()
?I realize it's much wordier but when it comes to dealing with memory I personally prefer to use fallible conversions for everything that's not necessarily immediately trivially obvious that it's a successful conversion (ideally having no
as
conversions at all but that's not always possible)
alexcrichton created PR review comment:
I think we can actually avoid saving this internally since initialization below should I think take into account the specific module being instantiated. For example there's a call to
PrimaryMap::with_capacity(self.module_limits.memories as usize)
but that's not necessary if the module being instantiated has fewer memories than the limits.I think this could probably be achieved by extracting this logic to a helpe method and using that instead of the
initialize
method below perhaps?
alexcrichton created PR review comment:
Along the same lines as the seals elsewhere, perhaps this could have a seal for writing to guarantee it's always zero? (IIRC the seal for writing doesn't affect shrinking/growth)
alexcrichton created PR review comment:
The comment here about "Try to unregister" feels a bit misleading because this seems like there's less of an attempt here and more of a "this had better be gone" sort of feeling.
Additionally I'm not sure now that I think about it that this is the right place for this destructor. And actually now that I think for a moment I'm a bit confused about this design.
This design currently has a "registry" in an engine of memfds-per-module, but it seems like an alternative (and perhaps simpler?) design would be to store memfd information here in the
Module
itself. I see now that during the first instantiation of a module we create the memfd image if memfd is enabled, but it seems like that work would be best done at module-creation-time if memfd is enabled.WIth per-module memfd information I think that would remove the need for a registry entirely?
alexcrichton created PR review comment:
While I don't think this is really an issue this feels morally to me like the size here shouldn't include the guard size. I naively expect that the
MemFdSlot
is only responsible for the linear memory, and the guard pages are concerns of other layers of the stack.
alexcrichton created PR review comment:
I commented below about error cases, but I think this is one error we're definitely not going to want to ignore. If the
madvise
fails for whatever reason I think that would definitely leak contents of memory between module instantiations which we don't want. On error here it seems like we should either panic to assert it's not actually happening or we discard the slot so the mappings are entirely redone on reuse.
alexcrichton created PR review comment:
I understand the theoretical design behind this, but out of curiosity have you been able to benchmark the difference between
ftruncate
and just-map-anonymous-memory?
alexcrichton created PR review comment:
FWIW you can avoid the "syntax salt" here with:
if let Some(existing_image) = &self.image {
alexcrichton created PR review comment:
One possible simplification is that we could remove this field and have the extension file always mapped for the entire size of the wasm heap. That way we wouldn't need to do subtraction or anything in the
set_heap_limit
method and could instead simply forward to.set_len()
?
alexcrichton created PR review comment:
Reading over this one concern I just had was "what happens if an error is returned and this flag isn't set?"
It looks like the
MemFdSlot
structure will be discarded and recreated the next time that it's requested (becauseMemFdSlot
creation is lazy). To confirm, is that intended?One possible idea I had is there could be a
Drop for MemFdSlot
which checksself.dirty
and unmaps the information if it's there (and also settingself.dirty = true
at the top of this function instead of at the end). I'm not sure if that really buys much benefit though.In any case I was curious if you'd thought about the error case here and the impact it might have?
alexcrichton created PR review comment:
I had a big comment about how this didn't seem to handle trailing zeros after the image, only leading zeros, which I ended up deleting because I believe that this does handle it and I just missed it in the documentation.
Could this explicitly call out the orderings the mmaps are done to indicate how overlapping regions are handled and the cow image will naturally be flanked by zero'd memory as necessary?
alexcrichton created PR review comment:
If possible this is another place that I think would be best to share with the preexisting
set_instance_memories
method. It seems like this wants to share a fair bit of the logic and only tweaks things by acquiring a memfd slot, and that seems like it could be done in a more surgical #[cfg]'d fashion instead of duplicating the whole method.
alexcrichton created PR review comment:
Could this use
memory_plans.iter()
and ... I forget the method name but I'm sure we have one to convert fromMemoryIndex
toDefinedMemoryIndex
... to avoid usingfrom_u32
?
alexcrichton created PR review comment:
That all sounds like really solid reasoning to me, thanks for the comment!
alexcrichton created PR review comment:
Reading over this again I realize that this feels a bit out-of-sync with uffd because uffd also wants to skip memory initialization but this "skip the init" check is in a different location than the uffd one.
Could the check for uffd get moved over to https://github.com/bytecodealliance/wasmtime/blob/921fb6c73e4443ac534e9d4477d9f8cab318c166/crates/runtime/src/instance/allocator/pooling.rs#L1299-L1326 so "skip memory init" is only in one location?
alexcrichton created PR review comment:
Oh also one thing I forgot from earlier, I feel like this wants to do the same thing as uffd ("just call madvise") but it's done in a different fashion. I think that this location (which currently madvise's manually and then
continue
s to skip further logic) would be a good place to try to better unify what uffd and memfd are doing. Ideally this could fall through to the shared call todecommit_memory_pages
or something like that, or otherwise make it more clear that uffd and memfd are doing roughly similar things.
alexcrichton created PR review comment:
I actually think that having this conditionally-defined type would be useful for other things as well, it would remove all the
#[cfg]
below with something like:impl MemFdSlot { fn has_image(&self) -> bool { match self {} } fn set_heap_limit(&mut self) -> Result<()> { match self {} } }
(etc)
alexcrichton created PR review comment:
One other benefit of moving information out of the
Engine
is that it would avoid bouncing on a lock every time an instantiation is made which can be somewhat expensive
alexcrichton created PR review comment:
(I think this also means the change below to
initialize_instance
can be reverted as well)
cfallin submitted PR review.
cfallin created PR review comment:
Ah, this refactor is so much nicer. I did this for
MemFdSlot
and also theModuleMemFds
below (just a zero-sized type if memfd support is not included).
cfallin submitted PR review.
cfallin created PR review comment:
Good point -- there actually was no need for a registry system at all. I've made the
wasmtime::Module
own the memfd images. Thanks!
cfallin updated PR #3697 from memfd-cow
to main
.
cfallin updated PR #3697 from memfd-cow
to main
.
cfallin submitted PR review.
cfallin created PR review comment:
Done -- extracted a common core between this and the on-demand allocator to build a raw
Instance
.
cfallin submitted PR review.
cfallin created PR review comment:
Ah, yes, I'm using
.values()
here but the default.iter()
on aPrimaryMap
gives us an iterator that actually just gives the keys (the strong index type) directly. Much nicer.
cfallin submitted PR review.
cfallin created PR review comment:
Done!
cfallin submitted PR review.
cfallin created PR review comment:
Done!
cfallin updated PR #3697 from memfd-cow
to main
.
cfallin submitted PR review.
cfallin created PR review comment:
That's my old-school (pre-2016 or something?) Rust showing... but then look who I'm talking to :-)
(I actually kinda like the explicitness and the lack of magic in the pattern-matching with precise matching
&
s andref
s, but maybe my brain's internal typechecker is just weird...)
cfallin updated PR #3697 from memfd-cow
to main
.
cfallin submitted PR review.
cfallin created PR review comment:
That's a very good point. I think the safest answer to this is that if some mmap error occurs within the big mmap used by the pooling allocator for all memories, all bets are off and our only safe/secure option is to panic the process. Otherwise, unless we can carefully reason about why the remapping failed, we have a slot that is permanently unusable, so the system is degraded (slots die over time due to errors and we have no way of replacing them). And we have no way to return an error from an individual instantiation that means "I not only failed allocation but blew up the allocator... please throw everything away and start over". So, replaced with
.expect()
. If you have other ideas here though I'm very curious!
cfallin submitted PR review.
cfallin created PR review comment:
I have not; avoided that due to the use of the global mmap lock by
mprotect()
, taking the fact that we should avoid that on received wisdom / experience from earlier pooling-allocator folks. It would be interesting to see exactly how much it matters for different workloads, and there is indeed a tradeoff (in that the CoW-of-memfd memory is slower, as we've discovered).
cfallin submitted PR review.
cfallin created PR review comment:
As above, made this panic the process if it fails, since such a failure leaves the whole pooling allocator in an unknown state.
Re: uffd -- this gets to the bigger question of how much we can share, but: given that (i) just enough stuff differs in the details between what memfd does and uffd does, and (ii) uffd may go away if memfd works better, I'm somewhat hesitant to refactor to try to share more logic when it's not an easy/simple change. It starts to get into the territory of "clean up uffd", which feels out-of-scope for this PR. My experience with trying to share too much logic as well is that sometimes it becomes hard to see through all the conditionals, and it can actually be simpler to maintain when the two disjoint approaches are kept separate. (See e.g. Cranelift backends and separate instruction lowering rules, vs. big mess of conditional logic. Not exactly the same, but similar in spirit.)
cfallin submitted PR review.
cfallin created PR review comment:
Yeah, I agree in principle; this is just following how the rest of the pooling allocator works. Could definitely be cleaned up in a followup.
cfallin created PR review comment:
We could, yeah; the reason I did it this way was to avoid potentially wasting resources. (The part of the file shadowed by the initial heap size will be a "hole", but there may still be a small resource cost in tracking per-page info or things like that.)
cfallin submitted PR review.
cfallin submitted PR review.
cfallin created PR review comment:
As mentioned above, I'm a little wary of this PR becoming "let's refactor and clean up all of uffd while we're at it". If we have issues with uffd's design, perhaps we could save them for another PR?
cfallin submitted PR review.
cfallin created PR review comment:
I unified this with the above, given the conditional-at-the-lowest-level strategy you outlined -- thanks a bunch for that, it's much cleaner now!
cfallin submitted PR review.
cfallin created PR review comment:
Done!
cfallin updated PR #3697 from memfd-cow
to main
.
cfallin updated PR #3697 from memfd-cow
to main
.
cfallin updated PR #3697 from memfd-cow
to main
.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Oh I'm happy to panic the process in these cases, I just wasn't sure what the intention was. I agree that panicking is the best option here.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Nah I think that makes sense to not clean up uffd while you're here, but this feels like a relatively easy thing to do in that there's already a location for uffd where we skip memory initialization, and intentionally not using that location (when I think it works right?) for memfd feels odd
alexcrichton submitted PR review.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
I think the
.expect
can be removed here and thebase
check above removed in favor of moving thematch
to here?
alexcrichton created PR review comment:
If this is still necessary then I think this should be done at a lower level where it's more obvious which field causes the automatic inference to not work and a comment can say why this is needed.
alexcrichton created PR review comment:
This organization seems a bit odd where I would expect the "dummy" implementation and the real implementation to live roughly in the same area, similarly they would also theoretically mirror the dummy/real implementation of
ModuleMemFds
. Could there perhaps becrates/runtime/src/memfd.rs
andcrates/runtime/src/memfd_disabled.rs
which have all the structs and all the dummy structs?(unsure if this
MemFdSlot
needs to be defined in this file)
alexcrichton created PR review comment:
I think where possible it's best to replace this with something like
match self {}
which the compiler will allow sinceself
is an empty enum and is a good way of showing that this is statically not reachable.
alexcrichton created PR review comment:
Well to some extent I don't really think this needs a major refactoring. While ideally we have
MemoryKind
variants or something like that what we have today in this PR is twoMemory
variants where one has fields which configures uffd or memfd if they're enabled. I think that this logic in this particular loop can still be structured as "here's an arm for each case" as opposed to "memfd is special and gets the early-continue
". I was originally confused reading this loop because the loop always ends withmadvise
and it sort of seems like memfd doesn't end withmadvise
when it in fact does.
alexcrichton created PR review comment:
I feel like our error handling here is sort of half-measures in the sense that if this initialization fails then an abort happens but if
Memory::new_static
fails below it's bubbled up and handled. In an error below (or later for a memory afterwards) it seems like it would still result in dropping theMemFdSlot
which itself doesn't do any cleanup and leaves the slot in a somewhat indeterminate state?In some sense this may not be the end of the world really about having an indeterminate state because allocation of a memfd slot here always redoes the entire mapping for the entire memory, so if something is "leaked" it's only temporary in the sense that no wasm will be using that slot?
cfallin updated PR #3697 from memfd-cow
to main
.
cfallin updated PR #3697 from memfd-cow
to main
.
cfallin submitted PR review.
cfallin created PR review comment:
Done!
cfallin submitted PR review.
cfallin created PR review comment:
Ah, it no longer is, now that we're taking the more idiomatic pass-ownership-of-
MemFdSlot
approach. Removed!
cfallin submitted PR review.
cfallin created PR review comment:
Done!
cfallin submitted PR review.
cfallin created PR review comment:
Ah, right, this wasn't nearly as much a refactor as I thought; sorry! Done.
cfallin submitted PR review.
cfallin created PR review comment:
Done! This cleans up things sigificantly; pooling.rs has almost no memfd-conditional stuff left.
cfallin submitted PR review.
cfallin created PR review comment:
That's a good point. On thinking about this further:
If instantiation bails out after we've initialized the MemFdSlot (which can happen, e.g., if the resource-limiter says "no") then it's most idiomatic to just drop the slot object and let the pool create a new one later.
The most important thing -- the only thing I can think of that would absolutely require us to panic the process -- is that we don't let some other mmap ASLR its way into a gap in the pooling allocator's region if we leave a hole somewhere during instantiation. But we already avoid that, to prevent exactly that race condition; so even if the MemFdSlot fails, say, the second of 3 mmaps, we are still covering/reserving the whole range.
Given that, I think it's fine to drop the MemFdSlot at any point.
For an extra layer of {sanity, avoiding resource leaks e.g. keeping a memfd image alive, defense-in-depth-don't-map-more-than-we-need}, it is still nice to try to remove a stale mapping. I've written a
Drop
handler forMemFdSlot
that does this in a best-effort way: it tries to map anonymous memory withPROT_NONE
over the whole slot. It usesNORESERVE
to make this more likely to succeed. But if it doesn't, the error is explicitly ignored (with a comment in the source to describe why safe): nothing is supposed to touch the slot anyway until the nextMemFdSlot
is created for it, and if mapping something into the slot fails again later for the newMemFdSlot
, then it has the proper avenues to bubble up the error at that time.Does that seem reasonable?
cfallin updated PR #3697 from memfd-cow
to main
.
cfallin submitted PR review.
cfallin created PR review comment:
I tried cleaning this up a bit, but it's really subtle and I'm getting lost in the corner cases of paged-vs-segment initialization mode, bulk memory and the need to trap at just the right point. In any case it's not a simple refactor. I can spend a few more hours on understanding this and unraveling it if we think we need it but it feels sort of pointless if we're possibly going to remove uffd soon :-)
cfallin updated PR #3697 from memfd-cow
to main
.
cfallin updated PR #3697 from memfd-cow
to main
.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
as a minor nit the traits in rustix seem like you could probably pass
image.fd.as_file()
below as thefd
argument rather than creating a newBorrowedFd
structure (although I haven't used rustix, just a hunch)
alexcrichton created PR review comment:
One thing I think we might want to do here is to have an explicit flag (off by default) which disables this destructor behavior and we could set the flag from the pool to skip this since it's all about to get unmapped anyway.
Otherwise though the difference between this destructor and
instantiate
got me thinking. I think that there's a bug right now that can leak the initial memory from one module to another? When a slot is reused all the memory is accessible but reset back to the previous image, but the new anonymous mapping for the new heap might be smaller than the previous image, and subsequent heap grows simply mprotect the preexisting pages into existence, which I think means that the previous module's memory image may leak into the next.Otherwise though I'd naively expect that the destructor here shares common code with the
instantiate
step (when the hot path isn't taken) where there's a step to clear out all vma mappings with this map anonymous call (although I don't entirely know what the the difference is with an mmap that hasNORESERVE
vsPROT_NONE
)
alexcrichton submitted PR review.
alexcrichton created PR review comment:
as a minor nit I was originally thinking that this and
runtime/src/memfd.rs
would be the same file, in further effort to reduce the number of#[cfg]
necessary
alexcrichton created PR review comment:
Unlike the
Static
variant of memory I think we might be able to skip the storage here entirely (and also make theinstantiate
a litte faster below with fewer syscalls) since with a runtime memory all memfd really is is the initial mapping of the cow image overlaid on top of the underlying anonymous mapping. In that sense I think we can skip most of the logic ofMemFdSlot
and avoid storing redundant data.Now that being said I think the implementation here is correct and will work well regardless. I think it'd be fine for a follow-up to clean this up a bit.
alexcrichton created PR review comment:
Given where this PR now is and the trajectory of this feature, I think this is good to have as an optional dependency but I think it should be enabled by default and probably just called
memfd
(enabled-by-default probably from thewasmtime
crate, not here). Especially with this making such a drastric difference for the on-demand allocator it seems like usingmemfd
is a no-brainer.To make this tweak as well as improve the portability here (right now if you enable
memfd
on Windows it'd probably just fail to compile) I think a few small changes are all that's needed. To be clear though this is icing on the cake and it's totally fine to do this as a follow-up, but I think the steps would be:
- Remove the
memfd-allocator
- Add a
memfd
feature to thewasmtime
crate that forwards towasmtime-runtime/memfd
- Add
memfd
to the default feature set of thewasmtime
crate- Add a
crates/runtime/build.rs
that detects that the target's os is Linux (by looking atCARGO_CFG_TARGET_OS
) and that thememfd
feature is active (by looking atCARGO_FEATURE_MEMFD
), and if both are present then printingprintln!("cargo:rustc-cfg=memfd")
- Change all
cfg(feature = "memfd-allocator")
in the crate tocfg(memfd)
With that I think we'll have on-by-default memfd for Linux-only, with the opt-in ability to turn it off. Again though it's fine to defer this to a future PR.
As I type all this out though I also would question maybe we don't even need a feature for this. I suppose one thing that could come up is that if you have a million modules in a process that's a million file descriptors which can blow kernel limits, but other than that I'm not sure why anyone would explicitly want to disable memfd.
cfallin submitted PR review.
cfallin created PR review comment:
Ah, so I think this bug does not exist, because on
instantiate
, in the different-module case, we remap all memory from the start of heap up tostatic_size
(in practice 6GiB) with anonymous zeroes (here) -- this will wipe all previous mappings. We thenmprotect
everything beyond the initial heap size toPROT_NONE
.Re: the other bits (sharing clearing logic) that's a good point though and I'll clean this up a bit!
cfallin submitted PR review.
cfallin created PR review comment:
... and, actually there was a bug, but a bit more subtle: the initial anonymous mmap was surrounded by
if initial_size > 0
rather than a hypotheticalif static_size > 0
. The former may be false if the memory starts with zero size then grows; the latter is always true in any valid config (so we can make it unconditional). Thanks for poking at this!
cfallin submitted PR review.
cfallin created PR review comment:
I played with this a bit, and it's a good thought! I think though it may be safer to go through the
MemFdSlot
for heap growth specifically in case we choose to implement some other technique (such as ftruncate on a second memfd) in the future. At least if I'm reading your suggestion right, otherwise, we'd use theMemFdSlot
'sinstantiate
logic to set up the actual CoW mapping then throw it away (without its dtor running) and rely on the implicit alignment of themprotect
-growth strategy between memfd and the usual dynamic memory.Re: syscalls on instantiate, we could indeed avoid one if we give the
MemFdSlot
a "no fixed mapping" mode, but then it takes responsibility for the pre-guard too, which complicates the pooling implementation somewhat. I'll think a bit more about this though.
cfallin updated PR #3697 from memfd-cow
to main
.
cfallin submitted PR review.
cfallin created PR review comment:
Good call, thanks!
cfallin submitted PR review.
cfallin created PR review comment:
Fixed!
cfallin updated PR #3697 from memfd-cow
to main
.
cfallin submitted PR review.
cfallin created PR review comment:
Yes, good point; I've moved everything into one
memfd.rs
inruntime/src/
(andmemfd_disabled.rs
alongside it).
cfallin updated PR #3697 from memfd-cow
to main
.
cfallin submitted PR review.
cfallin created PR review comment:
I've gone ahead and done all of this; the default build now will use memfd, validated by building the
wasmtime
binary and strace'ing an execution. Waiting for CI to see if movingmemfd
out of the target-only deps into the main deps will work on non-Linux, but I am hoping it will (the crate's got a toplevel#![cfg(target_os ...)]
that should make it a no-op on other platforms...?).
cfallin updated PR #3697 from memfd-cow
to main
.
cfallin updated PR #3697 from memfd-cow
to main
.
cfallin updated PR #3697 from memfd-cow
to main
.
cfallin updated PR #3697 from memfd-cow
to main
.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Since there's a
pooling-allocator
feature below I think that this andmemfd
can probably be removed from the feature set here, but addingpooling-allocator
to the default set of features seems pretty reasonable to me.
alexcrichton created PR review comment:
One of my worries is that there's a fair bit of tricky state with handling linear memories and having that duplicated across a few locations I think is not ideal. The
MemFdSlot
tries to take over a fair bit of the management of the memory which is also tracked by the runtimeMemory
. We discussed this a bit in person as well, but to reiterate here I think it might be good to investigate a bit to figure out whether some functions below could grow an extra parameter or two instead of storing the state internally so it could be passed in by the "source of truth".I don't have a great idea though for whether this is possible, so I'll leave it up to your discretion about whether it ends up working out or not.
alexcrichton created PR review comment:
Since memfd can be used without the pooling allocator now is this condition still necessary? (other than disabling memfd when uffd is enabled)
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Out of curiosity with the current
#[cfg]
system is this still necessary? With memfd on-by-default now this otherwise runs the risk of suppressing, by default, dead code warnings. (I think though that with the current organization the warnings here should be much less and/or easier to deal with)
alexcrichton created PR review comment:
I think that this path may still perform a few extra syscalls that aren't necesary, and in local benchmarking this isn't super-significant in the profile but it still seems nontrivial (~5% ish).
The
MemFdSlot::instantiate
method will mmap the whole memory as read/write, overwriting theMmap::accessible_reserved
above followed bymmap.make_accessible()
. Themmap
for the cow image is then done, finally followed by anmprotect
to make everything past the end inaccessible. I think that the only mmap needed in this case, without replacing the above calls toMmap
, is the one for the cow image.This is probably still faster than what we have today, though, so if you'd prefer this can get deferred to later.
alexcrichton created PR review comment:
It's guaranteed that both
offset
andlen
are multiples of page sizes, right? (just confirming)
alexcrichton created PR review comment:
Random thought that just occured to me, since we're doing fd equality matching here do we actually still need a compiled module id? I originally thought that was going to be used for this but now that I look at this again I see it not being used. I forget though if the compiled module id is actually used for anything else at this point in time though.
alexcrichton created PR review comment:
I think this could actually use
.get_mut()
to bypass the.lock()
since we've got mutable access to the contents at this point.
alexcrichton created PR review comment:
I think this is fine to be a safe function since this isn't related to Rust's memory safety and is more of a defense-in-depth mechanism.
alexcrichton created PR review comment:
I think
pooling-allocator
can be dropped here?
cfallin submitted PR review.
cfallin created PR review comment:
It's used in the followup affinity PR (#3738) as otherwise there's an ID-reuse problem... strictly speaking that's just a heuristic so we could use fds there too and accept a false affinity sometimes. IMHO they're pretty cheap to maintain so maybe it's ok...?
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Ah right, and nah yeah let's leave them in. Only worthwhile if they could be removed entirely, but using for affinity makes sense!
cfallin submitted PR review.
cfallin created PR review comment:
Ah, these were necessary to actually see commandline
wasmtime
use memfd I think -- thedefault-features = false
otherwise turns everything off. (If there's a better way to arrange this then let me know -- I'm actually not sure I fully grok whydefault-features = false
is right for the CLI tool)
cfallin updated PR #3697 from memfd-cow
to main
.
cfallin submitted PR review.
cfallin created PR review comment:
Good point, no longer needed! Removed.
cfallin submitted PR review.
cfallin created PR review comment:
Ah, I wasn't aware of that bit of
Mutex
's API, thanks!
cfallin submitted PR review.
cfallin created PR review comment:
Seems not to be, removed!
cfallin submitted PR review.
cfallin created PR review comment:
Yes; added doc-comment notes to make this explicit and asserts where the
MemoryMemFd
is built. Thanks!
cfallin submitted PR review.
cfallin created PR review comment:
Indeed!
cfallin submitted PR review.
cfallin created PR review comment:
Good point, changed.
cfallin submitted PR review.
cfallin created PR review comment:
I've updated this so that the extra mmap no longer occurs. The basic idea is actually a slightly more general optimization: if there is no
image
(this can happen any time a memory with no data segments uses aMemFdSlot
), then the whole slot is just the anon-zeroes mmap, and we don't need to wipe it clear with a new one. This is true when theMemFdSlot
is first created as well: it has animage
ofNone
and we require the caller to have created a backing mmap already (as both the dynamic linear memory and the pooling allocator do).
cfallin updated PR #3697 from memfd-cow
to main
.
alexcrichton submitted PR review.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Now that I think about it, mind adding some debug asserts here that
image.offset + image.len
fits withininitial_size_bytes
?
alexcrichton created PR review comment:
One thought I had reading over this, is this strictly necessary? For example if we didn't track
initial_size
and instead only trackedcur_size
then we could leave the entire mapping as zero here and on reuse we'd hit theif initial_size_bytes < self.initial_size
case (although afterwards it'd beself.cur_size
).I don't think that'd actually result in any fewer syscalls happening in any cases though.
alexcrichton created PR review comment:
I think now the
prot
flag is alwaysProtFlags::empty()
so this could perhaps be renamed toreset_with_anon_memory
or something like that? (with no arguments)
alexcrichton created PR review comment:
I think this should be
self.initial_size - initial_size_bytes
, right?In that this is just making the previously-larger heap inaccessible after the base?
alexcrichton created PR review comment:
Oh actually this may not work out, I can see how a grown heap would then, on resue, re-
mmap
the CoW overlay.
acfoltzer requested acfoltzer for a review on PR #3697.
cfallin updated PR #3697 from memfd-cow
to main
.
cfallin submitted PR review.
cfallin created PR review comment:
I think it's correct, but it's pretty subtle so I added some more comments with a figure showing the before-and-after mmap state. Basically we have 0..self.initial_size accessible, and we want only 0..initial_size_bytes (the new, smaller size) accessible, so we make inaccessible the range initial_size_bytes..self.initial_size.
cfallin submitted PR review.
cfallin created PR review comment:
Done! (And I made it a release assert rather than debug-only, as this feels important enough and we use release asserts elsewhere in this file)
cfallin submitted PR review.
cfallin created PR review comment:
Done!
alexcrichton submitted PR review.
alexcrichton created PR review comment:
But this parameter is the length parameter, not an end, so isn't the length of what we're changing to PROT_NONE
self.initial_size - initial_size_bytes
since the new heap is smaller?
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Something I just realized for this case, I think that
self.image
should be set toNone
if we don't take this path through theif
, right?
cfallin submitted PR review.
cfallin created PR review comment:
Eek, indeed. This "oh wait one more case" on the core code is making me nervous enough that I'm gonna go figure out how to test this more thoroughly (so far it's been just "every other test uses this as a backend").
alexcrichton created PR review comment:
Internally it might be a bit more robust to use
Range<usize>
instead ofstart: usize, len: usize
perhaps, but otherwise I don't think this was a functionality bug since it's more that it could theoretically trip the preexisting assert in setting page protections.
alexcrichton submitted PR review.
cfallin submitted PR review.
cfallin created PR review comment:
Ah, it actually can lead to incorrect behavior, or at least I managed to hit it with the tests I just added: if we don't clear
image
when we instantiate with no image, then the stale state remains, and on the next instantiation we may falsely think that we have this image mapped in when we don't (anymore).In any case, fixed and tests added; thank you for finding this!
cfallin updated PR #3697 from memfd-cow
to main
.
alexcrichton created PR review comment:
oh oops I thought I was responding to https://github.com/bytecodealliance/wasmtime/pull/3697#discussion_r797879911 here, but I clearly am not, so disregard that last comment here!
alexcrichton submitted PR review.
cfallin updated PR #3697 from memfd-cow
to main
.
cfallin submitted PR review.
cfallin created PR review comment:
Ah yes, I see now; pushed an update that modifies
set_protection
to useRange
instead and this is indeed much clearer!
cfallin updated PR #3697 from memfd-cow
to main
.
cfallin merged PR #3697.
Last updated: Jan 24 2025 at 00:11 UTC