peterhuene opened PR #2518 from add-allocator to main:
This PR implements the pooling instance allocator. The allocation strategy can be set with `Config::with_allocation_strategy`.
The pooling strategy uses the pooling instance allocator to preallocate a contiguous region of memory for instantiating modules that adhere to various limits. The intention of the pooling instance allocator is to reserve as much of the host address space needed for instantiating modules ahead of time and to reuse committed memory pages wherever possible. See the related RFC.
This PR is based on #2454 and only the most recent commit is new.
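A minimal sketch of what selecting the pooling strategy might look like, going off the description above; the `InstanceAllocationStrategy` variant shape and the limit types here are assumptions pieced together from names that appear later in this review, not a confirmed API:

```rust
use wasmtime::{
    Config, Engine, InstanceAllocationStrategy, InstanceLimits, ModuleLimits,
    PoolingAllocationStrategy,
};

fn main() -> anyhow::Result<()> {
    let mut config = Config::new();
    // Hypothetical: opt in to the pooling allocator described in this PR,
    // with default limits on what modules may be instantiated.
    config.with_allocation_strategy(InstanceAllocationStrategy::Pooling {
        strategy: PoolingAllocationStrategy::NextAvailable,
        module_limits: ModuleLimits::default(),
        instance_limits: InstanceLimits::default(),
    })?;
    let _engine = Engine::new(&config);
    // Instantiations through this engine now draw from the preallocated pool.
    Ok(())
}
```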
peterhuene edited PR #2518 from add-allocator to main:
This PR implements the pooling instance allocator. The allocation strategy can be set with `Config::with_allocation_strategy`.
The pooling strategy uses the pooling instance allocator to preallocate a contiguous region of memory for instantiating modules that adhere to various limits. The intention of the pooling instance allocator is to reserve as much of the host address space needed for instantiating modules ahead of time. See the related RFC.
This PR is based on #2454 and only the most recent commit is new.
peterhuene updated PR #2518 from add-allocator to main.
peterhuene updated PR #2518 from add-allocator to main.
peterhuene updated PR #2518 from add-allocator to main.
peterhuene updated PR #2518 from add-allocator to main.
peterhuene updated PR #2518 from add-allocator to main.
peterhuene updated PR #2518 from add-allocator to main.
peterhuene updated PR #2518 from add-allocator to main.
peterhuene updated PR #2518 from add-allocator to main.
peterhuene updated PR #2518 from add-allocator to main.
peterhuene updated PR #2518 from add-allocator to main.
peterhuene updated PR #2518 from add-allocator to main.
peterhuene updated PR #2518 from add-allocator to main.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
This sure is pretty, `cargo-fmt`.
peterhuene edited PR Review Comment.
peterhuene updated PR #2518 from add-allocator to main.
peterhuene edited PR #2518 from add-allocator to main:
This PR implements the pooling instance allocator. The allocation strategy can be set with `Config::with_allocation_strategy`.
The pooling strategy uses the pooling instance allocator to preallocate a contiguous region of memory for instantiating modules that adhere to various limits. The intention of the pooling instance allocator is to reserve as much of the host address space needed for instantiating modules ahead of time. See the related RFC.
This PR depends on #2434.
peterhuene updated PR #2518 from add-allocator to main.
peterhuene updated PR #2518 from add-allocator to main.
peterhuene updated PR #2518 from add-allocator to main.
peterhuene updated PR #2518 from add-allocator to main.
peterhuene updated PR #2518 from add-allocator to main.
peterhuene updated PR #2518 from add-allocator to main.
peterhuene updated PR #2518 from add-allocator to main.
peterhuene updated PR #2518 from add-allocator to main.
peterhuene updated PR #2518 from add-allocator to main.
peterhuene updated PR #2518 from add-allocator to main.
peterhuene updated PR #2518 from add-allocator to main.
peterhuene updated PR #2518 from add-allocator to main.
peterhuene updated PR #2518 from add-allocator to main.
peterhuene updated PR #2518 from add-allocator to main.
peterhuene updated PR #2518 from add-allocator to main.
peterhuene edited PR #2518 from add-allocator to main:
This PR implements the pooling instance allocator. The allocation strategy can be set with `Config::with_allocation_strategy`.
The pooling strategy uses the pooling instance allocator to preallocate a contiguous region of memory for instantiating modules that adhere to various limits. The intention of the pooling instance allocator is to reserve as much of the host address space needed for instantiating modules ahead of time. See the related RFC.
peterhuene edited PR #2518 from add-allocator to main:
This PR implements the pooling instance allocator. The allocation strategy can be set with `Config::with_allocation_strategy`.
The pooling strategy uses the pooling instance allocator to preallocate a contiguous region of memory for instantiating modules that adhere to various limits. The intention of the pooling instance allocator is to reserve as much of the host address space needed for instantiating modules ahead of time.
This PR also implements the `uffd` feature in Wasmtime that enables handling page faults in user space; this can help to reduce kernel lock contention and thus increase throughput when many threads are continually allocating and deallocating instances. See the related RFC.
peterhuene requested alexcrichton for a review on PR #2518.
peterhuene updated PR #2518 from add-allocator to main.
peterhuene requested alexcrichton and fitzgen for a review on PR #2518.
peterhuene updated PR #2518 from add-allocator to main.
alexcrichton submitted PR Review.
alexcrichton submitted PR Review.
alexcrichton created PR Review Comment:
Perhaps the tests here could conventionally have `uffd` in the name so we could just run `cargo test --features uffd -p ... -p ... uffd`?
alexcrichton created PR Review Comment:
Is the reason that this is done here instead of module translation to defer the copying of the original wasm's data into a `Box<[u8]>` instead of borrowing it?
alexcrichton created PR Review Comment:
This returns a `Result` but never actually returns an error; would it be possible though to return an `Err` on the Windows side to avoid a panic?
alexcrichton created PR Review Comment:
We've generally tried to avoid reexporting inner types to avoid public crate dependencies and such. It's mostly to prevent developers from accidentally thinking they need to use the internal crates.
Would it be ok to duplicate these definitions into this crate and use `match` to translate?
alexcrichton created PR Review Comment:
The original design of `get_data` was intended to intentionally take a `fn` as the function parameter, which explicitly cannot close over any state. The goal was to avoid computing data that didn't take into account something that should be hashed, by enforcing that all input state was hashed.
I saw that this changed from `fn` to `Fn`, and it's presumably to handle the `allocator` variable, which I think is probably somewhat unreasonable to implement `Hash` for.
I was thinking about this and ways to possibly improve it, but I remembered that I think this needs to be refactored anyway? The `Module::deserialize` method is the other primary way to construct a module, which doesn't go through this path, so I think that method will need to also be validated against the instance allocator configured in the `Engine`? For that I think the `allocator` may need to switch to validating `CompiledModule` instances perhaps (or similar), and from there I think that this caching function can switch back to `fn`?
Er, so basically:
- I don't think that this validation catches modules built with `Module::deserialize`
- By catching those modules I think we can switch the caching back to `fn` instead of `Fn`
alexcrichton created PR Review Comment:
FWIW if we wanted (and if it mattered, I'm not sure if it does), we could do away with this almost entirely. We might need to allocate more space for wasmtime to do its thing on the async stack (e.g. generate a backtrace after a trap), but we could have all host calls which are polling futures transition back to the main native stack.
Although that would probably also lose the stack trace information with wasm frames if a host function generated a trap.
Anyway not something for this PR naturally, just musing.
alexcrichton created PR Review Comment:
FWIW I have a little red flag go off in my head whenever I see the `nix` dependency; it's historically, I feel, been a bit too much trying to be "portable" when it isn't actually. That said, this is already a dependency of `lucet` and Fastly folks look to be the ones managing this crate anyway, so this is something that could be updated later if necessary.
alexcrichton created PR Review Comment:
This check initially seemed odd to me because I thought "what if the memory was larger?", but in fact all memories here are defined memories, which means that they'll all have the minimum size.
Could the comment here perhaps be updated to reflect that? In theory if we have some future wasm feature to resize memory optionally before memory is initialized we'll need to update this, but that seems like a bit of a stretch.
alexcrichton created PR Review Comment:
Could the `new` function here perhaps return a native `anyhow::Error` to avoid this conversion?
alexcrichton created PR Review Comment:
I've only read this far up to this point, so I've definitely got more to go. That being said, though, while this representation makes sense in a vacuum I'm not sure why we'd pick it. Could this comment (or the one on `Paged`) be updated to point to somewhere which has rationale for why we'd structure memory initializers in this fashion?
alexcrichton created PR Review Comment:
I wonder, would it be possible to use some generics-fu and a helper function to make this lookup/`match` use shared logic between memory segments here and table segments above?
alexcrichton created PR Review Comment:
I think that some integration may be missing around this? It looks like this function is never actually called by `wasmtime`, so I think that maybe a change was missed when rebasing on the async changes? (it looks like `Store::on_fiber` still unconditionally calls `Fiber::new`)
alexcrichton created PR Review Comment:
I was about to say that this is a really weighty dependency to pull in and realized that we already have this in our dependency graph, now there's just 2 ;_;
alexcrichton created PR Review Comment:
One thing we could do to make this function a bit safer is perhaps to use `ptr::slice_from_raw_parts_mut`?
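For reference, a small self-contained illustration of the suggestion (the `fill_bytes` helper is hypothetical, not a function in this PR):

```rust
use std::ptr;

// Building the fat pointer with `slice_from_raw_parts_mut` keeps the
// length attached to the pointer, so the only remaining unsafe step is
// the final dereference.
unsafe fn fill_bytes(base: *mut u8, len: usize, byte: u8) {
    let slice: *mut [u8] = ptr::slice_from_raw_parts_mut(base, len);
    (*slice).fill(byte);
}
```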
alexcrichton created PR Review Comment:
Could this return `anyhow::Error`?
(ideally we'd be pretty consistent about using that everywhere, although I don't know how successful we've been up to this point)
alexcrichton created PR Review Comment:
Similar to other places, could this use `anyhow::Error` perhaps?
alexcrichton created PR Review Comment:
Having only read this far (thanks github alphabetical sorting) and also not having a good grasp on uffd, I don't really have any idea what this field could be used for. Could this comment perhaps point to a different location which has a doc-block with a high-level overview of how uffd and wasmtime work?
alexcrichton created PR Review Comment:
Piggy-backing on an earlier comment, this is one of the locations I stopped to think about overflow.
I realize though this was copied from preexisting code so it wasn't introduced in this PR, but figured it'd be worth mentioning!
alexcrichton created PR Review Comment:
I'm not entirely certain if this is the place, but it seems like something should be done on overflow here. Either this should explicitly use a `wrapping_add` in the 32-bit index space, or it should use a `checked_add` and probably fail if overflow happens.
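A sketch of the `checked_add` option (the function and its signature are illustrative, not code from this PR):

```rust
// Fail instead of silently wrapping when a segment's start plus length
// overflows the 32-bit index space.
fn segment_end(start: u32, len: u32) -> Result<u32, String> {
    start.checked_add(len).ok_or_else(|| {
        format!("segment {}+{} overflows the 32-bit index space", start, len)
    })
}
```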
alexcrichton created PR Review Comment:
I forget if we check this in various locations, but does this perhaps need to guard against overflow?
I thought about this looking at a different part of this PR, but I would imagine that something should guard against overflow at some point, although I'm not sure precisely where that should happen.
alexcrichton created PR Review Comment:
I think that this may actually make us slightly non-spec-compliant. Spec-wise I think we need to process each data segment in-order and only trap when one is out of bounds, but all the previous data segments should be copied into memory. I think this should be easy to fix, though, by basically using Segmented when one of the segments is out of bounds?
(and we could even optimize slightly and avoid storing any data for future data segments after the one that's known to trap)
alexcrichton created PR Review Comment:
I realize this is probably copied from earlier, but the `end` here is already computed above with the `checked_add`, so perhaps a `match` could be used to bind the result and return early on failure?
alexcrichton created PR Review Comment:
FWIW if you're up for it I think this would be a good refactoring to implement for tables that are "dynamic" as well. I think we'd ideally have the same representation for both static and dynamic, where everything is a `Vec<usize>` (or at least a contiguous array of pointer-sized things).
That may help unify a few paths here and reduce how much the static/dynamic cases differ.
alexcrichton created PR Review Comment:
To avoid some extraneous casts here, could the `*mut Instance` be bound to a local variable and then have `drop_in_place` and `dealloc` called on it?
alexcrichton created PR Review Comment:
This I think has interesting API implications: when using the pooling instance allocator you'll always have a maximum listed on tables even though no tables actually mention having a maximum. For example `Table::new` with a maximum size of `None` would come back to you and say it has a maximum size.
I'm not sure if this is the right trade-off, though, since it means that the behavior of the `wasmtime` crate differs depending on the allocation strategy (which I don't think is intended)?
Do you think it would be reasonable to have 2 maximums here? One for the wasm-defined maximum and one for the system-defined maximum. Growth beyond the wasm maximum would give a deterministic error message and growth beyond the system-defined maximum would return a different error message in theory.
alexcrichton created PR Review Comment:
This bug predates your PR, but it might be good to actually go ahead and fix it here? Currently this `.unwrap()` will cause a panic if `TableElement` is of the wrong type:

```rust
#[test]
fn fill_wrong() {
    let store = Store::default();
    let ty = TableType::new(ValType::FuncRef, Limits::new(1, None));
    let table = Table::new(&store, ty, Val::FuncRef(None)).unwrap();
    table.fill(0, Val::ExternRef(None), 1);
}
```

(using the `wasmtime` crate API)
alexcrichton created PR Review Comment:
Would it be possible to bake the assertions above into Rust's builtin bounds checks by calling `get_memory_slice` perhaps? That would also help improve safety from at least the Rust perspective, where if we get the arguments wrong it panics.
alexcrichton created PR Review Comment:
Would it be possible to inline the allocation that currently happens in the fiber crate to here instead? It'd be pretty nice actually if the fiber crate didn't deal with stack allocation at all and just unsafely assumed the given stack was valid
alexcrichton created PR Review Comment:
Out of curiosity did we have tests already that required the check for forward/backward copy? If not, would you be ok adding a test or two that ensures we do the right direction of copy here?
alexcrichton created PR Review Comment:
I haven't read a ton of code related to this yet, but should this and `make_accessible` perhaps be methods on `Mmap`? If not methods on the structure itself, perhaps associated functions?
(would be good to consolidate all the various bits and pieces related to mmap and platform-specific-ness into one place)
alexcrichton created PR Review Comment:
This isn't your bug (it predates this PR), but this suffers the same issue where it panics if the types of the tables mismatch:

```rust
#[test]
fn copy_wrong() {
    let store = Store::default();
    let ty = TableType::new(ValType::FuncRef, Limits::new(1, None));
    let table1 = Table::new(&store, ty, Val::FuncRef(None)).unwrap();
    let ty = TableType::new(ValType::ExternRef, Limits::new(1, None));
    let table2 = Table::new(&store, ty, Val::FuncRef(None)).unwrap();
    Table::copy(&table1, 0, &table2, 0, 1);
}
```

while you're here, would you be ok fixing that?
alexcrichton created PR Review Comment:
Ah ok, seeing the implementation of this I think that it might be best to change this signature to at least take a `Module` instead of `ModuleTranslation`.
As for the other comment I had elsewhere about doing this after compilation instead of before, I realize that'll mean some modules will take time to compile and then fail (whereas they could have failed faster), but perhaps we can leave that as an open issue to fix for later?
alexcrichton created PR Review Comment:
Given that we already have a trait here for what linear memory looks like, could we continue to use that instead of having a static variant? Or were you trying to also avoid the allocation overhead of the trait object itself?
alexcrichton created PR Review Comment:
I think that this, and the functions below, may not need to be `unsafe`?
alexcrichton created PR Review Comment:
We may want to update the documentation for enabling the pooling allocator to indicate that these fields are automatically configured and any provided values with `Config` are ignored?
alexcrichton created PR Review Comment:
Could these numbers be decomposed a bit into a more readable form, e.g. `10 * MB` or `10 * (1 << 20)` or something like that?
alexcrichton created PR Review Comment:
How come this uses the default instance allocator instead of the configured instance allocator?
alexcrichton created PR Review Comment:
Is this intended to be used beyond tests? If so it may be better to have the rng as a parameter somehow to avoid hardcoding `thread_rng`, which may not always be the most performant.
alexcrichton created PR Review Comment:
Could this `clear()` happen as part of `deallocate`, and this could become a `debug_assert` that it's already empty?
alexcrichton created PR Review Comment:
It's ok to skip the extra block here since the `lock()` will get freed at the end of the expression anyway.
alexcrichton created PR Review Comment:
This might be a bit more terse as:

```rust
fn base_pointer_iter(base: *mut u8, num: usize, size: usize) -> impl Iterator<Item = *mut u8> {
    (0..num).map(move |i| unsafe { base.add(i * size) })
}
```
alexcrichton created PR Review Comment:
In theory the `InstancePool`, `MemoryPool`, and `TablePool` structures are all along the lines of `MmapBackedPool<T>` and have pretty similar code for allocation/etc. Do you think it'd be possible to make something that's literally generic like that? Or perhaps runtime-bounded, where you pass in a runtime size and a runtime count.
alexcrichton created PR Review Comment:
It looks like the index of the instance in the pool is the same as the index of the memory/table used in the pool, but could we perhaps make that more flexible? Could the memory/table pool indices be in separate spaces to allow for more flexible memory/table allocation?
I don't think this helps a ton of use cases today but could in theory help with something where if you allowed up to 2 memories but you expected almost all modules to use only one memory you wouldn't have to allocate 2x the virtual memory space in case all instances allocated 2 (or something like that).
I imagine this may be more of a problem for tables in the near future since multiple tables may be used more regularly than multiple memories.
alexcrichton created PR Review Comment:
This I think is a good place where using `anyhow` would clean this up a bit: `with_context` wouldn't have to worry about including the original error in the message here, as it'd be done automatically.
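To illustrate the suggested pattern (the file read below is a stand-in fallible call, not the PR's actual mapping code):

```rust
use anyhow::{Context, Result};
use std::fs;

// `with_context` chains the underlying error automatically, so the
// message only has to describe what was being attempted.
fn read_pool_config(path: &str) -> Result<Vec<u8>> {
    fs::read(path).with_context(|| format!("failed to read pool config at `{}`", path))
}
```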
alexcrichton created PR Review Comment:
This looks like it's at least used in `allocate` below as well, so perhaps a helper method could be added along the lines of:

```rust
fn instance_ptr(&self, index: usize) -> *mut Instance;
```
alexcrichton created PR Review Comment:
Could this have a debug assertion that the memory (and table below) are "static" and managed by the pooling allocator?
alexcrichton created PR Review Comment:
Could this have a doc comment indicating that the helper thread should automatically exit when all regions are unmapped, so this doesn't need to do anything like send a signal for the thread to exit?
alexcrichton created PR Review Comment:
Should this perhaps panic, since the documentation seems to indicate that it's only used in non-blocking situations where an event wasn't available?
alexcrichton created PR Review Comment:
To confirm my understanding: the fault is on an unmapped region of memory, one with zero backed pages. This is just switching it to a confirmed mapped, but no protection region?
I'm not actually sure how `PROT_NONE` affects the default regions vs not...
alexcrichton created PR Review Comment:
This is actually pretty interesting: while true, I don't think that this has been assumed before (well, before this PR we didn't assume raw initialization of tables anyway). Could you drop a comment in the `Table` implementation about this? It'll be a hard requirement that tables can be initialized with all-zeros as their representation.
alexcrichton created PR Review Comment:
I'm not sure I understand the semantics of `MAP_NORESERVE` here. This is creating a mapping for the full `mapping_size`, which seems like it's creating a read/write mapping for the 1000 * 1 * 6GB default (6TB?), but that surely isn't what's happening. Does `NORESERVE` here mean that page tables aren't edited at all? Does read/write unconditionally segfault and get routed to uffd with this flag?
I tried reading man pages but they weren't the most illuminating for this...
alexcrichton created PR Review Comment:
FWIW I personally have 0 uffd experience. Would you be up for writing a longer-form doc comment explaining some high-level pieces about how uffd works and how it is used here? I think it'd be nice to read a module-doc comment and get a high-level overview of what primitives uffd offers and how they're used in wasmtime.
alexcrichton created PR Review Comment:
Could this handling be broken out into the main body of the loop or a separate function to avoid the rightward-drift here?
alexcrichton created PR Review Comment:
Should this perhaps be wired up as `DefinedMemoryIndex`?
alexcrichton created PR Review Comment:
Hm, I see now that my earlier comment about having different index spaces for tables/memories will not work because of this. I can't really think of an alternative, so this seems fine!
alexcrichton created PR Review Comment:
I'll admit I'm a bit lost and confused with the native os page size and the wasm page size. Why does the page address of the wasm page matter here?
alexcrichton created PR Review Comment:
Sort of similar to above, how come a length of wasm page size is assumed? I would naively assume that the wasm page size wouldn't factor into anything in this module, only perhaps as a multiplier to by default initialize more than one page at a time if one faults.
alexcrichton created PR Review Comment:
Could this have a comment that `Some(None)` means that we just didn't have a data initializer for this page in memory but we did have an initializer for a later page, and then `None` means that we didn't have an initializer for this page or anything afterwards?
alexcrichton created PR Review Comment:
This is one part where the native page size and wasm page size sent me spinning for a bit. It looks like the inputs to this function are wasm page size indices and multiples, but the `map` that we're loading from is stored aligned to native page sizes. How come that's the case? Should the `map` have wasm page sizes instead? (or should this function take native page size multiples/indices?)
alexcrichton created PR Review Comment:
I'll admit I unfortunately have no idea what the purpose of this function is, nor why this (presumably `mprotect`?) call does what it needs to do. Could you explain a bit more what's going on with this side table of guard pages?
alexcrichton created PR Review Comment:
FWIW I find the error-handling boilerplate here to take up a lot of screen real estate; it'd be nice to use perhaps a helper function to create these messages to make the `map_err` format onto a single line if possible.
alexcrichton created PR Review Comment:
Why is this called here?
alexcrichton created PR Review Comment:
If I understand this correctly, when uffd is enabled, this I think has somewhat weirder performance? During this standard initialization what will happen is that we'll fault as we write in each byte, but the uffd handler fills in zeros, and then we proceed to the next page.
For this sort of initialization would it be better to go ahead and make the pages backed by real memory before we initialize the memories of the instance?
alexcrichton created PR Review Comment:
While obvious in retrospect, could this be commented to indicate that it's only called with requests which are known to be valid wasm pages that must either have data copied in or data zeroed?
alexcrichton created PR Review Comment:
I'm not sure I saw this, but could you add a test where async code hits the guard page in native code?
I think it'll need to be a new test executable that executes itself, similar to `tests/host_segfault.rs` (although this could modify that too).
alexcrichton created PR Review Comment:
I got lost a little trying to find what this `AtomicBool` was being threaded around for. I don't think that we need to have this, though, since if anything hits the guard page then it should abort the whole process (we don't recover from those at all). Basically I think that this could work, but given that the signal handler never handles this SIGSEGV, I think that it's not necessary to try to recover and this can simply call `wake_guard_page_access`.
alexcrichton created PR Review Comment:
I'm having a bit of trouble imagining where this case is hit. Is this used for things like `Memory::new` explicit constructors?
It seems like this also gets triggered perhaps when uffd is used but pages are not initialized with the paging above?
I feel like ideally we would assert that this case was never hit, but I'm not sure if that makes sense within the realm of performance that uffd is targeting.
alexcrichton created PR Review Comment:
Hm ok as I read more and see that this is named in a few locations this may not be that applicable.
alexcrichton created PR Review Comment:
Also, another thing I'm thinking at this point. I still haven't yet read to the point where the `Paged` variant here is motivated (probably uffd related, I'm assuming), but wouldn't this initialization strategy be slower for modules not using the uffd allocator? Previously what was one large memcpy is now a lot of page-sized memcpys, which seems like it might be a bit slower, but I also suspect probably not really all that perceptibly slower.
alexcrichton created PR Review Comment:
Ok, maybe I'm just confused how uffd works.
It looks like a uffd-backed page is one with read/write permissions, but not physical-memory backed? By using NONE here we're asking the kernel to specifically bypass uffd? Then `reset_guard_page` switches it back to read/write to get faults routed to uffd?
alexcrichton created PR Review Comment:
Or, well, put another way, I couldn't figure out what was going on with recording all these faults.
Could you write up a doc comment about what's going on with this record of guard pages and how it's used/applied with uffd?
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
We could do it that way, although right now the desire is to run the default pooling allocator tests but with uffd feature enabled as well (i.e. same tests, but use user fault handling).
Is there a good way to do that without duplicating the tests?
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
Sure thing, I'll fix that.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
I'll redefine these in the `wasmtime` crate with a mapping.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
I believe that was the intention behind doing this post-translation originally, because the data was stored on `CompiledModule` vs `Module`, but the uffd implementation needs the initialization data on `Module` to access it via `Instance` at runtime (`Instance` currently only references the `Module`).
I would be fine doing it on module translation (makes sense to give back a fully-formed `Module`), but I wasn't sure if it was the right thing to do in terms of boxing the initialization data.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
I'd like to address this in a future PR to behave more like how Lucet functions for execution stacks.
Ideally all the platforms would be able to walk through the "fiber stack" frames, so hopefully at some point that isn't a problem (it may mean some patches to `backtrace` to make this work, especially with the weirdness you saw on Windows fibers with the async work).
The approach I would favor, if possible, is that each store allocates a single fiber stack when it is created and the first entry to Wasm (async and sync) switches to that stack. We only switch back if yielded (async) or the entry wasm call returns (sync). This means we do check that the host has enough "guaranteed" stack space (ala Lucet) when a host function is called. We'd remove the prologue stack overflow check in favor of a guard page signal, as we'd know exactly where the guard page is so the trap handler can tell if the fault is on a stack guard page.
To that end, I think there would be a setting that controls the size of the fiber stack allocated and a setting for controlling the size reserved for host calls. This only matters at host entry; the wasm code can have the full stack to itself provided it doesn't call the host past the reservation size.
Doing it this way gives us a consistent stack story across platforms (perhaps excluding Windows?) for sync/async (i.e. "WebAssembly code gets exactly XX of stack space, but must have at least YY of space left to call the host") and it'd reduce the prologue branching + extra recorded trap (when not interruptible), etc.
A design for another time, perhaps.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
That'd mean a new dependency on `anyhow` from the runtime crate, which I'm personally fine with. Worth it?
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
That's an excellent point about validating deserialized modules.
I'll switch to validating with `CompiledModule` and handle the validation after the data is returned from the cache, which should revert this back to `fn`.
peterhuene created PR Review Comment:
I'll update the comment.
peterhuene submitted PR Review.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
I'll update the comment on why "paged" initialization is necessary, which is primarily for the uffd feature so that no work has to be performed in the user fault handler to initialize the page (we just tell the kernel "here's a page you can copy to the page being accessed").
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
Thanks for catching this; it was a bad rebase merge after I pulled in your final async changes.
I've pushed up the fix.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
Same comment as above, as we don't currently have a dependency on `anyhow` for the runtime crate; I'm not opposed to adding one, however.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
I can expand the comment to point at higher-level overview documentation on how uffd works.
Specifically, this is a workaround for an issue we were seeing in Lucet on more recent Linux kernels: previously, waking a faulting thread without zeroing or copying the page data resulted in a `SIGBUS`, which Lucet relied on to signal an out-of-bounds memory access and thus trapped accordingly. Newer Linux kernels (5.7+) now retry the fault ad infinitum, resulting in a busy hang in Lucet when uffd is used.
To fix this when implementing uffd in Wasmtime, Wasmtime records faults to current "guard" pages (i.e. pages that should not be accessible, but with uffd the entire memory region is read/write) when faulted, and then explicitly changes the protection level of the page to `NONE` to induce a `SIGSEGV` when the kernel retries the fault, ultimately resulting in a trap as if the page were `NONE` the whole time.
As an instance may continue to be used after a trap, or (with the pooling allocator) the same virtual memory pages reused for a new instance, we have to reset the protection level of these "none" pages back to read/write because they might actually be accessible pages in the future if a memory grows (this should probably check to see if the page is beyond the limit of addressable pages and skip recording it if so, though; I'll fix that).
The ultimate goal of the `uffd` feature is to (almost) never change memory protection levels, as doing so requires taking kernel locks.
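A rough sketch of that protection dance on Linux via the `libc` crate (function names are illustrative; `reset_guard_page` echoes a name mentioned elsewhere in this review):

```rust
// On a guard-page fault: flip the page to PROT_NONE so the kernel's
// retry of the fault raises SIGSEGV, which the trap handler turns into
// a wasm trap.
unsafe fn trip_guard_page(page: *mut libc::c_void, page_size: usize) {
    assert_eq!(libc::mprotect(page, page_size, libc::PROT_NONE), 0);
}

// Before the slot is reused: restore read/write so future faults in the
// region are routed back to the uffd handler.
unsafe fn reset_guard_page(page: *mut libc::c_void, page_size: usize) {
    let prot = libc::PROT_READ | libc::PROT_WRITE;
    assert_eq!(libc::mprotect(page, page_size, prot), 0);
}
```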
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
Will change if we think runtime should depend on `anyhow`.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
I'll see what I can come up with.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
This is old code refactored out from `instance.rs`, but I'm happy to change it.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
I feel like perhaps this belongs in the module translator, such that we never call `declare_data_initialization` or `declare_table_elements` with offsets + lengths that would overflow.
peterhuene created PR Review Comment:
This is also old code refactored out of `instance.rs`, but I agree this should be checking that the global doesn't introduce an overflow (assuming we're checking for the static offset in module translation).
peterhuene submitted PR Review.
peterhuene edited PR Review Comment.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
Agreed, this should calculate the end once (also existing code out of `instance.rs`). I'll fix.
peterhuene edited PR Review Comment.
peterhuene created PR Review Comment:
It can't hurt to address this one as well in this PR. I'll fix.
peterhuene submitted PR Review.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
This condition (i.e. `MemoryInitialization::OutOfBounds`) only occurs when all the memories being initialized are defined (i.e. no imported memories).
As such, there can't be any visible side effects to the initialization, because the module never successfully instantiates and no linear memory that existed prior to instantiation is written to.
peterhuene edited PR Review Comment.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
I totally agree that using paged initialization might have a performance impact when not using uffd (which requires it so that we can quickly tell the kernel "copy this page exactly and wake the faulting thread").
I'd be totally fine with defaulting to segmented initialization always unless the uffd feature is enabled. What are your thoughts on that?
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
I'll fix, as the way you describe it is how the code currently works in `InstanceHandle::dealloc`; this is probably a holdover from something that made sense 30 commits ago in this work :big_smile:
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
I think we can make that work; the Windows implementation would differ from the other platforms in that the "top of stack" pointer given back is actually a fiber address to use with the Windows API.
I'm on board with that and will make that change.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
I'll fix that.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
`Table::new` should not have any limits placed on it, as instances for all host-defined objects use the default (on-demand) allocator. Only module-defined tables and memories should have an implicit upper bound via the pooling allocator's limits.
peterhuene edited PR Review Comment.
peterhuene edited PR Review Comment.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
I'll fix.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
These do call `slice::from_raw_parts` for the static case, as we just have a base pointer and size, and the unified representation between static and dynamic is `&[T]`.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
But I can better scope the `unsafe` so the function itself isn't `unsafe`.
peterhuene edited PR Review Comment.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
It appears that the bulk memory spec suite covers both directions (i.e. some are `dst > src` and some are `dst <= src`). Do you think that's sufficient?
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
I was looking to avoid allocating a trait object for the static case, but I'm not opposed to putting this behind the trait to simplify the code here.
Perhaps I was a little overzealous on trying to get the pooling allocator to heap allocate as little as possible to service a module instantiation request.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
I'm fine with doing that.
With `uffd` enabled, perhaps we just won't call these functions rather than stubbing them out (`decommit` for uffd is very similar, except it doesn't modify any protection levels).
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
As commented in a previous response, I viewed the "pooling instance allocator" as relating to module instantiations, and it's not used for creating host objects because I didn't want them to count towards the allocator's limits (from the runtime's perspective, a host object instance and a module instance are basically indistinguishable).
Personally, I think this part of the design (specifically the notion of the "default instance allocator") is a bit wonky and I'd be happy to revisit it.
What I don't want, however, is for users to define a `Wasi` instance and, unbeknownst to them, have the availability in the pool reduced by many dozens of instances, especially when none of those instances defined any memories or tables (and thus the pool's memory is underutilized). I view the way we instantiate host objects as an implementation detail of Wasmtime and not one I really want to expose users to when thinking about "instances".
Doing it this way also means host objects are not restricted by any limits placed on the pooling allocator.
That said, I'm totally open to better ways of doing this.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
Agreed, this should take `Module` and happen after translation.
I also doubt failing faster will be necessary: in the scenarios where the pooling allocator is most likely to be utilized, services would validate the module against its known limits once, long before instantiation.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
Absolutely will do.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
I can do that.
I've also been toying with removing the `memory_reservation_size` knob in the pooling instance allocator limits and instead relying on `static_memory_bound` when creating the pooling instance allocator. A static memory bound of zero would be an error when using the pooling instance allocator, and it would still be treated as the upper bound for all memories (which would thus be static).
This would mean performing the same sort of checks done in `Config::async_stack_size` to see if the strategy was already configured when setting `static_memory_bound`, but that's relatively minor.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
Except now I just remembered why I quashed that idea: the 32-bit default for `static_memory_bound` is way too large to accommodate the pooling instance allocator (1 GiB vs 10 MiB).
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
This is intended to be used in production environments to make it harder for attackers to predict linear memory addresses (linear memory ASLR, if you will).
In Lucet, there's `Linear` (which I'm calling `NextAvailable` here), `Random` (same implementation as here), and `CustomRandom(Arc<Mutex<dyn RngCore>>)`, which is used in testing to simulate randomness (not implemented in this PR).
Perhaps we could include `CustomRandom` just in case a different rng is desired? Thread-local variants would definitely be preferred, though, as the only locks taken in the pooling allocator currently are for modifying the free lists.
peterhuene edited PR Review Comment.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
I'll remove this in favor of a simple generator.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
I'll add that.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
I'll remove. I think originally this had a named scope guard rather than having the single statement, so it was explicitly done to denote a lock in play.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
I'll add that.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
That sounds good to me. I'll change.
peterhuene created PR Review Comment:
The reason the memory and table pools don't maintain their own free lists is that the uffd implementation can very quickly go from a faulting address to the owning instance based on some simple arithmetic, without any extra bookkeeping; but I don't think that's a strong enough reason to not be more flexible in this implementation.
We'd need a table that maps a memory/table pool index to the instance it was allocated for, which isn't that big a deal to maintain.
I'll see about addressing this.
peterhuene submitted PR Review.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
I think it would make sense to make this more generic especially if more free lists are maintained per my last comment.
The mutex would probably then surround the instance pool itself rather than its free list, as we only want one lock to be able to modify three different free lists (instance, memory, and table).
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
Agreed. If we decide to add the `anyhow` dependency to runtime, then I'll make this change.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
I'll add that comment.
peterhuene created PR Review Comment:
Panicking here makes sense to me. I'll add that.
peterhuene submitted PR Review.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
Definitely. I'll factor this out.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
I think this comment might be a little confusing.
It's not that the table's memory can be zero for element representation, just that tables aren't lazily initialized like memories are with uffd.
This is simply zeroing the page prior to the initialization performed during module instantiation (the representation of accessible elements is then up to that initialization).
peterhuene edited PR Review Comment.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
I think I now see what you meant.
`Table::new_dynamic` (previously `Table::new`) is explicitly using `ptr::null_mut` and `None` (which is an `Option<NonNull<T>>`) for "uninitialized" elements, and this will assume those are equivalent to a zeroed page when uffd is in play.
I'll add comments in `Table` saying there's an assumption being made here that uninitialized elements are represented with all zeroes.
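The all-zeroes assumption holds because of Rust's niche optimization: `Option<NonNull<T>>` is pointer-sized and the all-zero bit pattern is exactly `None`. A quick standalone check:

```rust
use std::mem;
use std::ptr::NonNull;

fn main() {
    // No separate discriminant: the None case reuses the null pointer.
    assert_eq!(
        mem::size_of::<Option<NonNull<u8>>>(),
        mem::size_of::<*mut u8>()
    );
    // All-zero bits are therefore a valid value for this type: `None`.
    let zeroed: Option<NonNull<u8>> = unsafe { mem::zeroed() };
    assert!(zeroed.is_none());
}
```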
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
I left a previous comment that explains this a little bit and I'm happy to codify this better in the uffd module's comments.
One of the big selling points of uffd is that we don't have to explicitly change the kernel's memory protection levels when instances are allocated/deallocated or when tables/memories grow (or for reused fiber stacks, either). It keeps the regions entirely read/write, and initially the entire memory range is mapped but "missing" (i.e. normally the kernel would service a fault as if the page were all zeroes). When a fault occurs in the region, we can detect if the access was semantically to a "guard" page even though it has read/write access.
The older kernels allowed us to simply wake the faulting thread without servicing the fault and, instead of retrying the fault, it would `SIGBUS` the access (this is behavior Lucet relies on currently to translate the access to a trap). Newer kernels will retry the fault indefinitely, meaning we must service it by either zeroing the page or copying the page from some other VA. However, we don't want to service it, because the access was to a guard page in this case.
So what this does is record the fact that a guard page was hit, "temporarily" set the protection to `NONE` to induce a `SIGSEGV` when the kernel retries the fault, and then later on "reset" the protection level back to read/write, as at some point in the future the page might actually be accessible. Granted, this implementation could be a little smarter in identifying pages that will never become accessible and thus leave them forever with a `NONE` protection level (something I will try to address).
This means we only ever change protection levels on guard page access. This isn't ideal, as malicious modules might intentionally hit guard pages, but there are other mitigations services could do to prevent abuse (i.e. "your module has to sit in the naughty corner because it trapped out of bounds too much").
peterhuene edited PR Review Comment.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
I will definitely expand upon this documentation to make this more clear.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
Here's my understanding of over-committing memory on Linux:
A lot of distros use the default of `0` for `vm.overcommit_memory`. This causes the kernel to apply a heuristic to see if it can likely satisfy a request to physically back all the requested accessible pages in the future. If the kernel thinks it won't be able to physically back the allocation, it fails.
`MAP_NORESERVE` effectively tells a kernel using this over-commit strategy to forgo this check. In this case, we're pretty much guaranteed that the kernel will not be able to physically back such a gigantic VM allocation. However, only a fraction of the pages will ever need backing, so `MAP_NORESERVE` is our way of telling the kernel we know better than it does.
I believe that a setting of `1` for `vm.overcommit_memory` is the same as every allocation being `MAP_NORESERVE`.
A setting of `2` will always cause the pooling allocator to fail its allocation. In this case, `MAP_NORESERVE` has no meaning.
Ultimately this means we're at risk of random `SIGSEGV`s for accesses to otherwise accessible pages in OOM conditions, because we told the kernel to give us no guarantees that the page can be backed (I believe it might also sidestep the OOM killer for such pages, so the kernel won't bother killing other processes to give you backing space).
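For concreteness, the kind of reservation being discussed looks roughly like this with the raw `libc` API (a sketch; the real code goes through wasmtime's `Mmap` abstraction):

```rust
use std::ptr;

// Reserve a huge read/write region without asking the kernel to commit
// backing store up front; only pages that are actually touched will be
// physically backed.
unsafe fn reserve_pool(len: usize) -> *mut u8 {
    let ptr = libc::mmap(
        ptr::null_mut(),
        len,
        libc::PROT_READ | libc::PROT_WRITE,
        libc::MAP_PRIVATE | libc::MAP_ANONYMOUS | libc::MAP_NORESERVE,
        -1,
        0,
    );
    assert_ne!(ptr, libc::MAP_FAILED, "mmap failed");
    ptr.cast()
}
```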
peterhuene edited PR Review Comment.
peterhuene edited PR Review Comment.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
I can remove this.
I was being a little too forward-thinking that we might eventually want to utilize the stack guard pages to trap on stack overflow rather than relying on prologue checks like we do now (see the comment made much earlier today), but we can always put this back in if that ever becomes a reality.
No sense cluttering the code unnecessarily.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
Sure, that conveys the intention better. I'll fix.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
I have some ideas around making this work so I'll see what I can do.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
So Lucet's uffd implementation has two ways of delay-initializing linear memories: by OS page and by Wasm page. The latter seems more commonly used in production, which is why this implementation simply informs the handler thread which Wasm page was faulted and it initializes per Wasm page.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
The uffd handler initializes per Wasm page rather than handling a fault per OS page when faulting on linear memories.
See above comment too.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
This resets the previously faulted guard pages to the expected read/write permission; the entire range of pages monitored by the user fault handler is read/write, so a fault can occur anywhere in that range and the kernel will pass it to us.
This call to `mprotect` only ever occurs when a guard page has faulted; if a memory access is never out of bounds, then the memory managed by the pooling allocator will never have its protection level changed, while still providing the correct semantics for Wasm.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
The reason the map stores OS pages while the uffd initialization is Wasm-page sized is that I was concerned over how much we might potentially allocate as part of module compilation to represent potentially sparse and disjointed data segments.
I think if we changed the implementation such that we only ever use the "paged" initialization when on Linux with uffd enabled, then I think we should change the map to be Wasm-page sized and give the kernel a single call to copy the page when handling the fault here.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
I'll add that.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
I can add that comment.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
I can do that, or instead make use of `anyhow`.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
For some context, here's my understanding of how linear memories are created and initialized in Wasmtime (on Linux) with these changes:

On-demand allocator: by default we create a 6 GiB anonymous, private mapping with a `none` protection for the entire range. We then mark the pages that are part of the "minimal" initial set for the linear memory as `read+write`. We then iterate the data segments and they instruct us to write to one of those initial pages; but the page is not yet backed, so this causes a minor page fault and the kernel's memory manager handles it by backing the page with a single, shared CoW-marked zero-filled page. The write causes a copy of the page, and thus a new page is physically backed for the process and subsequently filled with the segment's data. When the instance is deallocated, the entire mapping of the linear memory is freed.

Pooling allocator: we allocate the same 6 GiB anonymous, private mapping with a `none` protection, but upfront and as part of a pool of linear memories. This is otherwise identical to the on-demand case: we still change the protection when creating/growing the memory and expect the kernel to handle missing backings with a zeroed page. Unlike the on-demand allocator, we may reuse the linear memory's mapping for a new instance; we effectively clear any backing for the linear memory's pages with `MADV_DONTNEED` when the instance is freed, thus guaranteeing the same initially-zero page the next time it is accessed (causing another minor fault).

Pooling allocator with `uffd` enabled: same as without the `uffd` feature, except the entire region is protected as `read+write` instead of `none`. This means we don't have to call `mprotect` to grow memories or when we reuse the linear memory's space for a new instance. Instead, we instruct the kernel to pass us any faults in this region, and if we determine the access is out of bounds, we'll change the protection (temporarily) to `none` as needed to induce a `SIGSEGV`. We have two ways to handle the fault to an accessible page: zero the page or copy from another page. As we're able to handle the faults ourselves, we can skip any initialization of defined linear memories at instantiation time and just do it when the page is first accessed.

The trade-off for uffd is that handling page faults in user space is slower than letting the kernel handle the faults, but we can skip any data segment initialization until actually accessed (a win if there are large data segments that aren't fully utilized). More importantly, without all those `mprotect` calls, we're reducing kernel lock contention, page table modifications, and (possibly?) TLB shootdowns. Access to guard pages notwithstanding, the page table entries only get modified when an instance is freed with the `uffd` feature. In theory, the penalty of handling the faults in user space is outweighed by the improved concurrency.
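For illustration, a minimal sketch of the mapping lifecycle described above, using raw `libc` calls; the constants and helper names here are illustrative, not the PR's actual code:

```rust
use std::ptr;

// Reserve the full 6 GiB linear-memory region (4 GiB addressable plus
// offset guard) with no access; a PROT_NONE reservation doesn't count
// toward the kernel's overcommit accounting.
const RESERVATION: usize = 6 * 1024 * 1024 * 1024;

unsafe fn reserve_memory() -> *mut u8 {
    let base = libc::mmap(
        ptr::null_mut(),
        RESERVATION,
        libc::PROT_NONE,
        libc::MAP_PRIVATE | libc::MAP_ANONYMOUS,
        -1,
        0,
    );
    assert_ne!(base, libc::MAP_FAILED);
    base as *mut u8
}

// Make the initial pages of the memory accessible; the first write to
// each page minor-faults and the kernel backs it with a CoW zero page.
unsafe fn make_accessible(base: *mut u8, len: usize) {
    let rc = libc::mprotect(base as *mut _, len, libc::PROT_READ | libc::PROT_WRITE);
    assert_eq!(rc, 0);
}

// On instance teardown in the pooling case, drop the physical backing
// but keep the reservation so the slot can be reused zero-filled.
unsafe fn decommit(base: *mut u8, len: usize) {
    let rc = libc::madvise(base as *mut _, len, libc::MADV_DONTNEED);
    assert_eq!(rc, 0);
}
```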
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
So this happens when the data segments either initialize imported memories or use a global base (this is treated as "segmented" initialization instead of "paged").
For the bulk memory proposal, initializing an imported memory should have visible side effects if the module fails to instantiate, so we can't delay-initialize the pages. Thus we handle the fault by zeroing the page; the fault is actually caused by Wasmtime during module instantiation, so the next thing that's happening when we wake the thread is that it's copying the segment data into the page.
I just realized we can actually delay-initialize when a global base is used (for defined memories) because it's required to be an immutable global, so the value cannot change from the time the module instantiates to the time the page is accessed. I'll fix that.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
I'll see about adding this test case.
peterhuene edited PR Review Comment.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
I'll update these methods with more comments to make it clearer why this is necessary and how it works.
peterhuene edited PR Review Comment.
peterhuene edited PR Review Comment.
peterhuene edited PR Review Comment.
alexcrichton submitted PR Review.
alexcrichton created PR Review Comment:
Oh nah that makes sense, seems fine to leave this as-is then.
alexcrichton submitted PR Review.
alexcrichton created PR Review Comment:
I think it was originally intended to be a bit more flexible than it is now, but nowadays I think it's best to avoid the `Option` here, to avoid weird situations where you have to handle both `Some` and `None` somehow. Basically I think it's fine to go ahead and box up the data during module translation. If this is ever a problem in the future we can tackle it then.
alexcrichton submitted PR Review.
alexcrichton created PR Review Comment:
Nah yeah definitely something for a future PR, no need to handle this here!
alexcrichton submitted PR Review.
alexcrichton created PR Review Comment:
If not no worries, just something I figured may help make this slightly easier to read
alexcrichton submitted PR Review.
alexcrichton created PR Review Comment:
Personally I think it's ok to add `anyhow` as a dependency. The alternative is something like `thiserror`, but for our purposes I think `anyhow` is easier and works just fine.
alexcrichton submitted PR Review.
alexcrichton created PR Review Comment:
Makes sense to me!
alexcrichton submitted PR Review.
alexcrichton created PR Review Comment:
Ah yeah, I forgot that this only happens when all memories are being defined. That being said, I think weird spec corner cases will require us to still perform in-order initialization.
According to https://webassembly.github.io/bulk-memory-operations/core/exec/modules.html#instantiation the 10th step is table initialization and the 11th step is data segment initialization. That means that we could initialize imported tables with functions this module defines (which close over defined memories). I believe that means that the module's internal memory can be exposed via those functions, so we'll still need to perform partial initialization.
This is actually also a slight bug, I think, with the pooling allocator, which I think calls `set_instance_memories` before `set_instance_tables`, but it should be an easy fix to swap them.
I also feel that while it's interesting-ish to think about, this probably isn't worth a complicated test to ensure the letter of the law is followed in this respect.
alexcrichton submitted PR Review.
alexcrichton created PR Review Comment:
Yeah I think that sounds reasonable to me! We should have semi-comprehensive tests for uffd anyway which I think should cover our bases in ensuring all branches are covered effectively.
alexcrichton created PR Review Comment:
It's ok to defer this to a future PR as well, should in theory be a relatively easy cleanup. I also wouldn't bend over too far backwards to get windows/unix to work, if it's too much we can just leave fibers as they are right now.
alexcrichton submitted PR Review.
alexcrichton submitted PR Review.
alexcrichton created PR Review Comment:
Oh that's a good point about `Table::new`, but I also feel like this is a bit interesting wrt defined tables in wasm modules. This means that the pooling allocator will change the reported limits on memories/tables from instantiated modules relative to the on-demand allocator. That feels a bit odd to me because I figured the limits would reflect the in-wasm limits rather than the system-imposed limits.
alexcrichton created PR Review Comment:
Indeed I think that is sufficient!
alexcrichton submitted PR Review.
alexcrichton submitted PR Review.
alexcrichton created PR Review Comment:
I'd be ok either way myself, I figured it was slightly odd that we have a trait but then we also have an enum to do roughly the same thing.
FWIW I was curious if we could make a test where we use a custom `#[global_allocator]` to count the number of allocations that happen during `Instance::new` while using a pooling allocator, and we could have the test clamp with something like "we perform 10 allocations today and that number must not go up" so we can catch additional allocations (and even update the test as we remove allocations in theory). Felt a little bit overkill, though, too.
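A sketch of what that counting-allocator test could look like (a hypothetical test shape, not something in this PR; the threshold is illustrative):

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// Wrap the system allocator and count every allocation made.
struct CountingAlloc;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::SeqCst);
        System.alloc(layout)
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        System.dealloc(ptr, layout)
    }
}

#[global_allocator]
static A: CountingAlloc = CountingAlloc;

fn main() {
    let before = ALLOCATIONS.load(Ordering::SeqCst);
    // ... call Instance::new with a pooling allocator here ...
    let after = ALLOCATIONS.load(Ordering::SeqCst);
    // Clamp: fail if instantiation starts allocating more than today.
    assert!(after - before <= 10, "instantiation allocated too much");
}
```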
alexcrichton submitted PR Review.
alexcrichton created PR Review Comment:
Eh, either way is fine by me; I'm mostly just looking to consolidate platform-specific memory manipulation. The `Mmap` type is already a bit odd, I think, in that it's kinda using an upstream crate and kinda not, and it may be best if we just make our own module nowadays. Anyway, that can be a refactoring for another day.
alexcrichton submitted PR Review.
alexcrichton created PR Review Comment:
Ah ok that makes sense, and I agree that this seems like the best way to handle this. Agreed that creating a `Func` shouldn't accidentally reserve 6GB of address space because you're using a pooling allocator.
Could the comment here be expanded with some of this too?
alexcrichton submitted PR Review.
alexcrichton created PR Review Comment:
Yeah I don't think we need to be super elegant with these knobs, they're quite advanced and will likely rarely get tweaked by folks other than us anyway. I think it's good to have notes about the effects but no need to make them the most ergonomic.
alexcrichton submitted PR Review.
alexcrichton created PR Review Comment:
Ah ok, makes sense; let's stick with just using `thread_rng` for now (no need, I think, to start using a custom rng unless necessary), and if that's a problem in the future we can fix it then.
alexcrichton submitted PR Review.
alexcrichton created PR Review Comment:
I'm waffling on this a bit over time myself. I'm not actually sure that this is a good idea, because if you want to guarantee that each instance can have a memory (or two memories) then you have to do what your implementation does today anyway (basically pre-assign memories per instance). Having a few extra memories so some modules can have 2 memories and most others can have one only works when everything is cooperative.
I feel like the cooperative case doesn't really happen that much if you're using the pooling allocator?
I think this should probably actually stay as-is where the limits are guaranteed to be available for each instance, and we can always revisit this in the future if necessary.
alexcrichton submitted PR Review.
alexcrichton created PR Review Comment:
And yeah I wouldn't really go out of your way to try to make this generic and wrapped up nicely, only if it makes sense. Just something I noticed while reading!
alexcrichton submitted PR Review.
alexcrichton created PR Review Comment:
Ok that all definitely makes sense, thanks for explaining!

> One of the big selling points of uffd is that we don't have to explicitly change the kernel's memory protection levels when instances are allocated/deallocated or when tables/memories grow (or for reused fiber stacks either).

Ah right yes, it's all about avoiding those IPIs with cross-cpu TLB flushes?

> We do, however, routinely change the page tables but only insofar as adding allocated pages for regions? e.g. filling something in with zeros or from another page.
>
> Granted, this implementation could be a little smarter in identifying pages that will never become accessible and thus leave them forever with a NONE protection level (something I will try to address).

Ah yeah, this is also something I was wondering about. Presumably the guard page for all async stacks and the 2GB guard region for each memory could always be mapped as `NONE`?
alexcrichton submitted PR Review.
alexcrichton created PR Review Comment:
So if you set `vm.overcommit_memory` to 2, wasmtime with uffd (or I guess the pooling allocator in general?) is inoperable.
Otherwise, this business about a random `SIGSEGV`: how does that play out with uffd? We're the ones who ask the kernel "please allocate a zero page for that spot" or copy from somewhere else. Does that request for a page fail with an error we can handle? Or does it "succeed" and/or kill our process right there?
alexcrichton submitted PR Review.
alexcrichton created PR Review Comment:
Ah ok, could this perhaps be something like a constant which says "default number of OS pages to initialize on a wasm fault", and we update that as necessary?
alexcrichton submitted PR Review.
alexcrichton created PR Review Comment:
So to make sure I understand:

- Guard pages never have physical memory backing them (e.g. `NORESERVE`)
- If permission is read/write, faults get routed to our uffd
- If permission is `NONE`, faults get routed to the signal handler

So this reset needs to happen so future faults go to uffd?
alexcrichton submitted PR Review.
alexcrichton created PR Review Comment:
Ok that all makes sense, thanks for explaining!
So basically, uffd or not, the same thing happens in this case where we manually initialize all data segments: on the first fault on each page, someone puts a page of zeros there and then we overwrite it.
My pondering here was mostly around how we know how big our memcpy is and which pages need to be resident, so I figured we could skip the uffd handler entirely and just tell the kernel "put a bunch of pages there", but that's probably overkill and not actually really needed for any situation today in terms of being a bottleneck.
alexcrichton submitted PR Review.
alexcrichton created PR Review Comment:
Ok cool makes sense, could you add some comments to that effect here of when this case will be hit?
pchickey submitted PR Review.
pchickey created PR Review Comment:
Agreed, and compilation will happen in a different process.
pchickey edited PR Review Comment.
pchickey edited PR Review Comment.
peterhuene created PR Review Comment:
Isn't populating an imported table with functions from a module that failed to instantiate...problematic? The function might (transitively) reference a defined memory that wasn't initialized properly because the data segment never got copied because a previous segment was out of bounds:
> Each active segment is initialized in module-definition order. For each segment, if reading the source or writing the destination would go out of bounds, then instantiation fails at that point. Data that had already been written for previous (in-bounds) segments stays written.

I'm not exactly sure why the bulk memory proposal did away with the upfront bounds checking on the segments at initialization time and explicitly allowed for these side effects to occur; there is no reasonable recovery from them other than completely throwing out any table or memory that was imported into the module that failed to instantiate, as they are now in indeterminate states.
peterhuene submitted PR Review.
peterhuene edited PR Review Comment.
peterhuene edited PR Review Comment.
peterhuene edited PR Review Comment.
pchickey submitted PR Review.
pchickey created PR Review Comment:
In Lucet we implemented both strategies and benchmarked each under production loads. It turned out that initializing a Wasm page on each fault was more efficient than a single OS page. We didn't experiment with intermediate points in that design space but it makes sense to define a constant and we could try it!
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
Re: the order of `set_instance_memories` and `set_instance_tables`: these are setting up the `Instance`'s fields and both are done prior to initialization, so the order doesn't matter.
The order matters in `initialize_instance`, where `initialize_tables` is called before `initialize_memories` (with uffd enabled, `initialize_memories` isn't called so that the page fault handler can initialize the pages on demand).
peterhuene edited PR Review Comment.
pchickey submitted PR Review.
pchickey created PR Review Comment:
We used to panic in the case where no event was available (https://github.com/bytecodealliance/lucet/blob/main/lucet-runtime/lucet-runtime-internals/src/region/uffd.rs#L203), but in our private production fork (ask me on slack) we no longer panic in this case, and instead log at the `warn` level and `break`. Despite lots of debugging we cannot determine why this behavior occurs, but it does not appear to cause any real problems. I recommend we do the same thing here, and if we never see the warning after we get some production time on it, we can change it back to a panic.
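A sketch of that recommended handler-loop shape, assuming the `userfaultfd` and `log` crates; the fault servicing itself is elided and the names are illustrative:

```rust
use userfaultfd::{Event, Uffd};

fn fault_handler_thread(uffd: Uffd) {
    loop {
        match uffd.read_event() {
            // A read that returns no event shouldn't happen, but has
            // been observed in production; warn and break rather than
            // panic so we can diagnose without taking the process down.
            Ok(None) => {
                log::warn!("uffd read returned no event; exiting fault handler");
                break;
            }
            Ok(Some(Event::Pagefault { addr, .. })) => {
                // Service the fault at `addr`: zero the page or copy
                // initialization data into it (elided here).
                let _ = addr;
            }
            // Other events (e.g. unmaps) would be handled here.
            Ok(Some(_)) => {}
            Err(e) => {
                log::warn!("reading uffd event failed: {}", e);
                break;
            }
        }
    }
}
```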
peterhuene edited PR Review Comment.
pchickey created PR Review Comment:
This thread just loops on a blocking read on uffd, which doesn't leave any mechanism to tear it down. With the change I suggested above, this is unreachable.
pchickey submitted PR Review.
pchickey edited PR Review Comment.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
I believe the instance's type (i.e. the fact that an exported table/memory's maximum would be unbounded vs. artificially constrained by the pooling allocator) is based on the table/memory plan from the `Module` and not from the runtime table/memory. So Wasmtime API users would see the same limits as reported by the module for any exported tables or memories when using the on-demand allocator vs. the pooling allocator.
This function, `Table::maximum`, is only called when attempting to grow the table. Thus it is accurately reporting how far the runtime representation of the table can grow before it fails, which to me makes sense for the runtime's representation to report as constrained by the pooling allocator.
peterhuene edited PR Review Comment.
peterhuene edited PR Review Comment.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
Random faults that would otherwise raise a `SIGSEGV` aren't sent to our fault handler, as we only monitor specific regions of address space, and those are protected in such a way (i.e. `read+write`) that the kernel should never signal a `SIGSEGV` itself (other than, perhaps, as a failure to back one of our pages, given we're massively over-committing the address space reservation).
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
Your summary is correct.
If we limit the recording of resettable guard pages to only pages that could potentially be accessible in the future, the reset is basically telling the kernel memory manager that we want to start receiving the faults again for that page. Upon the next fault we then get to decide to back the page or treat it once again as a guard page.
pchickey submitted PR Review.
pchickey created PR Review Comment:
The usage of `nix` in `userfaultfd` is just to call ioctls and `read`. I don't think we've run into the funny spots of nix there, but I understand your apprehension.
peterhuene edited PR Review Comment.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
With wizer (I believe) organizing initialized memories on page boundaries, I think an implementation that mmaps these pages CoW directly from the file would definitely be something to investigate for the future. We could do the same for our cached compiled modules too.
(Granted, the whole bulk memory side-effects thing might throw this stuff for a loop.)
peterhuene edited PR Review Comment.
fitzgen submitted PR Review.
fitzgen created PR Review Comment:
> With wizer (I believe) organizing initialized memories on page boundaries
Correct.
It is also perhaps worth doing this kind of pre-processing of all modules' data segments when we compile them, so that we could hit this optimization unconditionally.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
As discussed in Slack, the fault handler will monitor for the regions being unmapped, and when the count of monitored regions hits zero, it should gracefully terminate. This does require a slightly newer kernel version (4.11) compared to Lucet's uffd implementation, however.
The hope with doing it this way is that we get around the weird "POLLIN but can't read" issue we're seeing from Lucet's non-blocking implementation by having blocking reads :fingers_crossed:
peterhuene edited PR Review Comment.
peterhuene edited PR Review Comment.
alexcrichton submitted PR Review.
alexcrichton created PR Review Comment:
IIRC the bulk memory change was to spec memory initialization as `memory.init` instructions to make the spec a bit simpler, but I won't pretend to know all the intricacies there. I do believe, though, that it was acknowledged that functions could be put into tables which could reference not-fully-initialized memories at the time.
For the ordering, ah yeah, good point; I think you're right in that a swap isn't necessary.
alexcrichton submitted PR Review.
alexcrichton created PR Review Comment:
Aha, ok, then this already has the semantics I was hoping for, which is that the `Limits` as reported by wasmtime are always those written in the wasm module.
tschneidereit submitted PR Review.
tschneidereit created PR Review Comment:
> Granted, the whole bulk memory side-effects thing might throw this stuff for a loop.
Should this turn out to really throw a wrench in the works here, I wonder if that might be a reason to revisit this at the spec level? I know that bulk memory is stable, but this seems like a corner-case-y enough thing that perhaps there's some wiggle room?
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
I think we can still do the preprocessing, similar to what this PR is doing in-memory, but due to the bulk memory spec's relaxation of the upfront bounds checking at initialization time, we'll have to enforce certain requirements to initialize memory this way:
- If any data segment references an imported memory, use the existing memory initialization implementation (i.e. iterate the segments directly).
- If any data segment uses a global base, use the existing implementation (global bases, while immutable, are always an import).
- If neither of those are true, then we can organize the data segments into pages at compilation time; if a data segment is out of bounds, we stop the organization and set a flag indicating a segment went out of bounds.
When initializing the instance, we would initialize tables first per the spec; an out of bounds element segment would fail the instantiation and memories wouldn't be initialized at all.
When initializing the memories, if we're using the existing initialization strategy, we iterate the segments and copy the data, stopping if the segment was out of bounds and thus failing the instantiation.
If we're using "paged" initialization, we copy (delayed for uffd) or, potentially, mmap all of the pages in the organized data and then check the flag to see if there was an out of bounds segment, returning failure to instantiate if that's the case. This should maintain the "partially initialized" side effects allowed for in the bulk memory proposal, I think.
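A sketch of how that compile-time organization could be represented (illustrative types only, not the PR's actual definitions):

```rust
/// Illustrative representation of a module's memory initialization,
/// decided at compilation time; not the PR's actual types.
enum MemoryInitialization {
    /// Iterate the data segments at instantiation time, e.g. when a
    /// segment targets an imported memory or uses a global base.
    Segmented(Vec<DataSegment>),
    /// Segments pre-organized into whole Wasm pages at compile time.
    Paged {
        /// For each defined memory, the initialized pages as a list of
        /// (page index, page contents) pairs.
        pages: Vec<Vec<(usize, Vec<u8>)>>,
        /// Set if organization stopped at an out-of-bounds segment;
        /// instantiation copies the pages above, then traps.
        out_of_bounds: bool,
    },
}

struct DataSegment {
    memory_index: u32,
    offset: u64,
    data: Vec<u8>,
}
```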
peterhuene edited PR Review Comment.
peterhuene edited PR Review Comment.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
So I am correcting this to not treat "out of bounds" and "paged" initialization as two different variants; the "paged" initialization now records whether an out-of-bounds segment was observed, in addition to the pages that were initialized up until that point.
The memory initialization process now first copies the "successfully initialized up until the out-of-bounds access" pages (or lets the uffd handler fill the pages as needed) before a trap is returned. This should maintain the observable side effects required by the bulk memory proposal.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
So I've refactored this a little in a forthcoming push that makes `make_accessible` less generic-sounding and explicitly related to protecting pages for linear memories (this also makes the semantics clearer when uffd is used). `decommit` remains, but I don't think it belongs on `Mmap`, as the pages to decommit are rarely the entire mapping (the pooling allocator tries to release the backing of the pages that were "used" for the instance, ensuring that new zero pages are mapped back in for the next instance).
As such, I don't think there's much to do for this comment and I'm resolving it.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
So I'm going to push a change that removes the use of `MAP_NORESERVE`. This was previously needed because the memory pool was being protected as `READ_WRITE` for the entire mapping (which is multiple TiB by default for 64-bit hosts).
We now map the entire memory pool with protection `NONE`, which doesn't count towards overcommit limits.
When `uffd` is enabled, it marks only the pages that could potentially be accessible as `READ_WRITE` (and thus only those count towards the overcommit limits). This also means the user fault handler now only gets faults for pages that might be accessible depending on the linear memory's current size.
This has the benefit of failing fast (when the pooling strategy is set in the `Config`) if the kernel doesn't think it can reserve all the backing space needed for the accessible pages; compared to receiving a random `SIGSEGV` with `MAP_NORESERVE`, I think this is desirable.
Resolving this conversation for now.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
I'm going to keep the current design rather than implementing free lists for the memory and table pools, as I think it's desirable that the module limits guarantee that `N` instances can be created, provided the modules being instantiated conform to the limits, rather than allowing memory or table pool exhaustion to cause instantiation failures.
I think this is easier for users to reason about. Resolving this conversation for now.
peterhuene updated PR #2518 from add-allocator to main.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
Resolving as we're going to keep the current design where we guarantee an instance can allocate provided 1) the module conforms to the module limits and 2) the maximum number of instances hasn't been reached.
We can address this in the future if this needs to change.
peterhuene created PR Review Comment:
While `MemoryPool` and `TablePool` have a lot of overlap, they differ a bit too much from `InstancePool` and `StackPool` to make this entirely generic.
Consolidating `MemoryPool` and `TablePool` probably isn't worth it, as the impls are pretty small.
Resolving this as "won't fix" since we're not changing `MemoryPool` and `TablePool` to maintain their own free lists.
peterhuene submitted PR Review.
peterhuene updated PR #2518 from add-allocator to main.
peterhuene updated PR #2518 from add-allocator to main.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
I'm going to land this PR without lifting the fiber creation implementation into `wasmtime-fiber`. I've opened #2708 to track the follow-up work.
peterhuene edited PR Review Comment.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
The pooling instance allocator now protects statically known guard pages as `NONE` and will only change protections on pages that could potentially be accessible in the future.
The uffd implementation now only receives faults for the potentially accessible pages as well.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
This code has been eliminated; we now just set the protection on the stack guard page to `NONE` without relying on the page fault handler to treat it as a guard page access (it'll never receive the fault event for the stack guard page).
I still think we should have an async feature test that hits the guard page and crashes, to verify both the `wasmtime-fiber` implementation and the pooling allocator implementation.
peterhuene edited PR Review Comment.
peterhuene updated PR #2518 from add-allocator to main.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
Fixed this in the `wasmtime` crate, as returning a "table out of bounds" trap (the only trap code that would make sense to return from this runtime function) for a mismatched type seems wrong; this is an API bug anyway, since a mismatched type shouldn't pass module validation.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
Fixed this in the `wasmtime` crate, as returning a "table out of bounds" trap (the only trap code that would make sense to return from this runtime function) for mismatched table types seems wrong; this is an API bug anyway, since mismatched table types shouldn't pass module validation.
peterhuene updated PR #2518 from add-allocator to main.
peterhuene updated PR #2518 from add-allocator to main.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
`Table` now stores raw pointers for both "static" and "dynamic" storage, so these assumptions are now being enforced via `TableElement`.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
I'm going to punt on this test for now as it isn't specific to the pooling allocator and add it as part of #2708.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
I'm leaning towards just leaving this as-is and not boxing a trait object in the static case; I'm of the opinion that the `MemoryStorage` enum doesn't complicate things all that much (it's a pretty small amount of code), and not having to box a trait object will be desirable for Lucet-like use cases. Marking this as resolved.
peterhuene updated PR #2518 from add-allocator to main.
peterhuene updated PR #2518 from add-allocator to main.
peterhuene updated PR #2518 from add-allocator to main.
peterhuene updated PR #2518 from add-allocator to main.
peterhuene updated PR #2518 from add-allocator to main.
peterhuene updated PR #2518 from add-allocator to main.
peterhuene requested alexcrichton and fitzgen for a review on PR #2518.
alexcrichton submitted PR Review.
alexcrichton submitted PR Review.
alexcrichton created PR Review Comment:
I've been thinking about this storage here and have been waffling a bit on what to do about it. I think that this works today, but only just. The thing I'm worried about is that a memory doesn't really "belong" to an instance after creation, because it can be used in any number of other instances/locations (or directly from the wasmtime API itself).
I was originally worried that if a different instance than the defining instance faulted the memory, the guard page recording would go to the wrong place. Since it's done by the instance's index in the pooling allocator, though, that should be fine. I was then worried about the wasmtime `Memory::grow` API, but that also turned out to be ok because it's routed through `Instance::memory_grow` (which I forgot).
It does sort of feel like we're close to the edge, though, and perhaps it'd be best to store this in the `Memory` itself rather than in the instance?
alexcrichton created PR Review Comment:
I think we need to be careful here, since running the dtor of the `VMExternRef` could run arbitrary code. I think this should swap the pointers first and then destroy the previous pointer, to ensure that if the dtor accesses/aliases the table it can't alias itself (and the table is in a valid state).
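A sketch of the swap-then-drop ordering being suggested (illustrative types, not the PR's actual table code):

```rust
use std::mem;

/// Illustrative stand-in for a table element whose destructor may run
/// arbitrary user code (like a `VMExternRef` drop).
struct ExternRef(String);

impl Drop for ExternRef {
    fn drop(&mut self) {
        // Arbitrary code could run here and re-enter the table.
    }
}

struct Slot {
    elem: Option<ExternRef>,
}

fn set_element(slot: &mut Slot, new: Option<ExternRef>) {
    // Swap the new value in first so the slot is already in its final,
    // valid state...
    let old = mem::replace(&mut slot.elem, new);
    // ...then run the old element's destructor. If it re-enters the
    // table, it observes the new element instead of aliasing the slot
    // mid-update.
    drop(old);
}
```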
alexcrichton created PR Review Comment:
To avoid the extra `cfg_if!` here, I think this could implement the new parameter as `const USE_PAGED_MEM_INIT: bool = cfg!(feature = "uffd") && cfg!(target_os = "linux");`, and I think that means the overall structure here could remain the same?
alexcrichton created PR Review Comment:
Could this update the `Resource` variant to store an `anyhow::Error`? Otherwise I think this is losing the original source of the error by calling `.to_string()`.
alexcrichton created PR Review Comment:
To confirm: by calling `mmap` here, the kernel basically just zeros out the pages if read/write is requested? I wasn't actually aware of that behavior of mmap; could that be added as a small comment here?
alexcrichton created PR Review Comment:
Cut off sentence?
alexcrichton created PR Review Comment:
I think this comment may want to be updated along with the other comment that looks like this. By my reading it's not enough to just zero the pages; we also need to guarantee that all future faults to these pages go through to uffd, right? So this is technically both zeroing and ensuring that faults get routed correctly?
alexcrichton created PR Review Comment:
Shouldn't `max_memories` play into this computation somewhere as well?
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
As this is a fixed mapping at the same address, it effectively discards the existing pages and remaps them to the CoW zero page (because the mapping is both private and `PROT_READ|PROT_WRITE`).
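For illustration, the reset-by-remap this implies, using raw `libc` calls (a sketch, not the PR's exact code):

```rust
// Re-mapping anonymous, private, read/write pages at a fixed address
// atomically replaces the old pages; the next read of any page in the
// range is then serviced by the kernel's shared CoW zero page.
unsafe fn reset_with_zero_pages(base: *mut u8, len: usize) {
    let ptr = libc::mmap(
        base as *mut _,
        len,
        libc::PROT_READ | libc::PROT_WRITE,
        libc::MAP_PRIVATE | libc::MAP_ANONYMOUS | libc::MAP_FIXED,
        -1,
        0,
    );
    assert_eq!(ptr, base as *mut _);
}
```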
peterhuene edited PR Review Comment.
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
It does, in `MemoryPool::get`. This loop is to get the base address of every memory in the pool.
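A sketch of the kind of address arithmetic `MemoryPool::get` implies (illustrative field names and layout, not the PR's exact code):

```rust
// Illustrative pool layout: instances laid out contiguously, each with
// `max_memories` slots of `memory_reservation_size` bytes.
struct MemoryPool {
    base: *mut u8,
    memory_reservation_size: usize,
    max_memories: usize,
}

impl MemoryPool {
    // Base address of memory `memory_index` for instance `instance_index`;
    // iterating `memory_index` yields every memory slot for the instance.
    unsafe fn get(&self, instance_index: usize, memory_index: usize) -> *mut u8 {
        let slot = instance_index * self.max_memories + memory_index;
        self.base.add(slot * self.memory_reservation_size)
    }
}
```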
alexcrichton submitted PR Review.
alexcrichton created PR Review Comment:
Oh, clever, I missed that! Even though the `for` was staring me in the face...
peterhuene submitted PR Review.
peterhuene created PR Review Comment:
Agreed that this probably makes more sense in `Memory`. I'll move it there.
peterhuene updated PR #2518 from add-allocator to main.
peterhuene updated PR #2518 from add-allocator to main.
peterhuene updated PR #2518 from add-allocator to main.
peterhuene updated PR #2518 from add-allocator to main.
peterhuene updated PR #2518 from add-allocator to main.
peterhuene merged PR #2518.