hi! thought I'd create a stream for #9544 as part of making wasmtime fully functional on illumos, and also getting rid of a lot of unsafety.
I started off with #9620 to add a type-level wrapper around usize
to ensure that it's always host page aligned
A question I had is that https://github.com/bytecodealliance/wasmtime/blob/main/crates/wasmtime/src/runtime/vm/mmap_vec.rs seems to suggest that there's MmapVec::split_at
but that doesn't appear to exist. Is that WIP?
Alex Crichton has marked this topic as unresolved.
Oh that method used to exist but no longer does. Which as I think more about this obviates the need for MmapVec
entirely...
I needed to re-rationalize the docs in https://github.com/bytecodealliance/wasmtime/pull/9614 anyway so I went ahead and removed the Arc
as part of that commit and shifted around some unsafe bits
thanks! sorry about the from_file booboo. I thought those would be page aligned too
(should they be? feels like a good invariant to have)
Ah no that wouldn't work. However it looks like the only way to construct a file-backed mmap is using MmapVec
-- which suggests that maybe we should have a type-level distinction between aligned and unaligned mmaps
ah yeah so Mmap
I think should always be page-aligned but MmapVec
is not guaranteed to be page-aligned
so that'll be the point where sizes are rounded-up and the length of MmapVec
will be a plain old usize
Well -- Mmap::from_file
is not page aligned, right?
the start is page aligned but the length might not be
https://github.com/bytecodealliance/wasmtime/blob/main/crates/wasmtime/src/runtime/vm/sys/unix/mmap.rs#L93 -- there's no rounding up of len happening
oh right yeah good point
I had a rough idea that the constructor there would return (Mmap, usize)
where usize
was the length of the mapping created byte-wise, but the Mmap
would store the rounded-up size as the true length
(assuming OSes actually round up and give us ownership of the whole page)
Looks like it's rounded up and zeroed out on Linux: https://man7.org/linux/man-pages/man2/mmap.2.html#NOTES
But POSIX doesn't commit to anything: https://pubs.opengroup.org/onlinepubs/9799919799/functions/mmap.html
Yeah I just had a chat with Bryan Cantrill and he said to not assume anything about the rest of memory
This may mean we can't store a page aligned length in mmap in that case? That feels a bit unfortunate though... Could also use a separate field to clamp lengths of read-only/executable memory?
I have a draft working with a type-level assertion of this. will post tomorrow
Basically making two versions of Mmap, one that is aligned and one that isn't. From my read of the code they seem to be two generally different kinds of operations that happen to use similar Unix APIs
I think it'd be reasonable to split out yeah, especially if one platform has totally different APIs for managing one vs the other (which I suppose is sort of like how Windows treats it)
put up a PR for what I had in mind: https://github.com/bytecodealliance/wasmtime/pull/9639 -- I've rebased 9620 on top of it and it works well
Looks great, thanks!
awesome, thanks @Alex Crichton. slowly inching towards making this code less terrifying haha
What's the reason host_page_size
is behind the signals-based-traps
feature right now? I need HostAlignedByteCount, and therefore host_page_size, as part of the RuntimeLinearMemory
API which isn't behind that feature
(specifically for the byte_size
function, which returns a multiple of the host page size)
Right now signals-based-traps
is used to basically mean "be as portable as possible, assume no virtual memory", so when that's disabled there might not even be virtual memory which is why the function is gone.
But also I would imagine that RuntimeLinearMemory
wouldn't want to use this because it's all about wasm byte sizes and such and memories could be either 1-byte-pages or 64k-pages there
Hmmm. So there are parts of the codebase that panic today if the byte size isn't a multiple of the host page size
Is the contract that "if there is host virtual memory, byte_size is a multiple of it?"
In theory the contract is that such parts don't exist if virtual memory is disabled at compile time, but we're also sort of designing all of this as we go along
so it's less so an "if" and moreso "the question isn't asked if signals-based-traps
is disabled"
Ahh I see
So can the byte_size API be gated on signals-based-traps?
you mean HostAlignedSize
?
oh no you mean RuntimeLinearMemory::byte_size
that one definitely doesn't want to be host-aligned b/c with 1-byte-page-sized memories that intentionall isn't host-page-aligned
Alex Crichton said:
oh no you mean
RuntimeLinearMemory::byte_size
Yeah I mean this one
in that the return value there is semantically the byte size of the wasm linear memory, which is wasm-page-aligned and that page size depends on the wasm module
Alex Crichton said:
that one definitely doesn't want to be host-aligned b/c with 1-byte-page-sized memories that intentionall isn't host-page-aligned
Hmm that is interesting because an unaligned byte size does theoretically panic today
oh dear that's a bug we should fix!
ok. once I've had some coffee I'll try tracking down what I saw last night
yeah the return value of byte_size
there should definitely always handle being not-page-aligned
Ok. So byte size is really the wasm page size?
right yeah
Which is a power of 2 but is otherwise unrelated to the host page size? yeah make sense
correct yeah
wasm as-is-stable today requires 64KiB page sizes
but the custom-page-sizes proposal relaxes that to also having a page size of 1
other powers-of-two aren't currently allowed, but the theory is we may want to allow them in the future
Ok. I guess that's why it wasn't hit
Btw why is it called byte_size and not wasm_page_size?
there's definitely been some back and forth in this regard, and I by no means would claim the curent naming is the best
but many other places we handle wasm pages elsewhere in the codebase are typically with a concrete type (u32 or u64) and in units of pages rather than units of bytes
in that sense byte_size
was intended to both use usize
to convey it's talking about the host and also clearly indicate it's a byte-based unit and not a page-based unit
Ah
Should be a newtype ;)
we could certainly do with much more documentation
https://github.com/bytecodealliance/wasmtime/blob/a3d00777c1080c147f4e9a03dbd00214c61af6e3/crates/wasmtime/src/runtime/vm/memory.rs#L146 the comment there feels misleading to me btw
yeah I've been meaning to rework these traits and write many more words on the docs
RuntimeLinearMemory
and wasmtime::LinearMemory
are basically duplicates of each other and I'd prefer to only have one
and that documentation should be greatly expanded to talk more about wasm pages and such
Yeah that would be helpful. As a newcomer I definitely did not understand what it meant. I thought I was something closer to currently allocated memory
since byte_size is right above byte_capacity
yeah I've found when it comes to linear memory it's generally required that many words are needed to convey what's going on
(which is why I'm loving the newtypes since that's the "picture is worth a thousand words" idea)
forcing decisions around newtypes certainly shook out this bug lol
@Alex Crichton filed https://github.com/bytecodealliance/wasmtime/issues/9660
aha yeah that rings a bell, thanks!
It's definitely the case that we're walking a fine line with the CoW bits and wasm linear memories that are using 1-byte pages
I think there are likely other places where there's an implicit assumption too -- this was just the one I directly ran into
Is the solution here to round up accessible to the host page size?
yeah we're leaning relatively heavily on our fuzzing here to try to suss out all the weird ways these features can interact
probably? I'd have to dig in more though (dealing with memory image slots just triggered the "oh yeah that sounds familiar" thinking)
that wound be wonderful, thanks!
should I hold off on 9652 until you have a sense of what to do here? 9652 forces a decision here (fortunately or unfortunately), either to:
byte_size
/accessible
(maybe ok for now with the understanding that it would need to be fixed for smaller page sizes)oh in that case let me dig in now
Thanks! I have a personal thing to handle for the rest of the day so no immediate rush
ok I think there's two possible ways to go here:
accessible
is always page-alignedMemoryImageSlot
, one for the wasm-visible accessible size (accessible: usize
today) and another for the page-aligned-rounded-up size (e.g. "memory below this point is all accessible by the host"). That I think is roughly what MmapMemory
keeps track of.I think this latter one is probably the route to go? (may require more refactoring though...) Happy to help out if this is all ending up too far afield of what you're interested in
I'll have a look, thanks! I'll be honest I'm tempted to do the first for now and then look at the second as a followup. It sounds like CoW is an optimization and not something required for correctness?
correct yeah, and happy to merge that too!
Ah, ran into another fun issue around RuntimeLinearMemory
. It looks like there's now -- very recently -- a malloc-based impl as well, however I'm not seeing where it is aligned to a page boundary: https://github.com/bytecodealliance/wasmtime/blob/main/crates/wasmtime/src/runtime/vm/memory/malloc.rs
This base pointer gets passed into mprotect here: https://github.com/bytecodealliance/wasmtime/blob/67674881db5fbdbba8594feb52655aaa351a5f77/crates/wasmtime/src/runtime/vm/cow.rs#L708 -- as mentioned before, mprotect requires the base ptr to be host page aligned
More generally, supporting the malloc impl adds a bit of a challenge -- my plan was previously to pass &Mmap
instances around, but that wouldn't be suitable for the malloc-based impl ofc. We may need to have a general way to express the idea of mprotecting some range of memory within an allocation
This is in theory mitigated by https://github.com/bytecodealliance/wasmtime/blob/main/crates/wasmtime/src/runtime/vm/memory.rs#L136 which prevents the use of malloc with CoW
For CoW stuff it may suffice to add a method to get an optional mmap from the memory?
I feel like I saw a situation where the CoW impl might have used a malloc pointer but I'll double check, thanks! What you're describing makes sense to me
Ah doesn't look like there is, though the argument is pretty convoluted haha
there are a lot of pointers being passed around
another interesting one, https://github.com/bytecodealliance/wasmtime/blob/67674881db5fbdbba8594feb52655aaa351a5f77/crates/wasmtime/src/runtime/vm/instance/allocator/pooling/unix_stack_pool.rs#L164 might underflow with zero-sized stack. I didn't see a precondition check though there might be one I missed
Ah that one is protected by this, but I agree that the precondition checks and reasons they don't panic at runtime are pretty thin as the checks are relatively far from the use-site
and I also definitely agree that the amount of pointer-passing is one I've hoped we can improve over time
ahh I see it. since allocations are guarded by the precondition, deallocations are as well. probably worth checking the precondition again though since this is a public method
I hope all this has been helpful btw, this is definitely making the greater mmap usage easier to handle
yeah I think it's all great, it's much nicer than having a smattering of asserts here and there which felt haphazard at best
ok put up a PR for unix_stack_pool: https://github.com/bytecodealliance/wasmtime/pull/9678. this should now really help with moving more ops over to mmaps
@Alex Crichton hmmm.... I have a suspicion there's a subtle drop order issue with CoW images.
https://github.com/bytecodealliance/wasmtime/blob/438fc938c26e04c8238c4a1275992246fbf9d130/crates/wasmtime/src/runtime/vm/cow.rs#L782 remaps over that space in case of a drop (e.g. due to a panic?)
But https://github.com/bytecodealliance/wasmtime/blob/438fc938c26e04c8238c4a1275992246fbf9d130/crates/wasmtime/src/runtime/vm/instance/allocator/pooling/memory_pool.rs#L107-L115 stores mapping
before image_slots
, which by Rust drop order rules means that the Mmap
will be dropped before image_slots
. That's not correct
This discovery was the result of my attempt to see how hard adding a lifetime parameter to MemoryImageSlot
would be. turns out it's quite hard for now, but this looks incorrect
I think the net effect of this isn't too bad -- it means that those bits of memory stay mapped forever AFAICT, so basically a memory leak. but yikes lol
certainly not anticipated by this code afaict
I think that's handled by the manual drop of the memory pool but we could use improved comments!
@Alex Crichton ah ... thanks
One option I thought about in the past is to pass around Arc<Mmap>
or maybe bring back MmapVec
in one form or another for something like this
so the mmap has shared ownership in places but each owned "slice" can only operate on a certain range
overall I've definitely felt pain about the pointer-passing we have for CoW right now as well
I was wondering that ... my concern is that that can lead to memory pseudo-leaks with slots hanging around unintentionally
(this is why I had half-suggested using weak refs)
lifetimes would ensure that slots don't outlive the original mapping
but they seem hard because there are a lot of self-referential structs rn
Personally I'm a bit more worried to get more memory safety here rather than worrying too much about resource usage, these mmaps are so big bugs with them tend to be found pretty fast
hmm ok
and we wouldn't need to worry about anything like cycles b/c the arcs wouldn't have interior mutability
not that it's a silver bullet of course, but I still think that using an owned "thing" instead of a raw pointer would be nicer for CoW
welll... there are a bunch of &mut self
methods on Mmap
rn
I think the OS does synchronization on the address space so they may not be required
we can probably change those to &self
, they're just &mut
b/c they could be basically
yeah
it's nice to have the guarantee of &mut
when we can
but if we can't well we can't :)
sorry to bring this up again -- I've spent a few hours staring at this and I'm still worried about the interaction between memory_image and the backing mmap.
This code: https://github.com/bytecodealliance/wasmtime/blob/0e3e863a57cd28e1ce9be628fd0d8df192077305/crates/wasmtime/src/runtime/vm/memory.rs#L464C5-L472 stores both an alloc
and a memory_image
.
https://github.com/bytecodealliance/wasmtime/blob/0e3e863a57cd28e1ce9be628fd0d8df192077305/crates/wasmtime/src/runtime/vm/memory.rs#L673-L675 drops alloc
but not memory_image
.
As far as I can tell, alloc
can be either a borrowed allocation (via new_static
) or an owned allocation (via new_dynamic
). In case of the former, I believe this is fine -- since the actual ownership of the memory is by someone else. But in case of the latter, the memory backing memory_image
might get unmapped.
AFAICT the only guard against this is the name unwrap_static_image
-- afaict nothing prevents it from being called against a dynamic image?
This seems like a really complex set of constraints that would be good to fit lifetimes into, but that seems very hard
also put up https://github.com/bytecodealliance/wasmtime/pull/9681 as an RFC. It's unfortunately really tricky -- the Arc<Mmap>
approach is difficult to do as well because of methods like slice_mut
I guess you could imagine Mmap
giving out exclusive access to chunks of memory, checked at runtime. But that's a ton of work, and it would probably also interact with all of the other ways memory is reserved from an Mmap
.
I'm unfortunately running out of time to work on this as well, so I don't think I can take on a giant refactor
ok managed to get the arc approach working, wdyt: https://github.com/bytecodealliance/wasmtime/pull/9682
This seems like a really complex set of constraints that would be good to fit lifetimes into, but that seems very hard
Yeah you're definitely right, I lost a few asserts related to this in the last refactoring and that's not great, it's definitely not going to work with non-static memories there
(left some comments on the PR about Arc)
Alex Crichton said:
(left some comments on the PR about Arc)
thanks, do you want me to make a PR on my own repo with the subsequent work? I hate gh and its lack of stacked PRs
heh all I know is github so I'm used to it at this point, so feel free to either push more commits to that PR or just make a second PR with that as the first commit
basically whatever's easiest for you
no need from wasmtime's side to have pr-per-commit
@Alex Crichton all right, put up https://github.com/bytecodealliance/wasmtime/pull/9687 as a draft that includes 9682 + further work
the description of the last commit isn't fully up to date yet, just wanted to get this out there
nice I'll try to take a look later today
thank you so much! tbh trying to understand this static memory lifetime thing ate up so much of my time yesterday
oh dear sorry it's eating up so much time :(
if it'd be helpful I'd be happy to jump on a call or similar to talk in-depth about this
thank you! might take you up on that soon for future work.
As a tiny bit of feedback I think renaming static to borrowed and dynamic to owned would have really helped
oh we are always very welcome to feedback like that :)
runtime internals are basically whatever someone first thought of which after many refactorings typically is no longer the best name
it's not borrowed in the having a lifetime param sense, but it is definitely logically borrowed if I understand it correctly
you're definitely right yeah
I think in general the logical lifetime of memory was quite hard for me to get my head around. I think I have a decent sense of it now and it is (currently) correct as used, though things like unwrap_static_memory
worry me
yeah I don't think we've done a great job of keeping things documented and coherent as they've evolved over time
now's as good as any time to get things coherent again though
and ideally we could remove hacks like that unwrap
I also feel like for a MemoryImageSlot
, clear-on-drop and logically-borrowed have some degree of coupling though I haven't fully understood the interactions
I suspect there's probably a better way to refactor all of this and represent it differently than it is right now, but I'm not sure how best that would be done myself
hmm, I tried out a few things yesterday but wasn't happy at any of them. I guess one way would be to have pervasive lifetime params everywhere. Don't have Memory
own the memory at all, always have a 'alloc
arena allocate and hand out memory. It would be a huge refactor though
yeah I can see where that might get out of hand
the best I can think of is something like a "state" where you pass in &Mmap
and it doesn't store it itself
and it has an assertion you pass in the same one or something like that
That's what my abandoned PR did actually
but that also feels like a large refactoring and may not be too too beneficial
ah ok that confirms the "not too beneficial" part then lol
https://github.com/bytecodealliance/wasmtime/pull/9681 did that -- have an MmapOffsetRaw
that &Mmap
was always compared against
That ran into at least two issues:
StaticMemory
: https://github.com/bytecodealliance/wasmtime/pull/9681/files#diff-28c2cb0feca42e3f9df5ae472338b696640b6dfd104d8980c26b6b6822ac8443reset_with_anon_memory
(see cow.rs
in that PR)Another issue there is how to deal with the decommit queue, which would need access to mmap functionality
And that's where I felt like pervasive lifetime params would help... but then I ran into issues with the same code representing both owned and borrowed allocations :sweat_smile:
yeah those are all very good points and I can see where it all broke down...
So yeah I basically went through the 5 stages of grief yesterday
lol
man yeah you're not kidding
so many failed attempts yesterday lol
I find that happens to me a lot when I'm not sure how to represent data ahead of time in rust
Arc
in a sense is nice but it also pushes all of this under the carpet
true
I don't think it makes it that much harder to unwind all this in the future though -- at least with Arc it's explicit where all the places that need access to the mmap are
also true yeah
as a heads up my day is coming to a close and I fear I won't be able to get to this today, I'm also going to be mostly out for the rest of the week for US thanksgiving and such, but wanted to let you know I haven't forgotten this and I'll get to it next monday at the latest
no worries, thank you so much! I'm also going to take the next few days off
oh dear my day has escaped me and I didn't get to this, I'll try to do this early tomorrow
thank you! apologies for not getting back this week, have been very busy
Updated https://github.com/bytecodealliance/wasmtime/pull/9687, and added some notes about impl Drop for LocalMemory
-- it actually identified the issue with unwrap_static_image
! gosh I love rust
is there a way to run all CI jobs as part of the PR? seems like my PRs haven't been triggering miri even though my changes affect it
prtest:full
in the commit message will do that!
thanks @Chris Fallin -- to confirm, that's in the message of the PR tip? or any of the commits? or the PR summary?
I think in the commit message of the commit that is tested (the tip); I don't think the CI setup is smart enough to grep further back in the commit stack
cool, thanks
https://github.com/bytecodealliance/wasmtime/blob/0a894cb384201f1899098b8583ad6b2154579b98/.github/workflows/main.yml#L210 ahh it looks like any of the commits in the PR will do based on that + a quick local test. thanks for the prtest:full
reference, was easy to find once I had that :)
oh neat! leave it to our resident YAML Programmer(tm) @Alex Crichton to do it the right way :-)
Last updated: Jan 24 2025 at 00:11 UTC