alexcrichton opened PR #2897 from new-api
to main
:
This PR is an implementation of RFC 11 which is a redesign of Wasmtime's APIs.
Note: The RFC is not merged at this time, so this PR is purely meant for informational purposes and review. This should not merge until the RFC itself has been approved and merged.
I've gotten this impelmentation to the point now where it should be passing all in-tree tests. I think the performance is suitable for performance testing as well to ensure that it doesn't regress anything. I'd like to ideally get some review on this so by the time the RFC merges it's ready to go, and I'm also curious for design feedback which may influence the RFC itself as well.
I don't recommend paying attention to the commits themselves. They're all a mess and 95% of the changes are in the first commit. I plan on squashing everything and making a better commit message before merging.
Currently the only remaining TODO item is documentation. I have not yet update the API documentation of the
wasmtime
crate or the C API, and a lot of love needs to be poured into there. I would consider this a blocker before merging, I'm just working on the actual implementation itself first (Python/Go bindings are up next). I would also like to put more work into documenting some internal design decisions in inline comments, currently I don't think there's a sufficient amount. I plan on doing this with the general documentation pass I'll be doing later this week.
alexcrichton updated PR #2897 from new-api
to main
.
alexcrichton updated PR #2897 from new-api
to main
.
alexcrichton updated PR #2897 from new-api
to main
.
alexcrichton updated PR #2897 from new-api
to main
.
alexcrichton updated PR #2897 from new-api
to main
.
alexcrichton updated PR #2897 from new-api
to main
.
alexcrichton updated PR #2897 from new-api
to main
.
alexcrichton submitted PR review.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
@fitzgen this is something I added here I mentioned on chat, basically the location that globals of type
externref
get their reference count decremented.
alexcrichton created PR review comment:
@peterhuene you might be interested in this. I inserted the call to
drop_globals
above to fix an issue where when an instance was dropped itsexternref
globals weren't deallocated (this was a problem with the on-demand allocator as well).In doing so, though, I realized the pooling allocator was attempting to double-drop them. This is fine because the "drop" zeros out the memory so next time around it looks like a
null
externref, but I figured it was best to fix that. Here I cleared themodule
/offsets
field back to their original values when the instance was first initialized. I think that'll help prevent accidental leaks of data between instances or trying to interpret an instance as something it's not. Additionally this helps reclaim the memory used byArc<Module>
if this was the last reference to a module on the system.
alexcrichton created PR review comment:
This was something that I caught in the
Static
variant below and applied here. This previously prevented memories from growing to 65536 pages (exactly 4gb), but now this allows it. I've added a test that allocates this many pages to ensure this continues to work, but rereading the spec I believe this is allowed by the spec.
alexcrichton created PR review comment:
Technically the
module
can be inferred fromInstanceHandle
, but having a separate argument here as well made the implementations below much easier. The problem was that acquiring themodule
fromInstanceHandle
looked like we borrowed the entirety ofInstanceHandle
(since it was acquired through a method), which then prevents getting another mutable borrow on it.For exaple during table initialization we'd want to call
get_table
which "borrows" the wholeInstance
. Having these be separate borrrows on input basically proves that theArc<Module>
is rooted elsewhere on the stack above this. That avoided having toclone()
the module temporarily in all the below places.
alexcrichton created PR review comment:
@fitzgen I mentioned this on chat, but this was one of the important removals I made because technically globals should only work by reference so if you pull out an
externref
the reference count is increased.
alexcrichton created PR review comment:
@fitzgen this is probably worth pointing out here, but with
&mut self
we are now free from having to worry about recursive gc-in-gc so a lot of the bits and pieces here got simplified.
alexcrichton created PR review comment:
@fitzgen this is the second issue I mentioned on chat, this is the location where the reference count is now implemented relative to before when it was just copied over.
alexcrichton created PR review comment:
This was a major representation change for both memories and tables, the limiter is now passed in to all relevant operations rather than stored alongside each memory/table. This allows exclusive ownership to reside within the
Store
itself and, if necessary, it's passed down for operations. The limiter is also acquired through the newStore
trait in thewasmtime_runtime
crate
alexcrichton created PR review comment:
FWIW this was a semantic change where previously we'd allocate the memory, then realize we shouldn't have, and drop the memory if the limiter disallowed the
new
request. Now the limit happens before the allocation and we only proceed with alloating if the limiter allows it.
alexcrichton created PR review comment:
I've changed the representation of
Static
here to be more amenable to auto-impl ofSend
andSync
. I also folded themaximum
field into the length of thebase
slice here.My thinking is that this is faithful to the actual implementation as well where for the lifetime of this
Memory
it does indeed have exclusive access to the memory internally. There are other representations possible, too, which just requireunsafe impl Send
and such that can be implemented too.
alexcrichton created PR review comment:
FWIW before reading this file I'd recommend reading
memory.rs
above. Many of the changes here are similar to changes there, and I've commented the changes above to memories.
alexcrichton created PR review comment:
Note that the async bits here are a little different from before, but I think in a way that makes things a bit clearer. The
async_cx
is a "separately owned" structure which is connected (via unsafety) but not lifetimes to the originalcaller
. This allows us to pass the entirecaller
to the future and let it close over it. The Wasmtime implementation, though, has to be careful that the fields referenced byasync_cx
may be aliased, hence theirUnsafeCell
wrappers.
alexcrichton created PR review comment:
This in particular was a very important removal from this crate. This removes the need for
Any
which removes the need for'static
fromStore
, allowing anyT
to be used with aStore
.The store pointer is learned via the
VMContext
and the*mut dyn wasmtime_runtime::Store
contained within now.
alexcrichton created PR review comment:
I removed this in favor of
TypedFunc::new_unchecked
to make its presence a little less prominent.
alexcrichton created PR review comment:
I moved this here from
Store
, mostly just for aesthetic reasons, no real motivation other than that. This could arguably move toConfig
too, but it felt a little more appropriate here. Doesn't really matter one way or another though.
alexcrichton created PR review comment:
I moved this to
tests/all/func.rs
alexcrichton created PR review comment:
This is an interesting bound. This is required by internals (notably the
opaque_send()
function). I think it would be possible to make this future conditionallySend
if desired based onT
(rather than always requiringT: Send
), it'll just require some refactoring internally from what the current structure is. I'm not sure that it's all too useful to have that, so I went ahead and just addedT: Send
to the various async methods and if users really want non-Send
futures with wasmtime we can come back and remove this via other means in the future.
alexcrichton created PR review comment:
Previously this function would have used
with_last_info
from thetraphandlers
module, but now it looks up the store from within the providedcaller
*mut VMContext
. Note that previously the pointer could be null but I've now had it arranged such that this pointer is never null.
alexcrichton created PR review comment:
Note that this doesn't use
AsContextMut
so the liftetime can be explicitly closed over in the returned iterator
alexcrichton created PR review comment:
It's worth pointing out that I shifted functionality around here a bit. Now
into_abi
bakes in thecompatible_with_store
check as well asinvoke
takes/returns "ABI things" instead of raw values. This makes some more captures/closuresCopy
in the above_call
function, but also notably means that theinvoke
function no longer needs a borrow onStore
somewhow, which was much more difficult to provide.Now the borrow is only required on
into_abi
here, then the call can happen without an explicit borrow on the store, and then re-ification back into actual values can happen after the call finishes.
alexcrichton created PR review comment:
Note that this doesn't take
impl AsContext
so we can tie to the precise lifetime.
alexcrichton created PR review comment:
This was a very satisfying thing to remove. This was originally added to prevent cycles, but now there are no cycles, so we could just delete everything related to this!
alexcrichton created PR review comment:
It's worth pointing out that I haven't implemented the
sub_linker
API from the RFC. I'm not sure if that's actually useful to implement or whether someone wants it. The intention was to preserve the functionality where today you can create global functions viaConfig
and then each store-basedLinker
can add its own set of functions after-the-fact.If that's used by applications and still desired I can add the
sub_linker
functionality here, but I figured it might be good to start with something simpler and only add that later if necessary.
alexcrichton created PR review comment:
Note that this requires
Mut
where it doesn't actually need it, but I left it here for the same reason asexports
above.
alexcrichton created PR review comment:
R.e.
Instance::exports
above, this loop and thiscollect()
are what I'd like to avoid in the future. This is an unnecessary allocation that instantiation performs which I think we can get rid of at no extra cost.
alexcrichton created PR review comment:
These are only used for
HostFunc
and correspond to the creation of aHostFunc
and its deletion.
alexcrichton created PR review comment:
Note that I removed this in favor of a
Store::limiter
configuration API
alexcrichton created PR review comment:
The movement here was basically just movement of code from the
ModuleRegistry
to theGlobalModuleRegistry
. These functions should only be accessed during trap creation so the slow locking should be ok, and it means that we didn't need the store during trap creation which was made a few things easier.Data-representation-wise, though, nothing changed. The global registry already had all the information we needed.
alexcrichton created PR review comment:
A double-check on my logic here would be greatly appreciated.
alexcrichton created PR review comment:
This also, I think, keeps the usage of thread locals entirely contained within this file (except for manual management on async yields). One day I hope that we can actually remove the thread locals here entirely.
alexcrichton created PR review comment:
More eyes on this in particular would be appreciated.
alexcrichton created PR review comment:
It's also worth pointing out that
Mut
isn't technically needed here, yet. I realized that today, though, if you instantiate something we'll create a separate hash map in the wasmtime crate for all the exports (and their correspondingExtern
value).That feels inefficient and not necessary since the hash map is already stored within
Arc<Module>
. I'd like to improve this in the future so instantiation simply records theInstanceId
and nothing else, and then lookup/fetching here actually mutates the store to create a newFunc
(or whatever it is) on-demand rather than doing it all up front.This means that in the future mutability may be required, so I went ahead and added it here.
alexcrichton created PR review comment:
And another option of course is some sort of compile-time configuration where the number of bits allocated to the index is a build time variable which specific embeddings could configure. This is so obscure, though, that I'd prefer no one else have to ever worry about this.
alexcrichton created PR review comment:
This is, uh, very verbose. I'm not really sure how best to solve it though
alexcrichton created PR review comment:
(I think that
GuestPtr
needs to be rethought for other reasons too)
alexcrichton created PR review comment:
FWIW I think this is a real open question. I don't know what to do about this really. The way that
GuestPtr
works and how we need itSend
andSync
necessitates thisMutex
for now, but there's no threads here so there's no reason this should have a mutex. I'm hoping we can basically leave this to get improved later, probably by rethinking the wayGuestPtr
works.
alexcrichton created PR review comment:
After writing all this and thinking more about it, I think another possible solution could be:
- In Rust
Stored<T>
is 128 bits, 64 for the store andusize
for the index within the store.- In the C API we discard the 64 for the store and only have 64 for the
usize
, so the C API still uses 64-bit integers.That would make the C API more unsafe than it is right now, but this change could be implemented backwards compatibly (even ABI-compatibly!). That makes me much less worried about the limits imposed here.
peterhuene requested peterhuene for a review on PR #2897.
alexcrichton requested pchickey for a review on PR #2897.
alexcrichton updated PR #2897 from new-api
to main
.
alexcrichton updated PR #2897 from new-api
to main
.
fitzgen created PR review comment:
// Data contained is always Send+Sync so these should be safe.
fitzgen submitted PR review.
fitzgen created PR review comment:
I kinda feel like we should remove this method at this point -- or alternatively give it an
Ordering
param -- because we use different orderings in different places, and its a bit subtle and something to be aware of, and so having this other method that makes the choice non-obvious and hidden from you seems a bit sketch.Also, this method is only used twice now: once in an assertion and once in the
strong_count
method.
fitzgen submitted PR review.
fitzgen created PR review comment:
// Cast from `*mut T` to `*mut dyn Any` here.
That's a lot easier than the way I was doing the casting :-p
I swear casting to
dyn
is the thing I have the most trouble with in Rust at this point.
fitzgen created PR review comment:
Nice that we can use
&mut self
down here too, I hadn't realized that we could propagate these changes this deep.
fitzgen created PR review comment:
Very nice!!!
fitzgen created PR review comment:
Wait -- is this actually true? What about
Drop
impls that call back into Wasm? I guess they can't since they don't have aStoreContextMut
. Nice.
alexcrichton updated PR #2897 from new-api
to main
.
alexcrichton updated PR #2897 from new-api
to main
.
alexcrichton created PR review comment:
Good point yeah, the wasmtime API of
ExternRef::strong_count
now mentions that it does aSeqCst
load for synchronization purposes.
alexcrichton submitted PR review.
alexcrichton updated PR #2897 from new-api
to main
.
pchickey submitted PR review.
pchickey created PR review comment:
I'm interested to hear more of your thoughts on this when you have some time!
alexcrichton updated PR #2897 from new-api
to main
.
pchickey submitted PR review.
pchickey created PR review comment:
can we additionally have an
into_inner(self) -> T
?
alexcrichton updated PR #2897 from new-api
to main
.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Ah yeah I'd forgotten to come back and implement this, now implemented though!
alexcrichton updated PR #2897 from new-api
to main
.
peterhuene submitted PR review.
peterhuene created PR review comment:
Nit: should the multiple
cfg
inside ofAsyncState
be promoted to a single one on the type itself?
peterhuene created PR review comment:
I think this comment is now out of date given these changes.
peterhuene created PR review comment:
I'm confused by this call to
drop
; would you mind explaining its significance?
peterhuene created PR review comment:
Should we update this test or delete it rather than commenting it out?
peterhuene created PR review comment:
// ever have to worry about the `Send`-ness of Wasmtime. If rustc says
peterhuene created PR review comment:
* cannot be called concurrently. For example, functions which correspond to
peterhuene created PR review comment:
:tada: :tada: :tada:
peterhuene submitted PR review.
peterhuene created PR review comment:
Nit: we can replace this (and below) with
InstanceAllocationStrategy::pooling()
for brevity.
peterhuene created PR review comment:
Nit:
// NB it's important that this destructor does not access `self.data`. That is
peterhuene created PR review comment:
At first when I reviewed this change I was concerned over the lack of registering the host func trampoline like we were doing before, but then I reviewed the changes in
store.rs
and saw how the host func trampolines are registered now directly with the store :+1:.
peterhuene created PR review comment:
I read through this comment and the logic seems sound regarding the justification of marking our fiber-based futures as
Send
:+1:
peterhuene created PR review comment:
Could you help me pinpoint where in the C API the 64-bit requirement is placed on
Stored<T>
? I assume it's related towasmtime_extern_union
but I haven't been able to spot where the size assumption comes into play.That said, this scheme makes sense to me assuming that requirement.
peterhuene created PR review comment:
Thanks for fixing this!
Looking over what I wrote originally, I don't think we should be storing
offsets
inInstancePool
at all.Instead
InstancePool::initialize
and this code here that is resetting the offsets withVMOffsets::new(size_of::<*const u8>() as u8, &*self.empty_module)
to ensure the current instance offsets match the associated module (empty both to start and when the instance has been deallocated).For
InstancePool::Initialize
we'd want to pass in the module and instance limits to set the capacity on the memory/table maps.
peterhuene edited PR review comment.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Oh this was a weird one where
memory.reset_guard_pages()
required amut
binding onmemory
but that call only happens with uffd/linux, so other platforms would get a warning about an unusedmut
. To fix that I just unconditionally added a line that would requiremut memory
here regardless of the platform so this would compile everywhere without warnings.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Whoops I completely forgot to come back to this test and update it apparently...
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Ok I talked with @pchickey about this and our conclusion is that eventually
GuestPtr
will probably get phased out entirely. Instead ofGuestPtr
we'll generate the wiggle traits with raw borrowed strings already. This means that theBorrowChecker
here can live in a mutable location and won't need interior mutability, alleviating the need for thisMutex
. That'll happen in a future PR though
alexcrichton submitted PR review.
alexcrichton created PR review comment:
The main requirement is that in the C API I would like to avoid types like
wasmtime_func_t
being owned types. Instead I'd like them to be POD-like with no destructor. For ergonomics they're also currently passed by-value everywhere in functions, e.g. returningwasmtime_func_t
or takingwasmtime_func_t func
or something like that. By-value structs always give me the willies (and I think concretely FFI interfaces like Python'sctypes
doesn't support them) so that limited us to integers. The biggest integer for that was 64-bits, hence the 64-bit restriction.There's other ways to change this though:
- With my idea above the 64-bits in the C API are just index bits, so nothing would be part of the store bits. Then Rust could use 128 bits and C could use 64.
- Another possibility is having the C API use 128-bit integers. This would probably show up as
struct wasmtime_func_t { uint64_t hi; uint64_t low; }
instead of a 128-bit integer, and then all APIs would take pointers to the struct (it'd never be returned by value)- Alternatively we could go back to
wasmtime_func_t*
and require dtors, but I would personally prefer to avoid this.While I originally felt that it was kinda nice to have 64-bit types in the C API, I may be coming around to the 128-bit-size-struct idea. That keeps safety in both Rust & C (somewhat at least) and I think shouldn't make bindings really that much more difficult. I think the ergonomics of the C bindings will go down a bit but that's not really the end of the world per se.
Do you think it's worth these shenanigans to get 64-bit types? Or should we just go ahead and use 128-bits everywhere?
alexcrichton updated PR #2897 from new-api
to main
.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Hm so having thought a bit more about this this is the diff of "what if all the types in the wasmtime API were 128-bit values". That actually looks relatively clean to me and makes me think that I should probably just change to that? The Go/Python bindings had no trouble updating to using that idiom either.
This also means that if we ever want to store more stuff in there it shouldn't really be a problem either?
alexcrichton updated PR #2897 from new-api
to main
.
alexcrichton updated PR #2897 from new-api
to main
.
peterhuene submitted PR review.
peterhuene created PR review comment:
Ah, that makes complete sense.
peterhuene submitted PR review.
peterhuene submitted PR review.
peterhuene created PR review comment:
I think I would lean towards making that change so that we don't have to place limits on the number of stores that can be created (even if the limit seems reasonable today).
I looked over that commit and it looks good (minor spelling error on "belogns" in a few places, though).
pchickey submitted PR review.
alexcrichton updated PR #2897 from new-api
to main
.
alexcrichton updated PR #2897 from new-api
to main
.
alexcrichton updated PR #2897 from new-api
to main
.
alexcrichton updated PR #2897 from new-api
to main
.
alexcrichton updated PR #2897 from new-api
to main
.
alexcrichton updated PR #2897 from new-api
to main
.
alexcrichton updated PR #2897 from new-api
to main
.
alexcrichton updated PR #2897 from new-api
to main
.
alexcrichton updated PR #2897 from new-api
to main
.
alexcrichton updated PR #2897 from new-api
to main
.
alexcrichton updated PR #2897 from new-api
to main
.
alexcrichton updated PR #2897 from new-api
to main
.
alexcrichton updated PR #2897 from new-api
to main
.
alexcrichton updated PR #2897 from new-api
to main
.
alexcrichton updated PR #2897 from new-api
to main
.
alexcrichton updated PR #2897 from new-api
to main
.
alexcrichton updated PR #2897 from new-api
to main
.
alexcrichton updated PR #2897 from new-api
to main
.
alexcrichton updated PR #2897 from new-api
to main
.
alexcrichton updated PR #2897 from new-api
to main
.
alexcrichton updated PR #2897 from new-api
to main
.
bnjbvr submitted PR review.
bnjbvr created PR review comment:
Could we set
default-features
tofalse
here, please? Otherwise this results in thedefault
feature ofwasmtime
being forcefully enabled (because of cargo resolver's feature unification), and making it impossible to disable it in embeddings that indirectly make use ofwiggle
.wasmtime = { path = "../wasmtime", version = "0.27.0", optional = true, default-features = false }
bnjbvr submitted PR review.
bnjbvr created PR review comment:
Hmm this is slightly wrong: should also re-enable manually the
async
feature.
bnjbvr edited PR review comment.
bnjbvr deleted PR review comment.
pchickey updated PR #2897 from new-api
to main
.
pchickey submitted PR review.
pchickey created PR review comment:
Thanks, I missed that.
alexcrichton updated PR #2897 from new-api
to main
.
alexcrichton updated PR #2897 from new-api
to main
.
alexcrichton updated PR #2897 from new-api
to main
.
alexcrichton merged PR #2897.
Last updated: Oct 23 2024 at 20:03 UTC