alexcrichton opened PR #2897 from new-api to main:
This PR is an implementation of RFC 11 which is a redesign of Wasmtime's APIs.
Note: The RFC is not merged at this time, so this PR is purely meant for informational purposes and review. This should not merge until the RFC itself has been approved and merged.
I've gotten this impelmentation to the point now where it should be passing all in-tree tests. I think the performance is suitable for performance testing as well to ensure that it doesn't regress anything. I'd like to ideally get some review on this so by the time the RFC merges it's ready to go, and I'm also curious for design feedback which may influence the RFC itself as well.
I don't recommend paying attention to the commits themselves. They're all a mess and 95% of the changes are in the first commit. I plan on squashing everything and making a better commit message before merging.
Currently the only remaining TODO item is documentation. I have not yet update the API documentation of the
wasmtimecrate or the C API, and a lot of love needs to be poured into there. I would consider this a blocker before merging, I'm just working on the actual implementation itself first (Python/Go bindings are up next). I would also like to put more work into documenting some internal design decisions in inline comments, currently I don't think there's a sufficient amount. I plan on doing this with the general documentation pass I'll be doing later this week.
alexcrichton updated PR #2897 from new-api to main.
alexcrichton updated PR #2897 from new-api to main.
alexcrichton updated PR #2897 from new-api to main.
alexcrichton updated PR #2897 from new-api to main.
alexcrichton updated PR #2897 from new-api to main.
alexcrichton updated PR #2897 from new-api to main.
alexcrichton updated PR #2897 from new-api to main.
alexcrichton submitted PR review.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
@fitzgen this is something I added here I mentioned on chat, basically the location that globals of type
externrefget their reference count decremented.
alexcrichton created PR review comment:
@peterhuene you might be interested in this. I inserted the call to
drop_globalsabove to fix an issue where when an instance was dropped itsexternrefglobals weren't deallocated (this was a problem with the on-demand allocator as well).In doing so, though, I realized the pooling allocator was attempting to double-drop them. This is fine because the "drop" zeros out the memory so next time around it looks like a
nullexternref, but I figured it was best to fix that. Here I cleared themodule/offsetsfield back to their original values when the instance was first initialized. I think that'll help prevent accidental leaks of data between instances or trying to interpret an instance as something it's not. Additionally this helps reclaim the memory used byArc<Module>if this was the last reference to a module on the system.
alexcrichton created PR review comment:
This was something that I caught in the
Staticvariant below and applied here. This previously prevented memories from growing to 65536 pages (exactly 4gb), but now this allows it. I've added a test that allocates this many pages to ensure this continues to work, but rereading the spec I believe this is allowed by the spec.
alexcrichton created PR review comment:
Technically the
modulecan be inferred fromInstanceHandle, but having a separate argument here as well made the implementations below much easier. The problem was that acquiring themodulefromInstanceHandlelooked like we borrowed the entirety ofInstanceHandle(since it was acquired through a method), which then prevents getting another mutable borrow on it.For exaple during table initialization we'd want to call
get_tablewhich "borrows" the wholeInstance. Having these be separate borrrows on input basically proves that theArc<Module>is rooted elsewhere on the stack above this. That avoided having toclone()the module temporarily in all the below places.
alexcrichton created PR review comment:
@fitzgen I mentioned this on chat, but this was one of the important removals I made because technically globals should only work by reference so if you pull out an
externrefthe reference count is increased.
alexcrichton created PR review comment:
@fitzgen this is probably worth pointing out here, but with
&mut selfwe are now free from having to worry about recursive gc-in-gc so a lot of the bits and pieces here got simplified.
alexcrichton created PR review comment:
@fitzgen this is the second issue I mentioned on chat, this is the location where the reference count is now implemented relative to before when it was just copied over.
alexcrichton created PR review comment:
This was a major representation change for both memories and tables, the limiter is now passed in to all relevant operations rather than stored alongside each memory/table. This allows exclusive ownership to reside within the
Storeitself and, if necessary, it's passed down for operations. The limiter is also acquired through the newStoretrait in thewasmtime_runtimecrate
alexcrichton created PR review comment:
FWIW this was a semantic change where previously we'd allocate the memory, then realize we shouldn't have, and drop the memory if the limiter disallowed the
newrequest. Now the limit happens before the allocation and we only proceed with alloating if the limiter allows it.
alexcrichton created PR review comment:
I've changed the representation of
Statichere to be more amenable to auto-impl ofSendandSync. I also folded themaximumfield into the length of thebaseslice here.My thinking is that this is faithful to the actual implementation as well where for the lifetime of this
Memoryit does indeed have exclusive access to the memory internally. There are other representations possible, too, which just requireunsafe impl Sendand such that can be implemented too.
alexcrichton created PR review comment:
FWIW before reading this file I'd recommend reading
memory.rsabove. Many of the changes here are similar to changes there, and I've commented the changes above to memories.
alexcrichton created PR review comment:
Note that the async bits here are a little different from before, but I think in a way that makes things a bit clearer. The
async_cxis a "separately owned" structure which is connected (via unsafety) but not lifetimes to the originalcaller. This allows us to pass the entirecallerto the future and let it close over it. The Wasmtime implementation, though, has to be careful that the fields referenced byasync_cxmay be aliased, hence theirUnsafeCellwrappers.
alexcrichton created PR review comment:
This in particular was a very important removal from this crate. This removes the need for
Anywhich removes the need for'staticfromStore, allowing anyTto be used with aStore.The store pointer is learned via the
VMContextand the*mut dyn wasmtime_runtime::Storecontained within now.
alexcrichton created PR review comment:
I removed this in favor of
TypedFunc::new_uncheckedto make its presence a little less prominent.
alexcrichton created PR review comment:
I moved this here from
Store, mostly just for aesthetic reasons, no real motivation other than that. This could arguably move toConfigtoo, but it felt a little more appropriate here. Doesn't really matter one way or another though.
alexcrichton created PR review comment:
I moved this to
tests/all/func.rs
alexcrichton created PR review comment:
This is an interesting bound. This is required by internals (notably the
opaque_send()function). I think it would be possible to make this future conditionallySendif desired based onT(rather than always requiringT: Send), it'll just require some refactoring internally from what the current structure is. I'm not sure that it's all too useful to have that, so I went ahead and just addedT: Sendto the various async methods and if users really want non-Sendfutures with wasmtime we can come back and remove this via other means in the future.
alexcrichton created PR review comment:
Previously this function would have used
with_last_infofrom thetraphandlersmodule, but now it looks up the store from within the providedcaller*mut VMContext. Note that previously the pointer could be null but I've now had it arranged such that this pointer is never null.
alexcrichton created PR review comment:
Note that this doesn't use
AsContextMutso the liftetime can be explicitly closed over in the returned iterator
alexcrichton created PR review comment:
It's worth pointing out that I shifted functionality around here a bit. Now
into_abibakes in thecompatible_with_storecheck as well asinvoketakes/returns "ABI things" instead of raw values. This makes some more captures/closuresCopyin the above_callfunction, but also notably means that theinvokefunction no longer needs a borrow onStoresomewhow, which was much more difficult to provide.Now the borrow is only required on
into_abihere, then the call can happen without an explicit borrow on the store, and then re-ification back into actual values can happen after the call finishes.
alexcrichton created PR review comment:
Note that this doesn't take
impl AsContextso we can tie to the precise lifetime.
alexcrichton created PR review comment:
This was a very satisfying thing to remove. This was originally added to prevent cycles, but now there are no cycles, so we could just delete everything related to this!
alexcrichton created PR review comment:
It's worth pointing out that I haven't implemented the
sub_linkerAPI from the RFC. I'm not sure if that's actually useful to implement or whether someone wants it. The intention was to preserve the functionality where today you can create global functions viaConfigand then each store-basedLinkercan add its own set of functions after-the-fact.If that's used by applications and still desired I can add the
sub_linkerfunctionality here, but I figured it might be good to start with something simpler and only add that later if necessary.
alexcrichton created PR review comment:
Note that this requires
Mutwhere it doesn't actually need it, but I left it here for the same reason asexportsabove.
alexcrichton created PR review comment:
R.e.
Instance::exportsabove, this loop and thiscollect()are what I'd like to avoid in the future. This is an unnecessary allocation that instantiation performs which I think we can get rid of at no extra cost.
alexcrichton created PR review comment:
These are only used for
HostFuncand correspond to the creation of aHostFuncand its deletion.
alexcrichton created PR review comment:
Note that I removed this in favor of a
Store::limiterconfiguration API
alexcrichton created PR review comment:
The movement here was basically just movement of code from the
ModuleRegistryto theGlobalModuleRegistry. These functions should only be accessed during trap creation so the slow locking should be ok, and it means that we didn't need the store during trap creation which was made a few things easier.Data-representation-wise, though, nothing changed. The global registry already had all the information we needed.
alexcrichton created PR review comment:
A double-check on my logic here would be greatly appreciated.
alexcrichton created PR review comment:
This also, I think, keeps the usage of thread locals entirely contained within this file (except for manual management on async yields). One day I hope that we can actually remove the thread locals here entirely.
alexcrichton created PR review comment:
More eyes on this in particular would be appreciated.
alexcrichton created PR review comment:
It's also worth pointing out that
Mutisn't technically needed here, yet. I realized that today, though, if you instantiate something we'll create a separate hash map in the wasmtime crate for all the exports (and their correspondingExternvalue).That feels inefficient and not necessary since the hash map is already stored within
Arc<Module>. I'd like to improve this in the future so instantiation simply records theInstanceIdand nothing else, and then lookup/fetching here actually mutates the store to create a newFunc(or whatever it is) on-demand rather than doing it all up front.This means that in the future mutability may be required, so I went ahead and added it here.
alexcrichton created PR review comment:
And another option of course is some sort of compile-time configuration where the number of bits allocated to the index is a build time variable which specific embeddings could configure. This is so obscure, though, that I'd prefer no one else have to ever worry about this.
alexcrichton created PR review comment:
This is, uh, very verbose. I'm not really sure how best to solve it though
alexcrichton created PR review comment:
(I think that
GuestPtrneeds to be rethought for other reasons too)
alexcrichton created PR review comment:
FWIW I think this is a real open question. I don't know what to do about this really. The way that
GuestPtrworks and how we need itSendandSyncnecessitates thisMutexfor now, but there's no threads here so there's no reason this should have a mutex. I'm hoping we can basically leave this to get improved later, probably by rethinking the wayGuestPtrworks.
alexcrichton created PR review comment:
After writing all this and thinking more about it, I think another possible solution could be:
- In Rust
Stored<T>is 128 bits, 64 for the store andusizefor the index within the store.- In the C API we discard the 64 for the store and only have 64 for the
usize, so the C API still uses 64-bit integers.That would make the C API more unsafe than it is right now, but this change could be implemented backwards compatibly (even ABI-compatibly!). That makes me much less worried about the limits imposed here.
peterhuene requested peterhuene for a review on PR #2897.
alexcrichton requested pchickey for a review on PR #2897.
alexcrichton updated PR #2897 from new-api to main.
alexcrichton updated PR #2897 from new-api to main.
fitzgen created PR review comment:
// Data contained is always Send+Sync so these should be safe.
fitzgen submitted PR review.
fitzgen created PR review comment:
I kinda feel like we should remove this method at this point -- or alternatively give it an
Orderingparam -- because we use different orderings in different places, and its a bit subtle and something to be aware of, and so having this other method that makes the choice non-obvious and hidden from you seems a bit sketch.Also, this method is only used twice now: once in an assertion and once in the
strong_countmethod.
fitzgen submitted PR review.
fitzgen created PR review comment:
// Cast from `*mut T` to `*mut dyn Any` here.That's a lot easier than the way I was doing the casting :-p
I swear casting to
dynis the thing I have the most trouble with in Rust at this point.
fitzgen created PR review comment:
Nice that we can use
&mut selfdown here too, I hadn't realized that we could propagate these changes this deep.
fitzgen created PR review comment:
Very nice!!!
fitzgen created PR review comment:
Wait -- is this actually true? What about
Dropimpls that call back into Wasm? I guess they can't since they don't have aStoreContextMut. Nice.
alexcrichton updated PR #2897 from new-api to main.
alexcrichton updated PR #2897 from new-api to main.
alexcrichton created PR review comment:
Good point yeah, the wasmtime API of
ExternRef::strong_countnow mentions that it does aSeqCstload for synchronization purposes.
alexcrichton submitted PR review.
alexcrichton updated PR #2897 from new-api to main.
pchickey submitted PR review.
pchickey created PR review comment:
I'm interested to hear more of your thoughts on this when you have some time!
alexcrichton updated PR #2897 from new-api to main.
pchickey submitted PR review.
pchickey created PR review comment:
can we additionally have an
into_inner(self) -> T?
alexcrichton updated PR #2897 from new-api to main.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Ah yeah I'd forgotten to come back and implement this, now implemented though!
alexcrichton updated PR #2897 from new-api to main.
peterhuene submitted PR review.
peterhuene created PR review comment:
Nit: should the multiple
cfginside ofAsyncStatebe promoted to a single one on the type itself?
peterhuene created PR review comment:
I think this comment is now out of date given these changes.
peterhuene created PR review comment:
I'm confused by this call to
drop; would you mind explaining its significance?
peterhuene created PR review comment:
Should we update this test or delete it rather than commenting it out?
peterhuene created PR review comment:
// ever have to worry about the `Send`-ness of Wasmtime. If rustc says
peterhuene created PR review comment:
* cannot be called concurrently. For example, functions which correspond to
peterhuene created PR review comment:
:tada: :tada: :tada:
peterhuene submitted PR review.
peterhuene created PR review comment:
Nit: we can replace this (and below) with
InstanceAllocationStrategy::pooling()for brevity.
peterhuene created PR review comment:
Nit:
// NB it's important that this destructor does not access `self.data`. That is
peterhuene created PR review comment:
At first when I reviewed this change I was concerned over the lack of registering the host func trampoline like we were doing before, but then I reviewed the changes in
store.rsand saw how the host func trampolines are registered now directly with the store :+1:.
peterhuene created PR review comment:
I read through this comment and the logic seems sound regarding the justification of marking our fiber-based futures as
Send:+1:
peterhuene created PR review comment:
Could you help me pinpoint where in the C API the 64-bit requirement is placed on
Stored<T>? I assume it's related towasmtime_extern_unionbut I haven't been able to spot where the size assumption comes into play.That said, this scheme makes sense to me assuming that requirement.
peterhuene created PR review comment:
Thanks for fixing this!
Looking over what I wrote originally, I don't think we should be storing
offsetsinInstancePoolat all.Instead
InstancePool::initializeand this code here that is resetting the offsets withVMOffsets::new(size_of::<*const u8>() as u8, &*self.empty_module)to ensure the current instance offsets match the associated module (empty both to start and when the instance has been deallocated).For
InstancePool::Initializewe'd want to pass in the module and instance limits to set the capacity on the memory/table maps.
peterhuene edited PR review comment.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Oh this was a weird one where
memory.reset_guard_pages()required amutbinding onmemorybut that call only happens with uffd/linux, so other platforms would get a warning about an unusedmut. To fix that I just unconditionally added a line that would requiremut memoryhere regardless of the platform so this would compile everywhere without warnings.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Whoops I completely forgot to come back to this test and update it apparently...
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Ok I talked with @pchickey about this and our conclusion is that eventually
GuestPtrwill probably get phased out entirely. Instead ofGuestPtrwe'll generate the wiggle traits with raw borrowed strings already. This means that theBorrowCheckerhere can live in a mutable location and won't need interior mutability, alleviating the need for thisMutex. That'll happen in a future PR though
alexcrichton submitted PR review.
alexcrichton created PR review comment:
The main requirement is that in the C API I would like to avoid types like
wasmtime_func_tbeing owned types. Instead I'd like them to be POD-like with no destructor. For ergonomics they're also currently passed by-value everywhere in functions, e.g. returningwasmtime_func_tor takingwasmtime_func_t funcor something like that. By-value structs always give me the willies (and I think concretely FFI interfaces like Python'sctypesdoesn't support them) so that limited us to integers. The biggest integer for that was 64-bits, hence the 64-bit restriction.There's other ways to change this though:
- With my idea above the 64-bits in the C API are just index bits, so nothing would be part of the store bits. Then Rust could use 128 bits and C could use 64.
- Another possibility is having the C API use 128-bit integers. This would probably show up as
struct wasmtime_func_t { uint64_t hi; uint64_t low; }instead of a 128-bit integer, and then all APIs would take pointers to the struct (it'd never be returned by value)- Alternatively we could go back to
wasmtime_func_t*and require dtors, but I would personally prefer to avoid this.While I originally felt that it was kinda nice to have 64-bit types in the C API, I may be coming around to the 128-bit-size-struct idea. That keeps safety in both Rust & C (somewhat at least) and I think shouldn't make bindings really that much more difficult. I think the ergonomics of the C bindings will go down a bit but that's not really the end of the world per se.
Do you think it's worth these shenanigans to get 64-bit types? Or should we just go ahead and use 128-bits everywhere?
alexcrichton updated PR #2897 from new-api to main.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Hm so having thought a bit more about this this is the diff of "what if all the types in the wasmtime API were 128-bit values". That actually looks relatively clean to me and makes me think that I should probably just change to that? The Go/Python bindings had no trouble updating to using that idiom either.
This also means that if we ever want to store more stuff in there it shouldn't really be a problem either?
alexcrichton updated PR #2897 from new-api to main.
alexcrichton updated PR #2897 from new-api to main.
peterhuene submitted PR review.
peterhuene created PR review comment:
Ah, that makes complete sense.
peterhuene submitted PR review.
peterhuene submitted PR review.
peterhuene created PR review comment:
I think I would lean towards making that change so that we don't have to place limits on the number of stores that can be created (even if the limit seems reasonable today).
I looked over that commit and it looks good (minor spelling error on "belogns" in a few places, though).
pchickey submitted PR review.
alexcrichton updated PR #2897 from new-api to main.
alexcrichton updated PR #2897 from new-api to main.
alexcrichton updated PR #2897 from new-api to main.
alexcrichton updated PR #2897 from new-api to main.
alexcrichton updated PR #2897 from new-api to main.
alexcrichton updated PR #2897 from new-api to main.
alexcrichton updated PR #2897 from new-api to main.
alexcrichton updated PR #2897 from new-api to main.
alexcrichton updated PR #2897 from new-api to main.
alexcrichton updated PR #2897 from new-api to main.
alexcrichton updated PR #2897 from new-api to main.
alexcrichton updated PR #2897 from new-api to main.
alexcrichton updated PR #2897 from new-api to main.
alexcrichton updated PR #2897 from new-api to main.
alexcrichton updated PR #2897 from new-api to main.
alexcrichton updated PR #2897 from new-api to main.
alexcrichton updated PR #2897 from new-api to main.
alexcrichton updated PR #2897 from new-api to main.
alexcrichton updated PR #2897 from new-api to main.
alexcrichton updated PR #2897 from new-api to main.
alexcrichton updated PR #2897 from new-api to main.
bnjbvr submitted PR review.
bnjbvr created PR review comment:
Could we set
default-featurestofalsehere, please? Otherwise this results in thedefaultfeature ofwasmtimebeing forcefully enabled (because of cargo resolver's feature unification), and making it impossible to disable it in embeddings that indirectly make use ofwiggle.wasmtime = { path = "../wasmtime", version = "0.27.0", optional = true, default-features = false }
bnjbvr submitted PR review.
bnjbvr created PR review comment:
Hmm this is slightly wrong: should also re-enable manually the
asyncfeature.
bnjbvr edited PR review comment.
bnjbvr deleted PR review comment.
pchickey updated PR #2897 from new-api to main.
pchickey submitted PR review.
pchickey created PR review comment:
Thanks, I missed that.
alexcrichton updated PR #2897 from new-api to main.
alexcrichton updated PR #2897 from new-api to main.
alexcrichton updated PR #2897 from new-api to main.
alexcrichton merged PR #2897.
Last updated: Jan 10 2026 at 02:36 UTC