alexcrichton opened PR #3319 from optimize-call-take-two
to main
:
This commit is an alternative to #3298 which achieves effectively the
same goal of optimizing theFunc::call
API as well as its C API
sibling ofwasmtime_func_call
. The strategy taken here is different
than #3298 though where a new API isn't created, rather a small tweak to
an existing API is done. Specifically this commit handles the major
sources of slowness withFunc::call
with:
Looking up the type of a function, to typecheck the arguments with and
use to guide how the results should be loaded, no longer hits the
rwlock in theEngine
but instead eachFunc
contains its own
FuncType
. This can be an unnecessary allocation for funcs not used
withFunc::call
, so this is a downside of this implementation
relative to #3298. A mitigating factor, though, is that instance
exports are loaded lazily into theStore
and in theory not too many
funcs are active in the store asFunc
objects.Temporary storage is amortized with a long-lived
Vec
in theStore
rather than allocating a new vector on each call. This is basically
the same strategy as #3294 only applied to different types in
different places. Specificallywasmtime::Store
now retains a
Vec<u128>
forFunc::call
, and the C API retains aVec<Val>
for
callingFunc::call
.Finally, an API breaking change is made to
Func::call
and its type
signature (as well asFunc::call_async
). Instead of returning
Box<[Val]>
as it did before this function now takes a
results: &mut [Val]
parameter. This allows the caller to manage the
allocation and we can amortize-remove it inwasmtime_func_call
by
using space after the parameters in theVec<Val>
we're passing in.
This change is naturally a breaking change and we'll want to consider
it carefully, but mitigating factors are that most embeddings are
likely usingTypedFunc::call
instead and this signature taking a
mutable slice better aligns withFunc::new
which receives a mutable
slice for the results.Overall this change, in the benchmark of "call a nop function from the C
API" is not quite as good as #3298. It's still a bit slower, on the
order of 15ns, because there's lots of capacity checks around vectors
and the type checks are slightly less optimized than before. Overall
though this is still significantly better than today because allocations
and the rwlock to acquire the type information are both avoided. I
personally feel that this change is the best to do because it has less
of an API impact than #3298.<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
alexcrichton updated PR #3319 from optimize-call-take-two
to main
.
alexcrichton updated PR #3319 from optimize-call-take-two
to main
.
alexcrichton requested peterhuene for a review on PR #3319.
peterhuene submitted PR review.
peterhuene submitted PR review.
peterhuene created PR review comment:
Can we not use the
wasm_val_storage
here, instead of a new allocation, if we exposed the store data via the associatedStoreRef
?
alexcrichton submitted PR review.
alexcrichton created PR review comment:
We could yeah. The reason I didn't do it here is that this is part of the
wasm.h
C API which uses different store data. We could update that too but I figured I didn't want to worry much about this and thought I could deal with it later if necessary. Given the later*_unchecked
APIs I ended up adding I think that no one's gonna use these for the most performance hooks anyway.Do you agree it's alright to leave this for some future refactoring, if necessary?
peterhuene submitted PR review.
peterhuene created PR review comment:
Definitely :+1:
alexcrichton merged PR #3319.
Last updated: Nov 22 2024 at 16:03 UTC