alexcrichton opened PR #2847 from speed-up-table-init
to main
:
This commit optimizes table initialization as part of instance
instantiation and also applies the same optimization to thetable.init
instruction. One part of this commit is to remove some preexisting
duplication between instance instantiation and thetable.init
instruction itself, after this the actual implementation oftable.init
is optimized to effectively have fewer bounds checks in fewer places and
have a much tighter loop for instantiation.A big fallout from this change is that memory/table initializer offsets
are now stored asu32
instead ofusize
to remove a few casts in a
few places. This ended up requiring moving some overflow checks that
happened in parsing to later in code itself because otherwise the wrong
spec test errors are emitted during testing. I've tried to trace where
these can possibly overflow but I think that I managed to get
everything.In a local synthetic test where an empty module with a single 80,000
element initializer this improves total instantiation time by 4x (562us
=> 141us)<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
fitzgen submitted PR Review.
fitzgen submitted PR Review.
fitzgen created PR Review Comment:
Why this change? The
anyfunc_ptr
call should be inlined just fine, unless I'm missing something. This just makes things more open coded, which isn't the direction I'd generally move in.
fitzgen created PR Review Comment:
This seems very tautological as written; is this supposed to be checking
self.element_type
?
fitzgen created PR Review Comment:
Does this get boiled down into a
memcpy
by LLVM? If so, we can close #983. If not, would we be able to do that here?
fitzgen created PR Review Comment:
It seems like this has just one call site, and I don't think anything is preventing us from doing a
memcpy
in theory here.
alexcrichton submitted PR Review.
alexcrichton created PR Review Comment:
This mostly changed because it was previously used in a loop but the pattern it was using was a bit slower than necessary. The main slowdown is that
vmctx_anyfunc
performed a multiplication instruction with the size of a pointer, where both the size and the index were not constant, resulting in bad codegen. By using the native types here it makes pointer arithmetic much speedier since everything is known at compile time.Once I removed one main use of the function in the table initialization bits I figured I might as well go ahead and remove the function and speed up other callsites as well.
I do agree it's a bit more open coded, but then again everything about
VMContext
feels sort of inherently open-coded
alexcrichton submitted PR Review.
alexcrichton created PR Review Comment:
Oh this should actually probably close that issue. We don't actually have anything to
memcpy
though because our element segments are stored as lists of function indices, but the tables themselves are*mut VMCallerCheckedAnyfunc
. This means that there's a bit of a mismatch where unless we allocate table segments per-instance tomemcpy
from (which I dont think we should do) then there's actually nothing to memcpy.Otherwise, though, I just used
set_raw
here to be the same as the other bits and pieces of this module
alexcrichton submitted PR Review.
alexcrichton created PR Review Comment:
Oops, yes!
alexcrichton updated PR #2847 from speed-up-table-init
to main
.
alexcrichton submitted PR Review.
alexcrichton created PR Review Comment:
I went ahead though and made it a bit more raw to help streamline this a tiny bit more
fitzgen submitted PR Review.
fitzgen created PR Review Comment:
Huh, I would really have expected
anyfunc_ptr
to be inlined and then for inst combine to see that one of the operands of the multiplication was a constant and go to town from there.
alexcrichton submitted PR Review.
alexcrichton created PR Review Comment:
Oh it is inlined, but that's not the problem here. The function called has a multiplication with
3 * self.pointer_size
, butself.pointer_size
is not a constant value. We could probably improve codegen by having something likeNativeVMOffsets
for when you're specifically not cross-compiling somewhere else but for now LLVM has no way of seeing that the value originally stored there wasmem::size_of::<usize>()
fitzgen submitted PR Review.
fitzgen created PR Review Comment:
Ah, I get it now. Thanks for your patience explaining it to me!
alexcrichton merged PR #2847.
Last updated: Dec 23 2024 at 12:05 UTC