Stream: git-wasmtime

Topic: wasmtime / PR #3733 Implement lazy VMCallerCheckedAnyfunc...


view this post on Zulip Wasmtime GitHub notifications bot (Jan 27 2022 at 21:04):

cfallin opened PR #3733 from lazy-anyfuncs to main:

Currently, in the instance initialization path, we build an "anyfunc"
for every imported function and every function in the module. These
anyfuncs are used as function references in tables, both within a module
and across modules (via function imports).

Building all of these is quite wasteful if we never use most of them.
Instead, it would be better to build only the ones that we use.

This commit implements a technique to lazily initialize them: at the
point that a pointer to an anyfunc is taken, we check whether it has
actually been initialized. If it hasn't, we do so. We track whether an
anyfunc has been initialized with a separate bitmap, and we only need to
zero this bitmap at instantiation time. (This is important for
performance too: even just zeroing a large array of anyfunc structs can
be expensive, at the microsecond scale, for a module with thousands of
functions.)

Keeping the information around that is needed for this lazy init
required a bit of refactoring in the InstanceAllocationRequest; a few
fields now live in an Arc held by the instance while it exists.

This was split out from #3699 (I'll rebase it out of that PR next).

The benign-race trickery to make both instantiation and the access-time check
fast is a little hairy, but I left a long comment that tries to explain why
it works. More eyeballs on this (and/or thoughts on if there is a better
way to do this in semi-idiomatic Rust -- UnsafeCell? something else?)
would be welcome!

<!--

Please ensure that the following steps are all taken care of before submitting
the PR.

Please ensure all communication adheres to the code of conduct.
-->

view this post on Zulip Wasmtime GitHub notifications bot (Jan 27 2022 at 21:04):

cfallin requested alexcrichton for a review on PR #3733.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 27 2022 at 21:06):

cfallin updated PR #3733 from lazy-anyfuncs to main.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 27 2022 at 21:13):

cfallin edited PR #3733 from lazy-anyfuncs to main:

Currently, in the instance initialization path, we build an "anyfunc"
for every imported function and every function in the module. These
anyfuncs are used as function references in tables, both within a module
and across modules (via function imports).

Building all of these is quite wasteful if we never use most of them.
Instead, it would be better to build only the ones that we use.

This commit implements a technique to lazily initialize them: at the
point that a pointer to an anyfunc is taken, we check whether it has
actually been initialized. If it hasn't, we do so. We track whether an
anyfunc has been initialized with a separate bitmap, and we only need to
zero this bitmap at instantiation time. (This is important for
performance too: even just zeroing a large array of anyfunc structs can
be expensive, at the microsecond scale, for a module with thousands of
functions.)

Keeping the information around that is needed for this lazy init
required a bit of refactoring in the InstanceAllocationRequest; a few
fields now live in an Arc held by the instance while it exists.

This was split out from #3697 (I'll rebase it out of that PR next).

The benign-race trickery to make both instantiation and the access-time check
fast is a little hairy, but I left a long comment that tries to explain why
it works. More eyeballs on this (and/or thoughts on if there is a better
way to do this in semi-idiomatic Rust -- UnsafeCell? something else?)
would be welcome!

<!--

Please ensure that the following steps are all taken care of before submitting
the PR.

Please ensure all communication adheres to the code of conduct.
-->

view this post on Zulip Wasmtime GitHub notifications bot (Jan 27 2022 at 21:25):

cfallin updated PR #3733 from lazy-anyfuncs to main.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 27 2022 at 22:13):

cfallin updated PR #3733 from lazy-anyfuncs to main.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 28 2022 at 02:49):

cfallin updated PR #3733 from lazy-anyfuncs to main.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 29 2022 at 02:25):

cfallin updated PR #3733 from lazy-anyfuncs to main.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 03 2022 at 07:41):

cfallin updated PR #3733 from lazy-anyfuncs to main.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 03 2022 at 07:56):

cfallin edited PR #3733 from lazy-anyfuncs to main:

During instance initialization, we build two sorts of arrays eagerly:

Most instances will not touch (via call_indirect or table.get) all
funcref table elements. And most anyfuncs will never be referenced,
because most functions are never placed in tables or used with
ref.func. Thus, both of these initialization tasks are quite wasteful.
Profiling shows that a significant fraction of the remaining
instance-initialization time after our other recent optimizations is
going into these two tasks.

This PR implements two basic ideas:

The use of all-zeroes to mean "uninitialized" means that we can use fast
memory clearing techniques, like madvise(DONTNEED) on Linux or just
freshly-mmap'd anonymous memory, to get to the initial state without
a lot of memory writes.

Funcref tables are a little tricky because funcrefs can be null. We need
to distinguish "element was initially non-null, but user stored explicit
null later" from "element never touched" (ie the lazy init should not
blow away an explicitly stored null). We solve this by stealing the LSB
from every funcref (anyfunc pointer): when the LSB is set, the funcref
is initialized and we don't hit the lazy-init slowpath. We insert the
bit on storing to the table and mask it off after loading.

Performance effect on instantiation in the on-demand allocator (pooling
allocator effect should be similar as the table-init path is the same):

sequential/default/spidermonkey.wasm
                        time:   [71.886 us 72.012 us 72.133 us]

sequential/default/spidermonkey.wasm
                        time:   [22.243 us 22.256 us 22.270 us]
                        change: [-69.117% -69.060% -69.000%] (p = 0.00 < 0.05)
                        Performance has improved.

So, 72µs to 22µs, or a 69% reduction.


Last updated: Oct 23 2024 at 20:03 UTC