Stream: git-wasmtime

Topic: wasmtime / PR #3257 Use an mmap-friendly serialization fo...


Wasmtime GitHub notifications bot (Aug 27 2021 at 15:17):

alexcrichton opened PR #3257 from mmap-vec to main:

This commit reimplements the main serialization format for Wasmtime's
precompiled artifacts. Previously they were generally a binary blob of
bincode-encoded metadata prefixed with some versioning information.
The downside of this format, though, is that loading a precompiled
artifact required pushing all information through bincode. This is
inefficient when some of that data, such as trap/address tables, is
rarely accessed.

The new format added in this commit is one which is designed to be
mmap-friendly. This means that the relevant parts of the precompiled
artifact are already page-aligned for updating permissions of pieces
here and there. Additionally the artifact is optimized so that if data
is rarely read then we can delay reading it until necessary.
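As a rough illustration of what "page-aligned" means here, the sketch below places each section of an artifact on a page boundary so that permissions can later be changed one section at a time. The helper names `align_up` and `layout_sections` are hypothetical and are not taken from Wasmtime; the page size is passed in as a parameter rather than queried from the OS.

```rust
/// Rounds `offset` up to the next multiple of `page_size`.
/// Assumption: `page_size` is a power of two (for example 4096).
fn align_up(offset: usize, page_size: usize) -> usize {
    assert!(page_size.is_power_of_two());
    (offset + page_size - 1) & !(page_size - 1)
}

/// Lays out sections back to back, starting each one on a page boundary.
/// Returns the `(start, end)` byte offsets of every section.
/// This is an illustrative layout, not Wasmtime's actual one.
fn layout_sections(sizes: &[usize], page_size: usize) -> Vec<(usize, usize)> {
    let mut cursor = 0;
    let mut out = Vec::with_capacity(sizes.len());
    for &size in sizes {
        let start = align_up(cursor, page_size);
        let end = start + size;
        out.push((start, end));
        cursor = end;
    }
    out
}
```

Because every section starts on its own page, a loader could mark one section executable and leave another read-only without the two sharing a page.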

The new artifact format for serialized modules is an ELF file. This is
not a public API guarantee, so it cannot be relied upon. In the meantime
though this is quite useful for exploring precompiled modules with
standard tooling like objdump. The ELF file is already constructed as
part of module compilation, and it forms the main contents of the
serialized artifact.

There is some extra information, though, that is not encoded in each
module's individual ELF file, such as type information. This information
continues to be bincode-encoded, but it's intended to be much smaller and
much faster to deserialize. This extra information is appended to the end
of the ELF file, so the original ELF file is still a valid ELF file; we
just get to have extra bits at the end. More information on the
new format can be found in the module docs of the serialization module
of Wasmtime.
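The idea of appending extra bytes after a complete ELF image can be sketched as follows. The layout used here (metadata followed by an 8-byte little-endian length footer) is an assumption made for this example only; the real layout is described in Wasmtime's serialization module docs and may differ. `append_metadata` and `split_artifact` are hypothetical names.

```rust
/// Appends `metadata` after the ELF image, followed by an 8-byte
/// little-endian footer holding the metadata length.
/// Assumed layout for illustration: [ELF bytes][metadata][len: u64 LE].
fn append_metadata(mut elf: Vec<u8>, metadata: &[u8]) -> Vec<u8> {
    elf.extend_from_slice(metadata);
    elf.extend_from_slice(&(metadata.len() as u64).to_le_bytes());
    elf
}

/// Splits an artifact produced by `append_metadata` back into
/// `(elf, metadata)`. Returns `None` if the footer is missing or
/// describes more bytes than the artifact holds.
fn split_artifact(artifact: &[u8]) -> Option<(&[u8], &[u8])> {
    let footer_start = artifact.len().checked_sub(8)?;
    let (rest, footer) = artifact.split_at(footer_start);

    // Decode the length footer.
    let mut raw = [0u8; 8];
    raw.copy_from_slice(footer);
    let len = u64::from_le_bytes(raw);
    if len > rest.len() as u64 {
        return None;
    }

    // Everything before the metadata is the untouched ELF image.
    let meta_start = rest.len() - len as usize;
    Some(rest.split_at(meta_start))
}
```

Since the ELF bytes are left untouched at the front of the buffer, the first slice returned by `split_artifact` is byte-for-byte the image that was passed to `append_metadata`.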

Another refactoring implemented as part of this commit is to deserialize
and store object files directly in mmap-backed storage. This avoids
the need to copy bytes after the artifact is loaded into memory for each
compiled module, and in a future commit it opens up the door to avoiding
copying the text section into a CodeMemory. For now, though, the main
change is that copies are not necessary when loading from a precompiled
compilation artifact once the artifact is itself in mmap-based memory.

To assist with managing mmap-based memory a new MmapVec type was
added to wasmtime_jit which acts as a form of Vec<T> backed by a
wasmtime_runtime::Mmap. This type notably supports drain(..N) to
slice the buffer into disjoint regions that are all separately owned,
such as having a separately owned window into one artifact for all
object files contained within.
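A minimal sketch of the `drain(..N)` idea, assuming a reference-counted byte buffer as a stand-in for the memory mapping. `SharedVec` is a hypothetical type written for this example; it is not the `MmapVec` from wasmtime_jit, it is read-only, and it uses `Arc<[u8]>` where the real type is backed by a wasmtime_runtime::Mmap.

```rust
use std::ops::{Deref, Range, RangeTo};
use std::sync::Arc;

/// A window into a shared, immutable byte buffer.
/// Stand-in for an mmap-backed vector; `Arc<[u8]>` plays the mapping's role.
struct SharedVec {
    buf: Arc<[u8]>,
    range: Range<usize>,
}

impl SharedVec {
    fn new(bytes: Vec<u8>) -> Self {
        let len = bytes.len();
        SharedVec { buf: Arc::from(bytes), range: 0..len }
    }

    /// Splits off the first `range.end` bytes as a separately owned view.
    /// `self` shrinks by `range.end` bytes and keeps only the remainder.
    /// No bytes are copied; both views share the same backing buffer.
    fn drain(&mut self, range: RangeTo<usize>) -> SharedVec {
        assert!(range.end <= self.range.len());
        let mid = self.range.start + range.end;
        let front = SharedVec {
            buf: Arc::clone(&self.buf),
            range: self.range.start..mid,
        };
        self.range.start = mid;
        front
    }
}

impl Deref for SharedVec {
    type Target = [u8];

    fn deref(&self) -> &[u8] {
        &self.buf[self.range.clone()]
    }
}
```

Calling `drain(..n)` repeatedly carves one loaded artifact into disjoint windows, one per object file, and the backing buffer is freed only when the last window is dropped.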

Finally this commit implements a small refactoring in wasmtime-cache
to use the standard artifact format for cache entries rather than a
bincode-encoded version. This required some more hooks for
serializing/deserializing but otherwise the crate still performs as
before.

Wasmtime GitHub notifications bot (Aug 27 2021 at 15:17):

alexcrichton requested peterhuene for a review on PR #3257.

Wasmtime GitHub notifications bot (Aug 27 2021 at 15:19):

alexcrichton edited PR #3257 from mmap-vec to main:

cc #3230

Wasmtime GitHub notifications bot (Aug 28 2021 at 05:33):

peterhuene submitted PR review.

Wasmtime GitHub notifications bot (Aug 28 2021 at 05:33):

peterhuene created PR review comment:

Clippy nit:

        Ok(ret)

Wasmtime GitHub notifications bot (Aug 28 2021 at 05:33):

peterhuene created PR review comment:

    /// Gets cached data if state matches, otherwise calls `compute`.

Wasmtime GitHub notifications bot (Aug 28 2021 at 05:33):

peterhuene created PR review comment:

    /// This `MmapVec` will shrink by `range.end` bytes, and it will only refer

Wasmtime GitHub notifications bot (Aug 30 2021 at 14:18):

alexcrichton updated PR #3257 from mmap-vec to main.

Wasmtime GitHub notifications bot (Aug 30 2021 at 14:19):

alexcrichton merged PR #3257.


Last updated: Jan 24 2025 at 00:11 UTC