alexcrichton opened PR #3257 from mmap-vec to main:
This commit reimplements the main serialization format for Wasmtime's
precompiled artifacts. Previously they were generally a binary blob of
`bincode`-encoded metadata prefixed with some versioning information.
The downside of this format, though, is that loading a precompiled
artifact required pushing all information through `bincode`. This is
inefficient when some data, such as trap/address tables, is rarely
accessed.

The new format added in this commit is designed to be `mmap`-friendly.
This means that the relevant parts of the precompiled artifact are
already page-aligned, so permissions can be updated on individual
pieces. Additionally the artifact is optimized so that if data is rarely
read then we can delay reading it until necessary.

The new artifact format for serialized modules is an ELF file. This is
not a public API guarantee, so it cannot be relied upon. In the meantime
though this is quite useful for exploring precompiled modules with
standard tooling like `objdump`. The ELF file is already constructed as
part of module compilation, and this is the main contents of the
serialized artifact.

There is some extra information, though, not encoded in each module's
individual ELF file, such as type information. This information continues
to be `bincode`-encoded, but it's intended to be much smaller and much
faster to deserialize. This extra information is appended to the end of
the ELF file. This means that the original ELF file is still a valid ELF
file; we just get to have extra bits at the end. More information on the
new format can be found in the module docs of the serialization module
of Wasmtime.

Another refactoring implemented as part of this commit is to deserialize
and store object files directly in `mmap`-backed storage. This avoids
the need to copy bytes after the artifact is loaded into memory for each
compiled module, and in a future commit it opens up the door to avoiding
copying the text section into a `CodeMemory`. For now, though, the main
change is that copies are not necessary when loading from a precompiled
compilation artifact once the artifact is itself in mmap-based memory.

To assist with managing `mmap`-based memory a new `MmapVec` type was
added to `wasmtime_jit` which acts as a form of `Vec<T>` backed by a
`wasmtime_runtime::Mmap`. This type notably supports `drain(..N)` to
slice the buffer into disjoint regions that are all separately owned,
such as having a separately owned window into one artifact for all
object files contained within.

Finally this commit implements a small refactoring in `wasmtime-cache`
to use the standard artifact format for cache entries rather than a
bincode-encoded version. This required some more hooks for
serializing/deserializing but otherwise the crate still performs as
before.
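The "extra information appended to the end of the ELF file" idea from the description can be illustrated with a minimal sketch. The layout used here (image bytes, then metadata bytes, then an 8-byte little-endian length footer) and the helper names `append_trailer` and `split_trailer` are assumptions made for illustration only; the actual layout is the one documented in Wasmtime's serialization module, which this sketch does not reproduce.

```rust
/// Hypothetical layout: [image bytes][metadata bytes][u64 LE metadata length].
/// The image prefix is left untouched, so tools that only read the image
/// still see the same bytes at the same offsets.
fn append_trailer(image: &[u8], meta: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(image.len() + meta.len() + 8);
    out.extend_from_slice(image);
    out.extend_from_slice(meta);
    out.extend_from_slice(&(meta.len() as u64).to_le_bytes());
    out
}

/// Split an artifact back into `(image, meta)` without copying either part.
/// Returns `None` if the footer is missing or claims more bytes than exist.
fn split_trailer(artifact: &[u8]) -> Option<(&[u8], &[u8])> {
    let footer_start = artifact.len().checked_sub(8)?;
    let (body, footer) = artifact.split_at(footer_start);
    let mut len_bytes = [0u8; 8];
    len_bytes.copy_from_slice(footer);
    let meta_len = u64::from_le_bytes(len_bytes);
    if meta_len > body.len() as u64 {
        return None;
    }
    Some(body.split_at(body.len() - meta_len as usize))
}

fn main() {
    // Placeholder bytes standing in for a real ELF image and encoded metadata.
    let artifact = append_trailer(b"image-bytes", b"type-info");
    let (image, meta) = split_trailer(&artifact).expect("well-formed artifact");
    println!("image: {} bytes, metadata: {} bytes", image.len(), meta.len());
}
```

Because both returned slices borrow from the input, the same approach works when the input is a memory-mapped file: the small metadata section can be decoded eagerly while the image bytes are left in place.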
alexcrichton requested peterhuene for a review on PR #3257.
alexcrichton edited PR #3257 from mmap-vec to main:
cc #3230
peterhuene submitted PR review.
peterhuene submitted PR review.
peterhuene created PR review comment:
Clippy nit:
Ok(ret)
peterhuene created PR review comment:
/// Gets cached data if state matches, otherwise calls `compute`.
peterhuene created PR review comment:
/// This `MmapVec` will shrink by `range.end` bytes, and it will only refer
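The draining behaviour discussed in this review comment and in the PR description (splitting one buffer into disjoint, separately owned regions) can be sketched without any mmap at all. The `SharedBuf` type and its `drain_front` method below are hypothetical stand-ins, not the real `MmapVec` API: they use a reference-counted heap buffer in place of a `wasmtime_runtime::Mmap` and only model the ownership pattern.

```rust
use std::ops::Range;
use std::sync::Arc;

/// Hypothetical stand-in for an mmap-backed vector: a shared, immutable
/// buffer plus the window of it that this handle refers to.
#[derive(Clone, Debug)]
struct SharedBuf {
    data: Arc<[u8]>,
    range: Range<usize>,
}

impl SharedBuf {
    fn new(bytes: Vec<u8>) -> Self {
        let len = bytes.len();
        SharedBuf { data: Arc::from(bytes), range: 0..len }
    }

    /// Split off the first `n` bytes as a separately owned handle.
    /// `self` shrinks by `n` bytes and afterwards refers only to the
    /// bytes that follow the drained region. No bytes are copied.
    fn drain_front(&mut self, n: usize) -> SharedBuf {
        assert!(n <= self.range.len(), "drain past end of buffer");
        let split = self.range.start + n;
        let front = SharedBuf {
            data: Arc::clone(&self.data),
            range: self.range.start..split,
        };
        self.range.start = split;
        front
    }

    fn as_slice(&self) -> &[u8] {
        &self.data[self.range.clone()]
    }
}

fn main() {
    // One artifact holding two "object files" back to back.
    let mut artifact = SharedBuf::new(vec![1, 2, 3, 4, 5]);
    let first = artifact.drain_front(2);
    let second = artifact.drain_front(3);
    println!("{:?} {:?} {:?}", first.as_slice(), second.as_slice(), artifact.as_slice());
}
```

Each handle keeps the underlying allocation alive through the shared reference count, so the windows can be handed to different owners and dropped in any order.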
alexcrichton updated PR #3257 from mmap-vec to main.
alexcrichton merged PR #3257.
Last updated: Jan 24 2025 at 00:11 UTC