alexcrichton opened issue #3230:
Loading a precompiled module from disk or some other location is currently a relatively slow task in Wasmtime. While much faster than compiling a module, we can do much better! I'm going to open this meta-issue for those interested in tracking some progress I'm making here. Over the next week or so I'm going to open PRs which migrate Wasmtime towards a more "mmap-and-go" approach where all we need to do to load a module is to mmap it into memory, no copies necessary.
As of the time of opening this issue we're pretty far away from this world. On my quick-and-dirty branch to implement this I'm seeing a roughly 10x improvement in load times for precompiled modules. While this doesn't have every possible improvement, it's a lot further than where we are today!
My general goal for where I'm going to go with this refactoring is:
- Decode far less data with `bincode` itself
- Most data is usable as-is from the on-disk format, no deserialization necessary
- Heavy usage of `object` as a crate and the ELF file format. While not strictly necessary, the `object` crate has lots of nice utilities and I think this also makes us more amenable in some possible future to generate raw object files usable for linking. Additionally I think it's extremely useful to be able to inspect the raw output of compilation with standard tools like `objdump`, which cannot be done today.
- Far fewer copies of data between places. The main goal is that there should be one "source of truth" for a module which is the only location a module's data resides in.
Lots more details will be apparent as I open PRs and we debate the finer points, of course. For those interested in the existence of PRs, feel free to subscribe to this issue and I'll post individual PRs here, so that you don't necessarily have to get all the review noise.
cc @fitzgen, @peterhuene, @cfallin
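To make the "usable as-is, no deserialization" and "one source of truth" goals above a bit more concrete, here is a minimal sketch of looking up a section inside a loaded image and handing back a borrowed slice instead of a decoded copy. The container layout (a one-byte name length, the name, a little-endian `u32` data length, then the data) and the names `push_section` and `find_section` are hypothetical and chosen only for illustration; the actual work in this issue uses ELF via the `object` crate.

```rust
/// Append one section to a toy image (the "compile-time" side).
/// Hypothetical layout: [name_len: u8][name][data_len: u32 LE][data].
fn push_section(image: &mut Vec<u8>, name: &[u8], data: &[u8]) {
    image.push(name.len() as u8);
    image.extend_from_slice(name);
    image.extend_from_slice(&(data.len() as u32).to_le_bytes());
    image.extend_from_slice(data);
}

/// Find a section by name and return a slice that borrows from `image`.
/// Nothing is copied or decoded: the returned bytes live inside the image,
/// which could just as well be a memory-mapped file.
fn find_section<'a>(image: &'a [u8], name: &[u8]) -> Option<&'a [u8]> {
    let mut rest = image;
    while !rest.is_empty() {
        let name_len = *rest.first()? as usize;
        let entry_name = rest.get(1..1 + name_len)?;
        let len_start = 1 + name_len;
        let len_bytes = rest.get(len_start..len_start + 4)?;
        let data_len =
            u32::from_le_bytes([len_bytes[0], len_bytes[1], len_bytes[2], len_bytes[3]]) as usize;
        let data_start = len_start + 4;
        let data_end = data_start.checked_add(data_len)?;
        let data = rest.get(data_start..data_end)?;
        if entry_name == name {
            return Some(data);
        }
        // Skip to the next section header.
        rest = &rest[data_end..];
    }
    None
}
```

Because the result borrows from the image, the image stays the only place the section's bytes reside, and a truncated or malformed image yields `None` rather than a panic.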
alexcrichton commented on issue #3230:
https://github.com/bytecodealliance/wasmtime/pull/3231 is the first PR down this road.
alexcrichton commented on issue #3230:
https://github.com/bytecodealliance/wasmtime/pull/3235 is the next
alexcrichton commented on issue #3230:
https://github.com/bytecodealliance/wasmtime/pull/3236 is removing most of the need to parse symbol names at runtime
alexcrichton commented on issue #3230:
https://github.com/bytecodealliance/wasmtime/pull/3239 is converting compilation artifacts to purely a list of bytes
alexcrichton commented on issue #3230:
https://github.com/bytecodealliance/wasmtime/pull/3240 is a custom encoding of address maps into a section of the final image
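As a rough illustration of what an address map stored in an image section could look like, the sketch below queries the map by binary search directly over the section's raw bytes, so nothing has to be decoded into a `Vec` at load time. The layout is an assumption made for this example (a little-endian `u32` count, then that many sorted `u32` code offsets, then that many `u32` source positions), and `encode_addrmap` and `lookup_srcloc` are hypothetical names; see the PR for the encoding Wasmtime actually uses.

```rust
/// Read the little-endian u32 at index `i` of a packed u32 array.
fn read_u32(bytes: &[u8], i: usize) -> Option<u32> {
    let start = i.checked_mul(4)?;
    let c = bytes.get(start..start.checked_add(4)?)?;
    Some(u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
}

/// Encode `(code_offset, source_position)` pairs, sorted by code offset.
/// Assumed layout: [count][offsets; count][positions; count], all u32 LE.
fn encode_addrmap(entries: &[(u32, u32)]) -> Vec<u8> {
    let mut out = Vec::with_capacity(4 + entries.len() * 8);
    out.extend_from_slice(&(entries.len() as u32).to_le_bytes());
    for (offset, _) in entries {
        out.extend_from_slice(&offset.to_le_bytes());
    }
    for (_, position) in entries {
        out.extend_from_slice(&position.to_le_bytes());
    }
    out
}

/// Return the source position of the last entry whose offset is <= `pc`,
/// reading straight from the section bytes without building a table.
fn lookup_srcloc(section: &[u8], pc: u32) -> Option<u32> {
    let count = read_u32(section, 0)? as usize;
    let body = section.get(4..)?;
    // Both arrays must be fully present in the section.
    if body.len() < count.checked_mul(8)? {
        return None;
    }
    let (offsets, positions) = body.split_at(count * 4);
    // Binary search for the number of entries with offset <= pc.
    let (mut lo, mut hi) = (0usize, count);
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        if read_u32(offsets, mid)? <= pc {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    if lo == 0 {
        return None; // `pc` precedes every recorded offset
    }
    read_u32(positions, lo - 1)
}
```

Keeping the offsets in one contiguous sorted array is what makes the in-place binary search possible; interleaving offsets and positions would work too, at the cost of a slightly less cache-friendly search.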
alexcrichton commented on issue #3230:
https://github.com/bytecodealliance/wasmtime/pull/3241 is moving trap information into an image section
alexcrichton commented on issue #3230:
https://github.com/bytecodealliance/wasmtime/pull/3246 is reducing dependencies on `finished_functions`, a map we build at load-time
https://github.com/bytecodealliance/wasmtime/pull/3247 is in the same vein
alexcrichton commented on issue #3230:
https://github.com/bytecodealliance/wasmtime/pull/3253 is the replacement for https://github.com/bytecodealliance/wasmtime/pull/3236, removing symbol name parsing in `CodeMemory`
alexcrichton commented on issue #3230:
https://github.com/bytecodealliance/wasmtime/pull/3254 is the removal of relocations from the final image, and also optimizing all existing wasm modules with intra-module function calls to use more predictable `call` instructions which CPUs should optimize better.
alexcrichton commented on issue #3230:
https://github.com/bytecodealliance/wasmtime/pull/3257 changes the serialization format to be mmap-friendly and also a native ELF file
alexcrichton assigned issue #3230:
alexcrichton commented on issue #3230:
https://github.com/bytecodealliance/wasmtime/pull/3265 is the final substantive step which removes the copy we have from an ELF image into executable memory, instead just using the executable memory as-is in the ELF image.
alexcrichton commented on issue #3230:
https://github.com/bytecodealliance/wasmtime/pull/3266 is the actual final addition, a `deserialize_file` method
After everything lands we should have gone from ~300ms to load a module I was looking at to ~1ms, yay!
alexcrichton commented on issue #3230:
https://github.com/bytecodealliance/wasmtime/pull/3275 is take 2 of #3254
alexcrichton commented on issue #3230:
This is a table of the speedups we'll be getting once #3275 lands. Each row is a different wasm file (found via the internet) compiled by Wasmtime on either AArch64 or x86_64. The v0.29.0 numbers are measured using that commit deserializing a slice of bytes, and then there are two columns for after #3275: one for deserializing from memory (as before) and another for deserializing from a file (which isn't implemented in v0.29.0 but is on `main`).
The tl;dr; is that we're around 100x faster than before, yay! Given that, I'm happy with where this is at, so I'm going to close this, and I'll take care of the final bits and pieces on #3275.
<img width="695" alt="Screen Shot 2021-09-01 at 1 03 16 PM" src="https://user-images.githubusercontent.com/64996/131721185-993478c0-4fb4-4e3f-a599-aad51b9ac4a4.png">
alexcrichton closed issue #3230 (assigned to alexcrichton):
Last updated: Jan 24 2025 at 00:11 UTC