wasmtime / issue #3230 Loading precompiled modules is slo... · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / issue #3230 Loading precompiled modules is slo...

Wasmtime GitHub notifications bot (Aug 23 2021 at 20:13):

alexcrichton opened issue #3230:

Currently, loading a precompiled module from disk or some other location is a relatively slow task in Wasmtime. While it's much faster than compiling a module, we can do much better! I'm going to open this meta-issue for those interested in tracking the progress I'm making here. Over the next week or so I'm going to open PRs which migrate Wasmtime towards a more "mmap-and-go" approach where all we need to do to load a module is to mmap it into memory, no copies necessary.

As of opening this issue we're pretty far away from this world. On my quick-and-dirty branch to implement this I'm seeing a roughly 10x improvement in load times for precompiled modules. While this doesn't have every possible improvement, it's a lot further along than where we are today!

My general goal for where I'm going to go with this refactoring is:

- Decode far less data with bincode itself.
- Most data is usable as-is from the on-disk format, no deserialization necessary.
- Heavy usage of object as a crate and the ELF file format. While not strictly necessary, the object crate has lots of nice utilities and I think this also makes us more amenable in some possible future to generating raw object files usable for linking. Additionally I think it's extremely useful to be able to inspect the raw output of compilation with standard tools like objdump, which cannot be done today.
- Far fewer copies of data between places. The main goal is that there should be one "source of truth" for a module which is the only location a module's data resides in.
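
To make the "one source of truth" goal a bit more concrete, here is a minimal sketch of a loaded image that owns a single buffer and only ever hands out borrowed sub-slices of it. The `Image` type, the `read_u32` helper, and the tiny length-prefixed layout are all hypothetical and chosen for illustration; they are not Wasmtime's actual format, and the `Vec<u8>` merely stands in for an mmap'd file.

```rust
use std::ops::Range;

/// Hypothetical image layout, for illustration only:
/// [text_len: u32 LE][data_len: u32 LE][text bytes][data bytes]
pub struct Image {
    /// The single copy of the module's bytes (a stand-in for an mmap'd file).
    bytes: Vec<u8>,
    /// Byte range of the text within `bytes`.
    text: Range<usize>,
    /// Byte range of the data within `bytes`.
    data: Range<usize>,
}

impl Image {
    /// Validates the header once and records ranges; nothing is copied out.
    pub fn new(bytes: Vec<u8>) -> Option<Image> {
        let text_len = read_u32(&bytes, 0)? as usize;
        let data_len = read_u32(&bytes, 4)? as usize;
        let text_start = 8usize;
        let text_end = text_start.checked_add(text_len)?;
        let data_end = text_end.checked_add(data_len)?;
        if data_end > bytes.len() {
            return None;
        }
        Some(Image {
            bytes,
            text: text_start..text_end,
            data: text_end..data_end,
        })
    }

    /// Borrowed view of the text bytes inside the one buffer.
    pub fn text(&self) -> &[u8] {
        &self.bytes[self.text.clone()]
    }

    /// Borrowed view of the data bytes inside the one buffer.
    pub fn data(&self) -> &[u8] {
        &self.bytes[self.data.clone()]
    }
}

/// Reads a little-endian u32 at `at`, or `None` if out of bounds.
fn read_u32(bytes: &[u8], at: usize) -> Option<u32> {
    let raw = bytes.get(at..at.checked_add(4)?)?;
    Some(u32::from_le_bytes([raw[0], raw[1], raw[2], raw[3]]))
}
```

Loading here costs one bounds check per section rather than one allocation per section, which is the property the list above is after.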

Lots more details will become apparent as I open PRs and we debate the finer points, of course. For those interested in following the PRs, feel free to subscribe to this issue; I'll post individual PRs here so you don't have to get all the review noise.

cc @fitzgen, @peterhuene, @cfallin

Wasmtime GitHub notifications bot (Aug 23 2021 at 22:38):

alexcrichton commented on issue #3230:

https://github.com/bytecodealliance/wasmtime/pull/3231 is the first PR down this road.

Wasmtime GitHub notifications bot (Aug 24 2021 at 14:03):

alexcrichton commented on issue #3230:

https://github.com/bytecodealliance/wasmtime/pull/3235 is the next

Wasmtime GitHub notifications bot (Aug 24 2021 at 15:20):

alexcrichton commented on issue #3230:

https://github.com/bytecodealliance/wasmtime/pull/3236 is removing most of the need to parse symbol names at runtime

Wasmtime GitHub notifications bot (Aug 25 2021 at 14:06):

alexcrichton commented on issue #3230:

https://github.com/bytecodealliance/wasmtime/pull/3239 is converting compilation artifacts to purely a list of bytes

Wasmtime GitHub notifications bot (Aug 25 2021 at 15:02):

alexcrichton commented on issue #3230:

https://github.com/bytecodealliance/wasmtime/pull/3240 is a custom encoding of address maps into a section of the final image
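
As a rough illustration of why a custom section encoding helps, the sketch below looks up an address-map entry directly in the raw section bytes with a binary search, so nothing has to be decoded into a `Vec` at load time. The layout (a count, then sorted code offsets, then the matching wasm offsets) and the `AddressMap` and `read_u32` names are assumptions made for this example; the actual encoding is whatever the PR defines.

```rust
/// Hypothetical section layout, for illustration only:
/// [count: u32 LE][count x code offset: u32 LE, ascending][count x wasm offset: u32 LE]
pub struct AddressMap<'a> {
    count: usize,
    code_offsets: &'a [u8],
    wasm_offsets: &'a [u8],
}

impl<'a> AddressMap<'a> {
    /// Splits the section into its two tables without copying either one.
    pub fn parse(section: &'a [u8]) -> Option<Self> {
        let count = read_u32(section, 0)? as usize;
        let table_len = count.checked_mul(4)?;
        let rest = section.get(4..)?;
        if rest.len() < table_len.checked_mul(2)? {
            return None;
        }
        let (code_offsets, tail) = rest.split_at(table_len);
        Some(AddressMap {
            count,
            code_offsets,
            wasm_offsets: &tail[..table_len],
        })
    }

    /// Returns the wasm offset of the last entry whose code offset is <= `pc`.
    pub fn lookup(&self, pc: u32) -> Option<u32> {
        // Binary search for the number of entries with code offset <= pc.
        let (mut lo, mut hi) = (0usize, self.count);
        while lo < hi {
            let mid = lo + (hi - lo) / 2;
            if read_u32(self.code_offsets, mid * 4)? <= pc {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        if lo == 0 {
            return None; // pc is before the first mapped instruction
        }
        read_u32(self.wasm_offsets, (lo - 1) * 4)
    }
}

/// Reads a little-endian u32 at `at`, or `None` if out of bounds.
fn read_u32(bytes: &[u8], at: usize) -> Option<u32> {
    let raw = bytes.get(at..at.checked_add(4)?)?;
    Some(u32::from_le_bytes([raw[0], raw[1], raw[2], raw[3]]))
}
```

Splitting the keys and values into two parallel tables keeps the binary search touching only the key table, which is one plausible reason to prefer it over an array of pairs.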

Wasmtime GitHub notifications bot (Aug 25 2021 at 15:25):

alexcrichton commented on issue #3230:

https://github.com/bytecodealliance/wasmtime/pull/3241 is moving trap information into an image section

Wasmtime GitHub notifications bot (Aug 26 2021 at 14:19):

alexcrichton commented on issue #3230:

https://github.com/bytecodealliance/wasmtime/pull/3246 is reducing dependencies on finished_functions, a map we build at load-time

https://github.com/bytecodealliance/wasmtime/pull/3247 is in the same vein

Wasmtime GitHub notifications bot (Aug 26 2021 at 21:51):

alexcrichton commented on issue #3230:

https://github.com/bytecodealliance/wasmtime/pull/3253 is the replacement for https://github.com/bytecodealliance/wasmtime/pull/3236, removing symbol name parsing in CodeMemory

Wasmtime GitHub notifications bot (Aug 26 2021 at 22:14):

alexcrichton commented on issue #3230:

https://github.com/bytecodealliance/wasmtime/pull/3254 is the removal of relocations from the final image, and also optimizing all existing wasm modules with intra-module function calls to use more predictable call instructions which CPUs should optimize better.
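
For readers wondering how relocations can disappear from the image at all: a PC-relative call depends only on the distance between the call site and the callee, so it can be resolved once at compile time and stays valid wherever the image is mapped. The sketch below shows that arithmetic for an x86-64 `call rel32` (opcode `0xE8`, displacement relative to the end of the 5-byte instruction). The function names are hypothetical and this is not Cranelift's code, just the idea under that assumption.

```rust
/// Writes `call rel32` at `call_site` in `text`, targeting offset `target`.
/// Both offsets are relative to the start of `text`.
pub fn encode_call(text: &mut [u8], call_site: usize, target: usize) -> Option<()> {
    // The displacement is measured from the end of the 5-byte instruction.
    let next_insn = call_site.checked_add(5)?;
    if next_insn > text.len() || target >= text.len() {
        return None;
    }
    let disp = target as i64 - next_insn as i64;
    if disp < i32::MIN as i64 || disp > i32::MAX as i64 {
        return None; // callee is out of range for a 32-bit displacement
    }
    text[call_site] = 0xE8;
    text[call_site + 1..next_insn].copy_from_slice(&(disp as i32).to_le_bytes());
    Some(())
}

/// Recovers the call target from an encoded `call rel32`, for checking.
pub fn decode_call_target(text: &[u8], call_site: usize) -> Option<usize> {
    let insn = text.get(call_site..call_site.checked_add(5)?)?;
    if insn[0] != 0xE8 {
        return None;
    }
    let disp = i32::from_le_bytes([insn[1], insn[2], insn[3], insn[4]]) as i64;
    let target = call_site as i64 + 5 + disp;
    if target < 0 || target as usize >= text.len() {
        return None;
    }
    Some(target as usize)
}
```

For example, a call at offset 10 to a function at offset 32 gets displacement 32 - 15 = 17, and that value is the same no matter what base address the text ends up at.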

Wasmtime GitHub notifications bot (Aug 27 2021 at 15:19):

alexcrichton commented on issue #3230:

https://github.com/bytecodealliance/wasmtime/pull/3257 changes the serialization format to be mmap-friendly and also a native ELF file
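
Since the artifact is described as a native ELF file, a cheap sanity check before mapping it could look something like the sketch below, which reads a few fields straight out of a little-endian ELF64 header. The field offsets are the standard ELF64 ones; the `ElfSummary` type and `summarize_elf64` function are hypothetical names for this example, and Wasmtime itself leans on the object crate rather than hand-rolled parsing.

```rust
/// ELF `e_machine` values for the two architectures mentioned in this thread.
pub const EM_X86_64: u16 = 62;
pub const EM_AARCH64: u16 = 183;

/// A few fields read directly out of an ELF64 header.
#[derive(Debug, PartialEq)]
pub struct ElfSummary {
    pub machine: u16,
    pub section_header_offset: u64,
    pub section_count: u16,
}

/// Validates the ELF identification bytes and reads selected header fields.
/// Only little-endian ELF64 is handled in this sketch.
pub fn summarize_elf64(image: &[u8]) -> Option<ElfSummary> {
    let header = image.get(..64)?; // the ELF64 header is 64 bytes long
    if header[..4] != b"\x7fELF"[..] {
        return None; // wrong magic
    }
    if header[4] != 2 || header[5] != 1 {
        return None; // not ELFCLASS64 or not ELFDATA2LSB
    }
    Some(ElfSummary {
        machine: le(&header[18..20]) as u16,       // e_machine
        section_header_offset: le(&header[40..48]), // e_shoff
        section_count: le(&header[60..62]) as u16, // e_shnum
    })
}

/// Interprets up to 8 bytes as a little-endian unsigned integer.
fn le(bytes: &[u8]) -> u64 {
    bytes.iter().rev().fold(0u64, |acc, b| (acc << 8) | *b as u64)
}
```

A side benefit of using a standard container is that the same header can be read by tools like objdump or readelf, which lines up with the inspection goal in the original issue text.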

Wasmtime GitHub notifications bot (Aug 30 2021 at 15:22):

alexcrichton assigned issue #3230.

Wasmtime GitHub notifications bot (Aug 30 2021 at 15:36):

alexcrichton commented on issue #3230:

https://github.com/bytecodealliance/wasmtime/pull/3265 is the final substantive step which removes the copy we have from an ELF image into executable memory, instead just using the executable memory as-is in the ELF image.

Wasmtime GitHub notifications bot (Aug 30 2021 at 15:57):

alexcrichton commented on issue #3230:

https://github.com/bytecodealliance/wasmtime/pull/3266 is the actual final addition, a deserialize_file method

After everything lands, loading a module I was looking at should go from ~300ms to ~1ms, yay!

Wasmtime GitHub notifications bot (Sep 01 2021 at 16:06):

alexcrichton commented on issue #3230:

https://github.com/bytecodealliance/wasmtime/pull/3275 is take 2 of #3254

Wasmtime GitHub notifications bot (Sep 01 2021 at 18:06):

alexcrichton commented on issue #3230:

This is a table of the speedups we'll be getting once #3275 lands. Each row is a different wasm file (found via the internet) compiled by Wasmtime on either AArch64 or x86_64. The v0.29.0 numbers are measured using that commit deserializing a slice of bytes, and then there are two columns for after #3275: one for deserializing from memory (as before) and another for deserializing from a file (which isn't implemented in v0.29.0 but is on main).

The tl;dr is that we're around 100x faster than before, yay! Given that, I'm happy with where this is at, so I'm going to close this, and I'll take care of the final bits and pieces on #3275.

<img width="695" alt="Screen Shot 2021-09-01 at 1 03 16 PM" src="https://user-images.githubusercontent.com/64996/131721185-993478c0-4fb4-4e3f-a599-aad51b9ac4a4.png">

Wasmtime GitHub notifications bot (Sep 01 2021 at 18:06):

alexcrichton closed issue #3230 (assigned to alexcrichton).

Last updated: Apr 17 2025 at 07:03 UTC