Stream: git-wasmtime

Topic: wasmtime / Issue #2318 Serialized wasmtime modules are ex...


view this post on Zulip Wasmtime GitHub notifications bot (Oct 25 2020 at 21:28):

whitequark opened Issue #2318:

I'm using custom caching based on module serialization in YoWASP, primarily to be able to show a message during cold startup (which can take many minutes on slow machines), but also to avoid applying zstd compression that noticeably increases startup latency.

The problem is that the cached modules are extremely large, so large that it causes a near-pathological amount of disk I/O. On Linux, yosys.wasm (21M wasm file) results in a 290M serialized module, an increase of ~14×. On Windows, for some reason, that very same file results in a ~580M serialized module, an increase of ~28×. The native Yosys binary for x86_64 is 17M; even once you account for libc and libstdc++, it really should not take over half a gigabyte of space.

I'm filing this issue at the request of @alexcrichton here.
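The caching pattern described in the issue (show a message on a cold start, reuse serialized bytes on a warm start) can be sketched with only the Rust standard library. This is a minimal sketch, not YoWASP's actual code (YoWASP uses wasmtime-py): `load_or_compile` and its `compile` closure are hypothetical stand-ins for "compile the module and serialize it". A real cache key would also need to include a hash of the wasm and the engine version so that stale entries are never reused.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: returns serialized module bytes from `cache_path`,
/// calling `compile` (a stand-in for "compile + serialize") only on a miss.
fn load_or_compile<F>(cache_path: &Path, wasm: &[u8], compile: F) -> io::Result<Vec<u8>>
where
    F: FnOnce(&[u8]) -> Vec<u8>,
{
    match fs::read(cache_path) {
        // Warm start: the serialized artifact already exists.
        Ok(bytes) => Ok(bytes),
        // Cold start: tell the user why this will be slow, then compile and cache.
        Err(e) if e.kind() == io::ErrorKind::NotFound => {
            eprintln!("Preparing {} for first use, this may take a while...", cache_path.display());
            let bytes = compile(wasm);
            // A production version would write to a temp file and rename it.
            fs::write(cache_path, &bytes)?;
            Ok(bytes)
        }
        // Any other I/O error is passed through unchanged.
        Err(e) => Err(e),
    }
}
```

Because warm starts read the whole artifact from disk, the size of the serialized module directly determines how much I/O each warm start costs, which is why its size matters here.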

view this post on Zulip Wasmtime GitHub notifications bot (Oct 25 2020 at 21:28):

whitequark edited Issue #2318:

view this post on Zulip Wasmtime GitHub notifications bot (Oct 25 2020 at 21:28):

whitequark commented on Issue #2318:

Oh and the script that runs it is here. It's extremely simple but maybe I forgot something important.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 25 2020 at 21:29):

whitequark edited Issue #2318:

I'm using custom caching based on module serialization in YoWASP, primarily to be able to show a message during cold startup (which can take many minutes on slow machines), but also to avoid applying zstd compression that noticeably increases warm startup latency.

The problem is that the cached modules are extremely large—so large that it causes a near-pathological amount of disk I/O. (On Windows VMs, due to the I/O, warm startup actually becomes slower. On bare metal Windows on SSDs it still gets faster.)

On Linux, yosys.wasm (21M wasm file) results in a 290M serialized module, an increase of ~14×. On Windows, for some reason, that very same file results in a ~580M serialized module, an increase of ~28×. The native Yosys binary for x86_64 is 17M; even once you account for libc and libstdc++, it really should not take over half a gigabyte of space.

I'm filing this issue at the request of @alexcrichton here.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 25 2020 at 21:30):

whitequark edited Issue #2318:

I'm using custom caching based on module serialization in YoWASP, primarily to be able to show a message during cold startup (which can take many minutes on slow machines), but also to avoid applying zstd compression that noticeably increases warm startup latency.

The problem is that the cached modules are extremely large—so large that it causes a near-pathological amount of disk I/O. (On Windows VMs, due to the I/O, warm startup actually becomes slower compared to wasmtime's own caching. On bare metal Windows on SSDs it still gets faster.)

On Linux, yosys.wasm (21M wasm file) results in a 290M serialized module, an increase of ~14×. On Windows, for some reason, that very same file results in a ~580M serialized module, an increase of ~28×. The native Yosys binary for x86_64 is 17M; even once you account for libc and libstdc++, it really should not take over half a gigabyte of space.

I'm filing this issue at the request of @alexcrichton here.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 26 2020 at 00:05):

alexcrichton commented on Issue #2318:

The cache entry size is effectively the bincode-encoded size of this structure, which I believe is roughly analogous to the in-memory size of that structure. Adding some simple instrumentation, I was able to get the encoded byte size of each of the fields:

module:                  152817
obj:                   56236896
funcs:                242146780
  funcs(traps):                24182940
  funcs(address_map):         217774640
  funcs(stack_maps):             189216
data_initializers:      2880584
unwind_info:            4611040
total:                306028126

Clearly funcs is the massive field (too much so), and the worst offenders are the traps and address_map maps. Neither of these maps really justifies taking up so much space, and longer-term we need a more serious refactoring to encode them entirely differently, in a much more compact fashion. The consumer, however, relies on random access right now (or at least the ability to do a binary search), so it won't be trivial to restructure these data structures immediately. In the meantime https://github.com/bytecodealliance/wasmtime/pull/2321 should be a band-aid to make it a bit more reasonable (although still pretty unreasonable).
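To illustrate the random-access constraint: if an address map is stored as a sorted array of fixed-size entries, looking up the entry that covers a given native code offset is a plain binary search. The entry type and field names below are hypothetical, not wasmtime's actual definitions; the sketch only shows why directly indexable, fixed-width entries are convenient for the consumer.

```rust
/// Hypothetical address-map entry: a native code offset and the wasm bytecode
/// offset it was compiled from. The map is assumed sorted by `code_offset`.
#[derive(Debug, Clone, Copy)]
struct AddressMapEntry {
    code_offset: u32,
    wasm_offset: u32,
}

/// Finds the entry covering `pc`, i.e. the last entry with `code_offset <= pc`.
/// Requires `map` to be sorted by `code_offset`; runs in O(log n).
fn lookup(map: &[AddressMapEntry], pc: u32) -> Option<&AddressMapEntry> {
    // Count of entries whose code_offset is <= pc.
    let idx = map.partition_point(|e| e.code_offset <= pc);
    // idx == 0 means pc lies before the first entry.
    idx.checked_sub(1).map(|i| &map[i])
}
```

A delta- or varint-encoded form of the same data can no longer be indexed directly, so it would have to be decoded up front or paired with a coarse index. That is the restructuring cost that makes a more compact encoding non-trivial.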

I've yet to investigate the Windows discrepancy here, that's quite worrisome too!

view this post on Zulip Wasmtime GitHub notifications bot (Oct 26 2020 at 00:06):

alexcrichton commented on Issue #2318:

Oh and after that PR, the breakdown looks like:

module:                  152817
obj:                   56236896
funcs:                 97596780
  funcs(traps):                24182940
  funcs(address_map):          73224640
  funcs(stack_maps):             189216
data_initializers:      2880584
unwind_info:            4611040
total:                161478126

view this post on Zulip Wasmtime GitHub notifications bot (Oct 27 2020 at 03:43):

alexcrichton commented on Issue #2318:

Ok I've got one more linked improvement as well as https://github.com/bytecodealliance/wasmtime/pull/2326 which should account for the discrepancy on Windows.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 29 2020 at 19:51):

alexcrichton commented on Issue #2318:

With current master plus https://github.com/bytecodealliance/wasmtime/pull/2324 the cache entry size of the original wasm module is down to 97MB, and the cache entry sizes should be roughly the same on Linux and Windows. @whitequark for your use cases is that still too large? It's still pretty large, at a >4x size increase, but further changes may require more invasive tweaks than the low-hanging fruit picked off so far.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 30 2020 at 07:30):

whitequark commented on Issue #2318:

for your use cases is that still too large?

Seems excellent to me. Any chance you can do a point release of wasmtime-py with the updated interpreter? I have to test on a fairly heterogeneous set of machines, so it'd be much easier if I didn't have to distribute patched wheels as well.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 30 2020 at 19:12):

alexcrichton commented on Issue #2318:

Sure! I'll wait until #2324 is in and then I'll do a release of wasmtime.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 30 2020 at 22:15):

fitzgen commented on Issue #2318:

I don't know how easy bincode is to extend, but all the address-keyed func data could probably compress really well if we stored deltas instead of absolute values, combined with a variable-length integer encoding (e.g. LEB128).

view this post on Zulip Wasmtime GitHub notifications bot (Oct 30 2020 at 22:15):

fitzgen commented on Issue #2318:

(and we ensured they were sorted)
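A minimal sketch of the suggested scheme, assuming the offsets are already sorted: store each offset as the difference from its predecessor, and write those deltas as unsigned LEB128 (7 payload bits per byte, high bit set on every byte except the last). Function names are hypothetical, and the decoder does no validation, so malformed input panics.

```rust
/// Appends `value` as unsigned LEB128.
fn write_uleb128(out: &mut Vec<u8>, mut value: u64) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte); // last byte: continuation bit clear
            return;
        }
        out.push(byte | 0x80); // more bytes follow
    }
}

/// Reads one unsigned LEB128 value starting at `*pos`, advancing `*pos`.
fn read_uleb128(bytes: &[u8], pos: &mut usize) -> u64 {
    let mut result = 0u64;
    let mut shift = 0;
    loop {
        let byte = bytes[*pos];
        *pos += 1;
        result |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return result;
        }
        shift += 7;
    }
}

/// Encodes sorted offsets as a count followed by LEB128 deltas.
fn encode_sorted(offsets: &[u64]) -> Vec<u8> {
    debug_assert!(offsets.windows(2).all(|w| w[0] <= w[1]), "offsets must be sorted");
    let mut out = Vec::new();
    write_uleb128(&mut out, offsets.len() as u64);
    let mut prev = 0;
    for &off in offsets {
        write_uleb128(&mut out, off - prev);
        prev = off;
    }
    out
}

/// Inverse of `encode_sorted`: re-accumulates the deltas.
fn decode_sorted(bytes: &[u8]) -> Vec<u64> {
    let mut pos = 0;
    let n = read_uleb128(bytes, &mut pos) as usize;
    let mut prev = 0u64;
    (0..n)
        .map(|_| {
            prev += read_uleb128(bytes, &mut pos);
            prev
        })
        .collect()
}
```

For the offsets `[0, 4, 9, 130, 131, 100_000]` this produces 9 bytes (one for the count, one each for the first five deltas, three for the last delta of 99,869), versus 48 bytes as six fixed-width `u64`s. Sorting matters because it keeps every delta non-negative and usually small.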

view this post on Zulip Wasmtime GitHub notifications bot (Nov 04 2020 at 16:03):

alexcrichton commented on Issue #2318:

@fitzgen turns out bincode has an option for just that!
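The bincode option in question is presumably its variable-length integer encoding (`Options::with_varint_encoding` in bincode 1.3). As documented there, that scheme is not LEB128: an unsigned value below 251 is written as a single byte, and larger values are written as a marker byte (251, 252 or 253) followed by a fixed-width `u16`, `u32` or `u64`. The std-only sketch below reimplements just the size rule for `u64` values; it is not bincode's code. It shows why small values, such as deltas between sorted offsets, benefit the most.

```rust
/// Encoded size in bytes of an unsigned integer under the varint rule
/// described in bincode 1.3's docs (u64 range only).
fn varint_len(u: u64) -> usize {
    if u < 251 {
        1 // the value itself
    } else if u < (1 << 16) {
        1 + 2 // marker 251 + u16
    } else if u < (1 << 32) {
        1 + 4 // marker 252 + u32
    } else {
        1 + 8 // marker 253 + u64
    }
}

/// Total varint size of a slice of values.
fn total_varint_len(values: &[u64]) -> usize {
    values.iter().map(|&v| varint_len(v)).sum()
}

/// Differences between consecutive sorted values (the first relative to 0).
fn deltas(sorted: &[u64]) -> Vec<u64> {
    let mut prev = 0;
    sorted
        .iter()
        .map(|&v| {
            let d = v - prev;
            prev = v;
            d
        })
        .collect()
}
```

For the offsets `[1000, 1004, 1010, 70_000]`, fixed-width `u64`s take 32 bytes, varint encoding of the absolute values takes 14, and varint encoding of the deltas `[1000, 4, 6, 68_990]` takes 10.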

view this post on Zulip Wasmtime GitHub notifications bot (Nov 05 2020 at 16:00):

alexcrichton commented on Issue #2318:

OK, wasmtime-py 0.21.0 is now published, @whitequark, if you're able to test it out.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 05 2020 at 16:01):

whitequark commented on Issue #2318:

I'll take a look over the next few days!

view this post on Zulip Wasmtime GitHub notifications bot (Nov 05 2020 at 16:04):

whitequark commented on Issue #2318:

Were there any breaking changes? It's a bit unfortunate that I'll have to rebuild all the packages as that makes it a bit harder to get other people to compare several different versions.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 05 2020 at 16:19):

whitequark commented on Issue #2318:

I've rebuilt yowasp-yosys and yowasp-nextpnr-ice40 with wasmtime 0.21.0 on my Linux machine.

* Yosys: 291M→94M cache file size improvement, ~2x warm startup latency improvement (wow!!);
* nextpnr-ice40: 23M→6.9M cache file size improvement, warm startup latency changes within noise threshold.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 05 2020 at 16:29):

alexcrichton commented on Issue #2318:

Right now our release process for wasmtime is just "major bump everything", and we're then trying to keep the external packages like wasmtime-py and wasmtime-go in sync with the main repo. We should probably allocate space to have minor version bumps of everything as well though...

Glad to hear the results as well!

view this post on Zulip Wasmtime GitHub notifications bot (Nov 05 2020 at 16:31):

whitequark commented on Issue #2318:

Ah ok if it's in sync with the main repo then it might actually be pretty handy. I wasn't sure exactly how it worked.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 05 2020 at 22:53):

whitequark commented on Issue #2318:

@alexcrichton I've tried running yowasp-yosys and yowasp-nextpnr-ice40 on the cheapest and slowest laptop I found nearby, and it's only something like 2× slower on both cold and warm startup than my XPS 13. I have no idea how you managed to achieve that, and I'm quite impressed. As far as I'm concerned this is enough performance.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 05 2020 at 22:53):

whitequark closed Issue #2318:

view this post on Zulip Wasmtime GitHub notifications bot (Nov 05 2020 at 23:10):

alexcrichton commented on Issue #2318:

Nice! Thanks again for the report and helping to investigate!


Last updated: Oct 23 2024 at 20:03 UTC