whitequark opened Issue #2318:
I'm using custom module-serialization-based caching in YoWASP, primarily to be able to show a message during cold startup (which can take many minutes on slow machines), but also to avoid applying zstd compression that noticeably increases startup latency.
The problem is that the cached modules are extremely large—so large that it causes a near-pathological amount of disk I/O. On Linux, yosys.wasm (a 21M wasm file) results in a 290M serialized module, an increase of ~14×. On Windows, for some reason, that very same file results in a ~580M serialized module, an increase of ~28×. The native Yosys binary for x86_64 is 17M; even once you account for libc and libstdc++, it really should not take over half a gigabyte of space.
I'm filing this issue on request of @alexcrichton here.
whitequark commented on Issue #2318:
Oh and the script that runs it is here. It's extremely simple but maybe I forgot something important.
whitequark edited Issue #2318:
I'm using custom module-serialization-based caching in YoWASP, primarily to be able to show a message during cold startup (which can take many minutes on slow machines), but also to avoid applying zstd compression that noticeably increases warm startup latency.
The problem is that the cached modules are extremely large—so large that it causes a near-pathological amount of disk I/O. (On Windows VMs, due to the I/O, warm startup actually becomes slower compared to wasmtime's own caching. On bare-metal Windows on SSDs it still gets faster.)
On Linux, yosys.wasm (a 21M wasm file) results in a 290M serialized module, an increase of ~14×. On Windows, for some reason, that very same file results in a ~580M serialized module, an increase of ~28×. The native Yosys binary for x86_64 is 17M; even once you account for libc and libstdc++, it really should not take over half a gigabyte of space.
I'm filing this issue on request of @alexcrichton here.
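(For reference, the pattern at issue boils down to this kind of serialize-to-disk cache. A minimal sketch using the current wasmtime Rust API; YoWASP itself does this through wasmtime-py, and cache invalidation plus the startup message are omitted here:)

```rust
use std::path::Path;
use wasmtime::{Engine, Module};

// Serialize-to-disk module caching in the spirit described above.
// Engine/version compatibility checks are omitted; `anyhow` is assumed
// as a dependency for the error type.
fn load_cached(engine: &Engine, wasm: &Path, cache: &Path) -> anyhow::Result<Module> {
    if cache.exists() {
        let bytes = std::fs::read(cache)?;
        // Unsafe in current wasmtime: the bytes must come from
        // Module::serialize on a compatible engine, which we control.
        return unsafe { Module::deserialize(engine, &bytes) };
    }
    let module = Module::from_file(engine, wasm)?; // slow cold compile
    std::fs::write(cache, module.serialize()?)?;
    Ok(module)
}
```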
alexcrichton commented on Issue #2318:
The cache entry size is effectively the bincode-encoded size of this structure, which I believe is roughly analogous to the in-memory size of that structure. Adding some simple instrumentation, I was able to get the encoded byte size of each of the fields:
```
module:                152817
obj:                 56236896
funcs:              242146780
funcs(traps):        24182940
funcs(address_map): 217774640
funcs(stack_maps):     189216
data_initializers:    2880584
unwind_info:          4611040
total:              306028126
```
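(A sketch of what such instrumentation can look like with bincode 1.x; the Artifacts struct below is a stand-in for wasmtime's internal compilation structure, not its real definition:)

```rust
use serde::Serialize;

// Stand-in for wasmtime's internal compilation artifacts.
#[derive(Serialize)]
struct Artifacts {
    obj: Vec<u8>,
    funcs: Vec<Vec<u8>>,
}

// bincode::serialized_size computes the encoded length without
// allocating the full byte buffer.
fn report<T: Serialize>(name: &str, value: &T) {
    println!("{}: {}", name, bincode::serialized_size(value).unwrap());
}

fn main() {
    let a = Artifacts { obj: vec![0; 1024], funcs: vec![vec![0; 64]; 8] };
    report("obj", &a.obj);
    report("funcs", &a.funcs);
    report("total", &a);
}
```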
Clearly funcs is the massive field (too much so), and the worst offenders are the traps and address_map maps. Neither of these maps really justifies taking up so much space, and longer-term we need a more serious refactoring to encode these entirely differently, in a much more compact fashion. The consumer, however, relies on random access right now (or at least the ability to do a binary search), so it won't be immediately trivial to restructure these data structures. In the meantime, https://github.com/bytecodealliance/wasmtime/pull/2321 should be a band-aid to make it a bit more reasonable (although still pretty unreasonable).
I've yet to investigate the Windows discrepancy here; that's quite worrisome too!
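(To illustrate the random-access constraint: consumers binary-search entries sorted by code offset, so a plain stream of compressed deltas would have to be decoded linearly instead. Hypothetical types, not wasmtime's actual ones:)

```rust
// Hypothetical address-map entry; wasmtime's real types differ.
#[derive(Debug)]
struct AddrMapEntry {
    code_offset: u32, // offset into the compiled function body
    wasm_offset: u32, // corresponding offset in the original wasm
}

// Find the entry covering `pc` by binary search over the sorted code
// offsets; on a miss, the previous entry is the one covering `pc`.
fn lookup(entries: &[AddrMapEntry], pc: u32) -> Option<&AddrMapEntry> {
    match entries.binary_search_by_key(&pc, |e| e.code_offset) {
        Ok(i) => Some(&entries[i]),
        Err(0) => None, // pc precedes the first entry
        Err(i) => Some(&entries[i - 1]),
    }
}

fn main() {
    let entries = [
        AddrMapEntry { code_offset: 0, wasm_offset: 0 },
        AddrMapEntry { code_offset: 16, wasm_offset: 5 },
        AddrMapEntry { code_offset: 40, wasm_offset: 9 },
    ];
    println!("{:?}", lookup(&entries, 20)); // the entry at offset 16
}
```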
alexcrichton commented on Issue #2318:
Oh and after that PR, the breakdown looks like:
```
module:                152817
obj:                 56236896
funcs:               97596780
funcs(traps):        24182940
funcs(address_map):  73224640
funcs(stack_maps):     189216
data_initializers:    2880584
unwind_info:          4611040
total:              161478126
```
alexcrichton commented on Issue #2318:
Ok I've got one more linked improvement as well as https://github.com/bytecodealliance/wasmtime/pull/2326 which should account for the discrepancy on Windows.
alexcrichton commented on Issue #2318:
With current master plus https://github.com/bytecodealliance/wasmtime/pull/2324, the cache entry size of the original wasm module is down to 97MB, and the cache entry sizes should be roughly the same on Linux and Windows. @whitequark, for your use cases is that still too large? A >4× size increase is still pretty large, but further changes may require more invasive tweaks than the low-hanging fruit picked off so far.
whitequark commented on Issue #2318:
> for your use cases is that still too large?
Seems excellent to me. Any chance you can do a point release of wasmtime-py with the updated interpreter? I have to test on a fairly heterogenous set of machines so it'd be much easier if I didn't have to distribute patched wheels as well.
alexcrichton commented on Issue #2318:
Sure! I'll wait until #2324 is in and then I'll do a release of wasmtime.
fitzgen commented on Issue #2318:
I don't know how easy bincode is to extend, but all the address-keyed func data could probably compress really well if we used deltas instead of absolute values, combined with a variable-length integer encoding (e.g. LEB128).
fitzgen commented on Issue #2318:
(and we ensured they were sorted)
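(A minimal sketch of that scheme, with made-up function names: delta-encode the sorted offsets, then write each delta as unsigned LEB128:)

```rust
/// Append `value` to `out` as unsigned LEB128: 7 bits per byte, high
/// bit set on every byte except the last.
fn leb128_encode(mut value: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}

/// Delta-encode a sorted slice of code offsets and varint-encode the
/// deltas; neighboring offsets are close, so most deltas fit in 1 byte.
fn encode_offsets(sorted: &[u64]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut prev = 0;
    for &off in sorted {
        leb128_encode(off - prev, &mut out);
        prev = off;
    }
    out
}

fn main() {
    let offsets = [0u64, 4, 9, 13, 300, 304];
    let bytes = encode_offsets(&offsets);
    // Six absolute u64s would be 48 bytes; the deltas fit in 7.
    println!("{} offsets -> {} bytes", offsets.len(), bytes.len());
}
```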
alexcrichton commented on Issue #2318:
@fitzgen turns out bincode has an option for just that!
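(That option is varint encoding, reachable through bincode 1.x's Options builder; AddressMap below is a stand-in type for the example:)

```rust
use bincode::Options; // bincode 1.x
use serde::{Deserialize, Serialize};

// Stand-in for an address-keyed table; not wasmtime's real type.
#[derive(Serialize, Deserialize, Debug)]
struct AddressMap {
    code_offsets: Vec<u64>,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let map = AddressMap { code_offsets: vec![0, 4, 9, 13, 300, 304] };
    // The bincode::serialize/deserialize free functions use fixed-width
    // integers; the Options builder can switch to varints instead.
    let opts = bincode::options().with_varint_encoding();
    let bytes = opts.serialize(&map)?;
    let back: AddressMap = opts.deserialize(&bytes)?;
    println!("{} bytes for {:?}", bytes.len(), back);
    Ok(())
}
```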
alexcrichton commented on Issue #2318:
Ok, wasmtime-py 0.21.0 is now published. @whitequark, would you be able to test it out?
whitequark commented on Issue #2318:
I'll take a look over the next few days!
whitequark commented on Issue #2318:
Were there any breaking changes? It's a bit unfortunate that I'll have to rebuild all the packages, as that makes it harder to get other people to compare several different versions.
whitequark commented on Issue #2318:
I've rebuilt yowasp-yosys and yowasp-nextpnr-ice40 with wasmtime 0.21.0 on my Linux machine.
* Yosys: 291M→94M cache file size improvement, ~2x warm startup latency improvement (wow!!);
* nextpnr-ice40: 23M→6.9M cache file size improvement, warm startup latency changes within noise threshold.
alexcrichton commented on Issue #2318:
Right now our release process for wasmtime is just "major bump everything", and we're then trying to keep the external packages like wasmtime-py and wasmtime-go in sync with the main repo. We should probably allocate space to have minor version bumps of everything as well though...
Glad to hear the results as well!
whitequark commented on Issue #2318:
Ah ok if it's in sync with the main repo then it might actually be pretty handy. I wasn't sure exactly how it worked.
whitequark commented on Issue #2318:
@alexcrichton I've tried running yowasp-yosys and yowasp-nextpnr-ice40 on the cheapest and slowest laptop I could find nearby, and it's only something like 2× slower than my XPS 13 on both cold and warm startup. I have no idea how you managed to achieve that, and I'm quite impressed. As far as I'm concerned this is enough performance.
whitequark closed Issue #2318.
alexcrichton commented on Issue #2318:
Nice! Thanks again for the report and helping to investigate!