alexcrichton commented on issue #3789:
For reference, the 80% number comes from the benchmark in https://github.com/bytecodealliance/wasmtime/pull/3790 and looks like:
deserialize/deserialize/rustpython.wasm time: [666.37 us 666.56 us 666.75 us] change: [-79.458% -79.453% -79.448%] (p = 0.00 < 0.05) Performance has improved. Found 7 outliers among 100 measurements (7.00%) 2 (2.00%) low mild 5 (5.00%) high mild deserialize/deserialize/spidermonkey.wasm time: [1.4307 ms 1.4310 ms 1.4313 ms] change: [-82.183% -82.176% -82.170%] (p = 0.00 < 0.05) Performance has improved. Found 7 outliers among 100 measurements (7.00%) 1 (1.00%) low severe 3 (3.00%) low mild 3 (3.00%) high mild
fitzgen commented on issue #3789:
Another thing we could do to take this even further is have a header on this new section (or add another new section) that contains the function index -> (offset, length) pairs. This way the
CompiledModuleInfo
would be even a bit smaller than it is after this PR.
github-actions[bot] commented on issue #3789:
Subscribe to Label Action
cc @peterhuene
<details>
This issue or pull request has been labeled: "wasmtime:api"Thus the following users have been cc'd because of the following labels:
- peterhuene: wasmtime:api
To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.
Learn more.
</details>
alexcrichton commented on issue #3789:
I was considering doing that but ended up deciding against it because we dont have much other laziness in
CompiledModule
right now, although theoretically we should be able to have a fair amount more. I was originally tipped off to this by a unnaturally large.wasmtime.info
section in the binary:$ objdump -h spidermonkey.cwasm spidermonkey.cwasm: file format elf64-littleaarch64 Sections: Idx Name Size VMA LMA File off Algn 0 .text 00dc0000 0000000000000000 0000000000000000 00010000 2**16 CONTENTS, ALLOC, LOAD, READONLY, CODE 1 .eh_frame 001bf84c 0000000000000000 0000000000000000 00dd0000 2**0 CONTENTS, ALLOC, LOAD, READONLY, DATA 2 .wasmtime.addrmap 00bddf9c 0000000000000000 0000000000000000 00f8f84c 2**0 CONTENTS, ALLOC, LOAD, READONLY, DATA 3 .wasmtime.traps 002aefb7 0000000000000000 0000000000000000 01b6d7e8 2**0 CONTENTS, ALLOC, LOAD, READONLY, DATA 4 .rodata.wasm 00e5e488 0000000000000000 0000000000000000 01e1c79f 2**0 CONTENTS, ALLOC, LOAD, READONLY, DATA 5 .wasmtime.info 0042320c 0000000000000000 0000000000000000 02c7ac27 2**0 CONTENTS, ALLOC, LOAD, READONLY, DATA
where here it's on the order of the size of the text section which was massive. This PR actually doesn't help this too too much and instead yields:
$ objdump -h spidermonkey.cwasm ... 5 .name.wasm 002e344f 0000000000000000 0000000000000000 02ce31df 2**0 CONTENTS, ALLOC, LOAD, READONLY, DATA 6 .wasmtime.info 0013fdc4 0000000000000000 0000000000000000 02fc662e 2**0 CONTENTS, ALLOC, LOAD, READONLY, DATA
where
.name.wasm + .wasmtime.info
is roughly the size of the old section. This still seems abnormally large for.wasmtime.info
though as it's still sort of on the order of the entire text section. I benchmarked after this PR and it looked like all the remaining decoding time wasPrimaryMap<DefinedFuncIndex, FunctionInfo>,
. I didn't immediately seeVec<FunctionName>
in the profile so I wasn't too worried about its decoding time.I'll leave optimizing
Vec<FunctionInfo>
for another day, and in the meantime I do agree that we've still got in our back pocket lazily decoding theVec<FunctionName>
entirely as well since we don't need to create that immediately either.
fitzgen commented on issue #3789:
because we dont have much other laziness in
CompiledModule
right now,Wouldn't even need to be lazy inside
CompiledModuleInfo
, just use the header section directly. If each entry is two words (offset and length) then we can lookup thei
th function's name by just doing#[repr(C)] struct OffsetAndLength { offset: usize, length: usize, } let header_section_ptr: *const OffsetAndLengthPair = ...; let header_section_len = ...; let header_section = slice::from_raw_parts(...); return header_section[i];
fitzgen commented on issue #3789:
But yeah no need to hold up this PR or do that now, if we don't want to.
alexcrichton commented on issue #3789:
Oh actually that's a great point. We already do that for the address map section where there's a special encoder followed by a specialized lookup routine for the data produced by that encoder. While we need to be somewhat careful about alignments and such it's not really the end of the world.
Arguably almost everything in
wasmtime::Module
should move to something like this because deserializingwasmtime::Module
is not a trivial task and is otherwise quite wasteful (lots of allocations, none of which are ever mutated so could continue to use the source-of-truth)...Anyway I think I'll probably go with this for now since it's relatively lightweight, but there's definitely still more fruit to pick on the accelerate-load-precompiled-modules tree!
Last updated: Jan 24 2025 at 00:11 UTC