Stream: git-wasmtime

Topic: wasmtime / issue #3547 Shrink the size of a compiled arti...


view this post on Zulip Wasmtime GitHub notifications bot (Nov 18 2021 at 17:57):

alexcrichton opened issue #3547:

Upon thinking about this recently I believe we can shrink the .addrmap section of compiled artifacts by a significant amount. Currently this section is used to translate from machine code addresses to addresses of instructions within the original wasm file itself. This information is used primarily for backtraces, to go from machine address to wasm address and then, via the wasm DWARF, from wasm address to filename and line number.

At this time, though, we have a mapping from machine code address to wasm address for every single wasm instruction in the entire module. I don't actually think that this is necessary. Instead I think we only need mappings for trapping instructions and instructions which call a function (not a wasm function but instead a Cranelift-level call to include things like memory.grow and such). At this time we're not collecting "asynchronous backtraces" or anything like that so there's no need to actually have an address map for every single wasm instruction in the module.

I suspect that this would lead to huge savings on the .addrmap section, which is currently sometimes even larger than the .text section. I don't think this will necessarily be trivial to implement, though, and it will involve some trickery on the Cranelift side of things to correlate the source of all machine instructions, whether they're calls, and whether they can trap.
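
To make the proposal above concrete, here is a minimal Rust sketch of filtering an address map down to trapping and calling instructions. All names (`InstKind`, `AddrMapEntry`, `shrink_addrmap`, `lookup`) are hypothetical and do not reflect Wasmtime's or Cranelift's actual types. It assumes that every PC seen in a backtrace falls inside a trapping or calling instruction, so the last entry at or before that PC is the right one.

```rust
// Hypothetical types for illustration only; not Wasmtime's real data structures.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum InstKind {
    Plain,
    MayTrap,
    Call,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct AddrMapEntry {
    native_offset: u32, // offset of the machine instruction in .text
    wasm_offset: u32,   // offset of the originating instruction in the wasm file
}

/// Keep only the entries a backtrace can land on: traps and calls.
fn shrink_addrmap(full: &[(AddrMapEntry, InstKind)]) -> Vec<AddrMapEntry> {
    full.iter()
        .filter(|(_, kind)| matches!(kind, InstKind::MayTrap | InstKind::Call))
        .map(|(entry, _)| *entry)
        .collect()
}

/// Find the wasm offset for `pc`: the last entry whose native offset is <= pc.
/// `map` must be sorted by `native_offset`.
fn lookup(map: &[AddrMapEntry], pc: u32) -> Option<u32> {
    let idx = map.partition_point(|e| e.native_offset <= pc);
    idx.checked_sub(1).map(|i| map[i].wasm_offset)
}
```

Because the filtered table stays sorted, lookup remains a binary search; only the number of entries changes.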

view this post on Zulip Wasmtime GitHub notifications bot (Nov 18 2021 at 18:16):

cfallin commented on issue #3547:

Great idea!

I'll note that the line-number-per-wasm-instruction (or actually one really wants line-number-per-compiled-instruction I think) is actually useful if one is single-stepping through code; the infra right now is I guess fully general because of this use-case. But of course backtraces are different, as you say!

It'd probably be reasonably easy to add a knob that implements your suggestion -- I'd do it by returning an Option<SourceLoc> here, probably...
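
One way such a knob might look, as a rough sketch only: the `SourceLoc` below is a local stand-in rather than Cranelift's type, and `AddrMapMode`, `Inst`, and `srcloc_to_record` are invented names for illustration. The idea is that the function returns `None` for instructions that do not need an address-map entry in the reduced mode.

```rust
// Local stand-in for a source location; an assumption, not Cranelift's SourceLoc.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct SourceLoc(u32);

// Hypothetical knob selecting how much address-map detail to record.
#[derive(Clone, Copy, Debug)]
enum AddrMapMode {
    EveryInstruction,  // useful for single-stepping in a debugger
    TrapsAndCallsOnly, // enough for backtraces
}

// Hypothetical summary of a machine instruction.
struct Inst {
    srcloc: SourceLoc,
    can_trap: bool,
    is_call: bool,
}

/// Decide whether this instruction's source location should be recorded.
fn srcloc_to_record(mode: AddrMapMode, inst: &Inst) -> Option<SourceLoc> {
    match mode {
        AddrMapMode::EveryInstruction => Some(inst.srcloc),
        AddrMapMode::TrapsAndCallsOnly if inst.can_trap || inst.is_call => Some(inst.srcloc),
        AddrMapMode::TrapsAndCallsOnly => None,
    }
}
```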

view this post on Zulip Wasmtime GitHub notifications bot (Nov 18 2021 at 18:18):

fitzgen commented on issue #3547:

As discussed in private chat, we should actually be able to remove the .addrmap section completely by updating/fixing/special casing our wasm -> source to native -> source DWARF translation so that we can use the relevant DWARF sections directly without any native -> wasm translation that .addrmap is providing.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 18 2021 at 18:19):

fitzgen commented on issue #3547:

(The DWARF translation would happen at module compilation time, not runtime.)

view this post on Zulip Wasmtime GitHub notifications bot (Nov 18 2021 at 18:26):

bjorn3 commented on issue #3547:

DWARF translation only happens when there is DWARF debuginfo in the source modules in the first place. .addrmap is also used for backtraces when there is no DWARF debuginfo at all or when DWARF translation is not enabled.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 18 2021 at 18:41):

fitzgen commented on issue #3547:

That's correct. In the mode I am proposing, we would still only include the subset of DWARF that we are querying today to reconstruct backtraces after translating the native PC to a Wasm PC via .addrmap; this wouldn't imply including every DWARF section and all of their contents.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 18 2021 at 18:53):

bjorn3 commented on issue #3547:

The DWARF .debug_line section can't encode a native pc -> wasm pc mapping as necessary when there is no debuginfo. It can only encode a native pc -> (file, line, column, flags) tuple mapping. Wasmtime shows module name + wasm pc when there is no debuginfo, right? On one hand, if you encode it as native pc -> (file, line=wasm pc, 0, no flags), that would be bigger than what you can do using .addrmap if you were to encode it using deltas. On the other hand, .debug_line always encodes it as deltas, which makes lookup much slower than the current scheme as you have to traverse the entire section. On native DWARF this is somewhat less painful as every compilation unit gets its own .debug_line mapping, but for generated wasm .debug_line you would likely put the entire wasm module in a single compilation unit.
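
The size/lookup trade-off of delta encoding described above can be illustrated with a simplified Rust sketch. This is not DWARF's actual line-program format (which uses a state machine and LEB128 opcodes); `encode_deltas` and `lookup_deltas` are hypothetical helpers that only show why a delta-encoded table has to be decoded from the start to find an entry.

```rust
/// Delta-encode (native_offset, wasm_offset) pairs sorted by native offset.
/// Native deltas are non-negative; wasm deltas may be negative, hence i32.
fn encode_deltas(entries: &[(u32, u32)]) -> Vec<(u32, i32)> {
    let mut prev = (0u32, 0u32);
    entries
        .iter()
        .map(|&(native, wasm)| {
            let delta = (native - prev.0, wasm.wrapping_sub(prev.1) as i32);
            prev = (native, wasm);
            delta
        })
        .collect()
}

/// Look up the wasm offset for `pc` by replaying deltas from the beginning.
/// Unlike a sorted table of absolute offsets, this cannot be binary searched.
fn lookup_deltas(deltas: &[(u32, i32)], pc: u32) -> Option<u32> {
    let (mut native, mut wasm) = (0u32, 0u32);
    let mut found = None;
    for &(native_delta, wasm_delta) in deltas {
        native += native_delta;
        if native > pc {
            break; // walked past the target; the previous entry is the answer
        }
        wasm = wasm.wrapping_add(wasm_delta as u32);
        found = Some(wasm);
    }
    found
}
```

Small deltas compress well with a variable-length integer encoding, which is where the size win would come from; the cost is the linear scan in `lookup_deltas`.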


Last updated: Nov 22 2024 at 16:03 UTC