cfallin opened PR #12636 from cfallin:guest-debugging-preserve-bytecode to bytecodealliance:main:
This PR adds logic to embed the original core Wasm module(s) from a compilation into a new ELF section, alongside other metadata sections. When a component is compiled, the core Wasms inside are preserved, accessible by their `StaticModuleIndex`es.

The need for this support arises from the guest-debugger ecosystem. Consider either a debug component (bytecodealliance/rfcs#45) or a bespoke debugger in native code using Wasmtime's APIs. In either case, the existing APIs to introspect execution state provide `Module` references for each instance from each stack frame, and PC offsets into these `Module`s are the way in which breakpoints are configured. The debugger will somehow need to associate these `Module`s with the original Wasm bytecode, including e.g. any custom sections containing the producer-specific ways of encoding debug metadata, to do something useful. In particular also note that the GDB-stub protocol as extended for Wasm requires read access directly to the Wasm bytecode (it shows up as part of a "memory map" that is viewed by the standard read-remote-memory command); we can't delegate this requirement to the remote end of the stub connection, but have to handle it in the stub server that runs inside Wasmtime (as a component or bespoke).

We have two main choices: carry the original bytecode all the way through the Wasmtime compilation pipeline and present it via `Module::bytecode()`, ready to use; or say that this task is out-of-scope and that the debugger top-half can find it on disk somehow.

Unfortunately the latter ("out of scope, find the file") is somewhat at odds with the desired developer experience:

- It means that we need some way of mapping a compiled Wasm artifact back to a source Wasm; absent "here's the full bytecode", that means "here's the path to the full bytecode", but that path is an identifier that may not be universally accessible (consider e.g. capabilities/preopens present for a debugger component) or portable (consider e.g. moving the artifact to a different machine).
  - Or we don't even provide that metadata, and require the user to explicitly specify the same module filename twice -- once to actually run it, and once as an argument to the debugger.
- It means that we should account for stale artifacts and mark the mismatch somehow; e.g. if the user starts debugging with Wasmtime, either from a `.cwasm` on disk or with one produced in-memory just for this run, and then subsequently rebuilds their source `.wasm`, we no longer have a reference for it. (The same problem exists one level up if source code is edited, but source to a Wasm producer toolchain is definitely out-of-scope for Wasmtime.)
- It means that special logic is required in the case of components to map a module back to a specific component section (we would essentially have to expose the static module IDs, then require the debugger top-half to re-implement our exact flattening algorithm to find that core module).

The permissions issue alone was enough to convince me that we should do something better than providing a filename (why should we have to authorize the adapter to read the user's filesystem?) but all of the other benefits -- ensuring an exact match and ensuring perfect availability -- are a nice bonus. The main downside is making the
`.cwasm` larger (possibly substantially so), but this overhead is only present when enabling guest-debugging, the data has to be present anyway, and this is likely not a dealbreaker.
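The stub-server requirement in the description (serving reads of the Wasm bytecode to a remote debugger) can be illustrated with a minimal sketch. This is not code from the PR: `read_bytecode` is a hypothetical helper, and the assumption that a read request has already been resolved to one module's bytecode plus a byte offset is made here only for illustration. It shows a bounds-checked read that returns a short (possibly empty) slice rather than failing when the request runs past the end of the module.

```rust
/// Hypothetical helper (not part of Wasmtime): serve a read of `len` bytes
/// starting at `offset` within one module's preserved bytecode.
/// Out-of-range requests yield a shorter or empty slice instead of panicking.
fn read_bytecode(bytecode: &[u8], offset: u64, len: usize) -> &[u8] {
    // Reject offsets that do not fit in usize or lie past the end.
    let start = match usize::try_from(offset) {
        Ok(s) if s <= bytecode.len() => s,
        _ => return &[],
    };
    // Clamp the end of the read to the end of the bytecode.
    let end = start.saturating_add(len).min(bytecode.len());
    &bytecode[start..end]
}

fn main() {
    // The 8-byte header of any core Wasm module: magic number plus version.
    let wasm: [u8; 8] = [0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00];

    // A read that fits entirely inside the module.
    assert_eq!(read_bytecode(&wasm, 0, 4), &wasm[0..4]);
    // A read that runs off the end is truncated.
    assert_eq!(read_bytecode(&wasm, 6, 16).len(), 2);
    // A read that starts past the end returns nothing.
    assert!(read_bytecode(&wasm, 9, 1).is_empty());
}
```

Because the bytes come from the compiled artifact itself, a handler along these lines would need no filesystem access, which is the permissions argument the description makes.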
cfallin requested dicej for a review on PR #12636.
cfallin requested wasmtime-core-reviewers for a review on PR #12636.
cfallin updated PR #12636.
cfallin updated PR #12636.
cfallin edited PR #12636:
(Stacked on top of #12637; will rebase that one out once landed.)
This PR adds logic to embed the original core Wasm module(s) from a compilation into a new ELF section, alongside other metadata sections. When a component is compiled, the core Wasms inside are preserved, accessible by their `StaticModuleIndex`es.

The need for this support arises from the guest-debugger ecosystem. Consider either a debug component (bytecodealliance/rfcs#45) or a bespoke debugger in native code using Wasmtime's APIs. In either case, the existing APIs to introspect execution state provide `Module` references for each instance from each stack frame, and PC offsets into these `Module`s are the way in which breakpoints are configured. The debugger will somehow need to associate these `Module`s with the original Wasm bytecode, including e.g. any custom sections containing the producer-specific ways of encoding debug metadata, to do something useful. In particular also note that the GDB-stub protocol as extended for Wasm requires read access directly to the Wasm bytecode (it shows up as part of a "memory map" that is viewed by the standard read-remote-memory command); we can't delegate this requirement to the remote end of the stub connection, but have to handle it in the stub server that runs inside Wasmtime (as a component or bespoke).

We have two main choices: carry the original bytecode all the way through the Wasmtime compilation pipeline and present it via `Module::bytecode()`, ready to use; or say that this task is out-of-scope and that the debugger top-half can find it on disk somehow.

Unfortunately the latter ("out of scope, find the file") is somewhat at odds with the desired developer experience:

- It means that we need some way of mapping a compiled Wasm artifact back to a source Wasm; absent "here's the full bytecode", that means "here's the path to the full bytecode", but that path is an identifier that may not be universally accessible (consider e.g. capabilities/preopens present for a debugger component) or portable (consider e.g. moving the artifact to a different machine).
  - Or we don't even provide that metadata, and require the user to explicitly specify the same module filename twice -- once to actually run it, and once as an argument to the debugger.
- It means that we should account for stale artifacts and mark the mismatch somehow; e.g. if the user starts debugging with Wasmtime, either from a `.cwasm` on disk or with one produced in-memory just for this run, and then subsequently rebuilds their source `.wasm`, we no longer have a reference for it. (The same problem exists one level up if source code is edited, but source to a Wasm producer toolchain is definitely out-of-scope for Wasmtime.)
- It means that special logic is required in the case of components to map a module back to a specific component section (we would essentially have to expose the static module IDs, then require the debugger top-half to re-implement our exact flattening algorithm to find that core module).

The permissions issue alone was enough to convince me that we should do something better than providing a filename (why should we have to authorize the adapter to read the user's filesystem?) but all of the other benefits -- ensuring an exact match and ensuring perfect availability -- are a nice bonus. The main downside is making the
`.cwasm` larger (possibly substantially so), but this overhead is only present when enabling guest-debugging, the data has to be present anyway, and this is likely not a dealbreaker.
cfallin updated PR #12636.
cfallin edited PR #12636:
(Stacked on top of #12637; will rebase that one out once landed.)
This PR adds logic to embed the original core Wasm module(s) from a compilation into a new ELF section, alongside other metadata sections, when guest debugging is enabled. When a component is compiled (with debugging), the core Wasms inside are preserved, accessible by their `StaticModuleIndex`es.

The need for this support arises from the guest-debugger ecosystem. Consider either a debug component (bytecodealliance/rfcs#45) or a bespoke debugger in native code using Wasmtime's APIs. In either case, the existing APIs to introspect execution state provide `Module` references for each instance from each stack frame, and PC offsets into these `Module`s are the way in which breakpoints are configured. The debugger will somehow need to associate these `Module`s with the original Wasm bytecode, including e.g. any custom sections containing the producer-specific ways of encoding debug metadata, to do something useful. In particular also note that the GDB-stub protocol as extended for Wasm requires read access directly to the Wasm bytecode (it shows up as part of a "memory map" that is viewed by the standard read-remote-memory command); we can't delegate this requirement to the remote end of the stub connection, but have to handle it in the stub server that runs inside Wasmtime (as a component or bespoke).

We have two main choices: carry the original bytecode all the way through the Wasmtime compilation pipeline and present it via `Module::bytecode()`, ready to use; or say that this task is out-of-scope and that the debugger top-half can find it on disk somehow.

Unfortunately the latter ("out of scope, find the file") is somewhat at odds with the desired developer experience:

- It means that we need some way of mapping a compiled Wasm artifact back to a source Wasm; absent "here's the full bytecode", that means "here's the path to the full bytecode", but that path is an identifier that may not be universally accessible (consider e.g. capabilities/preopens present for a debugger component) or portable (consider e.g. moving the artifact to a different machine).
  - Or we don't even provide that metadata, and require the user to explicitly specify the same module filename twice -- once to actually run it, and once as an argument to the debugger.
- It means that we should account for stale artifacts and mark the mismatch somehow; e.g. if the user starts debugging with Wasmtime, either from a `.cwasm` on disk or with one produced in-memory just for this run, and then subsequently rebuilds their source `.wasm`, we no longer have a reference for it. (The same problem exists one level up if source code is edited, but source to a Wasm producer toolchain is definitely out-of-scope for Wasmtime.)
- It means that special logic is required in the case of components to map a module back to a specific component section (we would essentially have to expose the static module IDs, then require the debugger top-half to re-implement our exact flattening algorithm to find that core module).

The permissions issue alone was enough to convince me that we should do something better than providing a filename (why should we have to authorize the adapter to read the user's filesystem?) but all of the other benefits -- ensuring an exact match and ensuring perfect availability -- are a nice bonus. The main downside is making the
`.cwasm` larger (possibly substantially so), but this overhead is only present when enabling guest-debugging, the data has to be present anyway, and this is likely not a dealbreaker.
github-actions[bot] added the label wasmtime:api on PR #12636.
cfallin updated PR #12636.
dicej requested alexcrichton for a review on PR #12636.
cfallin updated PR #12636.
cfallin edited PR #12636:
This PR adds logic to embed the original core Wasm module(s) from a compilation into a new ELF section, alongside other metadata sections, when guest debugging is enabled. When a component is compiled (with debugging), the core Wasms inside are preserved, accessible by their `StaticModuleIndex`es.

The need for this support arises from the guest-debugger ecosystem. Consider either a debug component (bytecodealliance/rfcs#45) or a bespoke debugger in native code using Wasmtime's APIs. In either case, the existing APIs to introspect execution state provide `Module` references for each instance from each stack frame, and PC offsets into these `Module`s are the way in which breakpoints are configured. The debugger will somehow need to associate these `Module`s with the original Wasm bytecode, including e.g. any custom sections containing the producer-specific ways of encoding debug metadata, to do something useful. In particular also note that the GDB-stub protocol as extended for Wasm requires read access directly to the Wasm bytecode (it shows up as part of a "memory map" that is viewed by the standard read-remote-memory command); we can't delegate this requirement to the remote end of the stub connection, but have to handle it in the stub server that runs inside Wasmtime (as a component or bespoke).

We have two main choices: carry the original bytecode all the way through the Wasmtime compilation pipeline and present it via `Module::bytecode()`, ready to use; or say that this task is out-of-scope and that the debugger top-half can find it on disk somehow.

Unfortunately the latter ("out of scope, find the file") is somewhat at odds with the desired developer experience:

- It means that we need some way of mapping a compiled Wasm artifact back to a source Wasm; absent "here's the full bytecode", that means "here's the path to the full bytecode", but that path is an identifier that may not be universally accessible (consider e.g. capabilities/preopens present for a debugger component) or portable (consider e.g. moving the artifact to a different machine).
  - Or we don't even provide that metadata, and require the user to explicitly specify the same module filename twice -- once to actually run it, and once as an argument to the debugger.
- It means that we should account for stale artifacts and mark the mismatch somehow; e.g. if the user starts debugging with Wasmtime, either from a `.cwasm` on disk or with one produced in-memory just for this run, and then subsequently rebuilds their source `.wasm`, we no longer have a reference for it. (The same problem exists one level up if source code is edited, but source to a Wasm producer toolchain is definitely out-of-scope for Wasmtime.)
- It means that special logic is required in the case of components to map a module back to a specific component section (we would essentially have to expose the static module IDs, then require the debugger top-half to re-implement our exact flattening algorithm to find that core module).

The permissions issue alone was enough to convince me that we should do something better than providing a filename (why should we have to authorize the adapter to read the user's filesystem?) but all of the other benefits -- ensuring an exact match and ensuring perfect availability -- are a nice bonus. The main downside is making the
`.cwasm` larger (possibly substantially so), but this overhead is only present when enabling guest-debugging, the data has to be present anyway, and this is likely not a dealbreaker.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Why ends vs starts? The `append_section_data` method above also returns `u64` which could be used to calculate the start.
alexcrichton created PR review comment:
This can probably make good use of `object::U32Bytes<LittleEndian>` and slice helpers in that crate to directly turn `self.wasm_bytecode_ends()` into `&[U32Bytes<LittleEndian>]` which should make the indexing here easier too.
alexcrichton created PR review comment:
Bikeshed: maybe `debug_bytecode` to be similar to other methods to emphasize how it's only available in debug mode?
alexcrichton created PR review comment:
In the interest of keeping as few things `pub` as possible and that this is only used internally, mind removing `pub`?
cfallin submitted PR review.
cfallin created PR review comment:
Ah, either one works (in the end-is-implicit case we take start of next) except that using starts gets a little hairy when computing the length of the last: in case of section alignment padding, we need to store the length for that one special case (i.e. we need n+1 numbers for n elements). So I usually prefer to store ends when using this idiom as the fencepost on the other end is always 0.
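The end-offset idiom described in this comment can be sketched as follows. This is an illustration, not the PR's code: `decode_ends` and `module_bytes` are hypothetical names, the section layout (all modules concatenated, followed by possible alignment padding) is assumed, and the little-endian table is decoded with the standard library only as a stand-in for the `object::U32Bytes<LittleEndian>` approach suggested in the review.

```rust
/// Hypothetical: decode a table of little-endian u32 end offsets from raw bytes.
/// Returns None if the table length is not a multiple of 4.
fn decode_ends(raw: &[u8]) -> Option<Vec<u32>> {
    if raw.len() % 4 != 0 {
        return None;
    }
    Some(
        raw.chunks_exact(4)
            .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}

/// Hypothetical: `ends[i]` is the offset one past the last byte of module `i`
/// inside `data`. Only n numbers are needed for n modules, because the start
/// of module 0 is always 0 and every other start is the previous end.
fn module_bytes<'a>(data: &'a [u8], ends: &[u32], index: usize) -> Option<&'a [u8]> {
    let end = *ends.get(index)? as usize;
    let start = if index == 0 { 0 } else { ends[index - 1] as usize };
    // `get` returns None for an out-of-bounds or inverted range.
    data.get(start..end)
}

fn main() {
    // Three "modules" followed by four bytes of alignment padding.
    let data = b"AAAABBCCCCCC\0\0\0\0";
    let raw_table = [4u8, 0, 0, 0, 6, 0, 0, 0, 12, 0, 0, 0];
    let ends = decode_ends(&raw_table).expect("table length is a multiple of 4");

    assert_eq!(module_bytes(data, &ends, 0), Some(&b"AAAA"[..]));
    assert_eq!(module_bytes(data, &ends, 1), Some(&b"BB"[..]));
    // The last module stops at its recorded end, so the padding is excluded.
    assert_eq!(module_bytes(data, &ends, 2), Some(&b"CCCCCC"[..]));
    assert_eq!(module_bytes(data, &ends, 3), None);
}
```

With a table of starts instead, the length of the last module would have to be inferred from the section size, which the trailing padding makes wrong; that is the extra number the comment refers to.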
cfallin submitted PR review.
cfallin created PR review comment:
Good call -- we use that in frame table parsing etc but I had avoided it here because I thought we didn't take the dep on `object` in the main `wasmtime` crate already; but (i) we do and (ii) we link environ anyway so that's a silly concern. Thanks!
cfallin updated PR #12636.
cfallin submitted PR review.
cfallin created PR review comment:
Sure, updated; thanks.
cfallin submitted PR review.
cfallin created PR review comment:
Ah, yep, updated.
cfallin has enabled auto merge for PR #12636.
cfallin added PR #12636 Debugging: preserve original Wasm bytecode inside of compiled ELF artifact. to the merge queue
cfallin merged PR #12636.
cfallin removed PR #12636 Debugging: preserve original Wasm bytecode inside of compiled ELF artifact. from the merge queue
Last updated: Feb 24 2026 at 04:36 UTC