alexcrichton opened PR #10034 from `alexcrichton:pulley-profile` to `bytecodealliance:main`:
This commit adds basic support for profiling the Pulley interpreter. This was partially achievable before through native profilers, but the downside of that approach is that, while you can find hot instructions, it's not clear in what context they are being executed nor which functions are hot. The goal of this profiler is to show Pulley bytecode and the time spent in the bytecode itself, to better understand the shape of code around a hot instruction and, for example, to identify new macro opcodes.
The general structure of this new profiler is:

* There is a compile-time feature for Pulley which is off-by-default where, when enabled, Pulley will record its current program counter into an `AtomicUsize` before each instruction.
* When the CLI has `--profile pulley` Wasmtime will spawn a sampling thread in the same process which will periodically read from this `AtomicUsize` to record where the program is currently executing.
* The Pulley profiler additionally records all bytecode through the use of the `ProfilingAgent` trait to ensure that the recording has access to all bytecode as well.
* Samples are taken throughout the process and emitted to a `pulley-$pid.data` file. This file is then interpreted and printed by an "example" program `profiler-html.rs` in the `pulley/examples` directory.

The end result is that hot functions of Pulley bytecode can be seen and instructions are annotated with how frequently they were executed. This enables finding hot loops and understanding more about the whole loop, bytecodes that were selected, and such.
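As a rough illustration of the sampling scheme described above, here is a minimal standalone sketch, not the code from this PR. An "interpreter" loop stores its program counter into an `AtomicUsize` before each instruction, and a sampler thread in the same process periodically reads it and builds a histogram. All names here (`run_interpreter`, `sample_until_stopped`, `profile`) are hypothetical, and the 50µs sampling interval is an arbitrary assumption.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for the interpreter loop: before "executing" each
/// instruction it publishes the current program counter to `pc_slot`.
/// `program_len` must be non-zero.
fn run_interpreter(pc_slot: &AtomicUsize, program_len: usize, iterations: usize) {
    for i in 0..iterations {
        let pc = i % program_len;
        // Relaxed is enough here: the sampler only needs an approximate view.
        pc_slot.store(pc, Ordering::Relaxed);
        // Placeholder for the actual work of the instruction.
        std::hint::black_box(pc);
    }
}

/// Hypothetical sampler: reads `pc_slot` every `interval` until `stop` is set
/// and returns a map from program counter to number of samples observed.
/// At least one sample is always taken.
fn sample_until_stopped(
    pc_slot: &AtomicUsize,
    stop: &AtomicBool,
    interval: Duration,
) -> HashMap<usize, u64> {
    let mut histogram = HashMap::new();
    loop {
        let pc = pc_slot.load(Ordering::Relaxed);
        *histogram.entry(pc).or_insert(0) += 1;
        if stop.load(Ordering::Relaxed) {
            break;
        }
        thread::sleep(interval);
    }
    histogram
}

/// Runs the toy interpreter on this thread while a sampler thread observes it.
fn profile(program_len: usize, iterations: usize) -> HashMap<usize, u64> {
    let pc_slot = Arc::new(AtomicUsize::new(0));
    let stop = Arc::new(AtomicBool::new(false));

    let sampler = {
        let pc_slot = Arc::clone(&pc_slot);
        let stop = Arc::clone(&stop);
        thread::spawn(move || {
            sample_until_stopped(&pc_slot, &stop, Duration::from_micros(50))
        })
    };

    run_interpreter(&pc_slot, program_len, iterations);
    stop.store(true, Ordering::Relaxed);
    sampler.join().expect("sampler thread panicked")
}
```

Calling `profile(8, 1_000_000)` returns a histogram whose keys are program counters in `0..8`. In a real profiler those counts would then be joined against the recorded bytecode to annotate instructions, which this sketch leaves out.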
alexcrichton requested dicej for a review on PR #10034.
alexcrichton requested wasmtime-default-reviewers for a review on PR #10034.
alexcrichton requested wasmtime-core-reviewers for a review on PR #10034.
alexcrichton requested fitzgen for a review on PR #10034.
alexcrichton updated PR #10034.
alexcrichton updated PR #10034.
alexcrichton updated PR #10034.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
I'll note that this change is required to appease `become` because `run`'s signature differed from the handler's signature (`Interpreter`-by-value vs exploded-to-components)
alexcrichton created PR review comment:
I'll note here that we were forgetting this before, which caused the profiler to not work with the tail loop initially.
alexcrichton created PR review comment:
I'll note that this change was done to ensure/guarantee that these three components of `Interpreter` are passed in registers. I was worried about crossing a threshold where if `Interpreter` got too big it would be passed by-ref instead of "exploded" into components like we want.
alexcrichton created PR review comment:
I'll also note that `ExecutingPcRef` is a zero-sized-type when `profile` is disabled, otherwise it's pointer-sized.
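To make the size claim concrete, here is a small sketch under assumed names. It is not the actual `ExecutingPcRef` definition. `NoopPcRef` and `AtomicPcRef` are hypothetical stand-ins for the "profiling disabled" and "profiling enabled" variants, which in the real code would presumably be selected by a compile-time feature rather than defined side by side.

```rust
use std::mem::size_of;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical "profiling disabled" handle: it carries no data, so it is a
/// zero-sized type and costs nothing to pass around.
#[derive(Clone, Copy)]
struct NoopPcRef;

impl NoopPcRef {
    #[inline]
    fn record(&self, _pc: usize) {
        // Intentionally empty: nothing to record when profiling is off.
    }
}

/// Hypothetical "profiling enabled" handle: a single reference to the shared
/// program-counter slot, so it is exactly one pointer wide.
#[derive(Clone, Copy)]
struct AtomicPcRef<'a>(&'a AtomicUsize);

impl AtomicPcRef<'_> {
    #[inline]
    fn record(&self, pc: usize) {
        self.0.store(pc, Ordering::Relaxed);
    }
}

/// Returns (size of the disabled handle, size of the enabled handle) in bytes.
fn handle_sizes() -> (usize, usize) {
    (size_of::<NoopPcRef>(), size_of::<AtomicPcRef<'static>>())
}
```

On any target, `handle_sizes()` yields `0` for the disabled handle and `size_of::<usize>()` for the enabled one, so the disabled variant adds no argument when the interpreter state is passed in registers.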
alexcrichton updated PR #10034.
fitzgen submitted PR review:
Super excited for this!
fitzgen created PR review comment:
This seems like a fairly trivial change in terms of effort, but good payoff in terms of longer-term maintainability. Mind doing it now?
fitzgen created PR review comment:
Makes sense, thanks for the explanation.
alexcrichton updated PR #10034.
alexcrichton has enabled auto merge for PR #10034.
alexcrichton updated PR #10034.
alexcrichton has enabled auto merge for PR #10034.
alexcrichton merged PR #10034.
Last updated: Jan 24 2025 at 00:11 UTC