wasmtime / PR #6282 wasmtime: In-process sampling profiler · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / PR #6282 wasmtime: In-process sampling profiler

Wasmtime GitHub notifications bot (Apr 25 2023 at 01:06):

jameysharp opened PR #6282 from jameysharp:guest-profiler to bytecodealliance:main:

This isn't ready for detailed review but the basic functionality seems to work so I'm putting it up in case anyone wants to play with it.

Unlike the existing profiling options, this works on all platforms and does not rely on any external profiling tools like perf or VTune. On the other hand, it can only profile time spent in the WebAssembly guest, not in Wasmtime itself or other host code. Also it can't measure time as precisely as platform-native tools can.

Wasmtime GitHub notifications bot (Apr 26 2023 at 00:28):

jameysharp updated PR #6282.

Wasmtime GitHub notifications bot (Apr 26 2023 at 00:31):

jameysharp has marked PR #6282 as ready for review.

Wasmtime GitHub notifications bot (Apr 26 2023 at 00:31):

jameysharp requested fitzgen for a review on PR #6282.

Wasmtime GitHub notifications bot (Apr 26 2023 at 00:31):

jameysharp requested wasmtime-core-reviewers for a review on PR #6282.

Wasmtime GitHub notifications bot (Apr 26 2023 at 00:31):

jameysharp requested wasmtime-default-reviewers for a review on PR #6282.

Wasmtime GitHub notifications bot (Apr 26 2023 at 08:54):

bjorn3 created PR review comment:

You can use self.start.elapsed().

Wasmtime GitHub notifications bot (Apr 26 2023 at 14:29):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Apr 26 2023 at 14:29):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Apr 26 2023 at 14:29):

alexcrichton created PR review comment:

Could this be moved into a CLI argument as well?

Wasmtime GitHub notifications bot (Apr 26 2023 at 14:29):

alexcrichton created PR review comment:

I think that the error handling here may want to be a bit more nuanced perhaps. For example if invoke_func failed and this finalization also fails, the original failure will be masked by the writing of the profile.

This concern of mine is additionally shared with the writing of a core dump after a trap, which IIRC will print out an error but won't return the core dump writing error further up the stack. Could the logic there and here be unified perhaps into a sort of "perform a fallible operation post-wasm-execution"?

Wasmtime GitHub notifications bot (Apr 26 2023 at 14:29):

alexcrichton created PR review comment:

The interning and various allocations here make me wonder: would it be better to defer this until after execution? Should this instead buffer up a Vec<WasmBacktrace> which is then all serialized out at the end of execution?

Or perhaps amortizing the cost by collecting N backtraces and then flushing them all at once perhaps.

Wasmtime GitHub notifications bot (Apr 26 2023 at 14:29):

alexcrichton created PR review comment:

Are there any flags or configuration we could perhaps pass to indicate which module a frame comes from? Or perhaps which instance a frame comes from? We don't currently have a means of learning that from a wasm frame but it's something we could theoretically add.

Wasmtime GitHub notifications bot (Apr 26 2023 at 17:36):

jameysharp updated PR #6282.

Wasmtime GitHub notifications bot (Apr 26 2023 at 17:36):

jameysharp submitted PR review.

Wasmtime GitHub notifications bot (Apr 26 2023 at 17:36):

jameysharp submitted PR review.

Wasmtime GitHub notifications bot (Apr 26 2023 at 17:36):

jameysharp created PR review comment:

Nice, I didn't know about Instant::elapsed. Thanks!

Wasmtime GitHub notifications bot (Apr 26 2023 at 17:36):

jameysharp created PR review comment:

I've now spent a bit of time looking at the coredump path and I don't see a nice way to unify this with that. So instead I've changed setup_epoch_handler so the closure it returns doesn't have a return value, forcing it to report and suppress errors internally.

Wasmtime GitHub notifications bot (Apr 26 2023 at 17:36):

jameysharp created PR review comment:

Oh yeah, I intended to do that and forgot. Thanks!

Wasmtime GitHub notifications bot (Apr 26 2023 at 17:36):

jameysharp created PR review comment:

The Firefox profile format supports a lot of metadata that I'm not currently filling in. Much of it isn't currently exposed by the fxprof-processed-profile crate, but there's definitely more available in the current crate that we could be using.

One way we could express module/instance tags today is by creating a CategoryHandle for each one, rather than using the OTHER category.

There are other things I'd like to add too: notably, thread-local CPU time consumed between samples, and a call-hook to add markers to the profile for exit from and re-entry into wasm.

I'd prefer not to add any of these features in this PR though. So I plan to add them as notes in comments while I'm writing documentation for this profiler, and I'll mention module/instance tags too.

Wasmtime GitHub notifications bot (Apr 26 2023 at 17:36):

jameysharp created PR review comment:

I want to move the GuestProfiler type into the wasmtime crate primarily because I want it to be usable by embedders, but an additional reason is because then it's reasonable for it to use wasmtime-runtime's low-level stack-walking API directly. I think there's a lot of work that can be avoided that way.

I don't think batching up WasmBacktrace objects is going to help much, because just collecting one of those involves looking up which function each PC comes from, along with information like the function name. At that point I'd guess three hash-table lookups per function for interning aren't going to make much difference.

I'd like to merge this with the current approach, of eagerly consuming the high-level WasmBacktrace, as-is for now. I don't think the public API of GuestProfiler will need to change to support more efficient collection of backtraces, so I think we can iterate on performance improvements afterward without impacting any users.

I definitely want to improve this over time though. I think eventually we're going to want API changes in fxprof-processed-profile for the highest performance sampling we can do in-process.

Wasmtime GitHub notifications bot (Apr 26 2023 at 20:52):

jameysharp updated PR #6282.

Wasmtime GitHub notifications bot (Apr 26 2023 at 21:51):

jameysharp updated PR #6282.

Wasmtime GitHub notifications bot (Apr 26 2023 at 22:01):

jameysharp updated PR #6282.

Wasmtime GitHub notifications bot (Apr 27 2023 at 14:27):

alexcrichton submitted PR review:

I'm ok landing this personally, but I think we should be clear in the documentation of the type that it's not really at a state of implementation quality to be a high-fidelity general purpose guest profiler. For example some drawbacks I can think of are:

Requires epochs as the driver for stack collection, meaning the granularity of samples is function entries and loop headers. Additionally this means that you're not necessarily measuring the "raw" performance since it's code-based instrumentation.

Stacks are always collected from wasm meaning that any time spent in the host is not accounted for. This means that samples can be lost while in the host and additionally whether or not the host blocked or did compute I think is unclear. (although perhaps this is fixable)

I suspect that a large number of use cases are totally ok with these drawbacks, but at the same time I think it's important to be clear that we're not looking to get into the business of a "real" profiler at this time with off-thread stack sampling, host integration, etc.

Additionally would you also be up for updating the documentation on docs/*.md? Currently we have examples-profiling.md which delegates to perf and vtune and I think it'd be good to add an entry for this strategy of profiling. Ideally there could be at least a small paragraph of words explaining why one might choose one profiling strategy over another (e.g. perf captures everything but is Linux-specific and not always easy to interpret, while this is cross-platform but not as "high fidelity" as perf)

Wasmtime GitHub notifications bot (Apr 27 2023 at 14:27):

alexcrichton submitted PR review:

I'm ok landing this personally, but I think we should be clear in the documentation of the type that it's not really at a state of implementation quality to be a high-fidelity general purpose guest profiler. For example some drawbacks I can think of are:

Requires epochs as the driver for stack collection, meaning the granularity of samples is function entries and loop headers. Additionally this means that you're not necessarily measuring the "raw" performance since it's code-based instrumentation.

Stacks are always collected from wasm meaning that any time spent in the host is not accounted for. This means that samples can be lost while in the host and additionally whether or not the host blocked or did compute I think is unclear. (although perhaps this is fixable)

I suspect that a large number of use cases are totally ok with these drawbacks, but at the same time I think it's important to be clear that we're not looking to get into the business of a "real" profiler at this time with off-thread stack sampling, host integration, etc.

Additionally would you also be up for updating the documentation on docs/*.md? Currently we have examples-profiling.md which delegates to perf and vtune and I think it'd be good to add an entry for this strategy of profiling. Ideally there could be at least a small paragraph of words explaining why one might choose one profiling strategy over another (e.g. perf captures everything but is Linux-specific and not always easy to interpret, while this is cross-platform but not as "high fidelity" as perf)

Wasmtime GitHub notifications bot (Apr 27 2023 at 14:27):

alexcrichton created PR review comment:

I think this'd be a good place to have a doc-block example if you're up for it about how to configure a Store and such.

Wasmtime GitHub notifications bot (Apr 27 2023 at 14:27):

alexcrichton created PR review comment:

This was actually a question I thought of, what happens if you block in a hostcall for a second?

I think you'd get a stack sample, much later, in probably the wrong location (e.g. the next function call after the host import or the loop that's calling the host import). I'm not sure how those really-far-apart stack samples are interpreted by the firefox profiler, but perhaps this CPU time measurement would help?

Wasmtime GitHub notifications bot (Apr 27 2023 at 14:27):

alexcrichton created PR review comment:

API question: instead of requiring an Arc and Mutex here, which technically aren't necessary since there aren't any threads, could this instead be stored in the T of Store<T>? As-is the GuestProfiler::sample method would not work since it would be stored in T but additionally takes a store as a parameter, but I think that could be fixed by taking a&WasmBacktrace as an argument since that's all the store is used for internally.

Wasmtime GitHub notifications bot (Apr 27 2023 at 14:27):

alexcrichton created PR review comment:

Could this be expanded with more information about what to do with the output? For example explaining that it's a JSON blob to upload to https://profiler.firefox.com I think would be good.

Wasmtime GitHub notifications bot (Apr 27 2023 at 14:27):

alexcrichton created PR review comment:

Is it worth perhaps providing a CLI flag to auto-upload the profile and printing a URL that can be clicked on?

Wasmtime GitHub notifications bot (Apr 27 2023 at 17:07):

fitzgen submitted PR review:

Really excited for this!

When profiling is enabled in the CLI, it probably makes sense to write some output after running the Wasm that repeats where the profile file was written to and how to upload it to https://profiler.firefox.com/

Wasmtime GitHub notifications bot (Apr 27 2023 at 17:07):

fitzgen submitted PR review:

Really excited for this!

When profiling is enabled in the CLI, it probably makes sense to write some output after running the Wasm that repeats where the profile file was written to and how to upload it to https://profiler.firefox.com/

Wasmtime GitHub notifications bot (Apr 27 2023 at 17:07):

fitzgen created PR review comment:

Should this method take self? Would it be valid to add more samples after calling finish?

Wasmtime GitHub notifications bot (Apr 27 2023 at 17:07):

fitzgen created PR review comment:

It could be good to have a full example here, if we are just going to expose building blocks rather than a single enable-profiling-and-get-dump-profiles-to-this-file kind of API.

Wasmtime GitHub notifications bot (Apr 27 2023 at 17:07):

fitzgen created PR review comment:

Doesn't matter a ton in this case, but one pattern I've used to make APIs like this slightly harder to get wrong is to return an impl Drop:
struct FinishEpochHandler<F: FnOnce()>(Option<F>);
impl<F: FnOnce()> Drop for FinishEpochHandler<F> {
    fn drop(&mut self) {
        if let Some(f) = self.0.take() {
            f();
        }
    }
}
return FinishEpochHandler(Some(move || {
    // what you have now...
}));
This way, callers don't need to actually remember to call the resulting function, at most they just need to add a scope.

But I guess if you do Alex's suggestion of moving the profiler into Store, then this result function would require a store argument, and the RAII/impl Drop approach won't really work anymore...

Wasmtime GitHub notifications bot (Apr 27 2023 at 17:59):

jameysharp created PR review comment:

"Wait, I could just keep it in the Store," I thought, as I fell asleep last night; then wake up to find this comment. :laughing:

I don't want to make WasmBacktrace part of the public API here because I want to re-implement this in terms of the lower-level wasmtime-runtime stack-walking API. That doesn't directly need access to the store at all, but I do need some information from the store at some point to figure out how native PCs map to individual modules and to functions within those modules.

It might work though to pass in the store when constructing a GuestProfiler, collect just the necessary information about all the modules and symbols from it, and then not need it in sample.

Otherwise, the only alternative I've thought of is to define fn sample(store: impl AsContextMut, get_self: impl FnOnce(&mut Store) -> &mut Self). Not the nicest interface but I think it should work, right?

Wasmtime GitHub notifications bot (Apr 27 2023 at 19:16):

jameysharp created PR review comment:

I've only been using the "local file" interface, so I didn't realize there was a way to upload a profile and share a link. I guess the way to do that is documented at https://github.com/firefox-devtools/profiler/blob/main/docs-developer/loading-in-profiles.md.

I don't see any HTTP client usage anywhere in Wasmtime except in wasi-http right now. So that seems like a relatively large change that I'd prefer to think about separately from the basic profiling implementation. It would definitely be nice for usability, though.

Wasmtime GitHub notifications bot (Apr 27 2023 at 19:54):

jameysharp created PR review comment:

If you call finish again it'll just update the thread end-time and write out the whole profile again, including any new samples. So it's probably not useful to add more samples after calling finish, but it should be harmless.

That said, I only did it this way because there's still an extra reference to the GuestProfiler in the epoch callback after the instance finishes, and I didn't want to have to clone the profile. But if I can make Alex's suggestion work, of keeping the GuestProfiler in the store instead of in an Arc, then that won't be an issue and I could make the change you're suggesting too.

Wasmtime GitHub notifications bot (Apr 27 2023 at 21:06):

alexcrichton created PR review comment:

Ah ok yeah if it's not easy to reuse whatever is in wasi-http I'm happy to leave this as-is

Wasmtime GitHub notifications bot (Apr 27 2023 at 21:09):

alexcrichton created PR review comment:

How about:
impl<T> Store<T> {
    pub fn sample(&mut self, get_profiler: impl FnOnce(&mut T) -> &mut GuestProfiler) { ... }
}
with similar impls for StoreContextMut?

which is more-or-less the same thing, but just as a method instead of a function on GuestProfiler now that I reread your comment

Wasmtime GitHub notifications bot (Apr 29 2023 at 01:09):

jameysharp updated PR #6282.

Wasmtime GitHub notifications bot (May 01 2023 at 22:10):

jameysharp updated PR #6282.

Wasmtime GitHub notifications bot (May 01 2023 at 22:37):

jameysharp updated PR #6282.

Wasmtime GitHub notifications bot (May 02 2023 at 00:45):

alexcrichton submitted PR review:

Sounds like a good plan to me :+1:

Wasmtime GitHub notifications bot (May 02 2023 at 16:25):

jameysharp merged PR #6282.

Last updated: Apr 16 2025 at 23:03 UTC