jameysharp commented on issue #6282:
This PR now produces output that works as expected with https://profiler.firefox.com/, and I don't think it's fundamentally missing anything.
Before merging, I'd like to move the
GuestProfiler
struct somewhere in thewasmtime
crate, instead of leaving it inwasmtime-cli
where onlywasmtime run
can use it.I don't think any of the implementation will change though, just move elsewhere, so I'm marking this as ready for review.
github-actions[bot] commented on issue #6282:
Subscribe to Label Action
cc @kubkon, @peterhuene
<details>
This issue or pull request has been labeled: "wasi", "wasmtime:api"Thus the following users have been cc'd because of the following labels:
- kubkon: wasi
- peterhuene: wasmtime:api
To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.
Learn more.
</details>
jameysharp commented on issue #6282:
I got rid of the need for
Store
insample()
. Then I discovered that I can't put aGuestProfiler
directly in theHost
structure because some of the WASI extensions require thatHost
must be cloneable.As far as I can tell I end up needing an
Arc
andMutex
anyway. I would love for somebody to tell me how to avoid that.It was still worth doing this because now any particular PC is only looked up once per profile and cached by
fxprof_processed_profile
. Different PCs within the same function need to be looked up separately but since we only collect stack traces at epoch interruption points there aren't so many distinct PCs. This should significantly reduce profiling overhead but I haven't measured how much impact it actually has.I struggled with this a bunch because I was trying to make it automatically use all modules which have been registered within the current
Store
'sModuleRegistry
. But for WASI commands the module isn't registered in the store until you actually try to invoke it, and I tried all sorts of ways to hook into that registration before realizing that the caller knows which modules they care about and I can just make it their problem to tell me.Then I decided that it's a feature to be able to exclude modules from profiles: if you have some proprietary module that you're linking with your customer's modules, you may not want to reveal anything about the implementation of your module when providing them with a profile.
alexcrichton commented on issue #6282:
As far as I can tell I end up needing an Arc and Mutex anyway.
Ah this is due to the integration of
wasi-threads
where that's modeled by cloning the store's context to duplicate it on another thread. Otherwise though the cloning is not required. You can take a similar route as theWasiNnCtx
integration, for example, which additionally does not implementClone
so what it does is it's bundled into anArc
where whenever it's required it callsArc::get_mut
. That fails if there's more than one clone, or more than one thread, and the error message says "this doesn't support threads yet". I think that profiling will fall into the same bucket?In the long-term future I could imagine that there's a standalone profiler struct where you acquire thread-specific profilers from that structure. The
Host
could store the standalone profiler (which internally probably has an arc and a mutex) and additionally thread-specific profilers. TheClone for Host
impl would then use the standalone profiler to get a new thread-specific profiler, allowing each thread to have mutable access to its profiler but no others. Something like that. No need to do here of course, mostly just modeling how this may look in the future with support for threads.I can just make it their problem to tell me.
This seems like a neat idea to me! I like the idea of precomputing symbol tables. I do think this has the downside of that it has more setup required, but that's also perhaps not a bad thing and it's not exactly onerous at this time, so it seems fine by me.
I got rid of the need for Store in sample().
Personally I'm a bit wary of this because to me I wouldn't want to commit long-term to being able to capture a stack trace without any input. I'm not sure how we'll ever move away from those thread locals but I sure would love to one day if someone gets a good idea. In such a situation there's no way to start the backtrace without a contextual store being passed in. All that being said though it's not like we need to get the API 100% prefect to begin with, that's why we have major version bumps after all. I mostly wanted to put this out there, but I additionally think it's fine to land as-is with your current strategy. If and when we remove TLS (fingers crossed) we can figure out how to change this API at this point. I realize that passing the store into something stored within the store is not easy, so let's stick with what you've got.
jameysharp commented on issue #6282:
Thanks for the tips, Alex: that worked!
Okay, I've addressed most of the issues that you all have raised. What I haven't done is added documentation to the Wasmtime book, or API usage examples to the doc-comments. I'd kind of like to merge the implementation and then come back to that, if that's okay. I hope the doc-comments I've written so far will be a good start for the book, and the usage in
run.rs
will be a good start for the usage examples…There are also still several to-do items, like measuring CPU time and noting wasm entry/leave events, which I also want to defer to future work.
Last updated: Nov 22 2024 at 16:03 UTC