Hi! I've tried the jitdump support in our embedding today, and I can say that something works \o/ I can profile with `perf record -k 1`, then see some guest symbols show up in hotspot caller/callee trees :partying_face:
However, I'm seeing very little, and most time is still spent under unknown symbols. Our embedding will load multiple wasm modules at the same time, and it seems that only one wasm module's symbols will be injected into the jitdump, as most of the information from other modules isn't available at all.
Is that expected/known? (I will investigate a bit more tomorrow, if no one knows!)
I know at least personally I've never exercised multiple modules, but otherwise with `perf` I rarely see unknown symbols, so my guess is you're running into a bug with multiple modules somehow (not that I know how though)
I've found the reason: we're building one `Engine` per wasm module, and each jitdump profiler instance will recreate (thus clobber) the jitdump file with the same id every time. I'll look into reusing the same `Engine` in our codebase, but wonder if we shouldn't still support having multiple jitdumps, or reuse the same jitdump profiler instance across all `Engine` instances.
Oh good point! I'd lean towards unique files for each engine if that works but one global file also seems reasonable
Ok, I've tried suffixing the jitdump file name with a globally incremented atomic counter... then perf doesn't pick it up anymore. I've tried to work out why, and whether it was related to the mmapping done so that perf detects those files, but with no success. So I've used a single jitdump file for the entire process, and it seems to work now!
Ah, looks like `perf` requires there to be only one file per process, since for any given process it'll only recognize one filename
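For illustration, here is a rough sketch of the "one file for the whole process" idea, so that a second `Engine` appends instead of recreating the file. Everything in it is hypothetical: `SHARED_DUMP`, `dump_path` and `append_record` are made-up names, the `jit-<pid>.dump` file name is an assumption based on the usual jitdump naming, and the records are opaque bytes rather than real jitdump records. It is not how wasmtime's profiling agent is written.

```rust
use std::fs::File;
use std::io::{self, Write};
use std::path::{Path, PathBuf};
use std::sync::Mutex;

// Hypothetical process-wide state: a single dump file shared by every engine.
static SHARED_DUMP: Mutex<Option<File>> = Mutex::new(None);

/// Hypothetical naming helper: one file name per process, keyed on the pid
/// (assumed to follow the usual `jit-<pid>.dump` convention).
fn dump_path(dir: &Path) -> PathBuf {
    dir.join(format!("jit-{}.dump", std::process::id()))
}

/// Appends an opaque record to the shared file.
/// The file is created (and truncated) only on the first call in the process,
/// so later callers never clobber what earlier ones wrote.
fn append_record(dir: &Path, record: &[u8]) -> io::Result<()> {
    let mut guard = SHARED_DUMP.lock().unwrap();
    if guard.is_none() {
        *guard = Some(File::create(dump_path(dir))?);
    }
    let file = guard.as_mut().expect("initialized above");
    file.write_all(record)
}
```

Each engine would call `append_record` with the same directory. The mutex keeps records from different engines from interleaving, at the cost of serializing writes.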
Ok, after running for 5 hours and not completing (at 15% utilization of a single core :pensive:) the `perf inject` step is definitely taking too long, unfortunately. Either need to rewrite it in Rust and make it massively parallel, or I shall try vtune instead.
It generated 179153 .so files lol
Yeah it generates a `*.so` per-jit-function, not exactly efficient...
Wasn't there another dumb way to map code regions to symbols? I do recall something like this for Spidermonkey, where we'd generate a very simplistic file. At the scale of our wasm module, having a dumb mode like this would be pretty sweet :thinking:
I'm not aware myself, but if there's an easier option than `perf` I think that'd be great to have implemented!
Oh I was talking about the simpler `perf` support that just assigns symbol names to code regions, with little granularity. Basically a file that contains lines that are `{function start address} {code size} {symbol name}`. That was used in the past in Spidermonkey (now SM also uses jitdump!), and I recall it was pretty effective, especially when there are many functions, so I'm tempted to try to implement this, as an additional profiling agent impl.
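A minimal sketch of what writing that simple format could look like, assuming the usual perf map conventions that are not spelled out in this thread: address and size written in hex without a `0x` prefix, and the file placed at `/tmp/perf-<pid>.map`. `CodeRegion`, `perf_map_line`, `write_perf_map` and `perf_map_path` are hypothetical names for illustration, not an existing wasmtime or cranelift API.

```rust
use std::io::{self, Write};

/// One JIT-compiled function: start address, code size in bytes, symbol name.
/// (Hypothetical type, for illustration only.)
struct CodeRegion<'a> {
    start: u64,
    size: u64,
    name: &'a str,
}

/// Formats one `{start} {size} {name}` line.
/// Assumption: both numbers are lowercase hex without a `0x` prefix.
fn perf_map_line(region: &CodeRegion) -> String {
    format!("{:x} {:x} {}\n", region.start, region.size, region.name)
}

/// Writes one line per region to any writer (a file, a buffer, ...).
fn write_perf_map<W: Write>(out: &mut W, regions: &[CodeRegion]) -> io::Result<()> {
    for region in regions {
        out.write_all(perf_map_line(region).as_bytes())?;
    }
    out.flush()
}

/// Assumed location where perf looks for the map of the current process.
fn perf_map_path() -> String {
    format!("/tmp/perf-{}.map", std::process::id())
}
```

In an embedding, something like `write_perf_map(&mut std::fs::File::create(perf_map_path())?, &regions)` would be called as functions get compiled. With this format there is no separate injection step, which is the appeal when there are hundreds of thousands of functions.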
Cranelift-jit already supports it. This format doesn't handle reusing memory locations between functions though, so you would have to leak all modules.
Interesting, thanks. Turns out wasmtime-jit doesn't use cranelift-jit, but the format is simple enough that it might be fun trying it out separately.
Last updated: Nov 22 2024 at 16:03 UTC