Hello!
What would making Cranelift support PGO involve? Would the current infrastructure with perf/VTune suffice?
About the second question: as I understand it, perf and VTune would suffice, but they are currently only capable of profiling JIT-compiled modules. Actually, using libcalls, making a simple profiling scheme wouldn't be that bad, but it would probably be redundant.
What do you think?
it depends what kind of transforms you wanted to do informed by the profiling
probably you would want to have a profiling mode of compilation that inserts counters for which way conditional branches go, which function pointer is used as the callee of indirect calls, etc
and then you need a system for mapping the results of those counters back into CLIF so that you can actually use the results in the following PGO build
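A minimal sketch of what the collected data could look like, assuming each instrumented branch or indirect call gets a stable ID when the counters are inserted. The names `SiteId`, `Profile`, and their methods are hypothetical and not part of Cranelift; the point is only that a stable per-site key is what lets the counts be matched back to CLIF in the later PGO build.

```rust
use std::collections::HashMap;

/// Hypothetical stable identifier for an instrumented site: a function index
/// plus a per-function site index assigned when the counters are inserted.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct SiteId {
    pub func: u32,
    pub site: u32,
}

/// How often a conditional branch went each way.
#[derive(Default, Debug)]
pub struct BranchCounts {
    pub taken: u64,
    pub not_taken: u64,
}

/// Profile data gathered by an instrumented build (illustrative only).
#[derive(Default, Debug)]
pub struct Profile {
    branches: HashMap<SiteId, BranchCounts>,
    /// For each indirect call site: callee address -> number of calls.
    callees: HashMap<SiteId, HashMap<u64, u64>>,
}

impl Profile {
    pub fn record_branch(&mut self, site: SiteId, taken: bool) {
        let counts = self.branches.entry(site).or_default();
        if taken {
            counts.taken += 1;
        } else {
            counts.not_taken += 1;
        }
    }

    pub fn record_indirect_call(&mut self, site: SiteId, callee: u64) {
        *self.callees.entry(site).or_default().entry(callee).or_insert(0) += 1;
    }

    /// Fraction of executions in which the branch was taken,
    /// or `None` if the site was never recorded.
    pub fn taken_ratio(&self, site: SiteId) -> Option<f64> {
        let counts = self.branches.get(&site)?;
        let total = counts.taken + counts.not_taken;
        if total == 0 {
            None
        } else {
            Some(counts.taken as f64 / total as f64)
        }
    }

    /// Most frequent callee at an indirect call site as `(address, count)`.
    /// Ties are broken in favour of the lowest address.
    pub fn hottest_callee(&self, site: SiteId) -> Option<(u64, u64)> {
        self.callees
            .get(&site)?
            .iter()
            .map(|(&addr, &n)| (addr, n))
            .max_by_key(|&(addr, n)| (n, std::cmp::Reverse(addr)))
    }
}
```

A PGO build could then ask, for example, `profile.taken_ratio(site)` to decide block layout, or `profile.hottest_callee(site)` to decide whether speculating on an indirect call is worthwhile.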
Yes, exactly. I don't know Cranelift too well, I've only read a tiny bit of its code. Do all of these have to be written?
Anyway, it seems I misunderstood the code. It looks like the vtune feature only profiles the backend itself, and not the code it is running. If that's actually the case and I'm not double-misunderstanding, then my plan would be to have a profiling pass and add some libcalls to initialize the profiling and log the results (there you don't have to map back to CLIF, because the exact spots would be marked for you). The PGO passes would then use that metadata.
The profiling pass would, as you said, just add counters for every function, branch, etc.
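A rough sketch of the runtime side of that plan, assuming the profiling pass emits a call to a counter-bumping function at each marked spot. The function names `prof_count`, `prof_reset`, and `prof_dump`, and the fixed table size, are assumptions made for illustration; none of this exists in Cranelift or Wasmtime today.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Assumed upper bound on the number of instrumented sites.
const NUM_COUNTERS: usize = 1024;

// A const item is used so the array can be initialised with a non-Copy type.
const ZERO: AtomicU64 = AtomicU64::new(0);
static COUNTERS: [AtomicU64; NUM_COUNTERS] = [ZERO; NUM_COUNTERS];

/// Hypothetical libcall target: instrumented code would call this with the
/// ID assigned to the function entry or branch it just executed.
/// Out-of-range IDs are ignored rather than panicking across the C ABI.
pub extern "C" fn prof_count(id: u32) {
    if let Some(counter) = COUNTERS.get(id as usize) {
        counter.fetch_add(1, Ordering::Relaxed);
    }
}

/// Hypothetical initialisation hook: clears all counters.
pub extern "C" fn prof_reset() {
    for counter in COUNTERS.iter() {
        counter.store(0, Ordering::Relaxed);
    }
}

/// Returns `(id, count)` for every site that was hit at least once,
/// in ascending ID order, ready to be logged or written to a file.
pub fn prof_dump() -> Vec<(u32, u64)> {
    COUNTERS
        .iter()
        .enumerate()
        .filter_map(|(i, counter)| {
            let n = counter.load(Ordering::Relaxed);
            if n == 0 {
                None
            } else {
                Some((i as u32, n))
            }
        })
        .collect()
}
```

Because the IDs are handed out by the pass itself, the dump can be read back directly by a later PGO build without any address-to-CLIF mapping step.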
none of these things exist today, it all would need to be implemented
I do think there's one small misunderstanding; Wasmtime's profiling support does profile the code that Cranelift generates, not just the implementation of Wasmtime and Cranelift themselves.
If so, https://docs.wasmtime.dev/examples-profiling-vtune.html is out of date (apart from me not grasping the codebase yet)
@ghostway can you point to where things are out of date? AFAICT that page does describe profiling the generated code (see e.g. the screenshot where "dynamic code" is on the stack)
Yes, the "dynamic code" exists, but I'd imagine you can't browse what's in it? You don't have debug symbols it can open AFAIK. But I'd like to be wrong
I've seen it work before -- it might require DWARF to be turned on for line numbers, but at least function names are present
(in perf in my case, but the screenshot of VTune seems to show a function name "fib")
Oh, indeed they do! I've missed it. If it is possible to get from that to clif and it logs all the necessary things (branches etc), then that would suffice
going to CLIF is one extra step; in principle it seems like it should be possible to carry markers of some sort through, though
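One way such markers might work, sketched under the assumption that the compiler emits a table of (machine-code offset, CLIF instruction index) pairs alongside the code. `Marker`, `MarkerTable`, and `attribute_samples` are hypothetical names for illustration, not an existing Cranelift or Wasmtime API. Sampled program counters from perf or VTune are then attributed to the nearest marker at or before them.

```rust
use std::collections::BTreeMap;

/// Hypothetical marker: the machine-code offset at which the code for a
/// given CLIF instruction starts.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Marker {
    pub code_offset: u32,
    pub clif_inst: u32,
}

/// Markers for one function, kept sorted by code offset.
pub struct MarkerTable {
    entries: Vec<Marker>,
}

impl MarkerTable {
    pub fn new(mut entries: Vec<Marker>) -> Self {
        entries.sort_by_key(|m| m.code_offset);
        Self { entries }
    }

    /// Finds the CLIF instruction covering `offset`: the last marker whose
    /// offset is <= `offset`. Returns `None` before the first marker.
    pub fn lookup(&self, offset: u32) -> Option<u32> {
        let idx = self.entries.partition_point(|m| m.code_offset <= offset);
        idx.checked_sub(1).map(|i| self.entries[i].clif_inst)
    }
}

/// Turns a list of sampled code offsets into per-instruction hit counts.
/// Samples that fall before the first marker are dropped.
pub fn attribute_samples(table: &MarkerTable, samples: &[u32]) -> BTreeMap<u32, u64> {
    let mut hits = BTreeMap::new();
    for &pc in samples {
        if let Some(inst) = table.lookup(pc) {
            *hits.entry(inst).or_insert(0u64) += 1;
        }
    }
    hits
}
```

This only recovers sample counts per instruction; branch directions and indirect-call targets would still need either hardware branch records or inserted counters.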
If you were to make something like llvm's compiler-rt, do you have to take that trip from rust->wasm->clif (for sanitizers etc)? For profiling (if what's present currently isn't enough), you'd have to compile to wasi (which isn't that bad, probably)
Anyway, profiling in AOT would require that library/cranelift-rt (a cool name, isn't it?) to access internal structures, for example the location of the profiling data (probably a special section, but the library still has to get its address). So this might require some big changes. Alternatively, only JIT profiling could be supported, which can then use VMContext and forward the data from there
I'm not sure I understand this question.
With cg-clif you can go directly from rust->clif. If you need something like compiler-rt there, you can emit additional clif functions and call them. If clif can't do what you need, you can use some form of linker to pull in native code.
I don't know what profiling should look like with cg-clif. I think it might be possible to implement PGO entirely inside cg-clif, without any help from Cranelift. I think all of Wasmtime's current profiling support is also outside of Cranelift.
If you're looking at a VMContext then I think you're looking at Wasmtime specifically, which passes a VM context struct pointer to every function it emits whether in JIT or AOT mode. (Even compiled AOT, Wasmtime-generated binaries can't run outside of the Wasmtime runtime.)
With cg-clif (I imagine you mean the Cranelift backend for rust) it would be a lot easier -- I just ruled it out as something that shouldn't be used. I don't think, abstractly, you should implement PGO in the frontend. It would create some spaghetti.
Does cg-clif respect the #[link_section] attribute? (It does!) And if so, does it map to a module in wasm (that I can then get the data from)? I tried this, but it didn't work (bug?): I couldn't see the increment or the module, and the global static variable was never incremented.
About the VMContext thing, that's cool! I've already implemented something there (just a vector of counters in StoreData), but I haven't tested it yet and probably won't be able to work on it this week
Cranelift currently doesn't have a wasm backend. Only a wasm frontend.
Rust's #[link_section] doesn't map to anything in WebAssembly except in the linking custom section, which only exists in wasm object files to be passed to a linker and not in the final wasm module. Instead, Rust has the #[wasm_import_module] attribute to map to imported modules.
And yes, cg_clif is the cranelift backend for rustc.
Last updated: Jan 24 2025 at 00:11 UTC