Is there a convenient way to see the disassembly of a function generated using cranelift-jit and cranelift-frontend? I can easily see the Cranelift IR using FunctionBuilder::display, but I would like to see what instructions come out after optimization and compilation.
@Veverak there's no API to do this programmatically, but (i) clif-util -D uses capstone to show disassemblies of compilations of either .clif or .wasm inputs, and (ii) if you set your log level to 'trace' (RUST_LOG=trace) with a binary that has log output set up (wasmtime does, for example), you'll see a bunch of info fly by, including final VCode for functions
also, if you're building your own JIT on top of Cranelift then it might be worthwhile to hook up capstone to show disassemblies, imho, so that you can get just that without all the other debug spew
(happy to take suggestions on ways we could improve the API here to provide something better!)
Can you provide a link to clif-util?
@Veverak it's a utility binary that is built with Cranelift; cargo build --release -p cranelift-tools should give you target/release/clif-util
if you have a .clif file, you can run clif-util compile --target x86_64 -D file.clif (or s/x86_64/aarch64/ if desired)
Using a powerful disassembler like Capstone seems a bit odd. I suppose Cranelift already has internal state that could be used to pretty-print the generated instructions with little effort without having to use reverse-engineering techniques.
Indeed, that's the VCode option I mentioned above
Currently the pretty-printing for that isn't exposed in the external API but I'm happy to look at a PR that does so!
the reason I didn't mention it as the first option is that it is not exactly 1:1 with the final machine code; for example there is late editing that happens in MachBuffer to resolve control-flow and simplify branches. The only state that fully represents the final result is the machine code itself, so starting from that is truly the best option if you want the precise disassembly
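To illustrate why the pre-emission view and the final code can differ, here is a toy sketch of branch-chain collapsing and fallthrough elision. It is not Cranelift's MachBuffer implementation; the types `Term` and `Block` and the functions `is_trampoline`, `resolve`, and `emit` are hypothetical names invented for this example, and it assumes the input has no cycle made up only of empty jump blocks.

```rust
/// Toy block terminator: an unconditional jump to a block index, or a return.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Term {
    Jump(usize),
    Ret,
}

/// Toy basic block: straight-line instructions (as text) plus a terminator.
struct Block {
    body: Vec<&'static str>,
    term: Term,
}

/// A block that does nothing except jump onward. The entry block (index 0)
/// is never treated as one, so the output always starts at the entry.
fn is_trampoline(blocks: &[Block], i: usize) -> bool {
    i != 0 && blocks[i].body.is_empty() && matches!(blocks[i].term, Term::Jump(_))
}

/// Follow a chain of jump-only blocks to the block that does real work.
fn resolve(blocks: &[Block], mut target: usize) -> usize {
    // Bounded so that malformed input cannot loop forever.
    for _ in 0..blocks.len() {
        if !is_trampoline(blocks, target) {
            break;
        }
        let term = blocks[target].term;
        if let Term::Jump(next) = term {
            target = next;
        }
    }
    target
}

/// Print the blocks in layout order, dropping jump-only blocks and omitting
/// any jump whose resolved target is the block laid out immediately after.
fn emit(blocks: &[Block]) -> Vec<String> {
    let kept: Vec<usize> = (0..blocks.len())
        .filter(|&i| !is_trampoline(blocks, i))
        .collect();
    let mut out = Vec::new();
    for (pos, &i) in kept.iter().enumerate() {
        out.push(format!("block{}:", i));
        for inst in &blocks[i].body {
            out.push(format!("  {}", inst));
        }
        match blocks[i].term {
            Term::Ret => out.push("  ret".to_string()),
            Term::Jump(t) => {
                let t = resolve(blocks, t);
                // A jump to the next emitted block is just a fallthrough.
                if kept.get(pos + 1) != Some(&t) {
                    out.push(format!("  jmp block{}", t));
                }
            }
        }
    }
    out
}
```

With four blocks where block0 jumps to block1, block1 and block2 are empty jump-only blocks, and block3 returns, `emit` prints only block0 and block3 and no jump at all, which is the kind of difference between a pretty-printed intermediate form and the final bytes described above.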
I guess it doesn't matter to me whether it exactly represents the final machine code. I just like to get an idea of what optimizations are applied. Is there a way I can set up log output in my own JIT?
Yep, if you want to log all the code that is generated via the VCode pretty-printer, this log::trace!() call is printing the output that you want; so, probably the best way to expose that is to either pass in a Cranelift option to emit at a higher log-level (so you don't have to see all the trace-level output, which is extremely verbose), or add an API to return a String given the final compiled function
... and with that I am disappearing for now but I'm happy to review a PR to do one of the above if you decide to go that way!
I guess the part about simplifying branches is important after all. The VCode contains a lot of blocks that only contain one instruction that jumps to the next block. It would be nice to have some API to get the final optimized code but as structured data rather than a blob, and with a source map, allowing to make a tool like godbolt.org. I'm still getting used to Cranelift, so don't expect a PR from me, yet.
I agree that that would be a very nice facility to have! Unfortunately providing a "structured data" view of the final assembly is a bit beyond the design of the compiler backend: the final code emission intentionally does not build a data structure in memory that represents the (post-branch-simplification) code before emitting machine code, because it's not necessary and not doing so reduces the cost of emission.
This is why I recommended hooking up Capstone above if one wants the final disassembly -- to come back to what you said earlier:
Using a powerful disassembler like Capstone seems a bit odd. I suppose Cranelift already has internal state that could be used to pretty-print the generated instructions with little effort
it's exactly because we don't have this internal state that we're faster than otherwise at emitting code, but the tradeoff is that one has to pay more cost to build that internal state post-hoc if one wants it.
In theory one could use the debug location info we emit to make a Godbolt-like UI with correspondences to original source lines; that actually sounds like a really useful tool, if someone were to build it. Happy to help answer questions and/or work out ways to expose additional information as needed if you're hoping to do so!
It's interesting to hear that Cranelift has been made faster by not maintaining enough detail about the generated code to allow pretty-printing it. I obviously don't know all the details, but I suspect that in any case, there may be a clever way around this. Maybe instead of making the main state more descriptive, a layer of annotations could be kept besides it, and this layer can be turned off when not needed. I guess this is already how debug location info works, and that the format of this could be adjusted to allow listing the instructions at each byte without having to disassemble them. What's the API to get the debug location info?
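The "layer of annotations that can be turned off" idea can be sketched roughly as follows. This is a toy emitter, not a Cranelift API: `Emitter`, `emit`, and `listing` are hypothetical names, and the sketch ignores any later in-place editing of already-emitted bytes, which a real design would have to keep in sync. The text for each instruction is produced by a closure so that nothing is formatted when annotations are disabled.

```rust
/// Toy byte emitter with an optional side table of annotations.
struct Emitter {
    bytes: Vec<u8>,
    /// (start offset, instruction text) pairs; `None` when annotations are off.
    annotations: Option<Vec<(usize, String)>>,
}

impl Emitter {
    fn new(annotate: bool) -> Self {
        Emitter {
            bytes: Vec::new(),
            annotations: if annotate { Some(Vec::new()) } else { None },
        }
    }

    /// Append an encoded instruction. `text` is only called when the
    /// annotation layer is enabled, so the fast path does no formatting.
    fn emit(&mut self, encoding: &[u8], text: impl FnOnce() -> String) {
        if let Some(notes) = self.annotations.as_mut() {
            notes.push((self.bytes.len(), text()));
        }
        self.bytes.extend_from_slice(encoding);
    }

    /// Produce "offset: bytes  text" lines without disassembling anything.
    fn listing(&self) -> Vec<String> {
        let notes = match &self.annotations {
            Some(n) => n,
            None => return Vec::new(),
        };
        notes
            .iter()
            .enumerate()
            .map(|(i, (start, text))| {
                // An instruction ends where the next one starts.
                let end = notes.get(i + 1).map_or(self.bytes.len(), |(next, _)| *next);
                let hex: Vec<String> = self.bytes[*start..end]
                    .iter()
                    .map(|b| format!("{:02x}", b))
                    .collect();
                format!("{:04x}: {}  {}", start, hex.join(" "), text)
            })
            .collect()
    }
}
```

Emitting the x86-64 encodings `b8 01 00 00 00` (`mov eax, 1`) and `c3` (`ret`) with annotations on yields a two-line listing; with annotations off, the same calls only append bytes.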
Hi @Veverak, yes, it's possible we could do something like that!
I'll note that such a project would need some careful design in the way that it interacts with MachBuffer (which is the code that mutates branches and removes redundant ones during binary emission), and would need some careful consideration in the way that it keeps in sync with the emitted binary code. I'd be happy to talk more design details with anyone who's interested in taking this up...
Some history might be useful context too. The original intent of MachInst was to be more-or-less exactly the assembler-input data structure that you're requesting: one MachInst for one machine instruction, with passes before final emission to get all the instructions into emittable shape. (Then the VCode pretty-printed output was exactly the final assembly.) In fact in the original aarch64 backend I was testing the VCode-emitted bytes by comparing to the gas assembly of the VCode pretty-printing.
But two issues came up that caused the divergence:
MachBuffer instead, which edits code in-place in a way that doesn't require an extra pass and is guaranteed to collapse long branch-chains. The impact of this though is that each MachInst that is a branch is always an N-successor branch, and the conversion to a machine-style taken/fallthrough is only visible in the machine code.
So, both efficiency and correctness concerns led us to this design; in other words, there are good reasons why we don't build intermediate data structures that exactly represent the final instructions before we emit them. (I'll note that "emit the bytes directly while traversing something" is not an uncommon design choice in JITs in general, too, for speed reasons.)
The last thing I would say is regarding software-maintenance overhead: if we guarantee we can pretty-print in a way that exactly corresponds to machine code, that's another thing we have to test, and get right whenever we add instructions or instruction sequences. Not the end of the world, but a cost nonetheless. Sometimes saying "just use Capstone if you need that" is actually the right decision from an overall cost perspective -- it significantly simplifies the overall design (the compiler just outputs bytes; pretty-printing is a separate bytes-to-text pipeline). "Speed of pretty-printing" was not a top-level goal when we designed the current backend; that probably helps illuminate some of the tradeoffs we have made :-)
Anyway, that's more-or-less my complete braindump on the topic; anyone who wants to suggest an improved design is very welcome to do so and I'd be happy to point to the relevant bits of code that would be involved in implementation!
It's an interesting potential improvement to keep in mind for the future, but for now, as you say, “just use Capstone if you need that”. When I try to do that however, I run into the same problem of not being able to access internal state. JITModule has internal state telling the size of each function, but the public methods only allow getting the pointer to each function, not the size. Maybe there needs to be another form of get_finalized_function which similarly to get_finalized_data includes the size of the data in the return value?
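A minimal sketch of why the size matters: a disassembler needs a byte slice, and a bare pointer cannot be turned into one safely without a length. The `FinalizedCode` struct below is a hypothetical shape for such a return value, not an existing cranelift-jit type, and `hex_dump` is a stand-in for handing the slice to a disassembler such as Capstone.

```rust
/// Hypothetical pointer-plus-length pair for a finalized function.
#[derive(Clone, Copy)]
struct FinalizedCode {
    ptr: *const u8,
    len: usize,
}

impl FinalizedCode {
    /// View the machine code as a byte slice.
    ///
    /// Safety: `ptr` must be valid for reads of `len` bytes for as long as
    /// the returned slice is used (for example, while the JIT memory is alive).
    unsafe fn bytes<'a>(&self) -> &'a [u8] {
        unsafe { std::slice::from_raw_parts(self.ptr, self.len) }
    }
}

/// Format code as "address: bytes" lines, eight bytes per line.
fn hex_dump(code: &[u8], base: usize) -> Vec<String> {
    code.chunks(8)
        .enumerate()
        .map(|(i, chunk)| {
            let hex: Vec<String> = chunk.iter().map(|b| format!("{:02x}", b)).collect();
            format!("{:08x}: {}", base + i * 8, hex.join(" "))
        })
        .collect()
}
```

With only the pointer, the `len` field would have to be guessed or tracked separately by the embedder, which is the gap described in the message above.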
@Veverak yes, that sounds like a reasonable API addition for JITModule
Last updated: Nov 22 2024 at 16:03 UTC