Hey! I'm using Cranelift as a backend for a language I'm working on. I also have an LLVM backend (via Inkwell). While messing around with some test code, I noticed that for a very large function (10000+ lines), my Cranelift backend takes longer (4.0s) to generate code than my LLVM backend (1.5s). Here's my crude test case source, which continues on for about 10000 lines:
Screenshot-2022-11-09-032154.png
Which results in CLIR that looks like this:
Screenshot-2022-11-09-032240.png
And LLVM IR that looks like this:
Screenshot-2022-11-09-032702.png
I'm wondering if there might be something very wrong with my code generation process or there are some best practices I might have missed that could account for the difference. Most of the 4 seconds is spent in ObjectModule::define_function. Is this sort of large inline function a special case?
define_function is the function that does the actual compilation from clif ir to machine code. I believe you can get the timings of individual compilation passes using println!("{}", cranelift_codegen::timing::take_current()) after the define_function call.
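For reference, a minimal sketch of how that measurement could be wired up, using only the standard library so it runs on its own. The `timed` helper and `compile_function` are hypothetical stand-ins, not Cranelift APIs; in a real backend the closure would wrap the `define_function` call, and the per-pass table would come from `cranelift_codegen::timing::take_current()` as described above.

```rust
use std::time::{Duration, Instant};

/// Runs `f` once and returns its result together with the elapsed wall-clock time.
fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed())
}

/// Hypothetical stand-in for the real compile step (the `define_function`
/// call on the module). It only returns the length of the name.
fn compile_function(name: &str) -> Result<usize, String> {
    Ok(name.len())
}

fn main() {
    // Time the whole compile step from the outside.
    let (result, elapsed) = timed(|| compile_function("main"));
    result.expect("compilation failed");
    println!("define_function main: {:?}", elapsed);

    // Assumption: with cranelift_codegen as a dependency, the per-pass
    // breakdown would be printed right here, after the call:
    // println!("{}", cranelift_codegen::timing::take_current());
}
```

Timing the call from the outside gives the headline number; the per-pass table is what shows where inside the call the time goes.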
Could you open a GitHub issue and attach the generated CLIF? We've been working on compile-time performance improvements and it sounds like this would be an interesting test case to investigate.
Here are the timing results:
define_function main: 3.4641968s
======== ======== ==================================
   Total     Self Pass
-------- -------- ----------------------------------
   1.090    0.696 Verify Cranelift IR
   0.314    0.314 Verify CPU flags
   6.916    0.161 Compilation passes
   0.040    0.040 Control flow graph
   0.060    0.060 Dominator tree
   0.000    0.000 Remove unreachable blocks
   0.009    0.009 Remove constant phi-nodes
   0.466    0.466 VCode lowering
   0.225    0.225 VCode emission
   0.001    0.001 VCode emission finalization
   1.487    1.487 Register allocation
======== ======== ==================================
And here's the GitHub issue: #5237
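As a rough way to read the table above, the Self column can be summed and each pass expressed as a share of that sum. The sketch below copies the Self values by hand; the helper names `total_self` and `share` are made up for illustration and are not part of Cranelift.

```rust
/// (pass name, self time in seconds), copied from the Self column above.
const SELF_TIMES: [(&str, f64); 11] = [
    ("Verify Cranelift IR", 0.696),
    ("Verify CPU flags", 0.314),
    ("Compilation passes", 0.161),
    ("Control flow graph", 0.040),
    ("Dominator tree", 0.060),
    ("Remove unreachable blocks", 0.000),
    ("Remove constant phi-nodes", 0.009),
    ("VCode lowering", 0.466),
    ("VCode emission", 0.225),
    ("VCode emission finalization", 0.001),
    ("Register allocation", 1.487),
];

/// Sum of all self times, in seconds.
fn total_self(rows: &[(&str, f64)]) -> f64 {
    rows.iter().map(|(_, t)| t).sum()
}

/// Fraction of the summed self time spent in the pass called `name`.
fn share(rows: &[(&str, f64)], name: &str) -> Option<f64> {
    let total = total_self(rows);
    rows.iter().find(|(n, _)| *n == name).map(|(_, t)| t / total)
}

fn main() {
    let total = total_self(&SELF_TIMES);
    println!("sum of self times: {:.3}s", total);
    for (name, t) in SELF_TIMES.iter() {
        println!("{:>5.1}%  {}", 100.0 * t / total, name);
    }
}
```

The self times add up to about 3.459s, close to the 3.464s reported for define_function. Register allocation accounts for roughly 43% of that, and the two verifier passes together for roughly 29%.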
Last updated: Jan 24 2025 at 00:11 UTC