Stream: cranelift

Topic: Thoughts about the TPDE paper?


Olivier FAURE (Jun 07 2025 at 09:49):

TPDE: A Fast Adaptable Compiler Back-End Framework was recently published.

Fast machine code generation is especially important for fast start-up just-in-time compilation, where the compilation time is part of the end-to-end latency. However, widely used compiler frameworks like LLVM do not prioritize fast compilation and require an extra IR translation step increasing latency even further; and rolling a custom code generator is a substantial engineering effort, especially when targeting multiple architectures. Therefore, in this paper, we present TPDE, a compiler back-end framework that adapts to existing code representations in SSA form. Using an IR-specific adapter providing canonical access to IR data structures and a specification of the IR semantics, the framework performs one analysis pass and then performs the compilation in just a single pass, combining instruction selection, register allocation, and instruction encoding. The generated target instructions are primarily derived code written in high-level language through LLVM's Machine IR, easing portability to different architectures while enabling optimizations during code generation. To show the generality of our framework, we build a new back-end for LLVM from scratch targeting x86-64 and AArch64. Performance results on SPECint 2017 show that we can compile LLVM-IR 8-24x faster than LLVM -O0 while being on-par in terms of run-time performance. We also demonstrate the benefits of adapting to domain-specific IRs in JIT contexts, particularly WebAssembly and database query compilation, where avoiding the extra IR translation further reduces compilation latency.

Olivier FAURE (Jun 07 2025 at 09:51):

6.2.1 Setup. We evaluate the performance of the Cranelift back-end by measuring compile- and run-time on the three default benchmarks in Wasmtime’s own benchmark suite Sightglass [10] and PolyBench [31]. We compare this against Cranelift with its backtracking and single pass register allocator, both without any IR optimizations, and Winch. [...]
6.2.2 Results. Figure 9 shows the results. The TPDE-based back-end compiles 4.27x faster than Cranelift and 2.68x faster than Cranelift with its fast register allocator, but is 1.74x slower than Winch. [...] The run-time performance of TPDE-generated code is faster than both Winch and Cranelift with its fast register allocator (1.14x and 1.31x respectively), but 1.64x slower than Cranelift with its default backtracking register allocator. This shows that a more sophisticated register allocation heuristic is likely to substantially improve the run-time performance.

Chris Fallin (Jun 10 2025 at 15:17):

@Olivier FAURE yes, those numbers are interesting; my main takeaway was that good regalloc is important (1.64x faster code from Cranelift over TPDE) :-) TPDE and Cranelift serve different design points -- both are useful. The more interesting part of the project is the general IR consumer interface -- that's pretty neat

Chris Fallin (Jun 10 2025 at 15:17):

Did you have specific questions or thoughts?


Last updated: Dec 06 2025 at 06:05 UTC