Has there been any benchmarking of Cranelift against LLVM recently? I know maintainers are reluctant to make these kinds of comparisons, and that it's easy to compare apples against oranges, etc.; but at the same time, short build times are an explicit goal of the project, and build-time improvements are often mentioned in progress reports, so it makes sense to wonder how they stack up against the state of the art.
The only data I can find is the often-cited 20-30% figure from bjorn3 from 2020 (no associated benchmarks) and the arXiv paper from 2021 mentioned in the Cranelift README.
Are there any more recent benchmarks to be found?
Compiling https://github.com/ebobby/simple-raytracer/ from scratch in debug mode is what I used as the benchmark for that ~20-30% figure. That covers the full compilation; the time spent in the codegen backend is about half of that.
@Olivier FAURE most of my work in this area has been driven by comparisons against SpiderMonkey, rather than LLVM. We're faster than LLVM (sometimes by a lot) so it's somewhat less interesting -- "hey great, we won!" -- whereas SM is currently 2-3x faster at compiling the same Wasm module than Wasmtime-with-Cranelift, so there's a lot to learn.
(faster build time, to be clear, not code quality)
there are active efforts to dig further into this and drive improvements based on it
We're faster than LLVM (sometimes by a lot) so it's somewhat less interesting
Faster for code of equivalent quality? I'm not sure how much that's been measured.
I think it's still interesting for cg_clif adoption. If people know that Cranelift is X% faster than LLVM for equivalent code, there'll be more enthusiasm for the project.
most of my work in this area has been driven by comparisons against SpiderMonkey
Are there Cranelift vs SpiderMonkey (vs V8 vs LLVM) benchmarks?
Olivier FAURE said:
We're faster than LLVM (sometimes by a lot) so it's somewhat less interesting
Faster for code of equivalent quality? I'm not sure how much that's been measured.
No, definitely not, that's a much stronger claim than what I had said :-)
I don't think it's very realistic to expect LLVM-quality code out of a JIT-speed compiler; it's worth aiming for, and getting as close as we can, but we'll never hit that threshold in all likelihood. And given that we can't generate LLVM-quality code today, we can't measure that datapoint.
I think it's still interesting for cg_clif adoption. If people know that Cranelift is X% faster than LLVM for equivalent code, there'll be more enthusiasm for the project.
I do agree it would be fantastic if it existed, but again, it's not a very realistic expectation, IMHO. We can possibly push the "code goodness per unit of compile time" efficiency metric further, through smart algorithms; but LLVM has engineer-decades of optimization work poured into it, and there is no shortcut to getting all of the edge cases and isel details and niche optimizations right.
Are there Cranelift vs SpiderMonkey (vs V8 vs LLVM) benchmarks?
There's the paper cited in cranelift/README.md. I'm not aware of anyone running continuous/up-to-date benchmarks, but it's not so hard to do oneself (write a JS wrapper that loads a Wasm module, run it in the SpiderMonkey shell, and compare against `wasmtime compile`).
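A minimal sketch of such a wrapper, assuming the SpiderMonkey shell's `read(path, "binary")` builtin and an illustrative module name (`benchmark.wasm` is a placeholder, not a real artifact):

```js
// Rough compile-time probe for the SpiderMonkey shell: run as `js compile-bench.js`.
// read(path, "binary") is a shell builtin that returns the file as a Uint8Array;
// "benchmark.wasm" stands in for whatever module you want to measure.
const bytes = read("benchmark.wasm", "binary");

const start = Date.now();
const module = new WebAssembly.Module(bytes); // synchronous, blocking compile
print(`compiled ${bytes.length} bytes in ${Date.now() - start} ms`);
```

The Wasmtime side of the comparison is then just timing `wasmtime compile benchmark.wasm` on the same module.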
@Olivier FAURE, I've done some of those comparisons over in https://github.com/bytecodealliance/sightglass. It currently only measures Wasmtime in `main`, but you may be interested in a PR I submitted to also measure V8: https://github.com/bytecodealliance/sightglass/pull/166. Though unfinished, that is most of the way there to getting some numbers. @Yury Delendik also has a similar SpiderMonkey patch, but that has not been submitted as a PR yet.
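In the meantime, a rough `d8` analogue of the wrapper sketched above can give ballpark compile-time numbers for V8 (hedged: `readbuffer()` is the d8 builtin that reads a file into an ArrayBuffer, and `benchmark.wasm` is again a placeholder):

```js
// V8 d8 shell variant: run as `d8 compile-bench-v8.js`.
// readbuffer(path) is a d8 builtin that returns the file as an ArrayBuffer.
const buf = readbuffer("benchmark.wasm");

const start = Date.now();
new WebAssembly.Module(buf); // synchronous, blocking compile
print(`compiled in ${Date.now() - start} ms`);
```

Note this measures V8's synchronous compile path (its baseline tier, plus whatever background tier-up it kicks off), so it's a ballpark, not an apples-to-apples match for Sightglass's methodology.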
Chris Fallin said:
Olivier FAURE said:
Faster for code of equivalent quality? I'm not sure how much that's been measured.
No, definitely not, that's a much stronger claim than what I had said :-)
As an example, there is a report on using the LEA instruction for arithmetic on x86. I am curious what would be the best way to measure the effects of a change implementing this suggestion today? Does Sightglass have universal benchmarks that would cover this? And how much would `wasmtime-bench-api` affect the results?
"The `wasmtime-bench-api` intentionally does things that will likely hurt its absolute performance numbers but which help us more easily get statistically meaningful results, like randomizing the locations of heap allocations."
(taken from the "This is NOT a General-Purpose WebAssembly Benchmark Suite" section of Sightglass's README)
Re: how to measure -- Sightglass benchmarks are a good start, yeah. The main thing that we've found Sightglass to do that perturbs results in undesirable ways is its randomized allocator; you can build bench-api without that (turn off the Cargo feature; I forget the exact incantation offhand). One has to be pretty careful with variance otherwise -- we've found that limiting to one thread, pinning to a single core, and doing all the other usual system-quieting things for benchmarking (disabling frequency scaling and hyperthreading, etc.) are necessary to get good results. Sightglass will otherwise happily tell you "no statistically significant difference" when there is a small swing but it's buried in noise.
Chris Fallin said:
Olivier FAURE said:
Faster for code of equivalent quality? I'm not sure how much that's been measured.
No, definitely not, that's a much stronger claim than what I had said :-)
I feel like we're talking past each other a bit.
To be clear, I'm not asking whether Cranelift can beat LLVM at `-O1` or `-O2` (though I'm not sure why you're being so pessimistic; when LLVM came out, GCC was the compiler with a decade of history behind it, and it only took them 10 years to catch up). But I do wonder whether Cranelift can generate better code than LLVM at `-O0`. Because if it can't, then yeah, there's not much of a point in comparing Cranelift and LLVM.
I mean, ultimately the hope is for a Rust backend that produces roughly usable binaries much faster than LLVM. I don't know if "roughly usable" necessarily means "on par with LLVM `-O0`" or if you can go even lower, though.
No one has done any sort of in-depth comparison with LLVM at `-O0`, AFAIK. Closest would probably be whatever benchmarking @bjorn3 has done with `cg_clif` mentioned upthread, but I don't know the details of that and whether that was comparing against LLVM `-O0` or what. I think that, unfortunately, the answers to your questions mostly don't exist because no one has done the work to design the experiment and gather the data.
Yeah, that was comparing against rustc with LLVM at `-O0`.
@Olivier FAURE thanks, that makes much more sense; it wasn't clear to me that you had meant "unoptimizing LLVM"! This is indeed an interesting data point; actually I'd be curious about both optimizing Cranelift vs unoptimizing LLVM (probably reasonably improved code, and maybe comparable compile time?) and unoptimizing Cranelift vs unoptimizing LLVM (the `cg_clif` comparisons are this one).
I do think that there are quite a lot of "cheap" optimizations that can be done while keeping it fast enough for JIT purposes.