Stream: cranelift

Topic: Comparisons between cranelift and LLVM?


Olivier FAURE (Oct 10 2022 at 11:24):

Has there been any benchmarking of cranelift against LLVM recently? I know maintainers are reluctant to make these kinds of comparisons, and that it's easy to compare apples against oranges, etc; but at the same time, short build times are an explicit goal of the project and build time improvements are often mentioned in progress reports, so it makes sense to wonder how they stack up against the state of the art.

The only data I can find is the often-cited 20-30% figure from bjorn3 from 2020 (no associated benchmarks) and the arXiv paper from 2021 mentioned in the Cranelift README.

Are there any more recent benchmarks to be found?

(Linked: rustc_codegen_cranelift, an alternative codegen backend for rustc based on Cranelift, with the potential to improve compilation times in debug mode.)

bjorn3 (Oct 10 2022 at 12:27):

Compiling https://github.com/ebobby/simple-raytracer/ from scratch in debug mode is what I used as benchmark for that ~20-30% figure. This covers the full compilation. The time spent in the codegen backend is about half of that.
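
For anyone wanting to reproduce a measurement in this style, a rough sketch (the dist/cargo-clif wrapper path is an assumption about a local cg_clif build, not a record of bjorn3's exact setup):

```sh
# Clean debug builds of simple-raytracer: default LLVM backend vs cg_clif.
git clone https://github.com/ebobby/simple-raytracer
cd simple-raytracer

cargo clean && time cargo build   # LLVM backend, debug mode

# rustc_codegen_cranelift's build produces a cargo wrapper; adjust the path
# to wherever your cg_clif checkout put it.
cargo clean && time /path/to/cg_clif/dist/cargo-clif build
```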


Chris Fallin (Oct 10 2022 at 16:07):

@Olivier FAURE most of my work in this area has been driven by comparisons against SpiderMonkey, rather than LLVM. We're faster than LLVM (sometimes by a lot) so it's somewhat less interesting -- "hey great, we won!" -- whereas SM is currently 2-3x faster than Wasmtime-with-Cranelift at compiling the same Wasm module, so there's a lot to learn there.

Chris Fallin (Oct 10 2022 at 16:08):

(faster build time, to be clear, not code quality)

Chris Fallin (Oct 10 2022 at 16:08):

There are active efforts to dig further into this and to drive improvements based on it.

Olivier FAURE (Oct 11 2022 at 11:00):

Chris Fallin said:

We're faster than LLVM (sometimes by a lot) so it's somewhat less interesting

Faster for code of equivalent quality? I'm not sure how much that's been measured.

I think it's still interesting for cg_clif adoption. If people know that Cranelift is X% faster than LLVM for equivalent code, there'll be more enthusiasm for the project.

Chris Fallin said:

most of my work in this area has been driven by comparisons against SpiderMonkey

Are there Cranelift vs SpiderMonkey (vs V8 vs LLVM) benchmarks?

Chris Fallin (Oct 11 2022 at 17:39):

Olivier FAURE said:

We're faster than LLVM (sometimes by a lot) so it's somewhat less interesting

Faster for code of equivalent quality? I'm not sure how much that's been measured.

No, definitely not, that's a much stronger claim than what I had said :-)

I don't think it's very realistic to expect LLVM-quality code out of a JIT-speed compiler; it's worth aiming for, and getting as close as we can, but we'll never hit that threshold in all likelihood. And given that we can't generate LLVM-quality code today, we can't measure that datapoint.

I think it's still interesting for cg_clif adoption. If people know that Cranelift is X% faster than LLVM for equivalent code, there'll be more enthusiasm for the project.

I do agree it would be fantastic if it existed, but again, it's not a very realistic expectation, IMHO. We can possibly push the "code goodness per unit of compile time" efficiency metric further, through smart algorithms; but LLVM has engineer-decades of optimization work poured into it, and there is no shortcut to getting all of the edge cases and isel details and niche optimizations right.

Are there Cranelift vs SpiderMonkey (vs V8 vs LLVM) benchmarks?

There's the paper cited in cranelift/README.md. I'm not aware of anyone running continuous/up-to-date benchmarks, but it's not so hard to do oneself (write a JS wrapper that loads a Wasm module and run it in the SpiderMonkey shell, vs. wasmtime compile).
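
For concreteness, a minimal sketch of that comparison. It assumes a SpiderMonkey js shell and wasmtime on PATH, plus some module.wasm to compile; read(path, "binary") and print are SpiderMonkey shell builtins:

```sh
cat > wrapper.js <<'EOF'
// Load the module bytes and time a synchronous compile in the SM shell.
const bytes = read("module.wasm", "binary");   // returns a Uint8Array
const start = Date.now();
new WebAssembly.Module(bytes);                 // compiles synchronously
print("compile: " + (Date.now() - start) + " ms");
EOF

js wrapper.js                      # SpiderMonkey shell; prints its own timing
time wasmtime compile module.wasm  # Cranelift via Wasmtime
```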

Andrew Brown (Oct 11 2022 at 23:27):

@Olivier FAURE, I've done some of those comparisons over in https://github.com/bytecodealliance/sightglass. It currently only measures Wasmtime on its main branch, but you may be interested in a PR I submitted to also measure V8: https://github.com/bytecodealliance/sightglass/pull/166. Though unfinished, it gets most of the way toward producing some numbers. @Yury Delendik also has a similar SpiderMonkey patch, but that has not been submitted as a PR yet.


Petr Penzin (Oct 12 2022 at 01:16):

Chris Fallin said:

Olivier FAURE said:

Faster for code of equivalent quality? I'm not sure how much that's been measured.

No, definitely not, that's a much stronger claim than what I had said :-)

As an example, there is a report on using the LEA instruction for arithmetic on x86. I am curious: what would be the best way to measure the effects of a change implementing this suggestion today? Does sightglass have universal benchmarks that would cover this? And how much would wasmtime-bench-api affect the results:

The wasmtime-bench-api intentionally does things that will likely hurt its absolute performance numbers but which help us more easily get statistically meaningful results, like randomizing the locations of heap allocations.

(taken from the "This is NOT a General-Purpose WebAssembly Benchmark Suite" section of sightglass's README)

(The linked report is a GitHub issue observing that not all architectures have a fast 64-bit imul + imm; even on modern SnB-family and AMD Ryzen cores it takes 3-cycle latency at 1/cycle throughput, which is not always faster than lea + shl/add combinations.)

Chris Fallin (Oct 12 2022 at 01:24):

Re: how to measure -- Sightglass benchmarks are a good start, yeah. The main thing that we've found Sightglass to do that perturbs results in undesirable ways is its randomized allocator; you can build bench-api without that (turn off the Cargo feature; I forget the exact incantation offhand). One has to be pretty careful with variance otherwise -- we've found that limiting to one thread, pinning to a single core, and doing all the other usual system-quieting things for benchmarking (disable frequency scaling and hyperthreading, etc) is necessary to get good results. Sightglass will otherwise happily tell you "no statistically significant difference" when there is a small swing but it's buried in noise.
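
As a rough illustration of those quieting steps (Linux-specific, run as root; sysfs paths vary by CPU driver and kernel, and the sightglass invocation is a sketch from its README, so check sightglass-cli --help for the exact flags):

```sh
echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo   # disable frequency boost (Intel pstate)
echo off > /sys/devices/system/cpu/smt/control           # disable hyperthreading (SMT)

# Pin the benchmark process to a single core.
taskset -c 2 sightglass-cli benchmark \
    --engine engines/wasmtime/libengine.so \
    benchmarks/pulldown-cmark/benchmark.wasm
```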

Olivier FAURE (Oct 12 2022 at 09:08):

Chris Fallin said:

Olivier FAURE said:

Faster for code of equivalent quality? I'm not sure how much that's been measured.

No, definitely not, that's a much stronger claim than what I had said :-)

I feel like we're talking past each other a bit.

To be clear, I'm not asking whether Cranelift can beat LLVM at -O1 or -O2 (though I'm not sure why you're being so pessimistic; when LLVM came out, GCC was the compiler with a decade of history behind it, and it only took them 10 years to catch up). But I do wonder whether Cranelift can generate better code than LLVM at -O0. Because if it can't, then yeah, there's not much point in comparing Cranelift and LLVM.

I mean, ultimately the hope is for a Rust backend that produces roughly usable binaries much faster than LLVM. I don't know if "roughly usable" necessarily means "on par with LLVM -O0" or if you can go even lower though.

fitzgen (he/him) (Oct 12 2022 at 12:45):

No one has done any sort of in-depth comparison with LLVM at -O0 AFAIK. Closest would probably be whatever benchmarking @bjorn3 has done with cg_clif mentioned upthread, but I don't know the details of that and whether that was comparing against LLVM -O0 or what. I think that, unfortunately, the answers to your questions mostly don't exist because no one has done the work to design the experiment and gather the data.

bjorn3 (Oct 12 2022 at 12:47):

Yeah, that was comparing against rustc with LLVM -O0.

Chris Fallin (Oct 12 2022 at 16:00):

@Olivier FAURE thanks, that makes much more sense; it wasn't clear to me that you had meant "unoptimizing LLVM"! This is indeed an interesting data point; actually I'd be curious about both optimizing Cranelift vs unoptimizing LLVM (probably reasonably improved code, and maybe comparable compile time?) and unoptimizing Cranelift vs unoptimizing LLVM (the cg_clif comparisons are this one).
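
A hedged sketch of what those two comparisons could look like on the rustc side (-Zcodegen-backend is a nightly-only flag, and the backend .so path is an assumption about a local cg_clif build; cg_clif maps -Copt-level onto Cranelift's own optimization settings):

```sh
# Unoptimizing LLVM: a plain -O0 build.
rustc -Copt-level=0 main.rs

# Unoptimizing Cranelift (the cg_clif comparison mentioned above).
rustc +nightly -Zcodegen-backend=/path/to/librustc_codegen_cranelift.so \
      -Copt-level=0 main.rs

# Optimizing Cranelift vs unoptimizing LLVM: same backend, higher opt level.
rustc +nightly -Zcodegen-backend=/path/to/librustc_codegen_cranelift.so \
      -Copt-level=3 main.rs
```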

Carlo Kok (Oct 12 2022 at 18:05):

I do think there are quite a lot of "cheap" optimizations that could be done while keeping it fast enough for JIT purposes.

