Has there been any benchmarking of Cranelift against LLVM recently? I know maintainers are reluctant to make these kinds of comparisons, and that it's easy to compare apples against oranges, etc.; but at the same time, short build times are an explicit goal of the project, and build-time improvements are often mentioned in progress reports, so it makes sense to wonder how they stack up against the state of the art.
The only data I can find is the often-cited 20-30% figure from bjorn3 from 2020 (no associated benchmarks) and the arXiv paper from 2021 mentioned in the Cranelift README.
Are there any more recent benchmarks to be found?
Compiling https://github.com/ebobby/simple-raytracer/ from scratch in debug mode is what I used as the benchmark for that ~20-30% figure. That covers the full compilation; the time spent in the codegen backend is about half of that.
@Olivier FAURE most of my work in this area has been driven by comparisons against SpiderMonkey, rather than LLVM. We're faster than LLVM (sometimes by a lot) so it's somewhat less interesting -- "hey great, we won!" -- whereas SM is currently 2-3x faster at compiling the same Wasm module than Wasmtime-with-Cranelift, so there's a lot to learn.
(faster build time, to be clear, not code quality)
there are active efforts to dig further into this and drive improvements based on it
We're faster than LLVM (sometimes by a lot) so it's somewhat less interesting
Faster for code of equivalent quality? I'm not sure how much that's been measured.
I think it's still interesting for cg_clif adoption. If people know that Cranelift is X% faster than LLVM for equivalent code, there'll be more enthusiasm for the project.
most of my work in this area has been driven by comparisons against SpiderMonkey
Are there Cranelift vs SpiderMonkey (vs V8 vs LLVM) benchmarks?
Olivier FAURE said:
We're faster than LLVM (sometimes by a lot) so it's somewhat less interesting
Faster for code of equivalent quality? I'm not sure how much that's been measured.
No, definitely not, that's a much stronger claim than what I had said :-)
I don't think it's very realistic to expect LLVM-quality code out of a JIT-speed compiler; it's worth aiming for, and getting as close as we can, but we'll never hit that threshold in all likelihood. And given that we can't generate LLVM-quality code today, we can't measure that datapoint.
I think it's still interesting for cg_clif adoption. If people know that Cranelift is X% faster than LLVM for equivalent code, there'll be more enthusiasm for the project.
I do agree it would be fantastic if it existed, but again, it's not a very realistic expectation, IMHO. We can possibly push the "code goodness per unit of compile time" efficiency metric further, through smart algorithms; but LLVM has engineer-decades of optimization work poured into it, and there is no shortcut to getting all of the edge cases and isel details and niche optimizations right.
Are there Cranelift vs SpiderMonkey (vs V8 vs LLVM) benchmarks?
There's the paper cited in cranelift/README.md. I'm not aware of anyone running continuous/up-to-date benchmarks, but it's not so hard to do oneself (write a JS wrapper that loads a Wasm module, run it in the SpiderMonkey shell, and compare against `wasmtime compile`).
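A minimal sketch of such a wrapper, assuming the SpiderMonkey shell's `read(path, "binary")` builtin and an illustrative module name (`benchmark.wasm` is a placeholder, not a real artifact):

```js
// Rough compile-time probe for the SpiderMonkey shell: run as `js compile-bench.js`.
// read(path, "binary") is a shell builtin that returns the file as a Uint8Array;
// "benchmark.wasm" stands in for whatever module you want to measure.
const bytes = read("benchmark.wasm", "binary");

const start = Date.now();
const module = new WebAssembly.Module(bytes); // synchronous, blocking compile
print(`compiled ${bytes.length} bytes in ${Date.now() - start} ms`);
```

The Wasmtime side of the comparison is then just timing `wasmtime compile benchmark.wasm` on the same module.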
@Olivier FAURE, I've done some of those comparisons over in https://github.com/bytecodealliance/sightglass. It currently only measures Wasmtime in `main`, but you may be interested in a PR I submitted to also measure V8: https://github.com/bytecodealliance/sightglass/pull/166. Though unfinished, that is most of the way there to getting some numbers. @Yury Delendik also has a similar SpiderMonkey patch, but that has not been submitted as a PR yet.
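In the meantime, a rough `d8` analogue of the wrapper sketched above can give ballpark compile-time numbers for V8 (hedged: `readbuffer()` is the d8 builtin that reads a file into an ArrayBuffer, and `benchmark.wasm` is again a placeholder):

```js
// V8 d8 shell variant: run as `d8 compile-bench-v8.js`.
// readbuffer(path) is a d8 builtin that returns the file as an ArrayBuffer.
const buf = readbuffer("benchmark.wasm");

const start = Date.now();
new WebAssembly.Module(buf); // synchronous, blocking compile
print(`compiled in ${Date.now() - start} ms`);
```

Note this measures V8's synchronous compile path (its baseline tier, plus whatever background tier-up it kicks off), so it's a ballpark, not an apples-to-apples match for Sightglass's methodology.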
Chris Fallin said:
Olivier FAURE said:
Faster for code of equivalent quality? I'm not sure how much that's been measured.
No, definitely not, that's a much stronger claim than what I had said :-)
As an example, there is a report on using the LEA instruction for arithmetic on x86. I am curious what would be the best way to measure the effects of a change implementing this suggestion today? Does Sightglass have universal benchmarks that would cover this? And how much would `wasmtime-bench-api` affect the results?
"The `wasmtime-bench-api` intentionally does things that will likely hurt its absolute performance numbers but which help us more easily get statistically meaningful results, like randomizing the locations of heap allocations."
(taken from the "This is NOT a General-Purpose WebAssembly Benchmark Suite" section of Sightglass's README)
Re: how to measure -- Sightglass benchmarks are a good start, yeah. The main thing that we've found Sightglass to do that perturbs results in undesirable ways is its randomized allocator; you can build bench-api without that (turn off the Cargo feature; I forget the exact incantation offhand). One has to be pretty careful with variance otherwise -- we've found that limiting to one thread, pinning to a single core, and doing all the other usual system-quieting things for benchmarking (disabling frequency scaling and hyperthreading, etc.) are necessary to get good results. Sightglass will otherwise happily tell you "no statistically significant difference" when there is a small swing but it's buried in noise.
Chris Fallin said:
Olivier FAURE said:
Faster for code of equivalent quality? I'm not sure how much that's been measured.
No, definitely not, that's a much stronger claim than what I had said :-)
I feel like we're talking past each other a bit.
To be clear, I'm not asking whether Cranelift can beat LLVM at `-O1` or `-O2` (though I'm not sure why you're being so pessimistic; when LLVM came out, GCC was the compiler with a decade of history behind it, and it only took them 10 years to catch up). But I do wonder whether Cranelift can generate better code than LLVM at `-O0`. Because if it can't, then yeah, there's not much of a point in comparing Cranelift and LLVM.
I mean, ultimately the hope is for a Rust backend that produces roughly usable binaries much faster than LLVM. I don't know if "roughly usable" necessarily means "on par with LLVM `-O0`" or if you can go even lower, though.
No one has done any sort of in-depth comparison with LLVM at `-O0`, AFAIK. Closest would probably be whatever benchmarking @bjorn3 has done with `cg_clif` mentioned upthread, but I don't know the details of that and whether that was comparing against LLVM `-O0` or what. I think that, unfortunately, the answers to your questions mostly don't exist because no one has done the work to design the experiment and gather the data.
Yeah, that was comparing against rustc with LLVM at `-O0`.
@Olivier FAURE thanks, that makes much more sense; it wasn't clear to me that you had meant "unoptimizing LLVM"! This is indeed an interesting data point; actually I'd be curious about both optimizing Cranelift vs unoptimizing LLVM (probably reasonably improved code, and maybe comparable compile time?) and unoptimizing Cranelift vs unoptimizing LLVM (the `cg_clif` comparisons are this one).
I do think that there are quite a lot of "cheap" optimizations that can be done while keeping it fast enough for JIT purposes.