cfallin commented on issue #5382:
I'll see about splitting this into logical pieces in separate commits tomorrow; I wanted to get the rebased / cleaned-up state up first. I'll run some more benchmarks as well.
bjorn3 commented on issue #5382:
How are alias values printed in the textual format of clif ir? And can it be read back?
cfallin commented on issue #5382:
How are alias values printed in the textual format of clif ir? And can it be read back?
They currently aren't; because union values aren't present in the layout, the usual CLIF writer won't see them (it ignores all insts and values not "in the function" as per the layout). I agree it would be great to come up with a text format for egraph-stage CLIF; maybe as a followup to the initial PR?
bjorn3 commented on issue #5382:
How are alias values printed in the textual format of clif ir? And can it be read back?
They currently aren't; because union values aren't present in the layout, the usual CLIF writer won't see them (it ignores all insts and values not "in the function" as per the layout). I agree it would be great to come up with a text format for egraph-stage CLIF; maybe as a followup to the initial PR?
Sure.
cfallin commented on issue #5382:
Updated and addressed everything, I think! (Left the one thread about multi-result instructions open; that's something we'll want to address somehow eventually, but I think isn't a quick/easy thing to address in this PR.) Thanks for the review!
cfallin commented on issue #5382:
Interesting benchmark discovery:
- This PR's work branched from main on Nov 19, which was between #5231 landing and #5335 landing, i.e., when we accidentally had turned on explicit bounds-checks unconditionally;
- On the
egraph-in-dfg
branch in my fork (this PR's original branch), I continue to see the speedups above;- On the rebase (
egraph-in-dfg-rebase
in my fork), the speedups disappear into the noise.This seems to indicate to me that on worse code (explicit bounds checks), this approach is a little more efficient, while otherwise it's more-or-less equivalent.
Given the two other benefits of flexibility (writing rewrite rules in a simple way) and verifiability, on balance I suspect this will still make sense to turn on by default once fuzz-clean. (I'm doing a followup PR to add it to fuzzing configs.) But I'm very open to discussion on that; one alternative is to keep adding rewrite rules until we see a very clear code-quality (runtime) win.
bjorn3 commented on issue #5382:
I just did some benchmarking of egraphs and the perf improvement is huge on the benchmark I tried:
Benchmark 1: ./raytracer_cg_clif Time (mean ± σ): 8.553 s ± 0.010 s [User: 8.539 s, System: 0.014 s] Range (min … max): 8.543 s … 8.568 s 10 runs Benchmark 2: ./raytracer_cg_clif_egraph Time (mean ± σ): 6.068 s ± 0.017 s [User: 6.057 s, System: 0.011 s] Range (min … max): 6.047 s … 6.108 s 10 runs Benchmark 3: ./raytracer_cg_clif_release Time (mean ± σ): 6.450 s ± 0.021 s [User: 6.439 s, System: 0.012 s] Range (min … max): 6.410 s … 6.482 s 10 runs Benchmark 4: ./raytracer_cg_clif_release_egraph Time (mean ± σ): 5.853 s ± 0.053 s [User: 5.841 s, System: 0.012 s] Range (min … max): 5.779 s … 5.908 s 10 runs Summary './raytracer_cg_clif_release_egraph' ran 1.04 ± 0.01 times faster than './raytracer_cg_clif_egraph' 1.10 ± 0.01 times faster than './raytracer_cg_clif_release' 1.46 ± 0.01 times faster than './raytracer_cg_clif'
(release uses cranelift's speed_and_size opt_level as well as mir inlining and other rustc mir optimizations)
Last updated: Jan 24 2025 at 00:11 UTC