I've been looking at how Rust emits Ord::cmp
for primitives, and I think I can turn some of that exploration into a nice Cranelift PR, thanks to ISLE. Basically, Rust ends up with lots of code that could benefit from roughly
(rule (simplify (icmp ty eq (spaceship ty x y) 0)))
(icmp ty eq x y))
(And similarly for (a <=> b) == 1
and (a <=> b) > 0
and such.)
I wanted to check in for approach feedback before just showing up with something, though :)
Would (an) instruction(s) for this be reasonable? It needs signed and unsigned versions, so my first thought was to propose a new IntCC, but that looks like it's probably a horrible idea, since various things assume that icmp
returns {0, 1}
, and updating all those places doesn't seem like a net win. Strawman: add scmp3
and ucmp3
instructions that work over integer types and return {-1, 0, 1}
in I8
.
Alternatively, would it be better to start with making this a ISLE-only thing? I don't really know what's possible, but having it just be something that a simplify
rule can generate, rather than a "real" instruction, would probably be fine. This wants to have a couple dozen simplify
patterns to normalize all the equivalent ways it can be written -- Rust and Clang use different ones today, for example -- whether or not there's an instruction to emit for it.
Is there a way to add a fallback-like lower
rule so this can still emit something reasonable without me needing to become an expert in all the instruction sets?
Do also let me know if I'm thinking about this all wrong; I'm very much a noob here :smile:
Do you have examples of codegen you're trying to achieve? E.g. snippets of corresponding Rust-to-x86 or something like that? That might help inform how best to model this in CLIF. Depending on the instructions needed if they're fancy enough a new CLIF instruction is probably the way to go, but if it's "just a new pattern" of existing clif instructions then your (2) route is probably the way to go (with rules per-backend as-opposed to in the middle-end optimizer where simplify
comes from)
For (3) the way things work right now is that ISLE rules have priorities and they may or not match a term. Given a term (e.g. what you're lowering) you can think of it as first finding all matching rules and then using the highest priority rule to execute the lowering. This means that a fallback can be implemented by having a less-general low-priority rule. For example there's a rule that lowers icmp
but probably isn't hit because there's another rule for (br (icmp ...))
(e.g. branch-on-comparison). It's there though if it's needed.
Well today it's a dozen MIR statements across multiple blocks for a single u32::cmp
, which is part of the problem, but I'm working on changing that.
I guess I could treat (sub (icmp gt x y) (icmp lt x y))
as the canonical form? So simplify other select
-based patterns into that one, and write the simplifications against that as a canonical form. That probably makes sense as a starting point as lower-impact than an instruction, and it'd be easy enough to change that to an instruction later if it turns out to be better.
Then that's pretty good on riscv already (sltu
+sltu
+sub
), and I could special-case x86 (cmp
+seta
+sbb
for unsigned) and aarch64 (cmp
+cset
+csinv
) if the existing flags patterns don't catch it.
Do you have docs for the expected canonical forms in simplifications I should follow or add to? <https://github.com/bytecodealliance/wasmtime/blob/main/cranelift/codegen/src/opts/README.md> didn't mention anything. (I'm thinking of things like how LLVM canonicalizes a <= C
into a < (C+1)
and such, to then not have to handle other IntCCs in other simplification rules.)
ah ok in that case using sub(icmp, icmp) sounds reasonable to me - one perhaps important note is that pattern matching in Cranelift with ISLE doesn't always work well when values span basic blocks, so cutting down the MIR to get this to trigger will probably be required.
That being said, a bit of extra background is that the egraph passes in the opts
folder are all optional. Without optimizations for example Cranelift doesn't execute anything there. This means that Cranelift backends can't rely on a canonical form as-produced by the egraph-based optimizations, so as long as the egraph optimizations produce valid CLIF you're good to go, backends should be able to handle everything thrown at them at this time (otherwise that's a bug)
So one thing you can do is that if rustc produces a "messy basic block" then you can rely on egraph optimizations to clean up a good chunk of it before it reaches backends (when optimizations are enabled).
So given all that I'd probably (a) try to get rustc to get into a single basic block, (b) edit egraph optimizations as necessary to produce "ideal clif" for your use case, and then (c) add extra lowering rules to this backend for this pattern to generate the code you'd expect
alternatively, if you're extra ambitious, one thing missing in Cranelift right now is an optimization pass that works with basic blocks. For example Cranelift doesn't merge basic blocks or do anything like jump threading, and that'd remove the need for (a), but I suspect it'd also require more investment than updating rustc
Sounds good! Thanks for the advice.
CFG simplification in Cranelift does sound like a fun project, but I have way too many pies in the oven already to take on something that size :upside_down: For a specific thing in libcore, just changing rustc to not be so verbose is way easier, yeah.
Thanks for tracking down the merge queue problem this hit, Alex Crichton! I'm so relieved that it wasn't some subtle problem in one of my patterns :sweat_smile:
EDIT: Oh, apparently I can't resolve this topic. Dunno what the policy here is; feel free to leave it alone or resolve as you see fit.
nah yeah just blowing the stack by accident with soem new rules
Till Schneidereit has marked this topic as resolved.
Last updated: Nov 22 2024 at 17:03 UTC