rfcs / PR #15 RFC: design of ISLE instruction-selector DSL. · rfc-notifications

Stream: rfc-notifications

Topic: rfcs / PR #15 RFC: design of ISLE instruction-selector DSL.

RFC notifications bot (Aug 19 2021 at 05:48):

cfallin opened PR #15 from cranelift-isel-isle-peepmatic to main:

Rendered

This RFC proposes a new DSL for the instruction selectors in Cranelift,
automating the generation of the backends based on a list of term-rewriting
rules provided by the backend author.

This is based on extensive discussions surrounding the ideas in
bytecodealliance/rfcs#13 and is meant to be a refinement of earlier
ideas that is compatible with our goals related to verification, and is
able to build on top of Peepmatic's automaton abstraction.

RFC notifications bot (Aug 19 2021 at 08:37):

afonso360 submitted PR review.

RFC notifications bot (Aug 19 2021 at 08:37):

afonso360 created PR review comment:

For a concrete example, let us assume we have a term type representing a

RFC notifications bot (Aug 19 2021 at 10:34):

sparker-arm submitted PR review.

RFC notifications bot (Aug 19 2021 at 10:34):

sparker-arm submitted PR review.

RFC notifications bot (Aug 19 2021 at 10:34):

sparker-arm created PR review comment:

So does this mean we can write generic, and machine-level, pattern reducers as part of isel then? This suggests that 'B' could be a machine node and so we can use machine nodes as inputs. If so, could this be viewed as multiple peepmatic passes: (1) IR-to-IR reductions, (2) IR-to-MachInst ISel, (3) MachInst reductions? And then would having multiple passes have benefits over inlining while the tree is still being selected? Maybe splitting it into the two later phases would ease implementation and verification. And could we do some lowering with the same method - especially stuff like expanding unsupported SIMD operations?

RFC notifications bot (Aug 19 2021 at 10:34):

sparker-arm created PR review comment:

Sounds cool, do you know if someone else is doing this? Sounds like it would be a hard problem for instructions that read/write global state.

RFC notifications bot (Aug 19 2021 at 13:31):

bjorn3 submitted PR review.

RFC notifications bot (Aug 19 2021 at 13:31):

bjorn3 created PR review comment:

It might make sense to tell when multiple rules can apply to the same input and maybe allow hints to acknowledge the overlap. These hints could then also contain the advisory priority or the rules.

RFC notifications bot (Aug 19 2021 at 13:33):

bjorn3 submitted PR review.

RFC notifications bot (Aug 19 2021 at 13:33):

bjorn3 created PR review comment:

A suggestion for an alternative syntax to consider:
(Iadd a b) => (X86Add a b);

RFC notifications bot (Aug 19 2021 at 13:36):

bjorn3 submitted PR review.

RFC notifications bot (Aug 19 2021 at 13:36):

bjorn3 created PR review comment:

Might be nice to infer this using "typechecker" that takes constraints from the output instructions and propagates them to the input.

RFC notifications bot (Aug 19 2021 at 19:33):

cfallin updated PR #15 from cranelift-isel-isle-peepmatic to main.

RFC notifications bot (Aug 19 2021 at 19:33):

cfallin updated PR #15 from cranelift-isel-isle-peepmatic to main.

RFC notifications bot (Aug 19 2021 at 19:39):

cfallin submitted PR review.

RFC notifications bot (Aug 19 2021 at 19:39):

cfallin created PR review comment:

Yes, the DSL itself is agnostic to whatever the input and output formats are (that depends on the "bindings" one writes with external extractor and constructor definitions), so in principle one could have all of the above.

Reductions in the legalization sense, e.g. lowering a 64-bit add to two 32-bit adds (to take a simple example), should be possible as well. This has implications for the term stratification approach to ensuring termination, so we would need to expand that a bit (the RFC notes that allowing substructural recursion, or maybe just "recurse during inlining up to limit k") to allow the lowering code to be generated properly. But that's absolutely a goal -- writing fallback rules like this then composing with machine-specific rules is much more powerful than the support-all-features-on-all-machines approach we currently have (e.g. adding i128 separately on x64 and aarch64).

We could also have multiple passes; @fitzgen and I discussed this a bit yesterday actually. For now I think it makes sense to start with the single lowering pass, because we have the existing machinery for that, but once the DSL compiler exists, it should be relatively straightforward to use it to generate another pass as well.

RFC notifications bot (Aug 19 2021 at 19:41):

cfallin submitted PR review.

RFC notifications bot (Aug 19 2021 at 19:41):

cfallin created PR review comment:

There are various research folks working on compiler verification; e.g. this paper from recent ASPLOS 2021. The general term is "translation validation" when it involves proving a specific input and output are equivalent, I think. I don't see the core Cranelift team diving into this fulltime but enabling it is very important, and there are academics who might be interested!

RFC notifications bot (Aug 19 2021 at 19:43):

cfallin submitted PR review.

RFC notifications bot (Aug 19 2021 at 19:43):

cfallin created PR review comment:

Yes, this gives me an idea for a DSL-compiler mode that produces some "explainer" output -- which overlaps occurred and which prioritization decisions were made. That could possibly be useful. Also if a rule is never used (because it's overshadowed by some other higher-priority rule that completely overlaps it) that is probably worth a warning.

RFC notifications bot (Aug 19 2021 at 19:45):

cfallin submitted PR review.

RFC notifications bot (Aug 19 2021 at 19:45):

cfallin created PR review comment:

I'm generally hoping to stay true to S-expression form here (so the toplevel forms/clauses are (KEYWORD ...)) but => as the keyword ((=> (A x) (B x))), which is what Peepmatic does, could work. Others please let us know your favorite bikeshed color below!

RFC notifications bot (Aug 19 2021 at 19:48):

cfallin submitted PR review.

RFC notifications bot (Aug 19 2021 at 19:48):

cfallin created PR review comment:

That feels a bit too magic/implicit, but what can be done instead is that constructors that build the final instructions take strongly-typed values that can only be produced by extractors that check the relevant conditions. So the plan is to have e.g. an Aarch64Imm12 value type, and this can only represent legal values. Similar type-encoded conditions can be used for other properties.

RFC notifications bot (Aug 19 2021 at 19:50):

bjorn3 submitted PR review.

RFC notifications bot (Aug 19 2021 at 19:50):

bjorn3 created PR review comment:

One more suggestion: Please enforce a specific formatting of the S-expressions on CI in rustfmt style. That is a 1-to-1 map from parsed AST to the resulting string. Not like eg clang-format which tries to preserve newlines and things like that and only normalizes things like spaces.

RFC notifications bot (Aug 20 2021 at 08:25):

afonso360 submitted PR review.

RFC notifications bot (Aug 20 2021 at 08:25):

afonso360 created PR review comment:

We may also want to use this for things that are not typecheckable, such as a processor extension predicate.
(rule (iadd (i8x16 a) (i8x16 b))
      (when (has-3dnow))
      (3dnow-add a b))

RFC notifications bot (Aug 20 2021 at 08:25):

afonso360 edited PR review comment.

RFC notifications bot (Aug 20 2021 at 08:27):

afonso360 edited PR review comment.

RFC notifications bot (Aug 20 2021 at 09:40):

afonso360 submitted PR review.

RFC notifications bot (Aug 20 2021 at 09:40):

afonso360 submitted PR review.

RFC notifications bot (Aug 20 2021 at 09:40):

afonso360 created PR review comment:

In the aarch64 backend we have emit_shl_i128 which emits 9 instructions, and is used 4 times while lowering other instructions. In the rotr/rotl case we end up emitting 21 instructions (1 i128 shr + 1 i128 shl + 3 instructions), it would be nice not to have to repeat the shl/shr instructions 4 times in different orders.

Would we do this via a (shl-i128 a b) constructor? Or can we do this as a separate rule which gets matched as well?

This feels like an edge case, not sure if it happens elsewhere.

RFC notifications bot (Aug 20 2021 at 09:40):

afonso360 created PR review comment:

A similar, but somewhat different case here would be popcnt, where we emit a number of instructions, and then match a single instruction depending on the type of the input.

But here, since the total size is about 6 instructions in 5 different input types it isn't as bad.

RFC notifications bot (Aug 20 2021 at 10:44):

sparker-arm created PR review comment:

Thanks for the paper - interesting stuff! From my brief interactions with cranelift so far, the key differences that I see with cranelift and LLVM are that neither levels of IR are defined as well in cranelift, for example:
1) I don't think that the aarch64 backend actually models pstate flag setting.
2) The CLIF icmp has some ugly target defined behaviour...
So I think having some verification would be an excellent driver for improving our internal semantics.

RFC notifications bot (Aug 20 2021 at 10:44):

sparker-arm submitted PR review.

RFC notifications bot (Aug 20 2021 at 10:44):

sparker-arm edited PR review comment.

RFC notifications bot (Aug 20 2021 at 11:59):

afonso360 submitted PR review.

RFC notifications bot (Aug 20 2021 at 11:59):

afonso360 created PR review comment:

Re-reading the RFC now, it looks like the shl case would be a good use of Virtual Internal Terms. popcnt might be too small for this to make sense.

RFC notifications bot (Aug 20 2021 at 18:00):

cfallin submitted PR review.

RFC notifications bot (Aug 20 2021 at 18:00):

cfallin created PR review comment:

Yes, exactly, that's the intent of the ability to freely (in terms of runtime cost) create these virtual terms that chain rules together -- they allow additional factoring and abstraction.

RFC notifications bot (Aug 23 2021 at 15:25):

bnjbvr created PR review comment:

The wildcard only exists in the grammar for LHS -- is it expected that it would always be discarded entirely (and if so, do you have an example of actual rules that would unconditionally discard one input), or is there a way to bind it and reuse it in the RHS that i've missed?

RFC notifications bot (Aug 23 2021 at 15:25):

bnjbvr submitted PR review.

RFC notifications bot (Aug 23 2021 at 15:27):

bnjbvr submitted PR review.

RFC notifications bot (Aug 23 2021 at 15:27):

bnjbvr created PR review comment:

ah wait, in this example the wildcarded input is actually bound to x, so nevermind me.

RFC notifications bot (Aug 23 2021 at 16:57):

fitzgen submitted PR review.

RFC notifications bot (Aug 23 2021 at 16:57):

fitzgen created PR review comment:

do you have an example of actual rules that would unconditionally discard one input

It could be used in something like "anything times zero is zero":
(rule (Imul _ 0)
      (Iconst 0))

RFC notifications bot (Aug 31 2021 at 16:09):

bnjbvr submitted PR review.

RFC notifications bot (Aug 31 2021 at 16:09):

bnjbvr submitted PR review.

RFC notifications bot (Aug 31 2021 at 16:09):

bnjbvr created PR review comment:

From looking at this, I'm thinking that constant propagation / strength reduction (maybe other optimizations) could be carried out within this framework as well, but those would operate in higher-level IR types (CLIF, not VCode) that aren't tied to a specific architecture. If I understand correctly, one could have two sets of declared patterns (i.e. in two files), each declaring its IR type, the set of rules, etc. and we could call the two resulting lower functions at different times during the optimization passes.

RFC notifications bot (Aug 31 2021 at 16:09):

bnjbvr created PR review comment:

It's going to be harder to inspect some characteristics of the given resulting Rust enum: what are its size? which variant is the biggest (and thus determines the whole enum's size)? Reducing the memory footprint of IR data structures has been a fruitful exercise in the past, causing reductions in the number of malloc calls and thus compile-time speedups, and I'm a bit worried that this kind of optimization opportunities, that was very easy to implement in the past, will become unbearable within this framework, except for the framework author(s).

On the other hand, one could argue that any improvement to the generic framework would benefit all the backends at once (but that's just a benefit from a generic framework: errors/bugs would be shared too!). Still, it doesn't seem like it's fitting the initial goal of being welcoming and simple for newcomers.

RFC notifications bot (Aug 31 2021 at 16:09):

bnjbvr created PR review comment:

Seems this FromInstOnlyUse has disappeared below?

RFC notifications bot (Aug 31 2021 at 16:54):

cfallin submitted PR review.

RFC notifications bot (Aug 31 2021 at 16:54):

cfallin created PR review comment:

Ah, so a key thing here is that this extern type decl is referring to an enum that's already defined in Rust code elsewhere. Perhaps we could find a better way of doing it later, but for now the idea is to declare the view of the types that we need to generate Rust code against them, but not to generate the type definitions themselves. So during the migration, we still have the enum Inst in isa/*/inst/mod.rs, and we can still e.g. write tests against its size and play with its packing.

One could argue that it'd be better to generate both from a single source of truth, and I think that has to be the eventual answer; in that case one could still write a unit-test against std::mem::size_of::<x64::Inst>() or whatnot but changing it would mean editing the DSL.

A key factor in approachability IMHO (this may be controversial; please discuss!) could be to include the generated code in the git repo. Aside from implications on build-time above, it would mean that (i) the actual Rust definitions, at the stage in our evolution when they are all generated from the DSL, should be easy to find; and (ii) we can add a note to the top of the file saying "autogenerated, please see file.isle" and maybe on the actual enum a comment like "autogenerated from definition Foo in file.isle line 123".

I do think that there is an interesting philosophical debate about the definition of "welcoming and simple": building some DSL framework means that if we're careful, the "happy path" of adding a new instruction or a new lowering rule is much easier, and also means that there is less cognitive load / fewer subtle invariants to keep in mind. That's pretty much the driving factor I had in mind at least. But I agree that it can make systemic optimizations, like "make the Inst enum smaller", more difficult it it means one has to step up into a meta-level, and it also puts a lot of pressure on the framework to be correct, lest we have difficult-to-find bugs. IMHO the tradeoff is worth it and does make the codebase more welcoming for the most common sort of contribution we anticipate in the future, but... please do discuss!

RFC notifications bot (Nov 01 2021 at 16:32):

alexcrichton submitted PR review.

RFC notifications bot (Nov 01 2021 at 17:21):

bjorn3 submitted PR review.

RFC notifications bot (Nov 03 2021 at 09:04):

sparker-arm submitted PR review.

RFC notifications bot (Nov 15 2021 at 16:11):

cfallin merged PR #15.

Last updated: Apr 14 2025 at 14:03 UTC