jameysharp opened issue #8599:
Feature
ISLE terms should just be functions, from any number of arguments to any number of results. Internal terms should all be defined using
rule
. Terms with exactly one argument can be used in pattern matches like extractors are today; all terms can be used on the right-hand side of a rule or if-let, like constructors are today. Possibly a form of type-directed overloading should be added to the language to allow the current pattern of using the same term name for both constructor and extractor.Benefit
- Simplify the trie-again and serialized representations of ISLE rules. The current distinction between constructor and extractor makes no difference to code generation.
- Using
rule
to define "extractors" allows a form of "or-patterns" (#6128), without requiring us to figure out how priorities should interact when these patterns are inlined as internal extractors are today.Implementation
I'd start by unifying constructors and extractors in the trie-again representation and ensuring that the generated code doesn't change.
Then maybe introduce something like
(overload iadd extract_iadd construct_iadd)
to declare that when the termiadd
is used, whichever ofextract_iadd
orconstruct_iadd
has the right type for that context should be used instead. I'm less sure about this.Next, extend constructor terms to allow multiple results. A pattern binding the result of such a term needs some extra syntactic care to handle tuples. Semantically, though, we can approximate this today by wrapping up multiple results in an enum with a single variant, so I think we should just make the language support it directly.
Finally, remove the restriction that patterns can only use extractors and expressions can only use constructors, and replace
(extractor ...)
with(rule ...)
. Optionally, we could add aninline
flag on terms and use it for all current extractors to ensure we still generate the same code, since internal extractors are currently inlined automatically. It is difficult to define inlining for all current combinations of ISLE features, though.Alternatives
There are so many alternatives. Let's talk about it…
cfallin commented on issue #8599:
I definitely think this would be a welcome change! As you know I've been saying similar things for a while; my take is that extractors can be lowered somehow to a sugar on constructors and if-lets, now that we have them. This kind of fulfills the original dream (that I had had, anyway!) of a simple core Prolog-like semantics of matching steps in the backtracking phase (originally LHS) prior to committing.
My take on a core calculus would be:
- Rewrite any mention of a term in the LHS into a dual (that can be specified) that flips args and returns -- "apply" to "unapply".
- Turn this application into an if-let clause.
- Build surface syntax-level desugaring of the
extractor
definition form into a definition of an ordinary rule and the "dual" declaration.Then we can accept the existing language as-is; and as a next step, devise additional surface syntax for or-patterns and other things as needed.
Concretely I imagine the existing surface syntax example (slight liberties assumed in unifying inst-helper signatures to include
ty
everywhere)(decl inst_data (InstructionData) Inst) (extern extractor inst_data inst_data) (decl iadd (Type Value Value) Inst) (extractor (iadd ty x y) (inst_data (InstructionData.BinaryOp (Opcode.Iadd) ty x y))) (rule (lower (iadd ty x y)) (do_the_isa_thing x y))
would desugar into core syntax something like:
(decl inst_data (InstructionData) Inst) (decl inst_data_dual (Inst) InstructionData) (dual inst_data inst_data_dual) (extern inst_data_dual inst_data) ;; note that extern no longer specifies etor/ctor (decl iadd (Type Value Value) Inst) (decl iadd_dual (Inst) (Type Value Value) (dual iadd iadd_dual) (rule (iadd_dual inst) ;; all rules have simple bindings only in argument positions (if-let instdata (inst_data_dual inst)) ;; inst_data_dual used here because original pattern used `inst_data` and in etor position we rewrite by looking up the dual (if-let (InstructionData.BinaryOp (Opcode.Iadd) ty x y) instdata) (tuple ty x y)) (rule (lower inst) (if-let (tuple ty x y) (iadd_dual inst)) (do_the_isa_thing ty x y))
It looks a little verbose but it has really nice properties: simple args only on toplevel rules; only "structural destructuring" (as it were?) of tuples and true enums in left-hand sides of if-lets; all invocations to other rules that can fail in right-hand sides of if-lets; and after the prelude of if-lets, a sequence of constructor calls as normal. And finally and most importantly: all forms in a rule that are invocations are invocations to a "constructor", so we can stop calling them that and just call them terms :-)
(To be maximally clear, I see my sketch here as adding onto your points above; strong +1 to tuple-returns, and the implementation details in the trie-again IR; this is mostly sketching my ideas in the place where you noted "less sure about [
overload
]".)
jameysharp commented on issue #8599:
Ah, you're filling in two things I was unsure of: term overloading and a syntax for terms with multiple return values.
Regarding
dual
, that works for me. I gather it's purely syntactic: If a term is used in a pattern, then its dual is looked up instead. I find some appeal in type-directed overloading instead of syntax-directed, especially in conjunction with the existing support for implicit conversions, but this does the job and I can run with it.I'd like to tweak it a little bit though: If a term currently has only an extractor, like
inst_data
or the lowering version ofiadd
, then I don't want to have adual
declaration for it. I think we can just allow terms that have only one parameter to be used in patterns. If adual
declaration exists for that term, then rewrite to it, but otherwise use the original term itself. Then the main sugar here is that rewriting an extractor-only term into the core version ofdecl
has the parameters swapped with the results.I think the "core" form should be a subset of the full language, so that we can use it directly when we want to. That might mean picking a different name for the core version of
decl
so the existing version's syntax can coexist with it. (Bikeshedding the name: how aboutterm
instead ofdecl
?) We might choose to incrementally migrate existing ISLE source to parts of the core syntax; I'd certainly findterm
/dual
less confusing than the current syntax for extractors, which always feel "backwards" to me.As for tuple syntax: I had noticed that we need some sort of special form for matching tuples, like your suggested
(tuple ...)
. I like your suggestion better than my idea, which was that you'd use the term name again:(if-let (iadd_dual ty x y) (iadd_dual inst))
. And syntactically I was thinking we could get away without a special form for constructing tuples, because we could allow multiple expressions on the right-hand side of a rule, but I like the consistency of your approach. In particular, explicitly constructing a tuple as an expression means it can be on the right-hand side of alet
expression, and that allows us to use the same impure binding in multiple tuple fields, which we could not do withif-let
.Also while we're at it I'd like to make our
Unit
type andunit
external constructor into simply 0-tuples,()
and(tuple)
respectively.I'm intrigued that you want the core syntax to prohibit function calls in patterns. We have to do that internally when building up the "serialized" representation just before generating Rust source, but the "trie-again" representation doesn't make that distinction, and it's not too hard to build that representation from our current surface syntax. So I'm not sure what that restriction on the core syntax would buy us.
By the way, we absolutely need some kind of inlining for some "extractor" terms or our pattern-matching efficiency will be completely shot. We have to be able to see that
iadd
andimul
both match different enum variants on the result of the sameinst_data
call or we'll test all variants in some sequence on every instruction. We might gain something by inlining some existing constructors that are used inif-let
right now too.To summarize, I think a reasonable synthesis of our proposals is:
- Add
(tuple ...)
syntax in both expressions and patterns, and allow the result type indecl
to be a tuple type, though such terms can't declare either internal or external extractors.- Add
term
declarations, which are likedecl
but only support "constructors". Whenextern
is used withterm
, it does not support any flags, which have equivalents specified on theterm
instead, or theconstructor
/extractor
keywords. Allowterm
declarations to be used in patterns if they have exactly one parameter.- Add an
inline
flag toterm
. Not all terms can be inlined right now; we'll have to figure out the exact rules for when we want to allow this flag. As an alternative, we could add apub
flag indicating that a term is meant to be called externally, or just anoinline
flag, and for everything else, inline anything we can. Or something in between?- Add
dual
declarations; if a pattern uses a term that has adual
declaration then invoke the associated dual term instead. Otherwise if the term was declared withdecl
then use its extractor, or if it was declared withterm
then just use it.- Turn
decl
,extractor
, andextern constructor/extractor
into syntactic sugar for various combinations ofterm
,dual
,extern
, andrule
. Or perhaps just rewrite these at the source level and delete support for them.Introducing
term
alone would allow us to convert a variety of simple extractor predicates from Rust to ISLE; more if we also addtuple
. Addinginline
as well and implementing it when constructing the trie-again representation would be semantically visible because it would make the overlap check more precise and allow us to remove a bunch of rule priorities. Lots of incremental benefits from doing different parts of this work.
jameysharp added the isle label to Issue #8599.
Last updated: Jan 24 2025 at 00:11 UTC