I'm curious what algorithm ISLE uses to check patterns for coverage and compile them to Rust expressions. I know Rust uses an extended version of https://dl.acm.org/doi/10.1145/1411304.1411311 and http://moscova.inria.fr/~maranget/papers/warn/warn.pdf (extended to handle ranges and uninhabited types), but that doesn't handle non-linear pattern variables. Was there a paper you used as a guide, or is it a home-grown algorithm?
We actually don't have a "completeness" check; it's valid for a constructor to be partial and to fail to match (in which case it returns None). We do have an "overlap" check, however, which @Trevor Elliott built here: https://github.com/bytecodealliance/wasmtime/pull/4906; he may be able to say more about how it corresponds to the literature/common algorithms!
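For a rough idea of what "partial" means in the generated code, here is a minimal sketch with made-up types and names (not ISLE's actual generated signatures): a partial constructor compiles to an Option-returning function, so a failed match surfaces to the caller as None.

```rust
// Illustrative sketch only; the types and names here are made up and do not
// correspond to the actual ISLE-generated code.
struct Ctx;
enum Inst { Iadd(i32, i32), Other }

// A partial constructor compiles to an Option-returning function: rules are
// tried in priority order, and if none matches, the call yields None.
fn constructor_lower(_ctx: &mut Ctx, inst: Inst) -> Option<i32> {
    // Rule: matches only `iadd`.
    if let Inst::Iadd(a, b) = inst {
        return Some(a + b);
    }
    // No rule matched: the constructor fails and the caller sees None.
    None
}

fn main() {
    let mut ctx = Ctx;
    assert_eq!(constructor_lower(&mut ctx, Inst::Iadd(2, 3)), Some(5));
    assert_eq!(constructor_lower(&mut ctx, Inst::Other), None);
}
```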
Originally it used an algorithm described in a paper from Xavier Leroy (https://xavierleroy.org/bibrefs/Leroy-ZINC.html), but @Jamey Sharp realized that it would be significantly less computationally expensive to check each rule in a decl group against every other rule.
The check itself comes down to a pretty simple comparison of the patterns being matched: if they match the same known constructor, then we recursively check each argument. If we make it all the way through the arguments without finding a pattern that differs between the two rules, we raise an error. This is somewhat complicated by extractors defined in Rust, as there's logic that we can't inspect in the ISLE compiler. Originally I treated these as wildcards unless both rules used them, in which case the check would recurse. I believe that Jamey might have come up with some better handling for those when he worked through the changes for the trie_again IR.
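As a rough illustration of that pairwise comparison, here is a minimal sketch over a deliberately simplified pattern type; the real checker works on ISLE's own pattern representation and also has to handle external extractors, priorities, and equality constraints.

```rust
// A minimal sketch of the pairwise overlap comparison described above, over a
// simplified pattern type; not ISLE's actual internals.
enum Pattern {
    Wildcard,                   // `_` or an unconstrained variable
    Const(i64),                 // an integer literal
    Ctor(String, Vec<Pattern>), // a known constructor with argument patterns
}

/// Returns true if the two patterns can match at least one common input.
fn overlaps(a: &Pattern, b: &Pattern) -> bool {
    use Pattern::*;
    match (a, b) {
        // A wildcard overlaps with anything.
        (Wildcard, _) | (_, Wildcard) => true,
        // Two literals overlap only if they are the same value.
        (Const(x), Const(y)) => x == y,
        // Same known constructor: recurse into the arguments; the rules
        // overlap only if every pair of argument patterns overlaps.
        (Ctor(n1, args1), Ctor(n2, args2)) => {
            n1 == n2
                && args1.len() == args2.len()
                && args1.iter().zip(args2).all(|(p, q)| overlaps(p, q))
        }
        // Different shapes (literal vs. constructor) can't match the same input.
        _ => false,
    }
}

fn main() {
    use Pattern::*;
    let r1 = Ctor("iadd".into(), vec![Wildcard, Const(0)]);
    let r2 = Ctor("iadd".into(), vec![Const(1), Wildcard]);
    let r3 = Ctor("isub".into(), vec![Wildcard, Wildcard]);
    assert!(overlaps(&r1, &r2));  // both can match (iadd 1 0): report an overlap
    assert!(!overlaps(&r1, &r3)); // different constructors: no overlap
}
```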
Yeah, I still basically treated external terms as wildcards unless they match syntactically. There's also complexity due to equality constraints between terms. However, the "trie again" representation made dealing with and-patterns much easier, if I recall correctly. And there's an additional trick where if we can prove that a higher-priority rule matches every input that a lower-priority rule does then we can report that the lower-priority rule is unreachable, which I added as an extension of the previous definition of "overlap".
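A similarly simplified sketch of that unreachability idea: if a higher-priority rule's pattern subsumes a lower-priority rule's pattern (matches everything the latter matches), the lower-priority rule can never fire. Again, the pattern type and logic here are illustrative, not ISLE's actual implementation.

```rust
// A minimal sketch of the subsumption check mentioned above; the real check
// also has to account for external terms, and-patterns, and equality
// constraints.
enum Pattern {
    Wildcard,
    Const(i64),
    Ctor(String, Vec<Pattern>),
}

/// Returns true if `hi` matches every input that `lo` matches. If `hi` belongs
/// to a higher-priority rule, the rule containing `lo` is unreachable.
fn subsumes(hi: &Pattern, lo: &Pattern) -> bool {
    use Pattern::*;
    match (hi, lo) {
        // A wildcard matches everything, so it subsumes any pattern.
        (Wildcard, _) => true,
        // A constrained pattern cannot subsume a wildcard: the wildcard also
        // matches inputs that the constrained pattern rejects.
        (_, Wildcard) => false,
        (Const(x), Const(y)) => x == y,
        // Same constructor: `hi` subsumes `lo` only if it does so argument by
        // argument.
        (Ctor(n1, args1), Ctor(n2, args2)) => {
            n1 == n2
                && args1.len() == args2.len()
                && args1.iter().zip(args2).all(|(p, q)| subsumes(p, q))
        }
        _ => false,
    }
}

fn main() {
    use Pattern::*;
    let hi = Ctor("iadd".into(), vec![Wildcard, Wildcard]);
    let lo = Ctor("iadd".into(), vec![Const(0), Wildcard]);
    assert!(subsumes(&hi, &lo));  // the more specific, lower-priority rule is unreachable
    assert!(!subsumes(&lo, &hi));
}
```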
I put a lot of thought into the algorithm for building the Rust match tree as well; it is not based on any paper, but is based on my past experience with combinatorial search in general. I'd be interested in talking about it more at some point.
We actually don't have a "completeness" check; it's valid for a constructor to be partial and to fail to match (in which case it returns None)
That's concerning. Does that mean lowering could fail due to IR that is not matched by any pattern?
Yes, and in practice we have a bunch of them.
Here's a list of all the instructions that the Cranelift fuzzer is not allowed to generate. And there are even more than that, such as vector types whose length is not 128 bits, which are unimplemented by most backends. And some other instructions that the fuzzer doesn't generate.
No more concerning than any other part of the compiler that has not been proved to be total :-)
It's historically a result of the "gradual migration" approach that we took with ISLE's design: we wanted to be able to move over lowering patterns that override the legacy code, then eventually delete the legacy code. Now that everything is moved over (and has been for two years or so), it would be a reasonable next step to add a total mode to decls and check coverage; but that's a large project (you or anyone else is welcome to work on it!)
In the meantime, because we have fuzzing (as Afonso mentions above, at the CLIF level and also at the Wasm level), we tend to catch "holes" pretty quickly; and the failure mode is loud and obvious in any case (a compiler panic rather than a miscompile).
There are already limited cases where we prove that an ISLE constructor is total and tweak the code we generate for it accordingly, though it's currently only when there's a catch-all low-priority rule. There might be other situations where we could generate better pattern matches if we proved that some fall-through case is unreachable. (I think I left a comment somewhere in the serialize module about that, even.) And we could do better error reporting if we synthesized a rule for each missing case and could tell the user exactly what pattern failed to match; I think Trevor tried to do that a couple years ago but it was more complicated than it seemed at first glance.
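To make the partial-versus-total distinction concrete, here is a hand-written sketch (not actual generated code) contrasting the Option-returning shape of a partial constructor with the direct-returning shape that becomes possible when a catch-all rule makes the fall-through case unreachable.

```rust
// Illustrative only; not the actual ISLE-generated code.
enum Opcode { Iadd, Isub, Other }

// Partial: no rule covers Opcode::Other, so the match can fall through to None
// and every caller has to handle the failure case.
fn lower_partial(op: Opcode) -> Option<&'static str> {
    match op {
        Opcode::Iadd => Some("add"),
        Opcode::Isub => Some("sub"),
        _ => None,
    }
}

// Total: a catch-all, lowest-priority rule covers the remaining inputs, so the
// failure case can never happen and the Option wrapper can be dropped.
fn lower_total(op: Opcode) -> &'static str {
    match op {
        Opcode::Iadd => "add",
        Opcode::Isub => "sub",
        _ => "fallback", // catch-all rule
    }
}

fn main() {
    assert_eq!(lower_partial(Opcode::Other), None);
    assert_eq!(lower_total(Opcode::Other), "fallback");
}
```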