Initial ISLE verification notes · cranelift

I have started playing around with approaches to verifying the correctness of ISLE rewrite rules using an SMT solver, and wanted to post some initial notes here for anyone interested. Some written notes live here. A manually-constructed SMTLIB file for a single rule that rewrites an iadd to a subtract with immediate is here. This file currently hard-codes that the type is b64 and doesn't do anything interesting yet for register reads/writes.

fitzgen (he/him) (Jan 10 2022 at 17:27):

I guess I was imagining that we would have a separate query per type, but that is only because I didn't think it would be possible to fold it all together into one query. Does SMTLIB let you say "this is a bitvector of a dynamic bit width" and then feed a variable defined elsewhere in the query in as a value's bit width? How would you fold multiple bit-widths into the same query?

As discussed during the cranelift meeting today, I think we would want to specify extractors in the same order as the Rust: from "return" type to "parameter" types. This is because extractors can extract multiple values (e.g. has_type extracts both a type and an inst from an inst).

Adrian Sampson (Jan 10 2022 at 17:45):

The thing about multiple parameter types is a pretty strong argument… just to state the obvious once again, another argument in favor of "the Rust direction" for specifying extractors is that it would (maybe?) create an easier path to someday verifying that the Rust implementations are correct.

Chris Fallin (Jan 10 2022 at 17:48):

@sampsyo just to make sure we're on the same page -- this is a notational question I think? I'm not sure how (decl A (T U) V) vs (decl A (V) (T U)) affects whether it's possible to e.g. verify the Rust, but possibly I'm missing some nuance here

Adrian Sampson (Jan 10 2022 at 18:00):

I meant something about the "guts" of the spec rather than the notation itself: taking @Alexa VanHattum's imm12_from_negated_value example above, one direction consists of extending (and negating) a 12-bit value, while the other consists of truncating (and negating) a 64-bit value. The idea is that the verifier could either quantify over the 64-bit value in the original CLIF instruction and rely on the "forward" spec to obtain the 12-bit value, or it could quantify over the 12-bit value and use the "backward" spec to obtain the corresponding original CLIF literal. You can imagine various efficiency & "trustworthiness" trade-offs depending on the choice.

Chris Fallin (Jan 10 2022 at 18:05):

At least the intention at the semantic level is for the meaning to be the same in both directions -- I had been thinking of things in a sort of Prolog way, where we would say imm12_from_negated_value(a, b) and this is just a predicate on a and b; one could imagine a translation from that to SMT formulae or whatnot. So I guess I see the ISLE terms as "directionless"; a Rust implementation of one direction or the other could then be verified against that (always quantify over the Rust args, get symbolic result, check against predicate). But you all will have more detailed thoughts on the implementation/lowering than me; just wanted to communicate the language-level intent above :-)

Adrian Sampson (Jan 10 2022 at 18:09):

Right, that directionless/Prolog-esque idea very much came through when reading through ISLE files! It's a really neat idea (and makes for an interesting decision about how to set up the verifier).

Alexa VanHattum (Feb 04 2022 at 16:48):

Making good progress on an SMT-condition generator! Here's the output (modulo cleaning up some comments/formatting) for two iadd rules. Currently, this is hard-coding the term semantics rather than the eventual goal of pulling them from annotations. This is also using a fairly high level model for RHS terms, which we'll want to replace soon.

The overall strategy: we have a symbolic interpreter for Patterns for the LHS of rules that recursively lowers terms to bit vectors, while building up a set of bound SMT variables and assumptions on those variables. Starting with the outer-most instruction pattern, this creates fresh variables for each argument and maps ISLE term indices to bound names. Then a symbolic interpreter for Exprs lowers RHS expressions to bitvector, using the assumption context built from the LHS.

Currently, we run a separate query for each bit width, so has_type mostly falls out without reasoning about the type itself. In the example below, we run for 32-bit. Eventually, we'll want the RHS to always reason on the appropriate 32-bit or 64-bit BV, even if the query's type is something like i8.

;; ----------------------------------simple iadd-----------------------------------

;; Running verification for rule:
(rule
    (lower (has_type (fits_in_64 ty)
                     (iadd x y)))
    (value_reg (add ty (put_in_reg x) (put_in_reg y))))

;; Declaring constants:
    ty : BitVector(32)
    arg_0 : BitVector(32)
    arg_1 : BitVector(32)
;; Adding assumptions:
    (= (_ bv32 32) ty)
;; Running query:
    ;; Query structure: we want `<assumptions> implies <LSH> = <RSH>`
    ;; Negate the query, unsat implies no counterexamples exist.
    (not
        ;; Implication
        (=>
            ;; Assumptions
            (and
                (= (_ bv32 32) ty))
            ;; LHS = RHS
            (= (bvadd arg_0 arg_1) (bvadd arg_0 arg_1))))

;; Verification succeeded
;; ----------------------------------iadd to sub-----------------------------------

;; Running verification for rule:
(rule
    (lower (has_type (fits_in_64 ty)
                     (iadd x (imm12_from_negated_value y))))
    (value_reg (sub_imm ty (put_in_reg x) y)))

;; Declaring constants:
    ty : BitVector(32)
    arg_0 : BitVector(32)
    arg_1 : BitVector(32)
    arg_imm12_from_negated_value : BitVector(12)
;; Adding assumptions:
    (= (_ bv32 32) ty)
    (= (bvand (bvnot (_ bv4095 32)) arg_1) (_ bv0 32))
    (= arg_1 (bvneg ((_ zero_extend 20) arg_imm12_from_negated_value)))
;; Running query:
    ;; Query structure: we want `<assumptions> implies <LSH> = <RSH>`
    ;; Negate the query, unsat implies no counterexamples exist.
    (not
        ;; Implication
        (=>
            ;; Assumptions
            (and
                (= (_ bv32 32) ty)
                (= (bvand (bvnot (_ bv4096 32)) arg_1) (_ bv0 32))
                (= arg_1 (bvneg ((_ zero_extend 20) arg_imm12_from_negated_value))))
            ;; LHS = RHS
            (=
                (bvadd arg_0 (bvneg ((_ zero_extend 20) arg_imm12_from_negated_value)))
                (bvsub arg_0 ((_ zero_extend 20) arg_imm12_from_negated_value)))))
;; Verification succeeded

Alexa VanHattum (Feb 04 2022 at 16:49):

I can try and wrap this up into a neater demo to share on Monday's meeting, if there's interest.

fitzgen (he/him) (Feb 04 2022 at 17:36):

fitzgen (he/him) (Feb 04 2022 at 17:45):

Do you mean lowering the RHS to the width of the register classes that it is using to implement the LHS? In x64, for example, we don't differentiate between e.g. rax and eax, even though we will at times emit instructions that just use eax/the lower half of rax. I wonder if it makes sense to represent the full bit width of e.g. rax` even when we are lowering a 32-bit or smaller LHS. This could be a small step towards catching bugs like our CVE where we spilled a 32-bit value and then reloaded it with the wrong kind of extension before using it as the address for a memory access.

Chris Fallin (Feb 04 2022 at 18:35):

@Alexa VanHattum yes, we'd love to hear more about this at the next meeting! (Please feel free to create a PR to add to the meeting agenda any time; happy to do it now as well if you'd like)

I'm mostly curious about the longer-term story for integration of semantics models on both ends (CLIF and the target ISA); I imagine that making this generic would also largely take care of the specific questions of types. I would hope that in the eventual design we don't have to pattern-match on has_type or others as a special case, but that it just falls out of the semantics that we understand about each term.

Another random thought that comes to mind re bitvector widths is that maybe at the SMT-variable level, it would make sense for all bindings to be a bitvector large enough for any value (128 bits currently), but then the specific constraints emitted for terms use only the appropriate bits? This could be a way of avoiding the need to do a kind of type inference in your lowering to SMT, at the cost of a small semantic gap

Alexa VanHattum (Feb 04 2022 at 20:06):

Re: the currently-less-generic nature of how we're handling types, we anticipant that types may need to be a special case to facilitate verification scaling. We could, as you say, over-approximate everything as an i128, but doing so might cause us to hit total timeouts in more complicated cases where smaller bitwidths are tractable. But I totally agree that the current handling of has_type is way more of a special case than our goal.

One open question with respect to the semantics models for CLIF and the target ISA is how to handle rules that write to intermediate ISLE terms not in the target ISA. While we could provide semantics for every intermediate term (for example, the sub_imm term is further rewritten later), it might be nice for symbolic execution to continue across multiple rules. This would let us only provide (initially hand-specified) semantics for starting CLIF terms and final ISA terms. This is part of why we are curious about the depths of the trie for combining ISLE rules.

Chris Fallin (Feb 04 2022 at 20:53):

That's a good point, one model could just be to statically elaborate/inline rules from input language to output. At one point I actually had wanted to build the codegen in islec this way; I decided to go instead to a model that actually generates a separate Rust function per intermediate term to keep the code-size expansion down, but I didn't get to a point where I'd be able to actually measure that expansion

Chris Fallin (Feb 04 2022 at 20:54):

one potential complexity is that the language semantics allow, at least in theory, for statically-unbounded recursion (cycles can occur in the callgraph). In practice though we don't use that ability (though I think Ulrich may have in some prototype call-ABI stuff)

Chris Fallin (Feb 04 2022 at 20:55):

In any case, static elaboration and bailing on cycles feels like a good first-pass answer here, and we can solve any scalability problems (maybe by refactoring code, maybe by doing a conditionally-context-sensitive thing or using intermediate invariants) as they come up

Stream: cranelift

Topic: Initial ISLE verification notes

Alexa VanHattum (Jan 10 2022 at 15:46):