Stream: git-wasmtime

Topic: wasmtime / issue #6090 Verification of addressing / memor...


view this post on Zulip Wasmtime GitHub notifications bot (Mar 23 2023 at 01:40):

cfallin opened issue #6090:

I've been working on VeriWasm in Wasmtime the last few weeks, and decided to dump a summary of my investigation so far and ideas for the right approach to take. (I'm doing this partly because I've been asked to context switch back to something else, now that it's clear that VeriWasm proper cannot work and we need a new tool; I estimate it would take ~1 month to build what I describe in the last section below.)

(This may be better as an RFC, and I can clean it up, expand it, and post it as one once I have time to properly do so!)

Summary

We should include built-in support in Wasmtime to verify that compiled code from Wasm modules is accessing only the memory that it is supposed to access. Such verification would take the form of ahead-of-time static analysis on the code, so that it would not require dynamic checks; and it should trust as little as possible in the existing compiler stack, to increase the confidence we have in its conclusions.

The ideal would be to verify machine code bytes with no other input, but in practice, Wasmtime will have to tell the tool what to assume about details such as runtime data structures; and certain kinds of analysis, like full recovery of a CFG in the presence of indirect branches, may require whole-program analysis that is too expensive to be practical.

This issue proposes a scheme that embeds a notion of "memory capabilities" in CLIF, along with (static) assertions on values (mostly with respect to ranges), and carries these through to the VCode level in Cranelift. The Wasm lowering would be modified to annotate every load/store with an associated memory capability that permits it; and we can then check the uses of these capabilities at the other end of the pipeline, after instruction selection is done. This is inspired by, but in some sense the dual to, the approach taken by VeriWasm.

History: VeriWasm

In Johnson et al.[^1], the VeriWasm tool is described, with an approach that performs a static analysis of disassembled machine code to prove that all heap accesses in AOT-compiled Wasm code are properly to the Wasm heap, and cannot escape the sandbox. (The tool also checks other properties, such as control-flow integrity, but that is outside the scope of this issue.)

[^1]: Evan Johnson et al. "Довер ́яй, но провер ́яй: SFI Safety for native compiled Wasm." In NDSS 2021. <https://cseweb.ucsd.edu/~lerner/papers/wasm-sfi-ndss2021.pdf>

VeriWasm was designed to verify code produced by Lucet, another AOT Wasm compiler that has since been subsumed by Wasmtime and EOL'd.

VeriWasm works by lifting machine code to a simple IR that describes the semantics of the code in a partial way -- recovering just enough of the dataflow to be able to see all address computations -- and then performing iterative dataflow analysis over that IR using an analysis lattice.

The analysis values computed to describe program values include:

The basic idea of the analysis is to mark a result of any 32-bit instruction as Bounded4GB in the abstract interpretation, and then allow only memory accesses of the form [HeapBase + Bounded4GB].

Issues with porting VeriWasm to Wasmtime

The obvious next step is to port VeriWasm to work with Wasmtime instead of Lucet. To understand why this is difficult, it may be useful to know how Lucet's generated code differed from Wasmtime's. Both use Cranelift, but there are two key deltas:

Taken together, these two facts mean that we need a much more general analysis. In principle, we need to be able to:

Example

Here is the simplest possible dynamic bounds-check example, compiled for x86-64 by Wasmtime, with annotations:

0000000000000000 <_wasm_function_0>:
       0:   55                      push   %rbp
       1:   48 89 e5                mov    %rsp,%rbp
       4:   44 8b c2                mov    %edx,%r8d                    ;; save Wasm address, and truncate to 32 bits
       7:   4c 89 c0                mov    %r8,%rax
       a:   48 83 c0 04             add    $0x4,%rax                    ;; compute end: address, plus 4-byte load size
       e:   0f 83 02 00 00 00       jae    16 <_wasm_function_0+0x16>   ;; if that overflowed, trap
      14:   0f 0b                   ud2
      16:   48 8b 57 58             mov    0x58(%rdi),%rdx              ;; load memory-0 length from vmctx+0x58
      1a:   48 31 c9                xor    %rcx,%rcx                    ;; clear rcx, to use in Spectre guard below
      1d:   4c 03 47 50             add    0x50(%rdi),%r8               ;; add memory-0 base (loaded from vmctx+0x50)
      21:   48 39 d0                cmp    %rdx,%rax                    ;; compare computed end-offset to length
      24:   4c 0f 47 c1             cmova  %rcx,%r8                     ;; if out-of-bounds, move zeros (in rcx) over pointer; Spectre guard
      28:   41 8b 00                mov    (%r8),%eax                   ;; do the load
      2b:   48 89 ec                mov    %rbp,%rsp
      2e:   5d                      pop    %rbp
      2f:   c3                      ret

This is compiled from

(module
  (memory 0 10)
  (func (param i32) (result i32)
        local.get 0
        i32.load))

with wasmtime compile --dynamic-memory-guard-size 0 --static-memory-guard-size 0 --static-memory-maximum-size 0 test.wat.

There are several things going on here that we need to be able to analyze:

The basic issue here is that we need to (i) build up a massive body of facts about all register values in the program, and (ii) only later pattern-match uses of certain combinations of them to generate legal pointers into dynamically-bounded memory regions. So we need to generally symoblically summarize expressions, and we need some algebraic language of additions, scaling operations, and bounding operations that can be simplified, normalized, and used to do this recognition.

Dead-end Prototype: Symbolic Expressions

In this branch of VeriWasm (which goes with this branch of Wasmtime and this branch of yaxpeax-core), I prototyped a system that basically attempted the above. It defined an analysis lattice that could represent additions, scaling (multiplication by a constant), umin (unsigned min, a bounding operator, the basic nonlinear element), loads, known constant values, and a few miscellaneous atoms like vmctx, "some pointer into .text", and Unknown.

The analysis used a set of rewrite rules to try to simplify expressions: e.g., an arbitrary 32-bit instruction might produce UMin(Unknown, Const(0xffffffff)), then adding 4 might produce Add(Const(4), UMin(Unknown, Const(0xffffffff))) which simplifies to UMin(Unknown, Const(0x100000003)) because Add distributes over UMin. (Wrapping is handl
[message truncated]

view this post on Zulip Wasmtime GitHub notifications bot (Oct 18 2023 at 13:27):

martin-fink commented on issue #6090:

Hi!
Seeing that you have started to work on this, I just wanted to chime in that I've been working on a 'memory access verifier' too! :)

My approach follows what veriwasm and your prototype do: I build 'symbolic values' that model address computations/bounds checks and solve inequalities on them. As you have mentioned above, that is quite complex and not super scalable.
The code is quite hacky and there's much to be improved in terms of performance and elegance, but it works for static and dynamic heaps (with or without spectre guards) and detects if e.g. https://github.com/bytecodealliance/wasmtime/security/advisories/GHSA-ff4p-7xrq-q5r8 is reintroduced into the code base when fuzzing with the checker enabled.
You can find it at https://github.com/martin-fink/wasmtime.

I'm very excited to see how your PCC approach will differ from what I've built. :)

view this post on Zulip Wasmtime GitHub notifications bot (Oct 18 2023 at 16:14):

cfallin commented on issue #6090:

Woah, that's really neat -- I wish I had known about this earlier!

Looking it over, I definitely agree that SymbolicValue is close to what I had tried with my first "veriwasm 2.0" branch, an AST of of operator nodes. I'm hopeful that the current PCC approach will be more scalable since the assertions (facts) aren't recursive ASTs but shallow facts instead, and the producer is expected to leave a flattened sequence of assertions along each intermediate value in the CLIF. I've sketched out the dynamic-memories approach I want to take and it should keep to a fairly narrow set of assertion types I think, without a general inequality sublanguage or solver, tailored to the code that cranelift-wasm actually generates.

It's really cool to see a working design point that can validate dynamic memories as well; thanks for this and I'll study it in more detail!

view this post on Zulip Wasmtime GitHub notifications bot (Nov 04 2023 at 05:52):

cfallin closed issue #6090:

I've been working on VeriWasm in Wasmtime the last few weeks, and decided to dump a summary of my investigation so far and ideas for the right approach to take. (I'm doing this partly because I've been asked to context switch back to something else, now that it's clear that VeriWasm proper cannot work and we need a new tool; I estimate it would take ~1 month to build what I describe in the last section below.)

(This may be better as an RFC, and I can clean it up, expand it, and post it as one once I have time to properly do so!)

Summary

We should include built-in support in Wasmtime to verify that compiled code from Wasm modules is accessing only the memory that it is supposed to access. Such verification would take the form of ahead-of-time static analysis on the code, so that it would not require dynamic checks; and it should trust as little as possible in the existing compiler stack, to increase the confidence we have in its conclusions.

The ideal would be to verify machine code bytes with no other input, but in practice, Wasmtime will have to tell the tool what to assume about details such as runtime data structures; and certain kinds of analysis, like full recovery of a CFG in the presence of indirect branches, may require whole-program analysis that is too expensive to be practical.

This issue proposes a scheme that embeds a notion of "memory capabilities" in CLIF, along with (static) assertions on values (mostly with respect to ranges), and carries these through to the VCode level in Cranelift. The Wasm lowering would be modified to annotate every load/store with an associated memory capability that permits it; and we can then check the uses of these capabilities at the other end of the pipeline, after instruction selection is done. This is inspired by, but in some sense the dual to, the approach taken by VeriWasm.

History: VeriWasm

In Johnson et al.[^1], the VeriWasm tool is described, with an approach that performs a static analysis of disassembled machine code to prove that all heap accesses in AOT-compiled Wasm code are properly to the Wasm heap, and cannot escape the sandbox. (The tool also checks other properties, such as control-flow integrity, but that is outside the scope of this issue.)

[^1]: Evan Johnson et al. "Довер ́яй, но провер ́яй: SFI Safety for native compiled Wasm." In NDSS 2021. <https://cseweb.ucsd.edu/~lerner/papers/wasm-sfi-ndss2021.pdf>

VeriWasm was designed to verify code produced by Lucet, another AOT Wasm compiler that has since been subsumed by Wasmtime and EOL'd.

VeriWasm works by lifting machine code to a simple IR that describes the semantics of the code in a partial way -- recovering just enough of the dataflow to be able to see all address computations -- and then performing iterative dataflow analysis over that IR using an analysis lattice.

The analysis values computed to describe program values include:

The basic idea of the analysis is to mark a result of any 32-bit instruction as Bounded4GB in the abstract interpretation, and then allow only memory accesses of the form [HeapBase + Bounded4GB].

Issues with porting VeriWasm to Wasmtime

The obvious next step is to port VeriWasm to work with Wasmtime instead of Lucet. To understand why this is difficult, it may be useful to know how Lucet's generated code differed from Wasmtime's. Both use Cranelift, but there are two key deltas:

Taken together, these two facts mean that we need a much more general analysis. In principle, we need to be able to:

Example

Here is the simplest possible dynamic bounds-check example, compiled for x86-64 by Wasmtime, with annotations:

0000000000000000 <_wasm_function_0>:
       0:   55                      push   %rbp
       1:   48 89 e5                mov    %rsp,%rbp
       4:   44 8b c2                mov    %edx,%r8d                    ;; save Wasm address, and truncate to 32 bits
       7:   4c 89 c0                mov    %r8,%rax
       a:   48 83 c0 04             add    $0x4,%rax                    ;; compute end: address, plus 4-byte load size
       e:   0f 83 02 00 00 00       jae    16 <_wasm_function_0+0x16>   ;; if that overflowed, trap
      14:   0f 0b                   ud2
      16:   48 8b 57 58             mov    0x58(%rdi),%rdx              ;; load memory-0 length from vmctx+0x58
      1a:   48 31 c9                xor    %rcx,%rcx                    ;; clear rcx, to use in Spectre guard below
      1d:   4c 03 47 50             add    0x50(%rdi),%r8               ;; add memory-0 base (loaded from vmctx+0x50)
      21:   48 39 d0                cmp    %rdx,%rax                    ;; compare computed end-offset to length
      24:   4c 0f 47 c1             cmova  %rcx,%r8                     ;; if out-of-bounds, move zeros (in rcx) over pointer; Spectre guard
      28:   41 8b 00                mov    (%r8),%eax                   ;; do the load
      2b:   48 89 ec                mov    %rbp,%rsp
      2e:   5d                      pop    %rbp
      2f:   c3                      ret

This is compiled from

(module
  (memory 0 10)
  (func (param i32) (result i32)
        local.get 0
        i32.load))

with wasmtime compile --dynamic-memory-guard-size 0 --static-memory-guard-size 0 --static-memory-maximum-size 0 test.wat.

There are several things going on here that we need to be able to analyze:

The basic issue here is that we need to (i) build up a massive body of facts about all register values in the program, and (ii) only later pattern-match uses of certain combinations of them to generate legal pointers into dynamically-bounded memory regions. So we need to generally symoblically summarize expressions, and we need some algebraic language of additions, scaling operations, and bounding operations that can be simplified, normalized, and used to do this recognition.

Dead-end Prototype: Symbolic Expressions

In this branch of VeriWasm (which goes with this branch of Wasmtime and this branch of yaxpeax-core), I prototyped a system that basically attempted the above. It defined an analysis lattice that could represent additions, scaling (multiplication by a constant), umin (unsigned min, a bounding operator, the basic nonlinear element), loads, known constant values, and a few miscellaneous atoms like vmctx, "some pointer into .text", and Unknown.

The analysis used a set of rewrite rules to try to simplify expressions: e.g., an arbitrary 32-bit instruction might produce UMin(Unknown, Const(0xffffffff)), then adding 4 might produce Add(Const(4), UMin(Unknown, Const(0xffffffff))) which simplifies to UMin(Unknown, Const(0x100000003)) because Add distributes over UMin. (Wrapping is handl
[message truncated]


Last updated: Nov 22 2024 at 16:03 UTC