cfallin opened issue #6090:
I've been working on VeriWasm in Wasmtime the last few weeks, and decided to dump a summary of my investigation so far and ideas for the right approach to take. (I'm doing this partly because I've been asked to context switch back to something else, now that it's clear that VeriWasm proper cannot work and we need a new tool; I estimate it would take ~1 month to build what I describe in the last section below.)
(This may be better as an RFC, and I can clean it up, expand it, and post it as one once I have time to properly do so!)
Summary
We should include built-in support in Wasmtime to verify that compiled code from Wasm modules is accessing only the memory that it is supposed to access. Such verification would take the form of ahead-of-time static analysis on the code, so that it would not require dynamic checks; and it should trust as little as possible in the existing compiler stack, to increase the confidence we have in its conclusions.
The ideal would be to verify machine code bytes with no other input, but in practice, Wasmtime will have to tell the tool what to assume about details such as runtime data structures; and certain kinds of analysis, like full recovery of a CFG in the presence of indirect branches, may require whole-program analysis that is too expensive to be practical.
This issue proposes a scheme that embeds a notion of "memory capabilities" in CLIF, along with (static) assertions on values (mostly with respect to ranges), and carries these through to the VCode level in Cranelift. The Wasm lowering would be modified to annotate every load/store with an associated memory capability that permits it; and we can then check the uses of these capabilities at the other end of the pipeline, after instruction selection is done. This is inspired by, but in some sense the dual to, the approach taken by VeriWasm.
History: VeriWasm
In Johnson et al.[^1], the VeriWasm tool is described, with an approach that performs a static analysis of disassembled machine code to prove that all heap accesses in AOT-compiled Wasm code are properly to the Wasm heap, and cannot escape the sandbox. (The tool also checks other properties, such as control-flow integrity, but that is outside the scope of this issue.)
[^1]: Evan Johnson et al. "Довер ́яй, но провер ́яй: SFI Safety for native compiled Wasm." In NDSS 2021. <https://cseweb.ucsd.edu/~lerner/papers/wasm-sfi-ndss2021.pdf>
VeriWasm was designed to verify code produced by Lucet, another AOT Wasm compiler that has since been subsumed by Wasmtime and EOL'd.
VeriWasm works by lifting machine code to a simple IR that describes the semantics of the code in a partial way -- recovering just enough of the dataflow to be able to see all address computations -- and then performing iterative dataflow analysis over that IR using an analysis lattice.
The analysis values computed to describe program values include:
HeapBase
: this value is the heap base;Bounded4GB
: this value is less than 2^32;- and some others (e.g. for RIP-relative constant loads) that we won't discuss here.
The basic idea of the analysis is to mark a result of any 32-bit instruction as
Bounded4GB
in the abstract interpretation, and then allow only memory accesses of the form[HeapBase + Bounded4GB]
.Issues with porting VeriWasm to Wasmtime
The obvious next step is to port VeriWasm to work with Wasmtime instead of Lucet. To understand why this is difficult, it may be useful to know how Lucet's generated code differed from Wasmtime's. Both use Cranelift, but there are two key deltas:
Lucet used simpler data structures: its implicit first argument to every Wasm function was a pointer to the start of the (one single) Wasm memory, and this memory was always statically-bounded; there was no separate
vmctx
, and no multi-memory or multi-table support. Furthermore Lucet was normally used (and VeriWasm only supported) static bounds-checking, i.e., use of guard regions up to 4GiB heap size.The version of Cranelift used at the time that Lucet and VeriWasm were developed was much simpler than what we have today. In particular, the address-mode support was fairly primitive and predictable: heap accesses were always of the form
[rBase + rOffset]
, whererBase
was the heap base andrOffset
was the offset (into the Wasm heap), computed separately. In contrast, Wasmtime uses modern Cranelift which can fold more work into addressing modes and perform more clever instruction selection. It also includes Spectre-guard logic, which is difficult to analyze from first principles without pattern-matching the whole thing.Taken together, these two facts mean that we need a much more general analysis. In principle, we need to be able to:
- Compute a summary of an accessed memory address that symbolically encodes its base (possibly from a field in a vmctx structure), its offset, and the range of that offset (possibly from another field in a vmctx structure);
- Reason about value ranges, additions, and scaling (left-shift/multiply) operations, all on the field of integers modulo 2^32 or 2^64 (i.e., accounting for wraparound);
- Solve inequalities, to show that values are within dynamically-defined bounds.
Example
Here is the simplest possible dynamic bounds-check example, compiled for x86-64 by Wasmtime, with annotations:
0000000000000000 <_wasm_function_0>: 0: 55 push %rbp 1: 48 89 e5 mov %rsp,%rbp 4: 44 8b c2 mov %edx,%r8d ;; save Wasm address, and truncate to 32 bits 7: 4c 89 c0 mov %r8,%rax a: 48 83 c0 04 add $0x4,%rax ;; compute end: address, plus 4-byte load size e: 0f 83 02 00 00 00 jae 16 <_wasm_function_0+0x16> ;; if that overflowed, trap 14: 0f 0b ud2 16: 48 8b 57 58 mov 0x58(%rdi),%rdx ;; load memory-0 length from vmctx+0x58 1a: 48 31 c9 xor %rcx,%rcx ;; clear rcx, to use in Spectre guard below 1d: 4c 03 47 50 add 0x50(%rdi),%r8 ;; add memory-0 base (loaded from vmctx+0x50) 21: 48 39 d0 cmp %rdx,%rax ;; compare computed end-offset to length 24: 4c 0f 47 c1 cmova %rcx,%r8 ;; if out-of-bounds, move zeros (in rcx) over pointer; Spectre guard 28: 41 8b 00 mov (%r8),%eax ;; do the load 2b: 48 89 ec mov %rbp,%rsp 2e: 5d pop %rbp 2f: c3 ret
This is compiled from
(module (memory 0 10) (func (param i32) (result i32) local.get 0 i32.load))
with
wasmtime compile --dynamic-memory-guard-size 0 --static-memory-guard-size 0 --static-memory-maximum-size 0 test.wat
.There are several things going on here that we need to be able to analyze:
- We need to be able to represent each value symbolically:
- Wasm address, which we know is within 32-bit range (
Bounded4GB
from VeriWasm's Lucet analysis);- Wasm address plus 4, which we'll connect with the size of the load later;
- somehow preserve that property by understanding the jae/ud2 trap sequence and flags output of
add
;- know that
rcx
being zero is significant later (so, track either aZero
state or all known constant values);- represent "memory base plus offset known to be up to 4GB plus 4";
- understand the comparison between the length and
rax
(which up to this point is just an arbitrary 32-bit value in the program), and join this with thecmova
below ti understand the "clamp to zero if out of bounds" idiom; then represent "pointer to this offset in heap, unless this offset plus 4 is above this other value"... symbolically- use all of the above facts to validate that the load of
r8
at offset 0x28 is legal.The basic issue here is that we need to (i) build up a massive body of facts about all register values in the program, and (ii) only later pattern-match uses of certain combinations of them to generate legal pointers into dynamically-bounded memory regions. So we need to generally symoblically summarize expressions, and we need some algebraic language of additions, scaling operations, and bounding operations that can be simplified, normalized, and used to do this recognition.
Dead-end Prototype: Symbolic Expressions
In this branch of VeriWasm (which goes with this branch of Wasmtime and this branch of
yaxpeax-core
), I prototyped a system that basically attempted the above. It defined an analysis lattice that could represent additions, scaling (multiplication by a constant),umin
(unsigned min, a bounding operator, the basic nonlinear element), loads, known constant values, and a few miscellaneous atoms likevmctx
, "some pointer into.text
", andUnknown
.The analysis used a set of rewrite rules to try to simplify expressions: e.g., an arbitrary 32-bit instruction might produce
UMin(Unknown, Const(0xffffffff))
, then adding4
might produceAdd(Const(4), UMin(Unknown, Const(0xffffffff)))
which simplifies toUMin(Unknown, Const(0x100000003))
becauseAdd
distributes overUMin
. (Wrapping is handl
[message truncated]
martin-fink commented on issue #6090:
Hi!
Seeing that you have started to work on this, I just wanted to chime in that I've been working on a 'memory access verifier' too! :)My approach follows what veriwasm and your prototype do: I build 'symbolic values' that model address computations/bounds checks and solve inequalities on them. As you have mentioned above, that is quite complex and not super scalable.
The code is quite hacky and there's much to be improved in terms of performance and elegance, but it works for static and dynamic heaps (with or without spectre guards) and detects if e.g. https://github.com/bytecodealliance/wasmtime/security/advisories/GHSA-ff4p-7xrq-q5r8 is reintroduced into the code base when fuzzing with the checker enabled.
You can find it at https://github.com/martin-fink/wasmtime.I'm very excited to see how your PCC approach will differ from what I've built. :)
cfallin commented on issue #6090:
Woah, that's really neat -- I wish I had known about this earlier!
Looking it over, I definitely agree that
SymbolicValue
is close to what I had tried with my first "veriwasm 2.0" branch, an AST of of operator nodes. I'm hopeful that the current PCC approach will be more scalable since the assertions (facts) aren't recursive ASTs but shallow facts instead, and the producer is expected to leave a flattened sequence of assertions along each intermediate value in the CLIF. I've sketched out the dynamic-memories approach I want to take and it should keep to a fairly narrow set of assertion types I think, without a general inequality sublanguage or solver, tailored to the code that cranelift-wasm actually generates.It's really cool to see a working design point that can validate dynamic memories as well; thanks for this and I'll study it in more detail!
cfallin closed issue #6090:
I've been working on VeriWasm in Wasmtime the last few weeks, and decided to dump a summary of my investigation so far and ideas for the right approach to take. (I'm doing this partly because I've been asked to context switch back to something else, now that it's clear that VeriWasm proper cannot work and we need a new tool; I estimate it would take ~1 month to build what I describe in the last section below.)
(This may be better as an RFC, and I can clean it up, expand it, and post it as one once I have time to properly do so!)
Summary
We should include built-in support in Wasmtime to verify that compiled code from Wasm modules is accessing only the memory that it is supposed to access. Such verification would take the form of ahead-of-time static analysis on the code, so that it would not require dynamic checks; and it should trust as little as possible in the existing compiler stack, to increase the confidence we have in its conclusions.
The ideal would be to verify machine code bytes with no other input, but in practice, Wasmtime will have to tell the tool what to assume about details such as runtime data structures; and certain kinds of analysis, like full recovery of a CFG in the presence of indirect branches, may require whole-program analysis that is too expensive to be practical.
This issue proposes a scheme that embeds a notion of "memory capabilities" in CLIF, along with (static) assertions on values (mostly with respect to ranges), and carries these through to the VCode level in Cranelift. The Wasm lowering would be modified to annotate every load/store with an associated memory capability that permits it; and we can then check the uses of these capabilities at the other end of the pipeline, after instruction selection is done. This is inspired by, but in some sense the dual to, the approach taken by VeriWasm.
History: VeriWasm
In Johnson et al.[^1], the VeriWasm tool is described, with an approach that performs a static analysis of disassembled machine code to prove that all heap accesses in AOT-compiled Wasm code are properly to the Wasm heap, and cannot escape the sandbox. (The tool also checks other properties, such as control-flow integrity, but that is outside the scope of this issue.)
[^1]: Evan Johnson et al. "Довер ́яй, но провер ́яй: SFI Safety for native compiled Wasm." In NDSS 2021. <https://cseweb.ucsd.edu/~lerner/papers/wasm-sfi-ndss2021.pdf>
VeriWasm was designed to verify code produced by Lucet, another AOT Wasm compiler that has since been subsumed by Wasmtime and EOL'd.
VeriWasm works by lifting machine code to a simple IR that describes the semantics of the code in a partial way -- recovering just enough of the dataflow to be able to see all address computations -- and then performing iterative dataflow analysis over that IR using an analysis lattice.
The analysis values computed to describe program values include:
HeapBase
: this value is the heap base;Bounded4GB
: this value is less than 2^32;- and some others (e.g. for RIP-relative constant loads) that we won't discuss here.
The basic idea of the analysis is to mark a result of any 32-bit instruction as
Bounded4GB
in the abstract interpretation, and then allow only memory accesses of the form[HeapBase + Bounded4GB]
.Issues with porting VeriWasm to Wasmtime
The obvious next step is to port VeriWasm to work with Wasmtime instead of Lucet. To understand why this is difficult, it may be useful to know how Lucet's generated code differed from Wasmtime's. Both use Cranelift, but there are two key deltas:
Lucet used simpler data structures: its implicit first argument to every Wasm function was a pointer to the start of the (one single) Wasm memory, and this memory was always statically-bounded; there was no separate
vmctx
, and no multi-memory or multi-table support. Furthermore Lucet was normally used (and VeriWasm only supported) static bounds-checking, i.e., use of guard regions up to 4GiB heap size.The version of Cranelift used at the time that Lucet and VeriWasm were developed was much simpler than what we have today. In particular, the address-mode support was fairly primitive and predictable: heap accesses were always of the form
[rBase + rOffset]
, whererBase
was the heap base andrOffset
was the offset (into the Wasm heap), computed separately. In contrast, Wasmtime uses modern Cranelift which can fold more work into addressing modes and perform more clever instruction selection. It also includes Spectre-guard logic, which is difficult to analyze from first principles without pattern-matching the whole thing.Taken together, these two facts mean that we need a much more general analysis. In principle, we need to be able to:
- Compute a summary of an accessed memory address that symbolically encodes its base (possibly from a field in a vmctx structure), its offset, and the range of that offset (possibly from another field in a vmctx structure);
- Reason about value ranges, additions, and scaling (left-shift/multiply) operations, all on the field of integers modulo 2^32 or 2^64 (i.e., accounting for wraparound);
- Solve inequalities, to show that values are within dynamically-defined bounds.
Example
Here is the simplest possible dynamic bounds-check example, compiled for x86-64 by Wasmtime, with annotations:
0000000000000000 <_wasm_function_0>: 0: 55 push %rbp 1: 48 89 e5 mov %rsp,%rbp 4: 44 8b c2 mov %edx,%r8d ;; save Wasm address, and truncate to 32 bits 7: 4c 89 c0 mov %r8,%rax a: 48 83 c0 04 add $0x4,%rax ;; compute end: address, plus 4-byte load size e: 0f 83 02 00 00 00 jae 16 <_wasm_function_0+0x16> ;; if that overflowed, trap 14: 0f 0b ud2 16: 48 8b 57 58 mov 0x58(%rdi),%rdx ;; load memory-0 length from vmctx+0x58 1a: 48 31 c9 xor %rcx,%rcx ;; clear rcx, to use in Spectre guard below 1d: 4c 03 47 50 add 0x50(%rdi),%r8 ;; add memory-0 base (loaded from vmctx+0x50) 21: 48 39 d0 cmp %rdx,%rax ;; compare computed end-offset to length 24: 4c 0f 47 c1 cmova %rcx,%r8 ;; if out-of-bounds, move zeros (in rcx) over pointer; Spectre guard 28: 41 8b 00 mov (%r8),%eax ;; do the load 2b: 48 89 ec mov %rbp,%rsp 2e: 5d pop %rbp 2f: c3 ret
This is compiled from
(module (memory 0 10) (func (param i32) (result i32) local.get 0 i32.load))
with
wasmtime compile --dynamic-memory-guard-size 0 --static-memory-guard-size 0 --static-memory-maximum-size 0 test.wat
.There are several things going on here that we need to be able to analyze:
- We need to be able to represent each value symbolically:
- Wasm address, which we know is within 32-bit range (
Bounded4GB
from VeriWasm's Lucet analysis);- Wasm address plus 4, which we'll connect with the size of the load later;
- somehow preserve that property by understanding the jae/ud2 trap sequence and flags output of
add
;- know that
rcx
being zero is significant later (so, track either aZero
state or all known constant values);- represent "memory base plus offset known to be up to 4GB plus 4";
- understand the comparison between the length and
rax
(which up to this point is just an arbitrary 32-bit value in the program), and join this with thecmova
below ti understand the "clamp to zero if out of bounds" idiom; then represent "pointer to this offset in heap, unless this offset plus 4 is above this other value"... symbolically- use all of the above facts to validate that the load of
r8
at offset 0x28 is legal.The basic issue here is that we need to (i) build up a massive body of facts about all register values in the program, and (ii) only later pattern-match uses of certain combinations of them to generate legal pointers into dynamically-bounded memory regions. So we need to generally symoblically summarize expressions, and we need some algebraic language of additions, scaling operations, and bounding operations that can be simplified, normalized, and used to do this recognition.
Dead-end Prototype: Symbolic Expressions
In this branch of VeriWasm (which goes with this branch of Wasmtime and this branch of
yaxpeax-core
), I prototyped a system that basically attempted the above. It defined an analysis lattice that could represent additions, scaling (multiplication by a constant),umin
(unsigned min, a bounding operator, the basic nonlinear element), loads, known constant values, and a few miscellaneous atoms likevmctx
, "some pointer into.text
", andUnknown
.The analysis used a set of rewrite rules to try to simplify expressions: e.g., an arbitrary 32-bit instruction might produce
UMin(Unknown, Const(0xffffffff))
, then adding4
might produceAdd(Const(4), UMin(Unknown, Const(0xffffffff)))
which simplifies toUMin(Unknown, Const(0x100000003))
becauseAdd
distributes overUMin
. (Wrapping is handl
[message truncated]
Last updated: Nov 22 2024 at 16:03 UTC