Stream: wasm

Topic: WebAssembly workshop talk on correctness+security


view this post on Zulip Chris Fallin (Jan 21 2025 at 06:49):

I just gave a talk today at the WebAssembly Workshop (WAW 2025) at POPL on correctness efforts in Wasmtime+Cranelift (and a tiny bit about how weval can also be seen as a correctness effort). The recording is here if anyone is interested in watching; I'll post slides later as well.

(That's a timestamped link at 40m53s into the whole day workshop recording, which has a bunch of other interesting talks I'd recommend watching too -- full program here)

WebAssembly is a general-purpose low-level virtual machine. It was the first programming language to be introduced to the Web since JavaScript, and has since been adopted in many other environments, such as edge and cloud computing, mobile computing, blockchains, and embedded systems. Unusually for an industrial language, WebAssembly’s normative specification is stated fully in terms of a pen-and-paper formal semantics. In addition, multiple mechanisations of this semantics have been created and used to prove the soundness of the WebAssembly type system. The rigor of WebAssembly’s semanti ...

view this post on Zulip Robin Freyler (Jan 21 2025 at 13:34):

Great talk and very interesting topic!
One thing that confused me a bit was here: YouTube - - YouTube
You spoke about how important is linear time verifiability because one would not want a Wasm module to load in quadratic time or worse. However, from what I know (and I might very well be wrong) is that Wasm validation (part of loading) already is quadratic since multi-value Wasm was enabled.
E.g. checking the type of stack manipulating control flow and calls is non-linear with multi-value Wasm. At least in all implementations I know it is quadratic in the worst-case.

view this post on Zulip Chris Fallin (Jan 21 2025 at 16:24):

@Robin Freyler good question! I actually was not aware of quadratic complexity of multi-value validation, though I can see how that could be the case with a particular implementation (linear-time equality on signature rather than constant interned signature ID). I believe wasmparser should be able to do the check in constant time (so linear-time validation overall) because it has an interned ID for the block type (here) rather than an explicit list of params/results. @Alex Crichton could confirm for sure that the implementation is as I suspect...

I'll note in the talk though that I was coming from the Wasm standards angle, where the spec has definitely been careful to ensure linear-time validation is possible, and that's a topic that comes up from time to time in discussion of extensions. Cranelift overall is definitely not a linear-time compiler but with a validator it's a desirable property not to add new superlinear blowups, which was my main point :-)

view this post on Zulip Alex Crichton (Jan 21 2025 at 16:31):

Oh I wasn't aware of any quadratic behavior in wasmparser's validator myself, but I could also very well have missed something

view this post on Zulip Chris Fallin (Jan 21 2025 at 16:33):

A slightly more nuanced additional thought as well here after reading some code: a block with a signature, when branched to, will involve pushing/popping types on the type stack (e.g. here for block); but when branching, a valid module will also have needed to produce those values somehow, and the block consume them; those instructions overall are linear, so the module size also must scale up. In other words, the validation cost is still linear overall in code size

CLI and Rust libraries for low-level manipulation of WebAssembly modules - bytecodealliance/wasm-tools

view this post on Zulip Chris Fallin (Jan 21 2025 at 16:33):

(seen another way: consider any per-parameter work when handling block signature validation to be amortized/accounted against the producer and consumer of that particular value instead)

view this post on Zulip Chris Fallin (Jan 21 2025 at 16:34):

so I think we're good, and I'd be very surprised if the Wasm spec itself missed a superlinear validation issue like this (@Robin Freyler please do let us know if we're missing something!)

view this post on Zulip Robin Freyler (Jan 21 2025 at 17:53):

I will try to come up with a Wasm file that I think could be problematic:

(module
    (func
        (param
            i32 i32 i32 i32 i32
            i32 i32 i32 i32 i32
        )
        (result
            i32 i32 i32 i32 i32
            i32 i32 i32 i32 i32
        )

        (local.get 0)
        (local.get 1)
        (local.get 2)
        (local.get 3)
        (local.get 4)
        (local.get 5)
        (local.get 6)
        (local.get 7)
        (local.get 8)
        (local.get 9)
        (block ;; block is not even necessary but makes it simpler to follow maybe
            (param
                i32 i32 i32 i32 i32
                i32 i32 i32 i32 i32
            )
            (result
                i32 i32 i32 i32 i32
                i32 i32 i32 i32 i32
            )
            ;; Some conditional returns with multiple inputs/outputs
            ;; A Wasm validation implementation might be quadratic here,
            ;; because encoding a new conditional branch will have constant
            ;; size but validation overhead in terms of the number of inputs
            ;; and outputs defined once above. The validation stack has to be
            ;; manipulated for each conditional branch in the general case for
            ;; each input and output to check if all inputs and outputs match
            ;; their expected types.
            ;;
            ;; This is a super simplified example that could easily be mutated
            ;; in order to make it hard to use heuristics to avoid such quadratic
            ;; cases.
            (br_if 0 (local.get 0))
            (br_if 0 (local.get 0))
            (br_if 0 (local.get 0))
            ;; etc..
            (br_if 0 (local.get 0))
            (br_if 0 (local.get 0))
        )
    )
)

(Not checked if this is syntactically valid Wasm tbh.)

view this post on Zulip Chris Fallin (Jan 21 2025 at 18:45):

Fascinating, yeah, I had forgotten that br_if effectively duplicates its args (leaves them on the stack for the not-taken path). I might recommend detaching this thread / putting it under a separate topic since it is a general Wasm validation performance question/issue rather than specific to my talk.

It'd be interesting to see how much blowup this can actually cause (aside from the bad theoretical asymptotic bound) -- I tried expanding your example to 50 params and 1M lines of br_if, and wasm-tools validate still validates the wat in 0.41s on my (M1 Air) laptop

view this post on Zulip Alex Crichton (Jan 21 2025 at 23:24):

Wanted to say as well great talk Chris! Just listened to it and it's a great explanation of the space!

view this post on Zulip Chris Fallin (Jan 22 2025 at 04:08):

thanks! now I just need to acquire free time somehow to tie up all the loose ends and actually ship PCC...

view this post on Zulip fitzgen (he/him) (Jan 22 2025 at 20:42):

great talk!


Last updated: Feb 28 2025 at 03:10 UTC