Stream: cranelift

Topic: Running `wasmtime compile`-ed WASM in the alternative VM


Andrew Borg (aborg-dev) (Nov 28 2024 at 19:19):

Hello! I'm trying to use Cranelift infrastructure to compile WASM modules (generated from Rust) to run on a home-grown virtual machine (specialized for Zero-knowledge proving).

I'm still in an exploratory phase and I would really appreciate some feedback on the feasibility of my approach before I invest multiple months of work into it.

Motivation

This work is a continuation of the zkAsm project, which aims to generate Zero-Knowledge proofs for the execution of WASM functions. There is an extended motivation document that goes into why this is useful, if you are interested.

For the purpose of this discussion thread, I have a RISC-V-like ISA called ZisK and I want to compile WASM modules to bare-metal binaries for this platform. I understand that WASM code expects to be executed in a VM, but solutions like wasm2c and w2c2 give me hope that generating a standalone executable is possible.

Plan

I know that writing a new Cranelift backend is a big undertaking, so right now I'm trying to guesstimate the feasibility, complexity, and performance characteristics of the resulting solution before diving into implementation.

Luckily, I already have a RISC-V -> ZisK translator, so I thought I could assemble an end-to-end prototype that goes like this (a rough command sketch follows the list):

  1. Compile Rust code to WASM module with rustc
  2. Compile WASM module to RISC-V ELF with wasmtime compile
  3. Translate ELF to ZisK bytecode
  4. Run the resulting code on ZisK emulator
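
For concreteness, here is the plan above as a small Rust driver. Only steps 1-2 use real, documented invocations (`rustc`, `wasmtime compile`); the tool names and arguments for steps 3-4 (`riscv2zisk`, `ziskemu`) and all file names are placeholders I'm assuming for illustration, and the exact `wasmtime` flags may differ between versions:

```rust
// Sketch of the prototype pipeline; steps 3-4 use assumed tool names/arguments.
use std::process::Command;

fn run(program: &str, args: &[&str]) {
    let status = Command::new(program)
        .args(args)
        .status()
        .unwrap_or_else(|e| panic!("failed to spawn {program}: {e}"));
    assert!(status.success(), "{program} failed with {status}");
}

fn main() {
    // 1. Rust -> WASM module
    run("rustc", &["--target", "wasm32-unknown-unknown", "-O",
                   "--crate-type", "cdylib", "guest.rs", "-o", "guest.wasm"]);
    // 2. WASM module -> RISC-V ELF container (ahead-of-time Wasmtime artifact)
    run("wasmtime", &["compile", "--target", "riscv64gc-unknown-linux-gnu",
                      "-o", "guest.cwasm", "guest.wasm"]);
    // 3. RISC-V ELF -> ZisK bytecode (assumed translator CLI)
    run("riscv2zisk", &["guest.cwasm", "guest.zisk"]);
    // 4. Run on the ZisK emulator (assumed emulator CLI)
    run("ziskemu", &["guest.zisk"]);
}
```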

Problems and ideas for solutions

I ran into a few problems:

  1. The bytecode generated in step (2) relies on the WASM VM, e.g. by passing and using vmctx in all functions
  2. The ELF file generated by wasmtime compile is not really a standalone executable, but more of a container for the data necessary for execution, so I need to re-assemble a ZisK binary out of it

For 1, I imagine I have two options:
A. Implement support for vmctx in the ZisK VM - this seems quite fragile, as I would need to replicate the implementation of the vmctx struct and the calls wasmtime makes in ZisK
B. Get rid of/change vmctx usage by rewriting wasmtime_environ::Compiler and related structs/traits (FuncEnvironment, Call) - this seems like the right way to go and might even be less work (rough sketch below)
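
To make option B concrete, here is a rough sketch of the kind of lowering it implies, built directly with the cranelift-codegen/cranelift-frontend crates: a Wasm `i32.load` lowered against a linear memory placed at a fixed address in the ZisK address space, instead of a heap base loaded from vmctx. This is illustrative only - the base address is made up, bounds checks are omitted, and it is not how wasmtime's translator actually works:

```rust
// Rough sketch: lower a Wasm i32.load against a fixed-address linear memory
// instead of a vmctx-relative heap base. Requires cranelift-codegen and
// cranelift-frontend; the base address is a made-up example.
use cranelift_codegen::ir::{types, AbiParam, Function, InstBuilder, MemFlags, Signature, UserFuncName};
use cranelift_codegen::isa::CallConv;
use cranelift_frontend::{FunctionBuilder, FunctionBuilderContext};

fn main() {
    // fn(wasm_addr: i32) -> i32
    let mut sig = Signature::new(CallConv::SystemV);
    sig.params.push(AbiParam::new(types::I32));
    sig.returns.push(AbiParam::new(types::I32));

    let mut func = Function::with_name_signature(UserFuncName::user(0, 0), sig);
    let mut fb_ctx = FunctionBuilderContext::new();
    let mut b = FunctionBuilder::new(&mut func, &mut fb_ctx);

    let block = b.create_block();
    b.append_block_params_for_function_params(block);
    b.switch_to_block(block);
    b.seal_block(block);
    let wasm_addr = b.block_params(block)[0];

    // Hypothetical fixed base of the Wasm linear memory in the ZisK image.
    const HEAP_BASE: i64 = 0x4000_0000;
    let base = b.ins().iconst(types::I64, HEAP_BASE);
    let addr64 = b.ins().uextend(types::I64, wasm_addr);
    let addr = b.ins().iadd(base, addr64);
    // NOTE: a real translator still needs bounds checks (or guard pages).
    let value = b.ins().load(types::I32, MemFlags::trusted(), addr, 0);
    b.ins().return_(&[value]);
    b.finalize();

    println!("{}", func.display());
}
```

The real work would be doing this systematically for every construct the Wasm-to-CLIF translation currently routes through vmctx (memories, tables, globals, builtin calls).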

For 2, my naive idea was just to take the .text section from the ELF, put it at address 0, and strip out the other executable sections, then lay out the data sections, heap, and stack at the expected addresses.
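
A minimal sketch of that re-assembly step using the `object` crate - the file names, image size, and flat layout are assumptions for illustration, and it ignores everything other than `.text`:

```rust
// Rough sketch: pull .text out of a `wasmtime compile` artifact (an ELF-shaped
// container) and place it at offset 0 of a flat image. Requires the `object`
// crate; file names and image size are placeholders.
use object::{Object, ObjectSection};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let data = std::fs::read("guest.cwasm")?;
    let elf = object::File::parse(&*data)?;

    let text = elf.section_by_name(".text").ok_or("no .text section")?;
    let code = text.data()?;

    // Code at address 0; data sections, heap, and stack would be laid out at
    // fixed offsets chosen for the ZisK address space (not shown).
    let mut image = vec![0u8; code.len().next_power_of_two().max(0x1_0000)];
    image[..code.len()].copy_from_slice(code);
    std::fs::write("guest.img", &image)?;
    Ok(())
}
```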

Questions

If this is not the best format for this discussion, please let me know and I'm happy to create a Google Doc/Hackmd/GitHub Issue for this.


Chris Fallin (Nov 28 2024 at 19:36):

Hi @Andrew Borg (aborg-dev) -- very interesting project, and it's clear you've done some homework already. Unfortunately I don't think trying to repurpose the Wasmtime compilation pipeline to create artifacts usable in your own VM is a good long-term strategy.

As you've noted the generated code is intertwined with our VM data structures, accessing them directly. Not only would you have to write "another wasmtime" with exactly compatible data structures, but you'd have to keep it up to date as we evolve our engine -- and we do, without any concerns about backward compatibility, because these are internal implementation details. The code generator and runtime are meant to go together and aren't meant to be consumed separately.

At some point one might ask: why not adapt Wasmtime itself to suit your needs? (Port it to your custom ISA, or pull out the relevant pieces and adapt them, or ...)

Regarding your second question about the ELF sections -- honestly I sometimes wish we had used another container format because it's a constant temptation created by the "just close enough to native" images we generate to try to consume them like normal ELFs. It is convenient to be able to use objdump, and for gdb to understand them when loaded in-process, but otherwise as above they aren't meant to be consumed by anything but Wasmtime. The other sections you want to strip out are sometimes necessary for actual execution correctness: for example, when using Wasm GC, the metadata includes stackmaps that are needed by the runtime to trace GC refs on the stack; and when a Wasm trap occurs, metadata to indicate what it is; and metadata to describe what memories, tables and other data structures to create at startup (feeding into your first question). For the last one in particular: there is no one vmctx format, it depends on the particular module with a dynamically computed layout based on this metadata.
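
For intuition only, a deliberately simplified illustration of why there is no single fixed context format - this is not Wasmtime's actual vmctx or its layout rules, just the shape of the problem:

```rust
// Purely illustrative -- NOT Wasmtime's real vmctx layout. It only shows what
// a "dynamically computed layout" means: the size and field offsets of the
// context depend on per-module counts taken from the module's metadata.
fn context_size(num_memories: usize, num_tables: usize, num_globals: usize) -> usize {
    let ptr = std::mem::size_of::<usize>();
    // one (base, length) pair per memory, one base per table, 8 bytes per global
    num_memories * 2 * ptr + num_tables * ptr + num_globals * 8
}

fn main() {
    // Two hypothetical modules with different shapes get different layouts.
    println!("module A: {} bytes", context_size(1, 0, 2));
    println!("module B: {} bytes", context_size(2, 3, 10));
}
```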

Overall I suspect there's a bit of an X-Y problem here: I can't help but wonder why your own VM is needed vs Wasmtime. (Or if it is, the right path is to avoid the shortcut and develop your own code generator for your own VM.) Maybe you could say more there?

Andrew Borg (aborg-dev) (Nov 29 2024 at 13:41):

Thank you for the detailed answer, @Chris Fallin.
Overall, it confirms my feeling that my current approach is straying quite far from the well-supported path, which I would very much like to avoid.

I agree there is an X-Y problem here, so let me step back and state my high-level goals.

My primary goal is to efficiently generate Zero-Knowledge proofs for Rust programs. Specifically, Rust programs with a really narrow interface to the outside system: the ability to read and write bytes from stdin/stdout and a few other simple interactions like access to a source of randomness.
For the purpose of this discussion, you can think of a ZK proof as a succinct cryptographic commitment (a few hundred bytes) that claims that running a given Rust program on a given input results in a given output. The important property is that this commitment can't be falsified - if it can be created, then with very high probability the Rust program indeed behaved as stated.

We know how to create these proofs for facts about mathematical objects like high-degree polynomials. But the gap between polynomials and Rust programs is quite large, so we introduce an intermediate step - custom CPU architectures (usually called ZKVMs) like ZisK or Valida that can still be efficiently "lowered" to the polynomial representation but are a better target for compiling Rust programs.

One notable approach is to use an existing ISA like RISC-V, which is still simple enough to convert to polynomials but is already a common compilation target. This approach has a lot of nice properties, but we think it leaves a lot of performance (1.5x-5x) on the table because of the mismatch between the cost model of real RISC-V CPUs and the one implemented by the ZKVM.
So we think we'll be able to get better performance with a custom ISA called ZisK (although I'm open to revising this thesis).

To be precise, we ultimately care about the performance of proof generation. But it is proportional to c * steps(P), where steps(P) is the number of instructions executed to run the program P, and the constant c depends on the complexity of the ZKVM instruction set (the overhead of conversion to polynomials). For simple ISAs like RISC-V the constant is close to 1.
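
As a worked example with purely illustrative numbers (not measurements):

```math
\begin{aligned}
T_{\mathrm{prove}}(P) &\approx c \cdot \mathrm{steps}(P) \\
\text{RISC-V ZKVM:}\quad c \approx 1,\ \mathrm{steps}(P) = S
  &\;\Rightarrow\; T_{\mathrm{prove}} \approx S \\
\text{custom ISA:}\quad c \approx 1.2,\ \mathrm{steps}(P) \approx S/3
  &\;\Rightarrow\; T_{\mathrm{prove}} \approx 0.4\,S
\end{aligned}
```

So a custom ISA can win even with a somewhat more expensive instruction set, as long as it cuts the executed step count enough.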

Now the question effectively becomes: How do you build a compiler from Rust to ZisK that optimizes for performance of ZisK programs?

My thinking process from this point that leads to Cranelift/Wasmtime is:

I hope this provides the necessary context. I'm not tied to any of the technical choices made so far and am ready to take as many steps back as necessary in that thinking process and try other avenues.


Chris Fallin (Nov 29 2024 at 19:28):

Very interesting, thanks for the additional detail!

I suspect that trying to repurpose LLVM+Wasm+Wasmtime's compiler frontend without Wasmtime runtime+Cranelift as an "easier" compiler pipeline than plain LLVM might be a bit of a false promise here: what you're finding is that there are impedance mismatches due to taking the program through Wasm form. In particular, all the details that the Wasmtime-compiled code requires of the runtime will basically require you to write a new Wasm runtime, and make that runtime work with respect to your new VM. You didn't set out to build all the Wasm abstractions, you set out to run user Rust code; Wasmtime is engineer-years of work and even mocking a fraction of it will be a significant effort, nevermind the issues with keeping your "fake Wasmtime" up to date as mentioned above. Alternately, actually port Wasmtime to run within your zkVM, but that seems like a huge effort as well.

In summary I'd recommend pursuing a straight port of LLVM -- it's superficially scary yes, but conceptually that approach is way way way simpler than what you're trying to do here. All the best!

Andrew Borg (aborg-dev) (Dec 03 2024 at 17:10):

Alright, you convinced me to seriously investigate the LLVM direction :)

To close this off, let me make sure that I understand the main costs of a Cranelift-based solution so that I can communicate them properly to my team.

The main intention of the existing Wasmtime/Cranelift compilation pipeline is to generate code that will run inside the wasmtime VM.
For the ZisK project we will need to break that core assumption, as we agree that maintaining compatibility with a moving wasmtime target is a non-starter.

The three main components that we will need to (re)write and maintain are:

  1. WASM frontend that generates ZisK-compatible CLIF - ~5k lines of Rust code
  2. ZisK backend/codegen - ~5k-20k lines of code (Rust + ISLE)
  3. ZisK assembler/linker - outputs a ZisK binary from a WASM module and ZisK bytecode. Not sure about the size here.

These components are very unlikely to make it back upstream due to their specialized nature, so we will need to maintain a wasmtime fork and periodically sync it with upstream.

Would you say this sums it up? I see that you mentioned building a new WASM runtime, but in my mind it is implicitly generated in steps 1-3?

Chris Fallin (Dec 03 2024 at 17:13):

Yes, that's a lower bound on the work I think. My "new Wasm runtime" phrase was referring to the missing piece implied by your point 1: when the Wasm module contains a Wasm memory access, or an indirect call through a table, or a funcref, what do you do? These are handled by exactly the parts of the current Wasm-to-CLIF translator that refer to Wasmtime data structures (vmctx and everything reachable from it) that we've talked about above.
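
As an illustration of what handling one of those cases yourself could look like, here is a rough sketch of lowering a `call_indirect` against a function table placed at a fixed address rather than going through vmctx-resident table metadata. The address, the 8-byte raw-code-pointer table entries, and the omission of bounds and signature checks are all simplifications for illustration, not Wasmtime's actual lowering:

```rust
// Rough sketch: call_indirect through a statically placed function table.
// Requires cranelift-codegen and cranelift-frontend; addresses and table
// layout are assumptions.
use cranelift_codegen::ir::{types, AbiParam, Function, InstBuilder, MemFlags, Signature, UserFuncName};
use cranelift_codegen::isa::CallConv;
use cranelift_frontend::{FunctionBuilder, FunctionBuilderContext};

fn main() {
    // Caller: fn(table_index: i32, arg: i32) -> i32
    let mut sig = Signature::new(CallConv::SystemV);
    sig.params.push(AbiParam::new(types::I32));
    sig.params.push(AbiParam::new(types::I32));
    sig.returns.push(AbiParam::new(types::I32));

    let mut func = Function::with_name_signature(UserFuncName::user(0, 0), sig);
    let mut fb_ctx = FunctionBuilderContext::new();
    let mut b = FunctionBuilder::new(&mut func, &mut fb_ctx);

    let block = b.create_block();
    b.append_block_params_for_function_params(block);
    b.switch_to_block(block);
    b.seal_block(block);
    let index = b.block_params(block)[0];
    let arg = b.block_params(block)[1];

    // Callee signature: fn(i32) -> i32
    let mut callee_sig = Signature::new(CallConv::SystemV);
    callee_sig.params.push(AbiParam::new(types::I32));
    callee_sig.returns.push(AbiParam::new(types::I32));
    let sig_ref = b.import_signature(callee_sig);

    // Hypothetical fixed address of the function table in the ZisK image,
    // with 8-byte entries holding raw code pointers.
    const TABLE_BASE: i64 = 0x2000_0000;
    let index64 = b.ins().uextend(types::I64, index);
    let offset = b.ins().imul_imm(index64, 8);
    let base = b.ins().iconst(types::I64, TABLE_BASE);
    let slot = b.ins().iadd(base, offset);
    let callee = b.ins().load(types::I64, MemFlags::trusted(), slot, 0);
    // NOTE: a real translator must also bounds-check the index and verify the
    // callee's Wasm type; both are omitted here.
    let call = b.ins().call_indirect(sig_ref, callee, &[arg]);
    let result = b.inst_results(call)[0];
    b.ins().return_(&[result]);
    b.finalize();

    println!("{}", func.display());
}
```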


Last updated: Dec 23 2024 at 13:07 UTC