Hey all,
If you've been keeping a close eye on the pull request list, you may have noticed https://github.com/bytecodealliance/wasmtime/pull/6898. This PR was an accident of sorts, but the experiments with the zkASM backend did continue making progress since. We've made enough progress that the backend is now able to produce valid code for very basic wasm programs (although the code is pretty MVP quality.)
As we're getting closer to dedicating more time to this, I wanted to do a temperature check of sorts. How comfortable you all would be with a backend like this being developed in the upstream repository? Or maybe perhaps there is another, better to develop a target, other than maintaining a fork, that we haven't thought of?
This target would be unique first and foremost because it is not targeting, and is unlikely to ever target, any particular piece of hardware. Instead, the machine executing the produced code is implemented in software. It is also a pretty unique machine in that unlike regular architectures it does not actually execute machine instructions in the traditional sense; it also does not only produce effects of executing a program, but also generates additional piece of data that allows to prove (more quickly than by executing the program itself) that the effects produced are indeed a result of executing a that program. Hence zero-knowledge in zkASM.
From top of my head the caveats are that our current prototype:
Why do we want to work upstream so relatively soon? Well, there are a couple of things we hope to achieve by doing this: first most of us working on the backend don't really have any working experience with the cranelift internals, so it would be great to have some guidance, and we felt that having the changes to show up as PRs on the upstream repository would hopefully give us an opportunity to ask for insights more easily (although we don't necessarily expect anybody to dedicate their precious time actually responding;) and more importantly because it would make it easier for us to keep up with the relevant changes in cranelift as they occur upstream.
Any questions? Glad to try to answer or, failing that, get people who know the answers chime in :smile:
From a project perspective two things that may interest you are:
ah, and meetings link
One other thing that would be good to understand (it's a little unclear from the zkASM link you gave): what's the expected use-case?
For "exists as silicon in the physical world" ISAs, this is fairly self-evident, as long as the architecture is still in use: e.g. I don't think there's any question that we'd accept an arm32 backend, or even something a little rarer now like mips32, as long as all the other tier 3 reqs (maintainers, ...) are met. But for more niche virtual ISAs, we'd want to consider the use-cases against the maintenance cost: e.g. are we enabling one very specific user or is this something that could enable a bunch of folks to experiment and make use of Cranelift in new ways.
Another thing to consider with less-common ISAs is what tooling and documentation exists: for all our ports currently, we can use qemu to test, we can build ELF binaries of Wasmtime for linux/<CPU> and test them, we can debug, there are ISA manuals. It'd be good to understand where all of that stands with zkASM as well
Chris Fallin said:
One other thing that would be good to understand (it's a little unclear from the zkASM link you gave): what's the expected use-case?
Great question :)
At a really high-level, WASM -> zkASM compiler paired with zkASM prover (already exists) would allow generating short proofs that a given WASM program produced a given result and the proof could be cheaply verified by anyone.
The main use-case is verifiably outsourcing computation - you can ask someone to run an expensive program for you and give you the result together with a proof of a computation. This way you can be sure that they indeed ran your program and not just gave you some random output. This turns out to be useful in many different contexts, to name a few:
I wrote a more concrete explanation in: https://hackmd.io/DK6S9vsbQ9GAmBJEOIwh4w?view#Background, let me know if it makes sense!
Chris Fallin said:
Another thing to consider with less-common ISAs is what tooling and documentation exists: for all our ports currently, we can use qemu to test, we can build ELF binaries of Wasmtime for linux/<CPU> and test them, we can debug, there are ISA manuals. It'd be good to understand where all of that stands with zkASM as well
There is already a decent amount of tooling around ZK ASM:
We will most likely extend this tooling with Rust implementation/bindings to make testing in Rust-based projects simpler.
Alex Crichton said:
- There's weekly Cranelift meetings which might be a good venue to discuss this with other Cranelift maintainers (if necessary beyond Zulip, which of course also works)
Will you be open for us to give a quick (10-15 min) presentation to provide more context on one these meetings (e.g. September 27th) ?
Andrei Kashin (akashin) said:
Will you be open for us to give a quick (10-15 min) presentation to provide more context on one these meetings (e.g. September 27th) ?
You're always welcome to add agenda items -- please feel free to submit a PR to the bytecodealliance/meetings
repo to add an item under cranelift/2023/
in the file for whichever date you prefer.
Chris Fallin said:
Andrei Kashin (akashin) said:
Will you be open for us to give a quick (10-15 min) presentation to provide more context on one these meetings (e.g. September 27th) ?
You're always welcome to add agenda items -- please feel free to submit a PR to the
bytecodealliance/meetings
repo to add an item undercranelift/2023/
in the file for whichever date you prefer.
I see! Sent https://github.com/bytecodealliance/meetings/pull/161, let me know if you need any more details there.
Hi @Andrei Kashin (akashin) , excuse me if my question is off topic but what prover are you using?
Really nice initiative btw.
Andrei Kashin (akashin) said:
At a really high-level, WASM -> zkASM compiler paired with zkASM prover (already exists) would allow generating short proofs that a given WASM program produced a given result and the proof could be cheaply verified by anyone.
I think there might be some hidden assumptions here that are going to affect how well you can prove that a compiled Wasm program is computing anything. I want to help uncover them and make a bunch of stuff explicit so that we are all on the same page and you understand what you're asking and what you're getting into.
The biggest thing to note is that cranelift doesn't compile Wasm in isolation:
cranelift-wasm
crate to translate Wasm to CLIF. but CLIF by itself can't implement Wasm's semantics. Wasm translated to CLIF needs access to the external world to, for example, load and store to the native memory regions that implement the Wasm linear memories. How to get those regions and where they are placed in the native address space is something you have to configure in cranelift-wasm
and is very specific to the runtime you're using.memory.grow
instruction is implemented in Rust code as a libcall that the compiled Wasm calls out toI know I am coming across somewhat pessimistic, but I really want folks to understand that this isn't a thing where one could "just" add a new zkasm Cranelift backend to get provable Wasm computations. There is a lot more involved here.
Thank you for taking the time to bring these concerns to our attention!
this also implies that all of Wasmtime needs to be available in zkasm since the compiled code relies on calling directly to Wasmtime's runtime functionality.
As a background, we both come from a team that's been working on a (fork of a) WASM runtime, so we do have the potential complications in the areas you've raised on our mind as well.
zkasm, specifically, does have ways to call out to external code for complicated behaviours and mechanisms to write down proofs for those call outs. As far as I am aware, even more complicated operations that are traditionally regular instructions, such as multiplication and division, are expected to be implemented using this mechanism at this moment in time (although this might change in the future, thus basically shifting the burden of proof to a different place.)
I guess the major thing that remains to be seen is how to best approach providing/linking in proofs for not only the libcalls but also the user provided imports/exports. For our immediate use-case that wasn't a big deal since our interface is locked-in, while for something general purpose like wasmtime proper there would need to be a generic solution that anybody can use. And I have no idea how to best approach this at this moment in time!
Juan Bono said:
Hi Andrei Kashin (akashin) , excuse me if my question is off topic but what prover are you using?
Really nice initiative btw.
Thank you :)
We will be using PIL (Polynomial Identity Langugage) to describe constraints and generate STARKs and Plonky2 as a proving system (based on FRI).
I think we'll try to stay reasonably flexible on this front as new proving systems appear at a fast pace, but for the first version, we'll focus on this proving stack.
So my summary takeaway is:
memory.grow
and make sure it works well and is generally applicable.Filed the issue at https://github.com/bytecodealliance/wasmtime/issues/7124
If this is possible to verify an expected computation, why does it need to be a different isa?
The more I think about the whole purpose of the project, the more I think it just isn't possible. In order to verify a cryptic hash function its input and output, you must run the hash function. Otherwise, it wasn't a very good hash function. If its just using the public key to verify the digest, that operation is cheap.
Moreover, the zkasm is specific to things related to cryptography, not a real isa.
@Chris Clark the "not a real [I might say 'physical'] ISA" bit is the main reason we prefer not to upstream it at the moment, since it's so different from other ISAs with a full OS stack, ABI, linkers and debuggers, and the like. In that sense I'd agree.
However I would also note and defend the idea a bit that zero-knowledge proofs are a pretty deep research area, and there's something novel and useful there. (It's pretty counterintuitive -- it really is "zero knowledge", i.e. you don't know the input, but it's probabilistic.) Not to pass any judgment on whether or how it fits into Cranelift but I want to make sure we are fair to the idea and those who work hard on it :-)
My apologies. Countering my statement of it "just isn't possible", is probably that entire area of research. I will leave that up to the experts, and meant no ill-will.
To answer "why does it need to be a different ISA" is probably quite multifaceted. To motivate "why ISA at all" first: Zero Knowledge proofs use some pretty complicated math behind the scenes I am not really able to fully grok despite my involvement in a relevant project. The way I view zkASM is that should be considered largely as a mechanism to simplify expressing the computations to prove in a way that’s more familiar to the broader groups (such as that of Software Engineers). One major benefit, just like with many simplifying abstractions, is that it is now possible to verify that the conversion is correct once and that makes it scalable for the experts in the field to pool their attention to verifying just those few equations. “Why a different ISA” then? I don’t see why we couldn’t be converting x64 or RISC-V down to those complicated math expressions. It is just that making a new one doesn’t cost very much and gives the ones working on the machine some additional flexibility to adapt it to what the underlying maths can express.
Overall I think zkASM is quite similar to all of the traditional [hardware/physical] ISAs, from the perspective of motivation, benefits and tradeoffs.
On the point of it being specific to cryptography – zkASM is still a project in heavy development and it’s direction up to this point was indeed driven largely by crypto field needs. We don’t think there’s a huge reason it wouldn’t be possible to make it more generally applicable than it currently is, and is one of the things that is being worked on right now. In principle the machine is already a linear bounded automaton complete, although lack of some operations can make it quite painful to use in certain contexts.
On the point of hash function computations: I unfortunately am enabled to be ignorant enough of the underlying research to be able to counter this observation. I do know that (some?) hash functions are quite compatible and comparatively easy to integrate into broader zk-proved computations, and that it isn’t necessary to recompute the hash to prove it. I do _suspect_ that it may weaken some probablistic guarantees of the hash functions, but I never thought about this hard enough to actually ask the people in-the-know to verify this. That’s pretty much all I can say about it :shrug:
Maybe Andrei will be able to add more to my response when they’re back in ~month.
This sounds quite interesting, personally. Although I do not possess the skills to do this by myself
I'm working on zkasm backend and have a question about cranelift filetests. Prerequisite: we don't change many parts of cranelift, we only add new isa and write code to lower clif -> zkasm.
Currently, for testing, in general we use next approach -- we generate some runnable wasm files from wasm spectest, and have rust #[test]
function which compiles them. Than, we run generated zkasm by separate script. It was an idea to move our wasm files from separate folder to cranelift filetests. Any thoughts about it, is it worth to do?
Last updated: Dec 23 2024 at 12:05 UTC