It's not clear to me from the RFC whether Pulley will provide a validation mechanism similar to Wasm's to guarantee sandboxing (reducing the number of runtime checks). I would guess not, because the goal is to be a target, so validation is done before compiling to Pulley. But I would like to know whether this interpretation is correct, and also what the cost of restoring sandboxing would be (where runtime checks are needed, what additional data the runtime needs). Thanks!
Could you clarify what you mean by validation in this context? Usually in Wasmtime when we say "validation" we mean Wasm validation, which happens at translation time (to machine code for Cranelift and Winch, or Pulley bytecode for Pulley) regardless
You also mention sandboxing, however, which has to do with strictly runtime properties like concrete memory access patterns. Those accesses undergo the same set of optimizations as on our other Cranelift targets, because we use Cranelift to compile to Pulley bytecode (PBC)
Yes I mean validation as in type checking
Which provides some soundness property
Necessary for proper sandboxing
(at least as I understand it)
OK, yes, so as above, that always happens at compilation time -- whether compilation to machine code or Pulley bytecode.
Sandboxing also needs some runtime checks, but those are reduced because we don't need to track types at runtime
Ok I see. I was planning to use Pulley as an untrusted "source". So I'll need some additional mechanism to guarantee its behavior
Yes, the bytecode has the same security properties as Wasmtime's precompiled machine code (cwasm files) -- that is, we trust it absolutely, and the raw PBC opcode set includes raw loads and stores
so you'll need to have a trusted way of e.g. vetting hashes, if you don't trust the store or the transport from wasmtime compile
to the execution context
(essentially a notion of Pulley applets, because I can't have a Wasm to Pulley compiler on device)
Yes exactly, thanks!
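For illustration, the "vetting hashes" idea above could look roughly like the sketch below. It is not Wasmtime API: the function name `is_trusted_artifact` and the sample bytes are hypothetical, and it assumes the expected SHA-256 digest reaches the device over a channel you already trust.

```python
import hashlib
import hmac


def is_trusted_artifact(artifact: bytes, trusted_sha256_hex: str) -> bool:
    """Return True only if the artifact's SHA-256 digest matches the trusted one."""
    actual = hashlib.sha256(artifact).hexdigest()
    # Constant-time comparison of two lowercase hex strings.
    return hmac.compare_digest(actual, trusted_sha256_hex.lower())


# Hypothetical stand-in bytes, not real Pulley bytecode.
artifact = b"example precompiled artifact bytes"
# Assumption: in practice this digest comes from a trusted source,
# not from the same place as the artifact itself.
trusted = hashlib.sha256(artifact).hexdigest()

print(is_trusted_artifact(artifact, trusted))            # True
print(is_trusted_artifact(artifact + b"\x00", trusted))  # False
```

The check would run before the bytes are ever handed to the interpreter, since the bytecode itself is trusted absolutely once loaded.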
One other thing to note here is that we haven't really given thought, as far as I know, to stability of the format: the expectation is that the versions match between compiler and interpreter backend. If you have a fleet of devices in the field with different versions you'll need to keep all the compiler versions active
and manage versioning appropriately
I see, that's good to know. I assumed the format would eventually be stable
We might go that way if there's demand and if we're satisfied that it includes what we need; but so far the thing is in very early stages
(I'll let @fitzgen (he/him) speak more to his long-term vision here, I'm mostly repeating what I've understood about it)
Sounds good, thanks for your answers!
the emitted pulley bytecode will have things like hard-coded vmctx offsets in it, so even if the pulley instructions themselves don't change, the bytecode will not really be usable across Wasmtime versions
this is the same as how the machine code that wasmtime creates via cranelift for a compiled wasm module is not reusable across wasmtime versions, despite x86 (for example) instructions not changing between wasmtime versions
you always have to match the wasmtime version that runs the pre-compiled machine code or bytecode with the wasmtime that did the compilation to produce that code
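As a rough sketch of what that exact-match rule means for a fleet with mixed versions: keep one artifact per Wasmtime version and hand a device only the one built by the same version it runs. The helper `select_artifact` and the version strings below are hypothetical, chosen purely for illustration.

```python
from typing import Dict, Optional


def select_artifact(
    artifacts_by_version: Dict[str, bytes], runtime_version: str
) -> Optional[bytes]:
    """Pick the artifact compiled by exactly the runtime's Wasmtime version.

    There is deliberately no "closest compatible version" fallback:
    a mismatch returns None and the caller must recompile.
    """
    return artifacts_by_version.get(runtime_version)


# Hypothetical version strings and placeholder bytes.
artifacts = {
    "27.0.0": b"artifact built by 27.0.0",
    "28.0.0": b"artifact built by 28.0.0",
}

print(select_artifact(artifacts, "28.0.0") is not None)  # True
print(select_artifact(artifacts, "28.0.1") is not None)  # False
```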
re validation and sandboxing:
if anything is not clear, please let me know which parts, and I can try to clarify
Thanks! This is pretty clear. I was thinking of Pulley as a portable bytecode like Wasm (but interpreter friendly). But that's actually not the case. It is really more of an internal implementation detail of wasmtime.
Last updated: Dec 23 2024 at 13:07 UTC