I've got a question for folks who manage cranelift-in-spidermonkey, currently cranelift does no validation of the input wasm module assuming that the embedder does the actual validation (e.g. wasmtime calls wasmparser::validate
and presumably spidermonkey validates elsewhere). I think though this may not be tenable too much further into the future.
One example is that ref.is_null
no longer encodes its type in the instruction, so to figure out whether you're testing externref
or funcref
you need to look at the type stack
wasmtime gets away with this by using a different representation so we can look at the type of the cranelift value, but there may be instructions in the future which need non-local information about how to codegen something
Additionally validation has a nontrivial cost in wasmtime right now because we simply "run it up front" and throw away all the results, so for Wasmtime specifically it'd be best actually if the module was validated during translation (this would also make validation parallel, which it currently isn't in wasmtime)
so all of that to ask, what would folks think about adding validation interleaved with cranelift-wasm's translation?
Is this a cost specifically avoided in the spidermonkey integration? For example do we need to make validation a compile time or runtime switch for cranelift? Or would spidermonkey be ok if cranelift double-validated
as in, if cranelift validated what spidermonkey has already validated
AFAICT, the wasm module has already been parsed by other bits of SM's wasm frontend by the time that Cranelift is invoked to compile functions; there are a bunch of shims to move the types/signatures over that I haven't internalized. So validation that happens during wasm parsing in Cranelift would be an extra cost; but intuitively it seems to me that it might not be too much extra cost if the work is essentially shared by parsing and validation
My instinct at first, at least, is to say that it's better for CL to remain simple, rather than trying to make all validation conditional. If it's too much latency later, we can definitely come back and refactor. But @Benjamin Bouvier probably has better thoughts on this! (AFAIK he's out today and tomorrow)
That's a good point that I'm also not familiar with the spidermonkey integration, I'm assuming it just hands a blob of bytes to cranelift but I could be wrong
And yeah doing this conditionally would be tricky, mainly because I'm not sure how to conditionalize the translation phase where we actually need the type information
well, eventually would probably need the type information, today we're getting by ok
Yup, so here's the loop that drives function compilations -- uses a Decoder
which is a SM-side wasm parser: https://searchfox.org/mozilla-central/source/js/src/wasm/WasmCraneliftCompile.cpp#476 .
I'd need to read/think a bit more before saying what would make the most sense long-term... we definitely still need to parse for baseline compilation, before Cranelift is involved... but for now it seems parsing costs would be dwarfed by the other compilation stages so it doesn't seem like a big issue to me!
oh ok nice, I'll dig into that function
if cranelift-wasm changes in ways that spidermonkey doesn't even touch that's presumably A-OK heh
yup, for sure
Right now this is conceptually what's happening:
My thought is that ideally, we'd have a way to provide cranelift-wasm with module-level information (type section info, etc.), while cranelift-wasm would do the function validation.
(Spidermonkey's Wasm.validate is super fast, but that's still an extra 100ms overhead for very large modules that only Cranelift has)
(also Cranelift-in-SM does validate function's bodies in parallel, so the cost is a bit amortized)
That actually sounds perfect, and sounds like my conclusions reading code yesterday were indeed correct! I'll see if I can work on wasmparser to enable this (wasmparser basically already has support so it should be easy)
Ok a late follow-up but I've now implemented this at https://github.com/bytecodealliance/wasmtime/pull/2059
/me will review first tomorrow
that PR now forcibly validates all wasm functions fed to cranelift-wasm, so this will affect Gecko (if that lands as-is). I'd definitely want folks familiar with the spidermonkey integration of Cranelift to take a look!
the knobs are now there for validation to be entirely deferred to cranelift (spidermonkey can provide its own hooks to learn about the module's other types), but that may not be sufficient. If so just let me know!
Looks great! I'll be happy to integrate and test it in spidermonkey when the time comes !
Thanks for working on this!
@Alex Crichton hi! let me know if you want to chat about https://github.com/bytecodealliance/wasmtime/pull/2059, maybe I'm missing a way to properly use a primitive type as the Self::Type
in this case!
Oh i actually had another question about this: right now, Baldrdash will request module-level information lazily, so it'd make sense that we'd now implement a cache for this information, that we'd fill in incrementally as we compile the function (and share this information with other function compilations, since at this point the module-level info is immutable). Since the traits mention &self
and not &mut self
, I'd need to use interior mutability (RefCell) to be able to build up the cache information; are there other better ways that would avoid the dynamic RefCell checks on borrows?
@Benjamin Bouvier hey back now, thanks for taking a look! Is Self::Type
the only thing you had an issue with?
If so I think I might just remove that trait and return wasmparser::Type
instead since I'm not sure having an associated type for Type
is all that worth it (all you can do is get wasmparser::Type
anyway)
also self vs mut self, but it's probably more open to discussion (and i can solve it locally with a RefCell iiuc, with an additional dynamic cost unfortunately)
oh? what was needing mutability for your implementation?
er right sorry I see now
this would probably use a OnceCell
rather than a RefCell
since you'll want to get a long-lived borrow out of it
but that should work to lazily initialize these pieces
FWIW supporting both APIs would be somewhat tricky, but possible
I'd prefer though if we could find something that works from the get-go
/me looks at OnceCell
oh and to clarify, by that I mean this type
Ok! Is the OnceCell::get_mut
method cheaper to call than a RefCell::borrow_mut
? (I imagine yes, but not sure)
nah it's basically the same cost AFAIK
it's a boolean check of a flag basically
Ok! Since the cache is filled up as the trait methods are called, i'm not sure it's going to need a long-lived borrow, but yeah, both would work. So it's not a strong requirement for Spidermonkey.
Thanks for checking this! having just return wasmparser::Type
would indeed be ideal
ok, I'll make that change
Will this allow returning some kind of references, though?
if you're returning wasmparser::Type
it'll just be a Type
no need for references
ah right, i was wondering if this would be future proof since the GC type proposal might introduced structured types that won't be primitive types
but that might be irrelevant at this poitn
yeah I'm not sure how that will play out myself
I figured we can probably wait to see what happens there
@Benjamin Bouvier https://github.com/bytecodealliance/wasm-tools/pull/68 is what I'm thinking for your issues
@Alex Crichton thanks! I'm worried about the reverting for TableType and other non-primitive types, though; this will require straight conversions from spidermonkey's types to wasmparser's types everywhere, or creating the converted types and caching them...
I think it should be cheap to do though? They're basically just Copy
containers of types
and wasmparser validation would do the conversion anyway basically regardless
(and with monomorphization it should all go away in terms of overhead)
Sounds good to me! If it's an issue, profiles will tell us anyways :-)
Last updated: Dec 23 2024 at 12:05 UTC