I've noticed this PR (https://github.com/bytecodealliance/wasmtime/pull/5288) hasn't had any activity since Mar 3. Function references is listed as a pre-requisite to Wasm GC, so seems pretty important to get this in. Is anybody working on this? Any idea if this is still getting merged any time soon?
@Luna Phipps-Costin @Daniel Hillerström: are you still working on that pull request? anything we can do to unblock you?
@fitzgen (he/him) yes, I am actively working on it. I am currently blocked on a paper deadline this Saturday :-)) I think what I need help with addressing is where to add the error paths for the embedder side of things (c.f. https://github.com/bytecodealliance/wasmtime/pull/5288#discussion_r1126569577). I am quite eager to land the patch as it will make it much easier for me to keep up with changes to wasmtime on my continuations branch
I've been thinking a lot about the embedder API side of things as I work on the RFC for wasm-gc, will share more soon
unfortunately, I don't think Alex's suggestion to skip the embedder API as a temporary stopgap to unblock other work is actually feasible since we will need the embedder API to get .wast
tests running
OK, what path forward do you suggest?
I think we will need to properly address the embedder API, I have a couple ideas, but need to talk it over with folks to decide which path is best. should have more guidance by the time you're done with your paper deadline :)
Excellent thanks! Maybe we can meet at the end of next week, or the week after, such that I can learn more about your ideas?
how would monday at 11:30am pacific work?
@fitzgen (he/him) that ought to work for me, thanks! (Monday April 17)
(deleted)
@fitzgen (he/him) re https://github.com/bytecodealliance/wasmtime/pull/7943 and the FuncType
field of TypedFunc
, might be faster to do this here -- two comments of yours indicate we can remove it via WasmTy for TypedFunc<...>
, but isn't the type still needed to thread through to call_raw
?
e.g. somehow I thought we needed to have type information to flow into the runtime type-checks of the arguments
the only reason the type is needed to thread through to call_raw
is for runtime checks, which would no longer be needed if we stop doing runtime checks :)
and we could stop doing runtime checks, but still allow static typing of functions that use concrete type refernces via impl WasmTy for TypedFunc
we'll still need runtime checks for gc types though right?
e.g. (ref $struct_type)
so not for anything currently but needed in the future soon
I think we could have TypedStruct<(...)>
oh you're thinking we can skip the derive?
or make it much simpler?
I suppose you're thinking like TypedStruct<(i32, Mut<i32>)>
for mutable fields?
yeah, it would not be turning repr(C)
rust structs into GC types and vice versa
it would be "I have type checked that the instance of this struct has the type parameters fields"
something like that
unclear what to do wrt recursion groups and nominal types
I was just thinking that heh
e.g. all the canonicalization that wasm-gc does
that'd be hard to reflect into Rust
I guess even if we restricted it to structural stuff, and had cyclic edges be supertypes, which would allow reading offsets of the underlying object directly, we would still need a dynamic type check when passing into a typed function
ok so maybe this isn't worth pursuing
or at least, not worth undoing what we have now
so while TypedFunc
typechecks could go away we'll still need the type information in the future
might still be worth it even if it doesn't 100% solve the problem
Alex Crichton said:
so while
TypedFunc
typechecks could go away we'll still need the type information in the future
right
it might be useful to have it available for simple cases yeah
right, but we can cross that bridge when we get to it
so back to this PR
realloc
doesn't have an Instance
do you think there's any way we could like pass around the type index and then lazily turn that into a whole function type?
I should go look at realloc again
it has a *mut wasmtime_runtime::ComponentInstance
oh wait though realloc is a vmfuncref?
so there's a type in there?
we could stash the raw u32
of the VMSharedTypeIndex
in there
oh is it?
checking
er wait hang on
this is true of all call_raw
we pass in vmfuncref so we can probably drop the &FuncType
parameter
b/c we can load it lazily?
well going from VMSharedTypeIndex
to FuncType
still involves locking
I wonder if that may be the way to go though so long as it's lazy
to increment the associated registered reference count
keep integers/etc on the fast path
it is really annoying that wasmtime::FuncType
and wasmtime_types::WasmFuncType
are not the same thing
is there anything actually preventing them from being the sa,me?
we could probably make them the same at this point
or well
I think ValType
and WasmValType
would still have to be different
FuncType
has ValType
which recursively includes FuncType
yeah
time to whip out ValType<T>
again
but that's okay because we only expose params/returns publicly via impl ExactSizeIterator
so we could lazily do the translation at that level
I'm always down for merging the layers of abstration
it always feels like we have too many
(also it is annoying that we don't have VMSharedTypeIndex
in the wasm_types
crate for use in EngineOrModuleInternalIndex
)
okay, I agree this should be explored, but
we could move that though?
that one's easy to move
I don't really want to do that in this PR
true
Alex Crichton said:
we could move that though?
Yeah I think so
fitzgen (he/him) said:
so we could lazily do the translation at that level
lmao we already do this lazily
we get the referenced WasmFuncType
via our Arc
and then lazily translate
so if it works for this PR I think it would be best to drop the &FuncType
from call_raw
and somehow lazily load the type info from Engine
if needed
alternatively you could eagerly load and see if you can measure a perf hit
okay so maybe I can:
WasmFuncType
instead of FuncType
or your idea, but:
do we have a script or something for measuring RPS of wasmtime serve
?
nah
I've never acrtually done that
"pick your favorite benchmarking tool"
you could also measure the call
benchmark here too
that will probably show it being disproportionately bad though given how streamlined it is today
yeah, should do that either way (I suspect the way I implemented the dynamic checks could be improved to help the compiler boil some stuff away)
I think the call
benchmarks aren't sufficient here tho
I feel like it's best though that we start with an index+engine combo and then derive info from there
because they are single threaded
oh you'd be surprised
uncontended locks are surprisingly expensive in that context
and if we are lazily constructing FuncType
s from VMSharedTypeIndex
s that involves locking at the engine level
"how good is your rwlock with only readers"
right, it might show some overhead, but not the "full" overhead
agreed
we could in theory build a fancier data structure though
e.g. VMSharedTypeIndex
could be proof of "this slot in this data structure is unchanging"
and then we could go from &Engine
to &Slot
via that proof
so with some elbow grease I think we could get the locks out of the way
I don't think so: that would require RAII which isn't possible with integrating with JIT code
I'm thinking the Module
is "the RAII"
where e.g. we keep Module
in a Store
to keep it alive today
we could remove the locking by putting the registration count behind the Arc
, and then having some other thing sweep the registry's hash consing map
(remove the locking for cloning and dropping and such, at least)
ish
not look up
I haven't thought this through fully
but like if we allowed 1K types maximum we could just have a Vec
preallocated and you index into that
obviously not suitable but ... hm...
well anyway I think we should try the route of "don't pass in type info"
and see how that fares perf-wise
we can at least be perf-neutral with scalars
Alex Crichton said:
I'm thinking the
Module
is "the RAII"
but then VMSharedTypeIndex
itself isn't proof of anything, re are relying on the module existing and that uses have an associated module or whatever, which is pretty much teh situation we are already in
well I don't want to get too distracted with a half-baked idea I have
brb
Alex Crichton said:
we can at least be perf-neutral with scalars
we do have to call .next()
on the param types iter for every param, so to be actually neutral it would require LLVM to be able to boil that away, which I am not sure it can or not
okay well lets start with looking up the funcref's FuncType
and doing the read lock and measuring perf of that
and then we can decide what to do from there
(and I'm not gonna get to the benchmarking until after I finish writing these missing embedder API subtyping tests)
I'm imagining we'd pass in &LazyParam
which has &Engine
, usize
, and VMSharedTypeIndex
and then .get()
loads everything
so that way outside the loop is "just" bumping usize
yeah that makes sense
Last updated: Jan 24 2025 at 00:11 UTC