@Luke Wagner hey some questions about the module-types explainer
I think we'll probably want a separate index space for each kind of type, right?
like right now we only have the "type index space", but that's more realistically the "function type index space"
and module types are adding 2 more index spaces, the module types and instance types index space
does that sound right?
Yup, index space per type. In MVP wasm we have 5 index spaces (type, function, table, memory, global), and Module Linking would add 2 more (module and instance)
technically it's 4 more?
the "type" index space will grow to contain not just function typedefs, but also module typedefs, instance typedefs, and later GC typedefs
er
yeah so I may be confused about this type index space
if I define one function, one module, and one instance type, all three indices are zero?
or indices are 0, 1, 2?
the indices for the type that is
(I thought this was adding 4 index spaces, {module, instance} x {type, def})
and I thought the existing "type" index space was going to be basically renamed to "function type" index space
in that situation you have 4 index spaces at play: a type index space with 3 types (0,1,2), and then a single-element module index space (which defines the module, but also refers to the module's type in the type index space), and similarly a single-element instance index space and a single-element function index space
hm so then I can declare a module with a function type (erroneously)
and that'd have to be caught in validation?
where the type of a module is indeed a module type?
correct
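(A minimal sketch of the check just discussed, in Python. The `TypeDef` class, the `kind` tags, and `check_module_decl` are hypothetical names for illustration, not part of any spec or tool. It assumes a single type index space where each typedef carries its own "class of type" tag, so a module declaration pointing at a function type is rejected at validation time.)

```python
from dataclasses import dataclass


@dataclass
class TypeDef:
    # Hypothetical tag for the "class of typedef"; not the real binary encoding.
    kind: str  # "func", "module", or "instance"


def check_module_decl(types, type_index):
    """Check that a module declaration's type index names a module typedef."""
    if not 0 <= type_index < len(types):
        raise ValueError(f"type index {type_index} out of bounds")
    kind = types[type_index].kind
    if kind != "module":
        raise ValueError(f"type {type_index} is a {kind} type, expected a module type")


# One function type, one module type, one instance type: indices 0, 1, 2.
types = [TypeDef("func"), TypeDef("module"), TypeDef("instance")]
check_module_decl(types, 1)  # accepted: type 1 is a module type
```

Calling `check_module_decl(types, 0)` raises, since type 0 is a function type.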
why not have separate index spaces though?
like one index space per type
fwiw, this "type" vs. "declaration of a function's type" distinction is already started in the MVP: https://webassembly.github.io/spec/core/syntax/modules.html#indices
well, the type section is all incestuously circular
(in the limit, once we have first-class references to all the things)
but that's still allowed with multiple index spaces?
so you have this one index space of "type definitions" where all the types can refer to each other
oh you're thinking like "here's the N types of this module", and you verify all sub-indices are less than N ?
and then all the other definitions can point into it
I was imagining we'd still have just one type section, but it'd add to each respective index space depending on what's defined
sort of like the import section
but because types refer to each other we can't validate until everything is parsed
and we'd rather validate in a streaming fashion?
well, "everything is parsed" as in "the type section is fully parsed"
yeah, in general, the type section will have to be validated all-at-once
b/c, eventually at least, it'll be circular
hm wait then I still don't understand why we wouldn't have a separate index space for each class of types
if we assume you parse the whole type section, then you validate the whole thing
incidentally, i'm polishing up a change to the "Binary Format Considerations" section of the Module Linking PR which explains that we actually need to loosen up the section rules a bit so that you can split up and interleave the Type, Import, Module, Function and Alias Sections. The reason being that there is no longer a simple ordering of sections that prevents forward references. So rather, you can just have "some types" then "some modules" then .... and the index spaces grow monotonically, and validation is just relative to the current index space contents
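(A rough sketch of "validation is just relative to the current index space contents", in Python. The tuple shape and the name `validate_in_order` are assumptions made up for this example. Each definition is checked against the sizes the index spaces have reached so far, and only then grows its own space, so forward references fail.)

```python
def validate_in_order(definitions):
    """definitions: ordered list of (space, refs); refs is a list of (space, index)."""
    sizes = {}  # current size of each index space
    for space, refs in definitions:
        for ref_space, ref_index in refs:
            # A reference is only valid if the target already exists.
            if not 0 <= ref_index < sizes.get(ref_space, 0):
                raise ValueError(
                    f"{space} definition refers to {ref_space} {ref_index} before it is defined"
                )
        # Index spaces only ever grow.
        sizes[space] = sizes.get(space, 0) + 1
    return sizes


# "some types" then "some modules" then more types, interleaved.
interleaved = [
    ("type", []),
    ("module", [("type", 0)]),
    ("type", []),
    ("module", [("type", 1)]),
]
sizes = validate_in_order(interleaved)
```

Swapping the first two entries would raise, because the module would refer to type 0 before any type exists.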
(answering why not a separate index space per class of type:) i think that could be an isomorphic alternative; you'd basically have to put the "what class of type is this" next to the index. with the current design (already in the MVP), the "what class of type is this" enum is part of the typedef itself
Isn't the index space of the index specified always implicit though?
e.g. call_indirect
always looks up in the function type index space, not the module type index space
and module declarations would be relative to the module type index space
right, but if you have a (ref $typeindex)...
that's an index in the type section
wouldn't that be relative to the "w/e the reference types index space" is?
could be to a function or module or instance ...
hm ok, I basically know no context about gc
so $typeindex is (eventually) one of {function, struct, array, module, instance, ...}, which isn't a reference to the thing, but the thing itself
and (ref $typeindex) is building a new type, which is a reference to the thing
not sure I fully understand but "wave hands gc changes things" is good enough for me for now
a different example, in interface types-land is when you have (record (field "x" $sometype))
$sometype could be one of several different "classes" of type
(in this case there's no reference, it's just a record value embedding some other value as a field)
it's ok, I'll just take this as a given for now :)
FWIW, if you look at the binary encoding of function types (https://webassembly.github.io/spec/core/binary/types.html#function-types), the first byte, 0x60, is the "what class of typedef is this" byte
it's just explicitly called out b/c there is currently only one class of typedef
right yeah, so more types fits well
but it is carefully chosen to be distinct from every other type code: https://searchfox.org/mozilla-central/source/js/src/wasm/WasmConstants.h#63
I just don't fully understand how gc changes things to require one index space as opposed to multiple type index spaces
well even now we only have 1 type index space
I know yeah, but I don't know why we wouldn't want more with this proposal
the "function" index space isn't a space of types, it's a space of function definitions
right yeah, but with module types I don't know why we don't rename the type index space to "function type" index space
you're mentioning because of gc (ref $ty) things, but I don't fully get that, but that's also ok I don't really need to at this point
i guess b/c we've already planned to have a single index space, discriminating with the 0x60 prefix code
it just means extra validation rules for type references
which technically avoids needing to put that discriminator on every use
in some cases, yes, but in other cases, i think you need the discriminant in one place or the other
e.g., function definitions can only refer to function types, so yeah, there it's extra checking
but other cases can reference any type
it's mostly poTAYto poTAHto though :)
we've just established a precedent in one direction
shrug
@Luke Wagner can you also explain a bit more this alias thing?
I don't really understand much about it other than "it can refer to parent stuff"
(sorry meetings) yeah, alias
lets you inject a definition in either (1) your parent, (2) the export of an imported or nested instance
I'll post an update to the PR later today
mk I'll wait for that
@Alex Crichton https://github.com/WebAssembly/module-types/pull/3/commits/73950e7f977fc601107b4bdc5058389772cd8d45
@Luke Wagner can you give an example of what you're thinking with an alias referring to something in the enclosing module?
It also feels a bit odd to have so many alias/instance sections, do you think it'd be possible to fold alias definitions into the instance section? The same idea just a different binary structure where the instance section would be a bit more "meaty". Each element in the instance section would either instantiate a module or create an alias for a previous instance.
eh maybe not, you have to alias imports as well... anyway
So the intention is that (alias $name (func $instance $fname)) -- $name is the name of the function we're creating (optional), $instance is an instance index, and $fname, is that encoded as "fname" the string or an index where it's the nth exported function (or nth export?) of $instance?
For the first question: the reason for aliases referring to the outer module is just to remove redundancy; in package linking scenarios, the same module/instance types get repeated a ton.
For the second: yeah, that was the alternative I liked for a while. The thing that's a bit odd about it is that an alias in the instance section doesn't add a new kind of instance, so it's not really the "instance" section but the "instance and alias of instance exports" section. Aliases are a bit like Imports, so it seems vaguely symmetric to give them their own section. But it's not like that's the only way to do it.
For the third: yes, that's right. $fname is an index into the exports array of $instance's type definition (which is local to the module)
@Luke Wagner oh referring to the parent makes sense, I was wondering if you had thoughts on the binary format? If aliases can only refer to exports you can't define exports before instantiating right? I'm just murky on how the specifics of semantics and binary encoding would work
@Alex Crichton Aliases can refer to any type/module defs in the parents' index spaces, not just exports. It's only aliases to nested instances that can only refer to exports
@Luke Wagner ok so here's what I've got so far, mind skimming it over and see if I misinterpreted or am missing anything? -- https://gist.github.com/alexcrichton/506ac6d2f7d505d556d68fb969489183
Looking at importdesc, one could reconsider the existing 0x00 case from always being a func to instead being "whatever the indexed type is", so that as soon as you add module/instance types, the 0x00 can refer to them just as well as func. E.g., that's what (ref $typeIndex) will do
But actually, that's super-bikeshed-y, so n/m
Different, but perhaps-more-justifiable bikeshed: in the Alias section, for the parent aliases, could the second byte of module be 0x04 (to match import and export)?
oh sure
although you may be able to help me clarify that
when you refer to your parent's index
that's indexing within your parent's module index space, right? not your parent's export index space?
correctamundo
ok yeah I can change the alignment
also great idea with using (parent $x) as the text format, i'm going to update the Explainer to match
I do want to document the text format as well, but I'll probably do that in tandem with writing the text parser
One other tweak: in components, I think there aren't any Table, Memory or Global sections allowed, so perhaps you can explicitly say these are disallowed
table/global makes sense, but for memory, do you mean no defined memory, but you can still import memory, right?
(e.g. string.lift needs an index space to operate on)
yes, you can still alias all of the memories/tables/globals of nested instances
ah ok, so you still have an index space, just no local definitions
makes sense
one last change: i think, unlike the core section rules, we can allow the Function section to be intermingled with the Instance/Module/Type sections
so that adapter functions can be imported by nested instances
only for components, though?
correct
i thought about it a lot for core modules and i think it needs the stricter separation
part of what makes it make sense for adapter functions is that component instances are (mostly, get to that in a sec) stateless
so we can say that the component instance is created before its nested instances
the one bit of state a component instance has is: which of my nested instances have been created
but we'd still have the trap-if-you-call-too-early semantics?
so what we can say is that it is a dynamic error to call an export of an instance that hasn't been created
and so if a component has nested instances A B and C
heh yeah that works too
then A can call adapter functions which call component imports,
but if it tries to call into B or C, that'll trap
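(A toy model of the "dynamic error to call an export of an instance that hasn't been created" rule, in Python. `Trap`, `ComponentInstance`, and the method names are invented for this sketch. It assumes the only state the component instance tracks is which nested instances exist so far.)

```python
class Trap(Exception):
    """Stands in for a wasm trap."""


class ComponentInstance:
    def __init__(self):
        # The one bit of state: which nested instances have been created.
        self.created = {}

    def create(self, name, exports):
        self.created[name] = exports

    def call_export(self, name, export, *args):
        if name not in self.created:
            raise Trap(f"instance {name!r} has not been created yet")
        return self.created[name][export](*args)


component = ComponentInstance()
component.create("A", {"f": lambda: 1})
result = component.call_export("A", "f")  # A exists, so this call goes through
```

Calling `component.call_export("B", "g")` at this point raises `Trap`, since B has not been created.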
there is that subtle detail of are A's exports visible if A's start function calls the adapter function
"meh"
i think for now we can say "yes", and then talk about the "after start" function thing later
b/c import adapters definitely need to be able to reenter the core module caller, to call malloc()
ok updated the gist with this
ah, i see you're writing to cover both the module and component cases
in that case, you might want to add a caveat to the version word noting that the 0x1 is only for components
oh sorry yeah this is sort of like a diff of what we expect to land in wasmparser
which would parse both
anyhow, just wanted to say great job on this!
do you mind if i resolve your open conversations in the Module Linking PR? or happy for you to comment on them
oh sorry of course
were you waiting on me to do that?
or is it expected that I do that?
they were resolved like the second you replied weeks ago lol
haha, no worries, i was making progress on other discussions in the PR, so no waiting. i was just tidying up since it's looking like we're almost done. i'm still going to present at the next CG meeting before merging, i think
well, i dunno
btw, i was working through an example with transitive dependencies to see how the link step would work. my main interest was finding a scheme whereby the generated Linked Module simply imports unmodified Packaged Modules and avoiding the module type of a given Packaged Module needing to capture all of its transitive dependencies in its module type. i ended up finding a wrapping scheme that uses nested modules and parent aliases to essentially "curry" module imports: https://gist.github.com/lukewagner/d662cbe7b58281672053dab4118d25b7
might be an interesting test case ;)
@Luke Wagner here's something to think about:
(type $t (instance (export "x" (instance (type $t)))))
I think that needs to be an error of some kind
or something like for the text format we have to figure out a DAG of how to visit type definitions
I'm assuming we're doing the same thing for instances as we do for functions, which is when you declare the type you can specify both the (type xx) reference as well as the type inline (e.g. (func (type 0) (param i32)))
but when both are specified, we need to validate they're equal (e.g. type 0 is indeed (func (param i32)))
so like I think this is technically valid:
(type $t (instance
(export "x" (instance (type $t)
(export "x" (instance (type $t)
(export "x" (instance (type $t)
(export "x" (instance (type $t)
(export "x" (instance (type $t)
(export "x" (instance (type $t)
(export "x" (instance (type $t)))
))
))
))
))
))
))
))
but validating that at the text level isn't really easy to do because as we're determining the canonical value for $t we need to know what $t is to compare it to $t
we can perhaps solve this by saying "don't do the function thing, you say the index or you say the inline, not both"
well so let me rephrase, this seems like it could either be a binary error or a text error
I feel like we want it to be a text error
I'm getting tripped up in how circularly recursive this is
Haha, yes, I think no recursive types for now. Perhaps just by requiring all type indices refer to earlier typedefs
Also, agreed that if you write (instance (type $T)) you shouldn't need to be able to re-state all the exports/imports
That's really just a goofy special case for functions so they can assign identifiers to parameters
Actually, for directly-embedded type definitions like you're showing, I think no circularity forever; the only exception is circularity via (ref $T)
@Luke Wagner ok sounds good, I'll need to write some sort of sorting pass in the text parser then too to do a topological sort
ah, bummer, yes. for starters you could require text to be in order too ;)
thinking more on it, you're right, I think the text format should impose the restriction itself
@Luke Wagner here's another interesting thing to think about, so right now you can specify imports/exports inline in the text format, e.g. these are all the same
(func $f)
(export "" (func $f))
(func $f (export ""))
(import "" "" (func $f))
(func $f (import "" ""))
for nested modules, however, the proposed syntax you've got so far doesn't support this
b/c with (module ...) the imports/exports would be ambiguous
or I guess it'd be ambiguous without more lookahead
I'm wondering if it makes sense to do something like:
(nested-module (type $ty) (module ...))
where you can specify inline imports/exports if you want
and a bare (module ...) is sugar for auto-calculating the type, inserting it into the type index space, and not having any implicit imports/exports
I was also wondering how you'd do something like (module (type $module_type) ...)
b/c that's also ambiguous
but at least for printing the binary format we'll need a way to say "the module type was specified at this index, don't inject anything else"
@Alex Crichton sorry, which part was ambiguous?
i would think (module ...) would be mostly symmetric to (func ...) in the ways you mentioned (import, export, explicit type index)
i see your point that it'd be necessary to scan the whole body of (module ...) to determine the module type, unlike (func ...) which tells you that basically up-front
but i think there's still symmetry with func in that, to validate the body of a (func ...), you have to have parsed all the types of all the other functions first (so that you know all the funcs' types)
hm ok so I've got two concerns here
one is how to parse this:
(module
(module (export "x"))
)
another is how to parse this:
(module
(type (module))
(module (type 0))
)
which are sort of the same concern
basically everything in parsing only requires 1 lookahead right now, but those would otherwise require multiple tokens of lookahead
it's not necessarily ambiguous but it is pretty weird
Ohhhhh, I finally get your meaning; you're saying: when I see the tokens ( module ( export, I don't know if I'm parsing an export of the module definition, or declaring that this module definition is exported. I was thinking of it as an already-parsed AST which is after this question has been sorted :)
@Luke Wagner oh sorry, but yes
So I suppose in both cases, the (export "x") and (type 0) require a constant amount of lookahead to see what they are
that's why I was thinking of (nested-module ...)
because that would put you in a parsing context where you clearly know what's what
that's true yeah
the constant is just bigger than 1 heh
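(A sketch of the "just do more lookahead" approach, in Python. The disambiguation rule here is an assumption for illustration: after the opening `( module` and an optional `$id`, peek four tokens, and treat `( export "x" )` or `( type 0 )` as an inline annotation on the nested module and anything longer as a field of its body. `tokenize` and `classify` are hypothetical helpers, not how any real parser is written.)

```python
import re


def tokenize(text):
    # Parens, quoted strings, and bare atoms (keywords, $ids, numbers).
    return re.findall(r'\(|\)|"[^"]*"|[^\s()"]+', text)


def classify(text):
    """Classify the first item inside a nested `(module ...)` using 4 tokens of lookahead."""
    toks = tokenize(text)
    if toks[:2] != ["(", "module"]:
        raise ValueError("expected `(module`")
    i = 2
    if i < len(toks) and toks[i].startswith("$"):
        i += 1  # skip the optional module identifier
    head = toks[i:i + 4]
    if len(head) < 4 or head[0] != "(":
        return "other"
    if head[1] == "export":
        # `( export "x" )` closes immediately; a field continues with `(`.
        return "inline-export" if head[3] == ")" else "export-field"
    if head[1] == "type":
        # `( type 0 )` is a use; `( type (module) )` or `( type $t (func) )` is a definition.
        return "inline-type-use" if head[3] == ")" else "type-definition"
    return "other"


kinds = [
    classify('(module (export "x"))'),
    classify('(module (export "x" (func 0)))'),
    classify("(module (type 0))"),
    classify("(module (type (module)))"),
]
```

The lookahead is constant (four tokens past the optional identifier), which matches the point above that it is bounded but bigger than 1.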
are you using an LR(1) parser generator?
nah it's all hand-coded so it's easy enough to do
but it also just looks funky
from a readability point of view
not that inline exports/imports are used all that often
well it's a good question, i'll file an issue on the repo after we merge the linking PR (i'm thinking mid next week)
I'll stick to "just do more lookahead" for now
:thumbs_up:
Hm ok so I'm getting really tripped up how to implement alias statements in text parsing
everything is so circular and I don't know how to untangle things
so right now "elaboration" is a pretty simple 3-phase process
er, 4 I guess
1) expand inline imports/exports
2) expand inline type annotations to actual type declarations
3) record what index each name is at
4) fill in all names with their indexes
during step (4) we also have this extra "validate the inline function type matches the referenced type" if you do something like (func (type 0) (param i32))
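(Steps 3 and 4 above can be sketched roughly like this in Python. The tuple layout and the name `resolve_names` are made up for the example, and steps 1 and 2 are assumed to have already run. Because every name is recorded before any reference is filled in, the order of fields does not matter.)

```python
def resolve_names(fields):
    """fields: list of (kind, name_or_None, refs); refs is a list of (kind, name_or_index)."""
    # Step 3: record what index each name is at, one counter per index space.
    counters, indexes = {}, {}
    for kind, name, _refs in fields:
        index = counters.get(kind, 0)
        counters[kind] = index + 1
        if name is not None:
            indexes[(kind, name)] = index
    # Step 4: fill in all names with their indexes (numeric refs pass through).
    resolved = []
    for kind, name, refs in fields:
        new_refs = []
        for ref_kind, ref in refs:
            if isinstance(ref, str):
                if (ref_kind, ref) not in indexes:
                    raise ValueError(f"unknown {ref_kind} name {ref}")
                ref = indexes[(ref_kind, ref)]
            new_refs.append((ref_kind, ref))
        resolved.append((kind, name, new_refs))
    return resolved


fields = [
    ("func", "$a", [("func", "$b")]),  # forward reference to $b
    ("type", "$t", []),
    ("func", "$b", [("func", "$a"), ("type", "$t")]),
]
resolved = resolve_names(fields)
```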
but I don't know how to do this for modules
so if you have a nested module like (module (module))
then that needs to be elaborated to
(module
(type (module))
(module (type 0))
)
so when the module type isn't listed, we need to calculate it and inject it as a module type
but I'm not sure how we can calculate it in the face of (alias (parent ...))
because that needs information from the parent, which if we're only in step (2) we don't even have symbolic names yet
much less stable indexes because we're still injecting new type annotations depending on what we're seeing
so like for example
(module
(module
(alias (parent (type 0)))
(func (type 0) (param i32))
)
)
the definition of the parent module's type 0 is going to be the type of the inlined module
but the inlined module references that
Some of this is just complexity I think, but I'm just stuck at what to do
I don't know if I should just throw everything out and start from scratch with a complicated resolver
or try to fit things cleanly into what already exists
I can't tell if all this elaboration/name resolution has to happen in cycles till it reaches some sort of fixed point or otherwise what the precise order of passes is to figure everything out correctly
Another example would be
(module
(module
(alias (parent (type $foo)))
(func (type 0))
)
(type $foo (func))
)
I don't know what order to do things. The index of $foo is 1, but we don't know that until the type of the nested module is elaborated. We can't do that though until we figure out what its parent reference is pointing to
@Alex Crichton So for the text format parsing, my inclination would be to say that, when you see an explicit type declaration, you simply bake in that index, no questions asked; if the module fails to validate, it's the author's fault
oh sure yeah wat does extremely little validation
this is more just about trying to do name resolution where we're figuring out what indexes are assigned to everything
like is this supposed to work?
(module
(type $foo (instance
(export "" (func $bar))
))
(module
(alias (parent (type $foo)))
(import "" (instance $i (type 0)))
(alias ($i (func $bar)))
(func
call 0)
)
)
So one thing we can do for now (and perhaps forever) is to say that aliases can only refer to preceding definitions
but even in the above case
we're doing name resolution across modules
and everything is in preceding order and everything
(alias ($i (func $bar))) realizes that $i resolves to local instance 0, which is imported, which has a type defined locally, but that type was aliased from a parent module
like these are kind of "dumb concerns" in that they're only really applicable to the implementation of a text parser and don't really have many implications on the binary format
I'm tripping myself up so much because the text format is so simple today and I can't figure out how to make the addition of modules as simple as it is right now
without having things like global name resolution and tombstones for "this'll get resolved later" and things like that
Yeah, I think the text parser has to maintain a stack of identifier scopes (one per nested module) that it updates in a linear pass over the AST
I suppose what this means is that some identifiers (e.g., calls to $functions in function bodies) get resolved at the end (b/c they are allowed to be circular), while some get filled in as part of a linear pass
yeah that's also the weird part for me
the text format is super loose today in that you just throw things in a soup and a valid module almost always pops out
but this is starting to place lots of restrictions of "no everything has to be very strictly ordered"
well, i think every identifier that exists today would be in this "fill in at the end when all identifiers are known" category,
and we're just introducing a new "kind" of identifier that gets resolved in a new, earlier, linear pass
that makes sense, yeah, and I'm getting tripped up trying to not do that
it feels wrong to have "oh these identifiers work only linearly" and "oh but those identifiers can work anywhere"
and I can't figure out how to prove "yes the linear stuff is required due to this design constraint"
heh, i guess that's sortof the case with C++; in struct C { typedef int X; X foo() { bar(); } X bar() { foo(); } }, the names bar() and foo() can be cyclic whereas the reference to X has to be in order
I'll just try to work something out
But yeah, I see what you mean, the current parse rules simply parse every field in isolation and then do name resolution as a wholly separate, order-independent pass
Trying to think a bit more about what the general rule is, I think it's this: you do most name resolution in a linear pass, and any time a name is un-resolved, you leave a placeholder for it.
Then, in a second pass, go over the placeholders and check that they resolve to a name and that name is not an alias or instance definition
Thus, only name cycles involving aliases or instance definitions would be disallowed; everything else should get resolved as it is today
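(One way the two-pass rule could look, as a Python sketch. The item layout, `ORDERED_KINDS`, and `resolve` are hypothetical. It assumes a single flat scope rather than the stack of scopes a nested-module parser would need. The first pass resolves what it can in order and records placeholders. The second pass resolves placeholders and rejects any that point at an alias or instance definition.)

```python
ORDERED_KINDS = {"alias", "instance"}  # must be defined before they are referenced


def resolve(items):
    """items: ordered list of (kind, name_or_None, referenced_names)."""
    defined = {}       # name -> (kind, index within that kind's index space)
    counters = {}
    resolved = {}      # (item position, referenced name) -> index
    placeholders = []  # references that were not resolvable in the linear pass
    for pos, (kind, name, refs) in enumerate(items):
        for ref in refs:
            if ref in defined:
                resolved[(pos, ref)] = defined[ref][1]
            else:
                placeholders.append((pos, ref))
        index = counters.get(kind, 0)
        counters[kind] = index + 1
        if name is not None:
            defined[name] = (kind, index)
    # Second pass: placeholders must resolve, and not to an alias or instance.
    for pos, ref in placeholders:
        if ref not in defined:
            raise ValueError(f"unknown name {ref}")
        kind, index = defined[ref]
        if kind in ORDERED_KINDS:
            raise ValueError(f"{ref} is an {kind} definition and must come before its use")
        resolved[(pos, ref)] = index
    return resolved


# Mutually recursive functions still resolve, as they do today.
cyclic = resolve([("func", "$f", ["$g"]), ("func", "$g", ["$f"])])
```

A function that refers to an instance defined later, e.g. `resolve([("func", "$f", ["$i"]), ("instance", "$i", [])])`, is rejected in the second pass.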
I think that sounds reasonable, yeah, I got further today in implementing all this, I think this is the last step for the text format
I'll work on getting this implemented next week to see if it all works
I'm still trying to wrap my head around the set of changes here
I'm definitely getting the point that wat was previously a pretty simple parser with a single pass to resolve names, but now it's becoming more of a compiler almost where it's got a type resolution pass and such
but to confirm, @Luke Wagner the encoding of this module:
(module $outer
(module $inner
(module $child (export "a"))
)
)
would look like this?
(module $outer
(type $child_type (module))
(type $inner_type (module
(export "a" (module (type $child_type)))
))
(module $inner (type $inner_type)
(alias $child_type_inner (parent (type $child_type)))
(module $child (type $child_type_inner))
(export "a" (module $child))
)
)
Yeah, that looks right, and yes, I can see how this makes the text-to-binary a lot more complicated
@Luke Wagner (export $some_instance) is only intended to be sugar for the text format, right? not reflected in the binary format?
Good question! For now: yes. At some point in the future, when one can import instances of imported instance types (O_O), they may need to become first-class things b/c it won't be possible to desugar them at text-to-binary time
@Luke Wagner There's no way to use interface types with a global, right?
And as a related question, in theory commands could have immutable global exports, which could be a way for commands to export metadata, however without the ability to export strings or other higher-level types, that may not be very valuable.
Good question! Coincidentally I was just thinking about this and how it might work for stuff like metadata. It seems like one could have, as a component import/export an interface value (not const global, just a pure value) and this could be lowered into a core const global import, and this could allow one to import compound JSONesque values
@Dan Gohman
Ah, and by being a value export, rather than a global export, you'd read it with an adapter function, and not with global.get
Yup! And it lets you perform a one-time conversion (performing malloc etc)
Without thinking about global.set
@Luke Wagner here's an interesting question, how should this be encoded?
(module
(import "" (module))
(type (module)))
the import annotation has an "inline" type annotation
but this inline type annotation is listed later
with functions this works today because you just encode all types first
but with the first 5 sections in any order, it's unclear what the text format is supposed to do here in that regard
e.g. does this encode as:
(module
(type (module)) ;; injected
(import "" (module (type 0)))
(type (module)) ;; original type annotation
)
or like this:
(module
(type (module)) ;; original annotation reordered first
(import "" (module (type 0)))
)
I suppose I'm answering my question as I'm writing this down, it basically has to be the former
this is just weird because now this doesn't behave as functions do
Yeah, good question. So in the analogous function situation, when the inline type def goes before the explicit type def... do you get two type defs or 1?
/me goes to look at spec
By my reading, an inline func type followed by explicit func type def will produce two type defs: only inline funcs "reach back"; type defs don't
Thus, I think your former module case would be symmetric to funcs
... and yet, wabt's wat2wasm seems to merge them
I think technically that's a bug in wabt
ah yeah I've mostly gone by wabt's behavior which may be a bug
well, practically speaking, i'd do whatever was easiest for now, but if it's the former, i wouldn't feel bad about it
order of items in the text format has previously been largely irrelevant, but with the 5 sections at the front that can all be interleaved I think it's a lot more important now
so I don't think there's actually any option other than the first, injecting a duplicate annotation
it's a relatively niche concern anyway though, it generally needs to just work
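(A small Python model of the "former" option from this discussion: an inline type annotation injects a typedef at the point of use if none exists yet, and a later explicit typedef is appended as-is, producing a duplicate. `elaborate_types` and the field tags are invented for the sketch. It only models that option and is not a statement of what the spec or wabt does.)

```python
def elaborate_types(fields):
    """fields: ordered list of ("type", sig) or ("inline", sig); returns (types, uses)."""
    types, uses = [], []
    for kind, sig in fields:
        if kind == "type":
            # Explicit typedefs always append; they never "reach back" and merge.
            types.append(sig)
        else:
            # Inline annotations reuse an earlier identical typedef, else inject one here.
            if sig not in types:
                types.append(sig)
            uses.append(types.index(sig))
    return types, uses


# (import "" (module)) followed by (type (module)): two typedefs, import uses type 0.
inline_first = elaborate_types([("inline", "module"), ("type", "module")])
# (type (module)) followed by (import "" (module)): one typedef, reused.
explicit_first = elaborate_types([("type", "module"), ("inline", "module")])
```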
@Yury Delendik did you want to join a video chat about https://github.com/bytecodealliance/wasm-tools/pull/26 ?
sure
k cool, @fitzgen (he/him) would you/yury be free this afternoon?
/me is
Yeah, free in roughly an hour and then for the rest of the day
k cool I'll send an invite
Last updated: Dec 23 2024 at 12:05 UTC