Stream: wasm

Topic: modules types binary format


view this post on Zulip Alex Crichton (May 18 2020 at 16:08):

@Luke Wagner hey some questions about the module-types explainer

view this post on Zulip Alex Crichton (May 18 2020 at 16:09):

I think we'll probably want a separate index space for each kind of type, right?

view this post on Zulip Alex Crichton (May 18 2020 at 16:09):

like right now we only have the "type index space", but that's more realistically the "function type index space"

view this post on Zulip Alex Crichton (May 18 2020 at 16:09):

and module types are adding 2 more index spaces, the module types and instance types index space

view this post on Zulip Alex Crichton (May 18 2020 at 16:09):

does that sound right?

view this post on Zulip Luke Wagner (May 18 2020 at 16:14):

Yup, index space per type. In MVP wasm we have 5 index spaces (type, function, table, memory, global), and Module Linking would add 2 more (module and instance)

view this post on Zulip Alex Crichton (May 18 2020 at 16:15):

technically it's 4 more?

view this post on Zulip Luke Wagner (May 18 2020 at 16:15):

the "type" index space will grow to contain not just function typedefs, but also module typedefs, instance typedefs, and later GC typedefs

view this post on Zulip Alex Crichton (May 18 2020 at 16:15):

er

view this post on Zulip Alex Crichton (May 18 2020 at 16:15):

yeah so I may be confused about this type index space

view this post on Zulip Alex Crichton (May 18 2020 at 16:16):

if I define one function, one module, and one instance type, all three indices are zero?

view this post on Zulip Alex Crichton (May 18 2020 at 16:16):

or indices are 0, 1, 2?

view this post on Zulip Alex Crichton (May 18 2020 at 16:16):

the indices for the type that is

view this post on Zulip Alex Crichton (May 18 2020 at 16:16):

(I thought this was adding 4 index spaces, {module, instance} x {type, def})

view this post on Zulip Alex Crichton (May 18 2020 at 16:16):

and I thought the existing "type" index space was going to be basically renamed to "function type" index space

view this post on Zulip Luke Wagner (May 18 2020 at 16:17):

in that situation you have 4 index spaces at play: a type index space with 3 types (0,1,2), and then single-element module index space (which defines the module, but also refers to the module's type in the type index space), and similarly a single-element instance and function index space

view this post on Zulip Alex Crichton (May 18 2020 at 16:17):

hm so then I can declare a module with a function type (erroneously)

view this post on Zulip Alex Crichton (May 18 2020 at 16:17):

and that'd have to be caught in validation?

view this post on Zulip Alex Crichton (May 18 2020 at 16:17):

where the type of a module is indeed a module type?

view this post on Zulip Luke Wagner (May 18 2020 at 16:17):

correct
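
A minimal sketch of the validation rule agreed on above, assuming a single type index space whose entries each carry a class tag. The `TypeDef` shape, the tag strings, and `check_definition` are illustrative names, not anything from the proposal or from wasmparser.

```python
from dataclasses import dataclass

# Hypothetical class tags; MVP wasm only has function typedefs.
FUNC, MODULE, INSTANCE = "func", "module", "instance"


@dataclass
class TypeDef:
    kind: str  # which class of typedef this entry is


def check_definition(types, type_index, expected_kind):
    """Check that a definition's type index names a typedef of the right class."""
    if type_index >= len(types):
        raise ValueError(f"type index {type_index} out of bounds")
    actual = types[type_index].kind
    if actual != expected_kind:
        raise ValueError(
            f"type {type_index} is a {actual} type, expected a {expected_kind} type"
        )


# One function, one module, and one instance type occupy indices 0, 1, 2.
types = [TypeDef(FUNC), TypeDef(MODULE), TypeDef(INSTANCE)]

check_definition(types, 1, MODULE)  # module declared with a module type: accepted

try:
    check_definition(types, 0, MODULE)  # module declared with a function type
    rejected = False
except ValueError:
    rejected = True
```

The class tag lives on the typedef, so every use site only stores an index and the check happens at validation time.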

view this post on Zulip Alex Crichton (May 18 2020 at 16:17):

why not have separate index spaces though?

view this post on Zulip Alex Crichton (May 18 2020 at 16:18):

like one index space per type

view this post on Zulip Luke Wagner (May 18 2020 at 16:18):

fwiw, this "type" vs. "declaration of a function's type" distinction is already started in the MVP: https://webassembly.github.io/spec/core/syntax/modules.html#indices

view this post on Zulip Luke Wagner (May 18 2020 at 16:18):

well, the type section is all incestuously circular

view this post on Zulip Luke Wagner (May 18 2020 at 16:19):

(in the limit, once we have first-class references to all the things)

view this post on Zulip Alex Crichton (May 18 2020 at 16:19):

but that's still allowed with multiple index spaces?

view this post on Zulip Luke Wagner (May 18 2020 at 16:19):

so you have this one index space of "type definitions" where all the types can refer to each other

view this post on Zulip Alex Crichton (May 18 2020 at 16:19):

oh you're thinking like "here's the N types of this module", and you verify all sub-indices are less than N ?

view this post on Zulip Luke Wagner (May 18 2020 at 16:19):

and then all the other definitions can point into it

view this post on Zulip Alex Crichton (May 18 2020 at 16:19):

I was imagining we'd still have just one type section, but it'd add to each respective index space depending on what's defined

view this post on Zulip Alex Crichton (May 18 2020 at 16:19):

sort of like the import section

view this post on Zulip Alex Crichton (May 18 2020 at 16:20):

but because types refer to each other we can't validate until everything is parsed

view this post on Zulip Alex Crichton (May 18 2020 at 16:20):

and we'd rather validate in a streaming fashion?

view this post on Zulip Alex Crichton (May 18 2020 at 16:20):

well, "everything is parsed" as in "the type section is fully parsed"

view this post on Zulip Luke Wagner (May 18 2020 at 16:20):

yeah, in general, the type section will have to be validated all-at-once

view this post on Zulip Luke Wagner (May 18 2020 at 16:20):

b/c, eventually at least, it'll be circular

view this post on Zulip Alex Crichton (May 18 2020 at 16:21):

hm wait then I still don't understand why we wouldn't have a separate index space for each class of types

view this post on Zulip Alex Crichton (May 18 2020 at 16:21):

if we assume you parse the whole type section, then you validate the whole thing

view this post on Zulip Luke Wagner (May 18 2020 at 16:23):

incidentally, i'm polishing up a change to the "Binary Format Considerations" section of the Module Linking PR which explains that we actually need to loosen up the section rules a bit so that you can split up and interleave the Type, Import, Module, Function and Alias Sections. The reason being that there is no longer a simple ordering of sections that prevents forward references. So rather, you can just have "some types" then "some modules" then .... and the index spaces grow monotonically, and validation is just relative to the current index space contents
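
A rough sketch of validating interleaved sections against index spaces that only grow, assuming a simplified section model: a "type" section carries a count of new typedefs, and a "module" or "func" section carries the type index of each new definition. The tuple encoding and `validate_sections` are made up for illustration.

```python
def validate_sections(sections):
    """Walk sections in binary order; each reference must already be defined."""
    sizes = {"type": 0, "module": 0, "func": 0}
    for kind, payload in sections:
        if kind == "type":
            sizes["type"] += payload  # payload: number of new typedefs
            continue
        for type_index in payload:  # payload: type index of each new definition
            if type_index >= sizes["type"]:
                raise ValueError(
                    f"{kind} definition refers to type {type_index} before it is defined"
                )
            sizes[kind] += 1
    return sizes


# "some types", then "some modules", then more types, then functions.
interleaved = [("type", 1), ("module", [0]), ("type", 1), ("func", [1])]
final_sizes = validate_sections(interleaved)
```

Because each check only looks at the current sizes, no section ordering rule is needed to rule out forward references.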

view this post on Zulip Luke Wagner (May 18 2020 at 16:24):

(answering why not a separate index space per class of type:) i think that could be an isomorphic alternative; you'd basically have to put the "what class of type is this" next to the index. with the current design (already in the MVP), the "what class of type is this" enum is part of the typedef itself

view this post on Zulip Alex Crichton (May 18 2020 at 16:25):

Isn't the index space of the index specified always implicit though?

view this post on Zulip Alex Crichton (May 18 2020 at 16:25):

e.g. call_indirect always looks up in the function type index space, not the module type index space

view this post on Zulip Alex Crichton (May 18 2020 at 16:25):

and module declarations would be relative to the module type index space

view this post on Zulip Luke Wagner (May 18 2020 at 16:25):

right, but if you have a (ref $typeindex)...

view this post on Zulip Luke Wagner (May 18 2020 at 16:25):

that's an index in the type section

view this post on Zulip Alex Crichton (May 18 2020 at 16:26):

wouldn't that be relative to the "w/e the reference types index space" is?

view this post on Zulip Luke Wagner (May 18 2020 at 16:26):

could be to a function or module or instance ...

view this post on Zulip Alex Crichton (May 18 2020 at 16:26):

hm ok, I basically know no context about gc

view this post on Zulip Luke Wagner (May 18 2020 at 16:27):

so $typeindex is (eventually) one of {function, struct, array, module, instance, ...}, which isn't a reference to the thing, but the thing itself

view this post on Zulip Luke Wagner (May 18 2020 at 16:27):

and (ref $typeindex) is building a new type, which is a reference to the thing

view this post on Zulip Alex Crichton (May 18 2020 at 16:27):

not sure I fully understand but "wave hands gc changes things" is good enough for me for now

view this post on Zulip Luke Wagner (May 18 2020 at 16:28):

a different example, in interface types-land is when you have (record (field "x" $sometype))

view this post on Zulip Luke Wagner (May 18 2020 at 16:28):

$sometype could be one of several different "classes" of type

view this post on Zulip Luke Wagner (May 18 2020 at 16:28):

(in this case there's no reference, it's just a record value embedding some other value as a field)

view this post on Zulip Alex Crichton (May 18 2020 at 16:29):

it's ok, I'll just take this as a given for now :)

view this post on Zulip Luke Wagner (May 18 2020 at 16:30):

FWIW, if you look at the binary encoding of function types (https://webassembly.github.io/spec/core/binary/types.html#function-types), the first byte, 0x60, is the "what class of typedef is this" byte

view this post on Zulip Luke Wagner (May 18 2020 at 16:30):

it's just explicitly called out b/c there is currently only one class of typedef

view this post on Zulip Alex Crichton (May 18 2020 at 16:30):

right yeah, so more types fits well

view this post on Zulip Luke Wagner (May 18 2020 at 16:30):

but it is carefully chosen to be distinct from every other type code: https://searchfox.org/mozilla-central/source/js/src/wasm/WasmConstants.h#63
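
A small sketch of how the leading class byte can drive typedef decoding. The 0x60 tag and the value type codes for i32, i64, f32, and f64 come from the core spec; the function names are illustrative, and no codes for module or instance typedefs are assumed here, so any other tag is simply rejected.

```python
VALTYPES = {0x7F: "i32", 0x7E: "i64", 0x7D: "f32", 0x7C: "f64"}


def read_u32_leb(data, pos):
    """Decode an unsigned LEB128 integer starting at pos."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return result, pos
        shift += 7


def read_valtypes(data, pos):
    """Decode a vector of value types: a count followed by one byte each."""
    count, pos = read_u32_leb(data, pos)
    out = []
    for _ in range(count):
        out.append(VALTYPES[data[pos]])
        pos += 1
    return out, pos


def read_typedef(data, pos=0):
    """Dispatch on the "what class of typedef is this" byte."""
    tag = data[pos]
    if tag == 0x60:  # function type: params vector, then results vector
        params, pos = read_valtypes(data, pos + 1)
        results, pos = read_valtypes(data, pos)
        return ("func", params, results), pos
    raise ValueError(f"unknown typedef class byte {tag:#x}")


# (func (param i32 i64) (result f32))
typedef, end = read_typedef(bytes([0x60, 0x02, 0x7F, 0x7E, 0x01, 0x7D]))
```

New typedef classes would slot in as further branches on `tag`, which is why the byte needs to stay distinct from every value type code.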

view this post on Zulip Alex Crichton (May 18 2020 at 16:31):

I just don't fully understand how gc changes things to require one index space as opposed to multiple type index spaces

view this post on Zulip Luke Wagner (May 18 2020 at 16:31):

well even now we only have 1 type index space

view this post on Zulip Alex Crichton (May 18 2020 at 16:31):

I know yeah, but I don't know why we wouldn't want more with this proposal

view this post on Zulip Luke Wagner (May 18 2020 at 16:31):

the "function" index space isn't a space of types, it's a space of function definitions

view this post on Zulip Alex Crichton (May 18 2020 at 16:32):

right yeah, but with module types I don't know why we don't rename the type index space to "function type" index space

view this post on Zulip Alex Crichton (May 18 2020 at 16:32):

you're mentioning because of gc (ref $ty) things, but I don't fully get that, but that's also ok I don't really need to at this point

view this post on Zulip Luke Wagner (May 18 2020 at 16:32):

i guess b/c we've already planned to have a single index space, discriminating with the 0x60 prefix code

view this post on Zulip Alex Crichton (May 18 2020 at 16:32):

it just means extra validation rules for type references

view this post on Zulip Luke Wagner (May 18 2020 at 16:32):

which technically avoids needing to put that discriminator on every use

view this post on Zulip Luke Wagner (May 18 2020 at 16:32):

in some cases, yes, but in other cases, i think you need the discriminant in one place or the other

view this post on Zulip Luke Wagner (May 18 2020 at 16:33):

e.g., function definitions can only refer to function types, so yeah, there it's extra checking

view this post on Zulip Luke Wagner (May 18 2020 at 16:33):

but other cases can reference any type

view this post on Zulip Luke Wagner (May 18 2020 at 16:34):

it's mostly poTAYto poTAHto though :)

view this post on Zulip Luke Wagner (May 18 2020 at 16:34):

we've just established a precedent in one direction

view this post on Zulip Alex Crichton (May 18 2020 at 16:34):

shrug

view this post on Zulip Alex Crichton (May 18 2020 at 16:36):

@Luke Wagner can you also explain a bit more this alias thing?

view this post on Zulip Alex Crichton (May 18 2020 at 16:36):

I don't really understand much about it other than "it can refer to parent stuff"

view this post on Zulip Luke Wagner (May 18 2020 at 17:00):

(sorry meetings) yeah, alias lets you inject a definition in either (1) your parent, (2) the export of an imported or nested instance

view this post on Zulip Luke Wagner (May 18 2020 at 17:00):

I'll post an update to the PR later today

view this post on Zulip Alex Crichton (May 18 2020 at 17:02):

mk I'll wait for that

view this post on Zulip Luke Wagner (May 18 2020 at 22:57):

@Alex Crichton https://github.com/WebAssembly/module-types/pull/3/commits/73950e7f977fc601107b4bdc5058389772cd8d45

As is, the Module Types proposal tweaks the spec-internal definition of module/instance types and gives them a text format so that module/instance types can be used in toolchains, but there are no ...

view this post on Zulip Alex Crichton (May 18 2020 at 23:41):

@Luke Wagner can you give an example of what you're thinking with an alias referring to something in the enclosing module?

view this post on Zulip Alex Crichton (May 18 2020 at 23:44):

It also feels a bit odd to have so many alias/instance sections, do you think it'd be possible to fold alias definitions into the instance section? The same idea just a different binary structure where the instance section would be a bit more "meaty". Each element in the instance section would either instantiate a module or create an alias for a previous instance.

view this post on Zulip Alex Crichton (May 18 2020 at 23:44):

eh maybe not, you have to alias imports as well... anyway

view this post on Zulip Alex Crichton (May 18 2020 at 23:46):

So the intention is that (alias $name (func $instance $fname)) -- $name is the name of the function we're creating (optional), $instance is an instance index, and $fname, is that encoded as "fname" the string or an index where it's the nth exported function (or nth export?) of $instance?

view this post on Zulip Luke Wagner (May 19 2020 at 00:15):

For the first question: the reason for aliases referring to the outer module is just to remove redundancy; in package linking scenarios, the same module/instance types get repeated a ton.

view this post on Zulip Luke Wagner (May 19 2020 at 00:21):

For the second: yeah, that was the alternative I liked for a while. The thing that's a bit odd about it is that an alias in the instance section doesn't add a new kind of instance, so it's not really the "instance" section but the "instance and alias of instance exports" section. Aliases are a bit like Imports, so it seems vaguely symmetric to give them their own section. But it's not like that's the only way to do it.

view this post on Zulip Luke Wagner (May 19 2020 at 00:21):

For the third: yes, that's right. $fname is an index into the exports array of $instance's type definition (which is local to the module)

view this post on Zulip Alex Crichton (May 19 2020 at 02:30):

@Luke Wagner oh referring to the parent makes sense, I was wondering if you had thoughts on the binary format? If aliases can only refer to exports you can't define exports before instantiating right? I'm just murky on how the specifics of semantics and binary encoding would work

view this post on Zulip Luke Wagner (May 19 2020 at 03:26):

@Alex Crichton Aliases can refer to any type/module defs in the parents' index spaces, not just exports. It's only aliases to nested instances that can only refer to exports
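
A sketch of the two alias forms as described so far, assuming an instance type is modeled as an ordered list of (name, kind) exports and a parent is a dict of index spaces. Both data shapes and both function names are hypothetical.

```python
def resolve_instance_alias(instance_types, instance_index, export_index, kind):
    """Alias the export_index-th export listed in an instance's type."""
    exports = instance_types[instance_index]
    if export_index >= len(exports):
        raise ValueError("export index out of bounds")
    name, export_kind = exports[export_index]
    if export_kind != kind:
        raise ValueError(f'export "{name}" is a {export_kind}, not a {kind}')
    return name


def resolve_parent_alias(parent, kind, index):
    """Alias a type or module definition out of the parent's index space."""
    if kind not in ("type", "module"):
        raise ValueError("parent aliases only cover type and module definitions")
    return parent[kind][index]


# Instance 0 exports a function and a memory, in that order.
instance_types = [[("malloc", "func"), ("memory", "memory")]]
# The parent has one typedef and no nested modules.
parent = {"type": ["(func (param i32))"], "module": []}

aliased_export = resolve_instance_alias(instance_types, 0, 0, "func")
aliased_type = resolve_parent_alias(parent, "type", 0)
```

The asymmetry is the point: instance aliases go through the export list, while parent aliases index the parent's own definitions directly.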

view this post on Zulip Alex Crichton (May 19 2020 at 16:22):

@Luke Wagner ok so here's what I've got so far, mind skimming it over and see if I misinterpreted or am missing anything? -- https://gist.github.com/alexcrichton/506ac6d2f7d505d556d68fb969489183

view this post on Zulip Luke Wagner (May 19 2020 at 16:32):

Looking at importdesc, one could reconsider the existing 0x00 case from always being a func to instead being "whatever the indexed type is", so that as soon as you add module/instance types, the 0x00 can refer to them just as well as func. E.g., that's what (ref $typeIndex) will do

view this post on Zulip Luke Wagner (May 19 2020 at 16:33):

But actually, that's super-bikeshed-y, so n/m

view this post on Zulip Luke Wagner (May 19 2020 at 16:38):

Different, but perhaps-more-justifiable bikeshed: in the Alias section, for the parent aliases, could the second byte of module be 0x04 (to match import and export)?

view this post on Zulip Alex Crichton (May 19 2020 at 16:40):

oh sure

view this post on Zulip Alex Crichton (May 19 2020 at 16:40):

although you may be able to help me clarify that

view this post on Zulip Alex Crichton (May 19 2020 at 16:40):

when you refer to your parent's index

view this post on Zulip Alex Crichton (May 19 2020 at 16:40):

that's indexing within your parent's module index space, right? not your parent's export index space?

view this post on Zulip Luke Wagner (May 19 2020 at 16:41):

correctamundo

view this post on Zulip Alex Crichton (May 19 2020 at 16:41):

ok yeah I can change the alignment

view this post on Zulip Luke Wagner (May 19 2020 at 16:41):

also great idea with using (parent $x) as the text format, i'm going to update the Explainer to match

view this post on Zulip Alex Crichton (May 19 2020 at 16:42):

I do want to document the text format as well, but I'll probably do that in tandem with writing the text parser

view this post on Zulip Luke Wagner (May 19 2020 at 16:42):

One other tweak: in components, I think there aren't any Table, Memory or Global sections allowed, so perhaps you can explicitly say these are disallowed

view this post on Zulip Alex Crichton (May 19 2020 at 16:43):

table/global makes sense, but for memory, do you mean no defined memory, but you can still import memory, right?

view this post on Zulip Alex Crichton (May 19 2020 at 16:43):

(e.g. string.lift needs an index space to operate on)

view this post on Zulip Luke Wagner (May 19 2020 at 16:44):

yes, you can still alias all of the memories/tables/globals of nested instances

view this post on Zulip Alex Crichton (May 19 2020 at 16:44):

ah ok, so you still have an index space, just no local definitions

view this post on Zulip Alex Crichton (May 19 2020 at 16:44):

makes sense

view this post on Zulip Luke Wagner (May 19 2020 at 16:45):

one last change: i think, unlike the core section rules, we can allow the Function section to be intermingled with the Instance/Module/Type sections

view this post on Zulip Luke Wagner (May 19 2020 at 16:46):

so that adapter functions can be imported by nested instances

view this post on Zulip Alex Crichton (May 19 2020 at 16:46):

only for components, though?

view this post on Zulip Luke Wagner (May 19 2020 at 16:46):

correct

view this post on Zulip Luke Wagner (May 19 2020 at 16:46):

i thought about it a lot for core modules and i think it needs the stricter separation

view this post on Zulip Luke Wagner (May 19 2020 at 16:47):

part of what makes it make sense for adapter functions is that component instances are (mostly, get to that in a sec) stateless

view this post on Zulip Luke Wagner (May 19 2020 at 16:47):

so we can say that the component instance is created before its nested instances

view this post on Zulip Luke Wagner (May 19 2020 at 16:47):

the one bit of state a component instance has is: which of my nested instances have been created

view this post on Zulip Alex Crichton (May 19 2020 at 16:47):

but we'd still have the trap-if-you-call-too-early semantics?

view this post on Zulip Luke Wagner (May 19 2020 at 16:47):

so what we can say is that it is a dynamic error to call an export of an instance that hasn't been created

view this post on Zulip Luke Wagner (May 19 2020 at 16:48):

and so if a component has nested instances A B and C

view this post on Zulip Alex Crichton (May 19 2020 at 16:48):

heh yeah that works too

view this post on Zulip Luke Wagner (May 19 2020 at 16:48):

then A can call adapter functions which call component imports,

view this post on Zulip Luke Wagner (May 19 2020 at 16:49):

but if it tries to call into B or C, that'll trap
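
A toy model of the dynamic check described here, assuming the only state a component instance keeps is the set of nested instances created so far. `Component`, `Trap`, and the method names are invented for the sketch.

```python
class Trap(Exception):
    """Stands in for a wasm trap."""


class Component:
    def __init__(self):
        # The one bit of state: which nested instances exist yet.
        self.created = set()

    def create(self, name):
        self.created.add(name)

    def call_export(self, name, func):
        """Calling an export of an instance that has not been created traps."""
        if name not in self.created:
            raise Trap(f"instance {name} has not been created yet")
        return func()


component = Component()
component.create("A")  # A exists; B and C do not yet

from_a = component.call_export("A", lambda: "ok")

try:
    component.call_export("B", lambda: "unreachable")
    trapped = False
except Trap:
    trapped = True
```

Whether A counts as created while its own start function is still running is the open detail raised next; this sketch does not model start functions at all.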

view this post on Zulip Luke Wagner (May 19 2020 at 16:49):

there is that subtle detail of are A's exports visible if A's start function calls the adapter function

view this post on Zulip Alex Crichton (May 19 2020 at 16:50):

"meh"

view this post on Zulip Luke Wagner (May 19 2020 at 16:50):

i think for now we can say "yes", and then talk about the "after start" function thing later

view this post on Zulip Luke Wagner (May 19 2020 at 16:50):

b/c import adapters definitely need to be able to reenter the core module caller, to call malloc()

view this post on Zulip Alex Crichton (May 19 2020 at 16:51):

ok updated the gist with this

view this post on Zulip Alex Crichton (May 19 2020 at 16:51):

https://gist.github.com/alexcrichton/506ac6d2f7d505d556d68fb969489183/revisions#diff-592762304f59428e5070ae1c46ce9859

view this post on Zulip Luke Wagner (May 19 2020 at 16:52):

ah, i see you're writing to cover both the module and component cases

view this post on Zulip Luke Wagner (May 19 2020 at 16:52):

in that case, you might want to add a caveat to the version word noting that the 0x1 is only for components

view this post on Zulip Alex Crichton (May 19 2020 at 16:52):

oh sorry yeah this is sort of like a diff of what we expect to land in wasmparser

view this post on Zulip Alex Crichton (May 19 2020 at 16:52):

which would parse both

view this post on Zulip Luke Wagner (May 19 2020 at 16:53):

anyhow, just wanted to say great job on this!

view this post on Zulip Luke Wagner (May 19 2020 at 17:02):

do you mind if i resolve your open conversations in the Module Linking PR? or happy for you to comment on them

view this post on Zulip Alex Crichton (May 19 2020 at 17:04):

oh sorry of course

view this post on Zulip Alex Crichton (May 19 2020 at 17:04):

were you waiting on me to do that?

view this post on Zulip Alex Crichton (May 19 2020 at 17:04):

or is it expected that I do that?

view this post on Zulip Alex Crichton (May 19 2020 at 17:04):

they were resolved like the second you replied weeks ago lol

view this post on Zulip Luke Wagner (May 19 2020 at 17:50):

haha, no worries, i was making progress on other discussions in the PR, so no waiting. i was just tidying up since it's looking like we're almost done. i'm still going to present at the next CG meeting before merging, i think

view this post on Zulip Luke Wagner (May 19 2020 at 17:50):

well, i dunno

view this post on Zulip Luke Wagner (May 19 2020 at 18:07):

btw, i was working through an example with transitive dependencies to see how the link step would work. my main interest was finding a scheme whereby the generated Linked Module simply imports unmodified Packaged Modules and avoiding the module type of a given Packaged Module needing to capture all of its transitive dependencies in its module type. i ended up finding a wrapping scheme that uses nested modules and parent aliases to essentially "curry" module imports: https://gist.github.com/lukewagner/d662cbe7b58281672053dab4118d25b7

view this post on Zulip Luke Wagner (May 19 2020 at 18:08):

might be an interesting test case ;)

view this post on Zulip Alex Crichton (May 20 2020 at 23:13):

@Luke Wagner here's something to think about:

(type $t (instance (export "x" (instance (type $t)))))

view this post on Zulip Alex Crichton (May 20 2020 at 23:13):

I think that needs to be an error of some kind

view this post on Zulip Alex Crichton (May 20 2020 at 23:14):

or something like for the text format we have to figure out a DAG of how to visit type definitions

view this post on Zulip Alex Crichton (May 20 2020 at 23:17):

I'm assuming we're doing the same thing for instances as we do for functions, which is when you declare the type you can specify both the (type xx) reference as well as the type inline (e.g. (func (type 0) (param i32)))

view this post on Zulip Alex Crichton (May 20 2020 at 23:17):

but when both are specified, we need to validate they're equal

view this post on Zulip Alex Crichton (May 20 2020 at 23:17):

(e.g. type 0 is indeed (func (param i32)))
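
A short sketch of that equality check for functions, assuming each typedef is stored as a (params, results) pair. The storage shape and `check_inline_type` are illustrative only.

```python
def check_inline_type(types, type_index, inline_params, inline_results):
    """When both (type N) and an inline signature are given, they must agree."""
    params, results = types[type_index]
    if list(inline_params) != list(params) or list(inline_results) != list(results):
        raise ValueError(f"inline function type does not match type {type_index}")


# type 0 is (func (param i32))
types = [(["i32"], [])]

check_inline_type(types, 0, ["i32"], [])  # (func (type 0) (param i32)): accepted

try:
    check_inline_type(types, 0, ["i64"], [])  # (func (type 0) (param i64))
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
```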

view this post on Zulip Alex Crichton (May 20 2020 at 23:18):

so like I think this is technically valid:

(type $t (instance
  (export "x" (instance (type $t)
    (export "x" (instance (type $t)
      (export "x" (instance (type $t)
        (export "x" (instance (type $t)
          (export "x" (instance (type $t)
            (export "x" (instance (type $t)
              (export "x" (instance (type $t)))
            ))
          ))
        ))
      ))
    ))
  ))
))

view this post on Zulip Alex Crichton (May 20 2020 at 23:19):

but validating that at the text level isn't really easy to do because as we're determining the canonical value for $t we need to know what $t is to compare it to $t

view this post on Zulip Alex Crichton (May 20 2020 at 23:19):

we can perhaps solve this by saying "don't do the function thing, you say the index or you say the inline, not both"

view this post on Zulip Alex Crichton (May 20 2020 at 23:20):

well so let me rephrase, this seems like it could either be a binary error or a text error

view this post on Zulip Alex Crichton (May 20 2020 at 23:21):

I feel like we want it to be a text error

view this post on Zulip Alex Crichton (May 20 2020 at 23:21):

I'm getting tripped up in how circularly recursive this is

view this post on Zulip Luke Wagner (May 20 2020 at 23:59):

Haha, yes, I think no recursive types for now. Perhaps just by requiring all type indices refer to earlier typedefs
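
The "only refer to earlier typedefs" rule can be sketched as a single pass, assuming each typedef is reduced to the list of type indices it mentions. That reduction and `check_backward_refs` are assumptions made for the example.

```python
def check_backward_refs(typedefs):
    """typedefs[i] lists the type indices that typedef i mentions."""
    for index, refs in enumerate(typedefs):
        for ref in refs:
            if ref >= index:
                raise ValueError(
                    f"type {index} refers to type {ref}, which is not an earlier typedef"
                )


# type 1 mentions type 0; type 2 mentions both: accepted.
check_backward_refs([[], [0], [0, 1]])

# (type $t (instance (export "x" (instance (type $t))))): type 0 mentions itself.
try:
    check_backward_refs([[0]])
    self_reference_rejected = False
except ValueError:
    self_reference_rejected = True
```

Requiring `ref < index` rules out self-reference and longer cycles with the same comparison.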

view this post on Zulip Luke Wagner (May 21 2020 at 00:02):

Also, agreed that if you write (instance (type $T)) you shouldn't need to be able to re-state all the exports/imports

view this post on Zulip Luke Wagner (May 21 2020 at 00:03):

That's really just a goofy special case for functions so they can assign identifiers to parameters

view this post on Zulip Luke Wagner (May 21 2020 at 00:04):

Actually, for directly-embedded type definitions like you're showing, I think no circularity forever; the only exception is circularity via (ref $T)

view this post on Zulip Alex Crichton (May 21 2020 at 00:24):

@Luke Wagner ok sounds good, I'll need to write some sort of sorting pass in the text parser then too to do a topological sort

view this post on Zulip Luke Wagner (May 21 2020 at 00:25):

ah, bummer, yes. for starters you could require text to be in order too ;)

view this post on Zulip Alex Crichton (May 21 2020 at 00:49):

thinking more on it, you're right, I think the text format should impose the restriction itself

view this post on Zulip Alex Crichton (May 21 2020 at 22:30):

@Luke Wagner here's another interesting thing to think about, so right now you can specify imports/exports inline in the text format, e.g. these are all the same

(func $f)
(export "" (func $f))
(func $f (export ""))

(import "" "" (func $f))
(func $f (import "" ""))

view this post on Zulip Alex Crichton (May 21 2020 at 22:30):

for nested modules, however, the proposed syntax you've got so far doesn't support this

view this post on Zulip Alex Crichton (May 21 2020 at 22:30):

b/c (module ...) the imports/exports would be ambiguous

view this post on Zulip Alex Crichton (May 21 2020 at 22:30):

or I guess it'd be ambiguous without more lookahead

view this post on Zulip Alex Crichton (May 21 2020 at 22:30):

I'm wondering if it makes sense to do something like:

view this post on Zulip Alex Crichton (May 21 2020 at 22:31):

(nested-module (type $ty) (module ...))

view this post on Zulip Alex Crichton (May 21 2020 at 22:31):

where you can specify inline imports/exports if you want

view this post on Zulip Alex Crichton (May 21 2020 at 22:31):

and (module ...) bare is sugar for auto-calculating the type, inserting it into the type index space, and not having any implicit imports/exports

view this post on Zulip Alex Crichton (May 21 2020 at 22:32):

I was also wondering how you'd do something like (module (type $module_type) ...) b/c that's also ambiguous

view this post on Zulip Alex Crichton (May 21 2020 at 22:32):

but at least for printing the binary format we'll need a way to say "the module type was specified at this index, don't inject anything else"

view this post on Zulip Luke Wagner (May 21 2020 at 22:37):

@Alex Crichton sorry, which part was ambiguous?

view this post on Zulip Luke Wagner (May 21 2020 at 22:38):

i would think (module ...) would be mostly symmetric to (func ...) in the ways you mentioned (import, export, explicit type index)

view this post on Zulip Luke Wagner (May 21 2020 at 22:39):

i see your point that it'd be necessary to scan the whole body of (module ...) to determine the module type, unlike (func ...) which tells you that basically up-front

view this post on Zulip Luke Wagner (May 21 2020 at 22:40):

but i think there's still symmetry with func in that, to validate the body of a (func ...), you have to have parsed all the types of all the other functions first (so that you know all the funcs' types)

view this post on Zulip Alex Crichton (May 21 2020 at 22:47):

hm ok so I've got two concerns here

view this post on Zulip Alex Crichton (May 21 2020 at 22:47):

one is how to parse this:

(module
  (module (export "x"))
)

view this post on Zulip Alex Crichton (May 21 2020 at 22:47):

another is how to parse this:

(module
  (type (module))
  (module (type 0))
)

view this post on Zulip Alex Crichton (May 21 2020 at 22:47):

which are sort of the same concern

view this post on Zulip Alex Crichton (May 21 2020 at 22:48):

basically everything in parsing only requires 1 lookahead right now, but those would otherwise require multiple tokens of lookahead

view this post on Zulip Alex Crichton (May 21 2020 at 22:48):

it's not necessarily ambiguous but it is pretty weird

view this post on Zulip Luke Wagner (May 22 2020 at 16:44):

Ohhhhh, I finally get your meaning; you're saying: when I see the tokens ( module ( export, I don't know if I'm parsing an export of the module definition, or declaring that this module definition is exported. I was thinking of it as an already-parsed AST which is after this question has been sorted :)

view this post on Zulip Alex Crichton (May 22 2020 at 16:47):

@Luke Wagner oh sorry, but yes

view this post on Zulip Luke Wagner (May 22 2020 at 16:48):

So I suppose in both cases, the (export "x") and (type 0) require a constant amount of lookahead to see what they are

view this post on Zulip Alex Crichton (May 22 2020 at 16:48):

that's why I was thinking of (nested-module ...) because that would put you in a parsing context where you clearly know what's what

view this post on Zulip Alex Crichton (May 22 2020 at 16:48):

that's true yeah

view this post on Zulip Alex Crichton (May 22 2020 at 16:48):

the constant is just bigger than 1 heh
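
A sketch of the extra lookahead on a flat token list, assuming the existing text-format shapes: an inline export is `(export "name")` while an export field is `(export "name" (...))`, and a type use is `(type <index-or-id>)` while a type definition has a nested form before its closing paren. The tokenizer and `classify_module_field` are simplified stand-ins, not how any real parser does it.

```python
import re

# Parens, quoted strings, and bare atoms (keywords, indices, $identifiers).
TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')


def tokenize(text):
    return TOKEN.findall(text)


def classify_module_field(tokens, pos):
    """tokens[pos] is the '(' right after '( module'; peek without consuming."""
    head = tokens[pos + 1]
    if head == "export":
        # ( export "x" )      -> the nested module itself is exported
        # ( export "x" ( ...  -> an export field inside the nested module
        return "inline-export" if tokens[pos + 3] == ")" else "field"
    if head == "type":
        nxt = pos + 2
        if tokens[nxt] not in ("(", ")"):
            nxt += 1  # skip the index or $identifier
        # ( type 0 )          -> type use for the nested module
        # ( type ( module ) ) -> a typedef field inside the nested module
        return "type-use" if tokens[nxt] == ")" else "field"
    return "field"


outer = tokenize('(module (module (export "x")))')
inline = classify_module_field(outer, 4)  # the '(' before 'export'

typed = tokenize("(module (type (module)) (module (type 0)))")
type_use = classify_module_field(typed, 10)  # the '(' before 'type 0'
```

In both cases the decision needs three or four tokens past the opening paren, which matches the "constant, but bigger than 1" observation.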

view this post on Zulip Luke Wagner (May 22 2020 at 16:49):

are you using an LR(1) parser generator?

view this post on Zulip Alex Crichton (May 22 2020 at 16:49):

nah it's all hand-coded so it's easy enough to do

view this post on Zulip Alex Crichton (May 22 2020 at 16:49):

but it also just looks funky

view this post on Zulip Alex Crichton (May 22 2020 at 16:49):

from a readability point of view

view this post on Zulip Alex Crichton (May 22 2020 at 16:49):

not that inline exports/imports are used all that often

view this post on Zulip Luke Wagner (May 22 2020 at 16:49):

well it's a good question, i'll file an issue on the repo after we merge the linking PR (i'm thinking mid next week)

view this post on Zulip Alex Crichton (May 22 2020 at 16:50):

I'll stick to "just do more lookahead" for now

view this post on Zulip Luke Wagner (May 22 2020 at 16:50):

:thumbs_up:

view this post on Zulip Alex Crichton (May 22 2020 at 20:56):

Hm ok so I'm getting really tripped up how to implement alias statements in text parsing

view this post on Zulip Alex Crichton (May 22 2020 at 20:56):

everything is so circular and I don't know how to untangle things

view this post on Zulip Alex Crichton (May 22 2020 at 20:56):

so right now "elaboration" is a pretty simple 3-phase process

view this post on Zulip Alex Crichton (May 22 2020 at 20:56):

er, 4 I guess

view this post on Zulip Alex Crichton (May 22 2020 at 20:57):

1) expand inline imports/exports
2) expand inline type annotations to actual type declarations
3) record what index each name is at
4) fill in all names with their indexes

view this post on Zulip Alex Crichton (May 22 2020 at 20:57):

during step (4) we also have this extra "validate the inline function type matches the referenced type" if you do something like (func (type 0) (param i32))
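
For illustration, steps (3) and (4) above can be sketched roughly as follows. This is a minimal Python sketch, not the `wat` crate's actual code: the `(kind, name, refs)` field tuples and the function names `record_indices` and `resolve` are hypothetical. It shows why field order does not matter in the existing scheme: every name is recorded before any reference is filled in.

```python
# Hypothetical miniature of phases 3 and 4; not the real `wat` data structures.
from collections import defaultdict

def record_indices(fields):
    """Phase 3: map each `$name` to its index within its kind's index space."""
    counters = defaultdict(int)
    names = {}
    for kind, name, _refs in fields:
        if name is not None:
            names[(kind, name)] = counters[kind]
        counters[kind] += 1
    return names

def resolve(fields, names):
    """Phase 4: replace symbolic references with numeric indices."""
    resolved = []
    for kind, name, refs in fields:
        new_refs = []
        for ref_kind, ref in refs:
            if isinstance(ref, str):
                ref = names[(ref_kind, ref)]  # KeyError means an unknown name
            new_refs.append((ref_kind, ref))
        resolved.append((kind, name, new_refs))
    return resolved

fields = [
    ("func", "$f", [("type", "$t"), ("func", "$g")]),  # forward references
    ("type", "$t", []),
    ("func", "$g", [("type", 0)]),                     # already numeric
]
names = record_indices(fields)
resolved = resolve(fields, names)
print(resolved)
```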

view this post on Zulip Alex Crichton (May 22 2020 at 20:57):

but I don't know how to do this for modules

view this post on Zulip Alex Crichton (May 22 2020 at 20:57):

so if you have a nested module like (module (module))

view this post on Zulip Alex Crichton (May 22 2020 at 20:58):

then that needs to be elaborated to

(module
  (type (module))
  (module (type 0))
)

view this post on Zulip Alex Crichton (May 22 2020 at 20:58):

so when the module type isn't listed, we need to calculate it and inject it as a module type

view this post on Zulip Alex Crichton (May 22 2020 at 20:58):

but I'm not sure how we can calculate it in the face of (alias (parent ...))

view this post on Zulip Alex Crichton (May 22 2020 at 20:58):

because that needs information from the parent, which if we're only in step (2) we don't even have symbolic names yet

view this post on Zulip Alex Crichton (May 22 2020 at 20:58):

much less stable indexes because we're still injecting new type annotations depending on what we're seeing

view this post on Zulip Alex Crichton (May 22 2020 at 20:59):

so like for example

(module
  (module
    (alias (parent (type 0)))
    (func (type 0) (param i32))
  )
)

view this post on Zulip Alex Crichton (May 22 2020 at 20:59):

the definition of the parent module's type 0 is going to be the type of the inlined module

view this post on Zulip Alex Crichton (May 22 2020 at 21:00):

but the inlined module references that

view this post on Zulip Alex Crichton (May 22 2020 at 21:01):

Some of this is just complexity I think, but I'm just stuck at what to do

view this post on Zulip Alex Crichton (May 22 2020 at 21:01):

I don't know if I should just throw everything out and start from scratch with a complicated resolver

view this post on Zulip Alex Crichton (May 22 2020 at 21:01):

or try to fit things cleanly into what already exists

view this post on Zulip Alex Crichton (May 22 2020 at 21:02):

I can't tell if all this elaboration/name resolution has to happen in cycles till it reaches some sort of fixed point or otherwise what the precise order of passes is to figure everything out correctly

view this post on Zulip Alex Crichton (May 22 2020 at 21:05):

Another example would be

(module
  (module
    (alias (parent (type $foo)))
    (func (type 0))
  )
  (type $foo (func))
)

I don't know what order to do things in. The index of $foo is 1, but we don't know that until the type of the nested module is elaborated. We can't do that though until we figure out what its parent reference is pointing to

view this post on Zulip Luke Wagner (May 22 2020 at 21:17):

@Alex Crichton So for the text format parsing, my inclination would be to say that, when you see an explicit type declaration, you simply bake in that index, no questions asked; if the module fails to validate, it's the author's fault

view this post on Zulip Alex Crichton (May 22 2020 at 21:17):

oh sure yeah wat does extremely little validation

view this post on Zulip Alex Crichton (May 22 2020 at 21:18):

this is more just about trying to do name resolution where we're figuring out what indexes are assigned to everything

view this post on Zulip Alex Crichton (May 22 2020 at 21:20):

like is this supposed to work?

(module
  (type $foo (instance
    (export "" (func $bar))
  ))
  (module
    (alias (parent (type $foo)))
    (import "" (instance $i (type 0)))
    (alias ($i (func $bar)))
    (func
      call 0)
  )
)

view this post on Zulip Luke Wagner (May 22 2020 at 21:23):

So one thing we can do for now (and perhaps forever) is to say that aliases can only refer to preceding definitions

view this post on Zulip Alex Crichton (May 22 2020 at 21:24):

but even in the above case

view this post on Zulip Alex Crichton (May 22 2020 at 21:24):

we're doing name resolution across modules

view this post on Zulip Alex Crichton (May 22 2020 at 21:24):

and everything is in preceding order and everything

view this post on Zulip Alex Crichton (May 22 2020 at 21:25):

(alias ($i (func $bar))) realizes that $i resolves to local instance 0, which is imported, which has a type defined locally, but that type was aliased from a parent module

view this post on Zulip Alex Crichton (May 22 2020 at 21:25):

like these are kind of "dumb concerns" in that they're only really applicable to the implementation of a text parser and don't really have many implications on the binary format

view this post on Zulip Alex Crichton (May 22 2020 at 21:26):

I'm tripping myself up so much because the text format is so simple today and I can't figure out how to make the addition of modules as simple as it is right now

view this post on Zulip Alex Crichton (May 22 2020 at 21:26):

without having things like global name resolution and tombstones for "this'll get resolved later" and things like that

view this post on Zulip Luke Wagner (May 22 2020 at 21:26):

Yeah, I think the text parser has to maintain a stack of identifier scopes (one per nested module) that it updates in a linear pass over the AST

view this post on Zulip Luke Wagner (May 22 2020 at 21:27):

I suppose what this means is that some identifiers (e.g., calls to $functions in function bodies) get resolved at the end (b/c they are allowed to be circular), while some get filled in as part of a linear pass
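
A rough Python sketch of the "stack of identifier scopes" idea follows, under stated assumptions: the `Scope` class, the `walk` function, and the tuple encoding of fields (including the `"alias-parent"` tag) are all hypothetical, and the sketch ignores injected type definitions, so its indices are not necessarily the ones a real elaborator would produce. It only shows a parent alias being resolved during a linear pass against definitions that precede the nested module.

```python
class Scope:
    """Names defined so far in one module, keyed by (kind, name)."""
    def __init__(self):
        self.names = {}
        self.counts = {}

    def define(self, kind, name=None):
        index = self.counts.get(kind, 0)
        self.counts[kind] = index + 1
        if name is not None:
            self.names[(kind, name)] = index
        return index

def walk(module, stack, out):
    """Linear pass over a module given as ("module", [fields])."""
    scope = Scope()
    stack.append(scope)
    for field in module[1]:
        if field[0] == "module":
            scope.define("module")
            walk(field, stack, out)
        elif field[0] == "alias-parent":
            _, kind, name = field
            parent = stack[-2]  # the enclosing module's scope
            # Only definitions preceding the nested module are visible here.
            out.append((name, parent.names[(kind, name)]))
            scope.define(kind, name)
        else:
            kind, name = field
            scope.define(kind, name)
    stack.pop()

parent_first = ("module", [
    ("type", "$foo"),
    ("module", [("alias-parent", "type", "$foo")]),
])
out = []
walk(parent_first, [], out)
print(out)
```

If `(type $foo ...)` instead appears after the nested module, as in the earlier example, the lookup in this sketch fails with a `KeyError`, which is the "aliases can only refer to preceding definitions" restriction.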

view this post on Zulip Alex Crichton (May 22 2020 at 21:27):

yeah that's also the weird part for me

view this post on Zulip Alex Crichton (May 22 2020 at 21:28):

the text format is super loose today in that you just throw things in a soup and a valid module almost always pops out

view this post on Zulip Alex Crichton (May 22 2020 at 21:28):

but this is starting to place lots of restrictions like "no, everything has to be very strictly ordered"

view this post on Zulip Luke Wagner (May 22 2020 at 21:28):

well, i think every identifier that exists today would be in this "fill in at the end when all identifiers are known" category,

view this post on Zulip Luke Wagner (May 22 2020 at 21:30):

and we're just introducing a new "kind" of identifier that gets resolved in a new, earlier, linear pass

view this post on Zulip Alex Crichton (May 22 2020 at 21:30):

that makes sense, yeah, and I'm getting tripped up trying to not do that

view this post on Zulip Alex Crichton (May 22 2020 at 21:30):

it feels wrong to have "oh these identifiers work only linearly" and "oh but those identifiers can work anywhere"

view this post on Zulip Alex Crichton (May 22 2020 at 21:31):

and I can't figure out how to prove "yes the linear stuff is required due to this design constraint"

view this post on Zulip Luke Wagner (May 22 2020 at 21:32):

heh, i guess that's sort of the case with C++; in struct C { typedef int X; X foo() { bar(); } X bar() { foo(); } };, the names bar() and foo() can be cyclic whereas the reference to X has to be in order

view this post on Zulip Alex Crichton (May 22 2020 at 21:33):

I'll just try to work something out

view this post on Zulip Luke Wagner (May 22 2020 at 21:34):

But yeah, I see what you mean, the current parse rules simply parse every field in isolation and then do name resolution as a wholly separate, order-independent pass

view this post on Zulip Luke Wagner (May 22 2020 at 21:46):

Trying to think a bit more about what the general rule is, I think it's this: you do most name resolution in a linear pass, and any time a name is un-resolved:

Then, in a second pass, go over the placeholders and check that they resolve to a name and that name is not an alias or instance definition

Thus, only name cycles involving aliases or instance definitions would be disallowed; everything else should get resolved as it is today
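
One possible reading of this rule, sketched in Python. The steps for an unresolved name are not spelled out in the message above, so this sketch assumes that an unresolved name simply becomes a placeholder. The function name `resolve_linear`, the `(kind, name, refs)` tuples, and the single shared namespace are simplifications made for illustration only.

```python
def resolve_linear(fields):
    """fields: list of (kind, name, refs); refs are names of other fields."""
    defined = {}       # name -> kind, for definitions seen so far
    placeholders = []  # (referrer, name) pairs unresolved when first seen
    # First pass: linear; anything not yet defined becomes a placeholder.
    for kind, name, refs in fields:
        for ref in refs:
            if ref not in defined:
                placeholders.append((name, ref))
        defined[name] = kind
    # Second pass: each placeholder must resolve, and not to an alias
    # or instance definition.
    errors = []
    for referrer, ref in placeholders:
        if ref not in defined:
            errors.append(f"{referrer}: unknown name {ref}")
        elif defined[ref] in ("alias", "instance"):
            errors.append(f"{referrer}: forward reference to {defined[ref]} {ref}")
    return errors

# Mutually recursive functions still resolve, as they do today.
ok = [("func", "$foo", ["$bar"]), ("func", "$bar", ["$foo"])]
# A forward reference to an instance definition is rejected.
bad = [("func", "$f", ["$i"]), ("instance", "$i", [])]
print(resolve_linear(ok))
print(resolve_linear(bad))
```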

view this post on Zulip Alex Crichton (May 22 2020 at 23:05):

I think that sounds reasonable, yeah, I got further today in implementing all this, I think this is the last step for the text format

view this post on Zulip Alex Crichton (May 22 2020 at 23:05):

I'll work on getting this implemented next week to see if it all works

view this post on Zulip Alex Crichton (May 26 2020 at 21:12):

I'm still trying to wrap my head around the set of changes here

view this post on Zulip Alex Crichton (May 26 2020 at 21:12):

I'm definitely getting the sense that wat was previously a pretty simple parser with a single pass to resolve names, but now it's becoming almost more of a compiler, with a type resolution pass and such

view this post on Zulip Alex Crichton (May 26 2020 at 21:13):

but to confirm, @Luke Wagner the encoding of this module:

(module $outer
  (module $inner
    (module $child (export "a"))
  )
)

would look like this?

(module $outer
  (type $child_type (module))
  (type $inner_type (module
    (export "a" (module (type $child_type)))
  ))

  (module $inner (type $inner_type)
    (alias $child_type_inner (parent (type $child_type)))
    (module $child (type $child_type_inner))
    (export "a" (module $child))
  )
)

view this post on Zulip Luke Wagner (May 26 2020 at 21:23):

Yeah, that looks right, and yes, I can see how this makes the text-to-binary a lot more complicated
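
As a small illustration of the "calculate the module type" step behind this encoding, here is a hedged Python sketch. The dictionary representation and the `module_type` function are hypothetical, and only module exports are modeled (no imports, functions, or other export kinds). It mirrors how `$child_type` has to be known before `$inner_type` can be written down.

```python
def module_type(module):
    """module: {"exports": {name: nested_module}}; returns a nested tuple."""
    exports = module.get("exports", {})
    return ("module", tuple(
        (name, module_type(child)) for name, child in sorted(exports.items())
    ))

child = {}                         # (module $child) with no exports
inner = {"exports": {"a": child}}  # (module $inner) exporting $child as "a"
print(module_type(child))
print(module_type(inner))
```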

view this post on Zulip Alex Crichton (Jun 01 2020 at 19:09):

@Luke Wagner (export $some_instance) is only intended to be sugar for the text format, right? not reflected in the binary format?

view this post on Zulip Luke Wagner (Jun 01 2020 at 19:29):

Good question! For now: yes. At some point in the future, when one can import instances of imported instance types (O_O), they may need to become first-class things b/c it won't be possible to desugar them at text-to-binary time

view this post on Zulip Dan Gohman (Jun 01 2020 at 23:14):

@Luke Wagner The'

view this post on Zulip Dan Gohman (Jun 01 2020 at 23:15):

There's no way to use interface types with a global, right?

view this post on Zulip Dan Gohman (Jun 01 2020 at 23:17):

And as a related question, in theory commands could have immutable global exports, which could be a way for commands to export metadata, however without the ability to export strings or other higher-level types, that may not be very valuable.

view this post on Zulip Luke Wagner (Jun 01 2020 at 23:26):

Good question! Coincidentally I was just thinking about this and how it might work for stuff like metadata. It seems like one could have, as a component import/export an interface value (not const global, just a pure value) and this could be lowered into a core const global import, and this could allow one to import compound JSONesque values

view this post on Zulip Luke Wagner (Jun 01 2020 at 23:27):

@Dan Gohman

view this post on Zulip Dan Gohman (Jun 01 2020 at 23:32):

Ah, and by being a value export, rather than a global export, you'd read it with an adapter function, and not with global.get

view this post on Zulip Luke Wagner (Jun 01 2020 at 23:34):

Yup! And it lets you perform a one-time conversion (performing malloc etc)

view this post on Zulip Luke Wagner (Jun 01 2020 at 23:34):

Without thinking about global.set

view this post on Zulip Alex Crichton (Jun 10 2020 at 22:15):

@Luke Wagner here's an interesting question, how should this be encoded?

(module
  (import "" (module))
  (type (module)))

view this post on Zulip Alex Crichton (Jun 10 2020 at 22:15):

the import annotation has an "inline" type annotation

view this post on Zulip Alex Crichton (Jun 10 2020 at 22:15):

but this inline type annotation is listed later

view this post on Zulip Alex Crichton (Jun 10 2020 at 22:15):

with functions this works today because you just encode all types first

view this post on Zulip Alex Crichton (Jun 10 2020 at 22:16):

but with the first 5 sections in any order, it's unclear what the text format is supposed to do here in that regard

view this post on Zulip Alex Crichton (Jun 10 2020 at 22:16):

e.g. does this encode as:

(module
  (type (module)) ;; injected
  (import "" (module (type 0)))
  (type (module)) ;; original type annotation
)

view this post on Zulip Alex Crichton (Jun 10 2020 at 22:16):

or like this:

(module
  (type (module)) ;; original annotation reordered first
  (import "" (module (type 0)))
)

view this post on Zulip Alex Crichton (Jun 10 2020 at 22:17):

I suppose I'm answering my question as I'm writing this down, it basically has to be the former

view this post on Zulip Alex Crichton (Jun 10 2020 at 22:17):

this is just weird because now this doesn't behave as functions do

view this post on Zulip Luke Wagner (Jun 10 2020 at 23:04):

Yeah, good question. So in the analogous function situation, when the inline type def goes before the explicit type def... do you get two type defs or 1?

view this post on Zulip Luke Wagner (Jun 10 2020 at 23:04):

/me goes to look at spec

view this post on Zulip Luke Wagner (Jun 10 2020 at 23:12):

By my reading, an inline func type followed by explicit func type def will produce two type defs: only inline funcs "reach back"; type defs don't

view this post on Zulip Luke Wagner (Jun 10 2020 at 23:13):

Thus, I think your former module case would be symmetric to funcs

view this post on Zulip Luke Wagner (Jun 10 2020 at 23:14):

... and yet, wabt's wat2wasm seems to merge them

view this post on Zulip Luke Wagner (Jun 10 2020 at 23:15):

I think technically that's a bug in wabt

view this post on Zulip Alex Crichton (Jun 10 2020 at 23:15):

ah yeah I've mostly gone by wabt's behavior which may be a bug

view this post on Zulip Luke Wagner (Jun 10 2020 at 23:15):

well, practically speaking, i'd do whatever was easiest for now, but if it's the former, i wouldn't feel bad about it

view this post on Zulip Alex Crichton (Jun 10 2020 at 23:17):

order of items in the text format has previously been largely irrelevant, but with the 5 sections at the front that can all be interleaved I think it's a lot more important now

view this post on Zulip Alex Crichton (Jun 10 2020 at 23:17):

so I don't think there's actually any option other than the first, injecting a duplicate annotation

view this post on Zulip Alex Crichton (Jun 10 2020 at 23:18):

it's a relatively niche concern anyway though, it generally needs to just work
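
The "reach back" behavior discussed above can be sketched in Python as follows. This follows the reading given in the conversation (an inline annotation reuses an earlier matching definition or injects a new one at the point of use, while an explicit definition never merges) and is not a description of what wabt or any other tool does. The function name `elaborate_types` and the `("type", sig)` / `("use", sig)` encoding are hypothetical.

```python
def elaborate_types(fields):
    """fields: ("type", sig) for explicit defs, ("use", sig) for inline uses."""
    types = []  # type definitions in output order
    out = []
    for tag, sig in fields:
        if tag == "type":
            types.append(sig)              # explicit defs never merge
            out.append(("type", sig))
        else:
            if sig in types:               # reach back to the first match
                index = types.index(sig)
            else:
                index = len(types)
                types.append(sig)
                out.append(("type", sig))  # injected definition
            out.append(("use", index))
    return out

# (import "" (module)) followed by (type (module)): two type definitions.
inline_first = elaborate_types([("use", "(module)"), ("type", "(module)")])
# (type (module)) followed by (import "" (module)): the inline use reaches back.
explicit_first = elaborate_types([("type", "(module)"), ("use", "(module)")])
print(inline_first)
print(explicit_first)
```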

view this post on Zulip Alex Crichton (Jun 12 2020 at 19:35):

@Yury Delendik did you want to join a video chat about https://github.com/bytecodealliance/wasm-tools/pull/26 ?

This commit is the initial implementation of the module linking proposal in the three tooling crates of this repository. Unfortunately this is just one massive commit which isn't really able to...

view this post on Zulip Yury Delendik (Jun 12 2020 at 19:36):

sure

view this post on Zulip Alex Crichton (Jun 12 2020 at 19:40):

k cool, @fitzgen (he/him) would you/yury be free this afternoon?

view this post on Zulip Yury Delendik (Jun 12 2020 at 19:40):

/me is

view this post on Zulip fitzgen (he/him) (Jun 12 2020 at 19:40):

Yeah, free in roughly an hour and then for the rest of the day

view this post on Zulip Alex Crichton (Jun 12 2020 at 19:41):

k cool I'll send an invite


Last updated: Oct 23 2024 at 20:03 UTC