peginator? · wit-bindgen · Zulip Chat Archive

Is there a reason that wit-parser doesn't use something like https://docs.rs/peginator/0.4.0/peginator/ ? That would make it very explicit precisely what grammar is supported. Currently, there are WIT syntax docs in the component model repo that are not synchronized with what the actual code does.

Nathaniel McCallum (Nov 23 2022 at 15:52):

I'm thinking something like having a wit-ast crate which is just the output from peginator.

Nathaniel McCallum (Nov 23 2022 at 16:03):

Brian (Nov 23 2022 at 16:43):

@Nathaniel McCallum I'm not sure of any historical reasons for not using a parser generator, but w.r.t. PEGs specifically, in my experience error recovery and/or synchronization is always quite painful.
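For contrast with the error-recovery point above, here is a minimal sketch of the "synchronization" a hand-written parser can do: on a malformed item it records an error, skips ahead to a safe boundary (the next `;`), and keeps parsing so later errors still get reported. The toy `name: type;` grammar and every name here (`Field`, `tokenize`, `parse_fields`) are hypothetical illustrations, not wit-parser's actual code.

```rust
/// A field in a hypothetical toy grammar: `name: type;`
#[derive(Debug, PartialEq)]
struct Field {
    name: String,
    ty: String,
}

/// Splits input into identifier tokens (letters, digits, '-', '_')
/// and single-character punctuation tokens, dropping whitespace.
fn tokenize(src: &str) -> Vec<String> {
    let mut tokens = Vec::new();
    let mut current = String::new();
    for c in src.chars() {
        if c.is_alphanumeric() || c == '-' || c == '_' {
            current.push(c);
        } else {
            if !current.is_empty() {
                tokens.push(std::mem::take(&mut current));
            }
            if !c.is_whitespace() {
                tokens.push(c.to_string());
            }
        }
    }
    if !current.is_empty() {
        tokens.push(current);
    }
    tokens
}

fn is_ident(tok: &str) -> bool {
    tok.chars().next().map_or(false, |c| c.is_alphabetic())
}

/// Parses `name: type;` fields. On a malformed field it records an error,
/// then synchronizes by skipping past the next `;` and keeps parsing,
/// so one typo doesn't hide every later error.
fn parse_fields(src: &str) -> (Vec<Field>, Vec<String>) {
    let tokens = tokenize(src);
    let (mut fields, mut errors) = (Vec::new(), Vec::new());
    let mut i = 0;
    while i < tokens.len() {
        let well_formed = i + 3 < tokens.len()
            && is_ident(&tokens[i])
            && tokens[i + 1] == ":"
            && is_ident(&tokens[i + 2])
            && tokens[i + 3] == ";";
        if well_formed {
            fields.push(Field { name: tokens[i].clone(), ty: tokens[i + 2].clone() });
            i += 4;
        } else {
            errors.push(format!("syntax error at token {}", i));
            // Synchronization point: skip to just past the next ';'.
            while i < tokens.len() && tokens[i] != ";" {
                i += 1;
            }
            i += 1;
        }
    }
    (fields, errors)
}

fn main() {
    let (fields, errors) = parse_fields("a: u32; b u8; c: string;");
    println!("{:?}", fields);
    println!("{:?}", errors);
}
```

With the input above, the missing `:` after `b` yields one error, and both `a` and `c` are still parsed.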

Brian (Nov 23 2022 at 16:46):

PEGs also tend to be a pretty heavy dependency, which won't be ideal to include, especially if at some point the goal is to expose WIT parsing/componentizing as a lightweight component itself.

Nathaniel McCallum (Nov 23 2022 at 16:50):

@Brian Does the difference between dependency and build-dependency change that equation?

Brian (Nov 23 2022 at 17:06):

@Nathaniel McCallum That's a good question and probably implementation-specific. But the storage cost of PEGs (via packrat memoization) is proportional to the length of the input string. FWIW my experience was mainly with pest, and I saw a lot of bloat as a result of the generated code.
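To make the packrat storage point concrete, here is a minimal sketch of a packrat parser for a hypothetical two-rule grammar (not peginator's or pest's generated code). Every (rule, position) result is memoized, so for a fixed grammar the table can hold up to (number of rules) × (input length + 1) entries, which is the linear growth Brian describes.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
enum Rule {
    Expr,
    Term,
}

/// Packrat parser for the hypothetical toy PEG:
///   expr <- term '+' expr / term
///   term <- [0-9]+
struct Packrat<'a> {
    input: &'a [u8],
    /// (rule, start position) -> Some(end position) on success, None on failure.
    memo: HashMap<(Rule, usize), Option<usize>>,
    /// How many times a rule body actually ran (memo misses).
    calls: usize,
}

impl<'a> Packrat<'a> {
    fn new(input: &'a str) -> Self {
        Packrat { input: input.as_bytes(), memo: HashMap::new(), calls: 0 }
    }

    /// Applies a rule at a position, consulting the memo table first.
    fn apply(&mut self, rule: Rule, pos: usize) -> Option<usize> {
        if let Some(&cached) = self.memo.get(&(rule, pos)) {
            return cached;
        }
        self.calls += 1;
        let result = match rule {
            Rule::Expr => self.expr(pos),
            Rule::Term => self.term(pos),
        };
        self.memo.insert((rule, pos), result);
        result
    }

    fn expr(&mut self, pos: usize) -> Option<usize> {
        // First alternative: term '+' expr
        if let Some(p) = self.apply(Rule::Term, pos) {
            if self.input.get(p) == Some(&b'+') {
                if let Some(end) = self.apply(Rule::Expr, p + 1) {
                    return Some(end);
                }
            }
        }
        // Second alternative: term (a memo hit, so no re-parse on backtrack)
        self.apply(Rule::Term, pos)
    }

    fn term(&mut self, pos: usize) -> Option<usize> {
        let mut p = pos;
        while p < self.input.len() && self.input[p].is_ascii_digit() {
            p += 1;
        }
        if p > pos { Some(p) } else { None }
    }
}

fn main() {
    for input in ["1+2+3", "1+2+3+4+5"] {
        let mut parser = Packrat::new(input);
        let end = parser.apply(Rule::Expr, 0);
        println!("{}: end={:?}, memo entries={}", input, end, parser.memo.len());
    }
}
```

Lengthening the input from 5 to 9 bytes grows the table from 6 to 10 entries here; real generated parsers with many more rules multiply that per-position cost accordingly.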

Nathaniel McCallum (Nov 23 2022 at 17:07):

Nathaniel McCallum (Nov 23 2022 at 17:08):

To be clear, I'm trying to solve a narrow problem which is discontinuity between implementation and documentation.

Nathaniel McCallum (Nov 23 2022 at 17:09):

I'm not vying for a specific solution. Only that the discontinuity between impl and docs needs to be solved.

Brian (Nov 23 2022 at 17:14):

I understand. I'm just not entirely convinced that PEGs are the way to address that, given the instability of WIT so far. This sounds like a good topic for the component tooling meeting next Friday.

Nathaniel McCallum (Nov 23 2022 at 17:32):

@Brian The instability is precisely the reason why discontinuity between the grammar and implementation needs to be addressed.

Ian Smith (Dec 20 2022 at 15:52):

@Nathaniel McCallum @Alex Crichton I have no idea if this is in-bounds for this type of discussion, but here goes: any chance the spec for WIT files (or whatever the ultimate name is for the object definitions) could be published in a format that's friendly to many languages? I'd probably argue for yacc or Antlr4, but I'm open to just about anything that supports a broad array of languages. (see also: https://github.com/rrevenantt/antlr4rust)

Barring that, what about some type of strategy where the already parsed and checked AST is made visible in other languages? The strategy that the protoc-* generator(s) use for protobuf is super convenient. Basically, they define a protobuf object (of course!) that represents an input to the protobuf compiler. That object already has all the tricky stuff done, like syntax checking, resolving import paths, checking that everything is defined properly, etc. So, for example, the order in which you receive each input to the protobuf compiler as an extension writer is topological, so everything is defined before it is used, and so forth. Writing a new (language) binding for it is quite easy because you just walk around on the already validated objects to pull out the bits that you care about and then output whatever text you want based on that.
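The "everything is defined before it is used" ordering described above is a topological sort. As a minimal sketch of how a resolver could hand binding generators definitions in that order, here is Kahn's algorithm over a hypothetical `Def` type (not wit-parser's API). It assumes name resolution has already confirmed that every referenced name is defined, and it reports leftover names if there is a cycle.

```rust
use std::collections::{BTreeMap, VecDeque};

/// Hypothetical resolved definition: its name and the names it refers to.
/// Assumes name resolution already checked that every used name is defined.
struct Def {
    name: &'static str,
    uses: Vec<&'static str>,
}

/// Kahn's algorithm: orders definitions so each one comes after everything
/// it uses. Returns Err with the names left over if there is a cycle.
fn topo_order(defs: &[Def]) -> Result<Vec<&'static str>, Vec<&'static str>> {
    // Number of not-yet-emitted definitions each name still depends on.
    let mut pending: BTreeMap<&'static str, usize> = BTreeMap::new();
    // For each name, the definitions that use it.
    let mut users: BTreeMap<&'static str, Vec<&'static str>> = BTreeMap::new();
    for d in defs {
        *pending.entry(d.name).or_insert(0) += d.uses.len();
        for &u in &d.uses {
            users.entry(u).or_default().push(d.name);
        }
    }

    // Start with definitions that depend on nothing (BTreeMap keeps this deterministic).
    let mut ready: VecDeque<&'static str> = pending
        .iter()
        .filter(|&(_, &n)| n == 0)
        .map(|(&name, _)| name)
        .collect();

    let mut order = Vec::new();
    while let Some(name) = ready.pop_front() {
        order.push(name);
        if let Some(dependents) = users.get(name) {
            for &user in dependents {
                let n = pending.get_mut(user).expect("user is a defined name");
                *n -= 1;
                if *n == 0 {
                    ready.push_back(user);
                }
            }
        }
    }

    if order.len() == pending.len() {
        Ok(order)
    } else {
        Err(pending.iter().filter(|&(_, &n)| n > 0).map(|(&name, _)| name).collect())
    }
}

fn main() {
    let defs = vec![
        Def { name: "scene", uses: vec!["shape", "point"] },
        Def { name: "shape", uses: vec!["point"] },
        Def { name: "point", uses: vec![] },
    ];
    println!("{:?}", topo_order(&defs));
}
```

Even though `scene` is declared first, a generator consuming this order sees `point`, then `shape`, then `scene`, so it never meets a name it hasn't emitted yet.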

Ian Smith (Dec 20 2022 at 15:53):

Alex Crichton (Jan 03 2023 at 15:17):

@Ian Smith that's definitely the goal! The wit-parser crate is intended to be the "take stuff in and produce a fully-resolved AST" and right now there's just no definition for taking that AST to something like a JSON blob for consumption elsewhere. Nothing stopping it from being added though.
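As a rough sketch of what "taking that AST to something like a JSON blob" could look like, here is a std-only serializer over a hypothetical, heavily simplified AST. The `Type`/`Function` shapes and the JSON layout are illustrative assumptions, not wit-parser's types or any agreed-upon format; in practice one would probably reach for serde rather than hand-writing the output.

```rust
use std::fmt::Write;

/// Hypothetical, heavily simplified resolved-AST shapes (not wit-parser's types).
enum Type {
    U32,
    Str,
    List(Box<Type>),
    /// Reference to a named type that the resolver has already validated.
    Named(String),
}

struct Function {
    name: String,
    params: Vec<(String, Type)>,
    result: Option<Type>,
}

/// Encodes a string as a JSON string literal, escaping quotes, backslashes,
/// and control characters.
fn escape(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            c if (c as u32) < 0x20 => {
                write!(out, "\\u{:04x}", c as u32).unwrap();
            }
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

fn type_json(t: &Type) -> String {
    match t {
        Type::U32 => "\"u32\"".to_string(),
        Type::Str => "\"string\"".to_string(),
        Type::List(inner) => format!("{{\"list\":{}}}", type_json(inner)),
        Type::Named(n) => format!("{{\"ref\":{}}}", escape(n)),
    }
}

fn function_json(f: &Function) -> String {
    let params: Vec<String> = f
        .params
        .iter()
        .map(|(name, ty)| format!("{{\"name\":{},\"type\":{}}}", escape(name), type_json(ty)))
        .collect();
    let result = match &f.result {
        Some(t) => type_json(t),
        None => "null".to_string(),
    };
    format!(
        "{{\"name\":{},\"params\":[{}],\"result\":{}}}",
        escape(&f.name),
        params.join(","),
        result
    )
}

fn main() {
    let f = Function {
        name: "greet".to_string(),
        params: vec![("name".to_string(), Type::Str), ("times".to_string(), Type::U32)],
        result: Some(Type::List(Box::new(Type::Named("message".to_string())))),
    };
    println!("{}", function_json(&f));
}
```

Because the input is already resolved and validated, a consumer in another language only has to walk plain JSON, which is the protoc-style workflow Ian describes.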

For the grammar of WIT, I also agree it would be good to draw up a more formal grammar. I don't know how to do that myself, though, so it'd need to be contributed.

Ian Smith (Jan 03 2023 at 15:23):

I don't know anything about Rust, but I know a lot about antlr. If you want to do a formal grammar via Antlr (or yacc or similar), I can help.

Alex Crichton (Jan 03 2023 at 15:29):

Yes, currently it's hand-coded, and that doesn't necessarily have to be replaced; just having a reference in something formal would be a good start.

Ian Smith (Jan 03 2023 at 15:41):

Alex Crichton (Jan 03 2023 at 15:42):

Ian Smith (Jan 03 2023 at 15:43):

np, I can hack together something in Antlr pretty fast once you are convinced the parser is in a reasonable state

Jarrod Overson (Jan 05 2023 at 15:14):

@Ian Smith do you have a work in progress anywhere yet? I'd be interested in helping out or writing tests.

DougAnderson444 | PeerPiper.io (Jun 15 2023 at 21:28):

@Ian Smith would antlr also be useful for making a VS Code extension for improved syntax highlighting?