Stream: wit-bindgen

Topic: peginator?


Nathaniel McCallum (Nov 23 2022 at 15:51):

Is there a reason that wit-parser doesn't use something like https://docs.rs/peginator/0.4.0/peginator/ ? That would make it very explicit precisely what grammar is supported. Currently, there are WIT syntax docs in the component model repo that are not synchronized with what the actual code does.

Nathaniel McCallum (Nov 23 2022 at 15:52):

I'm thinking something like having a wit-ast crate which is just the output from peginator.
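For illustration only: one way to keep an explicit grammar and the implementation from drifting apart, even without a generator like peginator, is a recursive-descent parser in which each grammar rule sits next to the single function that implements it. This sketch assumes a made-up WIT-like `record` subset; the grammar, `Record`, and `parse_record` are hypothetical and are not the real WIT grammar or wit-parser's API.

```rust
// Hypothetical grammar for a tiny WIT-like subset (NOT the real WIT grammar):
//   record ::= "record" ident "{" (field ("," field)* ","?)? "}"
//   field  ::= ident ":" ident
// Each grammar rule is implemented by exactly one method on `Parser`.

#[derive(Debug, PartialEq)]
pub struct Record {
    pub name: String,
    pub fields: Vec<(String, String)>, // (field name, type name)
}

/// Split source into identifier tokens and single-character punctuation tokens.
fn tokenize(src: &str) -> Vec<String> {
    let mut tokens = Vec::new();
    let mut current = String::new();
    for c in src.chars() {
        if c.is_alphanumeric() || c == '-' || c == '_' {
            current.push(c);
        } else {
            if !current.is_empty() {
                tokens.push(std::mem::take(&mut current));
            }
            if !c.is_whitespace() {
                tokens.push(c.to_string());
            }
        }
    }
    if !current.is_empty() {
        tokens.push(current);
    }
    tokens
}

struct Parser {
    tokens: Vec<String>,
    pos: usize,
}

impl Parser {
    fn peek(&self) -> Option<String> {
        self.tokens.get(self.pos).cloned()
    }

    fn expect(&mut self, want: &str) -> Result<(), String> {
        match self.peek() {
            Some(t) if t == want => {
                self.pos += 1;
                Ok(())
            }
            other => Err(format!("expected `{}`, found {:?}", want, other)),
        }
    }

    fn ident(&mut self) -> Result<String, String> {
        match self.peek() {
            Some(t) if t.chars().all(|c| c.is_alphanumeric() || c == '-' || c == '_') => {
                self.pos += 1;
                Ok(t)
            }
            other => Err(format!("expected identifier, found {:?}", other)),
        }
    }

    // record ::= "record" ident "{" (field ("," field)* ","?)? "}"
    fn record(&mut self) -> Result<Record, String> {
        self.expect("record")?;
        let name = self.ident()?;
        self.expect("{")?;
        let mut fields = Vec::new();
        while self.peek().as_deref() != Some("}") {
            fields.push(self.field()?);
            if self.peek().as_deref() == Some(",") {
                self.pos += 1; // optional separator / trailing comma
            } else {
                break;
            }
        }
        self.expect("}")?;
        Ok(Record { name, fields })
    }

    // field ::= ident ":" ident
    fn field(&mut self) -> Result<(String, String), String> {
        let name = self.ident()?;
        self.expect(":")?;
        let ty = self.ident()?;
        Ok((name, ty))
    }
}

pub fn parse_record(src: &str) -> Result<Record, String> {
    let mut p = Parser { tokens: tokenize(src), pos: 0 };
    let rec = p.record()?;
    if p.pos != p.tokens.len() {
        return Err("trailing input after record".to_string());
    }
    Ok(rec)
}
```

With a generator such as peginator, the grammar file itself would be the source of truth and both the parser and the AST types would be generated from it; the hand-written version only keeps grammar and code side by side.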

Nathaniel McCallum (Nov 23 2022 at 16:03):

@Alex Crichton @Brian @Luke Wagner ^

Brian (Nov 23 2022 at 16:43):

@Nathaniel McCallum I'm not aware of any historical reasons for not using a parser generator, but w.r.t. PEGs specifically, in my experience error recovery and/or synchronization is always quite painful.
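To make the error-recovery point concrete, here is a minimal sketch of panic-mode recovery ("synchronization") as it is commonly done in hand-written parsers: record the error, skip ahead to a known delimiter, and keep parsing so several errors can be reported in one pass. The `name: type;` declaration syntax and all names here are hypothetical; this is not WIT syntax or wit-parser code.

```rust
// Panic-mode error recovery in a hand-written parser (illustrative sketch).
// Hypothetical input language: a sequence of `name: type;` declarations.

#[derive(Debug, PartialEq)]
pub struct Decl {
    pub name: String,
    pub ty: String,
}

/// Split source into identifier tokens and single-character punctuation tokens.
fn tokenize(src: &str) -> Vec<String> {
    let mut tokens = Vec::new();
    let mut current = String::new();
    for c in src.chars() {
        if c.is_alphanumeric() || c == '-' || c == '_' {
            current.push(c);
        } else {
            if !current.is_empty() {
                tokens.push(std::mem::take(&mut current));
            }
            if !c.is_whitespace() {
                tokens.push(c.to_string());
            }
        }
    }
    if !current.is_empty() {
        tokens.push(current);
    }
    tokens
}

fn is_ident(tok: &str) -> bool {
    !tok.is_empty() && tok.chars().all(|c| c.is_alphanumeric() || c == '-' || c == '_')
}

fn tok(tokens: &[String], i: usize) -> Option<&str> {
    tokens.get(i).map(|s| s.as_str())
}

/// Parse one `name: type;` declaration at `pos`, returning it and the next position.
fn parse_decl(tokens: &[String], pos: usize) -> Result<(Decl, usize), String> {
    let name = match tok(tokens, pos) {
        Some(t) if is_ident(t) => t,
        other => return Err(format!("expected a name, found {:?}", other)),
    };
    if tok(tokens, pos + 1) != Some(":") {
        return Err(format!("expected `:` after `{}`", name));
    }
    let ty = match tok(tokens, pos + 2) {
        Some(t) if is_ident(t) => t,
        other => return Err(format!("expected a type after `{}:`, found {:?}", name, other)),
    };
    if tok(tokens, pos + 3) != Some(";") {
        return Err(format!("expected `;` after `{}`", ty));
    }
    Ok((Decl { name: name.to_string(), ty: ty.to_string() }, pos + 4))
}

/// Parse every declaration, collecting all errors instead of stopping at the first.
pub fn parse_all(src: &str) -> (Vec<Decl>, Vec<String>) {
    let tokens = tokenize(src);
    let (mut decls, mut errors) = (Vec::new(), Vec::new());
    let mut pos = 0;
    while pos < tokens.len() {
        match parse_decl(&tokens, pos) {
            Ok((decl, next)) => {
                decls.push(decl);
                pos = next;
            }
            Err(e) => {
                errors.push(e);
                // Synchronize: skip to just past the next `;` (or to the end of input).
                while pos < tokens.len() && tokens[pos] != ";" {
                    pos += 1;
                }
                pos += 1;
            }
        }
    }
    (decls, errors)
}
```

The synchronization point (`;` here) is a design decision the parser author controls directly; with a generated PEG parser the equivalent usually has to be expressed through extra grammar rules or the generator's own recovery hooks, if it has any.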

Brian (Nov 23 2022 at 16:46):

PEGs also tend to be a pretty heavy dependency, which won't be ideal to include, especially if at some point the goal is to expose WIT parsing / componentizing as a lightweight component itself.

Nathaniel McCallum (Nov 23 2022 at 16:50):

@Brian Does the difference between a dependency and a build-dependency change that equation?

Brian (Nov 23 2022 at 17:06):

@Nathaniel McCallum That's a good question and probably implementation-specific. But the storage cost of PEGs (via packrat memoization) is proportional to the length of the input string. FWIW my experience was mainly with pest, and I saw a lot of bloat as a result of the generated code.
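As a rough illustration of where that memory goes: a packrat parser memoizes the result of each (rule, input position) pair, so the memo table holds at most (number of rules) × (input length + 1) entries, which is linear in the input length. This minimal sketch uses a hypothetical two-rule grammar over arithmetic sums, not peginator's or pest's internals; the `Stats` counters exist only to show that each (rule, position) pair is evaluated at most once.

```rust
use std::collections::HashMap;

// Hypothetical PEG (ordered choice `/` tries alternatives left to right):
//   sum  <- atom '+' sum / atom
//   atom <- [0-9]+ / '(' sum ')'
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
enum Rule {
    Sum,
    Atom,
}

struct Packrat<'a> {
    input: &'a [u8],
    // (rule, start position) -> Some(end position) on success, None on failure.
    memo: HashMap<(Rule, usize), Option<usize>>,
    evaluations: usize, // number of non-memoized rule evaluations
}

impl<'a> Packrat<'a> {
    fn apply(&mut self, rule: Rule, pos: usize) -> Option<usize> {
        if let Some(&cached) = self.memo.get(&(rule, pos)) {
            return cached; // memo hit: no re-parsing
        }
        self.evaluations += 1;
        let result = match rule {
            Rule::Sum => self.sum(pos),
            Rule::Atom => self.atom(pos),
        };
        self.memo.insert((rule, pos), result);
        result
    }

    fn sum(&mut self, pos: usize) -> Option<usize> {
        if let Some(p) = self.apply(Rule::Atom, pos) {
            if self.input.get(p) == Some(&b'+') {
                if let Some(end) = self.apply(Rule::Sum, p + 1) {
                    return Some(end);
                }
            }
        }
        // Second alternative backtracks and retries `atom` at the same position.
        self.apply(Rule::Atom, pos)
    }

    fn atom(&mut self, pos: usize) -> Option<usize> {
        let digits = self.input[pos..].iter().take_while(|b| b.is_ascii_digit()).count();
        if digits > 0 {
            return Some(pos + digits);
        }
        if self.input.get(pos) == Some(&b'(') {
            if let Some(p) = self.apply(Rule::Sum, pos + 1) {
                if self.input.get(p) == Some(&b')') {
                    return Some(p + 1);
                }
            }
        }
        None
    }
}

pub struct Stats {
    pub matched: bool,
    pub memo_entries: usize,
    pub evaluations: usize,
}

/// Parse the whole input as a `sum` and report memo-table usage.
pub fn parse(src: &str) -> Stats {
    let mut p = Packrat { input: src.as_bytes(), memo: HashMap::new(), evaluations: 0 };
    let matched = p.apply(Rule::Sum, 0) == Some(src.len());
    Stats { matched, memo_entries: p.memo.len(), evaluations: p.evaluations }
}
```

Turning memoization off removes the table, at the cost that an ordered-choice alternative may re-parse input it has already seen (the `atom` retry in `sum` above would run again instead of hitting the cache).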

Nathaniel McCallum (Nov 23 2022 at 17:07):

@Brian packrat memoization is optional for peginator.

Nathaniel McCallum (Nov 23 2022 at 17:08):

To be clear, I'm trying to solve a narrow problem which is discontinuity between implementation and documentation.

Nathaniel McCallum (Nov 23 2022 at 17:09):

I'm aware there are other considerations.

Nathaniel McCallum (Nov 23 2022 at 17:09):

I'm not vying for a specific solution. Only that the discontinuity between impl and docs needs to be solved.

Brian (Nov 23 2022 at 17:14):

I understand. I'm just not entirely convinced that PEGs are the way to address that, given the instability of WIT so far. This sounds like a good topic for the next component tooling meeting next Friday.

Nathaniel McCallum (Nov 23 2022 at 17:32):

@Brian The instability is precisely the reason why discontinuity between the grammar and implementation needs to be addressed.

Ian Smith (Dec 20 2022 at 15:52):

@Nathaniel McCallum @Alex Crichton I have no idea if this is in-bounds for this type of discussion, but here goes: is there any chance the spec for WIT files (or whatever the ultimate name is for the object definitions) could be published in a format that is friendly to many languages? I'd probably argue for yacc or ANTLR4, but I'm open to just about anything that supports a broad array of languages. (See also: https://github.com/rrevenantt/antlr4rust)

Barring that, what about some kind of strategy where the already parsed and checked AST is made visible to other languages? The strategy that the protoc-* generator(s) use for protobuf is super convenient. Basically, they define a protobuf object (of course!) that represents an input to the protobuf compiler. That object already has all the tricky stuff done, like syntax checking, resolving import paths, checking that everything is defined properly, etc. So, for example, as an extension writer you receive the compiler's inputs in topological order, so everything is defined before it is used, and so forth. Writing a new (language) binding is quite easy because you just walk over the already-validated objects, pull out the bits you care about, and then output whatever text you want based on them.

Thoughts?
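The "topological order, so everything is defined before it is used" property can be produced with Kahn's algorithm. A minimal sketch, assuming a hypothetical map from each type name to the names it references (this is not wit-parser's data model or protoc's implementation); it also rejects undefined references and cycles, two of the checks a resolver would run before handing the result to a binding generator.

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

/// Order type names so each one appears after every name it references.
/// `defs` is a hypothetical map: type name -> names that type refers to.
pub fn definition_order(defs: &BTreeMap<String, Vec<String>>) -> Result<Vec<String>, String> {
    // Reject references to names that are never defined.
    for (name, deps) in defs {
        for dep in deps {
            if !defs.contains_key(dep) {
                return Err(format!("`{}` refers to undefined `{}`", name, dep));
            }
        }
    }
    // waiting_on[n] = number of distinct definitions n still depends on.
    let mut waiting_on: BTreeMap<&str, usize> = BTreeMap::new();
    // used_by[d] = definitions that reference d.
    let mut used_by: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for (name, deps) in defs {
        let unique: BTreeSet<&str> = deps.iter().map(|d| d.as_str()).collect();
        waiting_on.insert(name.as_str(), unique.len());
        for dep in unique {
            used_by.entry(dep).or_default().push(name.as_str());
        }
    }
    // Start with definitions that depend on nothing.
    let mut ready: VecDeque<&str> =
        waiting_on.iter().filter(|(_, n)| **n == 0).map(|(k, _)| *k).collect();
    let mut order = Vec::new();
    while let Some(name) = ready.pop_front() {
        order.push(name.to_string());
        if let Some(users) = used_by.get(name) {
            for &user in users {
                let n = waiting_on.get_mut(user).expect("every user is a definition");
                *n -= 1;
                if *n == 0 {
                    ready.push_back(user);
                }
            }
        }
    }
    // Anything never emitted is part of (or blocked behind) a cycle.
    if order.len() != defs.len() {
        return Err("cyclic type definitions".to_string());
    }
    Ok(order)
}
```

A binding generator consuming `definition_order`'s output can then emit each type in a single forward pass, which is the convenience described above for protoc plugins.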


Ian Smith (Dec 20 2022 at 15:53):

ps. I'm a golang nerd.

Alex Crichton (Jan 03 2023 at 15:17):

@Ian Smith that's definitely the goal! The wit-parser crate is intended to be the "take stuff in and produce a fully-resolved AST" and right now there's just no definition for taking that AST to something like a JSON blob for consumption elsewhere. Nothing stopping it from being added though.

For the grammar of WIT I also agree it would be good to draw up a more formal grammar. I don't know how to do that myself though so it'd need to be contributed.
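Regarding taking the resolved AST "to something like a JSON blob for consumption elsewhere": a dependency-free sketch of what that could look like. The `TypeDef` enum and the JSON shape are hypothetical, not wit-parser's types or any official format; in practice one would more likely derive `serde::Serialize` on the AST types and use `serde_json`, which handles escaping and formatting.

```rust
/// Hypothetical, heavily simplified resolved-AST node (not wit-parser's real types).
pub enum TypeDef {
    Record { name: String, fields: Vec<(String, String)> },
    Enum { name: String, cases: Vec<String> },
}

/// Encode a string as a JSON string literal, escaping quotes, backslashes and control chars.
pub fn json_str(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Serialize type definitions to a compact JSON array.
pub fn to_json(defs: &[TypeDef]) -> String {
    let items: Vec<String> = defs
        .iter()
        .map(|def| match def {
            TypeDef::Record { name, fields } => {
                let fields: Vec<String> = fields
                    .iter()
                    .map(|(n, t)| format!("{{\"name\":{},\"type\":{}}}", json_str(n), json_str(t)))
                    .collect();
                format!(
                    "{{\"kind\":\"record\",\"name\":{},\"fields\":[{}]}}",
                    json_str(name),
                    fields.join(",")
                )
            }
            TypeDef::Enum { name, cases } => {
                let cases: Vec<String> = cases.iter().map(|c| json_str(c)).collect();
                format!(
                    "{{\"kind\":\"enum\",\"name\":{},\"cases\":[{}]}}",
                    json_str(name),
                    cases.join(",")
                )
            }
        })
        .collect();
    format!("[{}]", items.join(","))
}
```

A consumer in Go, Python, or any other language would then only need a JSON parser, not a WIT parser, to walk the already-resolved definitions.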

Ian Smith (Jan 03 2023 at 15:23):

I don't know anything about Rust, but I know a lot about ANTLR. If you want to do a formal grammar via ANTLR (or yacc or similar), I can help.

Is it currently implemented with a hand-coded parser?

Alex Crichton (Jan 03 2023 at 15:29):

Yes, currently it's hand-coded, and that doesn't necessarily have to be replaced; just having a formal reference would be a good start.

Ian Smith (Jan 03 2023 at 15:41):

Is the hand-coded version stable, or is it still undergoing a lot of change?

Alex Crichton (Jan 03 2023 at 15:42):

Lot of change unfortunately

Ian Smith (Jan 03 2023 at 15:43):

np, I can hack together something in Antlr pretty fast once you are convinced the parser is in a reasonable state

Jarrod Overson (Jan 05 2023 at 15:14):

@Ian Smith do you have a work in progress anywhere yet? I'd be interested in helping out or writing tests.

DougAnderson444 | PeerPiper.io (Jun 15 2023 at 21:28):

@Ian Smith would antlr also be useful for making a VS Code extension for improved syntax highlighting?

DougAnderson444 | PeerPiper.io (Nov 18 2023 at 09:55):

Also, https://crates.io/crates/pest is another PEG parser; it appears to be more widely used, and its parsing rules are fairly easy to write.


Last updated: Jan 24 2025 at 00:11 UTC