Is there a reason that wit-parser doesn't use something like https://docs.rs/peginator/0.4.0/peginator/? That would make it very explicit precisely what grammar is supported. Currently, there are WIT syntax docs in the component model repo that are not synchronized with what the actual code does.
I'm thinking something like having a wit-ast crate which is just the output from peginator.
@Alex Crichton @Brian @Luke Wagner ^
@Nathaniel McCallum I'm not sure of any historical reasons for not using a parser generator but w.r.t PEG specifically, in my experience error recovery and/or synchronization is always quite painful.
PEGs also tend to be a pretty heavy dependency which won't be ideal to include, especially if at some point the goal is to expose wit parsing / componentizing as a lightweight component itself.
@Brian Does the difference between dependency and build-dependency change that equation?
@Nathaniel McCallum That's a good question and probably implementation specific. But the storage cost of PEGs (via packrat memoization) is proportional to the length of the input string. FWIW my experience was mainly with pest, and I saw a lot of bloat as a result of the generated code.
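To make the storage point concrete, here is a minimal sketch of packrat-style memoization for a toy grammar. It is not code from pest or peginator; the grammar, the `Packrat` type, and `parse_and_count` are all hypothetical names chosen for illustration. The memo table is keyed by (rule, input position), so the number of entries grows with the input.

```rust
use std::collections::HashMap;

// Toy grammar (hypothetical, for illustration only):
//   expr <- term ('+' term)*
//   term <- [0-9]+
const EXPR: u8 = 0;
const TERM: u8 = 1;

struct Packrat<'a> {
    input: &'a [u8],
    // Memo table: (rule id, start position) -> end position, or None on failure.
    memo: HashMap<(u8, usize), Option<usize>>,
}

impl<'a> Packrat<'a> {
    fn new(input: &'a str) -> Self {
        Packrat { input: input.as_bytes(), memo: HashMap::new() }
    }

    // Apply a rule at a position, consulting and filling the memo table.
    fn apply(&mut self, rule: u8, pos: usize) -> Option<usize> {
        if let Some(&cached) = self.memo.get(&(rule, pos)) {
            return cached;
        }
        let result = match rule {
            EXPR => self.expr(pos),
            _ => self.term(pos),
        };
        self.memo.insert((rule, pos), result);
        result
    }

    // term <- [0-9]+
    fn term(&mut self, pos: usize) -> Option<usize> {
        let mut end = pos;
        while end < self.input.len() && self.input[end].is_ascii_digit() {
            end += 1;
        }
        if end > pos { Some(end) } else { None }
    }

    // expr <- term ('+' term)*
    fn expr(&mut self, pos: usize) -> Option<usize> {
        let mut end = self.apply(TERM, pos)?;
        while end < self.input.len() && self.input[end] == b'+' {
            match self.apply(TERM, end + 1) {
                Some(next) => end = next,
                None => break,
            }
        }
        Some(end)
    }
}

/// Returns (whether the whole input matched, number of memo entries created).
fn parse_and_count(input: &str) -> (bool, usize) {
    let mut p = Packrat::new(input);
    let end = p.apply(EXPR, 0);
    (end == Some(input.len()), p.memo.len())
}
```

In this sketch, a longer chain of terms produces more memo entries, which is the cost being described. A parser that skips memoization trades that memory for possible re-parsing on backtracking.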
@Brian packrat memoization is optional for peginator.
To be clear, I'm trying to solve a narrow problem which is discontinuity between implementation and documentation.
I'm aware there are other considerations.
I'm not vying for a specific solution. Only that the discontinuity between impl and docs needs to be solved.
I understand. I'm just not entirely convinced that PEGs are the way to address that, given the instability of WIT so far. This sounds like a good topic for the next component tooling meeting next Friday.
@Brian The instability is precisely the reason why discontinuity between the grammar and implementation needs to be addressed.
@Nathaniel McCallum @Alex Crichton I have no idea if this is in-bounds for this type of discussion, but here goes: Any chance of the spec for WIT files (or whatever the ultimate name is for the object definitions) being in a format that is friendly to many languages? I'd probably argue for yacc or Antlr4, but I'm open to just about anything that supports a broad array of languages. (see also: https://github.com/rrevenantt/antlr4rust)
Barring that, what about some type of strategy where the already parsed and checked AST is made visible in other languages? The strategy that the protoc-* generator(s) use for protobuf is super convenient. Basically, they define a protobuf object (of course!) that represents an input to the protobuf compiler. That object already has all the tricky stuff done like syntax checking, resolving import paths, checking that everything is defined properly, etc. So, for example, the order that you receive each input to the protobuf compiler as an extension writer is topological so everything is defined before it is used, and so forth. Writing a new (language) binding for it is quite easy because you just walk around on the already validated objects to pull out the bits that you care about and then output whatever text you want based on that.
Thoughts?
ps. I'm a golang nerd.
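A rough sketch of the "everything is defined before it is used" ordering described above, in Rust since that is what wit-parser is written in. The `TypeDef` type and the `def` and `topo_order` helpers are hypothetical and do not reflect wit-parser's or protoc's actual data model; the point is only that a binding generator receiving definitions in this order can emit output in a single pass.

```rust
/// Hypothetical, already-validated definition: a name plus the names of
/// the types it refers to.
struct TypeDef {
    name: String,
    deps: Vec<String>,
}

/// Small constructor helper for building examples.
fn def(name: &str, deps: &[&str]) -> TypeDef {
    TypeDef {
        name: name.to_string(),
        deps: deps.iter().map(|d| d.to_string()).collect(),
    }
}

/// Orders definitions so that each one appears after everything it depends on.
/// Returns None when no such order exists (a cycle or an undefined name).
/// This is a simple quadratic approach, chosen for readability.
fn topo_order(defs: &[TypeDef]) -> Option<Vec<&str>> {
    let mut emitted: Vec<&str> = Vec::new();
    let mut remaining: Vec<&TypeDef> = defs.iter().collect();
    while !remaining.is_empty() {
        let before = remaining.len();
        // Emit every definition whose dependencies have all been emitted,
        // and keep the rest for the next pass.
        remaining.retain(|d| {
            let ready = d.deps.iter().all(|dep| emitted.contains(&dep.as_str()));
            if ready {
                emitted.push(d.name.as_str());
            }
            !ready
        });
        if remaining.len() == before {
            return None; // no progress: cycle or undefined dependency
        }
    }
    Some(emitted)
}
```

A generator for any target language could then walk the returned names in order and print whatever text it needs for each one.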
@Ian Smith that's definitely the goal! The wit-parser crate is intended to be the "take stuff in and produce a fully-resolved AST" and right now there's just no definition for taking that AST to something like a JSON blob for consumption elsewhere. Nothing stopping it from being added though.
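As a sketch of what "AST to JSON blob" could look like, the following uses only the standard library. The `Interface` and `Func` types and the JSON shape are assumptions made up for this example; they are not wit-parser's real types or any agreed-upon schema.

```rust
/// Hypothetical resolved function: name, (param name, type name) pairs, result type.
struct Func {
    name: String,
    params: Vec<(String, String)>,
    result: Option<String>,
}

/// Hypothetical resolved interface.
struct Interface {
    name: String,
    funcs: Vec<Func>,
}

/// Encodes a string as a JSON string literal, escaping quotes, backslashes,
/// and control characters.
fn json_string(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

fn func_to_json(f: &Func) -> String {
    let params: Vec<String> = f
        .params
        .iter()
        .map(|(n, t)| format!("{{\"name\":{},\"type\":{}}}", json_string(n), json_string(t)))
        .collect();
    let result = match &f.result {
        Some(t) => json_string(t),
        None => "null".to_string(),
    };
    format!(
        "{{\"name\":{},\"params\":[{}],\"result\":{}}}",
        json_string(&f.name),
        params.join(","),
        result
    )
}

/// Serializes the whole interface as one compact JSON object.
fn interface_to_json(i: &Interface) -> String {
    let funcs: Vec<String> = i.funcs.iter().map(func_to_json).collect();
    format!(
        "{{\"name\":{},\"functions\":[{}]}}",
        json_string(&i.name),
        funcs.join(",")
    )
}
```

A consumer in another language would then only need a JSON parser, not a WIT parser. A real implementation would more likely derive the serialization with a library than hand-write it as done here.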
For the grammar of WIT I also agree it would be good to draw up a more formal grammar. I don't know how to do that myself though so it'd need to be contributed.
I don't know anything about rust, but I know a lot about antlr. If you want to do a formal grammar via Antlr (or yacc or similar) I can help.
Is it currently implemented with a hand coded parser?
Yes, currently it's hand coded, and that doesn't necessarily have to be replaced. Just having a reference in something formal would be a good start.
Is the hand coded version stable or is it still undergoing a lot of change?
Lot of change unfortunately
np, I can hack together something in Antlr pretty fast once you are convinced the parser is in a reasonable state
@Ian Smith do you have a work in progress anywhere yet? I'd be interested in helping out or writing tests.
@Ian Smith would antlr also be useful for making a VS Code extension for improved syntax highlighting?
Also, https://crates.io/crates/pest is another PEG parser. It appears to be more widely used, and it's fairly easy to write parsing statements with it.
Last updated: Jan 24 2025 at 00:11 UTC