Improve the docs #254

Merged
merged 21 commits into from
Aug 9, 2017

Conversation

deltaidea
Contributor

@deltaidea deltaidea commented Jul 30, 2017

Preview of the progress so far

This is a joint effort to make the docs more straightforward and complete.

I hijacked the tjvr:ihatewriting branch and based this PR on it. @tjvr can still push to this PR since he's a collaborator, and I didn't want to deal with plain-text conflicts.

Relevant:

Things to do (in a vaguely sensible order):

tjvr and others added 6 commits July 30, 2017 12:18
Run `doctoc`. The ToC got outdated.
Use consistent style: `#` headers, no extra newlines, `-` lists.
I hope this is the right style; this is how the majority of the file looked.
@deltaidea
Contributor Author

deltaidea commented Jul 30, 2017

I apologize in advance, this will require at least one of @Hardmath123 and @tjvr to go through each commit and review the changes. Things to look for:

  • terminology (I'm not sure I got the difference between "rule" and "nonterminal" right),
  • style (are explanations too technical or too vague, should code samples be formatted differently)
  • typos and spelling (English is not my first language)

The whole PR can be squashed before merging, so expect small focused commits and force pushes (I'll say so if there are any sneaky changes).

@deltaidea
Contributor Author

deltaidea commented Jul 30, 2017

Notes so far:

  • I used ini syntax for fenced code blocks with comments. INI also has comments starting with # and it highlights everything else in a sane way. I tried python and coffeescript and they mark -> as invalid or highlight stuff inconsistently.
  • I used ES6 import in examples. Do you want me to drop it and go back to require?

@deltaidea
Contributor Author

@Hardmath123
Using moo is recommended and obviously easier than writing a custom lexer. What do you think about moving custom tokenizing stuff to tokens.md and only showing basics with moo in the readme?

This would streamline the docs, make them easier to digest all at once and not be as intimidating.

@tjvr
Collaborator

tjvr commented Aug 3, 2017

Thank you for working on this! :D This is definitely something that needs doing, and that we keep not getting round to.

We'll definitely need to review this carefully, as you said. I suspect @Hardmath123 will have strong opinions :-)

> What do you think about moving custom tokenizing stuff to tokens.md and only showing basics with moo in the readme?

I'm in favour.

@deltaidea
Contributor Author

16fbd03 adds a snippet for compiling grammars on client-side. Do you think it's a good idea to add this feature directly to nearley instead? In a separate PR, later.

@kach
Owner

kach commented Aug 3, 2017

> I suspect @Hardmath123 will have strong opinions :-)

…not sure how I should feel that you said that; especially since you're probably right…

> What do you think about moving custom tokenizing stuff to tokens.md and only showing basics with moo in the readme?

I propose something more radical: currently, the README is a behemoth of a tutorial/documentation/README crossover, while the webpage is what convinces you to use nearley. I think they should be swapped. The README should contain most of the contents of index.html (and some other stuff). /www should contain the docs, split up into chunks that make sense:

  • Tutorial
  • Using lexers
  • nearley on the front-end
  • Performance tips
  • etc.

There's probably a way to coax github into rendering those markdown documents into a nice self-contained webpage, perhaps with Jekyll or something.

@kach
Owner

kach commented Aug 3, 2017

> I used ES6 import in examples. Do you want me to drop it and go back to require?

Yes, please. We can add a note saying the import statement can also be used, but let's stick with the style nearley was written in for documentation purposes.

613ad00 readme: make all example code more readable, update to ES6

See above. Shouldn't be hard to revert.

@kach
Owner

kach commented Aug 3, 2017

> 16fbd03 adds a snippet for compiling grammars on client-side. Do you think it's a good idea to add this feature directly to nearley instead? In a separate PR, later.

What is the use-case? I can't think of anything besides the "try nearley" page.

@deltaidea deltaidea changed the title Improve docs [WIP] Improve the docs Aug 4, 2017
@tjvr
Collaborator

tjvr commented Aug 4, 2017

> Do you think it's a good idea to add this feature directly to nearley instead?

At some point we should probably rework the Nearley API, but that's certainly a v3 thing.

@deltaidea
Contributor Author

> README is a behemoth of a tutorial/documentation/README crossover, while the webpage is what convinces you to use nearley. I think they should be swapped.

That's a great thought! Still, this PR is already a beast, and not in a good sense, so let's cut the scope, finish what's originally in the top comment and just merge it already. This PR is not my smartest idea.

README.md Outdated
```

Alternatively, to use a generated grammar in a browser runtime, include the
`nearley.js` file in a `<script>` tag.
Add a script to `scripts` in `package.json` that runs the command above if you only have a locally installed copy of nearley.
Collaborator

+1

Collaborator

@tjvr tjvr left a comment

This is a really substantial piece of work! :D Thanks so much @deltaidea for taking this on. I've had a read through your changes to the README and made some (mostly minor) comments. Thanks again!

README.md Outdated
random strings that match your grammar.
- `nearley-railroad` generates pretty railroad diagrams from your parser. This
is mainly helpful for creating documentation, as (for example) on json.org.
See [Using in frontend](docs/using-in-frontend.md) for instructions on how to use nearley in your browser code.

You can uninstall the nearley compiler using `npm uninstall -g nearley`.
Collaborator

Let's drop this line.

README.md Outdated
random strings that match your grammar.
- `nearley-railroad` generates pretty railroad diagrams from your parser. This
is mainly helpful for creating documentation, as (for example) on json.org.
See [Using in frontend](docs/using-in-frontend.md) for instructions on how to use nearley in your browser code.
Collaborator

Good link! But let's move this sentence to the bottom of usage, and make clear that this is an unusual thing to do.

Owner

Well, to clarify, using a generated parser and nearley.js is a totally reasonable and common thing to do in the browser. Instructions on doing this should be prominent.

Using the nearley compiler in the browser is whacky and shouldn't be mentioned here.

README.md Outdated
classes at universities, as well as [file format parsers](https://github.com/raymond-h/node-dmi),
[markup languages](https://github.com/bobbybee/uPresent) and
[complete programming languages](https://github.com/bobbybee/carbon).
It's an npm [staff pick](https://www.npmjs.com/package/npm-collection-staff-picks).
Collaborator

@Hardmath123 I think this should be a list. Looks better that way. :-)

Owner

It also should mention a better set of projects. :-)

Owner

In particular, I think the markup language should be Idyll and the complete language should be https://github.com/sizigi/lp5562 or something.

README.md Outdated

- A *terminal* is a string or a token. E.g. keyword `"if"` is a terminal.
- A *nonterminal* is a combination of terminals and other nonterminals. E.g. an if statement defined as `"if" condition statement` is a nonterminal.
- A *rule* (or production rule) is a definition of a nonterminal. E.g. `"if" condition statement` is the rule according to which the if statement nonterminal is parsed.
Collaborator

@Hardmath123 Explaining what a CFG rule looks like is hard, but I think it's worth making sure we get this right.

Owner

Embarrassing story: I started a glossary a while back, and accidentally committed it at some point.

https://github.com/Hardmath123/nearley/blob/master/glossary.md

Some things in it might be wrong, so we should probably remove it at some point. But, it's a good starting-point if we want to, as Tim says, "make sure we get this right".
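
One way to picture the terminology under discussion is to write a tiny grammar as plain data. The sketch below is illustrative only: the `{ name, symbols }` shape, the rule names, and the literals are all assumptions made for this example, not nearley's API or the README's wording. It is meant to show that a nonterminal is a name, a rule is one definition of that name, and a nonterminal may have several rules.

```js
// Illustrative only: a hand-rolled representation chosen for this sketch.
// Strings in `symbols` refer to nonterminals; { literal } objects are terminals.
const rules = [
  { name: "ifStatement", symbols: [{ literal: "if" }, "condition", "statement"] },
  // Two rules define the same nonterminal, `condition`.
  { name: "condition", symbols: [{ literal: "true" }] },
  { name: "condition", symbols: [{ literal: "false" }] },
  { name: "statement", symbols: [{ literal: "pass" }] },
];

// A nonterminal is any name that appears on the left-hand side of a rule.
const nonterminals = new Set(rules.map((rule) => rule.name));

// A terminal is a symbol that is matched directly against the input.
const terminals = new Set(
  rules
    .flatMap((rule) => rule.symbols)
    .filter((symbol) => typeof symbol === "object")
    .map((symbol) => symbol.literal)
);

console.log("rules:", rules.length);
console.log("nonterminals:", [...nonterminals]);
console.log("terminals:", [...terminals]);
```

Here there are four rules but only three nonterminals, which may help separate the two terms when reviewing the README definitions.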

README.md Outdated
want to give fancy error messages for runtime errors.
- `data: Array` - array with the parsed parts of the rule. It will always be an array, even if there's only one part. If a rule contains nonterminals with their own postprocessors, the respective parts will already be transformed.
- `location: number` - the index (zero-based) at which the rule match starts. It's useful to retain this information in the syntax tree if you're writing an interpreter.
- `reject: Object` - return this object to signal that this rule doesn't actually match. This allows you to restrict or conditionally support language features.
Collaborator

Should probably mention this here. (cf. #258 (comment))

Collaborator

Also, we should probably warn people to avoid reject if possible. @Hardmath123

Owner

Also, "restrict or conditionally support language features" isn't the use-case here. The use-case is edge conditions like "I want [a-z]+ to match variables, EXCEPT for the keyword if…"

Many (but not all) of these get fixed by using a lexer.
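
The edge case described above could be sketched roughly as follows. This is a minimal example, assuming a scannerless rule along the lines of `identifier -> [a-z]:+`; the `KEYWORDS` set and the shape of the returned node are hypothetical. The function uses the `(data, location, reject)` arguments listed in the README excerpt. To keep the sketch runnable without the library, it is called by hand with a stand-in `reject` object.

```js
// Hypothetical keyword list for this sketch; not part of nearley.
const KEYWORDS = new Set(["if", "else", "while"]);

// Postprocessor for a rule such as `identifier -> [a-z]:+`.
function identifier(data, location, reject) {
  // The rule has one symbol, so `data` has one entry: the array of
  // characters matched by `[a-z]:+`.
  const name = data[0].join("");
  // Returning the `reject` object signals that this match should not count.
  if (KEYWORDS.has(name)) return reject;
  // Keep the location so later stages can report where the name appeared.
  return { type: "identifier", name, location };
}

// Stand-in for the object the parser would pass in (assumption for the demo).
const reject = {};
console.log(identifier([["f", "o", "o"]], 0, reject));
console.log(identifier([["i", "f"]], 4, reject) === reject);
```

As the comments above point out, a lexer that emits keywords as their own token type often removes the need for `reject` altogether, so this pattern is probably best kept for cases a lexer cannot handle.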

README.md Outdated
```js
var grammar = require("generated-code.js");
var nearley = require("nearley");
You can still use raw strings, but they will only match full tokens parsed by Moo:
Collaborator

Aside: this is (or was) actually a misfeature, but, hey, this isn't the place to discuss that. (Using raw strings is really convenient, but it interacted badly with moo's support for capture groups. Fortunately, we've removed those from Moo! :-) )

Collaborator

...except we added back value transforms. I should really fix that...

The `nearley.Parser` constructor takes an optional third parameter, `options`,
which is an object with the following possible keys:
- [Best practices for writing grammars](docs/how-to-grammar-good.md)
- [Custom tokens and lexers, parsing arbitrary arrays instead of strings](docs/custom-tokens-and-lexers.md)
Collaborator

@tjvr tjvr Aug 7, 2017

Are we missing the feed(array) bit in the linked file? @deltaidea

Collaborator

Whoops, no, we just never documented that properly in the first place.

README.md Outdated
very large), so we recommend leaving this as `false` unless you are familiar
with the Earley parsing algorithm and are planning to do something exciting
with the parse table.
## Recipies
Collaborator

Typo :-)

README.md Outdated
@@ -410,27 +407,22 @@ parser.

This was previously called `bin/nearleythere.js` and written by Robin.
Collaborator

@Hardmath123 Can we drop this sentence now? :-)

Owner

Yeah, I guess if you want.

README.md Outdated
@@ -496,19 +484,16 @@ Webpack users can use
[nearley-loader](https://github.com/kozily/nearley-loader) by Andrés Arana to
load grammars directly.

## Still confused?
Collaborator

@tjvr tjvr Aug 7, 2017

This section should probably move up nearer to Recipes. @deltaidea @Hardmath123

written in *itself* (this is called bootstrapping).
## Introduction

nearley compiles grammar definitions from a simple syntax resembling [BNF](https://en.wikipedia.org/wiki/Backus–Naur_form) to a JS representation.
Owner

Introduction should also describe why people like nearley: it's user-friendly, easy to get started, works on both the browser and node, etc.

I can take care of this change later though, don't worry about it.

nearley compiles grammar definitions from a simple syntax resembling [BNF](https://en.wikipedia.org/wiki/Backus–Naur_form) to a JS representation.
You pass that representation to nearley's tiny runtime, feed it data, and get the results.

nearley uses the Earley parsing algorithm with Joop Leo's optimizations to parse complex data structures easily.
Owner

I think "parse complex data structures" is not quite what we wanted to say, even in the original README. This is a good opportunity to find better wording. Complex languages, perhaps?

README.md Outdated

To use nearley, you need both a *global* and a *local* installation. The two
types of installations are described separately below.
nearley is published as an [NPM](https://docs.npmjs.com/getting-started/what-is-npm) package compatible with [Node.js](https://nodejs.org/en/) and browsers.
Owner

*Most browsers! Even some old ones. :-)

README.md Outdated
> var nearley = require("nearley");
> var grammar = require("./my-generated-grammar.js");

Check out the wonderful [nearley playground](https://omrelli.ug/nearley-playground/) to explore nearley interactively in your browser.
Owner

Can this be moved to a more prominent location? "NOTE: You can follow along by using the nearley playground, an online interface for exploring nearley grammars in your browser."

README.md Outdated
```

See below for detailed API and grammar syntax specification.
Owner

Let's replace "below" with internal links to sections.

README.md Outdated

will match 0 or more `cow`s in a row.
If you would like to support a different language, feel free to file a PR!

### Charsets

You can use valid RegExp charsets in a rule:
Owner

*in scannerless mode

- `keepHistory` (boolean, default `false`) - whether to preserve and expose the internal state
- `lexer` (object) - custom lexer, overrides `@lexer` in the grammar

If you are familiar with the Earley parsing algorithm and are planning to do something exciting with the parse table, set `keepHistory`:
Owner

Wasn't the initial motivation for keepHistory rewinding? @tjvr

Collaborator

Yes, but we now have save() for that purpose.

Collaborator

TODO: document save/restore.

@@ -0,0 +1,37 @@
# Using nearley in browsers

Use a tool like [Webpack](https://webpack.js.org/) or [Rollup](https://rollupjs.org/) to include the `nearley` NPM package in your browser code.
Owner

Wait, doesn't nearley work fine if you just, y'know, include nearley.js?

Let's be very careful making the distinction between parsing with nearley and compiling with nearleyc. This file should be called compiling nearley grammars in browsers or something!

Collaborator

Yeah, I don’t see a need to mention bundlers here.

Collaborator

Wait, I thought we were still in README! This is fine as-is I think. (Ignoring the fact that I personally prefer browserify :-) )

@kach
Owner

kach commented Aug 8, 2017

I made a rough pass too.

@deltaidea
Contributor Author

Thanks for the insight, guys! This is really helpful for my understanding. I'm very busy at the moment. Maybe I'll have some time on the weekend. Feel free to fix stuff on your own if you like.

@tjvr
Collaborator

tjvr commented Aug 8, 2017

> @tjvr can still push to this PR since he's a collaborator

I'm confused what you mean by this. Should I merge your PR to ihatewriting and then tweak it, or what? :-)

@deltaidea
Contributor Author

I mean both of you can just check out this PR and push to it directly, since the work is slow and there are no conflicts.

git fetch origin pull/254/head:improve-docs
git checkout improve-docs

tjvr added 2 commits August 8, 2017 13:25
this is a test commit. Can I commit to a PR? That seems odd.
@tjvr
Collaborator

tjvr commented Aug 8, 2017

@deltaidea Oh, weird! I’m confused about why I have push access to your fork, but there we go. :-)

I fixed a bunch of my own review comments. @Hardmath123 wanted to rewrite some bits too, I think.

I agree we shouldn’t keep this open too much longer; this is already much improved from what we have! @deltaidea, at some point please could you:

  • have a look at the few comments left above
  • switch import -> require, but leave the code as ES6, I think that’s fine.

For now, leaving Tokenizers as part of the main README is fine. Leaving the documentation as markdown files in docs/ is also fine; @Hardmath123 is welcome to webify things later. :-)

@deltaidea
Contributor Author

Thanks for the feedback and help!

Isn't require already used everywhere?

Regarding everything else, I'll have a proper look in an hour.

@deltaidea
Contributor Author

> Oh, weird! I’m confused about why I have push access to your fork, but there we go. :-)

GitHub has a checkbox called "Allow edits from maintainers" under every PR. It opens write access to the branch.

@tjvr
Collaborator

tjvr commented Aug 8, 2017

> Allow edits from maintainers

Ah, thanks! Never seen that before.

Looks like you have already replaced import with require -- thanks! :-)

tjvr added a commit that referenced this pull request Aug 9, 2017
@deltaidea's mega-PR. A squash of these commits:

Add some lexer stuff to README

readme: remove trailing spaces

readme: change the description to "Simple parsing in JavaScript"

nearley can run in browsers too.

readme: update the table of contents

Run `doctoc`. The ToC got outdated.

readme: make Markdown style consistent

Use consistent style: `#` headers, no extra newlines, `-` lists.
I hope this is the right style; this is how the majority of the file looked.

readme: add an intro and usage guide for beginners

readme: reorder and reword parser specification

readme: make all example code more readable, update to ES6

docs: move `glossary.md` and `how-to-grammar-good.md` to `docs/`

docs: move making REPL and accessing parse table into their own files

These two topics are covered in the section "Using a parser", which doesn't make much sense.
The basic usage is already at the top of readme, so let's remove "Using a parser" completely.

docs: add a clear Moo example to readme, separate out custom lexers

docs: add a link in readme to "How to grammar good"

docs: add a complete example of usage in browsers

docs: use `require()` instead of `import`

readme: link to the indentation-aware lexer by nathan

docs: add an example of generating CST and AST

readme: move recipies to section "Recipies"

Make clear using `nearleyc` in a browser is unusual.

this is a test commit. Can I commit to a PR? That seems odd.

Fix a bunch of my review comments

:-)

adds link to nearley-gulp in README

Add doctoc script to package.json
@tjvr tjvr merged commit d479748 into kach:master Aug 9, 2017
@tjvr
Collaborator

tjvr commented Aug 9, 2017

This PR was getting scary, so I went ahead and merged it! :D

Thanks so much for your help @deltaidea.

I'm gonna open a couple new issues for the rest--we can continue work in smaller PRs.

@deltaidea deltaidea changed the title [WIP] Improve the docs Improve the docs Aug 9, 2017
@deltaidea deltaidea deleted the improve-docs branch August 9, 2017 14:26
@danielo515

Where is the context aware example?
