Use high5 as a new tokenizer #114

Closed
fb55 opened this issue Nov 16, 2014 · 10 comments

Comments

@fb55
Owner

fb55 commented Nov 16, 2014

Lately, a lot of tokenization-related bugs have popped up, and even though the tree-building part of high5 isn't done, its tokenizer should be ready.

This will be the 4.0.0 release of this module and will break some code – especially since a new doctype callback will be introduced and XML declarations (e.g. <?xml …>) inside HTML documents will be handled as comments.

On the plus side, this means that we've got a spec-compliant tokenizer, so any reported tokenization bug can be checked against the spec.
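
To illustrate the two behavior changes mentioned above, here is a minimal, self-contained sketch. It is not code from high5 or htmlparser2: the `Token` type and the `sketchTokenize` function are hypothetical names made up for this example. It only models what the HTML spec's tokenizer does with `<?` (a parse error that enters the "bogus comment" state and yields a comment token) and with a simple `<!DOCTYPE name>` (a doctype token with a lowercased name). Everything else is lumped into an "other" token, and public/system identifiers are ignored.

```typescript
// Illustrative sketch only; not high5's or htmlparser2's actual implementation.
type Token =
  | { type: "comment"; data: string }
  | { type: "doctype"; name: string }
  | { type: "other"; data: string };

function sketchTokenize(input: string): Token[] {
  const tokens: Token[] = [];
  let pos = 0;

  while (pos < input.length) {
    if (input.startsWith("<?", pos)) {
      // Per the HTML spec, "<?" switches to the bogus comment state, which
      // consumes everything up to the next ">" (or the end of the input).
      // The leading "?" is part of the comment data.
      const end = input.indexOf(">", pos);
      const stop = end === -1 ? input.length : end;
      tokens.push({ type: "comment", data: input.slice(pos + 1, stop) });
      pos = end === -1 ? input.length : end + 1;
    } else if (/^<!doctype/i.test(input.slice(pos))) {
      // Simplified doctype handling: keep only the lowercased name.
      const end = input.indexOf(">", pos);
      const stop = end === -1 ? input.length : end;
      const body = input.slice(pos + "<!doctype".length, stop).trim();
      const name = (body.split(/\s+/)[0] ?? "").toLowerCase();
      tokens.push({ type: "doctype", name });
      pos = end === -1 ? input.length : end + 1;
    } else {
      // Assumption for brevity: tags and text are not tokenized separately.
      let next = input.indexOf("<", pos + 1);
      if (next === -1) next = input.length;
      tokens.push({ type: "other", data: input.slice(pos, next) });
      pos = next;
    }
  }
  return tokens;
}

const tokens = sketchTokenize('<?xml version="1.0"?><!DOCTYPE html><p>hi</p>');
console.log(tokens);
```

With this input, the first token is a comment whose data is `?xml version="1.0"?`, and the second is a doctype token named `html`. A consumer that previously expected a processing-instruction event for the XML declaration would see a comment instead, which is the kind of breakage the 4.0.0 note warns about.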

@AndreasMadsen
Collaborator

This sounds awesome.

@stevenvachon

+5

@mickeyckm

When will this be available?

@fb55
Owner Author

fb55 commented Feb 23, 2015

The tokenizer currently lacks support for positions. As soon as that's added, a new version will become available. I have no idea when I'll have the time and motivation to do it, so I can't give a timetable.

@HoldYourWaffle

Is there any update on this?

@stevenvachon

@HoldYourWaffle I'm not on the team, but parse5 is now the default parser.

@glen-84

glen-84 commented Aug 3, 2019

@stevenvachon The default parser for what?

@fb55
Owner Author

fb55 commented Aug 3, 2019

That would be cheerio. htmlparser2 is still shipped with the project and is used as the default parser when xmlMode: true is set.

@stevenvachon

We can probably close this?

@fb55
Owner Author

fb55 commented Sep 1, 2020

Closing this as htmlparser2 should just keep its existing tokenizer.

@fb55 fb55 closed this as completed Sep 1, 2020

6 participants