Use high5 as a new tokenizer #114
Comments
This sounds awesome.
+5
When will this be available?
The tokenizer currently lacks support for positions. As soon as that's added, a new version will become available. I have no idea when I'll have the time and be motivated to do it, so I can't give a timetable or anything.
Is there any update on this?
@HoldYourWaffle I'm not on the team, but parse5 is now the default parser. |
@stevenvachon The default parser for what? |
That would be cheerio. |
We can probably close this? |
Closing this as htmlparser2 should just keep its existing tokenizer. |
Lately, a lot of tokenization-related bugs have popped up, and even though the tree-building part of high5 isn't done, its tokenizer should be ready.
This will be the 4.0.0 release of this module and will break some code, especially since a new `doctype` callback will be introduced and XML declarations (e.g. `<?xml …>`) inside HTML documents will be handled as comments. On the plus side, this means we've got a spec-compliant tokenizer, so every tokenization bug can be checked against the spec.
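To illustrate the two behavior changes described above, here is a minimal TypeScript sketch of how a spec-following tokenizer might dispatch on markup openers. It assumes a hypothetical `TokenHandler` interface with an `ondoctype` callback; `TokenHandler`, `tokenize` and all callback names are illustrative and are not the actual API of high5 or htmlparser2. Under the WHATWG tokenization rules, `<?` in the tag open state is a parse error, so an XML declaration turns into a bogus comment whose data begins with the `?`. Most of the real state machine (attributes, character references, raw-text elements, error recovery) is deliberately left out.

```typescript
// Hypothetical callback interface for illustration only; it is not the API of
// htmlparser2 or high5.
interface TokenHandler {
  ondoctype(data: string): void;
  oncomment(data: string): void;
  ontag(raw: string): void;
  ontext(text: string): void;
}

// Simplified sketch: only the markup openers discussed in the issue are
// distinguished. Attributes, character references, raw-text elements and
// most error-recovery paths of the real state machine are omitted.
function tokenize(input: string, h: TokenHandler): void {
  // Returns [index of the next ">" (or end of input), index to resume from].
  const closeAfter = (from: number): [number, number] => {
    const end = input.indexOf(">", from);
    return end === -1 ? [input.length, input.length] : [end, end + 1];
  };
  let i = 0;
  while (i < input.length) {
    const lt = input.indexOf("<", i);
    if (lt === -1) {
      h.ontext(input.slice(i));
      break;
    }
    if (lt > i) h.ontext(input.slice(i, lt));
    const head = input.slice(lt, lt + 9); // long enough to recognize "<!doctype"
    if (head.startsWith("<!--")) {
      // Regular comment: data runs until the first "-->" (or end of input).
      const end = input.indexOf("-->", lt + 4);
      h.oncomment(input.slice(lt + 4, end === -1 ? input.length : end));
      i = end === -1 ? input.length : end + 3;
    } else if (/^<!doctype/i.test(head)) {
      // DOCTYPE is reported through its own (hypothetical) callback.
      const [stop, next] = closeAfter(lt);
      h.ondoctype(input.slice(lt + 9, stop).trim());
      i = next;
    } else if (head.startsWith("<?")) {
      // Per the tag open state, "<?" is a parse error: the "?" and everything
      // up to the next ">" become the data of a bogus comment.
      const [stop, next] = closeAfter(lt);
      h.oncomment(input.slice(lt + 1, stop));
      i = next;
    } else if (/^<\/?[a-zA-Z]/.test(head)) {
      // Start or end tag: the raw text between "<" and ">" is passed through.
      const [stop, next] = closeAfter(lt);
      h.ontag(input.slice(lt + 1, stop));
      i = next;
    } else {
      // Any other "<" is treated as literal text in this sketch.
      h.ontext("<");
      i = lt + 1;
    }
  }
}

// Usage: collect the events for a document that starts with an XML declaration.
const events: string[] = [];
tokenize('<?xml version="1.0"?><!DOCTYPE html><p>hi</p><!-- x -->', {
  ondoctype: (d) => events.push(`doctype:${d}`),
  oncomment: (d) => events.push(`comment:${d}`),
  ontag: (t) => events.push(`tag:${t}`),
  ontext: (t) => events.push(`text:${t}`),
});
console.log(events);
```

The XML declaration arrives through `oncomment` with the data `?xml version="1.0"?`, while `<!DOCTYPE html>` arrives through the separate `ondoctype` callback, which is the kind of split that would break consumers expecting the old callbacks.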