Use high5 as a new tokenizer #114

fb55 · 2014-11-16T13:38:10Z

Lately, a lot of tokenization-related bugs have popped up, and even though the tree-building part of high5 isn't done, its tokenizer should be ready.

This will be the 4.0.0 release of this module and will break some code – especially since a new doctype callback will be introduced and XML declarations (e.g. <?xml …>) inside HTML documents will be handled as comments.
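To illustrate the "handled as comments" point: under the HTML tokenization rules, a `<?` puts the tokenizer into the bogus comment state, so everything up to the next `>` becomes comment data. The following is only a minimal sketch of that one rule. `tokenizeBogusComments` and the `Token` type are hypothetical names made up for this example and are not part of htmlparser2 or high5; everything that is not a `<?…>` construct is simply passed through as text.

```typescript
// Hypothetical illustration only: not the htmlparser2 or high5 API.
type Token =
  | { type: "comment"; data: string }
  | { type: "text"; data: string };

// Sketch of a single HTML tokenization rule: "<?" starts a bogus comment
// that runs up to the next ">" (or the end of input). The leading "?" is
// kept as part of the comment data. All other input is emitted as text.
function tokenizeBogusComments(input: string): Token[] {
  const tokens: Token[] = [];
  let pos = 0;

  while (pos < input.length) {
    const start = input.indexOf("<?", pos);
    if (start === -1) {
      // No further "<?": the rest of the input is plain text.
      tokens.push({ type: "text", data: input.slice(pos) });
      break;
    }
    if (start > pos) {
      tokens.push({ type: "text", data: input.slice(pos, start) });
    }
    const end = input.indexOf(">", start);
    const stop = end === -1 ? input.length : end;
    // Skip the "<" but keep the "?" in the comment data.
    tokens.push({ type: "comment", data: input.slice(start + 1, stop) });
    pos = end === -1 ? input.length : end + 1;
  }

  return tokens;
}

const example = tokenizeBogusComments('<?xml version="1.0"?>hello');
console.log(example);
```

Code that currently relies on a processing-instruction event for such declarations would, under this behaviour, receive a comment instead, which is why the change is described as breaking.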

On the plus side, this means that we've got a spec-compliant tokenizer, so any tokenization bug report can be checked against the spec.


AndreasMadsen · 2014-11-16T13:40:15Z

This sounds awesome.

stevenvachon · 2015-01-16T02:34:33Z

+5

mickeyckm · 2015-02-23T15:04:09Z

When will this be available?

fb55 · 2015-02-23T15:54:59Z

The tokenizer currently lacks support for positions. As soon as that's added, a new version will become available. I have no idea when I'll have the time & be motivated to do it, so I can't give a timetable or anything.

HoldYourWaffle · 2019-04-13T15:02:07Z

Is there any update on this?

stevenvachon · 2019-04-14T22:08:31Z

@HoldYourWaffle I'm not on the team, but parse5 is now the default parser.

glen-84 · 2019-08-03T20:29:54Z

@stevenvachon The default parser for what?

fb55 · 2019-08-03T20:38:20Z

That would be cheerio. htmlparser2 is still shipped with the project and is used as the default parser when xmlMode: true is set.

stevenvachon · 2019-08-03T21:28:52Z

We can probably close this?

fb55 · 2020-09-01T14:45:54Z

Closing this as htmlparser2 should just keep its existing tokenizer.

rspieker mentioned this issue Nov 28, 2014

On processing instruction event #96

Closed

fb55 mentioned this issue Dec 12, 2014

Support for ondoctype callback #116

Closed

fb55 mentioned this issue Jan 11, 2015

Carefully constructed markup sneaks tags through as "text" #105

Closed

fb55 mentioned this issue Feb 25, 2015

CDATA parsing is not correct when it is not in xmlMode #119

Closed

fb55 mentioned this issue Apr 2, 2015

opening tag fix based on html syntax standard #122

Closed

zetxx mentioned this issue Apr 2, 2015

htmlparser2 issue will not be resolved soon marko-js/marko#57

Closed

fb55 mentioned this issue Apr 6, 2015

Adds a eagerTextCapture option. #124

Closed

oriSomething mentioned this issue Apr 3, 2016

nested <a> tags (and <button> tags) #170

Closed

fb55 closed this as completed Sep 1, 2020
