-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Normalize (consolidate) tokens of the same kind #23
base: master
Are you sure you want to change the base?
Conversation
I think this is a reasonable change to consider because it affects only the It's an improvement in terms of reducing final HTML size, but it reduces how much control the underlying scanner has over the output. It makes sense to join the two separate tokens of Practically speaking, users of the Since you mentioned ligatures and I suspect you're using the scanner for code, you could also consider using a more specialized scanner rather than the generic I don't know if you need it for just one language, or many different ones. But if you need it for just Go, there's a package |
Thanks for your extensive explanation, Dmitri. I was just acting under impression that this package is used for pure visual highlighting (pretty-printing) of the code, and from that perspective two consequent chunks of the same style have exactly the same visual effect as one combined, so there's no need for two. Your point about borders is a good one, as in this case the rendering will differ, but then it looks like that this specific use case is only needed when one wants to visualize how text/scanner works itself, and not for pretty-printing of the code. So it definitely depends on how this package is supposed to be used, and you know that better, for sure. I'm currently using it to highlight only Go code, but I like how it gives good results on other code as well, so I was hoping that a generic highlighting package would benefit from this normalization. If you feel that the individual punctuation wrappers should stay for specific use cases, I can make this normalization optional (as I did with The Go-specific highlighting package that you're mentioning also produces code like |
I think it would be acceptable to make this always on for the But if you do that, it'll be tightly integrated with just this printer/annotator. An alternative for you to consider is to perform the same-consequent-token-unification as a higher level wrapper, so that it can be optionally applied on top of the printer here, or the one I have in |
Please correct me if I'm wrong, but the way this is currently implemented is already Printer- and Annotator- agnostic (at least this is how I interpret the package code). The consolidation is done in global |
Look at the signature of the func Print(s *scanner.Scanner, w io.Writer, p Printer) error The Similarly, func Annotate(src []byte, a Annotator) (annotate.Annotations, error) {
s := NewScanner(src) That's why I had to create similar |
You're right (and I agree that the ability to work with different scanners/printers interchangeably might then require a more high-level approach to the exposed API, including introducing some backward-incompatible changes). Maybe the package needs to define its own Speaking of normalization as a second step, I don't think it's effective (at least I don't immediately see a good API for that). Normalization needs to be performed on top of the tokenizer stream (not the final printed output) and based on internal All in all, if the scope of the package is considered to be a simple text/scanner-based highlighter that only renders HTML or does annotations for code highlighting purposes only, then I guess my code fits the purpose, as it now produces the more compact set of code/annotations. Otherwise, I guess we need to define what that scope is, and with that this PR might simply become out-of-scope or not fitting the expected use cases. |
This makes the output smaller and allows for font ligatures (like `<=`, `!=`, `==` and so on) to be displayed correctly
The way underlying text/scanner package works is it produces two different tokens for combined punctuation like
<=
,>=
,!=
,:=
,||
,::
and so on. For example,!=
would be represented in the highlighted output as follows:This is suboptimal in terms of final HTML size, and also prevents font ligatures from being displayed properly (see e.g. Fira Code font).
This PR implements token normalization so that adjacent tokens of the same kind are merged together. The example above now looks like this:
...and allows for proper ligature rendering.