Fix word break when the first character of token is multibyte #753
Hello, this is my first PR for RmlUi, so I wanted to say thanks for a great library :)
I use RmlUi to display "HTML-like" documents, and I noticed instability in some documents containing multi-byte characters. In release mode there was an infinite loop, and in debug mode a message appeared:
The error occurs when word break is enabled and the first character in the processed token is a multi-byte character, e.g. Polish characters such as "ś" or "ć".
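To make the condition concrete, here is a minimal sketch, not RmlUi's actual code. `SeekBackwardSketch` and `LongestFittingPrefix` are hypothetical names; the first only assumes that a backward UTF-8 seek steps over continuation bytes until it reaches a lead byte, and the second is a simplified stand-in for a word-break search that tries ever shorter prefixes of a token.

```cpp
#include <cassert>

// Hypothetical helper (an assumption, not the library's implementation):
// step back over UTF-8 continuation bytes (10xxxxxx) until a lead byte
// or the start of the buffer is reached.
const char* SeekBackwardSketch(const char* p, const char* begin)
{
    assert(p >= begin);
    while (p > begin && (static_cast<unsigned char>(*p) & 0xC0) == 0x80)
        --p;
    return p;
}

// Hypothetical stand-in for the word-break search: try ever shorter prefixes
// of the token and return the byte length of the first non-empty prefix that
// ends on a character boundary and is at most max_bytes long.
// Returns 0 when no such prefix exists.
int LongestFittingPrefix(const char* token_begin, int token_len, int max_bytes)
{
    for (int i = token_len - 1; i > 0; --i)
    {
        const char* partial_string_end = SeekBackwardSketch(token_begin + i, token_begin);

        // Guard in the spirit of this PR: if the seek landed on the start of
        // the token, the prefix is empty and there is nothing left to try.
        if (partial_string_end == token_begin)
            break;

        const int len = static_cast<int>(partial_string_end - token_begin);
        if (len <= max_bytes)
            return len;
    }
    return 0;
}
```

For the token "ść" (bytes `C5 9B C4 87`), byte offset 1 is a continuation byte, so the seek from `token_begin + 1` returns `token_begin` itself. In this sketch the guard turns that case into a clean "no break position" result instead of an empty partial string.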
In this case, the character returned for `token_begin` and `token_begin + i` by `StringUtilities::SeekBackwardUTF8()` may be the same character. I fixed that by stopping the iteration when `partial_string_end == token_begin`. I don't know if this is the best solution, but it works (no infinite loop or assertion failure) and the text is appropriately divided into lines.

Example document that shows the bug
It is quite easy to cause the issue by creating a table filled with the characters "ść":
Screenshot from RmlUi with the fix from this PR:

Simplified RML document: