Skip to content

Fix word break when the first character of token is multibyte #753

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Apr 16, 2025

Conversation

and3md
Copy link
Contributor

@and3md and3md commented Apr 11, 2025

Hello, This is my first PR for RmlUi so I wanted to say thanks for a great library :)

I use RmlUi to display "HTML-like" documents and I noticed instability in some documents with multi byte chars. In release mode there was an infinite loop and in debug a message appeared:

rmlui_bug

The error occurs when word break is enabled and the first character in the processed token is a multi-byte character, e.g. Polish characters such as "ś" or "ć".
In this case, the character returned for token_begin and token_begin + i by StringUtilities::SeekBackwardUTF8() may be the same char. I fixed that by stopping iteration when partial_string_end == token_begin. I don't know this is the best solution but it works (no inifinite loop/assertion fail) and text is appropriately divided into lines.

Example document that shows the bug

It is quite easy to cause the issue by creating a table filled with the characters ść :

Screen from rmlui fixed by this PR:
obraz

Simplified rml doc:

<!DOCTYPE html>
<rml lang="pl">
	<head>
		<meta charset="UTF-8"/>
		<meta http-equiv="X-UA-Compatible" content="IE=edge"/>
		<meta name="viewport" content="width=device-width, initial-scale=1.0"/>
		<title>Instalacje sanitarne i sieci zewnętrzne</title>
		<style>

.txt-centered {
		text-align: center;
		vertical-align: middle;
}

body {
    display: block;
    font-family: Roboto;
    font-size: 14dp;
    text-align: left;
    color:rgb(246, 252, 255);
    background:rgb(46, 48, 49);
    position: relative;
    top: 0dp;
    left: 0dp;
    overflow: scroll;
    height: 100vh;
}

div {
    display: block;
    position: relative;
}

/*
    Default styles for all the basic elements.
*/

em
{
    font-style: italic;
}

p {
    display: block;
    padding: 2dp;
{

strong
{
    font-weight: bold;
}

select
{
    text-align: left;
}

tabset tabs
{
    display: block;
}

table
{
    box-sizing: border-box;
    display: table;
    border: 0px rgb(113, 121, 121);
    border-collapse: collapse;
    border-width: 0px 0px 1dp 1dp;
}
tr
{
    box-sizing: border-box;
    display: table-row;
}
td
{
    box-sizing: border-box;
    display: table-cell;
    border: 0px rgb(113, 121, 121);
    border-width: 1dp 1dp 0 0;
    padding: 2dp;
    min-width: 10dp;
    word-break: break-word;
}
col
{
    box-sizing: border-box;
    display: table-column;
    min-width: 10dp;
}
colgroup
{
    display: table-column-group;
}
thead, tbody, tfoot
{
    display: table-row-group;
}


</style>
	</head>
	<body>
		<div id="content">
			<h5 id="IT_10_05">Tablica 0001</h5>
			<table>
				<tr>
					<td class="txt-centered">Lp.</td>
					<td class="txt-centered">title1</td>
					<td colspan="15" class="txt-centered">title2</td>
				</tr>
				<tr>
					<td class="txt-centered">01</td>
					<td class="txt-centered">02</td>
					<td class="txt-centered">03</td>
					<td class="txt-centered">04</td>
					<td class="txt-centered">05</td>
					<td class="txt-centered">06</td>
					<td class="txt-centered">07</td>
					<td class="txt-centered">08</td>
					<td class="txt-centered">09</td>
					<td class="txt-centered">10</td>
					<td class="txt-centered">11</td>
					<td class="txt-centered">12</td>
					<td class="txt-centered">13</td>
					<td class="txt-centered">14</td>
					<td class="txt-centered">15</td>
					<td class="txt-centered">16</td>
					<td class="txt-centered">17</td>
				</tr>
				<tr>
					<td class="txt-centered">1</td>
					<td>śśśśćśśśśśććść</td>
					<td class="txt-centered">28</td>
					<td class="txt-centered">32</td>
					<td class="txt-centered">36</td>
					<td class="txt-centered">41</td>
					<td class="txt-centered">48</td>
					<td class="txt-centered">55</td>
					<td class="txt-centered">64</td>
					<td class="txt-centered">78</td>
					<td class="txt-centered">91</td>
					<td class="txt-centered">111</td>
					<td class="txt-centered">129</td>
					<td class="txt-centered">160</td>
					<td class="txt-centered">198</td>
					<td class="txt-centered">242</td>
					<td class="txt-centered">295</td>
				</tr>
			</table>
			<p/>
		</div>
	</body>
</rml>

@mikke89 mikke89 added the bug Something isn't working label Apr 12, 2025
@mikke89
Copy link
Owner

mikke89 commented Apr 12, 2025

Hey, first of all, thanks a lot for taking the time to make a pull request. And thank you for the kind words :)

It was easy for me to reproduce the issue you are describing. I think you are definitely onto something here with the suggested fix. I want to tinker a bit with it first, the +2 in particular seems a bit suspicious to me. Edit: I realize this follows the existing logic further down, I think this can be simplified quite a bit though.

@mikke89
Copy link
Owner

mikke89 commented Apr 12, 2025

I looked a bit more into this one. I figured we could make some improvements to simplify that loop. So I pushed some changes directly to this PR here. The loop had some issues with character boundaries (originally from before your changes), and it should be a bit faster now too with multi-byte characters.

What do you think, does it look correct to you? It would be great if you could test it with your documents.

@and3md
Copy link
Contributor Author

and3md commented Apr 14, 2025

What do you think, does it look correct to you? It would be great if you could test it with your documents.

I'd love to check it out, but I need some time as I won't be able to do it until Wednesday.

@and3md
Copy link
Contributor Author

and3md commented Apr 16, 2025

I've tested your changes on multiple (about 200) documents and they work great. Thank you!

@mikke89 mikke89 merged commit ba9775e into mikke89:master Apr 16, 2025
32 checks passed
@mikke89
Copy link
Owner

mikke89 commented Apr 16, 2025

Great, thanks for following up and testing this. I merged it just now, thanks again for the pull request!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants