Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce the incidence of infinite loops while case folding #160

Merged
merged 2 commits into from
Aug 29, 2023

Conversation

mszabo-wikia
Copy link
Contributor

dc7d6e5 unfortunately increases the
incidence of infinite loops during case folding if re2j is running on a
JVM newer than the version used to generate the bundled
UnicodeTables.java and the input contains a rune that would require
special case folding rules to form a closed fold loop. \u1C80 (Cyrillic
Small Letter Rounded Ve) is an example of such a rune.

Workaround the issue by inverting the order of parameters passed to
equalsIgnoreCase() so that the rune from the pattern being matched,
rather than the input content, undergoes case folding instead. This does
not fully eliminate the possibility of an infinite loop in this
scenario, since the pattern may well contain one of the problematic
runes, but it effectively restores the situation as it was pre
dc7d6e5, since the previous logic also
performed case folding on the rune from the pattern and not on the
content.

dc7d6e5 unfortunately increases the
incidence of infinite loops during case folding if re2j is running on a
JVM newer than the version used to generate the bundled
UnicodeTables.java and the input contains a rune that would require
special case folding rules to form a closed fold loop. \u1C80 (Cyrillic
Small Letter Rounded Ve) is an example of such a rune.

Workaround the issue by inverting the order of parameters passed to
equalsIgnoreCase() so that the rune from the pattern being matched,
rather than the input content, undergoes case folding instead. This does
not fully eliminate the possibility of an infinite loop in this
scenario, since the pattern may well contain one of the problematic
runes, but it effectively restores the situation as it was pre
dc7d6e5, since the previous logic also
performed case folding on the rune from the pattern and not on the
content.

Signed-off-by: Máté Szabó <mszabo@fandom.com>
@codecov-commenter
Copy link

codecov-commenter commented Jul 19, 2022

Codecov Report

Merging #160 (cd47147) into master (9b3f052) will not change coverage.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #160   +/-   ##
=======================================
  Coverage   89.07%   89.07%           
=======================================
  Files          19       19           
  Lines        3038     3038           
  Branches      619      619           
=======================================
  Hits         2706     2706           
  Misses        189      189           
  Partials      143      143           
Files Changed Coverage Δ
java/com/google/re2j/Unicode.java 68.75% <ø> (ø)
java/com/google/re2j/Inst.java 80.85% <100.00%> (ø)

@mszabo-wikia
Copy link
Contributor Author

mszabo-wikia commented Jul 19, 2022

I looked at #104 and it seems like the proper solution will be to generate this data on startup as noted in the comments there.

@sjamesr sjamesr merged commit 10ba78d into google:master Aug 29, 2023
@sjamesr
Copy link
Contributor

sjamesr commented Aug 29, 2023

Thank you for the fix, I'm working on reducing/removing the need for separate Unicode tables in RE2J. In the meantime, I'll cut a release with this fix.

@mszabo-wikia
Copy link
Contributor Author

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants