parsing ranges problem #123
I think that
and
So I think this is a bug.
Try
@ltrzesniewski
Or maybe https://github.com/PCRE2Project/pcre2/blob/master/src/pcre2_compile.c#L1973-L1985
See the doc:
This means that
It's also not valid in Perl:
I suppose this is for backwards-compatibility reasons. Unicode didn't exist back when Perl was created, and all codepoints were a single byte.
@ltrzesniewski
Back in the 1960s, when IBM invented bytes, it seemed like a great idea to have "one unit of storage" = "one character", and who needs more than 255 characters? Before IBM's 360 range, computers had all sorts of different lengths of storage unit. The IBM 7090 range had 36-bit words, the Ferranti Atlas used 48 bits (with addressable "half-words" of 24 bits), and the PDP-8, which post-dates the 360 series, had 12-bit words. Memory was small and expensive, so character strings had to be bit-stuffed into words. This was no longer needed when bytes came along, but if they had only chosen 16 rather than 8 bits, we might have managed for longer before needing UTF. :-)

Back in the early days, octal was used to represent numbers when thinking about bits. It has just occurred to me that the move to mostly hexadecimal might have been caused by the hardware unit changing to a multiple of 4 rather than 3 bits. Anyway, it is certainly history and backwards compatibility that led to the current situation.
Does anybody understand these two ifs?
https://github.com/PCRE2Project/pcre2/blob/master/src/pcre2_compile.c#L3559-L3565
I have two regexps:
(?:[\\xDFFB-\\xDFFE]) can compile
(?:[\\x270A-\\x270D]) can't compile
And it can't compile the 2nd because
parsed_pattern[-2] > 0x27
0x41 > 0x27
where 0x41 comes from the last character of the range start: 'A' => 0x41
So, any idea what these two ifs are meant to handle?
I can't find anything useful in the git history.