The following is from a customer report; I can reproduce it as described, and the proximate issue seems to be that the non-matching characters are not in fact UTF-16 singletons, but rather combined surrogate pairs. (For context: BBEdit's backing store is UTF-16, so it's using pcre2-16.)
I'm not sure whether this constitutes a bug, or an enhancement request; if \p{…} is intended to match surrogate pairs for any given character class, then I guess it's a bug; otherwise it would be an enhancement request. :-)
I've attached the customer's supplied file directly, but here is (substantially) the contents of same:
===
The following are regular characters. BBEdit regex \p{Han} and . can find them.
刨
炸
灠
The following are surrogate-pair characters. BBEdit regex \p{Han} currently (v. 14.1) cannot find them. Using regex . BBEdit finds each individual codepoint separately, but not the whole character (both codepoints) at once.
===
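To make the distinction in the report concrete, here is a minimal Python sketch of how the two kinds of character look in UTF-16. It is only an illustration of the encoding, not of BBEdit or pcre2-16 internals. The helper names `utf16_code_units` and `is_surrogate` are made up for this example, and U+20000 is an assumed stand-in for a supplementary-plane Han character, since the customer's actual samples are not reproduced here.

```python
def utf16_code_units(text: str) -> list[int]:
    """Return the UTF-16 code units (16-bit values) that encode `text`."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def is_surrogate(unit: int) -> bool:
    """True if a 16-bit code unit lies in the surrogate range D800..DFFF."""
    return 0xD800 <= unit <= 0xDFFF


# "刨" is one of the regular (BMP) characters quoted in the report.
bmp_han = "刨"
# Assumption: U+20000 is used here as a sample supplementary-plane Han
# character; it is not taken from the customer's file.
supplementary_han = "\U00020000"

for ch in (bmp_han, supplementary_han):
    units = utf16_code_units(ch)
    kind = "surrogate pair" if all(is_surrogate(u) for u in units) else "single unit"
    print(f"U+{ord(ch):04X}", [f"0x{u:04X}" for u in units], kind)
```

The BMP character occupies one 16-bit unit, while the supplementary character occupies two units that are each in the surrogate range. A matcher that treats every 16-bit unit as its own character would see two separate items for the second case, which is consistent with the behavior of `.` described above.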
For BareBones Software 20220312-TST — Sample files of regular and surrogate-pair characters.md