[BugFix] Deal with greek letter "sigma" when return offset_mapping #2897
@@ -1363,7 +1363,13 @@ def get_offset_mapping(self, text):
if token in self.all_special_tokens:
token = token.lower() if hasattr(
self, "do_lower_case") and self.do_lower_case else token
start = text[offset:].index(token) + offset
# Deal with special greek letter with 2 forms (sigma)
You could add a note explaining this special handling, for example:
“The Greek letter "sigma" has two lowercase forms. At the end of a word that is not written in all caps, the final form (ς) is used; everywhere else, the form (σ) is used.”
Done
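To illustrate why the plain `index` lookup can fail here, the following is a minimal sketch and not the code merged in this PR. The helpers `fold_sigma` and `find_token` are hypothetical names introduced only for this example. It relies on Python's `str.lower()` applying the final-sigma rule, and it assumes that lowercasing does not change the length of the text, which holds for Greek sigma but not for every Unicode character.

```python
FINAL_SIGMA = "\u03c2"  # ς, lowercase sigma used at the end of a word
SIGMA = "\u03c3"        # σ, lowercase sigma used everywhere else


def fold_sigma(s: str) -> str:
    """Lowercase s and map the final form of sigma to the regular form.

    The replacement is one character for one character, so positions in
    the folded string line up with positions in the lowercased string.
    """
    return s.lower().replace(FINAL_SIGMA, SIGMA)


def find_token(text: str, token: str, offset: int = 0) -> int:
    """Return the index of token in text at or after offset.

    Both sides are folded first, so a token lowercased in isolation
    (which yields σ) still matches a word-final ς in the text.
    Raises ValueError if the token is not found, like str.index.
    """
    return fold_sigma(text).index(fold_sigma(token), offset)


text = "ΟΔΟΣ ΣΟΦΟΣ"
# Lowercasing the whole text turns each word-final Σ into ς,
# while lowercasing the single letter Σ on its own gives σ.
lowered_text = text.lower()
lowered_token = "Σ".lower()

# A plain search for σ skips the word-final ς at index 3;
# the folded search finds it.
plain_hit = lowered_text.find(lowered_token)
folded_hit = find_token(text, "Σ")
```

In this sketch `plain_hit` lands on the first σ of the second word, whereas `folded_hit` points at the end of the first word. An offset mapping built from the plain search would therefore drift or, when no σ follows, raise an error.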
PR types
Bug fixes
PR changes
Models
Description
Fix #2854