Fix PEP 0 name parsing #1386
Closed
9 commits
545bea8 Fix name parsing in PEP 0 (AA-Turner)
90bbb4c Fixes as per comments (AA-Turner)
a1013ce Move to author metadata lookup for PEP index (AA-Turner)
552a7b6 Move CSV to comma separated (AA-Turner)
7a0b5b5 Fix Mark Williams (AA-Turner)
3c6520d Rollback name parsing changes and move to using author exception file… (AA-Turner)
ee33701 Move more special cases to exceptions file (AA-Turner)
efdaf15 python-dev nickname (AA-Turner)
8f9db05 Add duplicate names and de-duping logic (AA-Turner)
@@ -0,0 +1,11 @@
Full Name, Surname First, Name Reference
Ernest W. Durbin III, "Durbin, Ernest W., III", Durbin
Inada Naoki, "Inada, Naoki", Inada
Guido van Rossum, "van Rossum, Guido (GvR)", GvR
Just van Rossum, "van Rossum, Just (JvR)", JvR
The Python core team and community, The Python core team and community, python-dev
P.J. Eby, "Eby, Phillip J.", Eby
Greg Ewing, "Ewing, Gregory", Ewing
Jim Jewett, "Jewett, Jim J.", Jewett
Nathaniel Smith, "Smith, Nathaniel J.", Smith
Martin v. Löwis, "von Löwis, Martin", von Löwis
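A note on reading a file in this shape: the fields are separated by a comma plus a space, and the middle column is quoted because it contains commas itself. A minimal sketch with Python's standard `csv` module follows; it assumes `skipinitialspace=True` is wanted so that the quote after the space is treated as opening a quoted field. The function name `load_author_exceptions` and the inline sample are illustrative only, and how the PR itself reads the file is not shown on this page.

```python
import csv
import io

# Header plus two rows copied from the data file in this diff.
SAMPLE = '''Full Name, Surname First, Name Reference
Ernest W. Durbin III, "Durbin, Ernest W., III", Durbin
Guido van Rossum, "van Rossum, Guido (GvR)", GvR
'''


def load_author_exceptions(text):
    """Map each full name to a (surname_first, name_reference) pair.

    Hypothetical helper for illustration; not taken from the PR.
    """
    # skipinitialspace=True drops the space after each comma, so the
    # following double quote is recognised as the start of a quoted field.
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return {
        row["Full Name"]: (row["Surname First"], row["Name Reference"])
        for row in reader
    }


exceptions = load_author_exceptions(SAMPLE)
```

Without `skipinitialspace=True`, the quote is not at the start of the field, so the embedded commas would split the middle column into several fields.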
This is tough: a name like `R. David Murray` should give `Murray` as a surname for the PEP index.
That said, a name written in Japanese English convention like `INADA Naoki` should return `INADA` as surname.
In short, it’s not generally possible to parse world names into US categories of «first», «middle», «last».
I guess we have to accept imperfect for now, add special cases when we notice problems (what PEP will add the first `Name MacName Sr` 🙂), and maybe someday rework the system the proper way: metadata (or some author index dict) should include full name and short name for PEP 0.
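To make the problem concrete, here is a deliberately naive heuristic that takes the last whitespace-separated token as the surname. It is a hypothetical stand-in written for this illustration, not the parser from the PR, and `naive_surname` is an invented name.

```python
def naive_surname(full_name):
    """Hypothetical heuristic: the last whitespace-separated token is the surname."""
    return full_name.split()[-1]


# Works for a Western-order name with a leading initial.
murray = naive_surname("R. David Murray")

# Fails for a family-name-first convention: the given name is returned.
inada = naive_surname("INADA Naoki")

# Fails for a generational suffix: the suffix is returned.
durbin = naive_surname("Ernest W. Durbin III")
```

Only the first call yields the surname wanted for the index; the other two are the kind of case that needs a special-case entry.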

Luckily (?!) in PEP 458 `R. David Murray` is listed without the full stop in his first initial, so the name is still parsed correctly. I don't think there's a good solution for `INADA Naoki` beyond adding a special-case exception.
Or UK, Australia, Canada etc 😜. But I get your point. I'm reminded of this post about names - hopefully #40 doesn't apply to us...
I think that your last suggestion of having some sort of lookup table is probably the best solution, as in all the PEPs there are still only a relatively small number of authors (248) - it's quite late here so will add that feature tomorrow. It also keeps special cases etc. out of the code to keep it from becoming knobbly.
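The lookup-table idea can be sketched as a dictionary that is consulted first, with a simple heuristic as the fallback. The two entries below reuse names and index surnames from the data file in this PR; the function name `surname_for_index` and the last-token fallback are assumptions made for the example, not the PR's code.

```python
# Special cases live in data, not in the parsing code.
# Entries taken from the "Full Name" and "Name Reference" columns of the data file.
EXCEPTIONS = {
    "Inada Naoki": "Inada",
    "Ernest W. Durbin III": "Durbin",
}


def surname_for_index(full_name, exceptions=EXCEPTIONS):
    """Return the index surname, preferring an explicit exception entry.

    Hypothetical helper: the fallback (last token) is an assumed heuristic.
    """
    if full_name in exceptions:
        return exceptions[full_name]
    return full_name.split()[-1]


special = surname_for_index("Ernest W. Durbin III")
ordinary = surname_for_index("R. David Murray")
```

New awkward names then need only a new row in the data file, which is the point made about keeping special cases out of the code.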

It would be OK if this PEP fixed the most egregious cases (`III`) without covering 100% of possibilities 🙂

Gives me something to do! Latest commit adds such a metadata lookup and therefore simplifies the name parsing code.
This should make it so that names can be correctly entered into AUTHORS.csv and PEP 0 will reflect this. I've also identified some duplicate entries (e.g. `P.J. Eby` & `Phillip J. Eby`, `Greg` and `Gregory` Ewing, `Jim J. Jewett` & `Jim Jewett`, `Martin v. Löwis` & `Martin von Löwis`). Is it acceptable to modify PEP headers to canonicalise these names?

I think I would add multiple entries to the data file rather than editing historical documents.

The latest data file (exceptions rather than full mapping) doesn’t de-duplicate these entries; should it?

The data file is checked first (in init), so unsure where duplicates would propagate from?
Always good to be preventative but not sure I understand this one, sorry!

Maybe my comment doesn’t make sense!
I don’t have a clear picture of the current behaviour of the code, so I wondered if the change from full data file to exceptions data file preserved the feature you added of normalizing the duplicate names.

Aha! I forgot to add that back in, you're right - have done so now (only adding the less used variant and mapping it to the 'canonical' variant, to keep the file smaller)
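The de-duplication described here (map only the less used spelling to the canonical one) could look roughly like the sketch below. The three mappings are copied from rows of the data file; the helper names, the "Last, Given" fallback, and the `pep-a`/`pep-b`/`pep-c` labels are made up for the example and do not come from the PR.

```python
from collections import defaultdict

# Less used spelling -> canonical "Surname, Given" form (rows from the data file).
VARIANT_TO_CANONICAL = {
    "P.J. Eby": "Eby, Phillip J.",
    "Greg Ewing": "Ewing, Gregory",
    "Martin v. Löwis": "von Löwis, Martin",
}


def default_surname_first(full_name):
    """Hypothetical fallback: 'Given ... Last' becomes 'Last, Given ...'."""
    *given, last = full_name.split()
    return f"{last}, {' '.join(given)}" if given else last


def canonical(full_name):
    """Use the exception entry if there is one, else the fallback."""
    if full_name in VARIANT_TO_CANONICAL:
        return VARIANT_TO_CANONICAL[full_name]
    return default_surname_first(full_name)


def group_by_author(pep_authors):
    """Group (pep_label, author_name) pairs under one canonical author key."""
    grouped = defaultdict(list)
    for label, name in pep_authors:
        grouped[canonical(name)].append(label)
    return dict(grouped)


# Placeholder labels, not real PEP numbers.
index = group_by_author([
    ("pep-a", "P.J. Eby"),
    ("pep-b", "Phillip J. Eby"),
    ("pep-c", "Greg Ewing"),
])
```

Because the canonical spelling already produces the right key through the fallback, only the variant needs a row, which keeps the file small as the comment says.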