Add Thai word list from Thai Wikipedia titles #869
for clarification
Add a function that returns a frozenset of Wikipedia titles; formatted with Black.
Please review.
Conform to the code style rules
Looks good to me.
Code, docs, license, test. All looks good.
Once the order of the module imports is sorted out, I can approve this.
pythainlp/corpus/corpus_license.md
Outdated
## Wikipedia Titles
Corpus of Wikipedia titles (wikipedia_titles.txt) was processed by konbraphat51 (https://github.com/konbraphat51/Thai_Dictionary_Cleaner/tree/main)

The original data is thwiki-latest-all-titles.gz of https://dumps.wikimedia.org/thwiki/latest/ which Wikipedia.org has created.
Would you mind adding the date of the Wikipedia data here, please?
That is, the date on which you downloaded the data for your preparation.
Yeah that's definitely required. Added!
Okay, please review again.
It looks great!
Thank you.
OK, I've resolved the conflict!
For standard licenses, like Creative Commons, just link to the license URL. No need to put the license text inside this file.
- sort imports
- clean up test structure
Kudos, SonarCloud Quality Gate passed!
Merged. Thank you.
What does this change
Add an optional corpus of Wikipedia titles.
Fixes #858
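For readers who want a picture of what "a function to return a frozen set of Wikipedia titles" could look like, here is a minimal sketch. It is not the code merged in this pull request: the names `load_titles` and `load_titles_from_file`, and the assumption that the corpus file holds one title per line in UTF-8, are hypothetical and chosen only for illustration.

```python
from typing import FrozenSet, Iterable


def load_titles(lines: Iterable[str]) -> FrozenSet[str]:
    """Build an immutable set of titles from an iterable of text lines.

    Hypothetical helper: assumes one title per line. Surrounding whitespace
    is stripped, and blank lines and duplicates are dropped.
    """
    stripped = (line.strip() for line in lines)
    return frozenset(title for title in stripped if title)


def load_titles_from_file(path: str, encoding: str = "utf-8") -> FrozenSet[str]:
    """Read a title list from a text file (assumed layout: one title per line)."""
    with open(path, encoding=encoding) as handle:
        return load_titles(handle)
```

Returning a `frozenset` gives callers constant-time membership checks and prevents accidental modification of the shared word list, for example `"ประเทศไทย" in load_titles_from_file("wikipedia_titles.txt")`.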
Your checklist for this pull request
🚨Please review the guidelines for contributing to this repository.