
Commit b0a9363 (parent 6114c2a)

Commit message: add project

22 files changed: +931 additions, -2 deletions

LICENSE

+21

MIT License

Copyright (c) 2022 The Python Packaging Authority

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

MANIFEST.in

+16

# include pyproject.toml

# Include the README
include *.md

# Include the license file
include LICENSE

# Include setup.py
include setup.py

# Include the data files
# recursive-include data *
# recursive-include examples *
recursive-include src *
# recursive-include images *

README.md

+153 -2
## Named Entity Recognition Toolkit

A toolkit for rapidly extracting useful entities from text using various Python packages, including [Stanza](https://stanfordnlp.github.io/stanza/index.html).

### Features

We try to bring the complicated use of existing NLP toolkits down to earth by keeping the APIs as simple as possible while following best practices.
### Installation

```bash
pip install ner-kit
```
### Examples

Example 1: Word segmentation

```python
from nerkit.StanzaApi import StanzaWrapper

if __name__ == "__main__":
    sw = StanzaWrapper()
    sw.download(lang="en")  # download the English models
    text = 'This is a test sentence for stanza. This is another sentence.'
    result1 = sw.tokenize(text)
    sw.print_result(result1)
```
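For reference, here is a minimal plain-Stanza sketch of the same tokenization step. This assumes `StanzaWrapper` delegates to a standard `tokenize` pipeline; the wrapper's internals are not shown in this commit.

```python
import stanza

stanza.download("en")  # one-time model download
nlp = stanza.Pipeline(lang="en", processors="tokenize")
doc = nlp("This is a test sentence for stanza. This is another sentence.")
for i, sentence in enumerate(doc.sentences):
    # each sentence holds the tokens produced by the tokenizer
    print(f"sentence {i}:", [token.text for token in sentence.tokens])
```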
Example 2: Chinese word segmentation

```python
from nerkit.StanzaApi import StanzaWrapper

if __name__ == "__main__":
    sw = StanzaWrapper()
    sw.download(lang="zh")
    text = '我在北京吃苹果!'  # "I eat apples in Beijing!"
    result1 = sw.tokenize(text, lang='zh')
    sw.print_result(result1)
```
Example 3: Multi-Word Token (MWT) expansion

```python
from nerkit.StanzaApi import StanzaWrapper

if __name__ == "__main__":
    sw = StanzaWrapper()
    sw.download(lang="fr")
    text = 'Nous avons atteint la fin du sentier.'  # "We have reached the end of the trail."
    result1 = sw.mwt_expand(text, lang='fr')
    sw.print_result(result1)
```
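In plain Stanza, MWT expansion is handled by the `mwt` processor, which splits a surface token such as French `du` into the words `de` and `le`. A minimal sketch, under the same assumption that the wrapper builds a standard pipeline:

```python
import stanza

stanza.download("fr")
nlp = stanza.Pipeline(lang="fr", processors="tokenize,mwt")
doc = nlp("Nous avons atteint la fin du sentier.")
for token in doc.sentences[0].tokens:
    # a multi-word token expands into more than one underlying word
    print(token.text, "->", [word.text for word in token.words])
```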
Example 4: POS tagging

```python
from nerkit.StanzaApi import StanzaWrapper

if __name__ == "__main__":
    sw = StanzaWrapper()
    sw.download(lang='en')
    text = 'I like apple'
    result1 = sw.tag(text)
    sw.print_result(result1)

    sw.download_chinese_model()
    text = '我喜欢苹果'  # "I like apples"
    result2 = sw.tag_chinese(text, lang='zh')
    sw.print_result(result2)
```
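A comparable plain-Stanza sketch for English POS tagging (again an assumption about what `tag` wraps, which this commit does not show):

```python
import stanza

stanza.download("en")
nlp = stanza.Pipeline(lang="en", processors="tokenize,pos")
doc = nlp("I like apple")
for word in doc.sentences[0].words:
    # upos is the universal POS tag, xpos the treebank-specific tag
    print(word.text, word.upos, word.xpos)
```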
Example 5: Named Entity Recognition

```python
from nerkit.StanzaApi import StanzaWrapper

if __name__ == "__main__":
    sw = StanzaWrapper()

    sw.download(lang='en')
    sw.download_chinese_model()

    text_en = 'I like Beijing!'
    result1 = sw.ner(text_en)
    sw.print_result(result1)

    text = '我喜欢北京!'  # "I like Beijing!"
    result2 = sw.ner_chinese(text)
    sw.print_result(result2)
```
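With plain Stanza, recognized entities are exposed on the document as spans with a surface text and a type; a minimal sketch assuming `ner` wraps the standard `ner` processor:

```python
import stanza

stanza.download("en")
nlp = stanza.Pipeline(lang="en", processors="tokenize,ner")
doc = nlp("I like Beijing!")
for ent in doc.ents:
    # each entity span carries its text and a type label (e.g. PERSON, GPE)
    print(ent.text, ent.type)
```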
Example 6: Sentiment Analysis

```python
from nerkit.StanzaApi import StanzaWrapper

if __name__ == "__main__":
    sw = StanzaWrapper()
    text_en = 'I like Beijing!'
    result1 = sw.sentiment(text_en)
    sw.print_result(result1)

    text_zh = '我讨厌苹果!'  # "I hate apples!"
    result2 = sw.sentiment_chinese(text_zh)
    sw.print_result(result2)
```
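Stanza's sentiment processor scores each sentence as 0 (negative), 1 (neutral), or 2 (positive). A minimal sketch, assuming `sentiment` wraps that processor:

```python
import stanza

stanza.download("en")
nlp = stanza.Pipeline(lang="en", processors="tokenize,sentiment")
doc = nlp("I like Beijing!")
for sentence in doc.sentences:
    # sentiment: 0 = negative, 1 = neutral, 2 = positive
    print(sentence.text, sentence.sentiment)
```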
Example 7: Language detection from text

```python
from nerkit.StanzaApi import StanzaWrapper

if __name__ == "__main__":
    sw = StanzaWrapper()
    list_text = ['I like Beijing!', '我喜欢北京!', "Bonjour le monde!"]
    result1 = sw.lang(list_text)
    sw.print_result(result1)
```
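Plain Stanza offers a multilingual `langid` processor for this task; a sketch, assuming a recent Stanza release that fetches the langid model on first use:

```python
import stanza

nlp = stanza.Pipeline(lang="multilingual", processors="langid")
texts = ['I like Beijing!', '我喜欢北京!', 'Bonjour le monde!']
docs = [stanza.Document([], text=t) for t in texts]
nlp(docs)  # annotates each Document in place
for doc in docs:
    # doc.lang holds the detected language code
    print(doc.text, doc.lang)
```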
Example 8: Language detection from text with a user-defined processing function

```python
from nerkit.StanzaApi import StanzaWrapper

if __name__ == "__main__":
    sw = StanzaWrapper()
    list_text = ['I like Beijing!', '我喜欢北京!', "Bonjour le monde!"]

    def process(model):  # plug in your own handling of each annotated document
        doc = model["doc"]
        print(f"{doc.sentences[0].dependencies_string()}")

    result1 = sw.lang_multi(list_text, func_process=process, download_lang='en,zh,fr')
    print(result1)
    sw.print_result(result1)
```
Example 9: NER via Stanza's client for the Java-based Stanford CoreNLP (legacy usage)

```python
from nerkit.StanzaApi import *

# First, set the environment variable CORENLP_HOME to the CoreNLP folder
corenlp_root_path = r"stanford-corenlp-4.3.2"
text = "我喜欢游览广东孙中山故居景点!"  # "I like visiting the Sun Yat-sen Former Residence in Guangdong!"
list_token = get_entity_list(text, corenlp_root_path=corenlp_root_path, language="chinese")
for token in list_token:
    print(f"{token['value']}\t{token['pos']}\t{token['ner']}")
```
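If `CORENLP_HOME` is not already set in the shell, it can be set from Python before the call; Stanza's CoreNLP client reads this variable to locate the CoreNLP installation. A minimal sketch:

```python
import os

# must point at an unpacked Stanford CoreNLP distribution (here: 4.3.2)
os.environ["CORENLP_HOME"] = r"stanford-corenlp-4.3.2"
```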
Example 10: Stanford CoreNLP (unofficial wrapper)

```python
import os
from nerkit.StanfordCoreNLP import get_entity_list

text = "我喜欢游览广东孙中山故居景点!"  # same sample sentence as Example 9
current_path = os.path.dirname(os.path.realpath(__file__))
res = get_entity_list(text, resource_path=f"{current_path}/stanfordcorenlp/stanford-corenlp-latest/stanford-corenlp-4.3.2")
print(res)
for w, tag in res:
    if tag in ['PERSON', 'ORGANIZATION', 'LOCATION']:
        print(w, tag)
```
### Credits & References

- [Stanza](https://stanfordnlp.github.io/stanza/index.html)
- [Stanford CoreNLP](https://stanfordnlp.github.io/CoreNLP/)

### License

The `ner-kit` project is provided by [Donghua Chen](https://github.com/dhchenx).

dist/ner-kit-0.0.1.tar.gz (14.9 KB, binary file not shown)

dist/ner-kit-0.0.1a3.tar.gz (14.8 KB, binary file not shown)

dist/ner_kit-0.0.1-py3-none-any.whl (12.3 KB, binary file not shown)

dist/ner_kit-0.0.1a3-py3-none-any.whl (12.3 KB, binary file not shown)

setup.cfg

+4

[metadata]
# This includes the license file(s) in the wheel.
# https://wheel.readthedocs.io/en/stable/user_guide.html#including-license-files-in-the-generated-wheel-file
license_files = LICENSE
