Commit 785bed5

Fix tokenization of Korean chars fix: #1877
1 parent beab770 commit 785bed5

2 files changed, +2 -2 lines changed

hanlp/components/tokenizers/transformer.py (+1 -1)

@@ -147,7 +147,7 @@ def tag_to_span(self, batch_tags, batch: dict):
                             tags[i - 1] = 'B'
                         elif prev_tag == 'E':
                             tags[i - 1] = 'M'
-                        tags[i] = 'M'
+                        tags[i] = tag = 'M'
                     offset = e
                     prev_tag = tag
         for tags in batch_tags:
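
The effect of the one-line change can be reproduced in isolation. The sketch below is not HanLP code: the function name merge_shared_span_tags and the fixed flag are hypothetical, and the loop header and the "if b < offset" guard are assumed from the variables visible in the hunk. It assumes one character that the subword tokenizer split into three subtokens sharing the span (0, 1), followed by an ordinary single-character token.

```python
def merge_shared_span_tags(tags, subtoken_offsets, fixed=True):
    """Relabel subtokens whose span starts before the previous subtoken ended.

    Hypothetical stand-in for the loop touched by this commit. `fixed`
    switches between the old behaviour (False) and the new one (True).
    """
    tags = list(tags)  # work on a copy so the caller's list is untouched
    offset = -1
    prev_tag = None
    for i, (tag, (b, e)) in enumerate(zip(tags, subtoken_offsets)):
        if b < offset:  # this subtoken overlaps the previous one
            if prev_tag == 'S':
                tags[i - 1] = 'B'
            elif prev_tag == 'E':
                tags[i - 1] = 'M'
            tags[i] = 'M'
            if fixed:
                # New behaviour: also update the local `tag`, so that
                # `prev_tag` below holds the rewritten label.
                tag = 'M'
        offset = e
        prev_tag = tag
    return tags


# Assumed input: three subtokens share span (0, 1), then a normal token.
offsets = [(0, 1), (0, 1), (0, 1), (1, 2)]
predicted = ['S', 'S', 'S', 'S']

print(merge_shared_span_tags(predicted, offsets, fixed=False))
print(merge_shared_span_tags(predicted, offsets, fixed=True))
```

Under these assumptions the old code leaves prev_tag holding the stale predicted tag, so the third subtoken overwrites the second subtoken's 'M' with 'B' and a second word start appears inside the same character. With the change, prev_tag is 'M' and the earlier label is left alone.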

hanlp/version.py (+1 -1)

@@ -2,7 +2,7 @@
 # Author: hankcs
 # Date: 2019-12-28 19:26
 
-__version__ = '2.1.0-beta.55'
+__version__ = '2.1.0-beta.56'
 """HanLP version"""
 
 