add ppocr module #1864 (Merged)
# ch_pp-ocrv3

|Model Name|ch_pp-ocrv3|
| :--- | :---: |
|Category|Image - Text Recognition|
|Network|Differentiable Binarization+SVTR_LCNet|
|Dataset|icdar2015|
|Fine-tuning supported|No|
|Model Size|13M|
|Latest update date|2022-05-11|
|Data indicators|-|
## I. Basic Information

- ### Application Effect Display
  - [Online experience of OCR text recognition](https://www.paddlepaddle.org.cn/hub/scene/ocr)
  - Sample result:
    <p align="center">
    <img src="https://user-images.githubusercontent.com/22424850/167818854-96811631-d40c-4d07-9aae-b78d4514c917.jpg" width="600" hspace="10"/> <br />
    </p>

- ### Module Introduction

  - PP-OCR is a practical, ultra-lightweight OCR system developed by PaddleOCR. Building on cutting-edge algorithms, it balances accuracy against speed through model slimming and deep optimization, so that it can meet industrial deployment requirements as far as possible. The system consists of two stages, text detection and text recognition: DB is used as the detection algorithm and CRNN as the recognition algorithm, with a text direction classifier added between the detection and recognition modules to handle text in different orientations. The current module is PP-OCRv3, which upgrades the PP-OCRv2 detection and recognition models in a total of 9 aspects, further improving model performance.
    <p align="center">
    <img src="https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/release/2.5/doc/ppocrv3_framework.png" width="800" hspace="10"/> <br />
    </p>

  - For more details, please refer to [PP-OCRv3](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.5/doc/doc_ch/PP-OCRv3_introduction.md).
## II. Installation

- ### 1. Environmental Dependence

  - paddlepaddle >= 2.2

  - paddlehub >= 2.2 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)

- ### 2. Installation

  - ```shell
    $ hub install ch_pp-ocrv3
    ```
  - In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md) | [Linux_Quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API Prediction

- ### 1. Command line Prediction

  - ```shell
    $ hub run ch_pp-ocrv3 --input_path "/PATH/TO/IMAGE"
    ```
  - Invoke the text recognition module from the command line; for more information, please refer to [PaddleHub Command Line Instruction](../../../../docs/docs_ch/tutorial/cmd_usage.rst)

- ### 2. Prediction Code Example

  - ```python
    import paddlehub as hub
    import cv2

    ocr = hub.Module(name="ch_pp-ocrv3", enable_mkldnn=True)  # MKL-DNN acceleration is only effective on CPU
    result = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')])

    # or
    # result = ocr.recognize_text(paths=['/PATH/TO/IMAGE'])
    ```
- ### 3. API

  - ```python
    __init__(text_detector_module=None, enable_mkldnn=False)
    ```

  - Construct the text recognition module.

  - **Parameters**

    - text_detector_module(str): name of the PaddleHub Module used for text detection; if set to None, the [ch_pp-ocrv3_det Module](../ch_pp-ocrv3_det/) is used by default. Its job is to detect the text regions in an image.
    - enable_mkldnn(bool): whether to enable MKL-DNN to accelerate CPU computation. This parameter is only effective when running on CPU. Default is False.

  - ```python
    def recognize_text(images=[],
                       paths=[],
                       use_gpu=False,
                       output_dir='ocr_result',
                       visualization=False,
                       box_thresh=0.5,
                       text_thresh=0.5,
                       angle_classification_thresh=0.9,
                       det_db_unclip_ratio=1.5)
    ```

  - Prediction API, locating and recognizing all Chinese text in the input images.

  - **Parameters**

    - paths (list\[str\]): image paths;
    - images (list\[numpy.ndarray\]): image data, with ndarray.shape \[H, W, C\] in BGR format;
    - use\_gpu (bool): whether to use GPU; **if you use GPU, please set the CUDA_VISIBLE_DEVICES environment variable first**
    - box\_thresh (float): confidence threshold for detected text boxes;
    - text\_thresh (float): confidence threshold for recognized Chinese text;
    - angle\_classification\_thresh (float): confidence threshold for text angle classification;
    - visualization (bool): whether to save the recognition results as image files;
    - output\_dir (str): directory to save images, 'ocr\_result' by default;
    - det\_db\_unclip\_ratio (float): expansion ratio for the detected text boxes;

  - **Return**

    - res (list\[dict\]): list of recognition results, where each element is a dict with the fields:
      - data (list\[dict\]): recognized text results, where each element is a dict with the fields:
        - text (str): the recognized text
        - confidence (float): confidence of the recognized text
        - text\_box\_position (list): pixel coordinates of the text box in the original image, a 4\*2 matrix giving the lower-left, lower-right, upper-right and upper-left vertices of the box in turn;
          data is \[\] if nothing is recognized
      - save\_path (str, optional): save path of the result image, '' if no image is saved
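As a quick illustration of the return structure above, the sketch below walks a `res`-shaped list. The values are mocked up purely for illustration; they are not real module output.

```python
# A mocked-up result in the documented shape (values are illustrative only).
res = [{
    'save_path': '',
    'data': [
        {'text': '纯臻营养护发素', 'confidence': 0.964,
         'text_box_position': [[24, 72], [304, 72], [304, 34], [24, 36]]},
    ],
}]

# Collect every recognized string together with its confidence.
for image_result in res:
    for item in image_result['data']:
        print(item['text'], item['confidence'])
```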
## IV. Server Deployment

- PaddleHub Serving can deploy an online service of text recognition.

- ### Step 1: Start PaddleHub Serving

  - Run the startup command:
  - ```shell
    $ hub serving start -m ch_pp-ocrv3
    ```

  - The text recognition API is now deployed as a service; the default port is 8866.

  - **NOTE:** If you use GPU prediction, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise it does not need to be set.

- ### Step 2: Send a prediction request

  - With the server configured, the few lines of code below send a prediction request and fetch the result:

  - ```python
    import requests
    import json
    import cv2
    import base64


    def cv2_to_base64(image):
        data = cv2.imencode('.jpg', image)[1]
        return base64.b64encode(data.tobytes()).decode('utf8')


    # Send an HTTP request
    data = {'images': [cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
    headers = {"Content-type": "application/json"}
    url = "http://127.0.0.1:8866/predict/ch_pp-ocrv3"
    r = requests.post(url=url, headers=headers, data=json.dumps(data))

    # Print the prediction results
    print(r.json()["results"])
    ```
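The JSON body returned by the service mirrors the return value of `recognize_text`. A minimal, self-contained sketch of unpacking it (the response text below is fabricated for illustration, not a real server reply):

```python
import json

# A fabricated response body in the documented shape (illustrative values only).
response_text = json.dumps({
    "results": [{
        "save_path": "",
        "data": [{"text": "hello", "confidence": 0.99,
                  "text_box_position": [[0, 10], [50, 10], [50, 0], [0, 0]]}],
    }]
})

# In real use this would be r.json()["results"].
results = json.loads(response_text)["results"]
texts = [item["text"] for r in results for item in r["data"]]
print(texts)  # -> ['hello']
```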
## V. Release Note

* 1.0.0

  First release

- ```shell
  $ hub install ch_pp-ocrv3==1.0.0
  ```
modules/image/text_recognition/ch_pp-ocrv3/character.py (223 additions)
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import string

import numpy as np


class CharacterOps(object):
    """Convert between text-label and text-index.

    Args:
        config: config from yaml file
    """

    def __init__(self, config):
        self.character_type = config['character_type']
        self.max_text_len = config['max_text_length']
        if self.character_type == "en":
            self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz"
            dict_character = list(self.character_str)
        # use the custom dictionary
        elif self.character_type == "ch":
            character_dict_path = config['character_dict_path']
            add_space = False
            if 'use_space_char' in config:
                add_space = config['use_space_char']
            self.character_str = []
            with open(character_dict_path, "rb") as fin:
                lines = fin.readlines()
                for line in lines:
                    line = line.decode('utf-8').strip("\n").strip("\r\n")
                    self.character_str.append(line)
            if add_space:
                self.character_str.append(" ")
            dict_character = list(self.character_str)
        elif self.character_type == "en_sensitive":
            # same as the ASTER setting (uses 94 characters)
            self.character_str = string.printable[:-6]
            dict_character = list(self.character_str)
        else:
            # Fail early; otherwise dict_character would be undefined below.
            raise ValueError("unsupported character_type: %s" % self.character_type)
        self.beg_str = "sos"
        self.end_str = "eos"

        dict_character = self.add_special_char(dict_character)
        self.dict = {}
        for i, char in enumerate(dict_character):
            self.dict[char] = i
        self.character = dict_character

    def add_special_char(self, dict_character):
        dict_character = ['blank'] + dict_character
        return dict_character
    def encode(self, text):
        """Convert text-label into text-index.

        input:
            text: text labels of each image. [batch_size]

        output:
            text: concatenated text index for CTCLoss.
                [sum(text_lengths)] = [text_index_0 + text_index_1 + ... + text_index_(n - 1)]
        """
        if self.character_type == "en":
            text = text.lower()

        text_list = []
        for char in text:
            # Characters outside the dictionary are silently dropped.
            if char not in self.dict:
                continue
            text_list.append(self.dict[char])
        text = np.array(text_list)
        return text
    def decode(self, text_index, text_prob=None, is_remove_duplicate=False):
        """Convert text-index into text-label."""
        result_list = []
        ignored_tokens = self.get_ignored_tokens()
        batch_size = len(text_index)
        for batch_idx in range(batch_size):
            selection = np.ones(len(text_index[batch_idx]), dtype=bool)
            if is_remove_duplicate:
                selection[1:] = text_index[batch_idx][1:] != text_index[batch_idx][:-1]
            for ignored_token in ignored_tokens:
                selection &= text_index[batch_idx] != ignored_token
            char_list = [self.character[text_id] for text_id in text_index[batch_idx][selection]]
            if text_prob is not None:
                conf_list = text_prob[batch_idx][selection]
            else:
                conf_list = [1] * len(selection)
            if len(conf_list) == 0:
                conf_list = [0]

            text = ''.join(char_list)
            result_list.append((text, np.mean(conf_list).tolist()))
        return result_list
    def get_char_num(self):
        return len(self.character)

    def get_beg_end_flag_idx(self, beg_or_end):
        # Note: self.loss_type must be set by the caller; it is not set in __init__.
        if self.loss_type == "attention":
            if beg_or_end == "beg":
                idx = np.array(self.dict[self.beg_str])
            elif beg_or_end == "end":
                idx = np.array(self.dict[self.end_str])
            else:
                assert False, "Unsupported type %s in get_beg_end_flag_idx" \
                    % beg_or_end
            return idx
        else:
            err = "error in get_beg_end_flag_idx when using the loss %s" \
                % (self.loss_type)
            assert False, err

    def get_ignored_tokens(self):
        return [0]  # for ctc blank
def cal_predicts_accuracy(char_ops, preds, preds_lod, labels, labels_lod, is_remove_duplicate=False):
    """Calculate prediction accuracy.

    Args:
        char_ops: CharacterOps
        preds: prediction result, text index
        preds_lod: lod tensor of preds
        labels: label of the input image, text index
        labels_lod: lod tensor of labels
        is_remove_duplicate: whether to remove duplicate characters,
            False by default
    Return:
        acc: the accuracy on the test set
        acc_num: the number of correctly predicted samples
        img_num: the total number of samples in the test set
    """
    acc_num = 0
    img_num = 0
    for ino in range(len(labels_lod) - 1):
        beg_no = preds_lod[ino]
        end_no = preds_lod[ino + 1]
        preds_text = preds[beg_no:end_no].reshape(-1)
        # Pass is_remove_duplicate by keyword so it is not mistaken for text_prob.
        preds_text = char_ops.decode(preds_text, is_remove_duplicate=is_remove_duplicate)

        beg_no = labels_lod[ino]
        end_no = labels_lod[ino + 1]
        labels_text = labels[beg_no:end_no].reshape(-1)
        labels_text = char_ops.decode(labels_text, is_remove_duplicate=is_remove_duplicate)
        img_num += 1

        if preds_text == labels_text:
            acc_num += 1
    acc = acc_num * 1.0 / img_num
    return acc, acc_num, img_num
def cal_predicts_accuracy_srn(char_ops, preds, labels, max_text_len, is_debug=False):
    acc_num = 0
    img_num = 0

    char_num = char_ops.get_char_num()

    total_len = preds.shape[0]
    img_num = int(total_len / max_text_len)
    for i in range(img_num):
        cur_label = []
        cur_pred = []
        for j in range(max_text_len):
            if labels[j + i * max_text_len] != int(char_num - 1):  # 0
                cur_label.append(labels[j + i * max_text_len][0])
            else:
                break

        for j in range(max_text_len + 1):
            if j < len(cur_label) and preds[j + i * max_text_len][0] != cur_label[j]:
                break
            elif j == len(cur_label) and j == max_text_len:
                acc_num += 1
                break
            elif j == len(cur_label) and preds[j + i * max_text_len][0] == int(char_num - 1):
                acc_num += 1
                break
    acc = acc_num * 1.0 / img_num
    return acc, acc_num, img_num
def convert_rec_attention_infer_res(preds):
    img_num = preds.shape[0]
    target_lod = [0]
    convert_ids = []
    for ino in range(img_num):
        end_pos = np.where(preds[ino, :] == 1)[0]
        if len(end_pos) <= 1:
            text_list = preds[ino, 1:]
        else:
            text_list = preds[ino, 1:end_pos[1]]
        target_lod.append(target_lod[ino] + len(text_list))
        convert_ids = convert_ids + list(text_list)
    convert_ids = np.array(convert_ids)
    convert_ids = convert_ids.reshape((-1, 1))
    return convert_ids, target_lod


def convert_rec_label_to_lod(ori_labels):
    img_num = len(ori_labels)
    target_lod = [0]
    convert_ids = []
    for ino in range(img_num):
        target_lod.append(target_lod[ino] + len(ori_labels[ino]))
        convert_ids = convert_ids + list(ori_labels[ino])
    convert_ids = np.array(convert_ids)
    convert_ids = convert_ids.reshape((-1, 1))
    return convert_ids, target_lod
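To make the CTC decoding convention in `character.py` concrete, here is a small self-contained sketch. It mirrors the `character_type="en"` dictionary (index 0 is the CTC blank) and the duplicate-then-blank removal performed by `CharacterOps.decode`; it is a simplified stand-in for illustration, not an import of the class itself.

```python
import numpy as np

# Mirror of the 'en' dictionary: 'blank' at index 0, then 0-9 and a-z.
chars = ['blank'] + list("0123456789abcdefghijklmnopqrstuvwxyz")
char_to_id = {c: i for i, c in enumerate(chars)}

def encode(text):
    # Lowercase and drop out-of-dictionary characters, as CharacterOps.encode does.
    return np.array([char_to_id[c] for c in text.lower() if c in char_to_id])

def ctc_decode(ids):
    keep = np.ones(len(ids), dtype=bool)
    keep[1:] = ids[1:] != ids[:-1]  # drop consecutive repeats
    keep &= ids != 0                # drop the CTC blank (index 0)
    return ''.join(chars[i] for i in ids[keep])

ids = encode("ab1")                      # -> indices [11, 12, 2]
raw = np.array([11, 11, 0, 12, 2])       # raw CTC output with a repeat and a blank
print(ctc_decode(raw))                   # -> "ab1"
```

This is why `get_ignored_tokens` returns `[0]`: after duplicate removal, every remaining blank index is filtered out before the indices are mapped back to characters.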
Review comment: The version information should be 2.2.