
Commit ef0943e

Add streamlit app
1 parent 9d3e906 commit ef0943e

12 files changed: +542, -75 lines

.gitignore: +1
@@ -8,6 +8,7 @@ wandb
 notebooks
 results
 data
+slices

 # Byte-compiled / optimized / DLL files
 __pycache__/

README.md: +28, -28
@@ -32,7 +32,7 @@ Surya is named for the [Hindu sun god](https://en.wikipedia.org/wiki/Surya), who
 | Presentation | [Image](static/images/pres.png) | [Image](static/images/pres_text.png) |
 | Scientific Paper | [Image](static/images/paper.png) | [Image](static/images/paper_text.png) |
 | Scanned Document | [Image](static/images/scanned.png) | [Image](static/images/scanned_text.png) |
-| Scanned Form | [Image](static/images/funsd.png) | |
+| Scanned Old Form | [Image](static/images/funsd.png) | [Image](static/images/funsd_text.jpg) |

 # Installation

@@ -51,6 +51,15 @@ Model weights will automatically download the first time you run surya. Note th
 - Inspect the settings in `surya/settings.py`. You can override any settings with environment variables.
 - Your torch device will be automatically detected, but you can override this. For example, `TORCH_DEVICE=cuda`. For text detection, the `mps` device has a bug (on the [Apple side](https://github.com/pytorch/pytorch/issues/84936)) that may prevent it from working properly.

+## Interactive App
+
+I've included a streamlit app that lets you interactively try Surya on images or PDF files. Run it with:
+
+```
+pip install streamlit
+surya_gui
+```
+
 ## OCR (text recognition)

 You can detect text in an image, pdf, or folder of images/pdfs with the following command. This will write out a json file with the detected text and bboxes, and optionally save images of the reconstructed page.
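
The settings bullets above say any value in `surya/settings.py` can be overridden with environment variables. A minimal sketch of doing that from Python, assuming the settings are read from the environment when surya is first imported (so the variables must be set before any surya import):

```python
# Minimal sketch: overriding surya settings via environment variables.
# Assumption: surya/settings.py picks these up at import time, so set them
# before importing anything from surya.
import os

os.environ["TORCH_DEVICE"] = "cuda"  # force the torch device instead of auto-detection

from surya.model.detection.segformer import load_model, load_processor

det_model, det_processor = load_model(), load_processor()
```

The same pattern applies to the batch-size and threshold settings mentioned later in this README.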
@@ -78,10 +87,7 @@ The `results.json` file will contain these keys for each page of the input docum

 **Performance tips**

-Setting the `RECOGNITION_BATCH_SIZE` env var properly will make a big difference when using a GPU. Each batch item will use `40MB` of VRAM, so very high batch sizes are possible. The default is a batch size `256`, which will use about 10GB of VRAM.
-
-Depending on your CPU core count, `RECOGNITION_BATCH_SIZE` might make a difference there too - the default CPU batch size is `32`.
-
+Setting the `RECOGNITION_BATCH_SIZE` env var properly will make a big difference when using a GPU. Each batch item will use `40MB` of VRAM, so very high batch sizes are possible. The default is a batch size `256`, which will use about 10GB of VRAM. Depending on your CPU core count, it may help, too - the default CPU batch size is `32`.

 ### From python

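
A quick back-of-the-envelope check of the batch-size figures quoted in this performance tip (and in the detector tip further down); the linear-scaling assumption is mine, based on the per-item numbers in the README:

```python
# Rough VRAM estimates from the per-item figures quoted in the README.
# Assumption: memory use scales roughly linearly with batch size.
RECOGNITION_MB_PER_ITEM = 40   # README: ~40MB per recognition batch item
DETECTION_MB_PER_ITEM = 280    # README: ~280MB per detection batch item

for name, per_item, batch in [
    ("RECOGNITION_BATCH_SIZE", RECOGNITION_MB_PER_ITEM, 256),  # default GPU batch size
    ("DETECTOR_BATCH_SIZE", DETECTION_MB_PER_ITEM, 32),        # default GPU batch size
]:
    print(f"{name}={batch} -> ~{per_item * batch / 1024:.1f} GB VRAM")
# Prints ~10.0 GB and ~8.8 GB, matching the "about 10GB" and "about 9GB" quoted in the README.
```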
@@ -94,20 +100,15 @@ from surya.model.recognition.processor import load_processor as load_rec_process

 image = Image.open(IMAGE_PATH)
 langs = ["en"] # Replace with your languages
-
-det_processor = load_det_processor()
-det_model = load_det_model()
-
-rec_model = load_rec_model()
-rec_processor = load_rec_processor()
+det_processor, det_model = load_det_processor(), load_det_model()
+rec_model, rec_processor = load_rec_model(), load_rec_processor()

 predictions = run_ocr([image], langs, det_model, det_processor, rec_model, rec_processor)
 ```

-
 ## Text line detection

-You can detect text lines in an image, pdf, or folder of images/pdfs with the following command. This will write out a json file with the detected bboxes, and optionally save images of the pages with the bboxes.
+You can detect text lines in an image, pdf, or folder of images/pdfs with the following command. This will write out a json file with the detected bboxes.

 ```
 surya_detect DATA_PATH --images
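
The from-python snippet in the hunk above omits its import lines (they sit outside the diff context). A self-contained sketch that fills them in the way the new ocr_app.py below does; `IMAGE_PATH` is a placeholder, and note that ocr_app.py passes one language list per input image:

```python
# Self-contained sketch of the README's from-python OCR example, with imports
# filled in from ocr_app.py (added later in this commit). IMAGE_PATH is a placeholder.
from PIL import Image

from surya.ocr import run_ocr
from surya.model.detection.segformer import load_model as load_det_model, load_processor as load_det_processor
from surya.model.recognition.model import load_model as load_rec_model
from surya.model.recognition.processor import load_processor as load_rec_processor

IMAGE_PATH = "page.png"  # replace with your image
image = Image.open(IMAGE_PATH)
langs = ["en"]  # languages expected in this image

det_processor, det_model = load_det_processor(), load_det_model()
rec_model, rec_processor = load_rec_model(), load_rec_processor()

# ocr_app.py passes one language list per image, hence [langs] here
predictions = run_ocr([image], [langs], det_model, det_processor, rec_model, rec_processor)
print(predictions[0]["text_lines"])  # per-page dicts also carry "bboxes", as used in ocr_app.py
```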
@@ -128,12 +129,7 @@ The `results.json` file will contain these keys for each page of the input docum

 **Performance tips**

-Setting the `DETECTOR_BATCH_SIZE` env var properly will make a big difference when using a GPU. Each batch item will use `280MB` of VRAM, so very high batch sizes are possible. The default is a batch size `32`, which will use about 9GB of VRAM.
-
-Depending on your CPU core count, `DETECTOR_BATCH_SIZE` might make a difference there too - the default CPU batch size is `2`.
-
-You can adjust `DETECTOR_NMS_THRESHOLD` and `DETECTOR_TEXT_THRESHOLD` if you don't get good results. Try lowering them to detect more text, and vice versa.
-
+Setting the `DETECTOR_BATCH_SIZE` env var properly will make a big difference when using a GPU. Each batch item will use `280MB` of VRAM, so very high batch sizes are possible. The default is a batch size `32`, which will use about 9GB of VRAM. Depending on your CPU core count, it might help, too - the default CPU batch size is `2`.

 ### From python

@@ -149,9 +145,20 @@ model, processor = load_model(), load_processor()
 predictions = batch_detection([image], model, processor)
 ```

-## Table and chart detection
+# Limitations
+
+- This is specialized for document OCR. It will likely not work on photos or other images.
+- It is for printed text, not handwriting (though it may work on some handwriting).
+- The model has trained itself to ignore advertisements.
+- You can find language support for OCR in `surya/languages.py`. Text detection should work with any language.
+
+## Troubleshooting
+
+If OCR isn't working properly:
+
+- If the lines aren't detected properly, try increasing resolution of the image if the width is below `896px`, and vice versa. Very high width images don't work well with the detector.
+- You can adjust `DETECTOR_BLANK_THRESHOLD` and `DETECTOR_TEXT_THRESHOLD` if you don't get good results. `DETECTOR_BLANK_THRESHOLD` controls the space between lines - any prediction below this number will be considered blank space. `DETECTOR_TEXT_THRESHOLD` controls how text is joined - any number above this is considered text. `DETECTOR_TEXT_THRESHOLD` should always be higher than `DETECTOR_BLANK_THRESHOLD`, and both should be in the 0-1 range. Looking at the heatmap from the debug output of the detector can tell you how to adjust these (if you see faint things that look like boxes, lower the thresholds, and if you see bboxes being joined together, raise the thresholds).

-Coming soon.

 # Manual install

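
A small sketch of the threshold tuning described in the new Troubleshooting bullet, assuming (as with the other settings) that the values are read from the environment before surya is imported; the concrete numbers here are hypothetical:

```python
# Hypothetical threshold override for the detector, per the Troubleshooting bullet.
# Both values must be in the 0-1 range and DETECTOR_TEXT_THRESHOLD must stay above
# DETECTOR_BLANK_THRESHOLD. Lower them if faint boxes are missed; raise them if
# separate lines get merged together.
import os

blank_threshold = 0.3  # hypothetical: predictions below this count as blank space
text_threshold = 0.6   # hypothetical: predictions above this count as text
assert 0.0 <= blank_threshold < text_threshold <= 1.0

os.environ["DETECTOR_BLANK_THRESHOLD"] = str(blank_threshold)
os.environ["DETECTOR_TEXT_THRESHOLD"] = str(text_threshold)
# ...then import surya and run detection as usual.
```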
@@ -162,13 +169,6 @@ If you want to develop surya, you can install it manually:
 - `poetry install` - installs main and dev dependencies
 - `poetry shell` - activates the virtual environment

-# Limitations
-
-- This is specialized for document OCR. It will likely not work on photos or other images.
-- It is for printed text, not handwriting (though it may work on some handwriting).
-- The model has trained itself to ignore advertisements.
-- You can find language support for OCR in `surya/languages.py`. Text detection should work with any language.
-
 # Benchmarks

 ## OCR

demo_app.py: -38

This file was deleted.

ocr_app.py: +119
@@ -0,0 +1,119 @@
import io

import pypdfium2
import streamlit as st
from surya.detection import batch_detection
from surya.model.detection.segformer import load_model, load_processor
from surya.model.recognition.model import load_model as load_rec_model
from surya.model.recognition.processor import load_processor as load_rec_processor
from surya.postprocessing.heatmap import draw_polys_on_image
from surya.ocr import run_ocr
from surya.postprocessing.text import draw_text_on_image
from PIL import Image
from surya.languages import CODE_TO_LANGUAGE
from surya.input.langs import replace_lang_with_code


@st.cache_resource()
def load_det_cached():
    return load_model(), load_processor()


@st.cache_resource()
def load_rec_cached():
    return load_rec_model(), load_rec_processor()


def text_detection(img):
    preds = batch_detection([img], det_model, det_processor)[0]
    det_img = draw_polys_on_image(preds["polygons"], img.copy())
    return det_img, preds


# Function for OCR
def ocr(img, langs):
    replace_lang_with_code(langs)
    pred = run_ocr([img], [langs], det_model, det_processor, rec_model, rec_processor)[0]
    rec_img = draw_text_on_image(pred["bboxes"], pred["text_lines"], img.size)
    return rec_img, pred


def open_pdf(pdf_file):
    stream = io.BytesIO(pdf_file.getvalue())
    return pypdfium2.PdfDocument(stream)


@st.cache_data()
def get_page_image(pdf_file, page_num, dpi=96):
    doc = open_pdf(pdf_file)
    renderer = doc.render(
        pypdfium2.PdfBitmap.to_pil,
        page_indices=[page_num - 1],
        scale=dpi / 72,
    )
    png = list(renderer)[0]
    png_image = png.convert("RGB")
    return png_image


@st.cache_data()
def page_count(pdf_file):
    doc = open_pdf(pdf_file)
    return len(doc)


st.set_page_config(layout="wide")
col1, col2 = st.columns([.5, .5])

det_model, det_processor = load_det_cached()
rec_model, rec_processor = load_rec_cached()


st.markdown("""
# Surya OCR Demo

This app will let you try surya, a multilingual OCR model. It supports text detection in any language, and text recognition in 90+ languages.

Notes:
- This works best on documents with printed text.
- Try to keep the image width around 896, especially if you have large text.
- This supports 90+ languages, see [here](https://github.com/VikParuchuri/surya/tree/master/surya/languages.py) for a full list of codes.

Find the project [here](https://github.com/VikParuchuri/surya).
""")

in_file = st.sidebar.file_uploader("PDF file or image:", type=["pdf", "png", "jpg", "jpeg", "gif", "webp"])
languages = st.sidebar.multiselect("Languages", sorted(list(CODE_TO_LANGUAGE.values())), default=["English"], max_selections=4)

if in_file is None:
    st.stop()

filetype = in_file.type
whole_image = False
if "pdf" in filetype:
    page_count = page_count(in_file)
    page_number = st.sidebar.number_input(f"Page number out of {page_count}:", min_value=1, value=1, max_value=page_count)

    pil_image = get_page_image(in_file, page_number)
else:
    pil_image = Image.open(in_file).convert("RGB")

text_det = st.sidebar.button("Run Text Detection")
text_rec = st.sidebar.button("Run OCR")

# Run Text Detection
if text_det and pil_image is not None:
    det_img, preds = text_detection(pil_image)
    with col1:
        st.image(det_img, caption="Detected Text", use_column_width=True)
        st.json(preds)

# Run OCR
if text_rec and pil_image is not None:
    rec_img, pred = ocr(pil_image, languages)
    with col1:
        st.image(rec_img, caption="OCR Result", use_column_width=True)
        st.json(pred)

with col2:
    st.image(pil_image, caption="Uploaded Image", use_column_width=True)
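
The README now tells users to run `surya_gui`, but the entry point itself isn't visible here (only 4 of the 12 changed files are shown in this view). A purely hypothetical sketch of what such a console-script entry point could look like, handing the app above to streamlit:

```python
# Hypothetical sketch of a `surya_gui` console-script entry point (not shown in
# this commit view): it would simply launch the streamlit app above.
import os
import subprocess


def run_app():
    # assumes ocr_app.py ships next to this module in the installed package
    app_path = os.path.join(os.path.dirname(__file__), "ocr_app.py")
    subprocess.run(["streamlit", "run", app_path], check=True)


if __name__ == "__main__":
    run_app()
```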
