| Scanned Form |[Image](static/images/funsd.png)||
+| Scanned Old Form |[Image](static/images/funsd.png)|[Image](static/images/funsd_text.jpg)|

# Installation
@@ -51,6 +51,15 @@ Model weights will automatically download the first time you run surya. Note th
- Inspect the settings in `surya/settings.py`. You can override any settings with environment variables.
- Your torch device will be automatically detected, but you can override this. For example, `TORCH_DEVICE=cuda`. For text detection, the `mps` device has a bug (on the [Apple side](https://github.com/pytorch/pytorch/issues/84936)) that may prevent it from working properly.
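
As a minimal sketch of the override pattern described above, the snippet below sets `TORCH_DEVICE` from Python and reads it back with a fallback. The helper `setting_from_env` and the `"cpu"` fallback are hypothetical and only illustrate the idea; surya's actual settings loading lives in `surya/settings.py` and may work differently.

```python
import os


def setting_from_env(name, default, cast=str):
    """Return the environment override for `name` if set, else `default`.

    Hypothetical helper for illustration; not part of surya.
    """
    raw = os.environ.get(name)
    if raw is None or raw == "":
        return default
    return cast(raw)


# Equivalent to running `TORCH_DEVICE=cuda <command>` in a shell.
# Assumption: the variable must be set before surya is imported,
# since settings are typically read at import time.
os.environ["TORCH_DEVICE"] = "cuda"

device = setting_from_env("TORCH_DEVICE", default="cpu")
batch_size = setting_from_env("RECOGNITION_BATCH_SIZE", default=32, cast=int)
print(device, batch_size)
```

Setting the variable in the shell is usually simpler; doing it from Python is mostly useful in notebooks or scripts.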

+## Interactive App
+
+I've included a Streamlit app that lets you interactively try Surya on images or PDF files. Run it with:
+
+```
+pip install streamlit
+surya_gui
+```
+
## OCR (text recognition)

You can detect text in an image, PDF, or folder of images/PDFs with the following command. This will write out a JSON file with the detected text and bboxes, and optionally save images of the reconstructed page.
@@ -78,10 +87,7 @@ The `results.json` file will contain these keys for each page of the input docum
**Performance tips**

-Setting the `RECOGNITION_BATCH_SIZE` env var properly will make a big difference when using a GPU. Each batch item will use `40MB` of VRAM, so very high batch sizes are possible. The default batch size is `256`, which will use about 10GB of VRAM.
-
-Depending on your CPU core count, `RECOGNITION_BATCH_SIZE` might make a difference there too - the default CPU batch size is `32`.
-
+Setting the `RECOGNITION_BATCH_SIZE` env var properly will make a big difference when using a GPU. Each batch item will use `40MB` of VRAM, so very high batch sizes are possible. The default batch size is `256`, which will use about 10GB of VRAM. Depending on your CPU core count, it may help on CPU too - the default CPU batch size is `32`.
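
The arithmetic behind those numbers can be sketched as follows. This is a rough estimate only: it uses the `40MB` per-item figure quoted above, treats 1GB as 1024MB, and ignores model weights and other overhead. The function names are hypothetical, not part of surya.

```python
def estimated_vram_gb(batch_size, mb_per_item=40):
    """Approximate VRAM for one batch, using the per-item figure quoted above."""
    return batch_size * mb_per_item / 1024


def max_batch_size(vram_budget_gb, mb_per_item=40):
    """Largest batch size whose estimated VRAM fits within the budget."""
    return int(vram_budget_gb * 1024 // mb_per_item)


print(estimated_vram_gb(256))  # 10.0
print(max_batch_size(8))       # 204
```

The same calculation applies to any other batch size setting once you know its per-item VRAM cost.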
### From python
@@ -94,20 +100,15 @@ from surya.model.recognition.processor import load_processor as load_rec_process
-You can detect text lines in an image, PDF, or folder of images/PDFs with the following command. This will write out a JSON file with the detected bboxes, and optionally save images of the pages with the bboxes.
+You can detect text lines in an image, PDF, or folder of images/PDFs with the following command. This will write out a JSON file with the detected bboxes.

```
surya_detect DATA_PATH --images
```
@@ -128,12 +129,7 @@ The `results.json` file will contain these keys for each page of the input docum
**Performance tips**

-Setting the `DETECTOR_BATCH_SIZE` env var properly will make a big difference when using a GPU. Each batch item will use `280MB` of VRAM, so very high batch sizes are possible. The default batch size is `32`, which will use about 9GB of VRAM.
-
-Depending on your CPU core count, `DETECTOR_BATCH_SIZE` might make a difference there too - the default CPU batch size is `2`.
-
-You can adjust `DETECTOR_NMS_THRESHOLD` and `DETECTOR_TEXT_THRESHOLD` if you don't get good results. Try lowering them to detect more text, and vice versa.
-
+Setting the `DETECTOR_BATCH_SIZE` env var properly will make a big difference when using a GPU. Each batch item will use `280MB` of VRAM, so very high batch sizes are possible. The default batch size is `32`, which will use about 9GB of VRAM. Depending on your CPU core count, it might help on CPU too - the default CPU batch size is `2`.
+- This is specialized for document OCR. It will likely not work on photos or other images.
+- It is for printed text, not handwriting (though it may work on some handwriting).
+- The model has trained itself to ignore advertisements.
+- You can find language support for OCR in `surya/languages.py`. Text detection should work with any language.
+
+## Troubleshooting
+
+If OCR isn't working properly:
+
+- If the lines aren't detected properly, try increasing the resolution of the image if its width is below `896px`, and decreasing it if the width is much larger. Very wide images don't work well with the detector.
+- You can adjust `DETECTOR_BLANK_THRESHOLD` and `DETECTOR_TEXT_THRESHOLD` if you don't get good results. `DETECTOR_BLANK_THRESHOLD` controls the space between lines - any prediction below this number will be considered blank space. `DETECTOR_TEXT_THRESHOLD` controls how text is joined - any number above this is considered text. `DETECTOR_TEXT_THRESHOLD` should always be higher than `DETECTOR_BLANK_THRESHOLD`, and both should be in the 0-1 range. Looking at the heatmap from the debug output of the detector can tell you how to adjust these (if you see faint things that look like boxes, lower the thresholds, and if you see bboxes being joined together, raise the thresholds).
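
To make the two thresholds concrete, here is a small illustration of the rules stated above: values below the blank threshold count as blank space, values above the text threshold count as text, and the text threshold must be the higher of the two. This is not surya's detector code. The function names, the threshold values, and the sample heatmap row are all placeholders chosen for the example.

```python
def validate_thresholds(blank_threshold, text_threshold):
    """Check the constraints stated above: both in [0, 1], text above blank."""
    for name, value in (("blank", blank_threshold), ("text", text_threshold)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} threshold must be in the 0-1 range, got {value}")
    if text_threshold <= blank_threshold:
        raise ValueError("text threshold must be higher than blank threshold")


def classify(value, blank_threshold, text_threshold):
    """Label one heatmap value as 'blank', 'text', or 'uncertain'."""
    if value < blank_threshold:
        return "blank"
    if value > text_threshold:
        return "text"
    return "uncertain"


# Placeholder thresholds and heatmap values, chosen only for illustration.
blank_t, text_t = 0.3, 0.6
validate_thresholds(blank_t, text_t)
row = [0.05, 0.2, 0.45, 0.7, 0.9]
labels = [classify(v, blank_t, text_t) for v in row]
print(labels)  # ['blank', 'blank', 'uncertain', 'text', 'text']
```

In this toy model, lowering both thresholds turns more faint values into text, and raising them turns more values into blank space, which matches the adjustment advice above.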

-Coming soon.

# Manual install
@@ -162,13 +169,6 @@ If you want to develop surya, you can install it manually:
- `poetry install` - installs main and dev dependencies
- `poetry shell` - activates the virtual environment

-# Limitations
-
-- This is specialized for document OCR. It will likely not work on photos or other images.
-- It is for printed text, not handwriting (though it may work on some handwriting).
-- The model has trained itself to ignore advertisements.
-- You can find language support for OCR in `surya/languages.py`. Text detection should work with any language.