Surya is named for the [Hindu sun god](https://en.wikipedia.org/wiki/Surya), who has universal vision.

You can detect the layout of an image, pdf, or folder of images/pdfs with the following command. This will write out a json file with the detected layout.

```
surya_layout DATA_PATH --images
```

- `DATA_PATH` can be an image, pdf, or folder of images/pdfs
- `--images` will save images of the pages and the detected layout regions (optional)
- `--max` specifies the maximum number of pages to process if you don't want to process everything
- `--results_dir` specifies the directory to save results to instead of the default

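
If you drive layout detection from a script, the documented arguments map directly onto a command line. The following is a minimal sketch. It assumes `surya_layout` is on your `PATH` after installing surya. `build_layout_command` and `run_layout` are hypothetical helpers, not part of surya, and they only use the flags listed above.

```python
import subprocess
from typing import Optional


def build_layout_command(
    data_path: str,
    images: bool = False,
    max_pages: Optional[int] = None,
    results_dir: Optional[str] = None,
) -> list[str]:
    """Hypothetical helper: assemble a surya_layout invocation from the documented flags."""
    cmd = ["surya_layout", data_path]
    if images:
        cmd.append("--images")  # also save page images with the detected layout
    if max_pages is not None:
        cmd += ["--max", str(max_pages)]  # stop after this many pages
    if results_dir is not None:
        cmd += ["--results_dir", results_dir]  # override the default output directory
    return cmd


def run_layout(data_path: str, **kwargs) -> None:
    """Hypothetical helper: run the CLI, raising CalledProcessError on a non-zero exit."""
    subprocess.run(build_layout_command(data_path, **kwargs), check=True)
```

For example, `build_layout_command("scans/", images=True, max_pages=10)` produces `["surya_layout", "scans/", "--images", "--max", "10"]`. Passing a list to `subprocess.run` avoids shell quoting problems with paths that contain spaces.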

The `results.json` file will contain a json dictionary where the keys are the input filenames without extensions. Each value will be a list of dictionaries, one per page of the input document. Each page dictionary contains:

- `bboxes` - detected bounding boxes for the layout regions
  - `bbox` - the axis-aligned rectangle for the region in (x1, y1, x2, y2) format. (x1, y1) is the top left corner, and (x2, y2) is the bottom right corner.
  - `polygon` - the polygon for the region in (x1, y1), (x2, y2), (x3, y3), (x4, y4) format. The points are in clockwise order from the top left.
  - `confidence` - the confidence of the model in the detected region (0-1). This is currently not very reliable.
  - `label` - the label for the bbox. One of `Caption`, `Footnote`, `Formula`, `List-item`, `Page-footer`, `Page-header`, `Picture`, `Figure`, `Section-header`, `Table`, `Text`, `Title`.
- `page` - the page number in the file
- `image_bbox` - the bbox for the image in (x1, y1, x2, y2) format. (x1, y1) is the top left corner, and (x2, y2) is the bottom right corner. All region bboxes will be contained within this bbox.

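
Because the output is plain JSON, you can post-process it with the standard library alone. The following is a minimal sketch. It assumes a `results.json` laid out exactly as described above. `load_results`, `iter_regions` and `region_inside_image` are hypothetical helpers. The location of the file depends on `--results_dir`.

```python
import json
from typing import Iterator


def load_results(results_path: str) -> dict:
    """Read results.json: {input name without extension: [page dict, ...]}."""
    with open(results_path, encoding="utf-8") as f:
        return json.load(f)


def iter_regions(results: dict, label: str) -> Iterator[tuple[str, int, list]]:
    """Yield (input name, page number, bbox) for every layout region with the given label."""
    for name, pages in results.items():
        for page in pages:
            for region in page["bboxes"]:
                if region["label"] == label:
                    yield name, page["page"], region["bbox"]


def region_inside_image(bbox: list, image_bbox: list) -> bool:
    """Check the documented guarantee that an (x1, y1, x2, y2) box lies within image_bbox."""
    x1, y1, x2, y2 = bbox
    ix1, iy1, ix2, iy2 = image_bbox
    return ix1 <= x1 and iy1 <= y1 and x2 <= ix2 and y2 <= iy2
```

For instance, `for name, page, bbox in iter_regions(load_results("results.json"), "Table"): print(name, page, bbox)` lists every detected table. Separating loading from filtering makes it easy to run several label queries over one parsed file.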

**Performance tips**

Setting the `DETECTOR_BATCH_SIZE` env var properly will make a big difference when using a GPU. Each batch item uses about `280MB` of VRAM, so very high batch sizes are possible. The default batch size is `32`, which uses about 9GB of VRAM. Raising it may also help on CPU, depending on your core count; the default CPU batch size is `2`.
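
The arithmetic behind this tip is simple. Divide the VRAM you can spare by roughly 280MB per item. For example, 32 × 280MB = 8,960MB, which gives the ~9GB default. The following is a minimal sketch. It assumes the per-item figure holds for your inputs. It also assumes surya reads `DETECTOR_BATCH_SIZE` from the environment when its settings are first loaded, so the variable is set before surya is imported. `suggest_batch_size` and its 1GB headroom are hypothetical choices, not surya defaults.

```python
import os

# Approximate detector VRAM cost per batch item, taken from the performance tip.
VRAM_PER_ITEM_MB = 280


def suggest_batch_size(free_vram_mb: int, headroom_mb: int = 1024) -> int:
    """Hypothetical helper: largest batch that fits in free VRAM after reserving headroom."""
    usable_mb = max(free_vram_mb - headroom_mb, 0)
    return max(usable_mb // VRAM_PER_ITEM_MB, 1)  # never go below a batch of 1


# Example: a 24GB GPU. Set the variable before importing surya (assumed load order).
os.environ["DETECTOR_BATCH_SIZE"] = str(suggest_batch_size(24 * 1024))
```

From a shell, you can do the same by prefixing the command, e.g. `DETECTOR_BATCH_SIZE=84 surya_layout DATA_PATH`.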
185
+
186
+
### From python
187
+
188
+
```
189
+
from PIL import Image
190
+
from surya.detection import batch_detection
191
+
from surya.model.segformer import load_model, load_processor
192
+
from surya.settings import settings
193
+
194
+
image = Image.open(IMAGE_PATH)
195
+
model = load_model(checkpoint=settings.LAYOUT_MODEL_CHECKPOINT)
- This is specialized for document OCR. It will likely not work on photos or other images.
- Surya is for OCR: the goal is to recognize the text lines correctly, not to sort them into reading order. Surya will attempt to sort the lines, which works in many cases, but use something like [marker](https://github.com/VikParuchuri/marker) or other postprocessing if you need to order the text.
- It is for printed text, not handwriting (though it may work on some handwriting).
- The text detection model has trained itself to ignore advertisements.
- You can find language support for OCR in `surya/languages.py`. Text detection and layout analysis will work with any language.

## Troubleshooting

If OCR isn't working properly:

- Preprocessing the image (binarizing, deskewing, etc.) can help with very old or blurry images.
- You can adjust `DETECTOR_BLANK_THRESHOLD` and `DETECTOR_TEXT_THRESHOLD` if you don't get good results. `DETECTOR_BLANK_THRESHOLD` controls the space between lines: any prediction below this number will be considered blank space. `DETECTOR_TEXT_THRESHOLD` controls how text is joined: any number above this is considered text. `DETECTOR_TEXT_THRESHOLD` should always be higher than `DETECTOR_BLANK_THRESHOLD`, and both should be in the 0-1 range. The heatmap in the detector's debug output shows how to adjust them: if you see faint things that look like boxes, lower the thresholds, and if you see bboxes being joined together, raise the thresholds.
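
The two threshold constraints are easy to break when tuning by hand. The following minimal sketch checks them before exporting the variables. It assumes surya picks the values up from the environment when it loads its settings, so they are set before surya is imported. `set_detector_thresholds` is a hypothetical helper. The values in the example are illustrative, not surya's defaults.

```python
import os


def set_detector_thresholds(blank: float, text: float) -> None:
    """Hypothetical helper: validate and export the detector thresholds.

    Enforces the documented rules: both values lie in [0, 1], and the
    text threshold is strictly higher than the blank threshold.
    Nothing is exported if validation fails.
    """
    for name, value in (("blank", blank), ("text", text)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} threshold must be in [0, 1], got {value}")
    if text <= blank:
        raise ValueError("DETECTOR_TEXT_THRESHOLD must be higher than DETECTOR_BLANK_THRESHOLD")
    os.environ["DETECTOR_BLANK_THRESHOLD"] = str(blank)
    os.environ["DETECTOR_TEXT_THRESHOLD"] = str(text)


# Illustrative values only; lower both if faint boxes are missed, raise both if lines merge.
set_detector_thresholds(blank=0.3, text=0.6)
```

Validating before writing anything means a bad pair leaves any previously exported thresholds untouched.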

# Manual install

If you want to develop surya, you can install it manually:

First calculate coverage for each bbox, then add a small penalty for double coverage…

Then we calculate precision and recall for the whole dataset.

## Layout analysis

## Running your own benchmarks

You can benchmark the performance of surya on your machine.