Suggestion: get raw OCR text for non-table content #216

gonarguello · 2024-09-12T13:23:43Z

It is often useful to have all the text that does not belong to a table available for further processing.
Perhaps, in the same way that the lib extracts a 'title', it could also extract a 'footer'.
Alternatively, it could put all the OCR text that is not part of a table into a separate attribute, accessible through the 'table' object.

Example:
When processing an invoice, the 'invoice items' would come back in a 'table', and everything else would land in 'title' and 'footer' objects for further (manual) processing of important fields such as the date, invoice number, account numbers, etc.
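
To make the request concrete, here is a rough sketch of the behaviour I have in mind. It does not use the lib's actual API: `Box`, `Word`, and `non_table_text` are hypothetical names, and it assumes the OCR step yields words with pixel bounding boxes and that each detected table exposes one as well. A word counts as table content when its centre falls inside a table box.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical pixel bounding box (x1, y1 = top-left; x2, y2 = bottom-right)."""
    x1: int
    y1: int
    x2: int
    y2: int

    def contains_centre_of(self, other: "Box") -> bool:
        # True when the centre point of `other` lies inside this box.
        cx = (other.x1 + other.x2) / 2
        cy = (other.y1 + other.y2) / 2
        return self.x1 <= cx <= self.x2 and self.y1 <= cy <= self.y2


@dataclass(frozen=True)
class Word:
    """Hypothetical OCR word: recognised text plus its bounding box."""
    text: str
    box: Box


def non_table_text(words: list[Word], table_boxes: list[Box]) -> str:
    """Join the OCR words that fall outside every table box, in rough reading order."""
    outside = [
        w for w in words
        if not any(t.contains_centre_of(w.box) for t in table_boxes)
    ]
    # Sort top-to-bottom, then left-to-right (a simplification of reading order).
    outside.sort(key=lambda w: (w.box.y1, w.box.x1))
    return " ".join(w.text for w in outside)


# Toy invoice: a header, one table row, and a footer line.
words = [
    Word("Invoice", Box(10, 10, 80, 30)),
    Word("#1234", Box(90, 10, 140, 30)),
    Word("Widget", Box(20, 110, 80, 130)),   # inside the table
    Word("Total:", Box(10, 310, 60, 330)),
]
tables = [Box(0, 100, 500, 300)]

print(non_table_text(words, tables))  # Invoice #1234 Total:
```

Splitting that leftover text into 'title' and 'footer' could then be a matter of comparing each word's position with the table's top and bottom edges.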
