LangChain-OCR

LangChain-OCR converts PDFs and image files into Markdown using vision LLMs. The project has two main components: the OCR library (also usable via a CLI) and a FastAPI backend that exposes file upload and processing over HTTP.

1. Overview

LangChain-OCR uses vision LLMs to convert PDFs and images (JPEG, PNG) into Markdown. It can be run directly from the CLI or through an asynchronous FastAPI interface, which makes it usable by both developers and end users.

2. Features

- File Conversion: Convert PDFs and images (JPEG, PNG) to Markdown.
- Extensible Design: Customize converters, language models, and dependency injection with Inject.
- Modern API: Asynchronous processing built on FastAPI.
- Observability: Integrated tracing via Langfuse.
- Multilingual Support: Configurable language settings.
- LLM Integration: Supports Ollama, vLLM, and OpenAI; other providers can be added.
- Containerization: Ready-to-use Docker and Docker Compose configurations.
- CLI Access: Quick OCR processing from the command line.

3. Installation

3.1 Prerequisites

- Python: 3.11 or higher (refer to api/.python-version)
- Dependency Manager: Poetry
- Docker & Docker Compose: For containerized deployment

3.2 Cloning & Environment Setup

Clone the repository and configure your environment:

git clone https://github.com/a-klos/langchain-ocr.git
cd langchain-ocr
cp .env.template .env

Edit the .env file as necessary to adjust language settings, model configuration, and endpoints.
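If `.env.template` gains new variables after you created your `.env`, the two files can drift apart. The following is a minimal sketch, not part of the project, that lists variable names present in the template but missing from `.env`. The helper names (`read_env_keys`, `missing_keys`, `check`) are made up for illustration, and it assumes both files use plain `KEY=VALUE` lines.

```python
from pathlib import Path


def read_env_keys(text: str) -> set[str]:
    """Collect variable names from KEY=VALUE lines, skipping blanks and comments."""
    keys = set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name = line.split("=", 1)[0].strip()
        # Tolerate an optional leading "export " as used in shell-style env files.
        if name.startswith("export "):
            name = name[len("export "):].strip()
        if name:
            keys.add(name)
    return keys


def missing_keys(template_text: str, env_text: str) -> list[str]:
    """Return names defined in the template but absent from the .env, sorted."""
    return sorted(read_env_keys(template_text) - read_env_keys(env_text))


def check(template_path: str = ".env.template", env_path: str = ".env") -> list[str]:
    """Compare the two files on disk (paths are relative to the repository root)."""
    return missing_keys(Path(template_path).read_text(), Path(env_path).read_text())
```

Running `check()` from the repository root returns an empty list when every template variable also appears in `.env`.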

4. Usage

LangChain-OCR can be used in several ways:

4.1 CLI

For quick OCR tasks via the command line, see the CLI documentation.

4.2 FastAPI Server

Launch the FastAPI backend to access OCR functionality through a RESTful API. Detailed instructions are provided in the FastAPI README.
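As a rough illustration of calling the backend from a script, the sketch below uploads one file as `multipart/form-data` using only the Python standard library. The route (`/convert`) and the form field name (`file`) are placeholders, not the project's confirmed API; look up the real ones in the interactive docs served by the running backend before using this.

```python
import mimetypes
import urllib.request
import uuid
from pathlib import Path

# ASSUMPTION: route and field name are hypothetical placeholders.
# Replace them with the values shown in the backend's /docs page.
UPLOAD_URL = "http://localhost:8081/convert"
FILE_FIELD = "file"


def build_multipart(field: str, filename: str, payload: bytes) -> tuple[bytes, str]:
    """Encode one file as a multipart/form-data body; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def upload(path: str, url: str = UPLOAD_URL, field: str = FILE_FIELD) -> str:
    """POST the file to the (assumed) endpoint and return the response body as text."""
    file = Path(path)
    body, content_type = build_multipart(field, file.name, file.read_bytes())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    # OCR through an LLM can be slow, so the timeout is deliberately generous.
    with urllib.request.urlopen(request, timeout=300) as response:
        return response.read().decode("utf-8")
```

A call such as `upload("scan.pdf")` would return whatever the endpoint responds with; whether that is raw Markdown or a JSON wrapper depends on the actual API.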

4.3 Docker Compose Deployment

Deploy the entire stack with Docker Compose:

Install Docker Compose:
Follow the installation guide.
Build & Run Containers:
In the repository root, execute:
```
docker compose up --build
```
Pull a Vision-Capable Model:
Pull the model named in your configuration, replacing <model_name> with its tag (e.g., gemma3:4b-it-q4_K_M):
```
ollama pull <model_name>
```
Access the Services:
- FastAPI Interface: http://localhost:8081/docs
- Langfuse Dashboard: http://localhost:3000
  (Default credentials: Username: user, Password: password123 – update as needed.)
Stop Containers:
When done, clean up with:
```
docker compose down
```
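The containers can take a while to become reachable after `docker compose up`. As a convenience, the sketch below polls the two service URLs listed above until they answer. It is not part of the project; the function names, retry count, and delay are arbitrary choices, and it assumes the services are exposed on localhost at the ports shown in this section.

```python
import time
import urllib.error
import urllib.request


def is_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET to `url` answers with a status below 500."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 500
    except urllib.error.HTTPError as exc:
        # The server answered, just not with 2xx (e.g. a login redirect or 404).
        return exc.code < 500
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout: the service is not up yet.
        return False


def wait_for_services(urls, attempts=30, delay=2.0, probe=is_reachable, sleep=time.sleep):
    """Poll each URL until it responds; return the list of URLs still down."""
    pending = list(urls)
    for _ in range(attempts):
        pending = [u for u in pending if not probe(u)]
        if not pending:
            break
        sleep(delay)
    return pending
```

Calling `wait_for_services(["http://localhost:8081/docs", "http://localhost:3000"])` returns an empty list once both services respond, or the URLs that never came up after the last attempt.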

4.4 Gradio UI

Access OCR functionality through an intuitive browser interface:

Online Demo: Try the hosted version at Hugging Face Spaces without any installation.

Local Deployment: Configure the environment variables in gradio_ui/.env and run the Gradio app:

cd gradio_ui
poetry install --no-root
set -a
source .env
set +a
cd src/gradio_ui
python app.py

5. Contributing

Contributions, bug reports, and feature suggestions are welcome. See CONTRIBUTING.md for details on how to get involved.

6. License

Licensed under the MIT License. Refer to the LICENSE file for more information.

7. Contact

For questions, issues, or suggestions, please open an issue on GitHub or contact the maintainer at aklos.ocr@gmail.com.
