LangChain-OCR is an advanced OCR solution that converts PDFs and images (JPEG, PNG) into Markdown using cutting-edge vision LLMs. The project comprises two main components: an OCR library, usable directly from the CLI, and a FastAPI backend that offers a streamlined, asynchronous interface for file uploads and processing, making it a versatile tool for both developers and end users.
- File Conversion: Convert PDFs and images (JPEG, PNG) to Markdown.
- Extensible Design: Easily customize converters, language models, and dependency injection via Inject.
- Modern API: Asynchronous processing built on FastAPI.
- Observability: Integrated tracing via Langfuse.
- Multilingual Support: Configurable language settings.
- LLM Integration: Supports Ollama, vLLM, and OpenAI, and can be extended to other providers.
- Containerization: Ready-to-use Docker and Docker Compose configurations.
- CLI Access: Quick OCR processing through the command line.
- Python: 3.11 or higher (refer to api/.python-version)
- Dependency Manager: Poetry
- Docker & Docker Compose: For containerized deployment
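Before installing, you can verify the toolchain with standard version checks:

```bash
python --version          # should report 3.11 or higher
poetry --version
docker compose version
```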
Clone the repository and configure your environment:
```bash
git clone https://github.com/a-klos/langchain-ocr.git
cd langchain-ocr
cp .env.template .env
```
Edit the .env file as needed to adjust language settings, model configuration, and endpoints.
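As an illustration only, a minimal Ollama-based configuration might look like the sketch below. All variable names here are hypothetical; the authoritative keys are those in .env.template:

```bash
# Hypothetical .env values -- the real keys are defined in .env.template
LANGUAGE=en                             # output language for the Markdown
LLM_PROVIDER=ollama                     # ollama, vllm, or openai
OLLAMA_BASE_URL=http://localhost:11434  # default Ollama endpoint
MODEL_NAME=gemma3:4b-it-q4_K_M          # must match the model you pull
```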
LangChain-OCR can be used in several ways:
For quick OCR tasks via the command line, see the CLI documentation.
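As a rough sketch (the actual command name and flags are defined in the CLI documentation, so treat this invocation as hypothetical):

```bash
# Hypothetical CLI call -- consult the CLI documentation for the real interface
langchain-ocr invoice.pdf > invoice.md
```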
Launch the FastAPI backend to access OCR functionality through a RESTful API. Detailed instructions are provided in the FastAPI README.
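Once the backend is up, an upload could look like the following sketch. The /ocr route is an assumption (the port comes from the Docker Compose section below); confirm the real path in the interactive docs at /docs:

```bash
# Hypothetical route -- verify the actual endpoint at http://localhost:8081/docs
curl -X POST http://localhost:8081/ocr \
     -F "file=@document.pdf" \
     -o document.md
```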
Deploy the entire stack with Docker Compose:

- Install Docker Compose: Follow the official installation guide.
- Build & Run Containers: In the repository root, execute:

  ```bash
  docker compose up --build
  ```

- Pull a Vision-Capable Model: Ensure the model referenced in your configuration is available by pulling it (e.g., gemma3:4b-it-q4_K_M); if Ollama runs inside Compose rather than on the host, see the sketch after this list:

  ```bash
  ollama pull <model_name>
  ```

- Access the Services:
  - FastAPI interface: http://0.0.0.0:8081/docs
  - Langfuse dashboard: http://localhost:3000 (default credentials: username user, password password123; update as needed)
- Stop Containers: When done, clean up with:

  ```bash
  docker compose down
  ```
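If Ollama runs as a Compose service rather than directly on the host, the model pull can be executed inside the container. This is a minimal sketch that assumes the service is named ollama in the Compose file; adjust the name to match your setup:

```bash
# Assumes the Compose service is named "ollama" -- check your compose file
docker compose exec ollama ollama pull gemma3:4b-it-q4_K_M
```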
Contributions, bug reports, and feature suggestions are welcome. See CONTRIBUTING.md for details on how to get involved.
Licensed under the MIT License. Refer to the LICENSE file for more information.
For questions, issues, or suggestions, please open an issue on GitHub or contact the maintainer at aklos.ocr@gmail.com.