A self-hosted, offline, ChatGPT-like chatbot supporting multiple LLMs. 100% private, with no data leaving your device.
You can run OpenLLM on any x86 system. Make sure you have Docker installed.
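If you're not sure Docker is set up correctly, a quick sanity check (assuming Docker Engine with the Compose plugin, which the commands below rely on) is:

```sh
# Confirm the Docker CLI and the Compose plugin are installed and on PATH
docker --version
docker compose version
```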
Then, clone this repo and `cd` into it:

```sh
git clone https://github.com/edgar971/open-chat.git
cd open-chat
```
You can now run OpenLLM with any of the following models, depending on your hardware (a quick memory check follows the table):
| Model size | Model used | Minimum RAM required | How to start OpenLLM |
|---|---|---|---|
| 7B | Nous Hermes Llama 2 7B (GGML q4_0) | 8GB | `docker compose up -d` |
| 13B | Nous Hermes Llama 2 13B (GGML q4_0) | 16GB | `docker compose -f docker-compose-13b.yml up -d` |
| 70B | Meta Llama 2 70B Chat (GGML q4_0) | 48GB | `docker compose -f docker-compose-70b.yml up -d` |
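If you're unsure which model your machine can handle, you can check total and available RAM before picking a row (Linux shown; the command differs on other operating systems):

```sh
# Print total and available memory in gigabytes (Linux)
free -g
```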
You can access OpenLLM at http://localhost:3000.
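To confirm the container is actually serving before opening a browser, you can probe the port (a minimal check, assuming the frontend answers plain HTTP on 3000):

```sh
# Expect HTTP response headers if the app is up
curl -I http://localhost:3000
```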
To stop OpenLLM, run:

```sh
docker compose down
```
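If you started one of the larger models with a dedicated compose file, pass the same file when stopping, for example:

```sh
# Stop the 13B stack started with its dedicated compose file
docker compose -f docker-compose-13b.yml down
```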
Additional settings can be found here and can be set either as environment variables or as arguments to the `run.sh` script (e.g., `--n_ctx 12`).
Example:

```yaml
version: '3'
services:
  api:
    image: ghcr.io/edgar971/open-chat-cuda:latest
    environment:
      - MODEL=/path/to/your/model
      - N_CTX=4096
    ports:
      # Map the UI to the address referenced above (http://localhost:3000)
      - '3000:3000'
```
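To use an override like the one above, save it under its own name and point Compose at it; the filename below is just a placeholder for this example:

```sh
# docker-compose.custom.yml is a hypothetical name for the file sketched above
docker compose -f docker-compose.custom.yml up -d
```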
A massive thank you to the following developers and teams for making OpenLLM possible:
- Mckay Wrigley for building Chatbot UI.
- Georgi Gerganov for implementing llama.cpp.
- Andrei for building the Python bindings for llama.cpp.
- NousResearch for fine-tuning the Llama 2 7B and 13B models.
- Tom Jobbins for quantizing the Llama 2 models.
- Meta for releasing Llama 2 under a permissive license.