LLM Comparative Analysis Tool

Overview

This project provides a web application to compare responses from multiple Large Language Models (LLMs) side by side, with automated analysis. It generates a comprehensive HTML report containing the models' answers to a given query, along with similarity metrics and visualizations.

This project uses two different licenses for different parts. Please see Licenses below.

Features

  • Multi-Model Query: Ask a question once and get responses from multiple LLMs.
  • Automated Metrics: Computes cosine similarity between response embeddings to find the most consistent answer, plus BLEU and ROUGE-L scores that use each model's most consistent answer as the reference (a sketch of the consistency selection follows this list).
  • Visualizations: Includes PCA plots of embeddings, similarity distribution histograms, heatmaps, and summary bar charts.
  • Interactive Web UI: React frontend with real-time progress updates as models generate responses.
  • Reports Archive: Each run produces an HTML report saved in a persistent volume (reports/ directory) for later viewing.
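
The consistency selection can be illustrated with a short sketch. The snippet below is a simplified illustration, not the exact code in backend/model_comparison.py (the function name and inputs are made up): given one embedding per response, it picks the response with the highest average cosine similarity to the others.

import numpy as np

def most_consistent_response(responses, embeddings):
    """Pick the response whose embedding is, on average, closest to the others."""
    vectors = np.array(embeddings, dtype=float)
    # Normalize rows so dot products equal cosine similarities.
    vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)
    similarity = vectors @ vectors.T        # pairwise cosine similarity matrix
    np.fill_diagonal(similarity, 0.0)       # ignore each response's self-similarity
    mean_similarity = similarity.mean(axis=1)
    best = int(np.argmax(mean_similarity))
    return responses[best], similarity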

Architecture

  • Frontend: React app (SPA) that connects via WebSocket to the backend for live updates. Users input the query and model names here.
  • Backend: FastAPI server that orchestrates calls to the LLM engine (Ollama) and computes metrics. Exposes a WebSocket endpoint for progress and serves the final reports (a sketch follows this list).
  • LLM Engine: Ollama is used to run LLMs and embedding models locally (supports CPU, Apple MPS, and NVIDIA GPUs).
  • Containerization: Docker Compose is used to containerize the frontend, backend, and Ollama services for easy deployment across different environments.
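
A rough sketch of how the pieces talk to each other is shown below. It is a simplified illustration rather than the actual backend/app.py: the endpoint path, message shapes, OLLAMA_URL variable, and the use of httpx are assumptions, though POST /api/generate is the standard Ollama HTTP endpoint for completions.

import os
import httpx
from fastapi import FastAPI, WebSocket

app = FastAPI()
# The Ollama container is reached by its Compose service name; the exact name
# and port mapping are defined in docker-compose.yml (assumed here).
OLLAMA_URL = os.getenv("OLLAMA_URL", "http://ollama:11434")

@app.websocket("/ws/compare")
async def compare(websocket: WebSocket):
    await websocket.accept()
    request = await websocket.receive_json()  # e.g. {"query": "...", "models": ["qwen:1.8b"]}
    async with httpx.AsyncClient(timeout=None) as client:
        for model in request["models"]:
            # Ask Ollama for a completion from this model.
            reply = await client.post(f"{OLLAMA_URL}/api/generate",
                                      json={"model": model,
                                            "prompt": request["query"],
                                            "stream": False})
            answer = reply.json().get("response", "")
            # Embeddings for the similarity metrics can be fetched the same way
            # from Ollama's /api/embeddings endpoint.
            # Push a progress update to the React frontend as each model finishes.
            await websocket.send_json({"model": model, "status": "done", "answer": answer})
    await websocket.send_json({"status": "complete"})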

Repository Structure

After cloning, your repository should have the following structure:

llm-comparative-analysis/
├── .gitignore             # Ignore node_modules, build artifacts, etc.
├── README.md
├── docker-compose.yml
├── reports/               # Persisted HTML reports (empty initially)
├── backend/
│   ├── Dockerfile.backend
│   ├── app.py
│   ├── model_comparison.py
│   └── requirements.txt
├── docker/
│   ├── Dockerfile.ollama
│   └── entrypoint.sh
└── frontend/
    ├── Dockerfile.frontend
    ├── package.json
    ├── package-lock.json  # Not committed; generated by npm install
    ├── node_modules/      # Not committed; generated by npm install
    ├── public/
    │   └── index.html
    └── src/
        ├── App.js
        └── index.js

Note: The node_modules/ folder and package-lock.json are not committed (both are listed in .gitignore).

Prerequisites

  • Docker (or Podman with podman-compose) installed on your system.
  • Sufficient system resources to run the chosen LLMs. (For heavy models, a GPU is recommended: Apple M1/M2 or NVIDIA GPU with CUDA.)
  • Ensure your system has internet access for downloading Docker images and models.
  • (Optional) If using BLEU metrics, note that NLTK data may be downloaded on the first run.

Setup and Running

Follow these steps to set up and run the application:

1. Clone the Repository

git clone https://github.com/isaacharlem/llm-comparative-analysis.git
cd llm-comparative-analysis

2. Frontend Setup

Navigate to the frontend directory and install the dependencies.

cd frontend
npm install

This will generate a consistent package-lock.json. (The node_modules/ folder is not committed.)

3. Ensure the Ollama Entrypoint Script Is Executable

Return to the repository root and set executable permission on the entrypoint script:

cd ..
chmod +x docker/entrypoint.sh

4. Build and Start All Containers

From the repository root, run:

docker-compose up --build

This command will:

  1. Build the Frontend: Using Dockerfile.frontend to install Node.js dependencies, build the React app, and serve it via Nginx.
  2. Build the Backend: Using Dockerfile.backend to install Python dependencies and run the FastAPI app.
  3. Build the Ollama Service: Using docker/Dockerfile.ollama with the provided entrypoint.sh script, which pulls the required models (smollm:135m, deepseek-r1:1.5b, qwen:1.8b) and then starts the Ollama server.

5. Access the Application

Once the containers are running, open the frontend in your browser using the host port mapped to the frontend service in docker-compose.yml. The backend API and generated reports are served by the FastAPI container.

6. Using the Web Application

On the frontend page:

  1. Enter a Query: Type your question or prompt.
  2. List Models: Provide model identifiers as a comma-separated list (e.g., deepseek-r1:1.5b, qwen:1.8b). Ensure these models match those pulled by the Ollama service. Note: To test Ollama models other than those pulled in docker/entrypoint.sh, you must also add the corresponding pull commands to that script.
  3. Set Number of Responses: Choose the number of responses each model should generate.
  4. Reference Answer (Optional): Enter a reference answer to compute BLEU and ROUGE-L scores.
  5. Click Generate Report. Real-time progress will be shown via WebSocket updates. Once completed, the final report (an HTML page) will be displayed in an embedded frame and saved in the reports/ directory.

7. Stopping the Application

To stop all services, press Ctrl+C in the terminal running Docker Compose or run:

docker-compose down

The generated reports will remain in the reports/ directory.

Customization and Troubleshooting

  • Model Configuration: To change which models are pulled, edit the pull commands in docker/entrypoint.sh.
  • NLP Metrics & Visualizations: The backend (in backend/model_comparison.py) computes cosine similarity, BLEU, and ROUGE-L metrics, and generates various plots. You can extend these metrics by editing that file (a sketch follows this list).
  • GPU Support: For NVIDIA GPU support, ensure Docker is configured with the NVIDIA Container Toolkit. You can adjust GPU settings in docker-compose.yml if needed. For Apple M1/M2, Docker Desktop should automatically handle hardware acceleration.
  • Default 404 on Root: The FastAPI backend is API-only and may return 404 for the root path. You can add a simple route in backend/app.py if desired (see the example after this list).
  • Ollama Service: If the Ollama service fails to start, try re-running chmod +x docker/entrypoint.sh and docker-compose up --build.
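
The metrics extension mentioned above can be sketched as follows. This is a hedged example rather than the existing code in backend/model_comparison.py: the function name is made up, and it assumes the nltk and rouge-score packages are available (the backend may use different libraries).

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer

def score_against_reference(candidate: str, reference: str) -> dict:
    """Compute BLEU and ROUGE-L for one candidate answer against a reference."""
    bleu = sentence_bleu([reference.split()], candidate.split(),
                         smoothing_function=SmoothingFunction().method1)
    scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
    rouge_l = scorer.score(reference, candidate)["rougeL"].fmeasure
    return {"bleu": bleu, "rougeL": rouge_l}

Likewise, the 404 on the root path can be avoided with a small route added to backend/app.py (assuming the FastAPI instance there is named app):

@app.get("/")
def root():
    return {"status": "ok", "docs": "/docs"}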

Licenses

model_comparison.py is derived from code originally developed by Isaac Harlem at the UChicago Data Science Institute and is licensed under the BSD 3-Clause License (see LICENSE-BSD) (Copyright © 2024, UChicago Data Science Institute).

This statement does not imply any official endorsement by the UChicago Data Science Institute.

All other files in this repository are authored by Isaac Harlem and are licensed under the MIT License (see LICENSE-MIT).
