
# ThreatMapAI

Automatically analyze codebases to generate comprehensive threat models with attack vectors, data flow diagrams, and security recommendations.
## Table of Contents

- [Overview](#overview)
- [Architecture](#architecture)
- [Features](#features)
- [Requirements](#requirements)
- [Installation](#installation)
- [Usage](#usage)
- [Output Files](#output-files)
- [Advanced Configuration](#advanced-configuration)
- [Troubleshooting](#troubleshooting)
- [Contributing](#contributing)
## Overview

ThreatMapAI is an advanced tool that leverages large language models (LLMs) to automatically analyze code repositories and generate comprehensive security threat models. Using a combination of static code analysis, Retrieval Augmented Generation (RAG), and graph-based visualization, it identifies potential security vulnerabilities, maps data flows across security boundaries, and provides detailed security recommendations.

The tool is designed to be language-agnostic and can analyze repositories of any size through efficient embedding-based RAG techniques.
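To make the RAG idea concrete, here is a minimal, self-contained sketch of embedding-based retrieval. It is illustrative only: the toy `embed` hasher and the chunk strings are invented for this example, and ThreatMapAI's actual `EmbeddingStore` is more sophisticated.

```python
# Minimal sketch of embedding-based retrieval (illustrative only).
# ThreatMapAI's real EmbeddingStore uses proper embeddings; the toy
# hashing embedder below exists just to make this snippet runnable.
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy bag-of-words hashing embedder (stand-in for a real model)."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

# Index code chunks once...
chunks = [
    "verify user login credentials against the database",
    "render the html template for the dashboard page",
]
index = np.stack([embed(chunk) for chunk in chunks])

# ...then retrieve only the chunks relevant to each security question,
# so the repository never has to fit in the LLM's context window.
query = embed("user login credentials")
print(chunks[int(np.argmax(index @ query))])
```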
## Architecture

```mermaid
graph TD
    A[Repository Source] --> B[Repository Analyzer]
    B --> C[Code Structure Analysis]
    B --> D[Data Flow Mapping]
    B --> E[Security Boundary Detection]
    C --> F[Embedding Store]
    D --> F
    E --> F
    F --> G[LLM Processor]
    G --> H[Code Analysis]
    G --> I[Threat Detection]
    G --> J[Security Recommendations]
    H --> K[Threat Model]
    I --> K
    J --> K
    K --> L[Visualizer]
    L --> M[Class Diagrams]
    L --> N[Data Flow Diagrams]
    L --> O[Threat Relationship Diagrams]
    L --> P[HTML Security Report]
```
## Features

- **Unlimited Repository Size Analysis**: Process repositories of any size through efficient embedding-based RAG techniques
- **Multi-Language Support**: Python, JavaScript, TypeScript, Java, Go, PHP, and more
- **Security Boundary Detection**: Automatically identifies security domains and trust boundaries
- **Cross-Boundary Data Flow Analysis**: Identifies and analyzes data flows crossing security boundaries
- **AI-Powered Vulnerability Detection**: Uses LLMs to identify potential security vulnerabilities
- **Static Code Analysis**: Parses and analyzes code structure, dependencies, and patterns
- **Data Flow Mapping**: Traces how data moves through the application
- **Security Control Detection**: Identifies existing security controls and gaps
- **Vulnerability Prioritization**: Ranks vulnerabilities by severity and impact
- **Interactive Mermaid Diagrams**: Generates class structure, data flow, and threat relationship diagrams
- **Comprehensive HTML Reports**: Detailed security findings with recommendations
- **Exportable Results**: All analysis results available in structured JSON format
## Requirements

- Python 3.8+
- 16GB+ RAM recommended for larger repositories
- 10GB disk space (for model storage)
- GraphViz (for diagram generation)
- GPU support (optional but recommended for larger codebases)
## Installation

### Quick Install

```bash
# Clone the repository
git clone https://github.com/omaid/ThreatMapAI.git
cd ThreatMapAI

# Run the installer (automatically sets up a virtual environment)
chmod +x setup.sh
./setup.sh
```
This will:

- Create a virtual environment in the `venv` directory
- Install all dependencies within the virtual environment
- Download the required model files
- Set up the configuration
### Manual Setup

If you prefer to set up manually:

1. Create and activate a virtual environment:

   ```bash
   # Create a virtual environment
   python3 -m venv venv

   # Activate it (on macOS/Linux)
   source venv/bin/activate
   # OR on Windows
   # venv\Scripts\activate
   ```

2. Install the required Python libraries:

   ```bash
   pip install -r requirements.txt
   ```

3. Initialize the environment and download required models:

   ```bash
   python -m cli init
   ```
### Hugging Face Authentication

Some models require authentication with a Hugging Face token:

1. Create a free account at [Hugging Face](https://huggingface.co/)
2. Generate a token at https://huggingface.co/settings/tokens
3. Set your token:

   ```bash
   python -m cli set_token
   ```
## Usage

### Command Line Interface

The primary way to interact with ThreatMapAI is through its command-line interface:

```bash
# Initialize environment and download required models
python -m cli init

# Analyze a GitHub repository
python -m cli analyze https://github.com/username/repo

# Analyze a local repository
python -m cli analyze /path/to/local/repo --local

# Generate visualizations only
python -m cli visualize --output-dir output

# View diagrams in browser
python -m cli view
```
### API Server

ThreatMapAI can also be used as an API service:

```bash
# Start the API server
python main.py
```

The API server provides the following endpoints (a minimal client sketch follows):

- `POST /analyze` - Analyze a repository
- `GET /health` - Health check
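For example, the endpoints can be exercised with a short `requests` client. The base URL and the request body shown here are assumptions, not documented API details; check your server's startup log for the actual port and schema.

```python
# Hypothetical client sketch: the port and the "repo_url" field name are
# assumptions -- verify them against your running deployment.
import requests

BASE = "http://localhost:8000"  # assumed default

# Health check
print(requests.get(f"{BASE}/health").json())

# Kick off an analysis
resp = requests.post(
    f"{BASE}/analyze",
    json={"repo_url": "https://github.com/username/repo"},  # assumed schema
)
print(resp.status_code, resp.json())
```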
### Programmatic Usage

```python
from repository_analyzer.analyzer import RepositoryAnalyzer
from repository_analyzer.embedding_store import EmbeddingStore
from llm_processor.processor import LLMProcessor
from visualizer.visualizer import ThreatModelVisualizer

# Initialize components
embedding_store = EmbeddingStore()
analyzer = RepositoryAnalyzer(repo_path="temp_repo", embedding_store=embedding_store)
llm_processor = LLMProcessor(embedding_store)
visualizer = ThreatModelVisualizer()

# Clone and analyze repository
analyzer.clone_repository("https://github.com/username/repo")
analysis_results = analyzer.analyze_code()

# Generate threat model
threat_model = llm_processor.generate_threat_model(analysis_results)

# Generate visualizations and report
report_path = visualizer.generate_report(threat_model)
print(f"Report generated at: {report_path}")
```
## Output Files

All output is saved to the `output` directory (configurable); the JSON files are easy to post-process, as sketched below:

- `analysis_results.json`: Raw analysis data
- `threat_model.json`: Generated threat model
- `class_diagram.mmd`: Class structure diagram
- `flow_diagram.mmd`: Data flow diagram
- `threat_diagram.mmd`: Threat relationship diagram
- `threat_analysis_report.html`: Comprehensive HTML report
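Since the results are plain JSON, loading them in a script is straightforward. The schema of `threat_model.json` is not documented here, so this sketch only inspects the top-level keys:

```python
# Sketch: load an exported threat model for further processing.
# The file's internal schema is an assumption to verify -- print keys first.
import json
from pathlib import Path

threat_model = json.loads(Path("output/threat_model.json").read_text())
if isinstance(threat_model, dict):
    print(sorted(threat_model.keys()))
```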
### Viewing Diagrams

There are several ways to view the generated Mermaid diagrams:

1. **Using the built-in viewer:**

   ```bash
   python -m cli view
   ```

2. **Using the Mermaid Live Editor:**
   - Copy diagram content from the `.mmd` files
   - Paste it into the [Mermaid Live Editor](https://mermaid.live/)

3. **Using GitHub:**
   - GitHub natively renders Mermaid diagrams in Markdown files

4. **Using VS Code:**
   - Install a Mermaid preview extension
## Advanced Configuration

### GPU Acceleration

ThreatMapAI can utilize GPU acceleration for faster processing:

```bash
# Configure GPU settings
python -m cli configure_gpu

# Check GPU information
python -m cli gpu_info --detailed

# Run benchmark
python -m cli gpu_info --benchmark
```
### Model Selection

You can select different models for analysis:

```bash
# List available models
python -m cli select_model --list

# Select a specific model
python -m cli select_model CodeLlama-2-7b-Instruct --variant Q4_0 --download
```
### Environment Variables

Create a `.env` file to configure:

```
MODEL_PATH=/path/to/model
OUTPUT_DIR=output
HF_TOKEN=your_huggingface_token
```
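If you need these values in your own scripts, they can be read with `python-dotenv` (assuming the package is installed; whether ThreatMapAI itself loads `.env` this way is an implementation detail):

```python
# Sketch: reading the .env values at runtime with python-dotenv.
from dotenv import load_dotenv
import os

load_dotenv()  # reads .env from the current working directory
print(os.getenv("MODEL_PATH"))
print(os.getenv("OUTPUT_DIR", "output"))  # fall back to the default
```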
## Troubleshooting

1. **Dependency conflicts:**
   - Try creating a fresh virtual environment
   - Install dependencies without resolution first: `pip install -r requirements.txt --no-deps`
   - Then resolve missing dependencies: `pip install -r requirements.txt`

2. **tree-sitter installation fails:**
   - Install the tree-sitter-languages package explicitly: `pip install tree-sitter-languages`
   - Make sure you have a C compiler installed on your system

3. **Model download fails:**
   - Hugging Face may require authentication; use `python -m cli set_token` to set up your token
   - Ensure you have enough disk space (10GB+ free)

4. **Out of memory:**
   - Use the RAG approach with a memory limit: `python -m cli analyze https://github.com/username/repo --memory-limit 4.0`
   - Close other memory-intensive applications
   - Ensure you have at least 16GB of RAM for large codebases

5. **GPU issues:**
   - Force CPU mode if the GPU is causing problems: `python -m cli analyze URL --force-cpu`
   - Check GPU support: `python -m cli gpu_info`

6. **Analysis taking too long:**
   - Check your system resources
   - Try analyzing a smaller subset of the codebase
### Setting Your Hugging Face Token

**Method 1: Use the CLI**

```bash
python -m cli set_token
```

**Method 2: Set it in your environment**

```bash
# Add to your shell profile (.bashrc, .zshrc, etc.)
export HF_TOKEN=your_token_here

# OR set it just for the current session
export HF_TOKEN=your_token_here
```

**Method 3: Manually add it to the .env file**

```bash
# Create or edit the .env file in your project directory
echo "HF_TOKEN=your_token_here" >> .env
```

**Verify your token is set:**

```bash
# Run the initialization, which will verify token access
python -m cli init
```

If you see authentication errors even after setting your token, try the following:

- Ensure you copied the full token without extra spaces
- Restart your terminal if you set the token in your environment
- Run `python -m cli set_token` to save it in your `.env` file