A powerful tool for analyzing, documenting, and visualizing SQL codebase structure and dependencies. This project combines modern language models with vector storage to provide comprehensive insights into SQL code architecture.
- Intelligent SQL Parsing: Automatically breaks down SQL files into logical chunks (packages, procedures, functions)
- Dependency Analysis: Identifies and visualizes relationships between different SQL objects
- Vector-Based Storage: Uses ChromaDB for efficient storage and retrieval of code chunks
- LLM-Powered Analysis: Leverages language models to provide detailed code analysis and insights
- Interactive Documentation: Generates comprehensive HTML documentation with interactive components
- Streamlit Interface: User-friendly web interface for uploading and analyzing SQL files
- Monolithic helps generate a SQL file that can then be used as input for this project
- Output is generated as an HTML file; a sample is provided in sql_documentation.html
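To make the "logical chunks" idea concrete, here is a minimal sketch of header-based chunking. It is not the code in sqldataeng.py: the `HEADER` pattern, the `split_sql_chunks` name, and the Oracle-style `CREATE OR REPLACE` syntax are all assumptions made for illustration.

```python
import re
from typing import Dict, List

# Hypothetical pattern: matches the header of a package, procedure, or function.
HEADER = re.compile(
    r"CREATE\s+(?:OR\s+REPLACE\s+)?"
    r"(PACKAGE(?:\s+BODY)?|PROCEDURE|FUNCTION)\s+([\w.$#]+)",
    re.IGNORECASE,
)


def split_sql_chunks(sql: str) -> List[Dict[str, str]]:
    """Split SQL text into one chunk per top-level CREATE header."""
    matches = list(HEADER.finditer(sql))
    chunks = []
    for i, m in enumerate(matches):
        # A chunk runs from its own header to the start of the next header.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(sql)
        chunks.append({
            "type": " ".join(m.group(1).upper().split()),
            "name": m.group(2),
            "code": sql[m.start():end].strip(),
        })
    return chunks


sample = """
CREATE OR REPLACE PROCEDURE load_orders AS
BEGIN
  NULL;
END;
/
CREATE OR REPLACE FUNCTION order_total(p_id NUMBER) RETURN NUMBER AS
BEGIN
  RETURN 0;
END;
/
"""

for chunk in split_sql_chunks(sample):
    print(chunk["type"], chunk["name"])
```

A regex pass like this only sees top-level `CREATE` statements; procedures nested inside a package body stay within that package's chunk, which may or may not match how the project splits them.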
The project consists of several key components:
- sqldataeng.py: SQL parsing and chunk extraction
- vectorstore.py: Vector storage implementation using ChromaDB
- llmprocessor.py: Language model integration for code analysis
- docgenerator.py: Documentation generation and formatting
- main.py: Streamlit web interface
- Python 3.8+
- CUDA-capable GPU (optional, for faster processing)
- Clone the repository:
git clone https://github.com/ramcovasu/monolithic.git
cd monolithic
- Create a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
- Start the Streamlit application:
streamlit run main.py --server.fileWatcherType none
- Open your browser and navigate to http://localhost:8501
- Upload your SQL file through the web interface
- Follow the step-by-step process:
  - Parse SQL code
  - Process and store chunks
  - Generate analysis
  - View and download documentation
- Intelligent package and procedure detection
- Accurate dependency tracking
- Support for complex SQL structures
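One simple way to approximate dependency tracking is to check which known object names appear as identifiers in each chunk. The sketch below assumes chunks are available as a name-to-code mapping; `find_dependencies` and the sample procedures are hypothetical and do not come from the project.

```python
import re
from typing import Dict, Set


def find_dependencies(chunks: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each object name to the other known objects its code mentions."""
    deps: Dict[str, Set[str]] = {}
    for name, code in chunks.items():
        # Collect every identifier-like token, case-insensitively.
        words = {w.lower() for w in re.findall(r"[A-Za-z_][\w$#]*", code)}
        deps[name] = {
            other for other in chunks
            if other != name and other.lower() in words
        }
    return deps


# Hypothetical chunks, keyed by object name.
chunks = {
    "load_orders": "PROCEDURE load_orders AS BEGIN validate_order(1); log_event('done'); END;",
    "validate_order": "PROCEDURE validate_order(p_id NUMBER) AS BEGIN log_event('check'); END;",
    "log_event": "PROCEDURE log_event(p_msg VARCHAR2) AS BEGIN NULL; END;",
}

for name, used in find_dependencies(chunks).items():
    print(name, "->", sorted(used))
```

Token matching of this kind also counts names that appear only in comments or string literals, so a real implementation would likely need a proper parser to avoid false positives.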
- Efficient code chunk storage
- Semantic similarity search
- Dependency graph construction
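The idea behind similarity search can be shown without any external services. The sketch below ranks chunks by cosine similarity to a query, using token counts as a toy stand-in for real embeddings. The project itself uses BAAI/bge-small-en-v1.5 and ChromaDB; none of the functions here (`embed`, `cosine`, `search`) belong to those libraries or to this codebase.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase token counts."""
    return Counter(re.findall(r"[a-z_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search(query: str, chunks: Dict[str, str], top_k: int = 2) -> List[Tuple[str, float]]:
    """Return the top_k chunk names ranked by similarity to the query."""
    q = embed(query)
    scored = [(name, cosine(q, embed(code))) for name, code in chunks.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Hypothetical stored chunks.
stored = {
    "load_orders": "INSERT INTO orders SELECT * FROM staging_orders",
    "purge_logs": "DELETE FROM audit_log WHERE created < SYSDATE - 30",
}

for name, score in search("insert into orders", stored):
    print(name, round(score, 3))
```

A learned embedding model captures meaning beyond shared tokens, which is why the project relies on one; the ranking step, however, follows the same pattern.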
- Comprehensive HTML reports
- Interactive visualizations
- Detailed code analysis
- Dependency diagrams
- Uses BAAI/bge-small-en-v1.5 for embeddings
- Supports GPU acceleration when available
- Efficient batch processing
- ChromaDB for persistent storage
- Optimized for code similarity search
- Efficient metadata handling
- Local LLM support via LM Studio
- Batched processing for large codebases
- Error handling and retry logic
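Batching with retries can be sketched as follows. This is an illustration under stated assumptions, not the logic in llmprocessor.py: `analyze` stands for any callable that sends a batch of chunks to the model, and `fake_analyze` simulates one transient failure so the retry path is exercised.

```python
import time
from typing import Callable, Iterator, List, Sequence


def batched(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def call_with_retry(fn: Callable, batch: Sequence[str],
                    retries: int = 3, delay: float = 0.0) -> List[str]:
    """Call fn(batch), retrying on any exception up to `retries` attempts."""
    for attempt in range(1, retries + 1):
        try:
            return fn(batch)
        except Exception:
            if attempt == retries:
                raise  # Out of attempts: surface the last error.
            time.sleep(delay * attempt)  # Linear backoff between attempts.


def analyze_all(chunks: Sequence[str], analyze: Callable,
                batch_size: int = 2) -> List[str]:
    """Run `analyze` over all chunks in batches and collect the results."""
    results: List[str] = []
    for batch in batched(chunks, batch_size):
        results.extend(call_with_retry(analyze, batch))
    return results


# Hypothetical analyzer that fails once, then succeeds.
calls = {"n": 0}


def fake_analyze(batch: Sequence[str]) -> List[str]:
    calls["n"] += 1
    if calls["n"] == 1:
        raise ConnectionError("simulated transient failure")
    return [f"summary of {c}" for c in batch]


print(analyze_all(["a", "b", "c"], fake_analyze))
```

Keeping the retry wrapper separate from the batching loop means a failed batch is retried on its own, without re-sending batches that already succeeded.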
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- ChromaDB for vector storage
- Sentence Transformers for embeddings
- Streamlit for the web interface
- SQLParse for SQL parsing
monolithic/
├── main.py # Streamlit application
├── sqldataeng.py # SQL parsing engine
├── vectorstore.py # Vector storage management
├── llmprocessor.py # LLM integration
├── docgenerator.py # Documentation generator
├── requirements.txt # Project dependencies
└── README.md # This file
- Support for additional SQL dialects
- Enhanced visualization options
- Code quality metrics
- Performance optimization suggestions
- Batch processing for multiple files
Create an issue in the repository for bug reports, feature requests, or general questions.