This project leverages machine learning to detect fraudulent transactions in e-commerce and banking, aiding in proactive security and risk management. The goal is to provide a robust fraud detection pipeline with explainability, deployment, and dashboard visualization for actionable insights.
- Data Analysis & Preprocessing: Handling missing values, data cleaning, and feature engineering for fraud detection.
- Model Building & Training: Comparison of multiple models, including deep learning architectures (CNN, RNN, LSTM).
- Explainability: Interpretation using SHAP and LIME for feature influence insights.
- Deployment: API service for real-time fraud predictions via Flask, Dockerized for scalability.
- Dashboard: Interactive visualization of fraud insights using Dash.
The repository is organized as follows:
- `.github/workflows/`: GitHub Actions for CI/CD and automated testing.
- `.vscode/`: Development configuration for Visual Studio Code.
- `fraud-detection-api/`: REST API implementation for serving fraud detection models.
- `fraud-dashboard/`: Dash application for real-time fraud data visualization.
- `notebooks/`: Jupyter notebooks for data exploration, feature engineering, and model prototyping.
- `scripts/`: Scripts for data preprocessing, visualization, and model building.
- `tests/`: Unit tests for model integrity and data processing functions.
Follow these steps to set up and run the project locally:
- Clone the Repository

  ```bash
  git clone https://github.com/epythonlab/fraud-detection.git
  cd fraud-detection
  ```

- Set Up a Virtual Environment

  For Linux/macOS:

  ```bash
  python3 -m venv .venv
  source .venv/bin/activate
  ```

  For Windows:

  ```bash
  python -m venv .venv
  .venv\Scripts\activate
  ```

- Install Required Packages

  ```bash
  pip install -r requirements.txt
  ```
- Handling Missing Values: Imputation or removal of missing data.
- Data Cleaning: Removing duplicates and correcting data types.
- Exploratory Data Analysis (EDA):
  - Univariate and bivariate analysis.
- Geolocation Analysis (illustrated in the sketch after this list):
  - Convert IP addresses to integers.
  - Merge Fraud_Data.csv with IpAddress_to_Country.csv.
- Feature Engineering (also covered in the sketch below):
  - Transaction frequency and velocity.
  - Time-based features (hour of day, day of week).
- Normalization and Scaling
- Encoding Categorical Features
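As a concrete illustration of the geolocation and feature-engineering steps, the sketch below converts IP addresses to integers, maps each transaction to a country with a range join, and derives time-based and velocity features. Column names such as `ip_address`, `lower_bound_ip_address`, `upper_bound_ip_address`, `user_id`, and `purchase_time` are assumptions for illustration and may not match the actual files; the notebooks and scripts in this repo remain the reference implementation.

```python
import pandas as pd

def ip_to_int(ip: str) -> int:
    """Convert a dotted IPv4 string (e.g. '192.168.0.1') to its integer form."""
    a, b, c, d = (int(part) for part in ip.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

# Column names below are assumed; adjust to the actual CSV schemas.
fraud = pd.read_csv("Fraud_Data.csv")
ip_ranges = pd.read_csv("IpAddress_to_Country.csv")

# --- Geolocation: map each transaction IP to a country via a range join ---
fraud["ip_int"] = fraud["ip_address"].astype(str).map(ip_to_int)
fraud = fraud.sort_values("ip_int")
ip_ranges = ip_ranges.sort_values("lower_bound_ip_address")

fraud = pd.merge_asof(
    fraud,
    ip_ranges,
    left_on="ip_int",
    right_on="lower_bound_ip_address",
    direction="backward",
)
# Keep only rows whose IP actually falls inside the matched range.
fraud = fraud[fraud["ip_int"] <= fraud["upper_bound_ip_address"]]

# --- Time-based features ---
fraud["purchase_time"] = pd.to_datetime(fraud["purchase_time"])
fraud["hour_of_day"] = fraud["purchase_time"].dt.hour
fraud["day_of_week"] = fraud["purchase_time"].dt.dayofweek

# --- Transaction frequency and velocity per user ---
fraud = fraud.sort_values(["user_id", "purchase_time"])
fraud["user_tx_count"] = fraud.groupby("user_id")["user_id"].transform("count")
fraud["secs_since_prev_tx"] = (
    fraud.groupby("user_id")["purchase_time"].diff().dt.total_seconds()
)
```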
- Data Preparation: Feature and target separation, and train-test split.
- Model Selection:
  - Classical models: Logistic Regression, Decision Tree, Random Forest.
  - Advanced models: Gradient Boosting, MLP, CNN, RNN, LSTM.
- Model Training and Evaluation (see the training sketch after this list):
  - Train on both the creditcard and Fraud_Data datasets.
- MLOps:
  - Use MLflow for versioning, experiment tracking, and model comparison.
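To make the workflow concrete, here is a minimal training-and-tracking sketch for the classical models, assuming a Kaggle-style creditcard.csv with a `Class` label column; it is not the exact code in scripts/ or the notebooks, and the deep-learning models are omitted for brevity.

```python
import mlflow
import mlflow.sklearn
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Feature/target separation and stratified train-test split.
df = pd.read_csv("creditcard.csv")
X, y = df.drop(columns=["Class"]), df["Class"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=42),
}

# Log every run to MLflow so experiments can be compared and models versioned.
mlflow.set_experiment("fraud-detection")
for name, model in models.items():
    with mlflow.start_run(run_name=name):
        model.fit(X_train, y_train)
        proba = model.predict_proba(X_test)[:, 1]
        preds = (proba >= 0.5).astype(int)
        mlflow.log_metric("f1", f1_score(y_test, preds))
        mlflow.log_metric("roc_auc", roc_auc_score(y_test, proba))
        mlflow.sklearn.log_model(model, "model")
```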
- SHAP (SHapley Additive exPlanations):
  - Explain feature importance using SHAP summary, force, and dependence plots.
- LIME (Local Interpretable Model-agnostic Explanations):
  - Generate feature importance plots for individual predictions.
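For illustration, SHAP and LIME can be applied to the fitted Random Forest from the training sketch above roughly as follows; this is a sketch, not the exact plots generated in the notebooks.

```python
import shap
from lime.lime_tabular import LimeTabularExplainer

# Assumes `model`, `X_train`, and `X_test` from the training sketch above.

# --- SHAP: global feature-importance summary for a tree model ---
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
# Depending on the SHAP version, binary classifiers return either a list with
# one array per class or a 3-D array; select the fraud-class values to plot.
if isinstance(shap_values, list):
    shap_values = shap_values[1]
elif shap_values.ndim == 3:
    shap_values = shap_values[..., 1]
shap.summary_plot(shap_values, X_test)  # force/dependence plots work similarly

# --- LIME: local explanation for a single prediction ---
lime_explainer = LimeTabularExplainer(
    X_train.values,
    feature_names=list(X_train.columns),
    class_names=["legit", "fraud"],
    mode="classification",
)
explanation = lime_explainer.explain_instance(
    X_test.values[0], model.predict_proba, num_features=10
)
print(explanation.as_list())  # (feature, weight) pairs for this prediction
```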
- Setting Up the Flask API:
  - Serve models via Flask in serve_model.py (a minimal sketch follows this list).
- Dockerization:
  - Create a Docker container for the API with a Dockerfile.
  - Run the container with:

    ```bash
    docker build -t fraud-detection-model .
    docker run -p 5000:5000 fraud-detection-model
    ```

- Logging:
  - Use Flask-Logging to monitor requests and track predictions.
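As a reference point, a stripped-down version of what serve_model.py might look like is shown below. It assumes the trained model is stored as model.pkl and that clients POST a JSON list of feature values; the actual implementation in fraud-detection-api/ may differ.

```python
import logging
import pickle

import numpy as np
from flask import Flask, jsonify, request

app = Flask(__name__)
logging.basicConfig(level=logging.INFO)

# Assumed artifact name; load the trained model once at startup.
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json(force=True)
    features = np.array(payload["features"], dtype=float).reshape(1, -1)
    fraud_probability = float(model.predict_proba(features)[0, 1])
    app.logger.info("prediction served: fraud_probability=%.4f", fraud_probability)
    return jsonify(
        {"fraud_probability": fraud_probability, "is_fraud": fraud_probability >= 0.5}
    )

if __name__ == "__main__":
    # Bind to 0.0.0.0 so the API is reachable from inside the Docker container.
    app.run(host="0.0.0.0", port=5000)
```

A client would then POST `{"features": [...]}` to `/predict` and receive the fraud probability in the JSON response.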
- Interactive Dashboard:
  - Visualize fraud insights in real time with the Dash application in fraud-dashboard/ (a minimal sketch follows).
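Below is a minimal sketch of the kind of app that lives in fraud-dashboard/, assuming an aggregated fraud_summary.csv with country and fraud_count columns; the real dashboard's data sources and layout differ.

```python
import pandas as pd
import plotly.express as px
from dash import Dash, dcc, html

# Assumed aggregate file with columns: country, fraud_count.
summary = pd.read_csv("fraud_summary.csv")

app = Dash(__name__)
app.layout = html.Div(
    [
        html.H1("Fraud Detection Dashboard"),
        dcc.Graph(
            figure=px.bar(
                summary, x="country", y="fraud_count", title="Fraud cases by country"
            )
        ),
    ]
)

if __name__ == "__main__":
    app.run(debug=True)  # use app.run_server(debug=True) on older Dash versions
```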
We welcome contributions to enhance the project:
- Fork the repository and create a new branch.
- Make changes with clear, descriptive commit messages.
- Submit a pull request with a detailed explanation.