Commit
1 parent 3f34e04, commit ac7c6fd
Showing 1 changed file with 98 additions and 31 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,60 +1,127 @@ | ||
|
||
# Fraud Detection for E-commerce and Banking | ||
|
||
This project utilizes machine learning to detect fraudulent activity in e-commerce and banking transactions. The model facilitates data-driven decisions for enhanced security and risk management. | ||
This project leverages machine learning to detect fraudulent transactions in e-commerce and banking, aiding in proactive security and risk management. The goal is to provide a robust fraud detection pipeline with explainability, deployment, and dashboard visualization for actionable insights. | ||
|
||
--- | ||
|
||
## Project Overview | ||
|
||
### Key Features | ||
- **Data Analysis & Preprocessing**: Handling missing values, data cleaning, and feature engineering for fraud detection. | ||
- **Model Building & Training**: Comparison of multiple models, including deep learning architectures (CNN, RNN, LSTM). | ||
- **Explainability**: Interpretation using SHAP and LIME for feature influence insights. | ||
- **Deployment**: API service for real-time fraud predictions via Flask, Dockerized for scalability. | ||
- **Dashboard**: Interactive visualization of fraud insights using Dash. | ||
|
||
--- | ||
|
||
## Project Directory Structure | ||
|
||
The repository is well-organized for efficient development: | ||
The repository is organized as follows: | ||
|
||
* **`.github/workflows/`**: Automates tasks like testing through GitHub Actions. | ||
* **`.vscode/`**: Enhances the development experience with configurations for Visual Studio Code. | ||
* **`app/`**: Contains the API implementation for interacting with the machine learning model via RESTful endpoints. | ||
* **`notebooks/`**: Jupyter notebooks are used for exploring data, feature engineering, and initial model exploration. | ||
* **`scripts/`**: Python scripts handle data preprocessing, feature extraction, visualization, and model implementation. | ||
* **`tests/`**: Unit tests ensure the model and data processing logic function correctly. | ||
- **`.github/workflows/`**: Contains GitHub Actions for CI/CD and automated testing. | ||
- **`.vscode/`**: Development configuration for Visual Studio Code. | ||
- **`fraud-detection-api/`**: REST API implementation for serving fraud detection models. | ||
- **`fraud-dashboard/`**: Dash application for real-time fraud data visualization. | ||
- **`notebooks/`**: Jupyter notebooks for data exploration, feature engineering, and model prototyping. | ||
- **`scripts/`**: Scripts for data preprocessing, visualization, and model building. | ||
- **`tests/`**: Unit tests for model integrity and data processing functions. | ||
|
||
--- | ||
|
||
## Installation | ||
|
||
- To run the project locally, follow these steps: | ||
Follow these steps to set up and run the project locally: | ||
|
||
1. **Clone the Repository:** | ||
1. **Clone the Repository** | ||
|
||
```bash | ||
git clone https://github.com/epythonlab/fraud-detection.git | ||
cd fraud-detection | ||
``` | ||
|
||
2. **Set Up the Virtual Environment** | ||
2. **Set Up a Virtual Environment** | ||
|
||
- Create a virtual environment to manage the project's dependencies: | ||
**For Linux/macOS:** | ||
```bash | ||
python3 -m venv .venv | ||
source .venv/bin/activate | ||
``` | ||
|
||
**For Linux/MacOS:** | ||
**For Windows:** | ||
```bash | ||
python -m venv .venv | ||
.venv\Scripts\activate | ||
``` | ||
|
||
```bash | ||
python3 -m venv .venv | ||
source .venv/bin/activate | ||
``` | ||
3. **Install Required Packages** | ||
|
||
**For Windows:** | ||
```bash | ||
pip install -r requirements.txt | ||
``` | ||
|
||
--- | ||
|
||
## Project Tasks and Workflow | ||
|
||
### Task 1 - Data Analysis and Preprocessing | ||
- **Handling Missing Values**: Imputation or removal of missing data. | ||
- **Data Cleaning**: Removing duplicates and correcting data types. | ||
- **Exploratory Data Analysis (EDA)**: | ||
- Univariate and bivariate analysis. | ||
- **Geolocation Analysis**: | ||
- Convert IP addresses to integers. | ||
- Merge `Fraud_Data.csv` with `IpAddress_to_Country.csv`. | ||
- **Feature Engineering**: | ||
- Transaction frequency and velocity. | ||
- Time-based features (hour of day, day of week). | ||
- **Normalization and Scaling** | ||
- **Encoding Categorical Features** | ||
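
The geolocation and time-based steps in Task 1 can be sketched roughly as follows. This is a minimal, standard-library illustration and not the repository's implementation: the helper names (`ip_to_int`, `lookup_country`, `time_features`), the two sample IP ranges, and the timestamp format are all assumptions made for the example. It assumes the country table can be reduced to sorted `(lower bound, upper bound, country)` rows.

```python
import bisect
import ipaddress
from datetime import datetime


def ip_to_int(ip: str) -> int:
    """Convert a dotted-quad IPv4 string to its integer form."""
    return int(ipaddress.IPv4Address(ip))


# Hypothetical stand-in for the IP-to-country table: (lower, upper, country),
# sorted by lower bound. The ranges and countries are made up for this sketch.
IP_RANGES = [
    (ip_to_int("1.0.0.0"), ip_to_int("1.0.0.255"), "Australia"),
    (ip_to_int("8.8.8.0"), ip_to_int("8.8.8.255"), "United States"),
]
LOWER_BOUNDS = [low for low, _, _ in IP_RANGES]


def lookup_country(ip_int: int) -> str:
    """Find the range containing ip_int using binary search on lower bounds."""
    idx = bisect.bisect_right(LOWER_BOUNDS, ip_int) - 1
    if idx >= 0:
        low, high, country = IP_RANGES[idx]
        if low <= ip_int <= high:
            return country
    return "Unknown"


def time_features(purchase_time: str) -> dict:
    """Derive hour-of-day and day-of-week (Monday = 0) from a timestamp string."""
    ts = datetime.strptime(purchase_time, "%Y-%m-%d %H:%M:%S")
    return {"hour_of_day": ts.hour, "day_of_week": ts.weekday()}
```

With a real table, the same range lookup could be expressed as a sorted merge in pandas; the binary search here only shows the idea of matching an integer IP against lower and upper bounds.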
|
||
### Task 2 - Model Building and Training | ||
- **Data Preparation**: Feature and target separation, and train-test split. | ||
- **Model Selection**: | ||
- Classical models: Logistic Regression, Decision Tree, Random Forest. | ||
- Advanced models: Gradient Boosting, MLP, CNN, RNN, LSTM. | ||
- **Model Training and Evaluation**: | ||
- Train on both `creditcard` and `Fraud_Data` datasets. | ||
- **MLOps**: | ||
- Use MLflow for versioning, experiment tracking, and model comparison. | ||
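
A minimal sketch of the split-train-compare loop described in Task 2, assuming scikit-learn is installed. It uses synthetic, imbalanced data as a stand-in for the real datasets, covers only two of the listed classical models, and scores them with F1, which is a choice made for this example rather than a metric stated in the README. MLflow tracking is left out.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the fraud data: roughly 5% positive (fraud) class.
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=42
)

# Stratify so the rare class keeps the same proportion in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Fit each candidate and record its F1 score on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = f1_score(y_test, model.predict(X_test))

for name, score in scores.items():
    print(f"{name}: F1 = {score:.3f}")
```

Stratifying the split matters for fraud data because a plain random split can leave very few positive cases in the test set.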
|
||
### Task 3 - Model Explainability | ||
- **SHAP (SHapley Additive exPlanations)**: | ||
- Explain feature importance using SHAP summary, force, and dependence plots. | ||
- **LIME (Local Interpretable Model-agnostic Explanations)**: | ||
- Generate feature importance plots for individual predictions. | ||
|
||
### Task 4 - Model Deployment and API Development | ||
- **Setting Up the Flask API**: | ||
- Serve models via Flask in `serve_model.py`. | ||
- **Dockerization**: | ||
- Create a Docker container for the API with a `Dockerfile`. | ||
- Build the image and run the container with: | ||
```bash | ||
docker build -t fraud-detection-model . | ||
docker run -p 5000:5000 fraud-detection-model | ||
``` | ||
- **Logging**: | ||
- Use Flask-Logging to monitor requests and track predictions. | ||
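
The serving step in Task 4 might look roughly like the sketch below. It is not the contents of `serve_model.py`: the `/predict` route, the `predict_fraud` placeholder, the `purchase_value` field, and the response shape are all hypothetical. It also uses Python's standard `logging` module in place of the Flask-Logging setup the task mentions.

```python
import logging

from flask import Flask, jsonify, request

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("fraud_api")

app = Flask(__name__)


def predict_fraud(features: dict) -> float:
    """Placeholder scorer (assumption); a real service would load a trained model."""
    return 1.0 if features.get("purchase_value", 0) > 1000 else 0.0


@app.route("/predict", methods=["POST"])
def predict():
    # silent=True returns None instead of raising on a missing or invalid JSON body.
    payload = request.get_json(silent=True)
    if not isinstance(payload, dict):
        return jsonify({"error": "expected a JSON object"}), 400
    score = predict_fraud(payload)
    logger.info("prediction served: score=%s", score)
    return jsonify({"fraud_score": score})
```

Inside a container, the app would need to listen on all interfaces, for example via `app.run(host="0.0.0.0", port=5000)`, so that the port published with `docker run -p 5000:5000` is reachable from the host.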
|
||
3. **Install Dependencies** | ||
### Task 5 - Dashboard Development with Flask and Dash | ||
- **Interactive Dashboard**: | ||
- Visualize fraud insights (transaction count, fraud cases, geographic data). | ||
- Use Dash to create charts (line, bar) and summary boxes for fraud trends. | ||
- Set up a Flask endpoint to serve fraud data for the Dash frontend. | ||
|
||
- Install the required Python packages by running: | ||
--- | ||
|
||
```bash | ||
pip install -r requirements.txt | ||
``` | ||
|
||
## Contributing | ||
|
||
We welcome contributions to improve the project. Please follow the steps below to contribute: | ||
We welcome contributions to enhance the project: | ||
|
||
1. Fork the repository and create a new branch. | ||
2. Make changes with clear, descriptive commit messages. | ||
3. Submit a pull request with a detailed explanation. | ||
|
||
--- | ||
|
||
- Fork the repository. | ||
- Create a new branch for your feature or bug fix. | ||
- Submit a pull request with a detailed explanation of your changes. |