An end-to-end machine learning project to predict customer churn in a banking dataset. Includes data cleaning, exploratory analysis, feature engineering, logistic regression modeling, model evaluation with ROC AUC, and insights on customer retention strategies.

bhandeystruck/Bank-Customer-Churn-Prediction

🏦 Bank Customer Churn Prediction

This end-to-end machine learning project predicts customer churn using a bank dataset inspired by real-world data. It focuses on understanding why customers leave the bank and on building an interpretable logistic regression model that identifies customers at risk of churning.


📌 Project Highlights

  • 🔍 Exploratory Data Analysis (EDA)
  • 🧼 Data Cleaning & Outlier Detection
  • 🧠 Feature Engineering (Engagement score, loyalty ratio, etc.)
  • 🚩 Outlier & Underbanked Flags
  • 📊 Logistic Regression Model with Balanced Class Weights
  • 📈 Model Evaluation: Precision, Recall, F1, ROC AUC
  • 💬 Business Insights and Churn Strategy Recommendations

📂 Dataset Overview

The dataset includes the following features:

  • Demographics: age, gender, country
  • Bank interaction: tenure, balance, estimated_salary
  • Product usage: num_of_products, has_credit_card, is_active_member
  • Engineered features: engagement_score, balance_salary_ratio, loyalty_score, etc.
  • Target: churn (0 = stayed, 1 = churned)

🧠 Feature Engineering

Several features were engineered from the raw columns, including:

  • engagement_score = activity + credit card usage + product count
  • loyalty_score = tenure / (age + 1)
  • balance_salary_ratio = balance / (estimated_salary + 1)
  • Flags for high-value customers, underbanked users, and low credit score holders
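
The formulas above can be sketched in pandas as follows. This is a minimal illustration, not the notebook's actual code: it assumes that "activity" and "credit card usage" correspond to the binary `is_active_member` and `has_credit_card` columns, and the helper name `add_engineered_features` and the sample rows are made up. The flag features are left out because their thresholds are not stated here.

```python
import pandas as pd


def add_engineered_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the three ratio/score features added."""
    out = df.copy()
    # Assumption: "activity" and "credit card usage" are the binary
    # is_active_member and has_credit_card columns.
    out["engagement_score"] = (
        out["is_active_member"] + out["has_credit_card"] + out["num_of_products"]
    )
    # The +1 in each denominator guards against division by zero.
    out["loyalty_score"] = out["tenure"] / (out["age"] + 1)
    out["balance_salary_ratio"] = out["balance"] / (out["estimated_salary"] + 1)
    return out


# Tiny made-up sample, for illustration only.
sample = pd.DataFrame(
    {
        "age": [39, 59],
        "tenure": [4, 6],
        "balance": [0.0, 120000.0],
        "estimated_salary": [49999.0, 59999.0],
        "num_of_products": [2, 1],
        "has_credit_card": [1, 0],
        "is_active_member": [1, 0],
    }
)
features = add_engineered_features(sample)
print(features[["engagement_score", "loyalty_score", "balance_salary_ratio"]])
```

Returning a copy keeps the raw data frame unchanged, which makes it easier to rerun the step while exploring.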

🤖 Model: Logistic Regression

  • Trained with and without class balancing
  • Class balancing raised recall for churners from 31% to 73%
  • ROC AUC = 0.81 (strong discriminatory power)
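
A comparison of this kind can be set up with scikit-learn roughly as shown below. The sketch uses synthetic data from `make_classification` as a stand-in for the bank dataset, so its numbers will not match the figures reported above; the split sizes, random seeds, and `max_iter` value are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the bank data: roughly 20% positives (churners).
X, y = make_classification(
    n_samples=2000,
    n_features=8,
    n_informative=5,
    weights=[0.8, 0.2],
    random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Fit the scaler on the training split only, to avoid leaking test statistics.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

results = {}
for label, class_weight in [("unbalanced", None), ("balanced", "balanced")]:
    model = LogisticRegression(class_weight=class_weight, max_iter=1000)
    model.fit(X_train_s, y_train)
    churn_proba = model.predict_proba(X_test_s)[:, 1]
    results[label] = {
        "recall": recall_score(y_test, model.predict(X_test_s)),
        "roc_auc": roc_auc_score(y_test, churn_proba),
    }
    print(label, results[label])
```

`class_weight="balanced"` reweights each class inversely to its frequency, which typically trades some precision for higher recall on the minority (churn) class.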

📊 Evaluation

  • Confusion matrix and classification report used to assess performance
  • Feature importances extracted for interpretability
  • Model and scaler saved for future predictions
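
The evaluation and persistence steps might look like the sketch below. It again runs on synthetic data, treats the logistic regression coefficients on standardized inputs as the "feature importances" (an assumption about how they were extracted), and uses hypothetical feature names and file names (`churn_model.joblib`, `scaler.joblib`) written to a temporary directory.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical feature names and synthetic data, for illustration only.
feature_names = [f"feature_{i}" for i in range(6)]
X, y = make_classification(
    n_samples=1000, n_features=6, n_informative=4, weights=[0.8, 0.2], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

scaler = StandardScaler().fit(X_train)
model = LogisticRegression(class_weight="balanced", max_iter=1000)
model.fit(scaler.transform(X_train), y_train)

# Confusion matrix: rows are actual classes, columns are predicted classes.
y_pred = model.predict(scaler.transform(X_test))
cm = confusion_matrix(y_test, y_pred)
print(cm)
print(classification_report(y_test, y_pred, target_names=["stayed", "churned"]))

# Rank features by absolute coefficient; comparable because inputs are standardized.
ranked_coefs = sorted(
    zip(feature_names, model.coef_[0]), key=lambda pair: abs(pair[1]), reverse=True
)
for name, coef in ranked_coefs:
    print(f"{name}: {coef:+.3f}")

# Persist model and scaler together so new data gets the same preprocessing.
out_dir = tempfile.mkdtemp()
model_path = os.path.join(out_dir, "churn_model.joblib")
scaler_path = os.path.join(out_dir, "scaler.joblib")
joblib.dump(model, model_path)
joblib.dump(scaler, scaler_path)

# Reload and confirm the round trip reproduces the same predictions.
reloaded_model = joblib.load(model_path)
reloaded_scaler = joblib.load(scaler_path)
reloaded_pred = reloaded_model.predict(reloaded_scaler.transform(X_test))
```

Saving the scaler alongside the model matters because predictions on new customers are only valid if the inputs are standardized with the training-set statistics.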

🔮 What's Next

  • Train Random Forest and XGBoost for performance comparison
  • Visualize SHAP values for model transparency
  • Build a Streamlit app or dashboard
  • Generate customer churn probability scores and prioritize high-risk users
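
As a rough idea of what the last item could involve, the sketch below ranks customers by predicted churn probability and keeps the highest-risk ones. The function name `rank_by_churn_risk`, the customer IDs, and the probabilities are all hypothetical; in practice the probabilities would come from the trained model's `predict_proba` output.

```python
import pandas as pd


def rank_by_churn_risk(customer_ids, churn_probabilities, top_n=3):
    """Return the top_n customers sorted by descending churn probability."""
    scores = pd.DataFrame(
        {
            "customer_id": list(customer_ids),
            "churn_probability": list(churn_probabilities),
        }
    )
    return (
        scores.sort_values("churn_probability", ascending=False)
        .head(top_n)
        .reset_index(drop=True)
    )


# Made-up IDs and probabilities, for illustration only.
ranked = rank_by_churn_risk(
    ["C001", "C002", "C003", "C004"], [0.12, 0.87, 0.45, 0.66], top_n=2
)
print(ranked)
```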

🚀 Run Locally

  1. Clone the repo
git clone https://github.com/bhandeystruck/Bank-Customer-Churn-Prediction.git
  2. Install requirements
pip install -r requirements.txt
  3. Run the notebook or use the saved model for predictions

📬 Contact

Created by Aditya Bhandari · bhandeystruck@gmail.com


📄 License

This project is open source and available under the MIT License.
