- Resampling Technique
- Ensemble Learning Technique
- See the credit_risk_resampling.ipynb and credit_risk_ensemble.ipynb files (open in Jupyter Lab) for code details.
- Simple Logistic Regression: Balanced Accuracy Score = 0.954321
- Naive Random Oversampling: Balanced Accuracy Score = 0.994828
- SMOTE Oversampling: Balanced Accuracy Score = 0.994828
- Undersampling: Balanced Accuracy Score = 0.982881
- Combination (SMOTEENN): Balanced Accuracy Score = 0.994748
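
To make the idea behind naive random oversampling concrete, here is a minimal pure-Python sketch. It is not the notebook's code. The function name `naive_random_oversample`, the toy loan amounts, and the `low_risk`/`high_risk` labels are all hypothetical, chosen only for illustration.

```python
import random
from collections import Counter

def naive_random_oversample(X, y, seed=1):
    """Duplicate randomly chosen rows of the smaller classes until every
    class has as many rows as the largest class (illustrative helper)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_res, y_res = list(X), list(y)
    for label, count in counts.items():
        # Rows that belong to this class in the original data
        rows = [x for x, lab in zip(X, y) if lab == label]
        # Draw the shortfall at random, with replacement
        extra = rng.choices(rows, k=target - count)
        X_res.extend(extra)
        y_res.extend([label] * len(extra))
    return X_res, y_res

# Hypothetical toy data: 6 low-risk loans and 2 high-risk loans
X = [[1000], [1500], [1200], [900], [1100], [1300], [5000], [5200]]
y = ["low_risk"] * 6 + ["high_risk"] * 2

X_res, y_res = naive_random_oversample(X, y)
print(Counter(y_res))  # both classes now have 6 rows
```

The original rows are kept unchanged and only copies of minority rows are appended. In practice, resampling of this kind is normally applied to the training split only, so that test scores are measured on the original class balance.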
- Which model had the best balanced accuracy score?
  Answer: Both oversampling models tied for the best balanced accuracy score:
  - Naive Random Oversampling Score = 0.994828
  - SMOTE Oversampling Score = 0.994828
- Which model had the best recall score?
  Answer: All of the models had a recall score of 0.99.
- Which model had the best geometric mean score?
  Answer: Both oversampling models and the combination model had a geometric mean score of 0.99.
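
For reference, the metrics compared above are closely related in a two-class problem: balanced accuracy is the arithmetic mean of the per-class recalls, and the geometric mean score is the square root of their product. The sketch below computes both from scratch on hypothetical labels. It assumes the notebooks use these standard definitions, and the helper name `binary_scores` is illustrative.

```python
from math import sqrt

def binary_scores(y_true, y_pred, positive):
    """Return (balanced accuracy, geometric mean) for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn)  # recall on the positive class
    specificity = tn / (tn + fp)  # recall on the negative class
    balanced_accuracy = (sensitivity + specificity) / 2
    geometric_mean = sqrt(sensitivity * specificity)
    return balanced_accuracy, geometric_mean

# Hypothetical labels: 4 high-risk and 6 low-risk loans
y_true = ["high"] * 4 + ["low"] * 6
y_pred = ["high", "high", "high", "low",
          "low", "low", "low", "low", "low", "high"]

bal_acc, g_mean = binary_scores(y_true, y_pred, positive="high")
print(round(bal_acc, 4), round(g_mean, 4))
```

Because the geometric mean multiplies the two recalls, it drops to zero whenever one class is never predicted correctly, whereas balanced accuracy only drops to 0.5 in that case.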
- Balanced Random Forest Classifier: Balanced Accuracy Score = 0.5
- Easy Ensemble Classifier: Balanced Accuracy Score = 0.926
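
A balanced accuracy of exactly 0.5 is what a two-class model scores when it assigns every sample to a single class, and on imbalanced data such a model can still show a high support-weighted recall. The sketch below illustrates this on hypothetical labels. Whether it explains the Balanced Random Forest result here is an assumption that would need to be checked against the confusion matrix in the notebook.

```python
def recall_per_class(y_true, y_pred):
    """Map each class to the fraction of its samples predicted correctly."""
    recalls = {}
    for label in set(y_true):
        preds = [p for t, p in zip(y_true, y_pred) if t == label]
        recalls[label] = sum(p == label for p in preds) / len(preds)
    return recalls

# Hypothetical, heavily imbalanced labels: 99 low-risk loans, 1 high-risk loan
y_true = ["low_risk"] * 99 + ["high_risk"]
# A degenerate model that always predicts the majority class
y_pred = ["low_risk"] * 100

recalls = recall_per_class(y_true, y_pred)

# Balanced accuracy: unweighted mean of the per-class recalls
balanced_accuracy = sum(recalls.values()) / len(recalls)

# Support-weighted average recall, which equals plain accuracy
weighted_recall = sum(
    recalls[label] * y_true.count(label) for label in recalls
) / len(y_true)

print(balanced_accuracy)  # 0.5
print(weighted_recall)    # 0.99
```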
- Which model had the best balanced accuracy score?
  Answer: The Easy Ensemble had the best balanced accuracy score at 0.926.
- Which model had the best recall score?
  Answer: The Balanced Random Forest had the best recall score at 0.99, versus 0.94 for the Easy Ensemble.
- Which model had the best geometric mean score?
  Answer: The Easy Ensemble had the best geometric mean score at 0.93.
- What are the top three features?
  Answer: The top three features are installment, dti, and loan_amnt, as shown in the graph below.