In this project, I carried out an in-depth analysis and visualization of the Titanic dataset, which is available on Kaggle. The goal is to understand how the characteristics of Titanic passengers relate to their survival, and to train several classification models that predict whether a passenger survived.
The predictions were submitted to Kaggle, where the models received the following scores (a score of 0.77 means that survival was predicted correctly for 77% of the passengers):
- Logistic Regression : 0.77
- K-Nearest Neighbors : 0.67
- Support Vector Machine : 0.77
- Decision Tree : 0.73
- Random Forest : 0.75
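
To make the meaning of these scores concrete, here is a minimal sketch of how such a score can be computed: the fraction of passengers whose predicted label matches the true label. The `accuracy` helper and the label lists are made up for illustration; they are not taken from the project or from Kaggle.

```python
def accuracy(predicted, actual):
    """Return the fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one label")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Made-up labels for illustration only (1 = survived, 0 = did not survive).
actual = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
predicted = [1, 0, 1, 1, 0, 0, 0, 0, 1, 0]

score = accuracy(predicted, actual)
print(score)  # 8 of 10 labels match, so this prints 0.8
```
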
For data visualization, Matplotlib and seaborn are used to show how the following features relate to a passenger's survival rate:
- Age
- Sex
- Pclass (Ticket class)
- Port of Embarkation
- Number of family members on board (siblings & spouses)
NumPy and scikit-learn are used to train different classification models.
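
As a rough sketch of what training and comparing these five model types can look like in scikit-learn, the example below fits each classifier on synthetic data generated with NumPy. The feature matrix, the labels, the train/test split, and all hyperparameters are assumptions chosen for illustration; they are not the project's preprocessing or settings, and the resulting scores are unrelated to the Kaggle scores listed above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): 200 samples with 4 numeric features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
noise = rng.normal(scale=0.5, size=200)
y = (X[:, 0] + 0.5 * X[:, 1] + noise > 0).astype(int)

# Hold out a quarter of the samples to estimate accuracy.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hyperparameters here are illustrative defaults, not the project's choices.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "K-Nearest Neighbors": KNeighborsClassifier(n_neighbors=5),
    "Support Vector Machine": SVC(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # For classifiers, score() returns mean accuracy on the given data.
    scores[name] = model.score(X_test, y_test)

for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```
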