Project: Data Mining
Topic: Data Mining Process for Heart Disease Patients' Data
Keywords: Data Mining, Exploratory Data Analysis (EDA), Machine Learning, Heart Disease, Python
- Use a data mining approach to identify whether a patient has heart disease, based on features such as Age, Sex, and ChestPainType.
- The dataset can be obtained from Kaggle.
- Python is used to assist this project with Data Mining by extracting important insights using:
- In the healthcare industry, understanding what factors or indicators affect a disease is an essential part of the decision-making and problem-solving process.
- People with cardiovascular disease, or at high cardiovascular risk (due to one or more risk factors such as hypertension, diabetes, or hyperlipidaemia, or to already established disease), need early detection and management, where a machine learning model or statistical analysis can be of great help.
- These indicators help decision-makers identify ways to reduce future health risk factors and improve the likelihood of effective disease prevention (Santos et al., 2019).
- Aim:
- To improve how heart disease patient data is analyzed in the healthcare industry, enabling earlier detection and prevention of heart disease and its associated morbidity.
- Objective:
- To build several Machine Learning models and select the best one for classifying patients into those who will develop heart disease in the future and those who will not, based on the importance of the data variables and on model evaluation metrics (e.g. Accuracy, Recall, AUC).
- The insights gained by analyzing the importance of each feature with respect to the target variable will help establish which factors or indicators are critical in causing heart disease.
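The objective above (compare models on Accuracy, Recall, and AUC, then inspect feature importance) can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code: it assumes scikit-learn is available, uses synthetic data from `make_classification` as a stand-in for the patient table, and the two candidate models, the placeholder feature names, and the choice of AUC as the selection criterion are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the patient data (assumption: NOT the real dataset).
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=4, random_state=0
)
# Placeholder names; the real columns would be Age, Sex, ChestPainType, etc.
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

# Hold out a stratified test set so both classes appear in each split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Candidate models (assumption: the project may use different ones).
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Fit each candidate and score it on the held-out data.
results = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    positive_proba = model.predict_proba(X_test)[:, 1]
    results[name] = {
        "accuracy": accuracy_score(y_test, predictions),
        "recall": recall_score(y_test, predictions),
        "auc": roc_auc_score(y_test, positive_proba),
    }

# Select the model with the highest AUC (assumed selection criterion).
best_name = max(results, key=lambda n: results[n]["auc"])

# Rank features by the random forest's impurity-based importance.
forest = candidates["random_forest"]
importances = sorted(
    zip(feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)

for name, metrics in results.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
print("best model by AUC:", best_name)
for feature, score in importances:
    print(f"{feature}: {score:.3f}")
```

In a screening setting, Recall may deserve more weight than Accuracy, since a missed heart disease case is usually costlier than a false alarm; swapping `"auc"` for `"recall"` in the selection line is one way to reflect that.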
(1) HeartDisease_Dataset.csv
- Heart Disease dataset file in CSV format.
(2) HeartDisease_EDA-ML_Python Folder
- Contains the main Python notebook with the implementation code and explanations for the project.
- None (for now)
- Took inspiration from Kaggle