This repository contains a Jupyter Notebook that analyzes a dataset related to depression and builds a predictive model to classify individuals as depressed or not based on various features.
The dataset used in this analysis is depression.csv. It contains several features that are used to predict the target variable depressed. The features include:
- age: Age of the individual
- gender: Gender of the individual
- education: Education level
- marital_status: Marital status
- employment_status: Employment status
- income: Income level
- family_history: Family history of mental illness
- physical_activity: Level of physical activity
- sleep_hours: Average sleep hours per night
- stress_level: Self-reported stress level
The target variable is:
- depressed: Indicates whether the individual is depressed (1) or not (0)
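As a quick sanity check before any analysis, it can help to confirm that the loaded file has the columns listed above. The following is a minimal sketch, not code from the notebook: `check_columns` is a hypothetical helper, and the two rows are made-up stand-ins for depression.csv (the real file's value encodings may differ).

```python
import io

import pandas as pd

# Column names taken from the feature list above.
FEATURES = [
    "age", "gender", "education", "marital_status", "employment_status",
    "income", "family_history", "physical_activity", "sleep_hours", "stress_level",
]
TARGET = "depressed"


def check_columns(df):
    """Return the expected column names missing from df (hypothetical helper)."""
    expected = FEATURES + [TARGET]
    return [name for name in expected if name not in df.columns]


# Two made-up rows standing in for depression.csv; values are illustrative only.
sample_csv = io.StringIO(
    "age,gender,education,marital_status,employment_status,income,"
    "family_history,physical_activity,sleep_hours,stress_level,depressed\n"
    "34,female,bachelor,married,employed,medium,no,moderate,7.0,4,0\n"
    "51,male,secondary,single,unemployed,low,yes,low,5.5,8,1\n"
)
df = pd.read_csv(sample_csv)

missing = check_columns(df)                          # empty list when all columns exist
target_values = sorted(df[TARGET].unique().tolist()) # distinct labels in the target
print("missing columns:", missing)
print("target values:", target_values)
```

With the real file, replacing `sample_csv` with the path to depression.csv should be enough, assuming the file uses the column names shown in the list.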
The notebook is structured as follows:
- Data Loading and Exploration: The dataset is loaded, and initial exploratory data analysis (EDA) is performed to understand the distribution of features and the target variable.
- Data Preprocessing: Steps include handling missing values, encoding categorical variables, and scaling numerical features to prepare the data for modeling.
- Model Building: A logistic regression model is trained to predict the likelihood of depression based on the input features.
- Model Evaluation: The performance of the model is evaluated using metrics such as accuracy, precision, recall, and the ROC-AUC score.
- Conclusion: Insights from the analysis and model performance are summarized.
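The preprocessing, modeling, and evaluation steps above can be sketched with a scikit-learn pipeline. This is an illustrative sketch under stated assumptions rather than the notebook's actual code: the data is synthetic, only a subset of the columns is used, the split into numeric and categorical columns is assumed, and the imputation strategy and hyperparameters are placeholders.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data (NOT the real depression.csv).
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "age": rng.integers(18, 80, n).astype(float),
    "sleep_hours": rng.normal(7, 1.5, n),
    "stress_level": rng.integers(1, 11, n).astype(float),
    "gender": rng.choice(["female", "male"], n),
    "employment_status": rng.choice(["employed", "unemployed"], n),
})
# Assumed toy relationship: more stress and less sleep make the label more likely.
logit = 0.6 * (df["stress_level"] - 5.5) - 0.8 * (df["sleep_hours"] - 7)
df["depressed"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)
df.loc[::25, "sleep_hours"] = np.nan  # inject missing values to exercise imputation

numeric_cols = ["age", "sleep_hours", "stress_level"]  # assumed numeric
categorical_cols = ["gender", "employment_status"]     # assumed categorical

# Impute and scale numeric columns; one-hot encode categorical columns.
preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])
model = Pipeline([
    ("preprocess", preprocess),
    ("classifier", LogisticRegression(max_iter=1000)),
])

X_train, X_test, y_train, y_test = train_test_split(
    df[numeric_cols + categorical_cols], df["depressed"],
    test_size=0.25, random_state=0, stratify=df["depressed"],
)
model.fit(X_train, y_train)

pred = model.predict(X_test)               # hard 0/1 predictions
proba = model.predict_proba(X_test)[:, 1]  # probability of class 1, used for ROC-AUC
metrics = {
    "accuracy": accuracy_score(y_test, pred),
    "precision": precision_score(y_test, pred, zero_division=0),
    "recall": recall_score(y_test, pred, zero_division=0),
    "roc_auc": roc_auc_score(y_test, proba),
}
print(metrics)
```

Wrapping preprocessing and the classifier in one pipeline means the imputer, scaler, and encoder are fitted on the training split only, which avoids leaking test-set statistics into the model.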
To run the notebook, ensure you have the following Python libraries installed:
- pandas
- numpy
- scikit-learn
- matplotlib
- seaborn
You can install these packages using pip:
    pip install pandas numpy scikit-learn matplotlib seaborn
- Clone the repository:
  git clone https://github.com/Liantsoarandria0803/Health-mental-disease.git
- Navigate to the project directory:
  cd Health-mental-disease
- Open the Jupyter Notebook:
  jupyter notebook Depression.ipynb
- Run the cells sequentially to perform the analysis and view the results.
Best model: CatBoost, with an F1 score of approximately 0.962 (0.9621830111721936).
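For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall. The sketch below shows the arithmetic on made-up counts; `f1_from_counts` is a hypothetical helper, and the numbers are not results from the notebook or from the CatBoost model.

```python
def f1_from_counts(tp, fp, fn):
    """Compute F1 from true positives, false positives, and false negatives."""
    if tp == 0:
        # No true positives: precision and recall are both 0 (or undefined), so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that are found
    return 2 * precision * recall / (precision + recall)


# Made-up counts for illustration: precision and recall are both 8 / 10.
example_f1 = f1_from_counts(tp=8, fp=2, fn=2)
print(round(example_f1, 3))
```

Because it combines precision and recall, F1 drops when either one is low, which makes it more informative than accuracy when the two classes are imbalanced.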
Contributions are welcome! If you have suggestions for improvements or additional analyses, feel free to open an issue or submit a pull request.
This project is licensed under the MIT License. See the LICENSE file for details.