This project explores the application of machine learning techniques to predicting human wellbeing. It compares a traditional econometric method, Ordinary Least Squares (OLS), with modern algorithms: Least Absolute Shrinkage and Selection Operator (LASSO), Random Forests (RF), and Gradient Boosting (GB). Inspired by the work of Oparina et al. (2023), the aim is to assess how well each model predicts human wellbeing.
The dataset for this project comes from the German Socio-Economic Panel (SOEP) and covers the years 2010 to 2018, the same timeframe as the original paper. Other years may be considered as long as the data remain available.
The analysis focuses on the "restricted set" of variables described in the paper: Age, Area of Residence, BMI, Disability Status, Education, Labour-force status, Log HH income, Ethnicity/Migration Background, Health, Housing Status, Marital Status, Month of Interview, Number of children in HH, Number of people in HH, Religion, Sex, and Working Hours. Categorical variables are converted into sets of dummy variables for the analysis.
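As an illustration of the dummy-variable step, the following is a minimal sketch in plain Python. The function name `to_dummies`, the `marital_` prefix, and the example categories are assumptions made for this example and are not taken from the project code. In practice a library routine such as `pandas.get_dummies` would likely be used instead.

```python
def to_dummies(values, prefix, drop_first=True):
    """Encode one categorical column as rows of 0/1 indicators.

    Hypothetical helper for illustration only.
    """
    categories = sorted(set(values))
    if drop_first:
        # Drop one reference category so the dummies are not perfectly
        # collinear with an intercept term in a regression.
        categories = categories[1:]
    names = [f"{prefix}_{c}" for c in categories]
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return names, rows


# Made-up example values, not SOEP data
marital_status = ["married", "single", "divorced", "single"]
names, rows = to_dummies(marital_status, prefix="marital")
print(names)
print(rows)
```

Here "divorced" serves as the reference category, so a row of all zeros denotes a divorced respondent.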
a) Generate descriptive statistics for the variables of interest.
b) Use the four algorithms to regress life satisfaction on the variables of interest.
c) Compute performance metrics such as R².
d) Compare performance metrics across the different models.
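Steps c) and d) can be sketched as follows. This is a minimal illustration of computing R² as 1 minus the ratio of the residual sum of squares to the total sum of squares, then comparing models. The model names `model_a` and `model_b` and all numbers are made up for the example and are not results from this project.

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, evaluated on the given observations."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical held-out life-satisfaction scores (0-10 scale)
y_test = [7, 8, 5, 9, 6]

# Hypothetical predictions from two fitted models
predictions = {
    "model_a": [7, 7, 6, 8, 7],
    "model_b": [7, 8, 5, 8, 6],
}

# One R^2 per model, then pick the model with the highest score
scores = {name: r_squared(y_test, pred) for name, pred in predictions.items()}
best = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: R^2 = {score:.2f}")
print("best:", best)
```

Evaluating R² on held-out rather than training observations is what makes the comparison across flexible models such as RF and GB meaningful, since in-sample fit favours the more flexible model by construction.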
Produce figures akin to those presented in Oparina et al. (2023), encompassing model performance, performance improvement through the use of machine learning, variable importance, and wellbeing patterns concerning age and income.
a) Consider expanding the dataset by including other years for a more comprehensive analysis.
b) Explore and apply additional machine learning algorithms beyond the ones used in the original paper.
c) Compare the performance of these new algorithms with those previously examined.
Because a considerable amount of time was spent on cleaning the data and selecting relevant variables, this additional part was dropped and the project covers only the replication of the paper (the original code was not available).
Oparina, E., Kaiser, C., Gentile, N., Tkatchenko, A., Clark, A. E., De Neve, J. E., & D'Ambrosio, C. (2023). Machine Learning in the Prediction of Human Wellbeing. Working Paper see.
To get started, create and activate the environment with
$ conda/mamba env create
$ conda activate wellbeing
To build the project, type
$ pytask
The dataset is privately owned.