Wine has been a popular drink for thousands of years and remains highly sought after today. Many factors besides alcohol content determine how good a wine is, and examining these factors makes it possible to predict wine quality more reliably. The dataset for this proposal comes from Kaggle and pairs a quality score for each red wine with 11 physicochemical properties. Gambetta, Cozzolino, Bastian, and Jeffery (2016) found a correlation between the composition of chardonnay juice and that of wines from different regions of Europe. Additionally, Sinton, Ough, Kissler, and Kasimatis (1978) found a correlation between crop level and other factors such as intensity ratings.
This project is designed for beginners in wine tasting. Wine can get expensive, and a lot of information goes into picking a good-quality bottle, so beginners can easily struggle to tell good wines from bad ones. We hope to reduce this confusion by giving beginners a place to choose a highly rated wine based on factors such as manufacturing, year, and acidity. This should also help beginners avoid wasting money on bad wine. We have chosen to build a supervised learning model that classifies red wines by quality, and we propose to use either a decision tree or a random forest. Although a random forest is computationally more expensive, it typically achieves better accuracy than a single decision tree. We will choose between the two methods depending on the size and complexity of the dataset.
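The trade-off between a single decision tree and a random forest can be illustrated with a minimal sketch. It is not the model we will ship: the data are synthetic stand-ins for the wine features (four made-up numeric columns and a binary good/bad label), every function name is hypothetical, and in practice a library implementation would likely replace this hand-written version. The sketch grows one tree on all training rows, then grows a small forest in which each tree sees a bootstrap sample and a random subset of features at each split, and finally compares test accuracy.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(X, y, depth=0, max_depth=3, rng=None):
    """Grow a classification tree. If rng is given, each split considers
    only a random subset of features, as in a random forest."""
    majority = Counter(y).most_common(1)[0][0]
    if depth == max_depth or len(set(y)) == 1:
        return majority  # leaf: predict the majority class
    n_features = len(X[0])
    features = list(range(n_features))
    if rng is not None:
        features = rng.sample(features, max(1, int(n_features ** 0.5)))
    best = None
    for f in features:
        # Dropping the largest value keeps both sides of every split non-empty.
        for t in sorted({row[f] for row in X})[:-1]:
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # every candidate feature is constant in this node
        return majority
    _, f, t, left, right = best
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left],
                       depth + 1, max_depth, rng),
            build_tree([X[i] for i in right], [y[i] for i in right],
                       depth + 1, max_depth, rng))

def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf's class."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def build_forest(X, y, n_trees=15, max_depth=3, seed=0):
    """Train n_trees trees, each on a bootstrap sample of the rows."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 0, max_depth, rng))
    return forest

def predict_forest(forest, row):
    """Majority vote over the trees in the forest."""
    votes = Counter(predict_tree(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]

def accuracy(predict, X, y):
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)

def make_synthetic_wines(n, seed):
    """Synthetic data (an assumption, not the Kaggle dataset): four random
    features, with a noisy label driven by the first two."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [rng.uniform(0, 1) for _ in range(4)]
        X.append(row)
        y.append(1 if row[0] - row[1] + rng.gauss(0, 0.2) > 0 else 0)
    return X, y

X_train, y_train = make_synthetic_wines(150, seed=1)
X_test, y_test = make_synthetic_wines(100, seed=2)

tree = build_tree(X_train, y_train)
forest = build_forest(X_train, y_train)

tree_acc = accuracy(lambda r: predict_tree(tree, r), X_test, y_test)
forest_acc = accuracy(lambda r: predict_forest(forest, r), X_test, y_test)
print(f"single tree accuracy:   {tree_acc:.2f}")
print(f"random forest accuracy: {forest_acc:.2f}")
```

The forest trains 15 trees instead of one, which is where the extra computational cost comes from. Whether that cost buys a meaningful accuracy gain on the actual wine data is what the comparison in this project would need to establish.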
As interest in wine has increased, the wine industry has grown tremendously, and wine quality and quality certification have become priorities. We will rate wines on a scale from 0 to 10, where 0 is a very bad wine and 10 is an excellent wine. We hope to identify the highest-quality wines based on properties such as fixed acidity and residual sugar. Using the raw data, we hope to achieve the highest possible prediction accuracy for vinho verde, the type of wine represented in the dataset. This project could benefit the wine industry, where most quality testing is done by human tasters, whose judgments can be subjective. We hope to aid the wine quality control process by providing accurate results.