Authors: Cristian Porzio and Raffaele Monti
University of Salerno
1. Introduction
In this project, we explore and compare the performance of three machine learning models on a classification task: Multinomial Naive Bayes, Complement Naive Bayes, and Decision Tree Classifier. Our aim is to identify the strengths and weaknesses of each model and determine which one performs best on our datasets. The full documentation is available in Italian here. The resulting plots are available here.
2. Data Collection
- 2.1. Understanding and Identifying Necessary Data: We first identify the data required for our project, considering its relevance to our classification task.
- 2.2. Identified Datasets: We provide an overview of the datasets used in our analysis.
- 2.3. Data Exploration: We delve into the datasets, comparing them through graphical representations and drawing insights for further analysis.
3. Model Selection
In this section, we introduce the three machine learning models under consideration:
- 3.1. Multinomial Naive Bayes
- 3.2. Complement Naive Bayes
- 3.3. Decision Tree Classifier
We discuss the characteristics and underlying assumptions of each model, setting the stage for our comparative analysis.
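The two Naive Bayes variants differ mainly in which documents they learn word statistics from. As a rough illustration only, the sketch below implements both from scratch on word-count vectors, assuming additive (Laplace) smoothing with a hypothetical default `alpha=1.0`. Multinomial NB scores a document by how well each class's own word distribution explains it; Complement NB estimates word weights from all documents outside a class and picks the class whose complement explains the document worst, an approach often recommended for imbalanced data. The class names `SimpleMultinomialNB` and `SimpleComplementNB` are hypothetical, the optional weight-normalization step of Complement NB is omitted, and this is not the project's implementation (which may instead use a library such as scikit-learn's `MultinomialNB` and `ComplementNB`). The Decision Tree Classifier is not covered here.

```python
import math
from collections import defaultdict

# Hypothetical from-scratch sketch; not the project's actual code.


def _class_feature_counts(X, y):
    """Sum the count vectors of each class: {label: [total count per feature]}."""
    n_features = len(X[0])
    totals = defaultdict(lambda: [0.0] * n_features)
    for row, label in zip(X, y):
        acc = totals[label]
        for i, v in enumerate(row):
            acc[i] += v
    return dict(totals)


class SimpleMultinomialNB:
    """Hypothetical Multinomial NB with additive smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        y = list(y)
        counts = _class_feature_counts(X, y)
        n_features = len(X[0])
        self.classes_ = sorted(counts)
        # Class prior P(c) estimated from label frequencies.
        self.log_prior_ = {c: math.log(y.count(c) / len(y)) for c in self.classes_}
        # Smoothed P(word i | c) = (N_ci + alpha) / (N_c + alpha * n_features).
        self.log_theta_ = {}
        for c in self.classes_:
            denom = sum(counts[c]) + self.alpha * n_features
            self.log_theta_[c] = [math.log((v + self.alpha) / denom) for v in counts[c]]
        return self

    def predict(self, X):
        def score(row, c):
            # log P(c) + sum_i x_i * log P(word i | c)
            return self.log_prior_[c] + sum(x * w for x, w in zip(row, self.log_theta_[c]))
        return [max(self.classes_, key=lambda c: score(row, c)) for row in X]


class SimpleComplementNB:
    """Hypothetical Complement NB (weight normalization omitted)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        counts = _class_feature_counts(X, list(y))
        n_features = len(X[0])
        self.classes_ = sorted(counts)
        totals = [sum(counts[c][i] for c in self.classes_) for i in range(n_features)]
        self.weights_ = {}
        for c in self.classes_:
            # Word counts from every document NOT labeled c (the complement of c).
            comp = [t - own for t, own in zip(totals, counts[c])]
            denom = sum(comp) + self.alpha * n_features
            self.weights_[c] = [math.log((v + self.alpha) / denom) for v in comp]
        return self

    def predict(self, X):
        def score(row, c):
            # Log-likelihood of the document under the complement of c.
            return sum(x * w for x, w in zip(row, self.weights_[c]))
        # Pick the class whose complement explains the document worst.
        return [min(self.classes_, key=lambda c: score(row, c)) for row in X]
```

With only two classes the complement of one class is simply the other class, so this Complement NB makes the same decisions as Multinomial NB with a uniform prior; the two diverge once there are more classes or the prior matters.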
4. Data Manipulation and Execution
- 4.1. Preprocessing and Feature Selection: We outline the preprocessing steps undertaken to prepare the data for model training, including feature selection techniques.
- 4.2. Feature Extraction: We detail the feature extraction methods used in our analysis.
- 4.3. Pipeline: We describe the pipeline architecture used to streamline model training and evaluation.
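The preprocessing, feature-extraction, and pipeline steps are only named in this outline. Assuming the input is text represented as word counts (the usual input for multinomial and complement Naive Bayes), a minimal pure-Python sketch of the three stages might look like the following. The regex tokenizer, the tiny `STOP_WORDS` set, and the names `preprocess`, `BagOfWords`, and `TextPipeline` are all hypothetical; libraries such as scikit-learn provide `CountVectorizer` and `Pipeline` for the same purpose, but this sketch does not claim the project used them.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real project would use a much fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is"}


def preprocess(text):
    """Lowercase, keep alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


class BagOfWords:
    """Hypothetical count vectorizer: learns a vocabulary, maps token lists to counts."""

    def fit(self, token_lists):
        vocab = sorted({t for tokens in token_lists for t in tokens})
        self.vocabulary_ = {t: i for i, t in enumerate(vocab)}
        return self

    def transform(self, token_lists):
        rows = []
        for tokens in token_lists:
            row = [0] * len(self.vocabulary_)
            for t, n in Counter(tokens).items():
                if t in self.vocabulary_:  # words unseen during fit are ignored
                    row[self.vocabulary_[t]] = n
            rows.append(row)
        return rows


class TextPipeline:
    """Hypothetical pipeline: preprocessing -> bag of words -> classifier."""

    def __init__(self, vectorizer, classifier):
        self.vectorizer = vectorizer
        self.classifier = classifier

    def fit(self, texts, labels):
        tokens = [preprocess(t) for t in texts]
        # The vocabulary is learned from the training texts only.
        X = self.vectorizer.fit(tokens).transform(tokens)
        self.classifier.fit(X, labels)
        return self

    def predict(self, texts):
        tokens = [preprocess(t) for t in texts]
        return self.classifier.predict(self.vectorizer.transform(tokens))
```

Because the vocabulary is built only inside `fit`, the evaluation texts never shape the features. Any object exposing `fit(X, y)` and `predict(X)` over count vectors can be plugged in as the classifier.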
5. Evaluation and Comparative Analysis
- 5.1. Evaluation Metrics: We discuss the evaluation metrics used to assess model performance.
- 5.2. Results: We present the results of the comparative analysis, including accuracy, precision, recall, and F1 score.
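The metrics listed above have standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is their harmonic mean, 2PR / (P + R). The sketch below computes them for one class treated as positive, plus a macro-averaged F1 over all classes. The function names are hypothetical, returning 0.0 when a denominator is zero is an assumed convention, and the project may instead compute these with a library such as `sklearn.metrics`.

```python
def classification_metrics(y_true, y_pred, positive):
    """Hypothetical helper: accuracy, precision, recall and F1 for one positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    correct = sum(t == p for t, p in pairs)
    # Assumed convention: a zero denominator yields 0.0 instead of an error.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(classification_metrics(y_true, y_pred, c)["f1"] for c in labels) / len(labels)
```

Averaging per-class F1 gives every class equal weight regardless of its size, which makes macro F1 more informative than accuracy when the classes are imbalanced.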
6. Conclusions
In the final section, we summarize our findings and draw conclusions regarding the effectiveness of each machine learning model for the given classification task. We also discuss potential areas for further research and ways to improve model performance.
This comparative analysis is intended to help practitioners and researchers select appropriate models for classification tasks.
Datasets:
- Dataset from Shayan Gerami (note: no license)
- Dataset from Zachary Grinberg