This data science project analyzes GitHub repositories, specifically all Scala repositories, to identify the most influential developers in the language's history. The project was conducted in Python, using libraries such as pandas, numpy and seaborn for data cleaning, transformation and visualization.
The project consisted of importing a dataset, in .csv format, of all Scala pull requests and cleaning it to select the relevant categories. From the dataset, I filtered for large projects with individual commits, then identified the contributing users by filtering on their user names. Finally, I selected the most recent pull requests and determined which of those users had made the largest total contribution to Scala. The results indicated that two users, xeno-by and soc, were responsible for over 50% of the largest Scala contributions and are still involved in creating projects.
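
The workflow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column names (`pid`, `user`, `date`), the helper functions and the toy rows are all hypothetical stand-ins for the real .csv file, and the step that filters for large projects is omitted because its criteria are not specified here.

```python
import io

import pandas as pd

# Toy stand-in for the project's .csv file.
# Column names and rows are hypothetical, not the real dataset.
SAMPLE_CSV = """pid,user,date
1,alice,2011-03-01T10:00:00Z
2,bob,2012-06-15T12:30:00Z
3,alice,2013-01-20T09:15:00Z
4,carol,2013-05-02T18:45:00Z
5,alice,2014-02-11T08:00:00Z
6,bob,2014-07-30T14:20:00Z
"""


def load_pulls(source):
    """Read the pull request table and apply basic cleaning."""
    pulls = pd.read_csv(source)
    # Drop rows that lack the fields the analysis depends on.
    pulls = pulls.dropna(subset=["user", "date"])
    # Parse timestamps so the rows can be ordered chronologically.
    pulls["date"] = pd.to_datetime(pulls["date"], utc=True)
    return pulls


def contribution_share(pulls):
    """Return each user's fraction of all pull requests, largest first."""
    counts = pulls["user"].value_counts()
    return counts / counts.sum()


def most_recent(pulls, n):
    """Return the n most recent pull requests."""
    return pulls.sort_values("date", ascending=False).head(n)


pulls = load_pulls(io.StringIO(SAMPLE_CSV))
share = contribution_share(pulls)
recent = most_recent(pulls, 2)

# Overall contribution share, restricted to users who appear in the
# most recent pull requests.
recent_share = share[share.index.isin(recent["user"])]
print(recent_share)
```

With the real file, `load_pulls` would be given the path to the .csv instead of the in-memory sample, and `recent_share` could be passed to seaborn for plotting.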