UofT-DSI/team_project

Team Project

Overview

The Team Project will showcase your ability to deliver business value in a real-world context. This project will be a valuable asset in your portfolio, and you should be comfortable presenting it to prospective employers as a demonstration of your skillset.

In your assigned team of ~6, you will collaboratively create a program to analyze an open-sourced dataset. For example, your team might wish to examine the relationship between the length of movies and the ratings users give them. Or, you may wish to explore the relationship between the size of a dog breed and the associated genetic ailments of that breed. You are encouraged to pick a business case that interests you, is robust and flexible enough to practise your skills, and is well-suited for showcasing business impact.

The task in front of your team is deliberately open-ended. You will have to make many decisions together, such as:

  • How will you make sure all team members can contribute to the project?
  • How will you make decisions?
  • What is the question you're trying to answer through your data analysis?
  • What tasks need to be completed to get to your final output?

At the end of the project, all team members are encouraged to fork the repo onto their profiles so that prospective employers can view the project.

Learning Outcomes

By the end of this module, participants will be able to:

  1. Apply multiple technical skills in a cohesive project by integrating concepts from Python, Git, Shell, SQL, Linear Regression, Classification, Resampling, and Production, along with either Data Science Stream Content (Sampling and Visualization) or Machine Learning Software Foundation Stream Content (Algorithms, Data Structures, and Deep Learning).
  2. Develop a portfolio-ready project that demonstrates technical proficiency, problem-solving skills, and the ability to generate meaningful insights from data.
  3. Write high-quality, maintainable code by practicing structured coding, effective commenting, debugging, and refactoring in a team setting.
  4. Document and present project findings effectively by maintaining a clear README, summarizing key decisions, and delivering insights in a way that is accessible to both technical and non-technical audiences.
  5. Collaborate effectively in a team environment by using Git best practices (small commits, branches, pull requests), participating in stand-ups, and maintaining effective communication.

Getting Started

Key Contacts

Questions can be submitted to the 'help' channel on Slack.

Module Delivery & Expectations

The Team Project schedule will include some content delivery in the live learning sessions; however, most of the session time (and the additional work periods) will be used to work on your project collaboratively as a team. A case study will also be presented during the second week, which is a great opportunity to see a real-life example of the type of project you are working on.

Although the project is not due until the end of the second week, it is important to plan out your work and stick to a schedule, so there will be milestones that you are expected to meet. The Technical Facilitator and Learning Support will check in with your team daily, and help guide you throughout the module. Constant communication with your team is crucial in short projects such as this!

Your project plan will be evaluated at the end of your first week, and your finished project will be evaluated at the end of the second week. The last work period will also be a Project Showcase, where you will have the opportunity to present your project to the DSI team and the rest of your cohort. This is a great opportunity to practice showcasing the value of your work, and we encourage you to record it and include the video in your portfolios!

Schedule

| Day 1 | Day 2 | Day 3 | Day 4 | Day 5 |
| Live Learning + Work Period | Live Learning + Work Period | Work Period | Work Period | Work Period |

| Day 6 | Day 7 | Day 8 | Day 9 | Day 10 |
| Review + Work Period | Work Period | Case Study + Work Period | Work Period | Presentation + Video Submission |

Instructions (How to Work on the Project)

How to Pick a Dataset

  1. You will be given access to a dataset bank. As a team, choose one of our carefully selected datasets (more info on this in the live learning session) and explore it, keeping the questions listed below in mind.

  2. A team member should create a new repository for the project, which the rest of the team can clone (see the Project Folder Structure for a folder structure template to use as a starting point). It doesn’t matter who creates the repository, as GitHub tracks everyone’s contributions fairly.

  3. Determine what roles the various team members will play on the team, which tasks need to be completed and assigned to which team members, and what your team standards will be with respect to code reviews, approvals, and merges.

  4. Have fun! This project is yours. This is the time to create something that prospective employers can consider when reviewing your application for a role, so be sure to clearly demonstrate the business value that your project could provide. What will your project tell them about you, your skills, and your ability to work effectively on a team?

Additionally, there are resources listed at the bottom of this page to help you understand Git conflicts and ways to work effectively as a team. You should review these to help set standards for your team processes.

Reviewing Your Dataset

Questions to Discuss When Reviewing Your Dataset

  • What are the key variables and attributes in your dataset?
  • How can we explore the relationships between different variables?
  • Are there any patterns or trends in the data that we can identify?
  • Who is the intended audience?
  • What is the question our analysis is trying to answer?
  • Are there any specific libraries or frameworks that are well-suited to our project requirements?
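
As a first pass at exploring the relationship between two numeric variables, a correlation coefficient is often a reasonable starting point. The sketch below computes Pearson's r using only the standard library. The function name `pearson_r` and the toy runtime and rating values are illustrative assumptions, not data from any real dataset; in practice your team would likely use a library such as pandas on your chosen dataset.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy values (not real data): movie runtime and average user rating.
runtime_minutes = [90, 100, 110, 120, 130]
average_rating = [6.0, 6.5, 7.0, 7.5, 8.0]

r = pearson_r(runtime_minutes, average_rating)
print(f"Pearson r = {r:.2f}")
```

A value near +1 or -1 suggests a strong linear relationship, while a value near 0 suggests little linear relationship. Correlation alone does not establish causation, so treat it as a prompt for further exploration rather than a conclusion.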

Data Visualization Guiding Questions

  • What are the main goals and objectives of our visualization project?
  • How can we tailor the visualization to effectively communicate with our audience?
  • What type of visualization best suits our data and objectives (e.g., bar chart, scatter plot, heatmap)?
  • How can we iterate on our design to incorporate feedback?
  • What best practices can we follow to promote inclusivity and diversity in our visualization design?
  • How can we ensure that our visualization accurately represents the underlying data without misleading viewers or misrepresenting information?
  • Are there any privacy concerns or sensitive information that need to be addressed in our visualization?

Machine Learning Model Guiding Questions

  • What are the specific objectives and success criteria for our machine learning model?
  • How can we select the most relevant features for training our machine learning model?
  • Are there any missing values or outliers that need to be addressed through preprocessing?
  • Which machine learning algorithms are suitable for our problem domain?
  • What techniques can we use to validate and tune the hyperparameters for our models?
  • How should we split the dataset into training, validation, and test sets?
  • Are there any ethical implications or biases associated with our machine learning model?
  • How can we document our machine learning pipeline and model architecture for future reference?
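
One possible answer to the splitting question above is a shuffled three-way split with a fixed random seed, so that every team member gets the same partitions. This is a minimal standard-library sketch; the function name `train_val_test_split`, the 70/15/15 proportions, and the seed are assumptions chosen for illustration. Time-ordered or grouped data may need a different strategy.

```python
import random


def train_val_test_split(rows, val_fraction=0.15, test_fraction=0.15, seed=42):
    """Shuffle rows reproducibly and return (train, val, test) lists."""
    if val_fraction < 0 or test_fraction < 0 or val_fraction + test_fraction >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")
    shuffled = list(rows)  # copy so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_test = round(len(shuffled) * test_fraction)
    n_val = round(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


# Hypothetical stand-in for dataset rows: 100 row indices.
rows = list(range(100))
train, val, test = train_val_test_split(rows)
print(len(train), len(val), len(test))
```

Keeping the test set untouched until the very end, and tuning hyperparameters only against the validation set, helps avoid an overly optimistic estimate of model performance.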

Folder Structure & Repo Setup

Each team is responsible for creating their own Git repository for the Team Project. Below is a suggested starting structure, but teams should adapt it as needed for their specific project. You should structure your project in a way that makes sense for your business case, ensure it is clean, and remove any unused files and folders.

├── data
│   ├── processed
│   ├── raw
│   └── sql
├── experiments
├── models
├── reports
├── src
├── README.md
└── .gitignore

  • Data: Contains the raw, processed and final data. For any data living in a database, make sure to export the tables out into the sql folder, so it can be used by anyone else.
  • Experiments: A folder for experiments.
  • Models: A folder containing trained models or model predictions.
  • Reports: Generated HTML, PDF etc. of your report.
  • src: Project source code.
  • README: This file!
  • .gitignore: Lists the files and folders that Git should not track.
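
If it helps, the suggested structure can be created with a short script rather than by hand. The sketch below uses `pathlib` from the standard library. The function name `scaffold` and the project name `my_team_project` are hypothetical, and the `.gitkeep` placeholder files are a common convention (not a Git feature) for keeping otherwise empty folders in a repository. The demo runs inside a temporary directory so that it has no lasting side effects.

```python
import tempfile
from pathlib import Path

# Folders and files from the suggested structure above.
FOLDERS = ["data/processed", "data/raw", "data/sql",
           "experiments", "models", "reports", "src"]
FILES = ["README.md", ".gitignore"]


def scaffold(root):
    """Create the suggested folder structure under root and return its Path."""
    root = Path(root)
    for folder in FOLDERS:
        path = root / folder
        path.mkdir(parents=True, exist_ok=True)
        # Git does not track empty directories; .gitkeep is a conventional placeholder.
        (path / ".gitkeep").touch()
    for name in FILES:
        (root / name).touch()
    return root


# Demo in a throwaway directory; "my_team_project" is a hypothetical name.
with tempfile.TemporaryDirectory() as tmp:
    project = scaffold(Path(tmp) / "my_team_project")
    created = sorted(
        p.relative_to(project).as_posix()
        for p in project.rglob("*") if p.is_dir()
    )
    print(created)
```

To use it for real, call `scaffold` with the path where your repository lives, then remove any folders your project does not end up needing.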

How to Work as a Team

  • Thoroughly understand your data and the business case for your analysis. What will the impact of your results be?
  • Clean your data. Be confident in the decisions you have made while doing so (e.g., default handling of NULL values).
  • Test out regression analyses and machine learning models/data visualizations. It may take several tries before you are satisfied with your results and understand how you can provide the most insight.
  • Make sure your code is well-commented and decisions are documented.
  • Use software such as GitHub Projects to keep track of your to-do list.
  • Define roles and responsibilities (e.g., one person handles Git merges while another focuses on modeling).
  • Establish clear communication protocols (e.g., set up a dedicated Slack channel, schedule regular check-ins).
  • Keep your README up to date. Not only is that easier than writing it all at the end of your project, it will help keep you on track and ensure your alignment with your business objective.
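
To illustrate what a deliberate, documented decision about NULL values might look like, here is a minimal sketch that fills missing values with the column median and reports how many values were filled. The function name `fill_missing_with_median`, the column name `weight_kg`, and the records are all hypothetical. Median imputation is only one option; dropping rows or flagging missingness may suit your data better.

```python
from statistics import median


def fill_missing_with_median(rows, column):
    """Return (cleaned_rows, n_filled), replacing None in column with the median.

    The input rows are not modified; new dictionaries are returned.
    """
    observed = [row[column] for row in rows if row[column] is not None]
    if not observed:
        raise ValueError(f"column {column!r} has no observed values to impute from")
    fill_value = median(observed)
    cleaned = [
        {**row, column: fill_value} if row[column] is None else dict(row)
        for row in rows
    ]
    n_filled = len(rows) - len(observed)
    return cleaned, n_filled


# Hypothetical records (not real data); None marks a missing value.
records = [
    {"breed": "A", "weight_kg": 10.0},
    {"breed": "B", "weight_kg": None},
    {"breed": "C", "weight_kg": 30.0},
]

cleaned, n_filled = fill_missing_with_median(records, "weight_kg")
print(f"Filled {n_filled} missing value(s) in weight_kg with the median")
```

Reporting the number of filled values gives you a concrete figure to record in your README alongside the reasoning for the choice.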

Further Reading on Teamwork & Collaboration

For additional insights on effective teamwork, meetings, and collaboration, check out:

Submitting Your Project

Submission & Evaluation

Your team must update this document with the links to your project repository and the dataset that you have chosen to analyze. You will not be submitting a PR to the DSI repository to submit your project. We will be evaluating your repository directly.

After Week 1, you will be evaluated on your project's README file. By this point, it must include a detailed project proposal. This should include the business motivation for your project, the dataset you have chosen to use, and any risks or unknowns you have identified.

Your final project will be evaluated on the following criteria:

  1. Each team member must have created a pull request, and reviewed and merged a different pull request.

  2. (a) For Data Science teams, your project must include a visualization that presents new insights into the chosen dataset.

    (b) For Machine Learning teams, your project must include a machine learning model that you have developed and implemented to obtain new insights.

  3. In addition to the project proposal from Week 1, each project's README should describe the final outcome of the project, the key business takeaways, and your team's approach to working collaboratively. It should also demonstrate thoughtful consideration of the guiding questions.

  4. Each team member must record a 3-5 minute video reflecting on their experience. You may each choose where to host your own video; however, it should be public, and a link to each team member's video should be included in your project README. This video is meant to be an asset to your portfolio and should be available to prospective employers. Your videos should answer the following questions:

    • What did you learn?
    • What challenges did you face?
    • How did you overcome those challenges?
    • If you had more time, what would you add?
    • What strengths do you bring to a team environment?

Project Showcase

  • Each team will have 5 minutes to present its project during the Project Showcase on March 22nd.

  • This is not a lot of time, so you should not try to describe every step of your project.

  • Instead, think of it as an "elevator pitch".

    • Assume that the audience is not an expert in your industry. Provide the required context.
    • Who are your intended stakeholders and why should they care about your project?
    • Explain your dataset in a way that your audience can understand.
    • BRIEFLY describe the tools/technologies that you used.
    • What are the key outcomes and takeaways? What is the business impact of your findings? Highlight one or two key visualizations or metrics.
    • If you had more time, what would your team have explored next?

Getting Help

Troubleshooting & FAQs

  1. Gather information about your problem

    • Copy and paste your error message
    • Copy and paste the code that caused the error, and the last few commands leading up to the error
    • Write down what you are trying to accomplish with your code. Include both the specific action, and the bigger picture and context
    • (optional) Take a screenshot of your entire workspace
  2. Try searching the web for your error message

    • Sometimes, the error has common solutions that can be easy to find!
      • This will be faster than waiting for an answer
    • If none of the solutions apply, consider asking a Generative AI tool
      • Paste your code, the error message, and a description of your overall goals
  3. Try asking in your cohort's Slack help channel

    • Since we're all working through the same material, there's a good chance one of your peers has encountered the same error, or has already solved it
    • Try searching in the DSI Certificates Slack help channel for whether a similar query has been posted
    • If the question has not yet been answered, post your question!
      • Describe your overall goals, the context, and the specific details of what you were trying to accomplish
      • Make sure to copy and paste your code and your error message
      • Copying and pasting helps:
        1. Your peers and teaching team to quickly try out your code
        2. Others to find your question in the future
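
The information-gathering step above can be partly automated. The sketch below runs a function and, if it raises, builds a plain-text report containing your goal, your Python version, and the full traceback, ready to paste into a help channel. The helper `build_help_report` and the failing function `parse_rating` are hypothetical names introduced for illustration.

```python
import platform
import sys
import traceback


def build_help_report(goal, func, *args, **kwargs):
    """Run func; if it raises, return a text report to paste into a help request.

    Returns None when func completes without an exception.
    """
    try:
        func(*args, **kwargs)
    except Exception:
        return "\n".join([
            f"Goal: {goal}",
            f"Python: {sys.version.split()[0]} on {platform.system()}",
            "Traceback:",
            traceback.format_exc(),  # full error text, including the failing line
        ])
    return None


def parse_rating(text):
    """Hypothetical step that fails on non-numeric input."""
    return float(text)


report = build_help_report(
    "Convert the rating column to numbers", parse_rating, "n/a"
)
print(report)
```

Pasting the report as text rather than as a screenshot lets others copy your code and search for the error message.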

Additional Resources

Other Resources

Git

Past Participant Projects