
Python and Data Science Source by Team-Ant, partnered with Venturenix Lab


gawainchin/data_science_course


Python and Data Science Course Repository

Course materials for a weekly Python and data science class in Hong Kong, run in partnership with Venturenix Lab since 2018.

Instructors: Anthony Lo and Gawain Chin
| Part 1 | Part 2 |
| --- | --- |
| Lesson 1: Python Basics | Lesson 7: Introduction to Data Science |
| Lesson 2: Functions and Your First Application | Lesson 8: Data Manipulation and Visualization |
| Lesson 3: Intensive Code Training | Lesson 9: Black box machine learning |
| Lesson 4: Data Structures and Complexity | Lesson 10: Linear models and gradient descent |
| Lesson 5: Web Scraping and OOP | Lesson 11: Logistic Regression and SVM |
| Lesson 6: Data Manipulation | Lesson 12: Model Evaluation and Regularization |
| | Lesson 13: Ensemble Learning and Tree-Based Models |
| | Lesson 14: Kaggle competition |
| | Lesson 15: Clustering and Dimensionality Reduction |
| | Lesson 16: Recommender System |
| | Lesson 17: Natural Language Processing |
| | Lesson 18: TBD |

Lesson 1: Python Basics

  • Setting up a Python environment
  • Data types: integers, floats, Booleans, strings
  • Variable assignment
  • Type conversion
  • Operators: arithmetic, comparison, logical, bitwise
  • Control flow: if-elif-else, while loops, for loops
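A short warm-up sketch touching each Lesson 1 topic: type conversion, the four operator families, and the three control-flow forms. The `classify` helper and the sample values are illustrative only, not taken from the class notes.

```python
# Type conversion: str -> int / float, and back to str
age = int("42")
price = float("19.99")
label = str(age) + " years"

# Arithmetic, comparison, logical and bitwise operators
print(7 // 2, 7 % 2, 2 ** 3)   # 3 1 8 (integer division, remainder, power)
print(3 < 5 and 5 != 6)        # True (comparisons combined with logical `and`)
print(6 & 3, 6 | 3, 6 ^ 3)     # 2 7 5 (bitwise AND, OR, XOR of 0b110 and 0b011)


def classify(n: int) -> str:
    """Use if-elif-else to describe the sign of a number."""
    if n > 0:
        return "positive"
    elif n < 0:
        return "negative"
    else:
        return "zero"


# A for loop over a range
for n in range(-1, 2):
    print(n, classify(n))

# A while loop: count how many halvings take 64 down to 1
count, value = 0, 64
while value > 1:
    value //= 2
    count += 1
print(count)  # 6
```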
Homework:
  • Set up your Python environment if you had any setup issues during the lesson
  • Join the class Slack
  • Complete the L1 homework before the next lesson
Resources

Lesson 2: Functions and Your First Application

  • Functions: input arguments and return values
  • Local variables vs global variables
  • Classwork: Write a game (Refer to class notes)
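A minimal sketch of the function and scope ideas from this lesson; `add`, `increment_global` and `shadow` are hypothetical helpers for illustration. The key point is that assigning to a name inside a function creates a local variable unless the name is declared `global`.

```python
counter = 0  # a global variable


def add(a, b=1):
    """Return the sum of two arguments; b has a default value."""
    result = a + b  # `result` is local and disappears when the function returns
    return result


def increment_global():
    """Rebinding a global name inside a function requires the `global` keyword."""
    global counter
    counter += 1
    return counter


def shadow():
    """Assigning to `counter` here creates a new local; the global is untouched."""
    counter = 100
    return counter


print(add(2, 3), add(2))  # 5 3
increment_global()
print(shadow(), counter)  # 100 1
```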
Homework:
Resources

Lesson 3: Intensive Code Training

  • Introduction to GitHub
  • Review Python Basics and functions
  • Review Game 2 Homework
Homework:
  • Complete L3 Homework
  • Create your own GitHub account and explore the open-source world
  • ⭐ Star this Data Science repo to get the latest materials!
Resources

Lesson 4: Data Structures and Complexity

  • Data structures (List, Set, Dictionary, Tuple)
  • Mutable vs Immutable
  • Understanding time complexity and space complexity
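A small sketch of the Lesson 4 ideas: mutation through shared references, immutability of tuples, hashability for dict keys, and the practical cost difference between O(n) list membership and average O(1) set membership. Timings depend on the machine, so no expected numbers are shown.

```python
import timeit

# Mutable: a list is changed in place, so every name bound to it sees the change
a = [1, 2, 3]
b = a
b.append(4)
print(a)  # [1, 2, 3, 4]

# Immutable: "changing" a tuple builds a new object; the original is untouched
t = (1, 2, 3)
t2 = t + (4,)
print(t, t2)  # (1, 2, 3) (1, 2, 3, 4)

# Only hashable (typically immutable) values can be dict keys or set members
coords = {(22.3, 114.2): "Hong Kong"}
try:
    {[1, 2]: "list key"}
except TypeError as err:
    print("lists cannot be keys:", err)

# Time complexity: `in` scans a list (O(n)) but hashes into a set (O(1) on average)
items_list = list(range(100_000))
items_set = set(items_list)
t_list = timeit.timeit(lambda: 99_999 in items_list, number=200)
t_set = timeit.timeit(lambda: 99_999 in items_set, number=200)
print(f"list: {t_list:.4f}s  set: {t_set:.6f}s")
```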
Homework:
Resources

Lesson 5: Web Scraping and OOP

  • Web Scraping overview
  • Python web scraping tools: requests and Beautiful Soup
  • Classwork: hands-on crawling exercise
Homework:
  • Web scraping homework
Resources

Lesson 6: Data Manipulation

  • Web Scraping II
  • Introduction to Python Class Objects
  • Pandas Basics with Case study
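To illustrate the "Python Class Objects" topic, here is a minimal class sketch. The `Flight` class, its attributes and the flight codes are hypothetical, loosely themed on the flight delay homework; they are not the course's actual data model.

```python
class Flight:
    """A hypothetical record for one flight (illustrative only)."""

    def __init__(self, code: str, scheduled: int, actual: int):
        # Instance attributes: each Flight object stores its own values
        self.code = code
        self.scheduled = scheduled  # scheduled departure, minutes after midnight
        self.actual = actual        # actual departure, minutes after midnight

    def delay(self) -> int:
        """Minutes late (negative means early)."""
        return self.actual - self.scheduled

    def is_delayed(self, threshold: int = 15) -> bool:
        """True if the flight left more than `threshold` minutes late."""
        return self.delay() > threshold

    def __repr__(self) -> str:
        return f"Flight({self.code!r}, delay={self.delay()} min)"


flights = [Flight("XY100", 600, 640), Flight("XY200", 720, 725)]
print([f for f in flights if f.is_delayed()])  # [Flight('XY100', delay=40 min)]
```

Methods such as `delay()` keep the logic next to the data it uses; in pandas the same calculation would usually be a vectorised column operation instead.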
Homework:
  • Flight Delay Dataset: Create your own tables with Pandas
Resources

Lesson 7: Introduction to Data Science

  • What is Data Science?
  • Essential Skills of a Data Scientist
  • Foundations of Probability
  • Permutation vs Combination
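A quick sketch contrasting permutations (order matters) and combinations (order does not), checked both by formula and by brute-force enumeration. It assumes Python 3.8+ for `math.perm` and `math.comb`; the letters are arbitrary sample data.

```python
import math
from itertools import combinations, permutations

letters = ["A", "B", "C", "D"]

# Permutations: P(n, k) = n! / (n - k)!
print(math.perm(4, 2))                      # 12
print(len(list(permutations(letters, 2))))  # 12

# Combinations: C(n, k) = n! / (k! * (n - k)!)
print(math.comb(4, 2))                      # 6
print(len(list(combinations(letters, 2))))  # 6

# Probability: drawing 2 of the 4 letters, the chance of getting exactly {A, B}
p = 1 / math.comb(4, 2)
print(p)  # 0.16666666666666666
```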
Homework:
Resources

Lesson 8: Data Manipulation and Visualization

  • Case Study: Titanic Dataset
  • Understand Machine Learning Workflow
  • First EDA (exploratory data analysis) exercise
  • Visualization: Matplotlib, Seaborn
Homework:
  • One-hot encoding of categorical variables
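One-hot encoding turns a categorical column into one 0/1 indicator column per category. The pure-Python sketch below shows the idea; the port codes are hypothetical sample values in the style of the Titanic "Embarked" column. In pandas, `pd.get_dummies` does the same job on a DataFrame.

```python
def one_hot(values):
    """Encode a list of category labels as 0/1 indicator columns."""
    categories = sorted(set(values))  # fixed, reproducible column order
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


ports = ["S", "C", "S", "Q"]  # hypothetical sample values
cols, encoded = one_hot(ports)
print(cols)     # ['C', 'Q', 'S']
print(encoded)  # [[0, 0, 1], [1, 0, 0], [0, 0, 1], [0, 1, 0]]
```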
Resources

Lesson 9: Black box machine learning

  • Your first machine learning experience
  • EDA on Advertising Dataset
  • Understand the X and Y Relationship
Homework:
  • Train your first linear regression model with scikit-learn
Resources

Lesson 10: Linear models and gradient descent

  • Build Linear Regression from Scratch
  • Learn the theory behind gradient descent
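A from-scratch sketch of linear regression trained by batch gradient descent on mean squared error, for a single feature. The toy data is generated from y = 2x + 1; the learning rate and epoch count are arbitrary choices that happen to converge for this data, not recommended defaults.

```python
def fit_linear_regression(xs, ys, lr=0.01, epochs=5000):
    """Fit y ~ w*x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Residuals of the current predictions
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of MSE = (1/n) * sum(err^2) with respect to w and b
        grad_w = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2 / n) * sum(errors)
        # Step against the gradient
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]  # generated from y = 2x + 1
w, b = fit_linear_regression(xs, ys)
print(round(w, 3), round(b, 3))  # 2.0 1.0
```

If the learning rate is too large the updates overshoot and diverge; scaling features first usually allows a larger, faster step size.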
Resources

Lesson 11: Logistic Regression and SVM

  • Learn the concepts behind logistic regression and its cost function
  • Understand different types of classification models and how they differ from linear regression
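A minimal sketch of the two pieces that distinguish logistic regression from linear regression: the sigmoid, which maps a linear score to a probability, and the binary cross-entropy (log loss) cost. The weights `w`, `b` and the data points are hypothetical values chosen for illustration, not a trained model.

```python
import math


def sigmoid(z: float) -> float:
    """Map any real number to the interval (0, 1)."""
    return 1 / (1 + math.exp(-z))


def log_loss(y_true, y_prob, eps=1e-15):
    """Binary cross-entropy, the cost minimised by logistic regression."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# Hypothetical model: P(y = 1 | x) = sigmoid(w * x + b)
w, b = 1.5, -3.0
xs = [0.0, 1.0, 3.0, 4.0]
ys = [0, 0, 1, 1]
probs = [sigmoid(w * x + b) for x in xs]
print(sigmoid(0))  # 0.5, the decision boundary
print(round(log_loss(ys, probs), 4))
```

Confident correct predictions give a loss near 0, while confident wrong ones are penalised heavily, which is why log loss is preferred over squared error for classification.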
Resources

Lesson 12: Model Evaluation and Regularization

  • Model Evaluation Techniques: Training Set, Validation Set, Test Set
  • Understand the concept: Overfitting and Underfitting
  • Classification Metrics: Accuracy, Confusion Matrix, F1 Score, True Positive, False Positive, True Negative, False Negative
  • Regularization Concepts: Ridge (L2) and Lasso (L1)
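A pure-Python sketch of the classification metrics listed above, computed from the four confusion-matrix counts. The label vectors are made-up examples; libraries such as scikit-learn provide equivalent functions.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def metrics(y_true, y_pred):
    """Return (accuracy, precision, recall, F1); zero-division cases give 0.0."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1


y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
print(confusion_counts(y_true, y_pred))  # (3, 1, 3, 1)
print(metrics(y_true, y_pred))           # (0.75, 0.75, 0.75, 0.75)
```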
Homework:
Resources

Lesson 13: Ensemble Learning and Tree-Based Models

  • Introduction to Tree-Based Models: Decision Trees
  • Tree Construction Concept: Edge and Node, Splitting Concepts
  • Ensemble Learning: Bagging and Boosting
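A sketch of how a decision tree chooses a split: try each threshold on a feature and keep the one with the lowest weighted Gini impurity in the child nodes. Gini is one common criterion (entropy is another); the `ages` and `survived` values are hypothetical.

```python
def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(xs, ys):
    """Try every threshold on one feature; return (threshold, weighted impurity)."""
    best = (None, float("inf"))
    for threshold in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        # Child impurity weighted by the share of samples in each node
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[1]:
            best = (threshold, score)
    return best


ages = [5, 12, 25, 30, 40, 60]
survived = [1, 1, 0, 0, 0, 1]
print(gini(survived))              # 0.5
print(best_split(ages, survived))  # (12, 0.25)
```

A full tree applies this search recursively to each child node; bagging and boosting then combine many such trees.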
Homework:
  • Review tree-based models and submit an entry to one Kaggle competition using a tree-based method
Resources

Lesson 14: Kaggle competition

  • Workshop lesson to work on Kaggle competition together
  • Understand the end-to-end machine learning workflow and apply it to the Kaggle competition
  • Use different algorithms to explore and review model performance
Homework:
  • NA
Resources
  • NA

Lesson 15: Clustering and Dimensionality Reduction

  • Unsupervised Learning Concepts
  • Clustering Algorithm (e.g. K-Means)
  • Dimensionality reduction (e.g. PCA)
  • Case Study: Eigenface
Homework:
  • Implement K-Means clustering yourself
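One possible starting point for the homework: a minimal K-Means (Lloyd's algorithm) for 2-D points in pure Python. Random initialisation from the data, squared Euclidean distance and the toy points are all assumptions made for this sketch; real implementations usually add smarter initialisation such as k-means++.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm on 2-D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise with k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append((sum(p[0] for p in members) / len(members),
                                      sum(p[1] for p in members) / len(members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: assignments no longer move the centroids
        centroids = new_centroids
    return centroids, labels


points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
centroids, labels = kmeans(points, k=2)
print(labels)
print(centroids)
```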
Resources

Lesson 16: Recommender System

  • Understanding Recommender Systems
  • Content-Based vs Collaborative Filtering
Homework:
  • Create a movie profile by genre: each column is a 0/1 indicator for one genre
  • Use NumPy to calculate the m x m similarity matrix
    • normalize each row by its norm (A/|A|)
    • obtain the similarity matrix as M dot M^T
  • Write a function that takes a movie ID and returns the top K most similar movies, with filters for:
    • min score, max score, min rating, min total rating, time range
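A minimal sketch of the core homework steps, assuming a tiny hypothetical genre matrix (every movie must have at least one genre, otherwise its norm is zero). Row-normalising and taking M times its transpose gives the cosine similarity between movies. The `top_k_similar` helper is hypothetical and leaves out the score, rating and time-range filters.

```python
import numpy as np

# Hypothetical movie-genre profile: rows are movies, columns are 0/1 genre flags
genres = ["Action", "Comedy", "Drama", "Romance"]
M = np.array([
    [1, 0, 1, 0],  # movie 0
    [1, 0, 1, 1],  # movie 1
    [0, 1, 0, 1],  # movie 2
    [1, 1, 0, 0],  # movie 3
], dtype=float)

# Normalise each row to unit length (A / |A|)
norms = np.linalg.norm(M, axis=1, keepdims=True)
M_unit = M / norms

# Cosine similarity matrix (m x m)
S = M_unit @ M_unit.T


def top_k_similar(movie_id, k=2):
    """Return the ids of the k most similar movies, excluding the movie itself."""
    scores = S[movie_id].copy()
    scores[movie_id] = -np.inf  # never recommend the query movie
    return [int(i) for i in np.argsort(scores)[::-1][:k]]


print(np.round(S, 3))
print(top_k_similar(0))  # [1, 3]
```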
Resources

Lesson 17: Natural Language Processing

  • Introduction to NLP
  • Tokenization, Tf-idf, Word Embedding
  • NLP Package Overview: NLTK
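A pure-Python sketch of tokenisation plus tf-idf weighting. It uses a naive lower-case whitespace tokeniser (NLTK provides proper tokenisers) and the plain idf = log(N / df) variant; libraries such as scikit-learn's `TfidfVectorizer` use a smoothed idf by default, so their numbers will differ. The three documents are made-up examples.

```python
import math

docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]

# Naive tokenisation: lower-case and split on whitespace
tokenized = [doc.lower().split() for doc in docs]


def tf_idf(tokenized_docs):
    """Return one {term: weight} dict per document.

    tf  = term count / document length
    idf = log(N / number of documents containing the term)
    """
    n_docs = len(tokenized_docs)
    df = {}
    for tokens in tokenized_docs:
        for term in set(tokens):
            df[term] = df.get(term, 0) + 1
    weights = []
    for tokens in tokenized_docs:
        w = {}
        for term in set(tokens):
            tf = tokens.count(term) / len(tokens)
            w[term] = tf * math.log(n_docs / df[term])
        weights.append(w)
    return weights


weights = tf_idf(tokenized)
print(sorted(weights[0].items(), key=lambda kv: -kv[1])[:3])
```

Words that appear in many documents (like "the") get a low weight, while words specific to one document (like "cat") get a high one.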
Homework:
  • NLP exercise and Recsys exercise
Resources

Lesson 18: TBD

Homework:
Resources

Past Kaggle submissions by students

Python Resources


Cheat sheet


Data Visualization


Web Scraping


Basic Linear Algebra, Statistics and Calculus


Loss function

Supervised Learning

Linear Regression
  • Tutorial on Linear Regression: This blog describes the basics of linear regression.
Logistic Regression
  • Tutorial on Logistic Regression: This blog describes the basics of logistic regression.
  • CS229 notes on logistic regression (read pp. 16-19)
SVM
Decision Tree
Ensemble Learning
kNN

Unsupervised Learning

  • Unsupervised Learning Overview

K-Means
Dimensionality Reduction
  • A very comprehensive study material on SVD/PCA.

Recommender System


Natural Language Processing


Reinforcement Learning


Deep Learning
