Commit

Merge pull request #360 from harvard-edge/358-typos-in-data_engineeringqmd

358 Typos in "data_engineering.qmd"
profvjreddi authored Aug 17, 2024
2 parents 9227c7a + 09377e8 commit 0e1d626
Showing 1 changed file with 5 additions and 5 deletions.
10 changes: 5 additions & 5 deletions contents/data_engineering/data_engineering.qmd
@@ -291,7 +291,7 @@ Data processing refers to the steps involved in transforming raw data into a for

![Data scientists' tasks breakdown by time spent. Source: [Forbes.](https://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-time-consuming-least-enjoyable-data-science-task-survey-says/?sh=20c55a266f63)](images/jpg/data_engineering_features.jpg){#fig-data-engineering}

-Proper data cleaning is a crucial step that directly impacts model performance. Real-world data is often dirty, containing errors, missing values, noise, anomalies, and inconsistencies. Data cleaning involves detecting and fixing these issues to prepare high-quality data for modeling. By carefully selecting appropriate techniques, data scientists can improve model accuracy, reduce overfitting, and enable algorithms to learn more robust patterns. Overall, thoughtful data processing allows machine learning systems to uncover insights better and make predictions from real-world data.
+Proper data cleaning is a crucial step that directly impacts model performance. Real-world data is often dirty, containing errors, missing values, noise, anomalies, and inconsistencies. Data cleaning involves detecting and fixing these issues to prepare high-quality data for modeling. By carefully selecting appropriate techniques, data scientists can improve model accuracy, reduce overfitting, and train algorithms to learn more robust patterns. Overall, thoughtful data processing allows machine learning systems to uncover insights better and make predictions from real-world data.

Data often comes from diverse sources and can be unstructured or semi-structured. Thus, processing and standardizing it is essential, ensuring it adheres to a uniform format. Such transformations may include:

@@ -506,13 +506,13 @@ These slides are a valuable tool for instructors to deliver lectures and for stu
* [Responsible Data Collection.](https://docs.google.com/presentation/d/1vcmuhLVNFT2asKSCSGh_Ix9ht0mJZxMii8MufEMQhFA/edit?resourcekey=0-_pYLcW5aF3p3Bvud0PPQNg#slide=id.ga4ca29c69e_0_195)

* Data Anomaly Detection:
-* [Anamoly Detection: Overview.](https://docs.google.com/presentation/d/1R8A_5zKDZDZOdAb1XF9ovIOUTLWSIuFWDs20-avtxbM/edit?resourcekey=0-pklEaPv8PmLQ3ZzRYgRNxw#slide=id.g94db9f9f78_0_2)
+* [Anomaly Detection: Overview.](https://docs.google.com/presentation/d/1R8A_5zKDZDZOdAb1XF9ovIOUTLWSIuFWDs20-avtxbM/edit?resourcekey=0-pklEaPv8PmLQ3ZzRYgRNxw#slide=id.g94db9f9f78_0_2)

-* [Anamoly Detection: Challenges.](https://docs.google.com/presentation/d/1JZxx2kLaO1a8O6z6rRVFpK0DN-8VMkaSrNnmk_VGbI4/edit#slide=id.g53eb988857_0_91)
+* [Anomaly Detection: Challenges.](https://docs.google.com/presentation/d/1JZxx2kLaO1a8O6z6rRVFpK0DN-8VMkaSrNnmk_VGbI4/edit#slide=id.g53eb988857_0_91)

-* [Anamoly Detection: Datasets.](https://docs.google.com/presentation/d/1wPDhp4RxVrOonp6pU0Capk0LWXZOGZ3x9BzW_VjpTQw/edit?resourcekey=0-y6wKAnuxrLWqhleq9ruLOA#slide=id.g53eb988857_0_91)
+* [Anomaly Detection: Datasets.](https://docs.google.com/presentation/d/1wPDhp4RxVrOonp6pU0Capk0LWXZOGZ3x9BzW_VjpTQw/edit?resourcekey=0-y6wKAnuxrLWqhleq9ruLOA#slide=id.g53eb988857_0_91)

-* [Anamoly Detection: using Autoencoders.](https://docs.google.com/presentation/d/1Q4h7XrayNRIP0r52Hlk5VjxRcli-GY2xmyZ53nCd6CI/edit#slide=id.g53eb988857_0_91)
+* [Anomaly Detection: using Autoencoders.](https://docs.google.com/presentation/d/1Q4h7XrayNRIP0r52Hlk5VjxRcli-GY2xmyZ53nCd6CI/edit#slide=id.g53eb988857_0_91)

:::
