US Accident Exploratory Data Analysis using a Kaggle dataset

aadityasikder/us_accident_analysis

US Accident Analysis

Overview

This project uses a dataset from Kaggle to analyze traffic accidents in the United States. The dataset contains detailed information about each accident, such as its location, severity, weather conditions, and time of occurrence. Through data analysis and visualization, the project aims to uncover patterns, trends, and factors that contribute to accidents in the US.

Dataset

The dataset used for this analysis can be found on Kaggle: US Accidents (3.0 million records) - A Countrywide Traffic Accident Dataset. It comprises millions of records collected over several years across various states in the US.
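Because the dataset is large, it can help to load only the columns an analysis needs. The following is a minimal sketch, not the project's actual loading code: the helper `load_accidents` is hypothetical, the column names (`ID`, `Severity`, `Start_Time`, `State`) are assumed to match the Kaggle CSV, and an in-memory sample stands in for the downloaded file.

```python
import io

import pandas as pd

# Assumed column names; check them against the header of the downloaded CSV.
COLUMNS = ["ID", "Severity", "Start_Time", "State"]


def load_accidents(source, columns=COLUMNS):
    """Read only the requested columns and parse the start timestamp."""
    return pd.read_csv(source, usecols=columns, parse_dates=["Start_Time"])


# Made-up two-row sample standing in for the real file.
sample_csv = io.StringIO(
    "ID,Severity,Start_Time,State,Extra\n"
    "A-1,2,2020-01-06 08:15:00,CA,x\n"
    "A-2,3,2020-01-11 17:40:00,TX,y\n"
)

df = load_accidents(sample_csv)
print(df.dtypes)
```

With the real data, `source` would be the path to the CSV downloaded from Kaggle instead of the in-memory sample.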

Analysis

The analysis involves several steps:

  1. Data Preprocessing: Cleaning the dataset, handling missing values, and converting data types as necessary.

  2. Exploratory Data Analysis (EDA): Exploring the dataset to understand its structure, distributions, and relationships between different variables. This step involves creating visualizations such as histograms, scatter plots, and heatmaps to gain insights.

  3. Feature Engineering: Deriving additional features that might be useful for analysis, such as the day of the week or time of day from the timestamp.

  4. Insights and Visualization: Summarizing key findings and insights derived from the analysis. This could include identifying high-risk areas, common causes of accidents, factors affecting severity, and more. Visualizations such as maps, bar charts, and pie charts can help communicate these insights effectively.
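The preprocessing, feature engineering, and summary steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not the notebook's actual code: the column names `Start_Time`, `Severity`, and `State` are assumed to match the Kaggle dataset, the helpers `preprocess` and `add_time_features` are hypothetical, and the sample rows are made up.

```python
import pandas as pd

# Made-up sample standing in for the Kaggle data; column names are assumed.
raw = pd.DataFrame({
    "Start_Time": [
        "2020-01-06 08:15:00",
        "2020-01-11 17:40:00",
        "not a date",
        "2020-01-06 23:05:00",
    ],
    "Severity": [2, 3, 2, None],
    "State": ["CA", "TX", "CA", "NY"],
})


def preprocess(df):
    """Convert types and drop rows missing the fields used below."""
    out = df.copy()
    # Unparseable timestamps become NaT instead of raising an error.
    out["Start_Time"] = pd.to_datetime(out["Start_Time"], errors="coerce")
    out = out.dropna(subset=["Start_Time", "Severity"])
    out["Severity"] = out["Severity"].astype(int)
    return out


def add_time_features(df):
    """Derive day-of-week and hour-of-day from the start timestamp."""
    out = df.copy()
    out["day_of_week"] = out["Start_Time"].dt.day_name()
    out["hour"] = out["Start_Time"].dt.hour
    return out


clean = add_time_features(preprocess(raw))

# Accident count and mean severity per weekday, a simple EDA summary.
by_day = clean.groupby("day_of_week")["Severity"].agg(["count", "mean"])
print(by_day)
```

Dropping rows with missing values is only one option; depending on the column, imputing or keeping a separate "unknown" category may preserve more of the data.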

Tools Used

  • Python: Pandas, NumPy, Matplotlib, Seaborn for data manipulation, analysis, and visualization.
  • Jupyter Notebook: For interactive data exploration and analysis.
  • Machine Learning Libraries: Scikit-learn for building predictive models.

Conclusion

Through this analysis, we aim to provide a better understanding of accident patterns in the US and identify potential areas for improvement in road safety measures. By leveraging data-driven insights, stakeholders such as transportation authorities, city planners, and drivers can make informed decisions to reduce the frequency and severity of accidents on US roads.
