In this project, we build a Linear Regression model to predict housing prices based on various features. The dataset provided contains information about houses in different regions of the United States, including average area income, house age, number of rooms, and more. The goal is to create a model that can estimate house prices given these features.
- `pandas`: For data manipulation and analysis.
- `numpy`: For numerical computations.
- `matplotlib`: For data visualization.
- `seaborn`: For statistical data visualization.
- `scikit-learn`: For building and evaluating the Linear Regression model.
The dataset used is `USA_Housing.csv` and contains the following columns:
- `Avg. Area Income`: Average income of residents in the city.
- `Avg. Area House Age`: Average age of houses in the city.
- `Avg. Area Number of Rooms`: Average number of rooms in houses in the city.
- `Avg. Area Number of Bedrooms`: Average number of bedrooms in houses in the city.
- `Area Population`: Population of the city.
- `Price`: Price that the house sold for.
- `Address`: Address of the house (excluded from the model).
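
As a rough illustration of loading and inspecting this file, the sketch below reads the CSV, checks that the columns listed above are present, and summarizes the numeric ones. The helper names `load_housing` and `summarize` are hypothetical and not part of this project; only the file name and column names come from the description above.

```python
import pandas as pd

# Column names as documented above.
EXPECTED_COLUMNS = [
    "Avg. Area Income",
    "Avg. Area House Age",
    "Avg. Area Number of Rooms",
    "Avg. Area Number of Bedrooms",
    "Area Population",
    "Price",
    "Address",
]


def load_housing(source):
    """Read the housing CSV and fail early if a documented column is missing.

    `source` can be a file path such as "USA_Housing.csv" or a file-like object.
    (Hypothetical helper, written for illustration only.)
    """
    df = pd.read_csv(source)
    missing = [name for name in EXPECTED_COLUMNS if name not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return df


def summarize(df):
    """Return descriptive statistics and pairwise correlations.

    Only numeric columns are used, so the text column Address is skipped.
    """
    numeric = df.select_dtypes(include="number")
    return numeric.describe(), numeric.corr()
```

Assuming the file sits in the working directory, `stats, corr = summarize(load_housing("USA_Housing.csv"))` would give a first look at the value ranges and at how strongly each feature correlates with `Price`.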
- Import Libraries: Load necessary libraries for data analysis and modeling.
- Load and Inspect Data: Read the `USA_Housing.csv` file and inspect the data.
- Exploratory Data Analysis (EDA):
  - Visualize relationships between features and the target variable (`Price`).
  - Analyze the distribution of data and correlation between features.
- Data Preparation:
  - Prepare features (`X`) and target variable (`y`).
  - Split the data into training and test sets.
- Model Training:
  - Train the Linear Regression model using the training data.
- Model Evaluation:
  - Evaluate the model by checking its coefficients.
  - Make predictions and evaluate the model using metrics such as Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Squared Error (RMSE).
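
The preparation, training, and evaluation steps above can be sketched roughly as follows. This is a minimal sketch rather than the project's actual code: the function name `fit_and_evaluate` is hypothetical, and the 30% test split and fixed `random_state` are assumptions, since the split ratio is not stated here.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

# Numeric feature columns; Address is text and is left out of the model.
FEATURES = [
    "Avg. Area Income",
    "Avg. Area House Age",
    "Avg. Area Number of Rooms",
    "Avg. Area Number of Bedrooms",
    "Area Population",
]
TARGET = "Price"


def fit_and_evaluate(df, test_size=0.3, random_state=0):
    """Split the data, fit a linear model, and score it on the held-out rows.

    test_size and random_state are assumed values, chosen for illustration.
    Returns (model, coefficients, metrics).
    """
    X = df[FEATURES]
    y = df[TARGET]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )

    # Train on the training rows only.
    model = LinearRegression().fit(X_train, y_train)

    # Predict on rows the model has not seen and measure the errors.
    predictions = model.predict(X_test)
    mse = mean_squared_error(y_test, predictions)
    metrics = {
        "MAE": mean_absolute_error(y_test, predictions),
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
    }

    # One coefficient per feature, labeled with the column name.
    coefficients = pd.Series(model.coef_, index=FEATURES, name="Coefficient")
    return model, coefficients, metrics
```

With the dataset loaded via `df = pd.read_csv("USA_Housing.csv")`, calling `model, coefficients, metrics = fit_and_evaluate(df)` would return the fitted model along with a labeled coefficient table and the three error metrics.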
- Each coefficient estimates how much the predicted price changes for a one-unit increase in that feature, holding the other features constant.
- MAE, MSE, and RMSE summarize the size of the model's prediction errors; lower values indicate more accurate predictions.
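
To make the three metrics concrete, here is a small worked example that computes them by hand. The function name `regression_errors` is hypothetical and the numbers are made up for illustration; they are not results from this project.

```python
import math


def regression_errors(actual, predicted):
    """Compute MAE, MSE, and RMSE from paired actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(r) for r in residuals) / len(residuals)  # mean of |error|
    mse = sum(r * r for r in residuals) / len(residuals)   # mean of error^2
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse)}


# Made-up values: the residuals are 1, 0, -1, and -3.
errors = regression_errors([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 12.0])
# MAE  = (1 + 0 + 1 + 3) / 4 = 1.25
# MSE  = (1 + 0 + 1 + 9) / 4 = 2.75
# RMSE = sqrt(2.75), roughly 1.658
```

Because squaring weights large errors more heavily, the single residual of 3 pulls RMSE above MAE. RMSE and MAE are both expressed in the same units as `Price`, while MSE is in squared units.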
The Linear Regression model provides a useful tool for estimating housing prices based on various features. By interpreting the coefficients and evaluating the model's performance, we can make informed predictions and gain insight into housing prices in different regions.
- `pandas`
- `numpy`
- `matplotlib`
- `seaborn`
- `scikit-learn`
- Clone the repository: `git clone <repository-url>`