This project is a machine learning-based car price prediction system. It uses a cleaned and optimized dataset to train a regression model for predicting car prices based on user input. Additionally, the system provides recommendations for similar cars from the dataset. The project includes features such as dataset preprocessing, hyperparameter optimization, and a web interface for user interaction.
This project uses the Cardekho dataset, which is available on Kaggle.
- Removing Null Values: Missing values are handled by dropping incomplete rows or filling them with appropriate values.
- Feature Engineering:
  - Created new features like `car_age` from the `year` column.
  - Applied one-hot encoding for categorical columns such as `fuel`, `transmission`, `seller_type`, and `brand`.
- Scaling and Normalization:
  - Used `MinMaxScaler` for numerical features like `mileage(km/ltr/kg)` and `engine`.
- Outlier Removal:
  - Implemented the IQR method to remove extreme values in the `selling_price` column.
- Applied TF-IDF vectorization to extract important features from car model names.
- Removed redundant columns for improved model training performance.
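The preprocessing steps listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the sample rows are made up, the reference year, the 1.5 × IQR fence, the `name` column, and the `preprocess` helper are all assumptions, and only a subset of the categorical columns is encoded.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import MinMaxScaler

# Made-up rows standing in for cardekho.csv (values are illustrative only).
raw = pd.DataFrame({
    "name": ["Maruti Swift Dzire VDI", "Hyundai i20 Sportz", "Honda City VX",
             "Tata Nexon XM", "BMW X5 xDrive", "Maruti Alto 800 LXI"],
    "year": [2014, 2017, 2015, 2019, 2018, None],
    "selling_price": [450000, 550000, 600000, 750000, 9500000, 200000],
    "fuel": ["Diesel", "Petrol", "Petrol", "Petrol", "Diesel", "Petrol"],
    "transmission": ["Manual", "Automatic", "Manual", "Manual", "Automatic", "Manual"],
    "mileage(km/ltr/kg)": [23.4, 18.6, 17.8, 17.0, 13.0, 22.0],
    "engine": [1248, 1197, 1497, 1199, 2993, 796],
})

REFERENCE_YEAR = 2024  # assumption: the year used to compute car_age


def preprocess(df, reference_year=REFERENCE_YEAR):
    # Null handling: drop incomplete rows (filling is the other option).
    out = df.dropna().copy()

    # Feature engineering: derive car_age from the year column.
    out["car_age"] = reference_year - out["year"].astype(int)

    # Outlier removal: keep selling_price within the 1.5 * IQR fences.
    q1, q3 = out["selling_price"].quantile([0.25, 0.75])
    iqr = q3 - q1
    out = out[out["selling_price"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

    # One-hot encode categorical columns (subset shown here).
    out = pd.get_dummies(out, columns=["fuel", "transmission"])

    # Scale numerical features to the [0, 1] range.
    num_cols = ["mileage(km/ltr/kg)", "engine"]
    out[num_cols] = MinMaxScaler().fit_transform(out[num_cols].astype(float))

    # TF-IDF features from the car model names.
    tfidf = TfidfVectorizer()
    name_matrix = tfidf.fit_transform(out["name"]).toarray()
    name_df = pd.DataFrame(
        name_matrix,
        columns=[f"name_tfidf_{i}" for i in range(name_matrix.shape[1])],
        index=out.index,
    )

    # Drop columns that are now redundant and attach the text features.
    return pd.concat([out.drop(columns=["name", "year"]), name_df], axis=1)


clean = preprocess(raw)
print(clean.head())
```

In a real pipeline the scaler and vectorizer would typically be fitted on the training split only and reused at prediction time, so that the web interface transforms user input the same way.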
- Trained a Random Forest Regressor on the dataset for price prediction.
- Conducted a grid search using `GridSearchCV` with parameters like:
  - `n_estimators`
  - `max_depth`
  - `min_samples_split`
  - `min_samples_leaf`
- Evaluated the model using the following metrics:
- Mean Absolute Error (MAE)
- Mean Squared Error (MSE)
- R² Score
- Explained Variance
- Best Parameters: Selected based on cross-validation R² scores.
- Feature importance was visualized to understand the contribution of each feature.
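As a rough sketch of the training and evaluation flow described above, the example below runs a grid search over a Random Forest Regressor and reports the four listed metrics. The data is synthetic, and the grid values, the 80/20 split, and the 3-fold cross-validation are assumptions chosen to keep the example fast; the project's actual settings may differ.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import (explained_variance_score, mean_absolute_error,
                             mean_squared_error, r2_score)
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the preprocessed dataset: 4 features, price driven
# mostly by feature 0 (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 4))
y = 500_000 * X[:, 0] - 200_000 * X[:, 1] + rng.normal(0, 10_000, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Hypothetical grid over the four hyperparameters named in the list above.
param_grid = {
    "n_estimators": [20, 50],
    "max_depth": [None, 10],
    "min_samples_split": [2, 5],
    "min_samples_leaf": [1, 2],
}

# Best parameters are selected by cross-validated R².
search = GridSearchCV(
    RandomForestRegressor(random_state=42), param_grid, scoring="r2", cv=3
)
search.fit(X_train, y_train)

best_model = search.best_estimator_
y_pred = best_model.predict(X_test)

metrics = {
    "mae": mean_absolute_error(y_test, y_pred),
    "mse": mean_squared_error(y_test, y_pred),
    "r2": r2_score(y_test, y_pred),
    "explained_variance": explained_variance_score(y_test, y_pred),
}

# Importances sum to 1; they can be passed to a bar chart.
importances = best_model.feature_importances_

print(search.best_params_)
print(metrics)
print(importances)
```

`search.best_params_` holds the selected combination, and `best_model.feature_importances_` is the array a feature importance chart would be drawn from.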
A bar chart visualizing the importance of features in predicting car prices.
Scatter plot showing the relationship between actual and predicted prices.
A learning curve was plotted to show the model's performance on the training and validation sets as the number of training samples increases.
Histogram depicting the distribution of prediction errors.
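The numbers behind a learning curve like the one described above can be computed with scikit-learn's `learning_curve` helper. The sketch below uses synthetic data and assumed settings (five training sizes, 3-fold cross-validation, R² scoring); the resulting arrays are what a plotting library would draw.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import learning_curve

# Synthetic regression data standing in for the car dataset (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(150, 4))
y = 500_000 * X[:, 0] - 200_000 * X[:, 1] + rng.normal(0, 10_000, size=150)

# Score the model at increasing training-set sizes.
train_sizes, train_scores, val_scores = learning_curve(
    RandomForestRegressor(n_estimators=30, random_state=0),
    X,
    y,
    train_sizes=np.linspace(0.2, 1.0, 5),
    cv=3,
    scoring="r2",
)

# Average over the cross-validation folds to get one point per size.
train_mean = train_scores.mean(axis=1)
val_mean = val_scores.mean(axis=1)

for n, tr, va in zip(train_sizes, train_mean, val_mean):
    print(f"{n:4d} samples: train R2={tr:.3f}, validation R2={va:.3f}")
```

A large, persistent gap between the training and validation curves usually points to overfitting, while two low curves that converge suggest the model is too simple for the data.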
- The project includes a Flask-based web interface.
- Users can input car details to get:
- Predicted price for their car.
- Recommendations for similar cars.
- A suggestion system identifies the top 5 most similar cars based on:
- Euclidean distances between user input and dataset features.
- Provides additional insights by comparing predicted prices with actual dataset prices.
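The Euclidean-distance suggestion step described above could look something like the sketch below. The `recommend_similar` function, the column names, and the sample cars are hypothetical; the sketch assumes the features have already been scaled to a common range so that no single feature dominates the distance.

```python
import numpy as np
import pandas as pd

# Made-up cars with already-scaled features (illustrative only).
cars = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E", "F", "G"],
    "car_age": [0.30, 0.35, 0.30, 0.50, 0.30, 0.90, 0.00],
    "mileage": [0.50, 0.50, 0.60, 0.50, 0.50, 0.10, 1.00],
    "engine": [0.40, 0.40, 0.40, 0.40, 0.70, 0.90, 0.00],
    "selling_price": [500000, 480000, 530000, 420000, 610000, 900000, 250000],
})


def recommend_similar(cars, query, feature_cols, predicted_price=None, k=5):
    """Return the k cars closest to the query in Euclidean distance."""
    features = cars[feature_cols].to_numpy(dtype=float)
    q = np.asarray([query[c] for c in feature_cols], dtype=float)

    # Euclidean distance between the user input and every dataset row.
    distances = np.linalg.norm(features - q, axis=1)
    nearest = np.argsort(distances, kind="stable")[:k]

    result = cars.iloc[nearest].copy()
    result["distance"] = distances[nearest]

    # Optional insight: how each listed price compares with the prediction.
    if predicted_price is not None:
        result["price_vs_prediction"] = result["selling_price"] - predicted_price
    return result


user_input = {"car_age": 0.30, "mileage": 0.50, "engine": 0.40}
top5 = recommend_similar(
    cars, user_input, ["car_age", "mileage", "engine"], predicted_price=520000
)
print(top5)
```

For larger datasets, a nearest-neighbour index such as scikit-learn's `NearestNeighbors` would avoid computing the distance to every row on each request.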
- Incorporate additional features like car condition and regional pricing variations.
- Enhance the recommendation algorithm with advanced similarity metrics.
- Transition to a cloud-based architecture for real-time predictions.
To run this project locally, follow these steps:
- Clone the Repository:

      git clone https://github.com/yourusername/car-price-prediction.git
      cd car-price-prediction

- Install Required Dependencies:

      pip install -r requirements.txt

- Prepare the Dataset:
  - Place the dataset file (`cardekho.csv`) in the root directory.

- Train the Model:

      python train_model.py

- Run the Web Application:

      python web.py

  Access the application at http://127.0.0.1:5000/.
Contributions are welcome! To contribute:
- Fork the Repository.
- Create a Feature Branch:

      git checkout -b feature-name

- Commit Your Changes:

      git commit -m "Describe your feature"

- Push to Your Fork:

      git push origin feature-name

- Open a Pull Request.