This project implements object detection and Optical Character Recognition (OCR) to extract train coach numbers from images using YOLO (You Only Look Once) and Tesseract.
- YOLO Model: Utilizes a pretrained YOLO model for object detection in images.
- OCR with Tesseract: Extracts text from the detected regions using Tesseract.
- Image Rotation: Enhances OCR accuracy by rotating images at specified angles to improve text readability.
Requirements:
- Python 3.x
- OpenCV
- NumPy
- Matplotlib
- Pytesseract
- Flask
- Clone the repository:
  git clone https://github.com/yourusername/your-repo-name.git
  cd your-repo-name
- Install dependencies:
  pip install -r requirements.txt
- Download the YOLO model:
  - Ensure the best.onnx model is located in ./static/models/.
- Start the Flask server:
  python app.py
- Open your browser and navigate to http://127.0.0.1:5000/.
- Upload an Image:
  - Click on "Choose File" to select an image containing train coach numbers.
  - Click on "Upload" to process the image.
- View Results:
  - The processed image with detected regions and extracted text will be displayed on the web page.
To improve OCR accuracy, images are rotated by small angles before text extraction. A coach number that is tilted in the photo ends up closer to horizontal at one of these angles, which helps Tesseract recognize it.
angles = [-10, -8, -6, -4, -2, 2, 4, 6, 8, 10]
Each image undergoes rotation through these angles, and OCR is performed on each rotated image. The image yielding the highest confidence score from Tesseract is selected as the best result.
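The selection step described above can be sketched as follows. This is a minimal illustration, not the project's actual code: `best_ocr_over_angles`, `rotate_fn`, and `ocr_fn` are hypothetical names, and the OCR callable is assumed to return a `(text, confidence)` pair.

```python
ANGLES = [-10, -8, -6, -4, -2, 2, 4, 6, 8, 10]


def best_ocr_over_angles(image, rotate_fn, ocr_fn, angles=ANGLES):
    """Run OCR at every angle and keep the highest-confidence result.

    rotate_fn(image, angle) -> rotated image   (hypothetical callable)
    ocr_fn(image) -> (text, confidence)        (hypothetical callable)
    Returns a (text, confidence, angle) tuple.
    """
    best = None
    for angle in angles:
        rotated = rotate_fn(image, angle)
        text, conf = ocr_fn(rotated)
        # Keep the first result, then replace it only on a strictly higher score.
        if best is None or conf > best[1]:
            best = (text, conf, angle)
    if best is None:
        raise ValueError("angles must not be empty")
    return best
```

Passing the rotation and OCR steps in as callables keeps the selection logic independent of OpenCV and Tesseract, so it can be exercised with simple stand-ins.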
Converts the image to YOLO format and retrieves predictions from the YOLO model.
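One common way to prepare an image for an ONNX YOLO model with OpenCV is to pad it to a square and convert it to a normalized blob. The sketch below assumes that approach; the 640-pixel input size and the names `pad_to_square` and `get_predictions` are assumptions, and the project's own preprocessing may differ.

```python
import numpy as np

INPUT_SIZE = 640  # assumed network input size, not confirmed by the project


def pad_to_square(image):
    """Place a 3-channel image in the top-left corner of a black square canvas."""
    rows, cols = image.shape[:2]
    side = max(rows, cols)
    square = np.zeros((side, side, 3), dtype=np.uint8)
    square[:rows, :cols] = image
    return square


def get_predictions(image, net):
    """Run a network loaded with cv2.dnn.readNetFromONNX on one image."""
    import cv2  # imported here so pad_to_square works without OpenCV

    square = pad_to_square(image)
    # Scale pixels to [0, 1], resize to the network input, swap BGR to RGB.
    blob = cv2.dnn.blobFromImage(
        square, 1 / 255.0, (INPUT_SIZE, INPUT_SIZE), swapRB=True, crop=False
    )
    net.setInput(blob)
    return square, net.forward()
```

Padding rather than stretching keeps the aspect ratio, so box coordinates can be scaled back to the padded image by a single factor.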
Filters detections based on confidence and probability scores, applying Non-Maximum Suppression (NMS).
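To illustrate the filtering step, here is a plain-Python sketch of score thresholding followed by greedy NMS. The thresholds, the `(x, y, w, h)` box layout, and the use of a single combined score per box are assumptions for the example. OpenCV also provides `cv2.dnn.NMSBoxes` for this purpose.

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    inter_w = max(0, min(ax2, bx2) - max(a[0], b[0]))
    inter_h = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, score_threshold=0.25, iou_threshold=0.45):
    """Return indices of the boxes to keep, highest score first."""
    # Drop low-scoring boxes, then visit the rest in descending score order.
    order = sorted(
        (i for i, s in enumerate(scores) if s >= score_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

For example, two heavily overlapping boxes around the same coach number collapse to the higher-scoring one, while a box elsewhere in the image is kept.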
Extracts text from the specified bounding box region of the image using Tesseract.
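A minimal sketch of this step, assuming the bounding box is given as `(x, y, w, h)` and that confidence is taken as the mean of Tesseract's word-level scores. The helper names `summarize_ocr` and `extract_text` are hypothetical, and the project may compute confidence differently.

```python
def summarize_ocr(data):
    """Join recognized words and average their confidences.

    `data` is the dict returned by pytesseract.image_to_data with
    output_type=Output.DICT; rows that are not words have conf -1.
    """
    words, confs = [], []
    for text, conf in zip(data["text"], data["conf"]):
        conf = float(conf)  # conf may arrive as a string or a number
        if text.strip() and conf >= 0:
            words.append(text.strip())
            confs.append(conf)
    mean_conf = sum(confs) / len(confs) if confs else 0.0
    return " ".join(words), mean_conf


def extract_text(image, bbox):
    """Crop the bounding box from a NumPy image and run Tesseract on it."""
    import pytesseract
    from pytesseract import Output

    x, y, w, h = bbox
    roi = image[y:y + h, x:x + w]
    data = pytesseract.image_to_data(roi, output_type=Output.DICT)
    return summarize_ocr(data)
```

Returning a confidence alongside the text is what allows results from different rotation angles to be compared.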
Draws bounding boxes and extracted text on the image.
Rotates the image by the specified angle.
Rotates the image by the specified angle around the center using affine transformation.
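The affine rotation can be sketched as below. `rotation_matrix` spells out the 2x3 matrix that OpenCV documents for `cv2.getRotationMatrix2D`, and `rotate_image` shows the usual OpenCV calls. Both function names are hypothetical, and keeping the output the same size as the input is an assumption.

```python
import math


def rotation_matrix(center, angle_deg, scale=1.0):
    """2x3 affine matrix for a rotation about `center`.

    Positive angles rotate counter-clockwise in image coordinates.
    """
    cx, cy = center
    a = scale * math.cos(math.radians(angle_deg))
    b = scale * math.sin(math.radians(angle_deg))
    return [
        [a, b, (1 - a) * cx - b * cy],
        [-b, a, b * cx + (1 - a) * cy],
    ]


def apply_affine(m, point):
    """Apply a 2x3 affine matrix to an (x, y) point."""
    x, y = point
    return (
        m[0][0] * x + m[0][1] * y + m[0][2],
        m[1][0] * x + m[1][1] * y + m[1][2],
    )


def rotate_image(image, angle_deg):
    """Rotate an image about its center, keeping the original size."""
    import cv2

    h, w = image.shape[:2]
    m = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), angle_deg, 1.0)
    return cv2.warpAffine(image, m, (w, h))
```

Because the output size is unchanged, corners of the rotated image are clipped. For the small angles used here that loss is usually limited to the image borders.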
Processes the image with YOLO and OCR, iterating over different rotation angles.
Main function to handle image reading, processing, and saving the result.
- /: Main route to upload and display images.
To run the server, execute:
python app.py