Form Digitization Using CV2 and AI 🎥🤖

This project uses OpenCV and the Gemini 1.5 Flash API to digitize forms in real time. It provides an interactive interface that captures forms from a webcam, detects hand gestures for form type selection, and overlays the extracted data on a custom background.

Features 🌟

Real-Time Video Feed: Captures live input from the webcam.
Hand Gesture Recognition: Uses finger gestures to select the form type (e.g., Student Card, Challan, General Form).
Gemini 1.5 Flash API Integration: Extracts text from form images that the user sends to the API.
Dynamic Data Display: Displays extracted data (like name, CNIC, department, etc.) directly on the application interface.
Customizable UI: Implements a styled UI with background overlays and region-specific displays.
Video Recording: Allows recording of sessions for future reference.
Multi-Mode Support: Toggle between instruction view, live capture, and extracted text view.

Prerequisites 📋

Make sure you have the following installed and set up before running the application:

Python 3.8+
OpenCV (cv2)
cvzone (for hand gesture detection)
Pillow (for image processing)
Gemini 1.5 Flash API credentials
A webcam for live video input
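
Before launching, it can help to confirm that the Python packages are importable. The following is a minimal sketch and not part of the repository: the helper name `missing_packages` is hypothetical, and `cv2`, `cvzone`, and `PIL` are the usual import names for OpenCV, cvzone, and Pillow.

```python
import importlib.util

# Import names for the packages listed above (OpenCV, cvzone, Pillow).
REQUIRED_MODULES = ["cv2", "cvzone", "PIL"]


def missing_packages(modules=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

If anything is reported missing, installing from requirements.txt as described under Installation should resolve it.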

Installation 🛠️

Clone the repository:

```
git clone https://github.com/cyberfantics/form_digitilization.git
cd form_digitilization
```

Install the required dependencies:
```
pip install -r requirements.txt
```
Add your Gemini 1.5 Flash API key in the extract.py script.
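
A key written directly into a script is easy to commit by accident. One alternative, shown here only as a sketch, is to read it from an environment variable. The variable name `GEMINI_API_KEY` and the helper `load_api_key` are assumptions for illustration; extract.py is not known to use either.

```python
import os


def load_api_key(env_var="GEMINI_API_KEY"):
    """Read the API key from the environment; fail early if it is not set."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key
```

With this approach, the key is set once in the shell before running the application, and the script never contains the secret itself.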

How to Use ▶️

Run the application:

```
python main.py
```

Follow the On-Screen Instructions 📋

Press p: Toggle video recording.
Press i: View the instructions screen.
Press c: Enter live capture mode.
Press s: Send the frame to the Gemini 1.5 Flash API for text extraction.
Press h: Activate hand gesture detection for form type selection.
Press q: Exit the application.
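
The key bindings above can be pictured as a small dispatch function. This is a hedged sketch, not the code in main.py: `AppState`, its fields, and `handle_key` are hypothetical names, and the key code is assumed to come from something like `cv2.waitKey(1) & 0xFF`.

```python
from dataclasses import dataclass


@dataclass
class AppState:
    mode: str = "instructions"      # "instructions" or "capture"
    recording: bool = False         # toggled by "p"
    gestures_enabled: bool = False  # switched on by "h"
    extract_requested: bool = False # set by "s"; the caller sends the frame
    running: bool = True            # cleared by "q"


def handle_key(state, key):
    """Update the state for one key code and return it."""
    # Codes outside the byte range (e.g. -1 for "no key") are ignored.
    ch = chr(key).lower() if 0 <= key < 256 else ""
    if ch == "p":
        state.recording = not state.recording
    elif ch == "i":
        state.mode = "instructions"
    elif ch == "c":
        state.mode = "capture"
    elif ch == "s":
        state.extract_requested = True
    elif ch == "h":
        state.gestures_enabled = True
    elif ch == "q":
        state.running = False
    return state
```

Keeping the key handling separate from the video loop makes it possible to check the bindings without a webcam attached.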

Hand Gesture Rules ✋

Five fingers open: Select General Form Mode.
Two fingers open (peace sign): Select Fee Challan Mode.
Five fingers closed (fist): Select Student Card Mode.
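
The gesture rules above amount to counting raised fingers. The sketch below assumes the finger states arrive as a five-element list of 0/1 values, in the style of the list returned by cvzone's `HandDetector.fingersUp()`; the function name `select_form_type` and the returned labels are hypothetical, and the two-finger rule is implemented as a simple count rather than a check for specific fingers.

```python
def select_form_type(fingers):
    """Map a five-element list of 0/1 finger states to a form mode.

    Returns None when no rule matches, so the caller can keep
    the currently selected mode.
    """
    if len(fingers) != 5:
        raise ValueError("expected five finger states")
    raised = sum(fingers)
    if raised == 5:
        return "General Form"
    if raised == 2:
        return "Fee Challan"
    if raised == 0:
        return "Student Card"
    return None
```

For example, an open hand `[1, 1, 1, 1, 1]` selects General Form, and a fist `[0, 0, 0, 0, 0]` selects Student Card.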

Output 📊

The extracted data (e.g., Name, CNIC, Gender) is displayed directly on the UI.
You can view the processed data in live video frames and save the session as a video file.
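
One way to place extracted fields on a frame is to compute a position for each line of text first. The following sketch is an assumption about how such a layout could work, not the project's actual rendering code; `layout_fields`, the default origin, and the line height are all illustrative.

```python
def layout_fields(fields, origin=(50, 100), line_height=40):
    """Return (text, (x, y)) pairs, one per field, stacked top to bottom."""
    x, y = origin
    lines = []
    for i, (label, value) in enumerate(fields.items()):
        lines.append((f"{label}: {value}", (x, y + i * line_height)))
    return lines
```

Each pair could then be drawn with OpenCV, for instance `cv2.putText(frame, text, pos, cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 0, 0), 2)`.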

File Structure 📂

main.py: Core application logic.
extract.py: Contains API integration for text extraction.
resources/: Contains images for the UI (e.g., background and instructions).
requirements.txt: List of dependencies.

Screenshots 📸

Contributing 🤝

Contributions are welcome! Feel free to submit a pull request or open an issue to suggest improvements.
