This project leverages OpenCV and the Gemini 1.5 Flash API for real-time form digitization. It provides an interactive interface to digitize forms, detect hand gestures for form type selection, and overlay extracted data on a visually appealing custom background.
- Real-Time Video Feed: Captures live input from the webcam.
- Hand Gesture Recognition: Uses finger gestures to select the form type (e.g., Student Card, Challan, General Form).
- Gemini 1.5 Flash API Integration: Extracts text from images of forms uploaded by the user.
- Dynamic Data Display: Displays extracted data (like name, CNIC, department, etc.) directly on the application interface.
- Customizable UI: Implements a styled UI with background overlays and region-specific displays.
- Video Recording: Allows recording of sessions for future reference.
- Multi-Mode Support: Toggle between instruction view, live capture, and extracted text view.
Make sure you have the following installed and set up before running the application:
- Python 3.8+
- OpenCV (
cv2
) - cvzone (for hand gesture detection)
- Pillow (for image processing)
- Gemini 1.5 Flash API credentials
- A webcam for live video input
- Clone the repository:
git clone https://github.com/cyberfantics/form_digitilization.git cd form_digitilization
- Install the required dependencies:
pip install -r requirements.txt
- Add your
Gemini 1.5 Flash API
key in theextract.py
script.
python main.py
- Press
p
: Toggle video recording. - Press
i
: View the instructions screen. - Press
c
: Enter live capture mode. - Press
s
: Send the frame to the Gemini 1.5 Flash API for text extraction. - Press
h
: Activate hand gesture detection for form type selection. - Press
q
: Exit the application.
- Five fingers open: Select General Form Mode.
- Two fingers open (peace sign): Select Fee Challan Mode.
- Five fingers closed (fist): Select Student Card Mode.
- The extracted data (e.g., Name, CNIC, Gender) is displayed directly on the UI.
- You can view the processed data in live video frames and save the session as a video file.
main.py
: Core application logic.extract.py
: Contains API integration for text extraction.resources/
: Contains images for the UI (e.g., background and instructions).requirements.txt
: List of dependencies.
Contributions are welcome! Feel free to submit a pull request or open an issue to suggest improvements.
This project is licensed under the MIT License.
Syed Mansoor ul Hassan Bukhari
GitHub | LinkedIn