
Depiction-of-image-features-with-audio-to-aid-visually-impaired-persons

This project makes visual content accessible to visually impaired individuals by generating audio descriptions of images. Built with Python, machine learning, and text-to-speech technologies, the system identifies image features, converts them into textual captions, and renders those captions as audio for playback.


Features

  • Image Recognition: Uses deep learning models (e.g., Vision Transformers) to analyze images and extract features.
  • Caption Generation: Generates meaningful descriptions of images in text form (a captioning sketch follows this list).
  • Multilingual Audio Output: Provides audio descriptions in English, Kannada, and Hindi using advanced text-to-speech (TTS) libraries.
  • User-Friendly Interface: Enables users to upload images and listen to detailed audio descriptions seamlessly.
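
The image-recognition and caption-generation features map naturally onto Hugging Face's VisionEncoderDecoderModel, which is listed in the tech stack below. A minimal sketch, assuming the public nlpconnect/vit-gpt2-image-captioning checkpoint (the repository does not pin a specific model):

```python
# Captioning sketch: a ViT encoder extracts image features and a GPT-2 decoder
# turns them into an English caption. The checkpoint name is an assumption.
from PIL import Image
from transformers import AutoTokenizer, ViTImageProcessor, VisionEncoderDecoderModel

MODEL_NAME = "nlpconnect/vit-gpt2-image-captioning"  # assumed checkpoint, not pinned by the repo
model = VisionEncoderDecoderModel.from_pretrained(MODEL_NAME)
processor = ViTImageProcessor.from_pretrained(MODEL_NAME)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

def caption_image(image_path: str) -> str:
    """Extract visual features and decode a short English caption."""
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values
    output_ids = model.generate(pixel_values, max_length=32, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

print(caption_image("example.jpg"))  # prints the generated caption
```

Beam search (num_beams=4) usually yields more fluent captions than greedy decoding at a small cost in speed.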

Tech Stack

  • Frontend: Google Colab for interactive prototyping.
  • Backend: Python, TensorFlow, PyTorch, VisionEncoderDecoderModel, gTTS, and OpenCV.
  • Additional Libraries: NumPy, Matplotlib, and the Google Translate API for multilingual support (the translation-and-speech step is sketched after this list).
  • Development Tools: Flask for interface development and Git for version control.
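
The multilingual audio output combines translation with gTTS. A minimal sketch, assuming the unofficial googletrans package (version 4.0.0rc1, which exposes a synchronous API) as the Google Translate wrapper; the speak_caption helper name is illustrative:

```python
# Translation + text-to-speech sketch. googletrans and the helper name are
# assumptions; the gTTS language codes 'en', 'hi', and 'kn' are standard.
from googletrans import Translator
from gtts import gTTS

LANG_CODES = {"english": "en", "hindi": "hi", "kannada": "kn"}

def speak_caption(caption: str, language: str = "english") -> str:
    """Translate an English caption if needed, synthesize speech, and return the MP3 path."""
    code = LANG_CODES[language.lower()]
    if code != "en":
        # googletrans 4.0.0rc1 is synchronous; newer releases switch to an async API.
        caption = Translator().translate(caption, src="en", dest=code).text
    audio_path = f"caption_{code}.mp3"
    gTTS(text=caption, lang=code).save(audio_path)
    return audio_path

print(speak_caption("A dog running on the beach", "kannada"))  # writes caption_kn.mp3
```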

How It Works

  1. Image Upload: Users upload an image via the interface.
  2. Feature Extraction: The system analyzes the image using pre-trained deep learning models.
  3. Caption Generation: Converts features into meaningful captions.
  4. Text-to-Speech: Translates the caption into the selected language (English, Kannada, or Hindi) and converts it to speech.
  5. Playback: Users can listen to detailed audio descriptions of the image (an end-to-end Flask sketch of these steps follows this list).
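
Behind the Flask interface mentioned in the tech stack, these steps could be wired into a single upload route. The sketch below assumes the caption_image() and speak_caption() helpers from the earlier sketches; the route and field names are illustrative, not the repository's actual interface code:

```python
# End-to-end sketch: upload an image, caption it, translate and synthesize audio,
# and return the MP3 for playback. Assumes caption_image() and speak_caption()
# from the sketches above.
import os
import tempfile
from flask import Flask, request, send_file
from werkzeug.utils import secure_filename

app = Flask(__name__)

@app.route("/describe", methods=["POST"])
def describe():
    upload = request.files["image"]                              # step 1: image upload
    language = request.form.get("language", "english")
    image_path = os.path.join(tempfile.gettempdir(),
                              secure_filename(upload.filename or "upload.jpg"))
    upload.save(image_path)
    caption = caption_image(image_path)                          # steps 2-3: features -> caption
    audio_path = speak_caption(caption, language)                # step 4: translate + TTS
    os.remove(image_path)
    return send_file(audio_path, mimetype="audio/mpeg")          # step 5: client plays the audio

if __name__ == "__main__":
    app.run(debug=True)
```

A client would POST the image file and a language field to /describe and play the returned MP3.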

Applications

  • Enhancing accessibility for visually impaired individuals.
  • Use in education and assistive technologies.
  • Real-time applications for image and video captioning.

Results

The generated audio descriptions convey image features effectively, enhancing accessibility and independence for visually impaired users.

