This project focuses on enabling accessibility for visually impaired individuals by automating the generation of descriptive captions for images and providing these descriptions through audio narration. The solution integrates advanced machine learning techniques in image processing, natural language understanding, and speech synthesis.
- Image Feature Extraction: Uses the InceptionV3 CNN to extract high-level image features (see the sketch after this list).
- Semantic Word Embeddings: Employs GloVe embeddings to enhance the language representation.
- Caption Generation: Generates meaningful and contextually relevant captions using an LSTM-based decoder.
- Speech Narration: Converts generated captions into audio using Text-to-Speech (TTS) technology.
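A minimal sketch of the feature-extraction step with TensorFlow/Keras; the `extract_features` helper name is illustrative, not the repo's exact API:

```python
# Minimal sketch: extract a 2048-dim feature vector with InceptionV3.
import numpy as np
from tensorflow.keras.applications.inception_v3 import InceptionV3, preprocess_input
from tensorflow.keras.preprocessing import image

encoder = InceptionV3(weights="imagenet", include_top=False, pooling="avg")

def extract_features(img_path):
    img = image.load_img(img_path, target_size=(299, 299))  # InceptionV3 input size
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return encoder.predict(x)[0]  # shape: (2048,)
```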
The system follows an encoder-decoder paradigm (a decoder sketch follows this list):
- Image Input: Accepts images as input.
- Feature Extraction: InceptionV3 CNN extracts image features.
- Language Representation: GloVe embeddings provide semantic word vectors.
- Caption Generation: LSTM decoder generates captions using image features and word embeddings.
- Text-to-Speech Conversion: TTS converts captions to speech.
- Audio Output: Delivers the generated description as audio for user accessibility.
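A minimal sketch of the merge-style decoder this pipeline describes; the vocabulary size, sequence length, and layer widths are illustrative assumptions, not the repo's exact hyperparameters:

```python
# Minimal sketch of an LSTM decoder that merges InceptionV3 image features
# with GloVe-initialized word embeddings. All sizes are assumptions.
import numpy as np
from tensorflow.keras.initializers import Constant
from tensorflow.keras.layers import Dense, Dropout, Embedding, Input, LSTM, add
from tensorflow.keras.models import Model

vocab_size, max_len, embed_dim = 8000, 34, 200        # assumed hyperparameters
embedding_matrix = np.zeros((vocab_size, embed_dim))  # fill row i with word i's GloVe vector

# Image branch: project the 2048-dim feature vector into the decoder space.
img_in = Input(shape=(2048,))
img_vec = Dense(256, activation="relu")(Dropout(0.5)(img_in))

# Language branch: GloVe-initialized embeddings feeding an LSTM.
seq_in = Input(shape=(max_len,))
emb = Embedding(vocab_size, embed_dim, mask_zero=True,
                embeddings_initializer=Constant(embedding_matrix),
                trainable=False)(seq_in)
seq_vec = LSTM(256)(Dropout(0.5)(emb))

# Merge both branches and predict a distribution over the next word.
merged = Dense(256, activation="relu")(add([img_vec, seq_vec]))
out = Dense(vocab_size, activation="softmax")(merged)

decoder = Model(inputs=[img_in, seq_in], outputs=out)
decoder.compile(loss="categorical_crossentropy", optimizer="adam")
```

Merging the image and text vectors before the output layer is one common design for this paradigm; attention-based decoders are an alternative.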
- Python: Programming language
- TensorFlow/Keras: For building and training the CNN and LSTM models
- GloVe: Pre-trained word embeddings for language representation
- Text-to-Speech (TTS): For converting text captions into speech
- Flask/Django (Optional): For deploying the application
- NumPy, Pandas, Matplotlib: For data handling and visualization
- Clone this repository:

  ```bash
  git clone https://github.com/faizahkureshi232/imagetospeech.git
  cd imagetospeech
  ```
- Download the pre-trained models:
  - InceptionV3 weights
  - GloVe word embeddings (a loader sketch follows this step)
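GloVe ships as plain-text files; a minimal loader sketch, assuming the common `glove.6B.200d.txt` variant (any variant works if `embed_dim` matches):

```python
# Minimal sketch: parse a GloVe text file into a word -> vector dict.
# The file name is an assumption; use whichever variant you downloaded.
import numpy as np

def load_glove(path="glove.6B.200d.txt"):
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, *values = line.split()
            embeddings[word] = np.asarray(values, dtype="float32")
    return embeddings
```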
- Run the application by opening the evaluation notebook:

  ```bash
  jupyter notebook eval.ipynb
  ```
- Upload an image through the interface or specify the image path in the script.
- The system generates a descriptive caption (see the decoding sketch after these steps).
- The caption is converted into speech and played back as audio.
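Captions are typically decoded one token at a time. A minimal greedy-decoding sketch, assuming a fitted Keras `Tokenizer` whose training captions used `startseq`/`endseq` boundary tokens, and reusing the `decoder` and `extract_features` sketches above:

```python
# Minimal greedy-decoding sketch. Assumes a fitted Keras Tokenizer with
# "startseq"/"endseq" boundary tokens, plus the `decoder` model and
# `extract_features` helper sketched earlier in this README.
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

def generate_caption(img_path, tokenizer, max_len=34):
    feats = extract_features(img_path)[None, :]           # (1, 2048)
    words = ["startseq"]
    for _ in range(max_len):
        seq = tokenizer.texts_to_sequences([" ".join(words)])[0]
        seq = pad_sequences([seq], maxlen=max_len)
        next_id = int(np.argmax(decoder.predict([feats, seq], verbose=0)))
        word = tokenizer.index_word.get(next_id)
        if word is None or word == "endseq":              # stop at the end token
            break
        words.append(word)
    return " ".join(words[1:])                            # drop the start token
```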
- Input Image: [example.png]
- Generated Caption: "A Dog Running through the."
- Audio Output: Speech narration of the generated caption (a TTS sketch follows this example).
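The narration step can use any TTS backend. A minimal sketch with gTTS; the library choice is an assumption, so swap in whichever engine the project actually uses:

```python
# Minimal TTS sketch using gTTS (a common choice, assumed here rather than
# confirmed from the repo). Requires an internet connection.
from gtts import gTTS

caption = "A dog running through the park."  # illustrative caption text
gTTS(text=caption, lang="en").save("caption.mp3")  # play with any audio player
```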
- Integration with real-time image capture (e.g., through a smartphone camera).
- Support for multiple languages in Text-to-Speech.
- Advanced customization for user-specific accessibility needs.
Contributions are welcome! Please follow these steps:
- Fork the repository.
- Create a feature branch:

  ```bash
  git checkout -b feature-name
  ```
- Commit your changes:

  ```bash
  git commit -m "Add feature description"
  ```
- Push to the branch:

  ```bash
  git push origin feature-name
  ```
- Create a pull request.
This project is licensed under the MIT License. See LICENSE for more details.
- InceptionV3 for feature extraction.
- GloVe for pre-trained word embeddings.
- OpenAI and community resources for inspiration and support.
Feel free to suggest improvements or report issues in the repository!