Speech-enabled Image Narration: Assistance for the Visually Impaired

Overview

This project improves accessibility for visually impaired individuals by automatically generating descriptive captions for images and delivering those descriptions as audio narration. The solution combines machine learning techniques for image processing, natural language generation, and speech synthesis.

Features

  • Image Feature Extraction: Utilizes the InceptionV3 CNN model to extract high-level image features.
  • Semantic Word Embeddings: Employs GloVe embeddings to enhance the language representation.
  • Caption Generation: Generates meaningful and contextually relevant captions using an LSTM-based decoder.
  • Speech Narration: Converts generated captions into audio using Text-to-Speech (TTS) technology.

Architecture

The system follows an encoder-decoder paradigm (a minimal code sketch follows the steps below):

  1. Image Input: Accepts images as input.
  2. Feature Extraction: InceptionV3 CNN extracts image features.
  3. Language Representation: GloVe embeddings provide semantic word vectors.
  4. Caption Generation: LSTM decoder generates captions using image features and word embeddings.
  5. Text-to-Speech Conversion: TTS converts captions to speech.
  6. Audio Output: Delivers the generated description as audio for user accessibility.
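
Below is a minimal sketch of such an encoder-decoder model in Keras. The vocabulary size, caption length, and layer widths are illustrative assumptions, not values taken from this repository.

    # Illustrative encoder-decoder caption model; sizes are assumptions, not the repo's exact config.
    from tensorflow.keras.applications.inception_v3 import InceptionV3
    from tensorflow.keras.layers import Input, Dense, Dropout, Embedding, LSTM, add
    from tensorflow.keras.models import Model

    vocab_size = 8000     # assumed vocabulary size
    max_length = 34       # assumed maximum caption length (tokens)
    embedding_dim = 200   # assumes 200-d GloVe vectors

    # Encoder: InceptionV3 with its classification head removed -> 2048-d image feature vector
    cnn = InceptionV3(weights="imagenet")
    encoder = Model(cnn.input, cnn.layers[-2].output)

    # Decoder: image features + (GloVe-initialised) word embeddings -> next-word prediction
    image_input = Input(shape=(2048,))
    img_branch = Dense(256, activation="relu")(Dropout(0.5)(image_input))

    caption_input = Input(shape=(max_length,))
    emb = Embedding(vocab_size, embedding_dim, mask_zero=True)(caption_input)  # weights can be initialised from GloVe
    seq_branch = LSTM(256)(Dropout(0.5)(emb))

    merged = add([img_branch, seq_branch])
    output = Dense(vocab_size, activation="softmax")(Dense(256, activation="relu")(merged))

    caption_model = Model([image_input, caption_input], output)
    caption_model.compile(loss="categorical_crossentropy", optimizer="adam")

Training pairs each image's feature vector with partial caption sequences; at inference the decoder predicts one word at a time (see the Usage sketch below).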

Technologies Used

  • Python: Programming language
  • TensorFlow/Keras: For building and training the CNN and LSTM models
  • GloVe: Pre-trained word embeddings for language representation
  • Text-to-Speech (TTS): For converting text captions into speech
  • Flask/Django (Optional): For deploying the application
  • NumPy, Pandas, Matplotlib: For data handling and visualization

Installation

  1. Clone this repository:

    git clone https://github.com/faizahkureshi232/imagetospeech.git
    cd imagetospeech
    
    
  2. Download the pre-trained models (a loading sketch follows these steps):

    • InceptionV3 weights
    • GloVe word embeddings
  3. Run the application notebook (e.g. with Jupyter):

    jupyter notebook eval.ipynb
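
For reference, here is a hypothetical helper for step 2, assuming the Stanford glove.6B.200d.txt file and the standard Keras InceptionV3 ImageNet weights; the file paths are placeholders, not paths fixed by this repository.

    # Hypothetical helpers: load GloVe vectors and extract InceptionV3 features.
    # Paths are placeholders -- point them at wherever the downloads were saved.
    import numpy as np
    from tensorflow.keras.applications.inception_v3 import InceptionV3, preprocess_input
    from tensorflow.keras.models import Model
    from tensorflow.keras.preprocessing import image

    def load_glove(path="glove.6B.200d.txt"):
        """Return a {word: 200-d vector} dict from a GloVe text file."""
        embeddings = {}
        with open(path, encoding="utf-8") as f:
            for line in f:
                values = line.split()
                embeddings[values[0]] = np.asarray(values[1:], dtype="float32")
        return embeddings

    # Encoder: InceptionV3 without its final classification layer (2048-d output)
    cnn = InceptionV3(weights="imagenet")
    encoder = Model(cnn.input, cnn.layers[-2].output)

    def extract_features(img_path):
        """Return the 2048-d InceptionV3 feature vector for one image."""
        img = image.load_img(img_path, target_size=(299, 299))
        x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
        return encoder.predict(x, verbose=0)[0]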

Usage

  1. Upload an image through the interface or specify the image path in the script.
  2. The system will generate a descriptive caption.
  3. The caption will be converted into speech and played as audio (see the sketch below).
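
A minimal sketch of this flow, reusing caption_model and extract_features from the sketches above; it assumes a Keras Tokenizer fitted on the training captions, startseq/endseq boundary tokens, and the gTTS package for speech (this repository may use a different TTS engine).

    # Illustrative greedy decoding + text-to-speech; helper names and tokens are assumptions.
    import numpy as np
    from gtts import gTTS
    from tensorflow.keras.preprocessing.sequence import pad_sequences

    def generate_caption(model, features, tokenizer, max_length=34):
        """Greedily decode one caption from a 2048-d image feature vector."""
        text = "startseq"                                   # assumed start-of-caption token
        for _ in range(max_length):
            seq = pad_sequences(tokenizer.texts_to_sequences([text]), maxlen=max_length)
            probs = model.predict([np.array([features]), seq], verbose=0)[0]
            word = tokenizer.index_word.get(int(np.argmax(probs)))
            if word is None or word == "endseq":            # assumed end-of-caption token
                break
            text += " " + word
        return text.replace("startseq", "").strip()

    # `tokenizer`: a Tokenizer fitted on the training captions (assumed to exist)
    caption = generate_caption(caption_model, extract_features("example.png"), tokenizer)
    gTTS(text=caption, lang="en").save("caption.mp3")       # play caption.mp3 to hear the narration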

Sample Results

  • Input Image: [example.png]
  • Generated Caption: "A Dog Running through the."
  • Audio Output: Speech narration of the generated caption.

Future Enhancements

  • Integration with real-time image capture (e.g., through a smartphone camera).
  • Support for multiple languages in Text-to-Speech.
  • Advanced customization for user-specific accessibility needs.

Contributing

Contributions are welcome! Please follow these steps:

  1. Fork the repository.

  2. Create a feature branch:

    git checkout -b feature-name

  3. Commit your changes:

    git commit -m "Add feature description"

  4. Push to the branch:

    git push origin feature-name

  5. Create a pull request.

License

This project is licensed under the MIT License. See LICENSE for more details.

Acknowledgements

  • InceptionV3 for feature extraction.
  • GloVe for pre-trained word embeddings.
  • OpenAI and community resources for inspiration and support.

Feel free to suggest improvements or report issues in the repository!
