This project is a Hindi Article Generator that uses natural language processing (NLP) techniques to produce coherent, contextually relevant articles in Hindi. The model was fine-tuned on a custom dataset of Hindi news headlines and articles collected via web scraping, and it is intended for use cases such as news, blogs, and creative content.
Direct link to the Kaggle notebook with outputs: https://www.kaggle.com/code/aadisrivastava/hindi-article-generator
(I highly recommend checking it out, as it includes every cell along with its output.)
The model is trained on a dataset that I created by scraping the BBC Hindi website. The dataset covers a wide variety of articles, which helps the generator produce diverse and contextually accurate content. The dataset and the scraping script are on GitHub:
https://github.com/AadiSrivastava05/BBC-Hindi-News-Dataset-with-web-scraping-script
The dataset is also available on Kaggle, where you can use it directly in your code without downloading it:
https://www.kaggle.com/datasets/aadisrivastava/bbc-hindi-news-articles-dataset-detailed
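
As a rough sketch of what "using it directly" could look like in a Kaggle notebook with this dataset attached: Kaggle mounts attached datasets under `/kaggle/input/<dataset-slug>/`, so the files can be read from there. The snippet assumes the dataset ships as one or more CSV files, which is not confirmed here, and `load_rows` is a hypothetical helper name. It makes no assumption about column names and simply returns whatever the files contain.

```python
import csv
from pathlib import Path

# Kaggle mounts attached datasets under /kaggle/input/<dataset-slug>/.
# The slug is taken from the dataset URL; the CSV format is an assumption.
DATASET_DIR = Path("/kaggle/input/bbc-hindi-news-articles-dataset-detailed")

# Full articles can be long, so raise the csv module's per-field size limit.
csv.field_size_limit(10_000_000)


def load_rows(dataset_dir=DATASET_DIR):
    """Read every CSV file found under dataset_dir into a list of dicts."""
    rows = []
    for csv_path in sorted(Path(dataset_dir).rglob("*.csv")):
        # Devanagari text needs UTF-8; utf-8-sig also tolerates a leading BOM.
        with open(csv_path, newline="", encoding="utf-8-sig") as f:
            rows.extend(csv.DictReader(f))
    return rows
```

Calling `rows = load_rows()` inside the notebook and then inspecting `rows[0].keys()` shows the actual column names before you build anything on top of them.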
- Generates fluent and contextually accurate Hindi articles.
- Can be used for content creation in media platforms, blogs, and creative writing.
- Built with advanced NLP techniques by fine-tuning Llama 3 for the Hindi language.
- The same dataset and approach can be used for many other tasks, including:
  - Generating a headline for an article
  - Classifying an article into categories
  - Classifying an article headline into categories
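
To illustrate how one dataset could serve all of these tasks, here is a minimal sketch that turns a single scraped record into a (prompt, target) pair for whichever task is selected. The field names `headline`, `article`, and `category`, the task names, and the `###` prompt layout are all assumptions made for illustration; they are not the format used in the notebook, so check the dataset's real columns first.

```python
# Each task maps to (input field, output field).
# Field and task names are hypothetical, not taken from the dataset.
TASKS = {
    "article_from_headline": ("headline", "article"),
    "headline_from_article": ("article", "headline"),
    "category_from_article": ("article", "category"),
    "category_from_headline": ("headline", "category"),
}


def build_example(record, task):
    """Return a (prompt, target) pair for one record and one task."""
    if task not in TASKS:
        raise ValueError(f"Unknown task: {task!r}")
    source_field, target_field = TASKS[task]
    # The prompt shows the input text and ends where the model should continue.
    prompt = (
        f"### {source_field}:\n{record[source_field].strip()}\n\n"
        f"### {target_field}:\n"
    )
    return prompt, record[target_field].strip()
```

Switching tasks then only changes which field is the input and which is the target, so the same fine-tuning loop can be reused for generation and for classification framed as text generation.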