This repository contains the implementation and dataset for the paper "Enhancing Plagiarism Detection in Marathi with a Weighted Ensemble of TF-IDF and BERT Embeddings for Low-Resource Language Processing". The work improves plagiarism detection in Marathi, a low-resource language, by combining TF-IDF and BERT embeddings in a weighted ensemble.
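The repository's own code is not reproduced here. To illustrate what a weighted ensemble of the two signals can look like, the sketch below combines a TF-IDF cosine similarity with a cosine similarity between precomputed sentence embeddings. It is a minimal sketch under stated assumptions, not the paper's implementation. The function names, the whitespace tokenisation, the smoothed IDF formula, and the weight `w_tfidf` are all hypothetical, and `emb_a`/`emb_b` are assumed to be vectors produced separately by a BERT encoder.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Hypothetical helper: sparse TF-IDF vectors (term -> weight) for whitespace-tokenised docs.

    Uses raw term counts and a smoothed IDF: log((1 + n) / (1 + df)) + 1.
    """
    tokenised = [doc.split() for doc in docs]
    n = len(tokenised)
    df = Counter(term for tokens in tokenised for term in set(tokens))
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    return [{t: c * idf[t] for t, c in Counter(tokens).items()} for tokens in tokenised]


def cosine_sparse(a, b):
    """Cosine similarity between two sparse dict vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cosine_dense(u, v):
    """Cosine similarity between two dense vectors (e.g. sentence embeddings)."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def ensemble_similarity(text_a, text_b, emb_a, emb_b, w_tfidf=0.5):
    """Hypothetical weighted ensemble: w * TF-IDF cosine + (1 - w) * embedding cosine."""
    if not 0.0 <= w_tfidf <= 1.0:
        raise ValueError("w_tfidf must lie in [0, 1]")
    # IDF is fitted on the two texts only, to keep the sketch self-contained.
    vec_a, vec_b = tfidf_vectors([text_a, text_b])
    s_tfidf = cosine_sparse(vec_a, vec_b)
    s_bert = cosine_dense(emb_a, emb_b)
    return w_tfidf * s_tfidf + (1.0 - w_tfidf) * s_bert


if __name__ == "__main__":
    # Placeholder embeddings; in practice these would come from a BERT encoder.
    score = ensemble_similarity(
        "मराठी ही एक भाषा आहे",
        "मराठी ही एक सुंदर भाषा आहे",
        [0.2, 0.7, 0.1],
        [0.25, 0.65, 0.1],
        w_tfidf=0.4,
    )
    print(f"ensemble similarity: {score:.3f}")
```

Fitting the IDF statistics on only the two compared texts keeps the example short; a real system would more likely fit them on the whole reference corpus and tune the weight on labelled data.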
If you use any of our datasets or refer to our methodology, please cite our work using the following BibTeX entry:
@misc{mutsaddi2025enhancingplagiarismdetectionmarathi,
  title={Enhancing Plagiarism Detection in Marathi with a Weighted Ensemble of TF-IDF and BERT Embeddings for Low-Resource Language Processing},
  author={Atharva Mutsaddi and Aditya Choudhary},
  year={2025},
  eprint={2501.05260},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2501.05260},
}