HPLT - High Performance Language Technologies
A space that combines petabytes of natural language data with large-scale model training
Repositories
- OpusCleaner
OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models.
- warc2text-runner
Scripts for parallelized extraction of plain text from WARC archives, aiming at a common and reproducible extraction approach.