We release all evaluation data and scripts for further analysis and reproduction of the accompanying paper: A comparison of translation performance between DeepL and Supertext.
Install the dependencies with Poetry:
pip install poetry
poetry install
To evaluate A/B results, call the script as follows:
poetry run python analysis/analyze.py -i data/ab_LANGPAIR.csv
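To evaluate several language pairs in one go, the call above can be wrapped in a small loop. The following is a minimal sketch, assuming the input files follow the `data/ab_LANGPAIR.csv` pattern shown above. `build_command` and `run_all` are hypothetical helper names and are not part of this repository.

```python
import subprocess


def build_command(langpair):
    """Return the analysis command for one language pair as an argument list."""
    return [
        "poetry", "run", "python", "analysis/analyze.py",
        "-i", f"data/ab_{langpair}.csv",
    ]


def run_all(langpairs):
    """Run the analysis once per language pair, stopping at the first failure."""
    for langpair in langpairs:
        # check=True raises CalledProcessError if the script exits non-zero.
        subprocess.run(build_command(langpair), check=True)
```

Call `run_all` with the language-pair codes that match the files present in the data folder.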
You will find two TSV files with results in the results folder:
- FILENAME_by_segment_winner: Aggregated results of segment wins by system
- FILENAME_by_document_winner: Aggregated results of document wins by system
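To inspect the results programmatically, the TSV files can be loaded with Python's standard library. This is a minimal sketch that assumes each file starts with a header row; the column names are taken from whatever the file contains, and `read_tsv` is a hypothetical helper, not part of this repository.

```python
import csv


def read_tsv(path):
    """Read a tab-separated file with a header row into a list of dicts."""
    with open(path, newline="", encoding="utf-8") as handle:
        # Each row becomes a dict keyed by the column names in the header.
        return list(csv.DictReader(handle, delimiter="\t"))
```

Pass the path of either result file to `read_tsv` and iterate over the returned rows.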
If you use our code or data, please cite our paper:
@misc{flueckiger-etal-2025-comparison,
      title={A comparison of translation performance between DeepL and Supertext},
      author={Alex Flückiger and Chantal Amrhein and Tim Graf and Frédéric Odermatt and Martin Pömsl and Philippe Schläpfer and Florian Schottmann and Samuel Läubli},
      year={2025},
      eprint={2502.02577},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.02577},
}