Implementation of NPO/SimNPO is different for MUSE v/s TOFU #5

molereddy · 2025-02-17T20:33:52Z

The MUSE implementation computes the log-ratios from logits, while the TOFU implementation computes them from the loss. The two are not equivalent, and only the latter is faithful to the formula from the paper: the former ignores the softmax denominator term, which is part of the probability expression.
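To make the difference concrete, here is a minimal pure-Python sketch for a single token. It is not the code from either implementation: the function names, the 3-word vocabulary, and the logit values are all made up for illustration. It only shows that a log-ratio taken from raw target logits differs from one taken from log-probabilities by the difference of the two models' log-sum-exp (softmax denominator) terms.

```python
import math

def logsumexp(logits):
    # Numerically stable log of the softmax denominator.
    m = max(logits)
    return m + math.log(sum(math.exp(z - m) for z in logits))

def log_ratio_from_logprobs(policy_logits, ref_logits, target):
    # Loss-based route: log pi_theta(y) - log pi_ref(y), where
    # log pi(y) = logit[y] - logsumexp(logits).
    policy_lp = policy_logits[target] - logsumexp(policy_logits)
    ref_lp = ref_logits[target] - logsumexp(ref_logits)
    return policy_lp - ref_lp

def log_ratio_from_raw_logits(policy_logits, ref_logits, target):
    # Logit-based route: drops both softmax denominators.
    return policy_logits[target] - ref_logits[target]

# Hypothetical values, chosen only for illustration.
policy_logits = [2.0, 0.5, -1.0]
ref_logits = [1.0, 1.0, 1.0]
target = 0

faithful = log_ratio_from_logprobs(policy_logits, ref_logits, target)
shortcut = log_ratio_from_raw_logits(policy_logits, ref_logits, target)

# The gap is exactly the difference of the two denominators.
gap = shortcut - faithful
denominator_gap = logsumexp(policy_logits) - logsumexp(ref_logits)

# Adding a constant to every policy logit leaves the probabilities
# unchanged, so the faithful ratio stays put while the shortcut moves.
shifted = [z + 5.0 for z in policy_logits]
faithful_shifted = log_ratio_from_logprobs(shifted, ref_logits, target)
shortcut_shifted = log_ratio_from_raw_logits(shifted, ref_logits, target)

print(f"faithful={faithful:.4f} shortcut={shortcut:.4f} gap={gap:.4f}")
```

The two routes would agree only if both models happened to have the same softmax denominator at every position, which there is no reason to expect in general.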

NPO for MUSE:

NPO for TOFU:

SimNPO for MUSE:

SimNPO for TOFU:

Is there a reason different implementations have been used? Does the latter version lead to the same results on MUSE?
