The MUSE implementation computes the log-ratios from logits, whereas the TOFU implementation computes them from the loss. The two are not equivalent, and only the latter is faithful to the formula in the paper: the former ignores the softmax denominator, which is part of the probability expression.
NPO for MUSE:
NPO for TOFU:
SimNPO for MUSE:
SimNPO for TOFU:
Is there a reason different implementations have been used? Does the latter version lead to the same results on MUSE?
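To make the discrepancy concrete, here is a minimal sketch in plain Python, not taken from either implementation. It assumes that "using logits" means taking the raw logit of the target token, and that "using loss" means taking the negative cross-entropy, i.e. the log-softmax of the target token. The logit values and the helper names `log_sum_exp` and `target_log_prob` are hypothetical.

```python
import math

def log_sum_exp(xs):
    """Numerically stable log of the softmax denominator."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def target_log_prob(logits, target):
    """log softmax(logits)[target], i.e. minus the per-token cross-entropy loss."""
    return logits[target] - log_sum_exp(logits)

# Hypothetical logits over a 3-token vocabulary at a single position.
policy_logits = [2.0, 0.5, -1.0]     # current model (assumed values)
reference_logits = [1.0, 1.0, 0.0]   # reference model (assumed values)
target = 0                           # index of the ground-truth token

# Loss-based log-ratio: log pi_theta(y|x) - log pi_ref(y|x).
loss_based = (target_log_prob(policy_logits, target)
              - target_log_prob(reference_logits, target))

# Logit-based quantity: difference of raw target logits, no normalization.
logit_based = policy_logits[target] - reference_logits[target]

# The two differ by the difference of the log-partition terms.
gap = log_sum_exp(reference_logits) - log_sum_exp(policy_logits)

print(f"loss-based log-ratio : {loss_based:.4f}")
print(f"logit-based quantity : {logit_based:.4f}")
print(f"log-partition gap    : {gap:.4f}")
```

Under these assumptions, `loss_based` equals `logit_based + gap`, so the two quantities agree only when both models have the same softmax denominator at every position, which is not generally the case.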