Implementation of NPO/SimNPO is different for MUSE v/s TOFU #5

Open
molereddy opened this issue Feb 17, 2025 · 0 comments


The MUSE implementation computes the log-ratios from the logits, whereas the TOFU implementation computes them from the loss. The two are not equivalent, and only the latter is faithful to the formula in the paper: the former ignores the softmax denominator term, which is part of the probability expression.
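To make the difference concrete, here is a minimal single-token sketch in plain Python. It is not the code from either implementation (that is only visible in the screenshots below); the function names and the toy logits are hypothetical, chosen purely for illustration. It compares a log-ratio built from log-probabilities (what a cross-entropy loss gives, up to sign) with one built from raw target logits.

```python
import math

def log_softmax(logits):
    # log p_i = z_i - logsumexp(z); the logsumexp is the softmax denominator.
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]

def log_ratio_from_logprobs(policy_logits, ref_logits, target):
    # Hypothetical helper: log pi_theta(y|x) - log pi_ref(y|x), i.e. the
    # difference of negative cross-entropy losses on the target token.
    return log_softmax(policy_logits)[target] - log_softmax(ref_logits)[target]

def log_ratio_from_logits(policy_logits, ref_logits, target):
    # Hypothetical helper: difference of raw target logits. This drops
    # logsumexp(policy_logits) - logsumexp(ref_logits).
    return policy_logits[target] - ref_logits[target]

# Toy values, assumed for illustration only.
policy_logits = [2.0, 0.5, -1.0]
ref_logits = [1.0, 1.0, 1.0]
target = 0

faithful = log_ratio_from_logprobs(policy_logits, ref_logits, target)
shortcut = log_ratio_from_logits(policy_logits, ref_logits, target)
print("from log-probs:", faithful)
print("from raw logits:", shortcut)

# Adding a constant to every policy logit leaves the probabilities unchanged,
# so the log-prob ratio is unaffected, but the raw-logit ratio shifts.
shifted = [z + 5.0 for z in policy_logits]
print("shifted, from log-probs:", log_ratio_from_logprobs(shifted, ref_logits, target))
print("shifted, from raw logits:", log_ratio_from_logits(shifted, ref_logits, target))
```

The two quantities differ by exactly the difference of the two models' log-normalizers, so they coincide only when the policy and the reference happen to have the same softmax denominator.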

NPO for MUSE:

Image

NPO for TOFU:

Image

SimNPO for MUSE:

Image

SimNPO for TOFU:

Image

Is there a reason different implementations have been used? Does the latter version lead to the same results on MUSE?
