TF: rework XLA generate tests #16866
expected_output_string = [
    "Heute ist ein schöner Tag.",
    "Ich habe vier Katzen, drei Hunde, zwei Vögel und ein Pferd.",
]
Are we guaranteed that the logits are stable enough that we'll always sample this exact output? A flaky test can be really annoying!
We haven't had problems with similar tests, so I'm assuming we won't have problems :D
...fair point!
Overall, this looks great! The fact that we can't really test sample outputs is annoying, but I see why we can't really get around that problem.
@patrickvonplaten reintroduced the fast tests, will merge as soon as CI gets to green.
What does this PR do?
In the light of recent findings (#16838), this PR reworks existing XLA generate tests. The following key changes were made:
- `@unittest.skipIf` on XLA generate tests, to skip when no GPU is present;
- `sample` tests -- due to the minor numerical differences that arise when we use XLA (and that we can't control), the sampling step will gather different samples even when we use the same seed. The only thing we can properly test is whether a) we can seed them and b) the results are sensible;
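
The testing strategy for `sample` described above can be illustrated with a minimal, standard-library-only sketch. This is not the test code from this PR: `softmax`, `sample_tokens`, and `SeededSampleTest` are hypothetical names, and a plain `random.Random` generator stands in for the model's sampling step. It only shows the shape of the two checks: a) the same seed reproduces the same draws on the same backend, and b) the draws are sensible (expected length, token ids inside the vocabulary), without asserting any exact sampled output.

```python
import math
import random
import unittest


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_tokens(logits, seed, num_tokens=8):
    """Hypothetical stand-in for a sampling step: draw token ids from
    softmax(logits) using a locally seeded generator."""
    rng = random.Random(seed)
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=num_tokens)


class SeededSampleTest(unittest.TestCase):
    # Toy logits over a 4-token vocabulary (illustrative values only).
    logits = [2.0, 1.0, 0.5, 0.1]

    def test_same_seed_is_reproducible(self):
        # (a) Seeding works: same seed, same code path, same draws.
        first = sample_tokens(self.logits, seed=42)
        second = sample_tokens(self.logits, seed=42)
        self.assertEqual(first, second)

    def test_output_is_sensible(self):
        # (b) Results are sensible: right length, ids inside the vocabulary.
        # No exact token sequence is asserted, since tiny numerical
        # differences in the logits could legitimately change the draws.
        tokens = sample_tokens(self.logits, seed=42)
        self.assertEqual(len(tokens), 8)
        self.assertTrue(all(0 <= t < len(self.logits) for t in tokens))
```

The tests can be run with `python -m unittest` against the file that contains them. The design choice mirrors the reasoning above: property-style assertions stay valid even when two backends produce slightly different probabilities, whereas an exact expected sequence would not.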