Update RoBERTa vocabulary files #255
gpengzhi commented Nov 27, 2019
resolve #246
input_ids, _ = tokenizer.encode_text('Hello world!', max_seq_length=5)
Is max_seq_length necessary here? What's the result in this case without setting max_seq_length?
Codecov Report
@@ Coverage Diff @@
## master #255 +/- ##
==========================================
+ Coverage 83.04% 83.04% +<.01%
==========================================
Files 195 195
Lines 15293 15300 +7
==========================================
+ Hits 12700 12706 +6
- Misses 2593 2594 +1
If the user doesn't want padding, it's difficult for them to know the correct sequence length in advance, as in this case ('Hello world!' needs a seq_length of 5). Can we add an argument (or allow a special value of max_seq_length) so that the user can get the encoded text without padding, and without needing to specify max_seq_length explicitly?
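To make the request concrete, here is a minimal sketch of the proposed behavior, assuming `max_seq_length=None` is the special value that disables padding. The function name `encode_ids`, the `None` convention, and the special-token ids are hypothetical and only for illustration; this is not the tokenizer's actual implementation.

```python
from typing import List, Optional, Tuple

# Illustrative special-token ids (assumed for this sketch).
BOS_ID, PAD_ID, EOS_ID = 0, 1, 2


def encode_ids(token_ids: List[int],
               max_seq_length: Optional[int] = None
               ) -> Tuple[List[int], List[int]]:
    """Wrap token ids in BOS/EOS and return (input_ids, input_mask).

    If max_seq_length is None, no truncation or padding is applied and the
    output is exactly as long as the encoded text. Otherwise the tokens are
    truncated to fit and the result is padded to max_seq_length.
    """
    if max_seq_length is not None:
        if max_seq_length < 2:
            raise ValueError("max_seq_length must leave room for BOS and EOS")
        # Reserve two positions for the BOS and EOS tokens.
        token_ids = token_ids[:max_seq_length - 2]

    input_ids = [BOS_ID] + list(token_ids) + [EOS_ID]
    input_mask = [1] * len(input_ids)

    if max_seq_length is not None:
        pad_len = max_seq_length - len(input_ids)
        input_ids += [PAD_ID] * pad_len
        input_mask += [0] * pad_len

    return input_ids, input_mask
```

With three made-up token ids, `encode_ids([10, 11, 12])` returns five ids and an all-ones mask, so the caller does not need to know the length beforehand, while `encode_ids([10, 11, 12], max_seq_length=8)` pads both lists to length 8.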
There are two returns,
pls try to pass the test and merge asap
I tried to disable
As long as the build is "passing" we're good for now