Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add missing RetriBERT tokenizer tests #17017

Merged
merged 9 commits into from
May 11, 2022
Merged

Add missing RetriBERT tokenizer tests #17017

merged 9 commits into from
May 11, 2022

Conversation

mpoemsl
Copy link
Contributor

@mpoemsl mpoemsl commented Apr 30, 2022

What does this PR do?

Addresses issue #16627.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@LysandreJik
@SaulLu

Notes

  1. There was no folder tests/retribert/ yet, so I created one and put an __init__.py in it. Is there anything else I have to do for these tests to get picked up by CI?

  2. RetriBertTokenizer is identical to BertTokenizer, so I mostly just duplicated BertTokenizationTest. Is that fine or should I rather a) write new tests from scratch or b) figure out a way to reuse the code in BertTokenizationTest?

@mpoemsl mpoemsl changed the title Add missing RetriBERT tokenizer tests [WIP] Add missing RetriBERT tokenizer tests Apr 30, 2022
@HuggingFaceDocBuilderDev
Copy link

HuggingFaceDocBuilderDev commented Apr 30, 2022

The documentation is not available anymore as the PR was closed or merged.

@mpoemsl mpoemsl changed the title [WIP] Add missing RetriBERT tokenizer tests Add missing RetriBERT tokenizer tests Apr 30, 2022
Copy link
Contributor

@SaulLu SaulLu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for the addition and the explanations regarding your choices! I agree with those 😃

I've just left 2 suggestions of change in comment

mpoemsl and others added 2 commits May 3, 2022 12:00
Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>
Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>
@mpoemsl mpoemsl changed the title Add missing RetriBERT tokenizer tests [WIP] Add missing RetriBERT tokenizer tests May 3, 2022
@mpoemsl
Copy link
Contributor Author

mpoemsl commented May 4, 2022

Hi @SaulLu, can you give me some guidance on how to proceed?

The CI is throwing the following error:
Make sure the names of these test files match the name of the module or utils they are testing, or adapt the constant `SPECIAL_MODULE_TO_TEST_MAP` in `utils/tests_fetcher.py` to add them. If your test file is triggered separately and is not supposed to be run by the regular CI, add it to the `EXPECTED_TEST_FILES_NEVER_TOUCHED` constant instead.

However, I think the naming is correct (test_tokenization_retribert.py tests tokenization_retribert.py) and adding it to neither constants makes sense to me. I'm also unable to reproduce this locally with pytest. Do you have any idea what triggered this?

@SaulLu
Copy link
Contributor

SaulLu commented May 5, 2022

The CI error seems to me to come from the fact that 3 days ago there was a re-organisation of the test folder (#17034).

To solves this, I suggest to1) merge the latest changes to main in your branch and 2) move the tests you added to conform to the new organisation (tests/retribert -> tests/models/retribert).

Keep me updated! 😄

@mpoemsl mpoemsl changed the title [WIP] Add missing RetriBERT tokenizer tests Add missing RetriBERT tokenizer tests May 6, 2022
@mpoemsl
Copy link
Contributor Author

mpoemsl commented May 6, 2022

Hi @SaulLu, thank you very much for the reply. I think I've got it now! Can you take another look?

Copy link
Contributor

@SaulLu SaulLu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look perfect to me! Thanks for the contribution @mpoemsl 🚀

@SaulLu SaulLu merged commit 5229744 into huggingface:main May 11, 2022
@mpoemsl
Copy link
Contributor Author

mpoemsl commented May 11, 2022

My pleasure! I would like to keep contributing and I was wondering if you could help me with a question related to that @SaulLu.

I noticed that the RetriBERT model itself is missing test files as well and I would like to write those. How do I make sure that no one else is writing them concurrently? Do I open an issue or perhaps a WIP pull request? I have already checked that there currently is no open issue or pull request related to this.

ArthurZucker pushed a commit to ArthurZucker/transformers that referenced this pull request May 12, 2022
* Create RetriBERT tests folder

* Add missing RetriBERT tokenizer test file

* Apply style corrections

* Add non-english filter

* Update tests/retribert/test_tokenization_retribert.py

Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>

* Update tests/retribert/test_tokenization_retribert.py

Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>

* Move test files to new directory

* Update import path for testing utils to new test file structure

Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>
elusenji pushed a commit to elusenji/transformers that referenced this pull request Jun 12, 2022
* Create RetriBERT tests folder

* Add missing RetriBERT tokenizer test file

* Apply style corrections

* Add non-english filter

* Update tests/retribert/test_tokenization_retribert.py

Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>

* Update tests/retribert/test_tokenization_retribert.py

Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>

* Move test files to new directory

* Update import path for testing utils to new test file structure

Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants