
Consider pinning your spaCy version in requirements.txt? #178

Open

honnibal opened this issue Nov 15, 2017 · 1 comment

@honnibal

I just noticed that your requirements.txt doesn't pin to any particular version of spaCy or NLTK.

We've recently pushed spaCy 2, and while we've endeavoured to keep breaking changes to a minimum, it's a pretty big release: https://github.com/explosion/spaCy/releases/tag/v2.0.2

Even if the API doesn't change, there's the potential for problematic train/test skew for you if we make bug fixes to the tokenization, especially for languages other than English. Our compatibility policy is that changes that can affect statistical models can be made on minor releases --- e.g. spaCy 2.1.0 might fix some bug in the Hungarian tokenizer that affects a large number of tokens for that language. This means that sometimes, models trained with one minor version will suffer decreased accuracy if another version of the library is used at runtime.

There are also potential performance considerations. There's currently an open ticket about performance degradation of the tokenizer. It's unfortunate that this problem made it into the release, and we're working on it. But in the meantime, users who do a fresh installation of torchtext might find their preprocessing is much slower.
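For illustration, a pinned entry in requirements.txt could look something like the following; the exact bounds are only an example, and the project would want to use whichever versions its models were actually trained and tested with:

```
# Example pins only -- substitute the versions actually tested against.
spacy>=2.0.0,<2.1.0   # allow 2.0.x patch releases; exclude minor releases that may change tokenization
nltk>=3.2,<4.0        # illustrative range, not a recommendation
```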

@jekbradbury (Contributor)

Our policy so far has been to treat SpaCy and NLTK as optional dependencies and use whatever version the user already has installed. Choosing the "spacy" tokenizer option is just a convenience for manually creating a lambda that calls SpaCy's English tokenizer.
But that's not actually incompatible with providing a version in requirements.txt, since the optional dependencies there aren't installed or checked by pip install torchtext, so we'll go ahead and pin.
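For context, a minimal sketch of the manual route described above, assuming spaCy 2.x with the en_core_web_sm model installed and the Field-based torchtext data API of that era (the model name and field arguments are illustrative, not the project's actual code):

```python
import spacy
from torchtext import data  # torchtext's Field-based API (circa 2017)

# Load spaCy's English pipeline; the model name depends on the installed spaCy version.
spacy_en = spacy.load("en_core_web_sm")

def tokenize_en(text):
    # Run only spaCy's tokenizer and return plain token strings.
    return [tok.text for tok in spacy_en.tokenizer(text)]

# Roughly equivalent to Field(tokenize="spacy"): the spaCy version used is
# whatever happens to be installed in the user's environment.
TEXT = data.Field(tokenize=tokenize_en, lower=True)
```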
