[Bug] Not working with nltk #81

Open
2 tasks done
guidev opened this issue Aug 19, 2024 · 2 comments
Labels
bug Something isn't working

Comments


guidev commented Aug 19, 2024

Is this a new bug?

  • I believe this is a new bug
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

Setup based on nltk.download("punkt") fails with NLTK 3.9.x. The relevant setup code:

    @staticmethod
    def nltk_setup() -> None:
        try:
            nltk.data.find("tokenizers/punkt")
        except LookupError:
            nltk.download("punkt")

        try:
            nltk.data.find("corpora/stopwords")
        except LookupError:
            nltk.download("stopwords")

A full explanation is in nltk/nltk#3293.
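If pinecone-text needs to support NLTK releases on both sides of this change, one option is to pick the Punkt package by version. A minimal sketch, assuming (from this report) that 3.9.x is the first series that needs `punkt_tab`; the helper names `parse_major_minor` and `punkt_package_for` are hypothetical and not part of pinecone-text or NLTK:

```python
import re
from typing import Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a version string such as "3.9.1"."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def punkt_package_for(version: str) -> str:
    """Pick the Punkt package id to download for a given NLTK version.

    Assumption taken from this issue: NLTK 3.9.x needs "punkt_tab",
    while earlier releases use the original "punkt" package.
    """
    return "punkt_tab" if parse_major_minor(version) >= (3, 9) else "punkt"
```

In the setup code this could be used as `nltk.download(punkt_package_for(nltk.__version__))`, with the `nltk.data.find` path changed to match (`"tokenizers/" + package`). Comparing integer tuples rather than strings keeps a release like "3.10" ordered after "3.9".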

Expected Behavior

pinecone-text should work with the latest NLTK version.

Steps To Reproduce

See nltk/nltk#3293.

Relevant log output

No response

Environment

- **OS**:
- **Language version**:
- **Pinecone client version**:

Additional Context

No response

guidev added the bug label Aug 19, 2024

adumont commented Aug 19, 2024

It appears to be enough to modify sparse\bm25_tokenizer.py, replacing punkt with punkt_tab:

    @staticmethod
    def nltk_setup() -> None:
        try:
            nltk.data.find("tokenizers/punkt_tab")
        except LookupError:
            nltk.download("punkt_tab")

        try:
            nltk.data.find("corpora/stopwords")
        except LookupError:
            nltk.download("stopwords")
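A hedged variant of the same fix that should also keep older NLTK releases working: check each resource and download only what is missing, fetching both `punkt` and `punkt_tab`. This is a sketch, not pinecone-text's code; `ensure_nltk_resources` and `DEFAULT_RESOURCES` are hypothetical names, and the `find`/`download` parameters exist only so the logic can be tested without network access (they default to `nltk.data.find` and `nltk.download`):

```python
from typing import Callable, Iterable, List, Optional, Tuple

# (path for nltk.data.find, package id for nltk.download).
# Assumption: fetching both punkt (pre-3.9) and punkt_tab (3.9.x)
# keeps setup working on either side of the NLTK change.
DEFAULT_RESOURCES: Tuple[Tuple[str, str], ...] = (
    ("tokenizers/punkt", "punkt"),
    ("tokenizers/punkt_tab", "punkt_tab"),
    ("corpora/stopwords", "stopwords"),
)


def ensure_nltk_resources(
    resources: Iterable[Tuple[str, str]] = DEFAULT_RESOURCES,
    find: Optional[Callable[[str], object]] = None,
    download: Optional[Callable[[str], object]] = None,
) -> List[str]:
    """Download each package whose resource path cannot be found.

    Returns the package ids for which a download was attempted.
    """
    if find is None or download is None:
        import nltk  # imported lazily so callers can inject fakes

        find = find or nltk.data.find
        download = download or nltk.download

    attempted: List[str] = []
    for path, package in resources:
        try:
            find(path)  # raises LookupError when the resource is missing
        except LookupError:
            download(package)
            attempted.append(package)
    return attempted
```

Calling `ensure_nltk_resources()` with no arguments uses NLTK directly. `nltk.download` reports failure through its return value rather than an exception by default, so the returned list records attempted downloads, not confirmed ones.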

@emielsteerneman

#83
