Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bingo Elastic search #772

Merged
merged 10 commits into from
Jul 19, 2022
Merged

Bingo Elastic search #772

merged 10 commits into from
Jul 19, 2022

Conversation

AATDev21
Copy link
Collaborator

@AATDev21 AATDev21 commented Jul 1, 2022

  • Exact search was switched to use molecules hashes values for getting hits from Elastic.
  • Changed substructure search query type.
  • Added additional bad valence check into data importing step.
  • Changed some Elastic search parameters.

@AATDev21 AATDev21 added python Pull requests that update Python code Bingo-Elastic labels Jul 1, 2022
@AATDev21 AATDev21 linked an issue Jul 1, 2022 that may be closed by this pull request
@MysterionRise
Copy link
Collaborator

we need to apply same changes in Java API as well

@AATDev21 AATDev21 requested review from mkviatkovskii, khyurri and MysterionRise and removed request for MysterionRise and khyurri July 1, 2022 11:35
@@ -4,6 +4,7 @@ disable=
C0116, # missing-function-docstring
C0115, # missing-class-docstring
R0903, # too-few-public-methods
R0913, # too-many-arguments
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not add this exclusion. Probably it's better to refactor the code that raises this warning.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, found a way to refactor and escape this warning!

@@ -151,13 +154,12 @@ def prepare(


def response_to_records(
res, index_name, postprocess_actions
res, index_name, postprocess_actions, indigo_session=None, options=""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add the type hints here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, already added.

assert postprocess_actions is not None
postprocess_actions.append(getattr(self, "postprocess"))


# Alias to default similarity match
SimilarityMatch = TanimotoSimilarityMatch
SimilarityMatch = EuclidSimilarityMatch
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default should be Tanimoto

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed it too.

substructure: IndigoRecord = None,
limit: int = 10,
substructure: IndigoObject = None,
limit: int = 5000,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably 100 as better as default value

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AATDev21 @mkviatkovskii same as before. Default Elastic one is 10. We could override if needed. Getting 1 page of results of size 5000 is big load. Not sure for 100 too. Paging is much better way to deal with it

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In some test cases like substructure search for an empty query molecule, we expected to get 2026 hits, and a low limit value will lead this test to fail. I really shouldn't have changed limit's default value, so let's set it back to Elastic's default 10.

substructure: IndigoRecord = None,
limit: int = 10,
substructure: IndigoObject = None,
limit: int = 5000,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AATDev21 @mkviatkovskii same as before. Default Elastic one is 10. We could override if needed. Getting 1 page of results of size 5000 is big load. Not sure for 100 too. Paging is much better way to deal with it

Copy link
Collaborator

@MysterionRise MysterionRise left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM in general, apart from missing default similarity match? Should we have it?

Comment on lines +305 to +308
class SimilarityMatch:
euclid = EuclidSimilarityMatch
tanimoto = TanimotoSimilarityMatch
tversky = TverskySimilarityMatch
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AATDev21 @mkviatkovskii should we have a default one still?

@mkviatkovskii mkviatkovskii merged commit 869db9c into master Jul 19, 2022
@mkviatkovskii mkviatkovskii deleted the bingo_elastic_sim_sub branch July 19, 2022 05:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bingo-Elastic python Pull requests that update Python code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bingo failed tests
4 participants