-
Notifications
You must be signed in to change notification settings - Fork 111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bingo Elastic search #772
Bingo Elastic search #772
Conversation
AATDev21
commented
Jul 1, 2022
•
edited
Loading
edited
- Exact search was switched to use molecules hashes values for getting hits from Elastic.
- Changed substructure search query type.
- Added additional bad valence check into data importing step.
- Changed some Elastic search parameters.
we need to apply same changes in Java API as well |
bingo/bingo-elastic/python/.pylintrc
Outdated
@@ -4,6 +4,7 @@ disable= | |||
C0116, # missing-function-docstring | |||
C0115, # missing-class-docstring | |||
R0903, # too-few-public-methods | |||
R0913, # too-many-arguments |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not add this exclusion. Probably it's better to refactor the code that raises this warning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, found a way to refactor and escape this warning!
@@ -151,13 +154,12 @@ def prepare( | |||
|
|||
|
|||
def response_to_records( | |||
res, index_name, postprocess_actions | |||
res, index_name, postprocess_actions, indigo_session=None, options="" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please add the type hints here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad, already added.
assert postprocess_actions is not None | ||
postprocess_actions.append(getattr(self, "postprocess")) | ||
|
||
|
||
# Alias to default similarity match | ||
SimilarityMatch = TanimotoSimilarityMatch | ||
SimilarityMatch = EuclidSimilarityMatch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Default should be Tanimoto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed it too.
substructure: IndigoRecord = None, | ||
limit: int = 10, | ||
substructure: IndigoObject = None, | ||
limit: int = 5000, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably 100 as better as default value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AATDev21 @mkviatkovskii same as before. Default Elastic one is 10. We could override if needed. Getting 1 page of results of size 5000 is big load. Not sure for 100 too. Paging is much better way to deal with it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In some test cases like substructure search for an empty query molecule, we expected to get 2026 hits, and a low limit value will lead this test to fail. I really shouldn't have changed limit's default value, so let's set it back to Elastic's default 10.
substructure: IndigoRecord = None, | ||
limit: int = 10, | ||
substructure: IndigoObject = None, | ||
limit: int = 5000, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AATDev21 @mkviatkovskii same as before. Default Elastic one is 10. We could override if needed. Getting 1 page of results of size 5000 is big load. Not sure for 100 too. Paging is much better way to deal with it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM in general, apart from missing default similarity match? Should we have it?
class SimilarityMatch: | ||
euclid = EuclidSimilarityMatch | ||
tanimoto = TanimotoSimilarityMatch | ||
tversky = TverskySimilarityMatch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AATDev21 @mkviatkovskii should we have a default one still?