-
Notifications
You must be signed in to change notification settings - Fork 138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix potential edge case scoring in context search #474
Conversation
✅ Deploy Preview for poetic-froyo-8baba7 ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
qdrant_client/local/distances.py
Outdated
def fast_sigmoid(x: np.float32) -> np.float32: | ||
if np.isfinite(x): | ||
return x / (1.0 + abs(x)) | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small thing:
You can remove the else and directly do return X no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, we can avoid the redundant else statement.
expit
We use the same function as in core implementation to get scoring congruence
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see
qdrant_client/local/distances.py
Outdated
@@ -147,13 +147,21 @@ def calculate_distance_core( | |||
return calculate_distance(query, vectors, distance_type) | |||
|
|||
|
|||
def fast_sigmoid(x: np.float32) -> np.float32: | |||
if not np.isfinite(x): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a bit confused by it, I know, that we have had such a check for quite some time now though..
Do we wait for a particular not finite type here?
Like -inf, +inf, nan ? (if we wait -inf, won't it be better to return a constant value?)
It seems for me that at the moment, If vector has a non-finite difference with any of context pairs, then it is excluded from result search, because it will have non-finite score in overall_scores
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I don't know if I understand well, but if x equal +inf or -inf the function return x is good or you want to return for example 2 or 3 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, in those cases we should return the limit of the function, which should be -1 or +1
qdrant_client/local/distances.py
Outdated
if np.isnan(x): | ||
# To avoid divisions on NaNs, which gets: RuntimeWarning: invalid value encountered in scalar divide | ||
return x # NaN | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we want to just hide the warning? :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rust implementation also returns NaN when dividing by NaN, so I'd say it is safe to return it here too
* apply fast_sigmoid fn to context pair score * remove redundant else statement * better NaN and float32 handling * remove unused import
Follows the same fix that was made in core (qdrant/qdrant#3374) for congruence
All Submissions:
dev
branch. Did you create your branch fromdev
?New Feature Submissions:
pre-commit
withpip3 install pre-commit
and set up hooks withpre-commit install
?Changes to Core Features: