```python
# compute cosine similarity between the nth element in res1 and res2
from sklearn.metrics.pairwise import cosine_similarity

cosine_similarity(res1[0].reshape(1, -1), res2[0].reshape(1, -1))
# array([[0.85182214]], dtype=float32)
```
So we can see that the cosine similarity is only ~0.85 — the two vectors are substantially different.
Hi, I just added a comment on the PR that was merged (see #280 (comment)).
The addition of mean_pool solved the cosine similarity issue, but there are still inconsistencies in the normalization (especially when comparing to Nomic's documentation). I think this issue should be reopened in the meantime.
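For context, the mean_pool fix referenced above replaces a single-token (e.g. CLS) embedding with an average over all token embeddings, weighted by the attention mask so padding positions are excluded. A minimal NumPy sketch of that pooling step (function name and shapes are illustrative assumptions, not fastembed's actual code):

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings, ignoring padding positions.

    token_embeddings: (batch, seq_len, dim)
    attention_mask:   (batch, seq_len), 1 for real tokens, 0 for padding
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)                    # sum over real tokens
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                    # avoid division by zero
    return summed / counts
```

If the pooling strategy differs from the one the model was trained with, embeddings diverge from the reference implementation, which would explain the ~0.85 cosine similarity observed above.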
I was comparing the nomic embeddings and they are very different from the original version.
The fastembed normalize (https://github.com/qdrant/fastembed/blob/main/fastembed/common/models.py#L49-L54) does not appear to follow the normalization described at https://huggingface.co/nomic-ai/nomic-embed-text-v1.5#sentence-transformers.