[Bug]: The impact of the limit parameter on the ranking of results during search #40018

Open
1 task done
zmw132 opened this issue Feb 19, 2025 · 4 comments
Assignees
Labels
kind/bug Issues or changes related a bug triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@zmw132

zmw132 commented Feb 19, 2025

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: 2.5.1
- Deployment mode(standalone or cluster): standalone 
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2): api
- OS(Ubuntu or CentOS): CentOS
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

Using the search endpoint, I keep the rest of the request body unchanged and change only limit. Why are the results for limit=3 not the top three results for limit=10? No operations were performed on the collection between the two requests.

curl --request POST --url "http://xxx:19530/v2/vectordb/entities/search" --header "Authorization: Bearer root:xxx" --header "Content-Type: application/json" \
-d '{
"annsField": "vector",
"collectionName": "test_collection",
"data": [[-0.05120890960097313,-0.028760679066181183,-0.008349430747330189,...]],
"dbName": "test_db",
"filter": "",
"limit": 10,
"outputFields": ["_id", "aaa", "bbb", "ccc", "ddd"]
}'
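
To make the discrepancy concrete, the two responses can be compared by primary key. Below is a minimal sketch, assuming the `_id` values have already been pulled out of each response in rank order. The helper name `prefix_mismatches` and the sample ids are hypothetical and not taken from the actual responses.

```python
from typing import Hashable, List, Optional, Sequence, Tuple


def prefix_mismatches(
    small: Sequence[Hashable], large: Sequence[Hashable]
) -> List[Tuple[int, Hashable, Optional[Hashable]]]:
    """Return (rank, id_in_small, id_in_large) for each rank where the lists differ.

    `small` is the ranked id list from the smaller limit (e.g. limit=3),
    `large` is the ranked id list from the larger limit (e.g. limit=10).
    An empty result means `small` is exactly the head of `large`.
    """
    diffs = []
    for rank, small_id in enumerate(small):
        # Guard against `large` being shorter than `small`.
        large_id = large[rank] if rank < len(large) else None
        if small_id != large_id:
            diffs.append((rank, small_id, large_id))
    return diffs


# Illustrative ids only (hypothetical, not real query output).
ids_limit_3 = [101, 205, 309]
ids_limit_10 = [101, 150, 205, 309, 412, 518, 622, 731, 845, 950]
mismatches = prefix_mismatches(ids_limit_3, ids_limit_10)
```

With these sample lists, rank 0 agrees and ranks 1 and 2 differ, which is the pattern described above: the larger limit surfaces an entity that the smaller limit never returned.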

Expected Behavior

No response

Steps To Reproduce

Milvus Log

No response

Anything else?

No response

@zmw132 zmw132 added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Feb 19, 2025

dosubot bot commented Feb 19, 2025

Hey @zmw132! I'm here to help you with any bugs, questions, or contributions you have for Milvus. Let me know how I can assist you!

The behavior you're observing, where the top results differ when changing the limit parameter, can be attributed to how Milvus ranks and retrieves search results. The limit parameter specifies the maximum number of entities to return, but it doesn't guarantee that the top results for a smaller limit will always be included in the results for a larger limit. This is because the search algorithm may rank results differently based on the number of results requested.

When you set limit=3, Milvus returns the top three most similar entities based on the similarity scores. However, when you increase the limit to 10, the search algorithm may consider additional factors or perform additional computations that affect the ranking of results. This can lead to a different set of top results, as the algorithm optimizes for the larger set of results.

This behavior is typical in search systems where the ranking can be influenced by the number of results requested, especially in complex vector search scenarios. If consistent top results are critical, you might need to explore additional ranking strategies or configurations that ensure stability across different limits [1][2].


@yanliang567
Contributor

/assign @liliu-z
/unassign

@sre-ci-robot sre-ci-robot assigned liliu-z and unassigned yanliang567 Feb 20, 2025
@yanliang567 yanliang567 added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Feb 20, 2025
@yanliang567
Contributor

/assign @binbinlv
please keep an eye on this issue

@xiaofan-luan
Collaborator

What are your index parameters, and what do the results look like?

  1. Maybe some backend handoff/compaction happened and changed the index structure.
  2. Maybe your ef search is too small, so increasing limit actually changes the backend search parameters.

Suggestion:
Try tuning the search parameters and see if it helps.
Try waiting until flush and compaction have happened and see if it's still reproducible.
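
One way to act on the first suggestion is to pin the search width explicitly, so that changing limit does not also change how many candidates the index explores. The sketch below only builds the two request bodies and sends nothing. It assumes an HNSW index and assumes that the v2 REST search endpoint accepts index-specific parameters under `searchParams.params`; neither is confirmed in this thread. The function name `build_search_body`, the value `ef=128`, and the short placeholder vector are illustrative.

```python
from typing import Any, Dict, List


def build_search_body(vector: List[float], limit: int, ef: int) -> Dict[str, Any]:
    """Build a search request body with an explicit, limit-independent ef.

    Assumption: the index is HNSW and ef is passed via searchParams.params.
    """
    # For HNSW, ef has to be at least as large as the number of results requested.
    if ef < limit:
        raise ValueError("ef must be >= limit")
    return {
        "dbName": "test_db",
        "collectionName": "test_collection",
        "annsField": "vector",
        "data": [vector],
        "filter": "",
        "limit": limit,
        "outputFields": ["_id", "aaa", "bbb", "ccc", "ddd"],
        # Same ef for every limit, so only the cut-off of the result list changes.
        "searchParams": {"params": {"ef": ef}},
    }


# Placeholder query vector (hypothetical); use the real embedding in practice.
query = [0.1, 0.2, 0.3]
body_limit_3 = build_search_body(query, limit=3, ef=128)
body_limit_10 = build_search_body(query, limit=10, ef=128)
```

If the limit=3 results become the head of the limit=10 results once both requests share the same ef, that would point to the second cause above rather than to compaction.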


5 participants