[Bug]: L0 delete compaction is not triggered again after the number of Flushed-L0 segments no longer increases #30556

Closed
1 task done
ThreadDao opened this issue Feb 6, 2024 · 2 comments
Labels
kind/bug Issues or changes related a bug severity/critical Critical, lead to crash, data missing, wrong result, function totally doesn't work. triage/accepted Indicates an issue or PR is ready to be actively worked on.

@ThreadDao
Contributor

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: master-20240204-18b979d9-amd64
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka):  pulsar  
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

  1. deploy milvus and enable L0 segment
  2. create collection with 2 shards -> index -> load -> insert 5m-128d data -> flush -> index again -> load
  3. concurrent: search + upsert + flush
     'concurrent_params': {'concurrent_number': 50,
                           'during_time': '10h',
                           'interval': 120,
                           'spawn_rate': None},
     'concurrent_tasks': [{'type': 'search',
                           'weight': 4,
                           'params': {'nq': 100,
                                      'top_k': 100,
                                      'search_param': {'ef': 128},
                                      'timeout': 120}},
                          {'type': 'flush',
                           'weight': 2,
                           'params': {'timeout': 120}},
                          {'type': 'upsert',
                           'weight': 4,
                           'params': {'nb': 100,
                                      'start_id': 0,
                                      'random_id': True,
                                      'random_vector': True,
                                      'timeout': 120}}]}
  4. Starting around 2024-02-05 07:00, the L0 delete compaction is no longer triggered.
    Metrics of level-zero-upsert-13: (image)
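
For readers sizing the workload, here is a rough sketch of how the task weights in step 3 split the 50 concurrent clients. It assumes the weights are relative selection probabilities, which the report does not state; the function name is made up for illustration and does not come from the test framework.

```python
# Weights and client count copied from the config in step 3.
TASK_WEIGHTS = {"search": 4, "flush": 2, "upsert": 4}
CONCURRENT_NUMBER = 50  # 'concurrent_number'


def expected_clients(weights, concurrent_number):
    """Average number of clients per task type.

    Assumption: each client picks a task with probability weight / sum(weights).
    """
    total = sum(weights.values())
    return {name: concurrent_number * w / total for name, w in weights.items()}


clients = expected_clients(TASK_WEIGHTS, CONCURRENT_NUMBER)
for name, count in clients.items():
    print(f"{name}: ~{count:.0f} clients on average")
```

Under that assumption, about 40% of the load is upserts (which produce deletes and therefore L0 segments) and about 20% is flushes.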

Expected Behavior

No response

Steps To Reproduce

argo: https://argo-workflows.zilliz.cc/archived-workflows/qa/49d1b28c-5a67-4c14-ba28-f58b7a6d673b?nodeId=level-zero-stable-13a-2745047073

Milvus Log

pods:

level-zero-upsert-13-etcd-0                                       1/1     Running            0                 26h
level-zero-upsert-13-etcd-1                                       1/1     Running            0                 26h
level-zero-upsert-13-etcd-2                                       1/1     Running            0                 26h
level-zero-upsert-13-milvus-datanode-76574659b8-5wgxv             1/1     Running            0                 26h
level-zero-upsert-13-milvus-datanode-76574659b8-zsp4p             1/1     Running            0                 26h
level-zero-upsert-13-milvus-indexnode-776c568f59-cxrpv            1/1     Running            0                 26h
level-zero-upsert-13-milvus-indexnode-776c568f59-sndzg            1/1     Running            0                 26h
level-zero-upsert-13-milvus-mixcoord-86946954b7-j8f42             1/1     Running            0                 26h
level-zero-upsert-13-milvus-proxy-7c65458c4c-x9ksw                1/1     Running            0                 26h
level-zero-upsert-13-milvus-querynode-0-7dd76577c5-bv8pm          1/1     Running            0                 26h
level-zero-upsert-13-milvus-querynode-0-7dd76577c5-kwzrv          1/1     Running            0                 26h
level-zero-upsert-13-milvus-querynode-0-7dd76577c5-wsl4t          1/1     Running            0                 26h
level-zero-upsert-13-minio-0                                      1/1     Running            0                 26h
level-zero-upsert-13-minio-1                                      1/1     Running            0                 26h
level-zero-upsert-13-minio-2                                      1/1     Running            0                 26h
level-zero-upsert-13-minio-3                                      1/1     Running            0                 26h
level-zero-upsert-13-pulsar-bookie-0                              1/1     Running            0                 26h
level-zero-upsert-13-pulsar-bookie-1                              1/1     Running            0                 26h
level-zero-upsert-13-pulsar-bookie-2                              1/1     Running            0                 26h
level-zero-upsert-13-pulsar-bookie-init-7ljdm                     0/1     Completed          0                 26h
level-zero-upsert-13-pulsar-broker-0                              1/1     Running            0                 26h
level-zero-upsert-13-pulsar-proxy-0                               1/1     Running            0                 26h
level-zero-upsert-13-pulsar-pulsar-init-ntxbg                     0/1     Completed          0                 26h
level-zero-upsert-13-pulsar-recovery-0                            1/1     Running            0                 26h
level-zero-upsert-13-pulsar-zookeeper-0                           1/1     Running            0                 26h
level-zero-upsert-13-pulsar-zookeeper-1                           1/1     Running            0                 26h
level-zero-upsert-13-pulsar-zookeeper-2                           1/1     Running            0                 26h

Anything else?

No response

@ThreadDao ThreadDao added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Feb 6, 2024
@ThreadDao ThreadDao added the severity/critical Critical, lead to crash, data missing, wrong result, function totally doesn't work. label Feb 6, 2024
@ThreadDao ThreadDao added this to the 2.4.0 milestone Feb 6, 2024
@yanliang567 yanliang567 added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Feb 6, 2024
@yanliang567 yanliang567 removed their assignment Feb 6, 2024
sre-ci-robot pushed a commit that referenced this issue Feb 19, 2024
1. Increase maxCount of L0 compaction tasks to 30

This could reduce the number of L0 compaction tasks by 30% for small,
frequently generated L0 segments, while the maximum size of 64MB stays
unchanged. L0 segments would then accumulate more slowly, which decreases
the memory pressure that L0 segments put on QueryNode.

2. Add a force trigger for later manual, timely L0 compaction triggers.

See also: #30191, #30556

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
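
To illustrate why a higher per-task segment count reduces the number of tasks, here is a minimal sketch of greedy batching under a count cap and a size cap. It is not the Milvus implementation: `plan_l0_tasks` and the greedy strategy are assumptions for illustration, and only the values 30 and 64MB come from the commit message.

```python
MAX_COUNT = 30                      # max L0 segments per task (from the commit message)
MAX_SIZE_BYTES = 64 * 1024 * 1024   # 64MB cap per task (from the commit message)
MB = 1024 * 1024


def plan_l0_tasks(segment_sizes, max_count=MAX_COUNT, max_size=MAX_SIZE_BYTES):
    """Greedily group L0 segment sizes (in bytes) into compaction tasks.

    Hypothetical helper: a task is closed as soon as adding the next
    segment would exceed either the count cap or the size cap.
    """
    tasks, current, current_size = [], [], 0
    for size in segment_sizes:
        if current and (len(current) >= max_count or current_size + size > max_size):
            tasks.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        tasks.append(current)
    return tasks


small_segments = [1 * MB] * 90
tasks_at_30 = plan_l0_tasks(small_segments)                # count cap of 30
tasks_at_10 = plan_l0_tasks(small_segments, max_count=10)  # a lower, made-up cap
print(len(tasks_at_30), len(tasks_at_10))
```

With many small segments the count cap is what limits a task, so raising it yields fewer tasks; with large segments the unchanged size cap dominates.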
sre-ci-robot pushed a commit that referenced this issue Mar 5, 2024
Trigger l0 compaction when l0 views don't change

So that leftover l0 segments would be compacted in the end.

1. Refresh LevelZero plans in compactionPlanHandler, remove the meta
dependency of compaction trigger v2
2. Add ForceTrigger method for CompactionView interface
3. Rename mu to taskGuard
4. Add a new TriggerTypeLevelZeroViewIDLE
5. Add an idleTicker for compaction view manager

See also: #30098, #30556

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
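
The idea behind the idle trigger can be sketched with a toy model: a change-driven trigger skips a view that is below its threshold, and a tick counter forces compaction once the view has stayed unchanged long enough. Everything here (`L0ViewManager`, `min_segments`, `idle_ticks`, the returned strings) is hypothetical and only mirrors the commit message, not the actual Go code.

```python
class L0ViewManager:
    """Toy model of change-driven plus idle-driven L0 compaction triggers."""

    def __init__(self, min_segments=4, idle_ticks=3):
        self.min_segments = min_segments  # hypothetical threshold for the normal trigger
        self.idle_ticks = idle_ticks      # hypothetical number of unchanged ticks before forcing
        self.last_view = frozenset()
        self.unchanged = 0

    def tick(self, l0_segment_ids):
        """Return 'view_changed', 'idle', or None for one timer tick."""
        view = frozenset(l0_segment_ids)
        if view != self.last_view:
            # The view changed: reset the idle counter and apply the normal threshold.
            self.last_view, self.unchanged = view, 0
            return "view_changed" if len(view) >= self.min_segments else None
        self.unchanged += 1
        if view and self.unchanged >= self.idle_ticks:
            # Leftover segments have not changed for a while: force a trigger.
            self.unchanged = 0
            return "idle"
        return None


manager = L0ViewManager(min_segments=4, idle_ticks=3)
leftover = [101, 102]  # too few segments for the normal trigger
results = [manager.tick(leftover) for _ in range(5)]
print(results)
```

Without the idle branch, the two leftover segments in this example would never be compacted, which matches the behavior described in this issue.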

@xiaofan-luan
Collaborator

/assign @ThreadDao

@ThreadDao
Contributor Author

Fixed in master-20240311-a99143dd-amd64
(image)
