[Bug]: While deleting entities, search took about 12 minutes to recover after a delegator failure #37670

Open
ThreadDao opened this issue Nov 14, 2024 · 0 comments
Labels
kind/bug Issues or changes related a bug severity/critical Critical, lead to crash, data missing, wrong result, function totally doesn't work. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@ThreadDao
Contributor

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: 2.4-20241106-20534a3f-amd64
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

server config

  • qn: 4 × 8c32g (4 QueryNodes, each with 8 CPU cores and 32 GB memory)
  • deleteBufferRowCountProtection: [12m, 25m]
  • l0SegmentsRowCountProtection: [25m, 50m]
  • levelZeroForwardPolicy: RemoteLoad
  • streamingDeltaForwardPolicy: FilterByBF
  • taskPrioritizer: level

test steps

  1. Continuously run concurrent searches while deleting 60 million entities in batches of 60,000 PKs
  2. 08:15:54: deletion completed
  3. 08:16:30: kill -9 the delegator QueryNode qn-mcpj4
    [screenshot]
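The delete workload in step 1 can be sketched as below. This is a minimal illustration, assuming int64 primary keys numbered 0 to N-1; `pk_batches`, `run_deletes` and the `delete_fn` callback are hypothetical helpers, not part of the original test harness, and the concurrent search load is not modeled. At 60,000 PKs per request, 60 million entities take 1,000 delete calls.

```python
from typing import Callable, Iterator

TOTAL_ENTITIES = 60_000_000  # total rows deleted in the reported test
BATCH_SIZE = 60_000          # primary keys per delete request


def pk_batches(total: int, batch_size: int) -> Iterator[range]:
    """Yield consecutive PK ranges (hypothetical: assumes PKs 0..total-1)."""
    for start in range(0, total, batch_size):
        # The final batch is truncated if total is not a multiple of batch_size.
        yield range(start, min(start + batch_size, total))


def run_deletes(
    delete_fn: Callable[[range], None],
    total: int = TOTAL_ENTITIES,
    batch_size: int = BATCH_SIZE,
) -> int:
    """Issue one delete call per PK batch and return the number of calls made."""
    calls = 0
    for pks in pk_batches(total, batch_size):
        delete_fn(pks)  # hypothetical callback that performs the actual delete
        calls += 1
    return calls
```

Against a live cluster, `delete_fn` could wrap pymilvus, for example `lambda pks: client.delete(collection_name=COLLECTION, ids=list(pks))` with `client = MilvusClient(uri=URI)`, where `COLLECTION` and `URI` are placeholders. Yielding `range` objects keeps memory flat until a batch is actually sent.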

results

  • QueryNodes OOM during recovery

    • 08:19: qn-mcpj4 OOM
    • 08:22: qn-tqvd5 and qn-sdqrp OOM
    • 08:26: qn-gzlwz OOM
      [screenshot]
  • search recovery time: about 12 min
    [screenshot]

  • metrics of compact-opt-100m-3

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

No response

Anything else?

No response

@ThreadDao ThreadDao added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 14, 2024
@ThreadDao ThreadDao added the severity/critical Critical, lead to crash, data missing, wrong result, function totally doesn't work. label Nov 14, 2024
@ThreadDao ThreadDao added this to the 2.4.16 milestone Nov 14, 2024
@yanliang567 yanliang567 added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 14, 2024
@yanliang567 yanliang567 removed their assignment Nov 14, 2024
Projects
None yet
Development

No branches or pull requests

3 participants