DIAL_CLS / DIAL_RSP race leading to connection leak #404

Open
tallclair opened this issue Sep 16, 2022 · 5 comments
Labels
lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.

Comments

@tallclair
Contributor

A race between a DIAL_CLS packet from the frontend and a DIAL_RSP packet from the backend, received at about the same time, could leak the backend connection.

This could happen if the following events occur in this order:

  1. A DIAL_RSP is received from the backend.
  2. The pending dial is still present in PendingDial, so this lookup succeeds:
    if frontend, ok := s.PendingDial.Get(resp.Random); !ok {
  3. The frontend starts shutting down and sends a DIAL_CLS (prior to [konnectivity-client] Ensure grpc tunnel is closed on dial failure #398, it wouldn't even send a close request).
  4. The server sends the dial response to the frontend. The frontend gRPC stream is still open, so the packet is received, but the frontend doesn't process it:
    err := frontend.send(pkt)
  5. At this point, the server thinks the connection is established, but the frontend is not aware of that and is in the process of shutting down, leading to a leaked backend connection.

This seems fairly unlikely (at least once #403 is fixed), but it is worth tracking.
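
The ordering in the steps above can be sketched as a small, deterministic simulation. This is only an illustration under stated assumptions, not the proxy server's actual code: the `frontend` and `server` types, the `handleDialRsp` and `handleDialCls` methods, the `interleave` hook, and the values 42 and 7 are all hypothetical stand-ins, and the real race would depend on goroutine scheduling rather than on an explicit hook.

```go
package main

import "fmt"

// frontend is a hypothetical stand-in for the client side of the tunnel.
type frontend struct {
	closing   bool // set once the frontend has started shutting down
	processed int  // packets the frontend actually acted on
}

// send models a stream that is still open: the call succeeds, but a
// frontend that is shutting down silently ignores the packet.
func (f *frontend) send(pkt string) error {
	if f.closing {
		return nil
	}
	f.processed++
	return nil
}

// server is a hypothetical stand-in holding only the two maps needed here.
type server struct {
	pendingDial map[int64]*frontend // keyed by dial random
	established map[int64]*frontend // keyed by connection ID
}

// handleDialRsp mirrors steps 1, 2, 4 and 5. The interleave hook runs
// between the pending-dial lookup and the send to force the bad ordering.
func (s *server) handleDialRsp(random, connID int64, interleave func()) error {
	fe, ok := s.pendingDial[random]
	if !ok {
		return fmt.Errorf("no pending dial for random %d", random)
	}
	if interleave != nil {
		interleave() // step 3 happens here
	}
	if err := fe.send("DIAL_RSP"); err != nil {
		return err
	}
	delete(s.pendingDial, random)
	s.established[connID] = fe // the server now considers the connection live
	return nil
}

// handleDialCls drops the pending dial; it arrives too late to stop the
// response handler, which already holds the frontend.
func (s *server) handleDialCls(random int64) {
	delete(s.pendingDial, random)
}

// simulateRace returns how many established connections the frontend
// never learned about.
func simulateRace() int {
	fe := &frontend{}
	s := &server{
		pendingDial: map[int64]*frontend{42: fe},
		established: map[int64]*frontend{},
	}
	err := s.handleDialRsp(42, 7, func() {
		fe.closing = true
		s.handleDialCls(42)
	})
	if err != nil {
		return 0
	}
	leaked := 0
	for _, f := range s.established {
		if f.processed == 0 {
			leaked++
		}
	}
	return leaked
}

func main() {
	fmt.Println("leaked backend connections:", simulateRace())
}
```

In this sketch the server ends up with one entry in `established` that the frontend never processed, which corresponds to the leaked backend connection in step 5.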

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 15, 2022
@tallclair
Contributor Author

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 16, 2022
@jkh52
Contributor

jkh52 commented Feb 14, 2023

/assign @jkh52

@tallclair
Contributor Author

/unassign @jkh52
/assign @azimjohn

@k8s-ci-robot
Contributor

@tallclair: GitHub didn't allow me to assign the following users: azimjohn.

Note that only kubernetes-sigs members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/unassign @jkh52
/assign @azimjohn

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.


4 participants