DIAL_CLS / DIAL_RSP race leading to connection leak #404

tallclair · 2022-09-16T21:17:38Z

A race condition can occur when a DIAL_CLS packet from the frontend is received at the same time as a DIAL_RSP from the backend, which could lead to the backend connection being leaked.

This could happen if the following events occur in this order:

1. DIAL_RSP is received from the backend.
2. The pending dial is still present in

   apiserver-network-proxy/pkg/server/server.go

   Line 755 in b5e5436

   if frontend, ok := s.PendingDial.Get(resp.Random); !ok {

3. The frontend starts shutting down and sends a DIAL_CLS (prior to "[konnectivity-client] Ensure grpc tunnel is closed on dial failure" #398, it wouldn't even send a close request).
4. The server sends the dial response to the frontend. The FE gRPC stream is still open, so the packet is received, but the frontend doesn't process it:

   apiserver-network-proxy/pkg/server/server.go

   Line 767 in b5e5436

   err := frontend.send(pkt)

At this point, the server thinks the connection is established, but the frontend is not aware of that and is in the process of shutting down, leading to a leaked backend connection.
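
To make the ordering concrete, here is a minimal sketch of one possible mitigation, not the project's actual code or an agreed fix. It assumes the server remembers which dial ID produced which backend connection, so that a DIAL_CLS arriving after the DIAL_RSP was handled can still close the backend side. Every name here (`server`, `addPendingDial`, `handleDialResponse`, `handleDialClose`, `closedConns`) is hypothetical and only loosely mirrors `s.PendingDial` and `frontend.send` from the snippets above.

```go
package main

import (
	"fmt"
	"sync"
)

// server is a hypothetical, heavily simplified model of the proxy server's
// dial bookkeeping. None of these names exist in apiserver-network-proxy.
type server struct {
	mu          sync.Mutex
	pending     map[int64]struct{} // dial IDs still waiting for a DIAL_RSP
	established map[int64]int64    // dial ID -> backend connection ID
	closedConns []int64            // backend connections the server asked to close
}

func newServer() *server {
	return &server{
		pending:     make(map[int64]struct{}),
		established: make(map[int64]int64),
	}
}

// addPendingDial records a dial request sent on behalf of the frontend.
func (s *server) addPendingDial(random int64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.pending[random] = struct{}{}
}

// handleDialResponse models a DIAL_RSP arriving from the backend.
func (s *server) handleDialResponse(random, connID int64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.pending[random]; !ok {
		// The frontend already gave up on this dial: close the backend side.
		s.closedConns = append(s.closedConns, connID)
		return
	}
	delete(s.pending, random)
	// Assumption: keep the dial ID so a late DIAL_CLS can still be matched.
	// A real implementation would also need to expire these entries.
	s.established[random] = connID
}

// handleDialClose models a DIAL_CLS arriving from the frontend.
func (s *server) handleDialClose(random int64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.pending, random)
	if connID, ok := s.established[random]; ok {
		// DIAL_RSP won the race: the backend connection exists, but the
		// frontend is abandoning it, so close it instead of leaking it.
		delete(s.established, random)
		s.closedConns = append(s.closedConns, connID)
	}
}

func main() {
	// Order from the issue: DIAL_RSP is handled first, then DIAL_CLS arrives.
	a := newServer()
	a.addPendingDial(1)
	a.handleDialResponse(1, 7)
	a.handleDialClose(1)
	fmt.Println("RSP then CLS, closed backend conns:", a.closedConns)

	// Opposite order: DIAL_CLS is handled first, then DIAL_RSP arrives.
	b := newServer()
	b.addPendingDial(1)
	b.handleDialClose(1)
	b.handleDialResponse(1, 7)
	fmt.Println("CLS then RSP, closed backend conns:", b.closedConns)
}
```

In this model the backend connection is closed in both orderings. Without the `established` lookup in `handleDialClose`, the first ordering would leave connection 7 open with no frontend attached, which is the leak described above.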

This seems fairly unlikely (at least once #403 is fixed), but worth tracking.


k8s-triage-robot · 2022-12-15T22:03:18Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

tallclair · 2022-12-16T00:07:42Z

/lifecycle frozen

jkh52 · 2023-02-14T19:31:30Z

/assign @jkh52

tallclair · 2024-05-16T16:22:29Z

/unassign @jkh52
/assign @azimjohn

k8s-ci-robot · 2024-05-16T16:22:32Z

@tallclair: GitHub didn't allow me to assign the following users: azimjohn.

Note that only kubernetes-sigs members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/unassign @jkh52
/assign @azimjohn

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 15, 2022

k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 16, 2022

k8s-ci-robot assigned jkh52 Feb 14, 2023

jkh52 mentioned this issue Feb 14, 2023

Protocol: simplify identifers #462

Open

k8s-ci-robot unassigned jkh52 May 16, 2024
