Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[cinder-csi-plugin] Duplicated attachment even after detachment finished #2709

Open
okozachenko1203 opened this issue Nov 7, 2024 · 0 comments

Comments

@okozachenko1203
Copy link

What happened:
TLDR

After a pod terminates successfully, its PV re-attaches unexpectedly to the node, causing conflicts for subsequent pods.

More

After a pod created by a stateful CronJob runs and successfully completes, its associated Persistent Volume (PV) detaches from the node as expected. However, a few minutes after the pod has terminated and disappeared, the PV re-attaches to the same node where the previous pod was scheduled.

This unexpected re-attachment causes issues for subsequent CronJob executions. When the CronJob attempts to schedule a new pod, it fails to start because the PV is already attached to the node from the previous run, causing a conflict.

What you expected to happen:
The PV should remain detached after the pod terminates and should not re-attach to the node automatically. This would ensure that subsequent pods can use the PV without attachment conflicts.

How to reproduce it:

  1. Create a stateful CronJob that uses a PV with a cinder-csi storage class.

  2. Create a job manually from the cronjob

kubectl create job --from=cronjob/CRONJOB_NAME test-job
  1. Monitor the corresponding OpenStack volume status:
    Initially, you will see the volume status change from available to in-use when the pod starts, then back to available once the pod terminates.
  2. Continue monitoring the volume status after pod termination.
    Occasionally, the volume will re-attach to the node after the pod has terminated.
  3. Note: This behavior does not occur consistently but happens intermittently.

Anything else we need to know?:

Here are observations we found with relevant csi-cinder-controllerplugin pod logs.

  • Target resources

Target volumeAttachment is csi-ceada84ed1b630ff93671879d8f088378d2dd699437e906cf2ea78ce4cf48262.
Target pv is pvc-3fd1a3a0-20e7-48d8-8e6b-910c5b717b8d.

`csi-attacher` container log
I1107 10:00:52.302581       1 controller.go:210] Started VA processing "csi-ceada84ed1b630ff93671879d8f088378d2dd699437e906cf2ea78ce4cf48262"
I1107 10:00:52.302629       1 csi_handler.go:224] CSIHandler: processing VA "csi-ceada84ed1b630ff93671879d8f088378d2dd699437e906cf2ea78ce4cf48262"
I1107 10:00:52.302643       1 csi_handler.go:251] Attaching "csi-ceada84ed1b630ff93671879d8f088378d2dd699437e906cf2ea78ce4cf48262"
I1107 10:00:52.302649       1 csi_handler.go:421] Starting attach operation for "csi-ceada84ed1b630ff93671879d8f088378d2dd699437e906cf2ea78ce4cf48262"
I1107 10:00:52.302717       1 csi_handler.go:335] PV finalizer is already set on "pvc-3fd1a3a0-20e7-48d8-8e6b-910c5b717b8d"
I1107 10:00:52.302772       1 csi_handler.go:312] VA finalizer added to "csi-ceada84ed1b630ff93671879d8f088378d2dd699437e906cf2ea78ce4cf48262"
I1107 10:00:52.302786       1 csi_handler.go:326] NodeID annotation added to "csi-ceada84ed1b630ff93671879d8f088378d2dd699437e906cf2ea78ce4cf48262"

I1107 10:03:52.310262       1 csi_handler.go:591] Saving attach error to "csi-ceada84ed1b630ff93671879d8f088378d2dd699437e906cf2ea78ce4cf48262"
I1107 10:03:52.314962       1 csi_handler.go:602] Saved attach error to "csi-ceada84ed1b630ff93671879d8f088378d2dd699437e906cf2ea78ce4cf48262"
I1107 10:03:52.314997       1 csi_handler.go:234] Error processing "csi-ceada84ed1b630ff93671879d8f088378d2dd699437e906cf2ea78ce4cf48262": failed to attach: rpc error: code = DeadlineExceeded desc = context deadline exceeded
I1107 10:03:52.315473       1 controller.go:167] Ignoring VolumeAttachment "csi-ceada84ed1b630ff93671879d8f088378d2dd699437e906cf2ea78ce4cf48262" change
I1107 10:03:52.315792       1 controller.go:210] Started VA processing "csi-ceada84ed1b630ff93671879d8f088378d2dd699437e906cf2ea78ce4cf48262"
I1107 10:03:52.315812       1 csi_handler.go:224] CSIHandler: processing VA "csi-ceada84ed1b630ff93671879d8f088378d2dd699437e906cf2ea78ce4cf48262"
I1107 10:03:52.315819       1 csi_handler.go:251] Attaching "csi-ceada84ed1b630ff93671879d8f088378d2dd699437e906cf2ea78ce4cf48262"
I1107 10:03:52.315826       1 csi_handler.go:421] Starting attach operation for "csi-ceada84ed1b630ff93671879d8f088378d2dd699437e906cf2ea78ce4cf48262"
I1107 10:03:52.315879       1 csi_handler.go:335] PV finalizer is already set on "pvc-3fd1a3a0-20e7-48d8-8e6b-910c5b717b8d"
I1107 10:03:52.315925       1 csi_handler.go:304] VA finalizer is already set on "csi-ceada84ed1b630ff93671879d8f088378d2dd699437e906cf2ea78ce4cf48262"
I1107 10:03:52.315932       1 csi_handler.go:318] NodeID annotation is already set on "csi-ceada84ed1b630ff93671879d8f088378d2dd699437e906cf2ea78ce4cf48262"
I1107 10:03:54.383059       1 csi_handler.go:264] Attached "csi-ceada84ed1b630ff93671879d8f088378d2dd699437e906cf2ea78ce4cf48262"
I1107 10:03:54.383078       1 util.go:38] Marking as attached "csi-ceada84ed1b630ff93671879d8f088378d2dd699437e906cf2ea78ce4cf48262"
I1107 10:03:54.387436       1 util.go:52] Marked as attached "csi-ceada84ed1b630ff93671879d8f088378d2dd699437e906cf2ea78ce4cf48262"
I1107 10:03:54.387454       1 csi_handler.go:270] Fully attached "csi-ceada84ed1b630ff93671879d8f088378d2dd699437e906cf2ea78ce4cf48262"
I1107 10:03:54.387463       1 csi_handler.go:240] CSIHandler: finished processing "csi-ceada84ed1b630ff93671879d8f088378d2dd699437e906cf2ea78ce4cf48262"
I1107 10:03:54.387494       1 controller.go:210] Started VA processing "csi-ceada84ed1b630ff93671879d8f088378d2dd699437e906cf2ea78ce4cf48262"
I1107 10:03:54.387503       1 csi_handler.go:224] CSIHandler: processing VA "csi-ceada84ed1b630ff93671879d8f088378d2dd699437e906cf2ea78ce4cf48262"
I1107 10:03:54.387508       1 csi_handler.go:251] Attaching "csi-ceada84ed1b630ff93671879d8f088378d2dd699437e906cf2ea78ce4cf48262"
I1107 10:03:54.387514       1 csi_handler.go:421] Starting attach operation for "csi-ceada84ed1b630ff93671879d8f088378d2dd699437e906cf2ea78ce4cf48262"
I1107 10:03:54.387556       1 csi_handler.go:335] PV finalizer is already set on "pvc-3fd1a3a0-20e7-48d8-8e6b-910c5b717b8d"
I1107 10:03:54.387580       1 csi_handler.go:304] VA finalizer is already set on "csi-ceada84ed1b630ff93671879d8f088378d2dd699437e906cf2ea78ce4cf48262"
I1107 10:03:54.387586       1 csi_handler.go:318] NodeID annotation is already set on "csi-ceada84ed1b630ff93671879d8f088378d2dd699437e906cf2ea78ce4cf48262"
I1107 10:03:54.908513       1 csi_handler.go:264] Attached "csi-ceada84ed1b630ff93671879d8f088378d2dd699437e906cf2ea78ce4cf48262"
I1107 10:03:54.908531       1 util.go:38] Marking as attached "csi-ceada84ed1b630ff93671879d8f088378d2dd699437e906cf2ea78ce4cf48262"
I1107 10:03:54.912377       1 util.go:52] Marked as attached "csi-ceada84ed1b630ff93671879d8f088378d2dd699437e906cf2ea78ce4cf48262"
I1107 10:03:54.912396       1 csi_handler.go:270] Fully attached "csi-ceada84ed1b630ff93671879d8f088378d2dd699437e906cf2ea78ce4cf48262"
I1107 10:03:54.912407       1 csi_handler.go:240] CSIHandler: finished processing "csi-ceada84ed1b630ff93671879d8f088378d2dd699437e906cf2ea78ce4cf48262"
I1107 10:03:54.912437       1 controller.go:210] Started VA processing "csi-ceada84ed1b630ff93671879d8f088378d2dd699437e906cf2ea78ce4cf48262"
I1107 10:03:54.912446       1 csi_handler.go:224] CSIHandler: processing VA "csi-ceada84ed1b630ff93671879d8f088378d2dd699437e906cf2ea78ce4cf48262"
I1107 10:03:54.912452       1 csi_handler.go:246] "csi-ceada84ed1b630ff93671879d8f088378d2dd699437e906cf2ea78ce4cf48262" is already attached
I1107 10:03:54.912460       1 csi_handler.go:240] CSIHandler: finished processing "csi-ceada84ed1b630ff93671879d8f088378d2dd699437e906cf2ea78ce4cf48262"
E1107 10:06:03.592132       1 controller.go:146] Failed to reconcile volume attachments: failed to ListVolumes: failed to list volumes: rpc error: code = DeadlineExceeded desc = context deadline exceeded
I1107 10:06:38.179421       1 controller.go:210] Started VA processing "csi-ceada84ed1b630ff93671879d8f088378d2dd699437e906cf2ea78ce4cf48262"
I1107 10:06:38.179444       1 csi_handler.go:224] CSIHandler: processing VA "csi-ceada84ed1b630ff93671879d8f088378d2dd699437e906cf2ea78ce4cf48262"
I1107 10:06:38.179450       1 csi_handler.go:275] Starting detach operation for "csi-ceada84ed1b630ff93671879d8f088378d2dd699437e906cf2ea78ce4cf48262"
I1107 10:06:38.179488       1 csi_handler.go:282] Detaching "csi-ceada84ed1b630ff93671879d8f088378d2dd699437e906cf2ea78ce4cf48262"
I1107 10:06:40.899627       1 csi_handler.go:581] Detached "csi-ceada84ed1b630ff93671879d8f088378d2dd699437e906cf2ea78ce4cf48262"
I1107 10:06:40.899723       1 util.go:80] Marking as detached "csi-ceada84ed1b630ff93671879d8f088378d2dd699437e906cf2ea78ce4cf48262"
I1107 10:06:40.909442       1 util.go:106] Finalizer removed from "csi-ceada84ed1b630ff93671879d8f088378d2dd699437e906cf2ea78ce4cf48262"
I1107 10:06:40.909494       1 controller.go:261] Started PV processing "pvc-3fd1a3a0-20e7-48d8-8e6b-910c5b717b8d"
I1107 10:06:40.909513       1 csi_handler.go:295] Fully detached "csi-ceada84ed1b630ff93671879d8f088378d2dd699437e906cf2ea78ce4cf48262"
I1107 10:06:40.909526       1 csi_handler.go:623] CSIHandler: processing PV "pvc-3fd1a3a0-20e7-48d8-8e6b-910c5b717b8d"
I1107 10:06:40.909539       1 csi_handler.go:240] CSIHandler: finished processing "csi-ceada84ed1b630ff93671879d8f088378d2dd699437e906cf2ea78ce4cf48262"
I1107 10:06:40.909546       1 csi_handler.go:645] CSIHandler: processing PV "pvc-3fd1a3a0-20e7-48d8-8e6b-910c5b717b8d": no deletion timestamp, ignoring
I1107 10:06:40.909738       1 controller.go:210] Started VA processing "csi-ceada84ed1b630ff93671879d8f088378d2dd699437e906cf2ea78ce4cf48262"
I1107 10:06:40.909889       1 controller.go:217] VA "csi-ceada84ed1b630ff93671879d8f088378d2dd699437e906cf2ea78ce4cf48262" deleted, ignoring
I1107 10:06:43.412486       1 reflector.go:559] k8s.io/client-go/informers/factory.go:150: Watch close - *v1.CSINode total 10 items received

There are 5 instances of the log entry CSIHandler: processing VA "csi-cea...".
- The last entry represents the detach operation.
- The first 4 entries represent attachment operations.
- The 4th attachment was ignored since the PV was already attached, hence no ControllerPublishVolume was triggered.

`cinder-csi-plugin` container log
I1107 10:00:01.399838      11 utils.go:91] [ID:1678] GRPC call: /csi.v1.Controller/ListVolumes
I1107 10:00:01.399919      11 controllerserver.go:283] ListVolumes: called with &csi.ListVolumesRequest{MaxEntries:0, StartingToken:"", XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0} request
I1107 10:00:02.461556      11 controllerserver.go:319] ListVolumes: completed with 83 entries and "" next token
I1107 10:00:52.307832      11 utils.go:91] [ID:1680] GRPC call: /csi.v1.Controller/ControllerPublishVolume
I1107 10:00:52.307911      11 controllerserver.go:175] ControllerPublishVolume: called with args {"node_id":"e85dd55e-42a8-4f18-b137-72d79e524052","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_context":{"storage.kubernetes.io/csiProvisionerIdentity":"1726839665383-8081-cinder.csi.openstack.org"},"volume_id":"922a5f0b-6c31-4c04-9660-be18812a4df5"}
I1107 10:01:02.466565      11 utils.go:91] [ID:1681] GRPC call: /csi.v1.Controller/ListVolumes
I1107 10:01:02.466671      11 controllerserver.go:283] ListVolumes: called with &csi.ListVolumesRequest{MaxEntries:0, StartingToken:"", XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0} request
I1107 10:01:02.946258      11 controllerserver.go:319] ListVolumes: completed with 83 entries and "" next token
I1107 10:02:02.956971      11 utils.go:91] [ID:1683] GRPC call: /csi.v1.Controller/ListVolumes
I1107 10:02:02.957027      11 controllerserver.go:283] ListVolumes: called with &csi.ListVolumesRequest{MaxEntries:0, StartingToken:"", XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0} request
I1107 10:02:03.588425      11 controllerserver.go:319] ListVolumes: completed with 83 entries and "" next token
I1107 10:03:03.592361      11 utils.go:91] [ID:1685] GRPC call: /csi.v1.Controller/ListVolumes
I1107 10:03:03.592404      11 controllerserver.go:283] ListVolumes: called with &csi.ListVolumesRequest{MaxEntries:0, StartingToken:"", XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0} request
I1107 10:03:52.316413      11 utils.go:91] [ID:1687] GRPC call: /csi.v1.Controller/ControllerPublishVolume
I1107 10:03:52.316462      11 controllerserver.go:175] ControllerPublishVolume: called with args {"node_id":"e85dd55e-42a8-4f18-b137-72d79e524052","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_context":{"storage.kubernetes.io/csiProvisionerIdentity":"1726839665383-8081-cinder.csi.openstack.org"},"volume_id":"922a5f0b-6c31-4c04-9660-be18812a4df5"}
I1107 10:03:54.382749      11 controllerserver.go:227] ControllerPublishVolume 922a5f0b-6c31-4c04-9660-be18812a4df5 on e85dd55e-42a8-4f18-b137-72d79e524052 is successful
I1107 10:03:54.387851      11 utils.go:91] [ID:1688] GRPC call: /csi.v1.Controller/ControllerPublishVolume
I1107 10:03:54.387868      11 controllerserver.go:175] ControllerPublishVolume: called with args {"node_id":"e85dd55e-42a8-4f18-b137-72d79e524052","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}},"volume_context":{"storage.kubernetes.io/csiProvisionerIdentity":"1726839665383-8081-cinder.csi.openstack.org"},"volume_id":"922a5f0b-6c31-4c04-9660-be18812a4df5"}
I1107 10:03:54.778611      11 openstack_volumes.go:182] Disk 922a5f0b-6c31-4c04-9660-be18812a4df5 is already attached to instance e85dd55e-42a8-4f18-b137-72d79e524052
I1107 10:03:54.908136      11 controllerserver.go:227] ControllerPublishVolume 922a5f0b-6c31-4c04-9660-be18812a4df5 on e85dd55e-42a8-4f18-b137-72d79e524052 is successful
I1107 10:06:38.179812      11 utils.go:91] [ID:1692] GRPC call: /csi.v1.Controller/ControllerUnpublishVolume
I1107 10:06:38.179832      11 controllerserver.go:239] ControllerUnpublishVolume: called with args {"node_id":"e85dd55e-42a8-4f18-b137-72d79e524052","volume_id":"922a5f0b-6c31-4c04-9660-be18812a4df5"}
I1107 10:06:39.699179      11 controllerserver.go:319] ListVolumes: completed with 83 entries and "" next token
I1107 10:06:39.766309      11 openstack_volumes.go:290] Successfully detached volume: 922a5f0b-6c31-4c04-9660-be18812a4df5 from compute: e85dd55e-42a8-4f18-b137-72d79e524052
I1107 10:06:40.899104      11 controllerserver.go:277] ControllerUnpublishVolume 922a5f0b-6c31-4c04-9660-be18812a4df5 on e85dd55e-42a8-4f18-b137-72d79e524052
I1107 10:07:03.592644      11 utils.go:91] [ID:1693] GRPC call: /csi.v1.Controller/ListVolumes
I1107 10:07:03.592672      11 controllerserver.go:283] ListVolumes: called with &csi.ListVolumesRequest{MaxEntries:0, StartingToken:"", XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0} request
I1107 10:11:03.596401      11 utils.go:91] [ID:1698] GRPC call: /csi.v1.Controller/ListVolumes
I1107 10:11:03.596452      11 controllerserver.go:283] ListVolumes: called with &csi.ListVolumesRequest{MaxEntries:0, StartingToken:"", XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0} request
I1107 10:11:04.094795      11 controllerserver.go:319] ListVolumes: completed with 83 entries and "" next token
I1107 10:12:04.097670      11 utils.go:91] [ID:1700] GRPC call: /csi.v1.Controller/ListVolumes
I1107 10:12:04.097699      11 controllerserver.go:283] ListVolumes: called with &csi.ListVolumesRequest{MaxEntries:0, StartingToken:"", XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0} request
I1107 10:12:05.059945      11 controllerserver.go:319] ListVolumes: completed with 83 entries and "" next token
I1107 10:13:05.062104      11 utils.go:91] [ID:1702] GRPC call: /csi.v1.Controller/ListVolumes
I1107 10:13:05.062165      11 controllerserver.go:283] ListVolumes: called with &csi.ListVolumesRequest{MaxEntries:0, StartingToken:"", XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0} request
I1107 10:16:33.218066      11 controllerserver.go:227] ControllerPublishVolume 922a5f0b-6c31-4c04-9660-be18812a4df5 on e85dd55e-42a8-4f18-b137-72d79e524052 is successful
I1107 10:17:05.063441      11 utils.go:91] [ID:1707] GRPC call: /csi.v1.Controller/ListVolumes

There are 3 ControllerPublishVolume calls for the attachment, aligning with logs in csi-attacher. The timestamps of these calls match.
But there are some unexpected behaviors in the log.
- The log entry ControllerPublishVolume 922a5f0b-6c31-4c04-9660-be18812a4df5 on e85dd55e-42a8-4f18-b137-72d79e524052 is successful appears 3 times for each ControllerPublishVolume call.
- Notably, the last success log entry for this attachment occurred at 10:16:33, despite the detach operation completing earlier at 10:06:40.899104.
- During this unexpected log entry, the OpenStack volume was re-attached to the node.

Environment:

  • openstack-cloud-controller-manager(or other related binary) version: v1.28.0
  • OpenStack version: stable/zed
  • Others: This k8s cluster is a magnum cluster.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant