[WIP] refactor: split termination controller #1837
base: main
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: jmdeal The full list of commands accepted by this bot can be found here.
/hold I'm separating out the observability changes in this PR so I can get the essential feature change in and prioritize other work. I'll come back to this refactor; the current sticking point is the rollback story when adding additional finalizers.
This PR has been inactive for 14 days. StaleBot will close this stale PR after 14 more days of inactivity.
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Fixes #N/A
Description
This PR splits the node termination controller into three controllers. Each controller has an associated finalizer, and the final controller (Instance Termination) cannot reconcile until the drain and volume finalizers have been removed. The three finalizers are:
karpenter.sh/drain-protection
karpenter.sh/volume-protection
karpenter.sh/termination
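The ordering between the finalizers can be illustrated with a minimal sketch. This is not the code in this PR: the helper names (`hasFinalizer`, `removeFinalizer`, `canTerminateInstance`) are hypothetical and a plain string slice stands in for the node's finalizer list. Only the finalizer names come from the description above.

```go
package main

import "fmt"

// Finalizer names as listed in the PR description.
const (
	DrainFinalizer       = "karpenter.sh/drain-protection"
	VolumeFinalizer      = "karpenter.sh/volume-protection"
	TerminationFinalizer = "karpenter.sh/termination"
)

// hasFinalizer reports whether name is present in finalizers.
func hasFinalizer(finalizers []string, name string) bool {
	for _, f := range finalizers {
		if f == name {
			return true
		}
	}
	return false
}

// removeFinalizer returns a copy of finalizers without name.
func removeFinalizer(finalizers []string, name string) []string {
	out := make([]string, 0, len(finalizers))
	for _, f := range finalizers {
		if f != name {
			out = append(out, f)
		}
	}
	return out
}

// canTerminateInstance is a hypothetical gate for the instance termination
// controller: it proceeds only when its own finalizer is still present and
// both the drain and volume finalizers have already been removed.
func canTerminateInstance(finalizers []string) bool {
	return hasFinalizer(finalizers, TerminationFinalizer) &&
		!hasFinalizer(finalizers, DrainFinalizer) &&
		!hasFinalizer(finalizers, VolumeFinalizer)
}

func main() {
	finalizers := []string{DrainFinalizer, VolumeFinalizer, TerminationFinalizer}
	fmt.Println(canTerminateInstance(finalizers)) // false: drain and volume still pending

	// The drain controller finishes and removes its finalizer.
	finalizers = removeFinalizer(finalizers, DrainFinalizer)
	fmt.Println(canTerminateInstance(finalizers)) // false: volume still pending

	// The volume controller finishes and removes its finalizer.
	finalizers = removeFinalizer(finalizers, VolumeFinalizer)
	fmt.Println(canTerminateInstance(finalizers)) // true: only termination remains
}
```

In a real controller the check would presumably run against the Node object's metadata at the top of the reconcile loop, returning early while either upstream finalizer is still present.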
This change was motivated by the increased complexity of the termination controller once additional status conditions were added for drain and volume detachment monitoring. Alternatively, the subreconciler pattern (a la the NodeClaim lifecycle controller) could have been used. However, this approach has some additional observability benefits thanks to the per-controller metrics, and is subjectively easier to test and maintain.
How was this change tested?
make test
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.