
Failed to watch *v1.NodeClaim: failed to list *v1.NodeClaim: request to convert CR from an invalid group/version: karpenter.sh/v1beta1 #1886

Open
saikumarbommakanti opened this issue Dec 17, 2024 · 14 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. triage/needs-information Indicates an issue needs more information in order to work on it.

Comments

@saikumarbommakanti

Description

Observed Behavior:
Failed to watch *v1.NodeClaim: failed to list *v1.NodeClaim: request to convert CR from an invalid group/version: karpenter.sh/v1beta1
Expected Behavior:
The Karpenter Helm chart was upgraded to v1.1.0, but the error above appears.
Reproduction Steps (Please include YAML):
Installed Karpenter v1.1.0 on an EKS cluster; the Karpenter pod could not come up.
Versions:

  • Chart Version: v1.1.1
  • Kubernetes Version (kubectl version): v1.31.2 (server)
@saikumarbommakanti saikumarbommakanti added the kind/bug Categorizes issue or PR as related to a bug. label Dec 17, 2024
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Dec 17, 2024
@jmdeal
Member

jmdeal commented Dec 17, 2024

Did you ensure all stored versions of Karpenter's CRs were migrated to v1 before upgrading to v1.1.0? There's a reference to how to validate this in the upgrade guide.

/triage needs-information

@k8s-ci-robot k8s-ci-robot added triage/needs-information Indicates an issue needs more information in order to work on it. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Dec 17, 2024
@saikumarbommakanti
Author

saikumarbommakanti commented Dec 18, 2024

Yes, it was migrated to v1.1.0, but the error still says: failed to list *v1.NodeClaim: request to convert CR from an invalid group/version: karpenter.sh/v1beta1

for crd in "nodepools.karpenter.sh" "nodeclaims.karpenter.sh" "ec2nodeclasses.karpenter.k8s.aws"; do
  kubectl get crd "${crd}" -ojsonpath="{.status.storedVersions}{'\n'}"
done

["v1"]
["v1"]
["v1"]
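The same check can be wrapped so that it fails loudly instead of relying on reading the output by eye. This is a minimal bash sketch, assuming kubectl is pointed at the affected cluster; the function names are hypothetical. As this thread shows, a clean ["v1"] result did not by itself rule out the error, so treat it as a necessary check rather than a sufficient one.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that every Karpenter CRD lists only "v1" in
# .status.storedVersions. CRD names are the ones used in this thread.

# Succeeds only when the storedVersions JSON printed by kubectl is exactly ["v1"].
stored_versions_ok() {
  [ "$1" = '["v1"]' ]
}

# Queries each CRD and prints OK/FAIL; returns non-zero if any CRD still
# records a version other than v1. Requires kubectl access to the cluster.
check_karpenter_stored_versions() {
  local crd versions rc=0
  for crd in "nodepools.karpenter.sh" "nodeclaims.karpenter.sh" "ec2nodeclasses.karpenter.k8s.aws"; do
    versions=$(kubectl get crd "${crd}" -o jsonpath='{.status.storedVersions}')
    if stored_versions_ok "${versions}"; then
      echo "OK   ${crd} ${versions}"
    else
      echo "FAIL ${crd} ${versions}"
      rc=1
    fi
  done
  return "${rc}"
}
```

For example, check_karpenter_stored_versions || echo "stored versions still need migrating" could run as a pre-upgrade step.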

@jmdeal
Member

jmdeal commented Dec 19, 2024

What version of Karpenter did you migrate from to v1.1.0 and did you check this before performing the upgrade?

@saikumarbommakanti
Author

It was from v0.36.0 to v1.1.0.

@jmdeal
Member

jmdeal commented Dec 19, 2024

Did you upgrade directly, or did you upgrade through v1.0?
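The question matters because the guidance in this thread is to go through v1.0.x (following the migration guide) before moving to v1.1.0. A hypothetical pre-flight check, sketched in bash under the assumption that versions are plain releases such as 0.36.0 or v1.1.0 and that sort supports -V (as GNU coreutils does), can flag a jump from a pre-v1.0 release straight to v1.1 or later:

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: does an upgrade skip the v1.0.x line?
# Assumes plain release versions (e.g. 0.36.0, v1.1.0) and `sort -V` support.

# Succeeds when version $1 sorts strictly before version $2.
version_lt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Succeeds when going from $1 to $2 crosses from pre-1.0 directly to 1.1 or later.
skips_v1_0() {
  local from="${1#v}" to="${2#v}"
  version_lt "${from}" "1.0.0" && ! version_lt "${to}" "1.1.0"
}
```

With the versions reported in this issue, skips_v1_0 0.36.0 v1.1.0 succeeds (the jump skips v1.0.x), while skips_v1_0 v1.0.8 v1.1.0 does not.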

@abhilashsupare

I am also facing the same issue. I upgraded from v1.0.0 to v1.1.0.

@jmdeal
Member

jmdeal commented Dec 19, 2024

@abhilashsupare did you follow all of the steps in the migration guide, particularly ensuring that all stored versions for CRs were migrated before upgrading?

@saikumarbommakanti
Author

Yes, it was migrated to v1.1.0, but the error still says: failed to list *v1.NodeClaim: request to convert CR from an invalid group/version: karpenter.sh/v1beta1

for crd in "nodepools.karpenter.sh" "nodeclaims.karpenter.sh" "ec2nodeclasses.karpenter.k8s.aws"; do
  kubectl get crd "${crd}" -ojsonpath="{.status.storedVersions}{'\n'}"
done

["v1"]
["v1"]
["v1"]

@saikumarbommakanti
Author

Yes, it is migrated; I pasted the output above.

@jmdeal
Member

jmdeal commented Dec 20, 2024

@saikumarbommakanti I'm trying to understand what steps you took to migrate from v0.36.0 to v1.1.0. Did you upgrade directly, or did you upgrade to v1.0.x as an interim step? If you did, what patch version of v1.0.x did you migrate to? Did you follow the steps in the migration guide, and if not, could you explain what steps you took?

@jmdeal
Member

jmdeal commented Dec 23, 2024

/assign jmdeal

@rashkur

rashkur commented Jan 6, 2025

Getting the same error after the upgrade

{"level":"ERROR","time":"2025-01-06T13:06:33.314Z","logger":"controller","message":"Unhandled Error","commit":"0a85efb","logger":"UnhandledError","error":"k8s.io/client-go@…/tools/cache/reflector.go:243: Failed to watch *v1.NodeClaim: failed to list *v1.NodeClaim: request to convert CR from an invalid group/version: karpenter.sh/v1beta1"}
for crd in "nodepools.karpenter.sh" "nodeclaims.karpenter.sh" "ec2nodeclasses.karpenter.k8s.aws"; do kubectl get crd ${crd} -ojsonpath="{.status.storedVersions}{'\n'}" ; done
["v1"]
["v1"]
["v1"]

@abhilashsupare

abhilashsupare commented Jan 7, 2025

My issue was resolved by updating both Helm charts (controller and CRD) to the same version (1.0.8).
Thanks @jmdeal
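Since the fix here was putting the controller chart and the CRD chart on the same version, a quick comparison of the two installed releases can catch a mismatch. A minimal bash sketch, assuming the release names karpenter and karpenter-crd from the helm commands in this thread, and assuming jq is installed to read helm list -o json; the function names are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical check that the controller chart and the CRD chart are on the
# same version. Release names "karpenter" and "karpenter-crd" match the helm
# commands in this thread; adjust them for your installation.

# Extracts the version from a Helm chart string, e.g. "karpenter-crd-1.0.8" -> "1.0.8".
# Assumes release versions without a pre-release suffix such as "-rc.1".
chart_version() {
  printf '%s\n' "${1##*-}"
}

# Succeeds when both chart strings carry the same version.
charts_match() {
  [ "$(chart_version "$1")" = "$(chart_version "$2")" ]
}

# Reads the installed chart of each release via `helm list -o json` (needs jq).
check_karpenter_charts() {
  local ns="${1:?usage: check_karpenter_charts NAMESPACE}" controller crd
  controller=$(helm list -n "${ns}" -f '^karpenter$' -o json | jq -r '.[0].chart')
  crd=$(helm list -n "${ns}" -f '^karpenter-crd$' -o json | jq -r '.[0].chart')
  echo "controller: ${controller}, crd: ${crd}"
  charts_match "${controller}" "${crd}"
}
```

For example, check_karpenter_charts "${KARPENTER_NAMESPACE}". If the CRDs ship inside the controller chart (as in the Argo CD setup described below), there is no separate karpenter-crd release and this comparison does not apply.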

@rashkur

rashkur commented Jan 7, 2025

I have an argo deployment and this is what helped in my case:

After the failed upgrade to v1.1.0 via Argo CD, I rolled back to v0.37 (I have a custom chart that includes the CRDs), disabled sync in Argo CD, and deleted everything it had installed, including the CRDs:
"nodepools.karpenter.sh" "nodeclaims.karpenter.sh" "ec2nodeclasses.karpenter.k8s.aws"
The CRD deletions may hang because of finalizers.
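When CRD deletion hangs, it helps to see which finalizers are still set on each CRD and which of its custom resources remain. A read-only bash sketch, assuming kubectl access; the names KARPENTER_CRDS and show_karpenter_finalizers are hypothetical, and the helper deliberately does not remove any finalizers:

```shell
#!/usr/bin/env bash
# Hypothetical read-only helper: for each Karpenter CRD, show whether it is
# being deleted, its finalizers, and the remaining objects with their finalizers.

KARPENTER_CRDS=(
  "nodepools.karpenter.sh"
  "nodeclaims.karpenter.sh"
  "ec2nodeclasses.karpenter.k8s.aws"
)

show_karpenter_finalizers() {
  local crd
  for crd in "${KARPENTER_CRDS[@]}"; do
    echo "== ${crd}"
    # deletionTimestamp is empty unless a delete has been requested.
    kubectl get crd "${crd}" \
      -o jsonpath='{.metadata.deletionTimestamp}{" "}{.metadata.finalizers}{"\n"}'
    # Remaining custom resources of this (cluster-scoped) type, one per line.
    kubectl get "${crd}" \
      -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.finalizers}{"\n"}{end}'
  done
}
```

Running show_karpenter_finalizers before and after deleting the custom resources shows what is still blocking each CRD.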

After that, I manually installed the Helm charts (karpenter and karpenter-crd) at v1.0.8:

export KARPENTER_VERSION="1.0.8"
helm3 upgrade --install karpenter-crd oci://public.ecr.aws/karpenter/karpenter-crd --version "${KARPENTER_VERSION}" --namespace "${KARPENTER_NAMESPACE}" --create-namespace \
  --set webhook.enabled=true \
  --set webhook.serviceName="karpenter" \
  --set webhook.port=8443
helm3 upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter --version "${KARPENTER_VERSION}" --namespace "${KARPENTER_NAMESPACE}" --create-namespace \
  --set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"="${KARPENTER_IAM_ROLE_ARN}" \
  --set settings.clusterName="${CLUSTER_NAME}" \
  --set settings.interruptionQueue="${CLUSTER_NAME}" \
  --set controller.resources.requests.cpu=1 \
  --set controller.resources.requests.memory=1Gi \
  --set controller.resources.limits.cpu=1 \
  --set controller.resources.limits.memory=1Gi
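The commands above assume KARPENTER_VERSION, KARPENTER_NAMESPACE, KARPENTER_IAM_ROLE_ARN, and CLUSTER_NAME are already set; if one is empty, the empty value is passed straight through to --namespace or --set. A small guard, sketched in bash (the helper name require_vars is hypothetical), can stop before either helm command runs:

```shell
#!/usr/bin/env bash
# Hypothetical guard: fail fast if any named environment variable is unset or empty.
require_vars() {
  local name missing=0
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable called $name.
    if [ -z "${!name:-}" ]; then
      echo "missing required variable: ${name}" >&2
      missing=1
    fi
  done
  return "${missing}"
}
```

For example, place require_vars KARPENTER_VERSION KARPENTER_NAMESPACE KARPENTER_IAM_ROLE_ARN CLUSTER_NAME || exit 1 right after the export line.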

It threw some errors, like:

{"level":"ERROR","time":"2025-xx","logger":"webhook.ConversionWebhook","message":"Reconcile error","commit":"a2875e3","knative.dev/traceid":"d5f875a6-7988-4bc5-8452-aa7e6c5b4692","knative.dev/key":"ec2nodeclasses.karpenter.k8s.aws","duration":"113.724849ms","error":"failed to update webhook: Operation cannot be fulfilled on customresourcedefinitions.apiextensions.k8s.io \"ec2nodeclasses.karpenter.k8s.aws\": the object has been modified; please apply your changes to the latest version and try again"}
{"level":"ERROR","time":"2025-xx","logger":"webhook","message":"http: TLS handshake error from 10.3.162.88:38504: EOF\n","commit":"a2875e3"}
{"level":"ERROR","time":"2025-xx","logger":"webhook","message":"http: TLS handshake error from 10.3.243.221:57554: EOF\n","commit":"a2875e3"}
{"level":"ERROR","time":"2025-xx","logger":"webhook","message":"http: TLS handshake error from 10.3.243.221:57572: EOF\n","commit":"a2875e3"}
{"level":"ERROR","time":"2025-xx","logger":"webhook","message":"http: TLS handshake error from 10.3.243.221:57614: read tcp 10.3.73.66:8443->10.3.243.221:57614: read: connection reset by peer\n","commit":"a2875e3"}
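Because the original error is a failed conversion from karpenter.sh/v1beta1, it can be useful to list which versions each CRD currently defines and which conversion strategy it declares, and compare that with the installed controller chart. A read-only bash sketch, assuming kubectl access; the helper names are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical read-only helper: print the versions each Karpenter CRD defines
# and its conversion strategy (e.g. "None" or "Webhook").

# Succeeds when the space-separated version list $1 contains version $2.
lists_version() {
  local v
  # $1 is intentionally unquoted so it splits on spaces.
  for v in $1; do
    [ "${v}" = "$2" ] && return 0
  done
  return 1
}

show_karpenter_crd_versions() {
  local crd versions strategy
  for crd in "nodepools.karpenter.sh" "nodeclaims.karpenter.sh" "ec2nodeclasses.karpenter.k8s.aws"; do
    versions=$(kubectl get crd "${crd}" -o jsonpath='{.spec.versions[*].name}')
    strategy=$(kubectl get crd "${crd}" -o jsonpath='{.spec.conversion.strategy}')
    echo "${crd}: versions=[${versions}] conversion=${strategy:-unset}"
    if lists_version "${versions}" "v1beta1"; then
      echo "  note: ${crd} still defines v1beta1"
    fi
  done
}
```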

Then I added a simple NodePool and EC2NodeClass and tested them. They worked as expected.
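The thread does not include the manifests used for this test. As an illustration only, the function below prints a minimal pair in the karpenter.sh/v1 and karpenter.k8s.aws/v1 shapes, modeled on the Karpenter getting-started example; the smoke-test resource names, the al2023@latest AMI alias, the KarpenterNodeRole-<cluster> role name, and the karpenter.sh/discovery tag selector are all assumptions to adapt:

```shell
#!/usr/bin/env bash
# Hypothetical smoke-test manifests: prints a minimal NodePool + EC2NodeClass.
# Names, role, AMI alias, and discovery tags are assumptions; adapt before use.
render_smoke_test_manifests() {
  local cluster="${1:?usage: render_smoke_test_manifests CLUSTER_NAME}"
  cat <<EOF
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: smoke-test
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: smoke-test
  limits:
    cpu: 16
---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: smoke-test
spec:
  role: "KarpenterNodeRole-${cluster}"
  amiSelectorTerms:
    - alias: al2023@latest
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${cluster}"
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${cluster}"
EOF
}
```

For example, render_smoke_test_manifests "${CLUSTER_NAME}" | kubectl apply -f - and then schedule a test pod that needs a new node.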

Then I removed the manually installed Helm charts and installed v1.1 again via Argo CD, with the CRDs included in the chart, and that also worked.


Just remembered: I also had a rogue Karpenter node that was visible in kubectl get nodes but not in the NodeClaims, so check your nodes before upgrading.
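A rogue node like this can be spotted before upgrading by comparing Karpenter-managed nodes with the nodes that NodeClaims point at. A minimal bash sketch; it assumes Karpenter-launched nodes carry the karpenter.sh/nodepool label and that each NodeClaim records its node in .status.nodeName, and the function names are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical pre-upgrade check: list Karpenter-labelled nodes that no NodeClaim
# points at. Assumes nodes launched by Karpenter carry the karpenter.sh/nodepool label.

# Prints names in the newline-separated list $1 that are absent from list $2.
orphan_nodes() {
  LC_ALL=C comm -23 \
    <(printf '%s\n' "$1" | sed '/^$/d' | LC_ALL=C sort -u) \
    <(printf '%s\n' "$2" | sed '/^$/d' | LC_ALL=C sort -u)
}

find_orphan_karpenter_nodes() {
  local nodes claimed
  nodes=$(kubectl get nodes -l karpenter.sh/nodepool \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}')
  claimed=$(kubectl get nodeclaims.karpenter.sh \
    -o jsonpath='{range .items[*]}{.status.nodeName}{"\n"}{end}')
  orphan_nodes "${nodes}" "${claimed}"
}
```

Any name printed by find_orphan_karpenter_nodes is a node that Karpenter-labelled but no NodeClaim tracks.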


5 participants