Does not consider daemonset resources for node pool if the daemonset matches the nodepool using a node affinity #6391

Closed
myaser opened this issue Jun 20, 2024 · 1 comment
Labels
bug (Something isn't working), needs-triage (Issues that need to be triaged)

Comments

myaser commented Jun 20, 2024

Description

Observed Behavior:
I created a daemonset with the following nodeAffinity:

      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: zalando.org/nvidia-gpu
                operator: Exists
            - matchExpressions:
              - key: karpenter.k8s.aws/instance-gpu-manufacturer
                operator: In
                values:
                - nvidia

While observing Karpenter logs and experimenting with scheduling pods of different sizes, I found that Karpenter's calculation of daemonset resources excludes this pod.

I could confirm this by checking the code.

Here it reads only the first of the affinity terms (nodeSelectorTerms), relying on an outer loop to remove the first term and continue with the next one.

But this does not happen for the daemonset calculation, as shown here.

To validate my findings, I flipped the order of the terms, and the calculation was then correct:

      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: karpenter.k8s.aws/instance-gpu-manufacturer
                operator: In
                values:
                - nvidia
            - matchExpressions:
              - key: zalando.org/nvidia-gpu
                operator: Exists
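
The effect of the term order can be illustrated with a small Python sketch. This is not Karpenter's Go code: `term_matches`, `first_term_only`, `any_term`, and `pool_labels` are hypothetical names, and the matching rule is deliberately simplified. It assumes that the node pool can resolve `karpenter.k8s.aws/instance-gpu-manufacturer` but knows nothing about `zalando.org/nvidia-gpu`, which is consistent with what I observed but is still an assumption.

```python
from typing import Any, Dict, List, Set

# One match expression, e.g. {"key": "...", "operator": "In", "values": ["nvidia"]}.
Expression = Dict[str, Any]
# One nodeSelectorTerm: all of its expressions must hold (AND).
Term = List[Expression]


def term_matches(term: Term, labels: Dict[str, Set[str]]) -> bool:
    """Simplified check: every expression must be satisfiable by the known labels."""
    for expr in term:
        possible = labels.get(expr["key"])
        if possible is None:
            # Assumption: an unknown label satisfies neither Exists nor In.
            return False
        if expr["operator"] == "In" and not (possible & set(expr["values"])):
            return False
    return True


def first_term_only(terms: List[Term], labels: Dict[str, Set[str]]) -> bool:
    """Models the behavior described in this issue: only terms[0] is looked at."""
    return bool(terms) and term_matches(terms[0], labels)


def any_term(terms: List[Term], labels: Dict[str, Set[str]]) -> bool:
    """Models Kubernetes semantics: nodeSelectorTerms are ORed."""
    return any(term_matches(term, labels) for term in terms)


zalando = [{"key": "zalando.org/nvidia-gpu", "operator": "Exists"}]
nvidia = [{
    "key": "karpenter.k8s.aws/instance-gpu-manufacturer",
    "operator": "In",
    "values": ["nvidia"],
}]

# Hypothetical view of the labels the node pool can provide.
pool_labels = {"karpenter.k8s.aws/instance-gpu-manufacturer": {"nvidia"}}

original = [zalando, nvidia]  # order from the first manifest
flipped = [nvidia, zalando]   # order from the second manifest

print(first_term_only(original, pool_labels))  # False: daemonset is skipped
print(first_term_only(flipped, pool_labels))   # True: daemonset is counted
print(any_term(original, pool_labels))         # True
print(any_term(flipped, pool_labels))          # True
```

With the first-term-only check the result depends on the order of the terms, while the ORed check gives the same answer for both manifests.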

Expected Behavior:
All affinity terms should be considered when calculating daemonset resources.

Reproduction Steps (Please include YAML):

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: datalab-workloads
spec:
  disruption:
    budgets:
    - nodes: 10%
    consolidationPolicy: WhenUnderutilized
    expireAfter: Never
  template:
    metadata:
    spec:
      kubelet:
        clusterDNS:
        - 10.0.1.100
        cpuCFSQuota: false
        kubeReserved:
          cpu: 100m
          memory: 282Mi
        maxPods: 32
        systemReserved:
          cpu: 100m
          memory: 164Mi
      nodeClassRef:
        name: datalab-workloads
      requirements:
      - key: node.kubernetes.io/instance-type
        operator: In
        values:
        - g4dn.xlarge
        - g4dn.4xlarge
        - g4dn.12xlarge
        - g4dn.16xlarge
        - g4dn.metal
        - g5.xlarge
        - g5.4xlarge
        - g5.16xlarge
        - g5.24xlarge
        - g5.48xlarge
        - g3s.xlarge
        - g4dn.2xlarge
      - key: karpenter.sh/capacity-type
        operator: In
        values:
        - spot
        - on-demand
      - key: kubernetes.io/arch
        operator: In
        values:
        - arm64
        - amd64
      - key: topology.kubernetes.io/zone
        operator: In
        values:
        - eu-central-1a
        - eu-central-1b
        - eu-central-1c
      startupTaints:
      - effect: NoSchedule
        key: zalando.org/node-not-ready
      taints:
      - effect: NoSchedule
        key: dedicated
        value: datalab-workloads

Versions:

  • karpenter Version: 0.36.2
  • Kubernetes Version (kubectl version): v1.30.2
myaser added the bug and needs-triage labels Jun 20, 2024
myaser commented Jun 20, 2024

Closed in favor of kubernetes-sigs/karpenter#1337; this issue was created in the wrong repo.

myaser closed this as completed Jun 20, 2024