#1173: The batchIdleDuration value is ignored
@MauroSoli Just to be sure, would you mind verifying the configured batch values? What is the size of the cluster that you are working with? Is it a test cluster that does not have too many pods/deployments?
$ kubectl describe pod karpenter-678d69d4d5-6rpgw -n karpenter | grep BATCH_
BATCH_MAX_DURATION: 90s
BATCH_IDLE_DURATION: 10s
In the logs that I've shared there is the line that you are looking for.
We use Karpenter only to manage some specific workloads, like building or running cron jobs.
That's not what I meant.
Let's say you schedule a single pod and no new pod arrives for 10 seconds; Karpenter will then begin scheduling a new nodeClaim after those 10 seconds. However, if a pod comes up before the 10 seconds elapse, the batching window is extended, up to the maxDuration. From the logs that you have shared it seems there were 10 pods, and in that case Karpenter would wait until the maxDuration.
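The extension behaviour described here can be sketched as a small simulation. This is hypothetical illustrative code, not Karpenter's actual implementation; the 10s/90s defaults below are the values configured in this issue:

```python
# Hypothetical sketch of the batching window described above -- not
# Karpenter's real code. The window stays open while new pods keep
# arriving within idle_duration of the previous close time, but never
# longer than max_duration after the first pod. Arrivals after the
# window closes would start a new batch (not modeled here).
def window_close_time(arrivals, idle_duration=10, max_duration=90):
    """arrivals: sorted pod-arrival times in seconds; returns the
    time at which the batching window closes."""
    start = arrivals[0]
    close = start + idle_duration            # a lone pod closes after idle_duration
    for t in arrivals[1:]:
        if t < close:                        # pod landed inside the open window:
            close = t + idle_duration        # extend by another idle_duration
    return min(close, start + max_duration)  # hard cap at max_duration

# One pod at t=0: the batch closes at t=10 (idle duration only).
print(window_close_time([0]))         # -> 10
# Pods at t=0, 5, 12: each arrival inside the window extends it.
print(window_close_time([0, 5, 12]))  # -> 22
```

Under this reading, a steady trickle of pods arriving faster than the idle duration keeps extending the window until the hard cap kicks in.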
I've been able to reproduce this using the latest Karpenter v0.35.4 and the latest EKS v1.29.1-eks-508b6b3, and it seems to me there is some mismatch between what is stated above, what is stated in the documentation, and the actual Karpenter behavior. If I set IDLE_DURATION and MAX_DURATION high and far enough apart (say 10 seconds and 300 seconds) and run a script like
In other words: the batch window gets immediately extended to the MAX_DURATION. Personally, I understood the current documentation in a different way: as I read it, BATCH_IDLE_DURATION will always be the maximum "inactivity time" after which Karpenter starts computing node claims. To me, this makes much more sense because it allows shorter latencies between workload demand (pod creation) and its execution (node claims computed, nodes created, pods starting to run), while MAX_DURATION still guarantees an upper bound on that latency by closing the batch window even if new pods keep arriving. To me, the documentation seems correct and describes the behavior I would expect, but Karpenter is misbehaving: it actually uses the IDLE_DURATION "only the first time", then skips directly to the MAX_DURATION, causing higher latencies between pod creation and node startup.
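To make the latency difference concrete, here is a hypothetical back-of-the-envelope comparison of the two readings, using the 10s/300s settings from the reproduction above. The arrival trace is illustrative, not a measurement from this thread:

```python
# Compare the documented reading with the observed (buggy) behaviour
# for the reproduction settings: idle = 10s, max = 300s, and pods
# arriving every 8 seconds for about a minute (hypothetical trace).
IDLE, MAX = 10, 300
arrivals = list(range(0, 61, 8))          # pods at t = 0, 8, 16, ..., 56

# Documented reading: the window closes IDLE seconds after the last
# pod that arrived inside it (never later than MAX after the first pod).
documented_close = min(arrivals[-1] + IDLE, MAX)

# Observed behaviour reported in this issue: once extended, the window
# skips straight to the hard cap, so scheduling waits the full MAX.
observed_close = MAX

print(documented_close)   # 66  -> node claims ~10s after the last pod
print(observed_close)     # 300 -> pods sit pending for the full maxDuration
```

The gap between the two close times (66s vs 300s here) is exactly the extra pending latency the reporter describes.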
I had probably misunderstood what was being implied earlier. I was able to reproduce this. Looking into creating a fix for this. Thanks.
/assign @jigisha620
/label v1
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues. Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
/remove-lifecycle stale
/lifecycle rotten
Description
Observed Behavior:
Karpenter always waits for the batchMaxDuration value and ignores the batchIdleDuration value when several pods are in pending status. I changed the values to batchIdleDuration=10s and batchMaxDuration=90s to make the behaviour clearer.
As you can see in the following image, pods had been in pending state for more than 60 seconds and Karpenter had still not scheduled a new NodeClaim.
Here are the controller logs:
Expected Behavior:
The NodeClaim should be created when the batchIdleDuration time has passed and no new pending pods have been scheduled on the cluster.
Versions:
Karpenter version: 0.35.4
Kubernetes version (kubectl version): v1.29.1-eks-508b6b3