What happened:
I created an EKS Anywhere (EKSA) cluster with the following configuration:
- unhealthyMachineTimeout set to 30 seconds (the minimum value) in the worker node section of the Cluster config file [1]
- Autoscaling configuration enabled for worker nodes in the Cluster config file
- Cluster Autoscaler curated package installed on the cluster
After the cluster was created, I tested two scenarios:
Scenario 1: In the VMware vSphere console, right-click one of the worker node VMs and select Power Off.
Scenario 2: Right-click one of the worker node VMs and select Power Off, then right-click it again and select Delete from Disk.
Scenario 1 fails every time: no replacement node is created within the configured timeout. The capv pod logs show no event marking the node as unhealthy for 4-5 minutes. After that, either the node is deleted and a new node is provisioned, or the existing node is powered back on.
Scenario 2 works every time. After the node is deleted, a new node is provisioned within 30 seconds.
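To put a number on the delay in scenario 1, a small script along these lines could be run while the VM is powered off. This is only a sketch: it assumes kubectl is on the PATH and pointed at the affected cluster, and the helper names (not_ready_nodes, seconds_since, fetch_nodes) are made up for illustration. It lists every node whose Ready condition is not "True" and how long ago that condition last changed, which can then be compared against unhealthyMachineTimeout.

```python
import json
import subprocess
from datetime import datetime, timezone


def not_ready_nodes(node_list: dict) -> dict:
    """Map each node whose Ready condition is not "True" to the
    lastTransitionTime of that condition (None if the condition is missing)."""
    result = {}
    for node in node_list.get("items", []):
        name = node["metadata"]["name"]
        conditions = node.get("status", {}).get("conditions", [])
        ready = next((c for c in conditions if c.get("type") == "Ready"), None)
        # A powered-off VM stops reporting, so Ready becomes "Unknown" or "False".
        if ready is None or ready.get("status") != "True":
            result[name] = ready.get("lastTransitionTime") if ready else None
    return result


def seconds_since(timestamp: str, now: datetime) -> float:
    """Seconds elapsed between a Kubernetes UTC timestamp and `now`."""
    then = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
    return (now - then.replace(tzinfo=timezone.utc)).total_seconds()


def fetch_nodes() -> dict:
    """Read the current node list through kubectl (assumed to be configured)."""
    completed = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(completed.stdout)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    for name, since in not_ready_nodes(fetch_nodes()).items():
        if since is None:
            print(f"{name}: no Ready condition reported")
        else:
            print(f"{name}: not Ready for {seconds_since(since, now):.0f}s")
```

Running this every few seconds after powering off the VM shows how long the node has been reported as not Ready before capv starts remediation.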
What you expected to happen:
For scenario 1, capv should respect the unhealthyMachineTimeout value of 30 seconds. For comparison, when unhealthyMachineTimeout is set to 5 minutes, capv takes around 20-40 minutes to detect that the node is powered off or not ready.
I am not sure whether we need something like the node termination handler that is available for Amazon EKS in the cloud.
How to reproduce it (as minimally and precisely as possible):
Configure the worker node section of the Cluster config file as follows.
saiteja313 changed the title from "unhealthyMachineTimeout not working when VM is powered off and VM not deleted from the disk" to "unhealthyMachineTimeout not working when VM is powered off (VM not deleted from disk)" on Sep 17, 2024
[1] https://anywhere.eks.amazonaws.com/docs/getting-started/optional/healthchecks/#__machinehealthcheckunhealthymachinetimeout__-optional
Anything else we need to know?:
Environment: EKS Anywhere (EKSA) on vSphere