Node Operations (Maintenance)

Key Takeaways for AI & Readers
  • Graceful Node Maintenance: cordon and drain commands facilitate safe node maintenance by preventing new Pods from scheduling and gracefully evicting existing Pods.
  • cordon (Unschedulable): Marks a node as unschedulable, allowing existing Pods to continue running while preventing new Pods from being placed there.
  • drain (Evacuate): Cordons the node and then evicts its Pods (DaemonSet-managed Pods excepted), allowing their controllers to reschedule them onto other available nodes.
  • Node Problem Detector: Detects node-level issues and reports them to the API server as Node Conditions, giving early warning of potential cluster instability.

Nodes are physical or virtual machines, and they occasionally need maintenance (OS updates, hardware fixes). As a cluster operator, you must move workloads off a node before you shut it down.

1. Drain vs. Cordon

Cordon stops new pods from being scheduled onto the node. Drain cordons the node and then gracefully evicts its existing pods so that their controllers can recreate them on other nodes.

kubectl cordon

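Marks the node as Unschedulable. Existing pods continue running, but the scheduler will place no NEW pods on this node (DaemonSet pods are the exception, because they tolerate the unschedulable taint). Use this when you want running work to finish on its own before maintenance begins.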
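
As a minimal sketch of the cordon step, the helper below marks a node unschedulable and then reads back the field that cordon sets. The function name cordon_node and the node name worker-1 are hypothetical; kubectl is assumed to be installed and pointed at your cluster.

```shell
# Hypothetical helper: cordon a node, then confirm the flag was set.
cordon_node() {
  local node="$1"
  # Mark the node unschedulable; running pods are left untouched.
  kubectl cordon "$node" || return 1
  # A cordoned node has .spec.unschedulable set, so this prints "true".
  kubectl get node "$node" -o jsonpath='{.spec.unschedulable}'
}

# Usage (worker-1 is a placeholder node name):
#   cordon_node worker-1
```

A cordoned node also shows SchedulingDisabled in the STATUS column of kubectl get nodes.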

kubectl drain

The "Evacuate" command.

  1. Cordons the node.
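  2. Evicts the pods running on the node gracefully, honoring each pod's termination grace period and any PodDisruptionBudgets. DaemonSet-managed pods are not evicted; drain stops with an error when it finds them unless you pass --ignore-daemonsets.
  3. Evicted pods are recreated by their controllers (Deployments/StatefulSets) on OTHER nodes. Pods with no controller are not recreated, so drain refuses to remove them unless you pass --force.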
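
The steps above can be sketched as a small wrapper around kubectl drain. This is one reasonable set of flags, not the only one: --ignore-daemonsets lets the drain proceed past DaemonSet pods, --delete-emptydir-data accepts the loss of emptyDir volume contents, and --timeout bounds how long the command waits. The function name drain_node, the 300s default, and the node name worker-1 are assumptions for illustration.

```shell
# Hypothetical helper: drain a node ahead of maintenance.
drain_node() {
  local node="$1"
  local timeout="${2:-300s}"   # assumed default; tune to your workloads
  # drain cordons the node first, then evicts its pods.
  kubectl drain "$node" \
    --ignore-daemonsets \
    --delete-emptydir-data \
    --timeout="$timeout"
}

# Usage (worker-1 is a placeholder node name):
#   drain_node worker-1        # wait up to 300s
#   drain_node worker-1 10m    # wait up to 10 minutes
```

If a PodDisruptionBudget blocks an eviction, drain keeps retrying until the timeout expires, so a failed drain usually means a workload needs more replicas elsewhere before the node can be emptied.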

2. Uncordon

Once maintenance is finished, run kubectl uncordon on the node so that it can receive pods again. Kubernetes does NOT move existing pods back to the node automatically (unless you use a Descheduler); the node only fills up again as new pods are scheduled.
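
A minimal sketch of the return-to-service step: the helper uncordons the node and then lists the pods currently bound to it, which makes it easy to see that nothing has moved back on its own. The function name uncordon_node and the node name worker-1 are hypothetical.

```shell
# Hypothetical helper: return a node to service and show what runs on it.
uncordon_node() {
  local node="$1"
  # Clear the unschedulable flag so the scheduler may use the node again.
  kubectl uncordon "$node" || return 1
  # List pods bound to this node across all namespaces.
  kubectl get pods --all-namespaces -o wide \
    --field-selector "spec.nodeName=$node"
}

# Usage (worker-1 is a placeholder node name):
#   uncordon_node worker-1
```

Right after a drain, the list typically contains only DaemonSet pods.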

3. Node Problem Detector

Node Problem Detector is a daemon that runs on each node, detects issues such as "Disk Full" or "Kernel Deadlock", and reports them to the API server as Node Conditions (for persistent problems) and Events (for transient ones).
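
Because the reports land on the Node object, they can be read with plain kubectl. The sketch below prints each condition type and its status, one per line; it assumes nothing about which conditions your Node Problem Detector configuration defines. The function name node_conditions and the node name worker-1 are hypothetical.

```shell
# Hypothetical helper: print every condition on a node as TYPE=STATUS.
node_conditions() {
  local node="$1"
  # Built-in kubelet conditions and any added by Node Problem Detector
  # appear together under .status.conditions.
  kubectl get node "$node" \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'
}

# Usage (worker-1 is a placeholder node name):
#   node_conditions worker-1
```

The same information, with reasons and messages, appears in the Conditions section of kubectl describe node.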