Resources: Requests, Limits & HPA
- Requests vs. Limits: Requests guarantee resources for scheduling, while Limits prevent a Pod from consuming more than its fair share (throttling for CPU, OOMKill for Memory).
- QoS Classes: Pods are prioritized for eviction based on their resource settings (Guaranteed, Burstable, or BestEffort).
- Auto-Scaling Logic: Horizontal Pod Autoscaler (HPA) requires the Metrics Server to monitor usage and adjusts replica counts based on defined utilization targets.
- Resource Governance: Use ResourceQuotas to cap total namespace consumption and LimitRanges to enforce default requests/limits for all Pods.
Before you can auto-scale, you must define how much CPU and RAM your application needs.
1. Requests vs Limits
Every container in a Pod can specify requests and limits.
Requests (The "Guarantee")
- What it means: "I need at least this much to start."
- Behavior: The Kubernetes Scheduler uses this to find a Node with enough free capacity. If no node has enough, the Pod stays Pending.
- Best Practice: Always set requests.
Limits (The "Ceiling")
- What it means: "I should never go above this."
- Behavior:
- CPU: If you exceed the limit, your process is throttled (slowed down).
- Memory: If you exceed the limit, your process is OOMKilled (Out Of Memory Killed) and restarts.
```yaml
resources:
  requests:
    memory: "64Mi"
    cpu: "250m"   # 1000m = 1 core, so 250m = 1/4 core
  limits:
    memory: "128Mi"
    cpu: "500m"
```
2. Quality of Service (QoS) Classes
Kubernetes assigns a QoS class to every Pod based on its resource configuration. This determines which Pod gets killed first when the Node runs out of memory.
Guaranteed (The VIPs)
- Criteria: Requests == Limits (for both CPU and Memory).
- Behavior: Last to be killed.
Burstable (The Middle Class)
- Criteria: At least one container sets a request or limit, but the Pod does not meet the Guaranteed criteria (e.g. Requests < Limits).
- Behavior: Killed if the node is under pressure and the pod is using more than its request.
BestEffort (The Expendables)
- Criteria: No requests or limits set.
- Behavior: First to be killed when the node needs memory.
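A minimal sketch of a Pod that qualifies as Guaranteed (the name and image here are illustrative): every container sets both requests and limits, and they are equal for CPU and memory.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: guaranteed-demo   # illustrative name
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        cpu: "500m"
        memory: "128Mi"
      limits:             # equal to requests => Guaranteed QoS
        cpu: "500m"
        memory: "128Mi"
```

You can inspect the class Kubernetes assigned with `kubectl get pod guaranteed-demo -o jsonpath='{.status.qosClass}'`.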
3. Horizontal Pod Autoscaling (HPA)
Once requests are set, HPA can automatically scale the number of Pods based on utilization (actual usage as a percentage of the request).
The Prerequisite: Metrics Server
HPA is a consumer of data. By default, Kubernetes does NOT know how much CPU a pod is using. You must install the Metrics Server in your cluster.
- The Metrics Server scrapes data from the Kubelet on every node.
- HPA queries the Metrics API to make its scaling decisions.
- If `kubectl top pods` returns an error, HPA will not work.
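One common way to install it is from the upstream release manifest (this is the URL published in the metrics-server project README; verify it is appropriate for your cluster version and security requirements):

```bash
# Install the Metrics Server from its official release manifest
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Verify it is serving metrics; this should list CPU/memory per pod
kubectl top pods
```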
As traffic load increases, the average CPU per Pod rises. When it exceeds the target (50%), the HPA controller adds more replicas to share the load.
The Algorithm
The HPA controller operates on the ratio between current metric value and desired metric value:
desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]
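The formula can be checked with a quick sketch (the function name is ours, not the controller's actual code; the real controller also applies a tolerance band and stabilization windows before acting):

```python
import math

def desired_replicas(current_replicas: int,
                     current_value: float,
                     desired_value: float) -> int:
    """desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]"""
    return math.ceil(current_replicas * (current_value / desired_value))

# 3 replicas averaging 80% CPU against a 50% target:
# ceil(3 * 80/50) = ceil(4.8) = 5 replicas
print(desired_replicas(3, 80, 50))  # → 5
```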
YAML Example
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
```
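The same HPA can also be created imperatively; either way, `kubectl get hpa` shows the current and target utilization (the file name below is illustrative):

```bash
# Imperative equivalent of the manifest above
kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10

# Watch the autoscaler react to load
kubectl get hpa php-apache --watch
```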
4. Namespace Restrictions
Admins can enforce rules to prevent one team from hogging the whole cluster.
ResourceQuota (The Hard Cap)
Limits the total resource usage of a Namespace.
- Example: "The `dev` namespace can use at most 10 CPUs and 20Gi of RAM."
- Effect: If you try to create a Pod that would exceed the quota, the API server rejects it (403 Forbidden).
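A quota matching that example might look like this (the object name is illustrative):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-quota       # illustrative name
  namespace: dev
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "10"
    limits.memory: 20Gi
```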
LimitRange (The Defaults)
Sets default values for Pods that don't specify them.
- Example: "If a user forgets to set resources, give them `cpu: 100m`, `memory: 200Mi`."
- Effect: Prevents BestEffort pods from being created by accident.
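A LimitRange implementing those defaults could be sketched as follows (the object name and the limit values under `default` are our assumptions; only the `defaultRequest` values come from the example above):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits   # illustrative name
  namespace: dev
spec:
  limits:
  - type: Container
    defaultRequest:      # request applied when the container sets none
      cpu: 100m
      memory: 200Mi
    default:             # limit applied when the container sets none (assumed values)
      cpu: 500m
      memory: 256Mi
```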