FinOps: Cost Optimization
- Requests Drive Cost, Not Usage: Cloud infrastructure costs are determined by the CPU and memory your pods request, not what they actually consume. Over-requesting is the single largest source of waste in Kubernetes — most clusters run at 15-30% actual utilization.
- Visibility First: You cannot optimize what you cannot measure. Deploy Kubecost or OpenCost to attribute spending to specific teams, namespaces, and workloads. This is the foundation of any FinOps practice.
- Right-Sizing with VPA: The Vertical Pod Autoscaler analyzes historical resource usage and recommends (or automatically applies) optimal CPU and memory requests, eliminating both waste and under-provisioning.
- Spot/Preemptible Instances: Leverage heavily discounted spare cloud capacity (60-90% savings) for stateless, interruption-tolerant workloads. Use Karpenter or Cluster Autoscaler to automate spot node provisioning.
- Resource Quotas as Guardrails: Enforce per-namespace resource limits to prevent a single team from accidentally provisioning excessive infrastructure and running up the bill.
- Node Consolidation: Tools like Karpenter actively bin-pack workloads onto fewer nodes and terminate underutilized nodes, recovering wasted capacity continuously.
Cloud providers charge you for the nodes you provision — the virtual machines that form your cluster's compute capacity. If your developers request too much CPU and memory in their pod specs, you are paying for "ghost resources": capacity that the scheduler reserves on a node but the application never actually consumes. In a typical Kubernetes cluster without active cost management, 50-70% of provisioned resources sit idle.
This guide covers the principles, tools, and strategies for bringing Kubernetes costs under control.
1. Understanding the Kubernetes Cost Model
The fundamental truth of Kubernetes cost optimization:
Your cloud bill is proportional to the sum of all pod resource requests, not the sum of actual resource usage.
Here is why this matters. When a pod declares requests: {cpu: "1", memory: "2Gi"}, the Kubernetes scheduler reserves 1 CPU core and 2 GiB of memory on a node — regardless of whether the pod actually uses those resources. If the pod only uses 100m CPU and 256Mi memory, the remaining 900m CPU and ~1.75 GiB of memory are wasted. That capacity cannot be allocated to other pods (the scheduler treats it as consumed), yet the workload uses a fraction of it.
This creates a cascading effect:
- Developers over-request to avoid OOM kills and CPU throttling.
- Over-requesting causes nodes to fill up faster.
- The cluster autoscaler provisions more nodes.
- More nodes mean a higher cloud bill.
- The actual utilization of those nodes is 15-30%.
The Cost Equation
Monthly Cost ≈ (Total Requested CPU × Cost per CPU-hour × 730 hours)
+ (Total Requested Memory × Cost per GB-hour × 730 hours)
+ (Persistent Volume costs)
+ (Network egress costs)
+ (Load balancer costs)
Compute (CPU + memory) typically accounts for 60-80% of the total bill. That is where the biggest savings are.
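The compute portion of the equation can be sketched numerically. The per-unit prices below are illustrative placeholders, not real list prices; the point is that usage never appears in the formula.

```python
# Illustrative monthly cost estimate from summed pod requests.
# Prices are assumed for the example, not actual cloud rates.

HOURS_PER_MONTH = 730
COST_PER_CPU_HOUR = 0.0464   # $/vCPU-hour (assumed)
COST_PER_GB_HOUR = 0.00580   # $/GB-hour (assumed)

def monthly_compute_cost(total_cpu_requested: float, total_mem_gb_requested: float) -> float:
    """Compute-only portion of the cost equation: driven entirely by requests."""
    cpu_cost = total_cpu_requested * COST_PER_CPU_HOUR * HOURS_PER_MONTH
    mem_cost = total_mem_gb_requested * COST_PER_GB_HOUR * HOURS_PER_MONTH
    return cpu_cost + mem_cost

# A cluster requesting 100 vCPU and 200 GiB costs the same
# whether actual utilization is 15% or 90%:
print(f"${monthly_compute_cost(100, 200):,.2f}/month")  # → $4,234.00/month
```

Note that actual usage is not a variable anywhere in the function — which is exactly why right-sizing requests, not tuning application efficiency, is the first lever to pull.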
2. Kubecost and OpenCost: Visibility
You cannot optimize what you cannot measure. The first step in any FinOps program is deploying a cost visibility tool.
OpenCost
OpenCost is the CNCF open-source standard for Kubernetes cost monitoring. It provides:
- Real-time cost allocation by namespace, deployment, pod, and label.
- Idle cost identification — exactly how much capacity is reserved but unused.
- Cloud provider billing integration for accurate pricing.
# Install OpenCost via Helm
# helm repo add opencost https://opencost.github.io/opencost-helm-chart
# helm install opencost opencost/opencost --namespace opencost --create-namespace
apiVersion: v1
kind: ConfigMap
metadata:
  name: opencost-config
  namespace: opencost
data:
  # Configure cloud provider pricing
  default.json: |
    {
      "provider": "aws",
      "description": "AWS US-East-1 pricing",
      "CPU": "0.0464",
      "RAM": "0.00580",
      "storage": "0.000138889"
    }
Kubecost
Kubecost builds on OpenCost with an enterprise-grade UI and additional features:
- Allocation reports: Drill down into cost by cluster, namespace, controller, pod, or any label.
- Efficiency scoring: Compare requested vs. actual usage to find the most over-provisioned workloads.
- Savings recommendations: Specific, actionable suggestions (e.g., "Reduce store-api CPU request from 500m to 120m to save $43/month").
- Budget alerts: Set spending thresholds per team and get notified when they are exceeded.
- Network cost tracking: Attribute cross-zone and cross-region egress to specific workloads.
What to Look For
Once you have visibility, focus on these metrics:
- Cluster utilization: Actual usage / total capacity. Target 60-70% for production (leave headroom for bursts).
- Request efficiency: Actual usage / total requests. If this is below 30%, you have significant right-sizing opportunities.
- Idle cost: The dollar value of reserved-but-unused capacity.
- Cost per team/namespace: Enables chargeback or showback models.
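The first three metrics above are simple ratios. A small sketch, using hypothetical figures of the kind you would pull from OpenCost or metrics-server, shows how they relate:

```python
# Sketch: headline FinOps metrics from raw cluster numbers.
# All inputs are hypothetical examples.

def finops_metrics(capacity_cpu, requested_cpu, used_cpu, cost_per_cpu_month):
    cluster_utilization = used_cpu / capacity_cpu      # target ~60-70%
    request_efficiency = used_cpu / requested_cpu      # below 30% => right-size
    idle_cpu = requested_cpu - used_cpu
    idle_cost = idle_cpu * cost_per_cpu_month          # $ of reserved-but-unused CPU
    return cluster_utilization, request_efficiency, idle_cost

util, eff, idle = finops_metrics(
    capacity_cpu=200,          # total cores in the cluster
    requested_cpu=160,         # sum of pod CPU requests
    used_cpu=40,               # actual measured usage
    cost_per_cpu_month=33.87,  # assumed $/core-month
)
print(f"utilization={util:.0%} efficiency={eff:.0%} idle=${idle:,.0f}/mo")
# → utilization=20% efficiency=25% idle=$4,064/mo
```

A cluster like this one, with 25% request efficiency, is the textbook right-sizing candidate: the bulk of the idle cost is recoverable just by lowering requests.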
3. Right-Sizing with VPA
The Vertical Pod Autoscaler (VPA) analyzes historical CPU and memory usage for a workload and recommends optimal resource requests.
VPA in Recommendation Mode (Safest)
Start with updateMode: "Off" to get recommendations without automatic changes:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: store-api-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: store-api
  updatePolicy:
    updateMode: "Off"       # recommend only, do not apply
  resourcePolicy:
    containerPolicies:
      - containerName: store-api
        minAllowed:
          cpu: 50m          # never recommend below 50m
          memory: 64Mi      # never recommend below 64Mi
        maxAllowed:
          cpu: 2            # never recommend above 2 cores
          memory: 4Gi       # never recommend above 4Gi
        controlledResources:
          - cpu
          - memory
# View VPA recommendations
kubectl describe vpa store-api-vpa -n production
# Output includes:
# Target: cpu: 120m, memory: 280Mi (what VPA recommends)
# Lower: cpu: 80m, memory: 200Mi (minimum safe value)
# Upper: cpu: 300m, memory: 600Mi (for peak loads)
VPA in Auto Mode
In updateMode: "Auto", VPA will evict pods and recreate them with updated requests. This causes brief disruption, so it works best with PodDisruptionBudgets:
spec:
  updatePolicy:
    updateMode: "Auto"   # automatically adjust requests
Important: VPA and HPA should not both target CPU for the same workload. They will conflict. A common pattern is to use HPA for horizontal scaling based on custom metrics and VPA for memory right-sizing only.
4. Spot and Preemptible Instances
Cloud providers sell spare compute capacity at massive discounts:
- AWS Spot Instances: 60-90% discount. 2-minute termination notice via instance metadata.
- GCP Preemptible/Spot VMs: 60-91% discount. 30-second termination notice.
- Azure Spot VMs: Up to 90% discount. 30-second termination notice.
Strategy: Split Workloads by Reliability Requirement
# Node pool labels (configured at the cloud provider level)
# On-demand pool: node-type=on-demand
# Spot pool: node-type=spot
# Stateless workloads → spot nodes
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-frontend
spec:
  replicas: 6
  selector:
    matchLabels:
      app: web-frontend
  template:
    metadata:
      labels:
        app: web-frontend
    spec:
      # Prefer spot, tolerate spot taints
      tolerations:
        - key: "cloud.google.com/gke-spot"   # GKE
          operator: "Equal"
          value: "true"
          effect: "NoSchedule"
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: node-type
                    operator: In
                    values:
                      - spot
      # Spread across zones for spot availability
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: web-frontend
      containers:
        - name: web
          image: registry.example.com/web:v1.0
          resources:
            requests:
              cpu: 200m
              memory: 256Mi
# Stateful workloads → on-demand nodes (never spot)
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: node-type
                    operator: In
                    values:
                      - on-demand
      containers:
        - name: postgres
          image: postgres:15
Handling Spot Interruptions
- PodDisruptionBudgets: Ensure minimum replicas survive interruptions.
- Graceful shutdown: Handle SIGTERM in your application and drain connections within the 2-minute (AWS) or 30-second (GCP/Azure) window.
- Multi-zone deployment: Spread pods across availability zones — spot shortages are typically zone-specific.
- Diversify instance types: Use multiple instance families and sizes to reduce the probability of simultaneous eviction.
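The graceful-shutdown point deserves a concrete shape. A minimal sketch, assuming a simple Python worker process (the function names here are illustrative, not a specific framework's API): on SIGTERM, stop taking new work and let in-flight work drain within the provider's notice window.

```python
# Sketch of SIGTERM handling for spot-friendly shutdown. On interruption the
# kubelet sends SIGTERM; the app must stop accepting work and drain before
# the notice window (2 min on AWS, 30 s on GCP/Azure) expires.
import signal
import threading

shutting_down = threading.Event()

def handle_sigterm(signum, frame):
    # Stop accepting new work. A real HTTP server would also start failing
    # its readiness probe here so the Service stops routing to this pod.
    shutting_down.set()

signal.signal(signal.SIGTERM, handle_sigterm)

def worker_loop(process_one_item):
    # Drain: finish the current item, then take no new ones once signalled.
    while not shutting_down.is_set():
        process_one_item()
```

The same pattern applies regardless of language: trap the signal, flip readiness, drain, exit zero.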
5. Resource Quotas for Budget Control
ResourceQuota prevents a single team or namespace from consuming excessive cluster resources:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-alpha-budget
  namespace: team-alpha
spec:
  hard:
    requests.cpu: "20"             # max 20 CPU cores total
    requests.memory: 40Gi          # max 40 GiB memory total
    limits.cpu: "40"
    limits.memory: 80Gi
    persistentvolumeclaims: "10"   # max 10 PVCs
    services.loadbalancers: "2"    # max 2 LB services (each costs ~$18/mo)
    count/deployments.apps: "20"   # max 20 deployments
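Conceptually, the quota is enforced at admission time: a new pod is rejected if the namespace's existing requests plus the pod's own would exceed the hard cap. A toy sketch of that check (illustrative, not the actual admission-controller code):

```python
# Toy model of the ResourceQuota admission check for one resource dimension.

def quota_allows(hard_cpu: float, used_cpu: float, pod_cpu: float) -> bool:
    """True if the pod's CPU request fits under the remaining namespace quota."""
    return used_cpu + pod_cpu <= hard_cpu

# team-alpha has requests.cpu: "20" and 18.5 cores already requested:
print(quota_allows(20.0, 18.5, 1.5))   # → True  (19.5 <= 20, admitted)
print(quota_allows(20.0, 18.5, 2.0))   # → False (20.5 > 20, rejected)
```

This is also why every pod in a quota-bearing namespace must declare requests: without a number, the check cannot run, and the API server rejects the pod.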
Combine with LimitRange to set per-pod defaults and maximums:
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: team-alpha
spec:
  limits:
    - type: Container
      default:          # applied if developer omits limits
        cpu: 200m
        memory: 256Mi
      defaultRequest:   # applied if developer omits requests
        cpu: 100m
        memory: 128Mi
      max:              # maximum any single container can request
        cpu: 4
        memory: 8Gi
      min:              # minimum (prevents trivially small requests)
        cpu: 10m
        memory: 16Mi
6. Idle Resource Detection and Node Consolidation
Detecting Idle Resources
Look for these patterns of waste:
- Zombie deployments: Services with zero traffic that were never decommissioned.
- Dev/staging resources running 24/7: Scale dev environments to zero outside business hours.
- Oversized jobs: Batch jobs requesting 4 CPUs that run for 10 seconds once a day.
- Orphaned PVCs: Persistent volumes not attached to any pod.
# Find the least CPU-hungry pods (kubectl top sorts descending, so take the tail)
kubectl top pods -A --sort-by=cpu | tail -20
# Find PVCs not mounted by any pod: diff all PVCs against those referenced in pod specs
kubectl get pvc -A -o json | jq -r '.items[] | "\(.metadata.namespace)/\(.metadata.name)"' | sort -u > /tmp/all-pvcs
kubectl get pods -A -o json | jq -r '.items[] | .metadata.namespace as $ns | .spec.volumes[]? | select(.persistentVolumeClaim) | "\($ns)/\(.persistentVolumeClaim.claimName)"' | sort -u > /tmp/used-pvcs
comm -23 /tmp/all-pvcs /tmp/used-pvcs
Node Consolidation with Karpenter
Karpenter (originally built for AWS, with providers for other clouds now emerging) actively consolidates workloads:
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]   # prefer spot
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["m5.large", "m5.xlarge", "m5a.large", "c5.large"]
      expireAfter: 720h   # replace nodes after 30 days
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 60s   # consolidate quickly
  limits:
    cpu: 100        # max 100 CPU cores total
    memory: 200Gi   # max 200 GiB memory total
Karpenter will:
- Detect underutilized nodes (e.g., a node using only 20% of its capacity).
- Cordon and drain the node.
- Reschedule pods onto other nodes with available capacity.
- Terminate the empty node, saving money.
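Why this saves money comes down to bin-packing. A toy first-fit-decreasing packer (a simplification; Karpenter's real algorithm is more sophisticated) shows how the same pod requests often fit on far fewer nodes than they currently occupy:

```python
# Toy first-fit-decreasing bin-packing to illustrate consolidation savings.
# Pod CPU requests and node size are made-up example numbers.

def pack(pod_cpu_requests, node_cpu):
    """Return per-node CPU allocations after first-fit-decreasing packing."""
    nodes = []
    for req in sorted(pod_cpu_requests, reverse=True):
        for node in nodes:
            if sum(node) + req <= node_cpu:
                node.append(req)   # fits on an existing node
                break
        else:
            nodes.append([req])    # no node had room: provision a new one
    return nodes

# 12 pods (8 cores of requests total) scattered across, say, six 4-core nodes:
pods = [0.5, 0.5, 1.0, 0.25, 0.25, 1.5, 0.5, 1.0, 0.5, 0.25, 0.5, 1.25]
print(len(pack(pods, node_cpu=4.0)))   # → 2 nodes suffice
```

Going from six nodes to two is a 67% reduction in that pool's compute bill, with no change to the workloads themselves.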
7. Multi-Tenancy Cost Allocation
For organizations sharing clusters across teams, implement chargeback or showback:
- Showback: Show each team what they are spending (no actual billing). Creates awareness.
- Chargeback: Actually bill each team's cost center. Creates accountability.
Implementation pattern:
- Require a cost-center label on every namespace and deployment (enforce with Kyverno or Gatekeeper).
- Use Kubecost or OpenCost to generate per-label cost reports.
- Export reports to your finance system monthly.
# Kyverno policy to enforce cost-center label
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-cost-center
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-cost-center
      match:
        any:
          - resources:
              kinds:
                - Deployment
                - StatefulSet
      validate:
        message: "A 'cost-center' label is required for cost allocation."
        pattern:
          metadata:
            labels:
              cost-center: "?*"
8. Cloud Provider-Specific Savings
AWS
- Savings Plans / Reserved Instances: Commit to 1-3 year usage for 30-60% discount on baseline capacity.
- Graviton (ARM) instances: 20-40% cheaper than equivalent x86 instances. Ensure your container images are multi-arch.
- EBS gp3 volumes: 20% cheaper than gp2 with better baseline performance.
GCP
- Committed Use Discounts (CUDs): 1-3 year commitments for 28-52% discount.
- E2 machine types: Cost-optimized for general workloads.
- GKE Autopilot: Pay per pod resource request, not per node. Eliminates node-level waste.
Azure
- Azure Reservations: 1-3 year commitments for up to 72% discount.
- Azure Spot VMs: Up to 90% discount with eviction handling.
- B-series burstable VMs: Cost-effective for workloads with low average CPU but occasional spikes.
Common Pitfalls
- Setting requests equal to limits: This prevents bin-packing and wastes capacity. Set requests to the p95 usage and limits to the maximum burst your application needs.
- Ignoring network costs: Cross-zone and cross-region traffic can add up to 20-30% of the compute bill. Use topology-aware routing and keep pods close to their data.
- Not accounting for system overhead: DaemonSets (logging, monitoring, CNI agents) consume resources on every node. Factor this in when calculating node capacity.
- Running VPA in Auto mode without PDBs: VPA evicts pods to resize them. Without a PodDisruptionBudget, this can cause downtime.
- Buying reserved instances too early: Understand your actual usage patterns for 1-2 months before committing to reservations.
- Treating all workloads the same: Not all workloads can run on spot. Databases, message queues, and singleton controllers need on-demand instances.
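The first pitfall recommends setting requests to p95 usage. A minimal sketch of that calculation, assuming you have exported raw usage samples (e.g. from Prometheus) as a list of millicore values:

```python
# Derive a CPU request from the p95 of observed usage samples, so a single
# burst does not inflate the request. Sample values are hypothetical.
import math

def p95(samples):
    """Nearest-rank 95th percentile of a list of usage samples."""
    s = sorted(samples)
    rank = math.ceil(0.95 * len(s))   # nearest-rank definition (1-based)
    return s[rank - 1]

usage_mcpu = [100, 105, 95, 110, 98, 102, 97, 103, 99, 101,
              104, 96, 108, 100, 99, 107, 98, 112, 120, 300]  # one burst to 300m
print(p95(usage_mcpu))   # → 120
```

Here the request lands at 120m despite the 300m spike; the spike is instead what informs the limit (the burst ceiling), keeping requests bin-packable while still allowing peaks.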
Best Practices
- Deploy cost visibility (Kubecost/OpenCost) before any optimization. You need a baseline.
- Start with VPA in recommendation mode and manually review suggestions before enabling auto mode.
- Use spot instances for 60-80% of stateless workloads with proper disruption handling.
- Set ResourceQuotas on every namespace — even generous ones prevent runaway spending.
- Require cost-center labels on all workloads for accurate chargeback.
- Review and right-size monthly — workload patterns change over time.
- Scale non-production environments to zero outside business hours using cron-based scaling (KEDA, CronJobs, or Karpenter's expireAfter).
- Use multi-arch container images to take advantage of cheaper ARM instances (AWS Graviton, GCP Tau T2A).
- Set a FinOps review cadence — weekly for the first month, then monthly. Assign an owner for cluster cost.
What's Next?
- Learn about the Cluster Autoscaler and Karpenter for automated node scaling and consolidation.
- Explore Multi-Tenancy for sharing clusters across teams with proper resource isolation.
- See Policy as Code to enforce resource requests, cost-center labels, and prevent wasteful configurations.
- Understand Logging Architecture cost implications — logging at scale can be a significant hidden cost.