FinOps: Cost Optimization
- Requests Drive Cost, Not Usage: Cloud infrastructure costs are determined by the CPU and memory your pods request, not what they actually consume. Over-requesting is the single largest source of waste in Kubernetes — most clusters run at 15-30% actual utilization.
- Visibility First: You cannot optimize what you cannot measure. Deploy Kubecost or OpenCost to attribute spending to specific teams, namespaces, and workloads. This is the foundation of any FinOps practice.
- Right-Sizing with VPA: The Vertical Pod Autoscaler analyzes historical resource usage and recommends (or automatically applies) optimal CPU and memory requests, eliminating both waste and under-provisioning.
- Spot/Preemptible Instances: Leverage heavily discounted spare cloud capacity (60-90% savings) for stateless, interruption-tolerant workloads. Use Karpenter or Cluster Autoscaler to automate spot node provisioning.
- Resource Quotas as Guardrails: Enforce per-namespace resource limits to prevent a single team from accidentally provisioning excessive infrastructure and running up the bill.
- Node Consolidation: Tools like Karpenter actively bin-pack workloads onto fewer nodes and terminate underutilized nodes, recovering wasted capacity continuously.
Cloud providers charge you for the nodes you provision — the virtual machines that form your cluster's compute capacity. If your developers request too much CPU and memory in their pod specs, you are paying for "ghost resources": capacity that the scheduler reserves on a node but the application never actually consumes. In a typical Kubernetes cluster without active cost management, 50-70% of provisioned resources sit idle.
This guide covers the principles, tools, and strategies for bringing Kubernetes costs under control.
1. Understanding the Kubernetes Cost Model
The fundamental truth of Kubernetes cost optimization:
Your cloud bill is proportional to the sum of all pod resource requests, not the sum of actual resource usage.
Here is why this matters. When a pod declares requests: {cpu: "1", memory: "2Gi"}, the Kubernetes scheduler reserves 1 CPU core and 2 GiB of memory on a node — regardless of whether the pod actually uses those resources. If the pod only uses 100m CPU and 256Mi memory, the remaining 900m CPU and ~1.75 GiB of memory are wasted. That capacity cannot be allocated to other pods (the scheduler treats it as consumed), yet the workload uses a fraction of it.
This creates a cascading effect:
- Developers over-request to avoid OOM kills and CPU throttling.
- Over-requesting causes nodes to fill up faster.
- The cluster autoscaler provisions more nodes.
- More nodes mean a higher cloud bill.
- The actual utilization of those nodes is 15-30%.
The Cost Equation
Monthly Cost ≈ (Total Requested CPU × Cost per CPU-hour × 730 hours)
+ (Total Requested Memory × Cost per GB-hour × 730 hours)
+ (Persistent Volume costs)
+ (Network egress costs)
+ (Load balancer costs)
Compute (CPU + memory) typically accounts for 60-80% of the total bill. That is where the biggest savings are.
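The compute portion of the equation can be sketched numerically. The per-unit prices below are illustrative placeholders, not real list prices; the point is that usage never appears in the formula.

```python
# Illustrative monthly cost estimate from summed pod requests.
# Prices are assumed for the example, not actual cloud rates.

HOURS_PER_MONTH = 730
COST_PER_CPU_HOUR = 0.0464   # $/vCPU-hour (assumed)
COST_PER_GB_HOUR = 0.00580   # $/GB-hour (assumed)

def monthly_compute_cost(total_cpu_requested: float, total_mem_gb_requested: float) -> float:
    """Compute-only portion of the cost equation: driven entirely by requests."""
    cpu_cost = total_cpu_requested * COST_PER_CPU_HOUR * HOURS_PER_MONTH
    mem_cost = total_mem_gb_requested * COST_PER_GB_HOUR * HOURS_PER_MONTH
    return cpu_cost + mem_cost

# A cluster requesting 100 vCPU and 200 GiB costs the same
# whether actual utilization is 15% or 90%:
print(f"${monthly_compute_cost(100, 200):,.2f}/month")  # → $4,234.00/month
```

Note that actual usage is not a variable anywhere in the function — which is exactly why right-sizing requests, not tuning application efficiency, is the first lever to pull.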
2. Kubecost and OpenCost: Visibility
You cannot optimize what you cannot measure. The first step in any FinOps program is deploying a cost visibility tool.
OpenCost
OpenCost is the CNCF open-source standard for Kubernetes cost monitoring. It provides:
- Real-time cost allocation by namespace, deployment, pod, and label.
- Idle cost identification — exactly how much capacity is reserved but unused.
- Cloud provider billing integration for accurate pricing.
# Install OpenCost via Helm
# helm repo add opencost https://opencost.github.io/opencost-helm-chart
# helm install opencost opencost/opencost --namespace opencost --create-namespace
apiVersion: v1
kind: ConfigMap
metadata:
  name: opencost-config
  namespace: opencost
data:
  # Configure cloud provider pricing
  default.json: |
    {
      "provider": "aws",
      "description": "AWS US-East-1 pricing",
      "CPU": "0.0464",
      "RAM": "0.00580",
      "storage": "0.000138889"
    }
Kubecost
Kubecost builds on OpenCost with an enterprise-grade UI and additional features:
- Allocation reports: Drill down into cost by cluster, namespace, controller, pod, or any label.
- Efficiency scoring: Compare requested vs. actual usage to find the most over-provisioned workloads.
- Savings recommendations: Specific, actionable suggestions (e.g., "Reduce store-api CPU request from 500m to 120m to save $43/month").
- Budget alerts: Set spending thresholds per team and get notified when they are exceeded.
- Network cost tracking: Attribute cross-zone and cross-region egress to specific workloads.
What to Look For
Once you have visibility, focus on these metrics:
- Cluster utilization: Actual usage / total capacity. Target 60-70% for production (leave headroom for bursts).
- Request efficiency: Actual usage / total requests. If this is below 30%, you have significant right-sizing opportunities.
- Idle cost: The dollar value of reserved-but-unused capacity.
- Cost per team/namespace: Enables chargeback or showback models.
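The first three metrics above are simple ratios. A small sketch, using hypothetical figures of the kind you would pull from OpenCost or metrics-server, shows how they relate:

```python
# Sketch: headline FinOps metrics from raw cluster numbers.
# All inputs are hypothetical examples.

def finops_metrics(capacity_cpu, requested_cpu, used_cpu, cost_per_cpu_month):
    cluster_utilization = used_cpu / capacity_cpu      # target ~60-70%
    request_efficiency = used_cpu / requested_cpu      # below 30% => right-size
    idle_cpu = requested_cpu - used_cpu
    idle_cost = idle_cpu * cost_per_cpu_month          # $ of reserved-but-unused CPU
    return cluster_utilization, request_efficiency, idle_cost

util, eff, idle = finops_metrics(
    capacity_cpu=200,          # total cores in the cluster
    requested_cpu=160,         # sum of pod CPU requests
    used_cpu=40,               # actual measured usage
    cost_per_cpu_month=33.87,  # assumed $/core-month
)
print(f"utilization={util:.0%} efficiency={eff:.0%} idle=${idle:,.0f}/mo")
# → utilization=20% efficiency=25% idle=$4,064/mo
```

A cluster like this one, with 25% request efficiency, is the textbook right-sizing candidate: the bulk of the idle cost is recoverable just by lowering requests.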
3. Right-Sizing with VPA
The Vertical Pod Autoscaler (VPA) analyzes historical CPU and memory usage for a workload and recommends optimal resource requests.
VPA in Recommendation Mode (Safest)
Start with updateMode: "Off" to get recommendations without automatic changes:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: store-api-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: store-api
  updatePolicy:
    updateMode: "Off"       # recommend only, do not apply
  resourcePolicy:
    containerPolicies:
      - containerName: store-api
        minAllowed:
          cpu: 50m          # never recommend below 50m
          memory: 64Mi      # never recommend below 64Mi
        maxAllowed:
          cpu: 2            # never recommend above 2 cores
          memory: 4Gi       # never recommend above 4Gi
        controlledResources:
          - cpu
          - memory
# View VPA recommendations
kubectl describe vpa store-api-vpa -n production
# Output includes:
# Target: cpu: 120m, memory: 280Mi (what VPA recommends)
# Lower: cpu: 80m, memory: 200Mi (minimum safe value)
# Upper: cpu: 300m, memory: 600Mi (for peak loads)
VPA in Auto Mode
In updateMode: "Auto", VPA will evict pods and recreate them with updated requests. This causes brief disruption, so it works best with PodDisruptionBudgets:
spec:
  updatePolicy:
    updateMode: "Auto"   # automatically adjust requests
Important: VPA and HPA should not both target CPU for the same workload. They will conflict. A common pattern is to use HPA for horizontal scaling based on custom metrics and VPA for memory right-sizing only.
4. Spot and Preemptible Instances
Cloud providers sell spare compute capacity at massive discounts:
- AWS Spot Instances: 60-90% discount. 2-minute termination notice via instance metadata.
- GCP Preemptible/Spot VMs: 60-91% discount. 30-second termination notice.
- Azure Spot VMs: Up to 90% discount. 30-second termination notice.
Strategy: Split Workloads by Reliability Requirement
# Node pool labels (configured at the cloud provider level)
# On-demand pool: node-type=on-demand
# Spot pool: node-type=spot
# Stateless workloads → spot nodes
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-frontend
spec:
  replicas: 6
  selector:
    matchLabels:
      app: web-frontend
  template:
    metadata:
      labels:
        app: web-frontend
    spec:
      # Prefer spot, tolerate spot taints
      tolerations:
        - key: "cloud.google.com/gke-spot"   # GKE
          operator: "Equal"
          value: "true"
          effect: "NoSchedule"
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: node-type
                    operator: In
                    values:
                      - spot
      # Spread across zones for spot availability
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: web-frontend
      containers:
        - name: web
          image: registry.example.com/web:v1.0
          resources:
            requests:
              cpu: 200m
              memory: 256Mi
# Stateful workloads → on-demand nodes (never spot)
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: node-type
                    operator: In
                    values:
                      - on-demand
      containers:
        - name: postgres
          image: postgres:15
Handling Spot Interruptions
- PodDisruptionBudgets: Ensure minimum replicas survive interruptions.
- Graceful shutdown: Handle SIGTERM in your application and drain connections within the 2-minute (AWS) or 30-second (GCP/Azure) window.
- Multi-zone deployment: Spread pods across availability zones — spot shortages are typically zone-specific.
- Diversify instance types: Use multiple instance families and sizes to reduce the probability of simultaneous eviction.
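The graceful-shutdown point deserves a concrete shape. A minimal sketch, assuming a simple Python worker process (the function names here are illustrative, not a specific framework's API): on SIGTERM, stop taking new work and let in-flight work drain within the provider's notice window.

```python
# Sketch of SIGTERM handling for spot-friendly shutdown. On interruption the
# kubelet sends SIGTERM; the app must stop accepting work and drain before
# the notice window (2 min on AWS, 30 s on GCP/Azure) expires.
import signal
import threading

shutting_down = threading.Event()

def handle_sigterm(signum, frame):
    # Stop accepting new work. A real HTTP server would also start failing
    # its readiness probe here so the Service stops routing to this pod.
    shutting_down.set()

signal.signal(signal.SIGTERM, handle_sigterm)

def worker_loop(process_one_item):
    # Drain: finish the current item, then take no new ones once signalled.
    while not shutting_down.is_set():
        process_one_item()
```

The same pattern applies regardless of language: trap the signal, flip readiness, drain, exit zero.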
5. Resource Quotas for Budget Control
ResourceQuota prevents a single team or namespace from consuming excessive cluster resources:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-alpha-budget
  namespace: team-alpha
spec:
  hard:
    requests.cpu: "20"             # max 20 CPU cores total
    requests.memory: 40Gi          # max 40 GiB memory total
    limits.cpu: "40"
    limits.memory: 80Gi
    persistentvolumeclaims: "10"   # max 10 PVCs
    services.loadbalancers: "2"    # max 2 LB services (each costs ~$18/mo)
    count/deployments.apps: "20"   # max 20 deployments
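Conceptually, the quota is enforced at admission time: a new pod is rejected if the namespace's existing requests plus the pod's own would exceed the hard cap. A toy sketch of that check (illustrative, not the actual admission-controller code):

```python
# Toy model of the ResourceQuota admission check for one resource dimension.

def quota_allows(hard_cpu: float, used_cpu: float, pod_cpu: float) -> bool:
    """True if the pod's CPU request fits under the remaining namespace quota."""
    return used_cpu + pod_cpu <= hard_cpu

# team-alpha has requests.cpu: "20" and 18.5 cores already requested:
print(quota_allows(20.0, 18.5, 1.5))   # → True  (19.5 <= 20, admitted)
print(quota_allows(20.0, 18.5, 2.0))   # → False (20.5 > 20, rejected)
```

This is also why every pod in a quota-bearing namespace must declare requests: without a number, the check cannot run, and the API server rejects the pod.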
Combine with LimitRange to set per-pod defaults and maximums:
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: team-alpha
spec:
  limits:
    - type: Container
      default:          # applied if developer omits limits
        cpu: 200m
        memory: 256Mi
      defaultRequest:   # applied if developer omits requests
        cpu: 100m
        memory: 128Mi
      max:              # maximum any single container can request
        cpu: 4
        memory: 8Gi
      min:              # minimum (prevents trivially small requests)
        cpu: 10m
        memory: 16Mi
6. Idle Resource Detection and Node Consolidation
Detecting Idle Resources
Look for these patterns of waste:
- Zombie deployments: Services with zero traffic that were never decommissioned.
- Dev/staging resources running 24/7: Scale dev environments to zero outside business hours.
- Oversized jobs: Batch jobs requesting 4 CPUs that run for 10 seconds once a day.
- Orphaned PVCs: Persistent volumes not attached to any pod.
# Find the least CPU-hungry pods (kubectl top sorts descending, so take the tail)
kubectl top pods -A --sort-by=cpu | tail -20
# Find PVCs not mounted by any pod: diff all PVCs against those referenced in pod specs
kubectl get pvc -A -o json | jq -r '.items[] | "\(.metadata.namespace)/\(.metadata.name)"' | sort -u > /tmp/all-pvcs
kubectl get pods -A -o json | jq -r '.items[] | .metadata.namespace as $ns | .spec.volumes[]? | select(.persistentVolumeClaim) | "\($ns)/\(.persistentVolumeClaim.claimName)"' | sort -u > /tmp/used-pvcs
comm -23 /tmp/all-pvcs /tmp/used-pvcs
Node Consolidation with Karpenter
Karpenter (originally built for AWS, with providers for other clouds now emerging) actively consolidates workloads:
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]   # prefer spot
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["m5.large", "m5.xlarge", "m5a.large", "c5.large"]
      expireAfter: 720h   # replace nodes after 30 days
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 60s   # consolidate quickly
  limits:
    cpu: 100        # max 100 CPU cores total
    memory: 200Gi   # max 200 GiB memory total
Karpenter will:
- Detect underutilized nodes (e.g., a node using only 20% of its capacity).
- Cordon and drain the node.
- Reschedule pods onto other nodes with available capacity.
- Terminate the empty node, saving money.
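Why this saves money comes down to bin-packing. A toy first-fit-decreasing packer (a simplification; Karpenter's real algorithm is more sophisticated) shows how the same pod requests often fit on far fewer nodes than they currently occupy:

```python
# Toy first-fit-decreasing bin-packing to illustrate consolidation savings.
# Pod CPU requests and node size are made-up example numbers.

def pack(pod_cpu_requests, node_cpu):
    """Return per-node CPU allocations after first-fit-decreasing packing."""
    nodes = []
    for req in sorted(pod_cpu_requests, reverse=True):
        for node in nodes:
            if sum(node) + req <= node_cpu:
                node.append(req)   # fits on an existing node
                break
        else:
            nodes.append([req])    # no node had room: provision a new one
    return nodes

# 12 pods (8 cores of requests total) scattered across, say, six 4-core nodes:
pods = [0.5, 0.5, 1.0, 0.25, 0.25, 1.5, 0.5, 1.0, 0.5, 0.25, 0.5, 1.25]
print(len(pack(pods, node_cpu=4.0)))   # → 2 nodes suffice
```

Going from six nodes to two is a 67% reduction in that pool's compute bill, with no change to the workloads themselves.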
7. Multi-Tenancy Cost Allocation
For organizations sharing clusters across teams, implement chargeback or showback:
- Showback: Show each team what they are spending (no actual billing). Creates awareness.
- Chargeback: Actually bill each team's cost center. Creates accountability.
Implementation pattern:
- Require a cost-center label on every namespace and deployment (enforce with Kyverno or Gatekeeper).
- Use Kubecost or OpenCost to generate per-label cost reports.
- Export reports to your finance system monthly.
# Kyverno policy to enforce cost-center label
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-cost-center
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-cost-center
      match:
        any:
          - resources:
              kinds:
                - Deployment
                - StatefulSet
      validate:
        message: "A 'cost-center' label is required for cost allocation."
        pattern:
          metadata:
            labels:
              cost-center: "?*"
8. Cloud Provider-Specific Savings
AWS
- Savings Plans / Reserved Instances: Commit to 1-3 year usage for 30-60% discount on baseline capacity.
- Graviton (ARM) instances: 20-40% cheaper than equivalent x86 instances. Ensure your container images are multi-arch.
- EBS gp3 volumes: 20% cheaper than gp2 with better baseline performance.
GCP
- Committed Use Discounts (CUDs): 1-3 year commitments for 28-52% discount.
- E2 machine types: Cost-optimized for general workloads.
- GKE Autopilot: Pay per pod resource request, not per node. Eliminates node-level waste.
Azure
- Azure Reservations: 1-3 year commitments for up to 72% discount.
- Azure Spot VMs: Up to 90% discount with eviction handling.
- B-series burstable VMs: Cost-effective for workloads with low average CPU but occasional spikes.
Common Pitfalls
- Setting requests equal to limits: This prevents bin-packing and wastes capacity. Set requests to the p95 usage and limits to the maximum burst your application needs.
- Ignoring network costs: Cross-zone and cross-region traffic can add up to 20-30% of the compute bill. Use topology-aware routing and keep pods close to their data.
- Not accounting for system overhead: DaemonSets (logging, monitoring, CNI agents) consume resources on every node. Factor this in when calculating node capacity.
- Running VPA in Auto mode without PDBs: VPA evicts pods to resize them. Without a PodDisruptionBudget, this can cause downtime.
- Buying reserved instances too early: Understand your actual usage patterns for 1-2 months before committing to reservations.
- Treating all workloads the same: Not all workloads can run on spot. Databases, message queues, and singleton controllers need on-demand instances.
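The first pitfall recommends setting requests to p95 usage. A minimal sketch of that calculation, assuming you have exported raw usage samples (e.g. from Prometheus) as a list of millicore values:

```python
# Derive a CPU request from the p95 of observed usage samples, so a single
# burst does not inflate the request. Sample values are hypothetical.
import math

def p95(samples):
    """Nearest-rank 95th percentile of a list of usage samples."""
    s = sorted(samples)
    rank = math.ceil(0.95 * len(s))   # nearest-rank definition (1-based)
    return s[rank - 1]

usage_mcpu = [100, 105, 95, 110, 98, 102, 97, 103, 99, 101,
              104, 96, 108, 100, 99, 107, 98, 112, 120, 300]  # one burst to 300m
print(p95(usage_mcpu))   # → 120
```

Here the request lands at 120m despite the 300m spike; the spike is instead what informs the limit (the burst ceiling), keeping requests bin-packable while still allowing peaks.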
Best Practices
- Deploy cost visibility (Kubecost/OpenCost) before any optimization. You need a baseline.
- Start with VPA in recommendation mode and manually review suggestions before enabling auto mode.
- Use spot instances for 60-80% of stateless workloads with proper disruption handling.
- Set ResourceQuotas on every namespace — even generous ones prevent runaway spending.
- Require cost-center labels on all workloads for accurate chargeback.
- Review and right-size monthly — workload patterns change over time.
- Scale non-production environments to zero outside business hours using cron-based scaling (KEDA, CronJobs, or Karpenter's expireAfter).
- Use multi-arch container images to take advantage of cheaper ARM instances (AWS Graviton, GCP Tau T2A).
- Set a FinOps review cadence — weekly for the first month, then monthly. Assign an owner for cluster cost.
What's Next?
- Learn about the Cluster Autoscaler and Karpenter for automated node scaling and consolidation.
- Explore Multi-Tenancy for sharing clusters across teams with proper resource isolation.
- See Policy as Code to enforce resource requests, cost-center labels, and prevent wasteful configurations.
- Understand Logging Architecture cost implications — logging at scale can be a significant hidden cost.