Topology Spread Constraints
- High Availability Enforcement: TopologySpreadConstraints ensure Pods are evenly distributed across different topological domains (nodes, zones, regions) to prevent single points of failure. If an entire availability zone goes down, properly spread workloads continue serving from remaining zones.
- Minimizing Skew: The maxSkew parameter defines the maximum allowed difference in pod count between the most-populated and least-populated topology domain. A maxSkew of 1 enforces near-perfect balance; higher values allow more flexibility in scheduling.
- Granular Control with topologyKey: By specifying topologyKey (for example, topology.kubernetes.io/zone for zone-level spreading or kubernetes.io/hostname for node-level spreading), you dictate the axis along which Pods should be distributed.
- Flexible Scheduling: The whenUnsatisfiable field offers two modes: DoNotSchedule (hard constraint -- pods stay Pending if the spread cannot be satisfied) and ScheduleAnyway (soft constraint -- the scheduler tries its best but schedules the pod regardless). This lets you balance between strict HA and scheduling flexibility.
- Complement to Pod Anti-Affinity: TopologySpreadConstraints provide more granular control than Pod Anti-Affinity. Anti-affinity is binary (one pod per domain or zero), while topology spread allows multiple pods per domain as long as the skew across domains stays within bounds.
How do you ensure your application stays online when an entire data center or availability zone goes dark? Even if you have 10 replicas, the Kubernetes scheduler might place all of them on the same node or in the same availability zone (AZ). If that zone experiences an outage, all 10 replicas go down simultaneously.
TopologySpreadConstraints allow you to control how Pods are distributed across your cluster's topology, ensuring true high availability by spreading replicas across failure domains.
1. The Problem: Uneven Distribution
Without topology spread constraints, the Kubernetes scheduler optimizes for resource utilization and node fit, not for distribution across failure domains. Consider a cluster with three availability zones:
- Zone A: 10 nodes, 80% utilized
- Zone B: 5 nodes, 50% utilized
- Zone C: 5 nodes, 50% utilized
When you deploy 6 replicas, the scheduler may place 4 in Zone B and 2 in Zone C (because those zones have the most available resources), leaving zero replicas in Zone A. If Zone B fails, you lose 4 of 6 replicas instantly.
The goal of topology spread is to minimize skew -- the difference in pod count between the most-populated and least-populated zones.
2. Core Parameters
topologyKey
The topologyKey is a node label key that defines the topology domains. The scheduler groups nodes by the value of this label and distributes pods across the resulting groups.
Common topology keys:
| topologyKey | What It Spreads Across | Use Case |
|---|---|---|
| topology.kubernetes.io/zone | Availability zones | Zone-level HA (most common) |
| topology.kubernetes.io/region | Cloud regions | Multi-region spreading |
| kubernetes.io/hostname | Individual nodes | Node-level spreading (similar to anti-affinity) |
| Custom label (e.g., rack) | Custom domains | Rack-aware scheduling in bare-metal clusters |
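Topology domains come from labels on the nodes themselves. As a sketch, a worker node in a typical cloud cluster carries labels like the following (the node name, zone, and region values are illustrative):

```yaml
# Illustrative node labels that define topology domains
apiVersion: v1
kind: Node
metadata:
  name: worker-1
  labels:
    kubernetes.io/hostname: worker-1          # node-level domain
    topology.kubernetes.io/zone: us-east-1a   # zone-level domain
    topology.kubernetes.io/region: us-east-1  # region-level domain
    rack: rack-01                             # custom label for rack-aware spreading
```

Cloud providers set the zone and region labels automatically; custom keys like rack must be applied by you (for example, with kubectl label nodes).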
maxSkew
The maximum degree to which pods may be unevenly distributed across topology domains. Skew is computed per domain as podCount(domain) - minPodCount(across all domains); maxSkew caps this value, so the gap between the most-populated and least-populated domain cannot exceed it.
- maxSkew: 1: The difference between the most-populated and least-populated domain can be at most 1. This enforces near-perfect balance. With 6 replicas across 3 zones, you get 2-2-2.
- maxSkew: 2: Allows more imbalance. With 6 replicas across 3 zones, distributions like 3-2-1 are acceptable.
- maxSkew: 3 or higher: Provides very loose spreading. Useful when you want a soft preference for distribution without strict enforcement.
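The arithmetic for the 6-replica, 3-zone scenario above can be sketched directly in a constraint (the app label below is a hypothetical placeholder; match it to your Deployment's selector):

```yaml
# Worked example: 6 replicas across 3 zones
#   2-2-2 -> skew 0  (satisfies maxSkew: 1)
#   3-2-1 -> skew 2  (violates maxSkew: 1, satisfies maxSkew: 2)
#   4-2-0 -> skew 4  (violates both)
topologySpreadConstraints:
- maxSkew: 1
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: DoNotSchedule
  labelSelector:
    matchLabels:
      app: web   # hypothetical label
```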
whenUnsatisfiable
Determines what happens when the scheduler cannot place a pod while satisfying the maxSkew constraint.
- DoNotSchedule (hard constraint): The pod stays in the Pending state until a node becomes available that satisfies the skew constraint. Use this for critical production workloads where HA is non-negotiable.
- ScheduleAnyway (soft constraint): The scheduler places the pod on the node that minimizes skew, even if the constraint would be violated. Use this when you prefer balanced distribution but cannot afford pods stuck in Pending.
labelSelector
Defines which pods count toward the skew calculation. Only pods matching this selector are considered when computing the current distribution. This is typically set to match the same labels as your Deployment's selector.
3. YAML Examples
Basic Zone-Level Spreading
# Spread pods evenly across availability zones
apiVersion: apps/v1
kind: Deployment
metadata:
name: web-frontend
namespace: production
spec:
replicas: 6
selector:
matchLabels:
app: web-frontend
template:
metadata:
labels:
app: web-frontend
spec:
topologySpreadConstraints:
- maxSkew: 1
topologyKey: topology.kubernetes.io/zone # Spread across AZs
whenUnsatisfiable: DoNotSchedule # Hard constraint
labelSelector:
matchLabels:
app: web-frontend
containers:
- name: frontend
image: myregistry.io/web-frontend:v3.1.0
resources:
requests:
cpu: "250m"
memory: "256Mi"
With this configuration, the scheduler distributes the 6 replicas as evenly as possible across zones: 2-2-2 in a three-zone cluster.
Combined Node and Zone Spreading
# Spread across both zones AND nodes for maximum distribution
apiVersion: apps/v1
kind: Deployment
metadata:
name: payment-service
namespace: production
spec:
replicas: 9
selector:
matchLabels:
app: payment-service
template:
metadata:
labels:
app: payment-service
spec:
topologySpreadConstraints:
# First constraint: spread across zones
- maxSkew: 1
topologyKey: topology.kubernetes.io/zone
whenUnsatisfiable: DoNotSchedule
labelSelector:
matchLabels:
app: payment-service
# Second constraint: spread across nodes within each zone
- maxSkew: 1
topologyKey: kubernetes.io/hostname
whenUnsatisfiable: ScheduleAnyway # Soft for node-level
labelSelector:
matchLabels:
app: payment-service
containers:
- name: payment
image: myregistry.io/payment-service:v2.0.0
resources:
requests:
cpu: "500m"
memory: "512Mi"
This configuration first ensures pods are evenly distributed across zones (hard constraint), then tries to spread them across nodes within each zone (soft constraint). The result is maximum distribution: 3 pods per zone, each on a different node if possible.
Soft Spreading with ScheduleAnyway
# Best-effort spreading — prefer balance but don't block scheduling
apiVersion: apps/v1
kind: Deployment
metadata:
name: logging-agent
namespace: monitoring
spec:
replicas: 4
selector:
matchLabels:
app: logging-agent
template:
metadata:
labels:
app: logging-agent
spec:
topologySpreadConstraints:
- maxSkew: 2 # Allow some imbalance
topologyKey: topology.kubernetes.io/zone
whenUnsatisfiable: ScheduleAnyway # Never block scheduling
labelSelector:
matchLabels:
app: logging-agent
containers:
- name: agent
image: myregistry.io/logging-agent:v1.5.0
4. Multiple Constraints
When multiple topologySpreadConstraints are specified, the scheduler evaluates all of them. A pod is only placed on a node that satisfies all constraints simultaneously (for DoNotSchedule constraints) or that minimizes the total skew (for ScheduleAnyway constraints).
If constraints conflict (for example, zone spreading requires placing the pod in Zone A, but node spreading requires placing it on a node in Zone B), DoNotSchedule constraints take priority and the pod stays Pending until a valid placement exists.
5. Interaction with Affinity Rules
TopologySpreadConstraints and affinity/anti-affinity rules can be used together, but understanding their interaction is essential.
With Node Affinity
Node affinity restricts which nodes are eligible. TopologySpreadConstraints then distribute pods among the eligible nodes.
# Combine node affinity with topology spread
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: node-type
operator: In
values: ["compute-optimized"] # Only schedule on compute nodes
topologySpreadConstraints:
- maxSkew: 1
topologyKey: topology.kubernetes.io/zone
whenUnsatisfiable: DoNotSchedule
labelSelector:
matchLabels:
app: ml-inference
With Pod Anti-Affinity
Pod anti-affinity and topology spread can work together but may conflict. Anti-affinity says "do not place two pods on the same node," while topology spread says "keep zones balanced." If the only node that balances zones already has a pod, anti-affinity blocks placement and the pod stays Pending.
In general, prefer TopologySpreadConstraints over Pod Anti-Affinity for most spreading use cases. TopologySpreadConstraints are more flexible and predictable.
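As an illustration of that trade-off, the node-level exclusivity that required pod anti-affinity provides can usually be approximated with a hard hostname spread; unlike anti-affinity, it degrades gracefully when replicas outnumber nodes (the app label is illustrative):

```yaml
# Sketch: hostname-level spread as a more forgiving alternative to
# required pod anti-affinity
topologySpreadConstraints:
- maxSkew: 1
  topologyKey: kubernetes.io/hostname
  whenUnsatisfiable: DoNotSchedule
  labelSelector:
    matchLabels:
      app: my-service   # illustrative label
# With 5 replicas on 3 nodes this allows a 2-2-1 layout, where strict
# anti-affinity would leave 2 pods Pending.
```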
6. Comparison: TopologySpreadConstraints vs. Pod Anti-Affinity
| Aspect | TopologySpreadConstraints | Pod Anti-Affinity |
|---|---|---|
| Granularity | Controls the maximum skew (allows multiple pods per domain) | Binary: one pod per domain or zero |
| Multi-domain | Can spread across multiple topology levels simultaneously | Requires separate affinity rules per level |
| Soft/Hard | DoNotSchedule (hard) or ScheduleAnyway (soft) | required (hard) or preferred (soft) |
| Scaling behavior | Handles any replica count gracefully | Breaks when replicas > domains (e.g., 5 replicas on 3 nodes) |
| Performance | Efficient scheduler evaluation | Can be expensive for large clusters |
| Introduced | Kubernetes 1.16 (GA in 1.19) | Kubernetes 1.4 |
Pod Anti-Affinity is still useful when you truly need at most one pod per node (for DaemonSet-like patterns or stateful workloads). For all other spreading use cases, TopologySpreadConstraints are the preferred mechanism.
7. Default Cluster-Level Topology Spread
You can configure default TopologySpreadConstraints at the cluster level via the kube-scheduler configuration. This ensures all workloads get basic spreading even if individual Deployments do not specify constraints.
# kube-scheduler configuration: default topology spread for all pods
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
- pluginConfig:
- name: PodTopologySpread
args:
defaultConstraints:
- maxSkew: 3
topologyKey: topology.kubernetes.io/zone
whenUnsatisfiable: ScheduleAnyway
- maxSkew: 5
topologyKey: kubernetes.io/hostname
whenUnsatisfiable: ScheduleAnyway
defaultingType: List # Use these defaults for pods without constraints
When defaultingType is List, the specified constraints are applied to pods that do not define their own TopologySpreadConstraints. When set to System, Kubernetes applies built-in defaults (spread by zone and hostname with ScheduleAnyway).
8. Real-World HA Patterns
Three-AZ Production Deployment
The most common HA pattern: deploy replicas across three availability zones with strict zone spreading and soft node spreading.
# Production HA: 3 AZs, strict zone balance, soft node balance
apiVersion: apps/v1
kind: Deployment
metadata:
name: api-gateway
namespace: production
spec:
replicas: 9 # 3 per zone in a 3-zone cluster
selector:
matchLabels:
app: api-gateway
template:
metadata:
labels:
app: api-gateway
spec:
topologySpreadConstraints:
- maxSkew: 1
topologyKey: topology.kubernetes.io/zone
whenUnsatisfiable: DoNotSchedule
labelSelector:
matchLabels:
app: api-gateway
- maxSkew: 2
topologyKey: kubernetes.io/hostname
whenUnsatisfiable: ScheduleAnyway
labelSelector:
matchLabels:
app: api-gateway
containers:
- name: gateway
image: myregistry.io/api-gateway:v4.0.0
resources:
requests:
cpu: "500m"
memory: "512Mi"
# Ensure pods spread across nodes during disruptions
# PDB ensures at least 6 of 9 replicas are always running
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: api-gateway-pdb
namespace: production
spec:
minAvailable: 6 # At least 6 of 9 must be running
selector:
matchLabels:
app: api-gateway
Rack-Aware Bare-Metal Spreading
For bare-metal clusters, custom node labels define rack topology:
# Spread across racks in a bare-metal cluster
topologySpreadConstraints:
- maxSkew: 1
topologyKey: rack # Custom label: rack=rack-01, rack-02, etc.
whenUnsatisfiable: DoNotSchedule
labelSelector:
matchLabels:
app: database-proxy
9. Version History and Feature Gates
TopologySpreadConstraints were introduced as an alpha feature in Kubernetes 1.16, moved to beta in 1.18, and became generally available in Kubernetes 1.19. The minDomains field (which sets a minimum number of eligible domains) was added as alpha in 1.24 and reached GA in 1.30. The matchLabelKeys field was added in 1.25 (alpha) and reached beta in 1.27, allowing the scheduler to incorporate labels such as pod-template-hash into the spread calculation alongside the labelSelector.
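A sketch of the matchLabelKeys usage mentioned above, for clusters where the field is enabled (the app label is illustrative):

```yaml
# Sketch: scope skew to the current rollout so old and new ReplicaSets
# do not count against each other during a rolling update
topologySpreadConstraints:
- maxSkew: 1
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: DoNotSchedule
  labelSelector:
    matchLabels:
      app: web-frontend        # illustrative label
  matchLabelKeys:
  - pod-template-hash          # set automatically by the Deployment controller
```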
Common Pitfalls
- Uneven zone sizes: If Zone A has 10 nodes and Zone B has 2 nodes, maxSkew: 1 with DoNotSchedule can leave pods Pending because there are not enough nodes in Zone B. Use ScheduleAnyway or ensure zones have roughly equal capacity.
- Forgetting the labelSelector: Without labelSelector, the constraint applies to all pods in the namespace. This almost certainly is not what you want and will produce confusing scheduling behavior.
- Conflicting constraints: Multiple hard constraints that cannot be simultaneously satisfied result in pods stuck in Pending. Use kubectl describe pod to see scheduler events and identify which constraint is blocking placement.
- maxSkew too restrictive during scale-up: With maxSkew: 1 and DoNotSchedule, scaling from 3 to 4 replicas in a 3-zone cluster requires placing the 4th pod in the zone with the fewest pods. If that zone has no available capacity, the pod stays Pending. Use ScheduleAnyway for non-critical workloads.
- Not combining with PodDisruptionBudget: Topology spread only controls initial placement. During node drains or voluntary disruptions, pods may be evicted and rescheduled unevenly. PDBs ensure that enough replicas remain available during disruptions to maintain the HA posture.
- Ignoring minDomains: The scheduler computes skew only over domains that currently contain eligible nodes. In a 3-zone cluster where one zone has been scaled down to zero nodes (for example, by the cluster autoscaler), that zone drops out of the skew calculation and pods concentrate in the remaining two zones. Setting minDomains: 3 makes the scheduler treat the global minimum as zero until three domains are present, preserving the intended spread as the cluster scales back up.
What's Next?
- Apply TopologySpreadConstraints to your critical production Deployments with maxSkew: 1 across availability zones.
- Configure cluster-level default constraints in the kube-scheduler configuration to ensure all workloads get basic spreading.
- Combine topology spread with PodDisruptionBudgets to maintain HA guarantees during node maintenance and voluntary disruptions.
- Explore the minDomains field (GA in Kubernetes 1.30) to force the scheduler to consider all topology domains.
- Use the matchLabelKeys field to automatically match pod-template-hash labels, simplifying topology spread for Deployments with rolling updates.
- Monitor pod distribution with kubectl get pods -o wide and verify that replicas are actually spread across the expected zones and nodes.