DaemonSets: Per-Node Agents
- Scope: Ensures exactly one copy of a Pod runs on every (or a subset of) Node(s) in the cluster. You do not set a replica count -- the cluster topology determines how many Pods exist.
- Use Cases: Log collectors (Fluentd, Filebeat), monitoring agents (Prometheus Node Exporter, Datadog), CNI plugins (Calico, Cilium), and storage daemons (Ceph, GlusterFS).
- Automatic Scaling: Automatically adds a Pod when a new node joins the cluster and garbage-collects the Pod when a node is removed.
- Taints and Tolerations: DaemonSet Pods often tolerate taints that would repel normal workloads, allowing them to run on control plane nodes and other special-purpose nodes.
- Update Strategies: Supports `RollingUpdate` (default) and `OnDelete` strategies for managing rollouts of new Pod templates.
A DaemonSet ensures that all (or some) Nodes run a copy of a Pod.
- As nodes are added to the cluster, Pods are added to them automatically.
- As nodes are removed from the cluster, those Pods are garbage collected.
- Deleting a DaemonSet cleans up all the Pods it created.
Unlike Deployments and ReplicaSets where you specify a replica count, a DaemonSet derives its Pod count from the number of eligible nodes in the cluster.
Use Cases
DaemonSets are the standard mechanism for deploying infrastructure-level agents that must run on every node:
- Cluster Storage Daemons: Running `glusterd`, `ceph`, or CSI node plugins on each node to provide distributed storage.
- Log Collection: Running `fluentd`, `fluent-bit`, `filebeat`, or `logstash` on every node to collect container logs and forward them to a central logging backend (Elasticsearch, Loki, Splunk).
- Node Monitoring: Running Prometheus Node Exporter, `collectd`, the Datadog agent, or the New Relic agent on every node to collect system-level metrics (CPU, memory, disk, network).
- CNI Plugins: Networking plugins like Calico, Cilium, or Weave Net run as DaemonSets to configure pod networking on each node.
- Security Agents: Runtime security tools like Falco or Sysdig run as DaemonSets to monitor system calls on every node.
- GPU Device Plugins: NVIDIA device plugins run as DaemonSets on GPU-equipped nodes to expose GPU resources to the kubelet.
How Scheduling Works
The DaemonSet controller operates differently from the standard scheduler:
- The DaemonSet controller watches for node events (node added, node removed, node labels changed).
- For each eligible node, the controller checks whether a Pod matching its selector already exists.
- If no matching Pod exists on an eligible node, the controller creates one. If a Pod exists on an ineligible node, the controller deletes it.
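The reconciliation steps above can be sketched in a few lines of Python. This is a simplified model, not the actual controller code: node eligibility is reduced to a plain label-subset check, and the function names and data shapes are illustrative.

```python
def reconcile(nodes, pods, selector):
    """Simplified model of one DaemonSet reconcile pass.

    nodes:    dict of node name -> node labels
    pods:     set-like mapping of node names that already have a DaemonSet Pod
    selector: labels a node must carry to be eligible (empty = all nodes)
    """
    to_create, to_delete = [], []
    for name, labels in nodes.items():
        eligible = all(labels.get(k) == v for k, v in selector.items())
        if eligible and name not in pods:
            to_create.append(name)   # eligible node is missing its Pod
        elif not eligible and name in pods:
            to_delete.append(name)   # Pod sits on a node that no longer qualifies
    return to_create, to_delete


nodes = {
    "node-1": {"disk": "ssd"},
    "node-2": {"disk": "hdd"},
    "node-3": {"disk": "ssd"},
}
pods = {"node-2": True, "node-3": True}
creates, deletes = reconcile(nodes, pods, {"disk": "ssd"})
print(creates, deletes)  # ['node-1'] ['node-2']
```

Running this repeatedly until both lists are empty mirrors how the controller converges the cluster toward one Pod per eligible node.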
Historically, the DaemonSet controller set the `.spec.nodeName` field on each Pod it created, which bypassed the default Kubernetes scheduler entirely. Since Kubernetes 1.12, however, DaemonSets use the default scheduler by default (controlled by the `ScheduleDaemonSetPods` feature gate, which is enabled by default): the controller pins each Pod to its target node via node affinity, and the Pod then goes through the normal scheduling pipeline, respecting node affinity, taints, and tolerations.
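When the default scheduler handles DaemonSet Pods, the controller pins each Pod to its intended node by injecting a required node affinity term of roughly this shape (`target-node-name` stands in for the actual node name):

```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchFields:
        - key: metadata.name
          operator: In
          values:
          - target-node-name
```

Note the use of `matchFields` on `metadata.name` rather than a label match: the Pod is bound to one specific node, not a class of nodes.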
nodeSelector
You can restrict a DaemonSet to a subset of nodes using nodeSelector:
```yaml
spec:
  template:
    spec:
      nodeSelector:
        disk: ssd
```
This DaemonSet only runs on nodes labeled `disk=ssd`. When you add or remove that label from a node, the DaemonSet controller adds or removes the Pod accordingly.
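You can watch this behavior by toggling the label with `kubectl` (the node name `worker-1` is illustrative):

```shell
# Label the node; the DaemonSet controller creates a Pod there shortly after
kubectl label node worker-1 disk=ssd

# Remove the label (trailing dash); the controller deletes the Pod from the node
kubectl label node worker-1 disk-
```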
nodeAffinity
For more flexible node selection, use nodeAffinity:
```yaml
spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/arch
                operator: In
                values:
                - amd64
                - arm64
```
This ensures the DaemonSet runs only on `amd64` and `arm64` nodes, which is useful in heterogeneous clusters.
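A sketch of how a single `matchExpressions` entry is evaluated against a node's labels (a simplified model covering only the `In` and `NotIn` operators, not the actual scheduler code):

```python
def matches_expression(labels, expr):
    """Evaluate one nodeAffinity matchExpression against a node's labels."""
    value = labels.get(expr["key"])
    if expr["operator"] == "In":
        return value in expr["values"]
    if expr["operator"] == "NotIn":
        return value not in expr["values"]
    raise NotImplementedError(expr["operator"])


arch_expr = {"key": "kubernetes.io/arch", "operator": "In",
             "values": ["amd64", "arm64"]}

print(matches_expression({"kubernetes.io/arch": "amd64"}, arch_expr))  # True
print(matches_expression({"kubernetes.io/arch": "s390x"}, arch_expr))  # False
```

All expressions within one `matchExpressions` list must match (AND), while separate entries under `nodeSelectorTerms` are alternatives (OR).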
Taints and Tolerations
By default, DaemonSet Pods respect node taints. A tainted node repels Pods that do not have a matching toleration. However, DaemonSet workloads (like log collectors and CNI plugins) typically need to run on every node, including control plane nodes and nodes with special taints.
The DaemonSet controller automatically adds the following tolerations to DaemonSet Pods:
| Toleration Key | Effect | Purpose |
|---|---|---|
| `node.kubernetes.io/not-ready` | `NoExecute` | Keep running during node problems |
| `node.kubernetes.io/unreachable` | `NoExecute` | Keep running when node is unreachable |
| `node.kubernetes.io/disk-pressure` | `NoSchedule` | Run even under disk pressure |
| `node.kubernetes.io/memory-pressure` | `NoSchedule` | Run even under memory pressure |
| `node.kubernetes.io/pid-pressure` | `NoSchedule` | Run even under PID pressure |
| `node.kubernetes.io/unschedulable` | `NoSchedule` | Run even when node is cordoned |
To run a DaemonSet on control plane nodes (which typically have the `node-role.kubernetes.io/control-plane:NoSchedule` taint), you must explicitly add a toleration:
```yaml
spec:
  template:
    spec:
      tolerations:
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule
```
Full YAML Example: Monitoring Agent DaemonSet
Here is a complete DaemonSet manifest for running a Prometheus Node Exporter on every node:
```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring
  labels:
    app: node-exporter
spec:
  selector:
    matchLabels:
      app: node-exporter
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  template:
    metadata:
      labels:
        app: node-exporter
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9100"
    spec:
      hostNetwork: true
      hostPID: true
      tolerations:
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule
      containers:
      - name: node-exporter
        image: prom/node-exporter:v1.7.0
        args:
        - --path.procfs=/host/proc
        - --path.sysfs=/host/sys
        - --path.rootfs=/host/root
        - --collector.filesystem.mount-points-exclude=^/(dev|proc|sys|var/lib/docker/.+|var/lib/kubelet/.+)($|/)
        ports:
        - containerPort: 9100
          hostPort: 9100
          name: metrics
        resources:
          requests:
            cpu: 50m
            memory: 64Mi
          limits:
            cpu: 200m
            memory: 128Mi
        volumeMounts:
        - name: proc
          mountPath: /host/proc
          readOnly: true
        - name: sys
          mountPath: /host/sys
          readOnly: true
        - name: root
          mountPath: /host/root
          readOnly: true
          mountPropagation: HostToContainer
      volumes:
      - name: proc
        hostPath:
          path: /proc
      - name: sys
        hostPath:
          path: /sys
      - name: root
        hostPath:
          path: /
```
Key observations:
- `hostNetwork: true` and `hostPID: true` allow the exporter to collect host-level metrics directly.
- Host paths are mounted read-only for security.
- The toleration allows this DaemonSet to run on control plane nodes.
- `hostPort: 9100` exposes the metrics endpoint directly on the node IP, making it discoverable by Prometheus.
- Low resource requests (50m CPU, 64Mi memory) reflect the lightweight nature of monitoring agents.
Update Strategies
DaemonSets support two update strategies via `.spec.updateStrategy.type`:
RollingUpdate (Default)
When you update the Pod template, the DaemonSet controller terminates old Pods and creates new ones node by node. The `maxUnavailable` field controls how many nodes can have their DaemonSet Pod down simultaneously during an update:
```yaml
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
```
With `maxUnavailable: 1`, only one node at a time loses its DaemonSet Pod during the rollout. You can also use a percentage (e.g., `maxUnavailable: "25%"`) for large clusters where updating one node at a time would take too long.
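The rollout-duration trade-off is simple arithmetic. A hedged sketch (assuming percentages convert to at least one node and round up, and ignoring variance in Pod startup time):

```python
import math

def rollout_waves(node_count, max_unavailable):
    """Approximate number of sequential update waves for a DaemonSet rollout.

    max_unavailable: an int, or a percentage string like "25%".
    """
    if isinstance(max_unavailable, str) and max_unavailable.endswith("%"):
        pct = int(max_unavailable.rstrip("%"))
        batch = max(1, math.ceil(node_count * pct / 100))
    else:
        batch = int(max_unavailable)
    return math.ceil(node_count / batch)


print(rollout_waves(100, 1))      # 100 waves: one node at a time
print(rollout_waves(100, "25%"))  # 4 waves: 25 nodes at a time
```

If each Pod takes a minute to restart and pass readiness checks, the difference is a 100-minute rollout versus a 4-minute one.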
The `maxSurge` field (available since Kubernetes 1.22) controls whether a new Pod is created before the old one is deleted, enabling zero-downtime DaemonSet updates:
```yaml
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
```
OnDelete
With `OnDelete`, the DaemonSet controller does not automatically roll out updates. You must manually delete each Pod, and the controller recreates it with the updated template. This is useful when you want to control exactly when each node gets updated, which is common for critical infrastructure like CNI plugins.
```yaml
spec:
  updateStrategy:
    type: OnDelete
```
DaemonSet vs. Deployment
| Feature | DaemonSet | Deployment |
|---|---|---|
| Replica count | One per eligible node (automatic) | Explicitly set via `.spec.replicas` |
| Scheduling | One Pod per node guaranteed | Scheduler decides placement (may colocate) |
| Node coverage | Runs on every node (or a filtered subset) | No per-node guarantee |
| Scaling | Scales with cluster size | Scales with replica count |
| Use case | Infrastructure agents, node-level daemons | Application workloads, APIs, web servers |
| Host access | Often uses `hostNetwork`, `hostPath` | Rarely needs host access |
When to use a DaemonSet: Your workload must run on every node (or a specific subset of nodes), and you need exactly one instance per node. Examples include log collectors, monitoring agents, and networking plugins.
When to use a Deployment: Your workload is an application that should scale based on load, not based on cluster size. The scheduler should decide where Pods land based on available resources.
Common Pitfalls
- Forgetting tolerations for control plane nodes: If you want full cluster coverage (including control plane nodes), you must add explicit tolerations. Without them, DaemonSet Pods are not scheduled on tainted nodes.
- Resource contention with host resources: DaemonSet Pods using `hostNetwork` or `hostPort` occupy actual node ports. If two DaemonSets try to bind the same `hostPort`, one will fail. Always use unique port numbers.
- Not setting resource requests: If you omit resource requests, the scheduler cannot account for DaemonSet Pods when making placement decisions for other workloads. This can lead to node overcommitment and OOM kills.
- Ignoring `maxUnavailable` during updates: With the default `maxUnavailable: 1`, updating a 100-node cluster takes a long time. For non-critical DaemonSets, consider setting `maxUnavailable` to a percentage like `"10%"` to speed up rollouts.
- Using DaemonSets for application workloads: DaemonSets are not a substitute for Deployments. If your workload does not need per-node guarantees, use a Deployment with appropriate resource requests and let the scheduler optimize placement.
- Forgetting that DaemonSet Pods count toward node resource limits: Every DaemonSet Pod consumes CPU and memory on the node. In large clusters with many DaemonSets, the aggregate overhead can be significant. Account for this when sizing your nodes.
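The `hostPort` pitfall lends itself to a quick preflight check. A sketch, assuming a hypothetical input format of (daemonset-name, hostPort) pairs you would collect from your manifests:

```python
from collections import defaultdict

def host_port_conflicts(claims):
    """Return hostPorts claimed by more than one DaemonSet.

    claims: list of (daemonset_name, host_port) pairs gathered from manifests.
    """
    by_port = defaultdict(list)
    for name, port in claims:
        by_port[port].append(name)
    # Only ports with multiple claimants are conflicts
    return {port: names for port, names in by_port.items() if len(names) > 1}


claims = [("node-exporter", 9100), ("fluent-bit", 2020), ("custom-agent", 9100)]
print(host_port_conflicts(claims))  # {9100: ['node-exporter', 'custom-agent']}
```

Running a check like this in CI catches port collisions before two DaemonSets fight over the same node port at deploy time.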
Best Practices
- Set resource requests and limits: Even small agents add up. A monitoring agent using 100Mi of memory across 100 nodes consumes 10Gi of cluster memory. Set accurate requests so the scheduler can account for this overhead.
- Use `readOnlyRootFilesystem: true`: For security, mount the container filesystem as read-only and use `emptyDir` or `hostPath` volumes only where writes are needed.
- Label your DaemonSet Pods consistently: Use labels like `app`, `component`, and `tier` so that monitoring and log aggregation pipelines can easily identify DaemonSet workloads.
- Use `priorityClassName`: Set a high priority (e.g., `system-node-critical`) for essential DaemonSets like CNI plugins and log collectors. This ensures they are not evicted when the node is under resource pressure.
- Monitor DaemonSet rollout status: Use `kubectl rollout status daemonset/<name>` to track update progress. Set up alerts if `desiredNumberScheduled` does not match `numberReady` for an extended period.
- Prefer `maxSurge` for zero-downtime updates: When running critical node agents, use `maxSurge: 1` with `maxUnavailable: 0` so a new Pod is running before the old one is removed.
What's Next?
- StatefulSets: Learn how StatefulSets provide stable identity and persistent storage for databases and distributed systems.
- ConfigMaps & Secrets: Inject configuration and credentials into your DaemonSet Pods without hardcoding them in the image.
- Scheduling & Affinity: Deep dive into nodeSelector, nodeAffinity, taints, tolerations, and pod topology spread constraints.
- Observability: Learn how to build monitoring and logging stacks that DaemonSets power.
- Pod Security: Understand security contexts, Pod Security Standards, and how to secure DaemonSet Pods that require privileged access.