
Pods: The Atomic Unit

Key Takeaways for AI & Readers
  • Definition: The smallest deployable unit in Kubernetes — a thin wrapper around one or more containers that share networking and storage.
  • Isolation: Containers in a Pod share the same Network namespace (IP address and port space), IPC namespace, and optionally Storage volumes.
  • Ephemeral: Pods are disposable by design; you should never treat a specific Pod instance as permanent. Controllers (Deployments, StatefulSets, Jobs) manage their lifecycle.
  • Scheduling Unit: The Pod — not the container — is what the Kubernetes Scheduler places onto a Node. All containers in a Pod always run on the same Node.
  • Identity: Each Pod receives a unique IP address within the cluster. Containers inside communicate over localhost; containers in different Pods communicate over the cluster network.


0. Anatomy of a Pod

A Pod is a logical wrapper around one or more containers that are tightly coupled and need to share resources. Think of it as a "logical host" — the containers inside behave as though they are running on the same machine.

Every Pod has:

  • A unique cluster IP address (assigned from the Pod CIDR range).
  • One or more application containers running your workload.
  • An infrastructure "pause" container (hidden from you) that holds the network namespace alive even if your app container restarts.
  • Optional volumes that any container in the Pod can mount.

Minimal Pod Definition

Here is the simplest possible Pod manifest:

apiVersion: v1
kind: Pod
metadata:
  name: my-app
  labels:
    app: my-app
    tier: backend
spec:
  containers:
  - name: app
    image: nginx:1.27
    ports:
    - containerPort: 80

You create it with kubectl apply -f pod.yaml and inspect it with kubectl get pod my-app -o wide.

In practice, you almost never create bare Pods. You use a controller like a Deployment or StatefulSet that manages Pods for you, handling restarts, scaling, and rolling updates. Bare Pods are useful for learning and for one-off debugging tasks.

Single-Container Pod

The most common pattern. "One Pod = One Container." The Pod exists solely to give Kubernetes a unit to schedule, monitor, and network. Your container runs, and the Pod wraps it.

Multi-Container Pod

A Pod can hold multiple containers that work together as a cohesive unit. These containers share the following Linux namespaces:

  • Shared Networking (Network Namespace): All containers in a Pod share the same IP and localhost.
    • The Port Constraint: Since they share an IP, two containers in the same Pod cannot listen on the same port (e.g., you cannot have two Nginx containers both on port 80).
    • Inter-container Communication: Container A can reach Container B at 127.0.0.1:<port> — no Service or DNS required.
  • Shared Storage (Volumes): They can mount the same Volume to exchange files on disk.
  • Shared IPC Namespace: They can communicate using SystemV semaphores or POSIX shared memory.

When should you put multiple containers in the same Pod? Only when they are tightly coupled — they must run on the same node, share the network, or share files. If two processes can communicate over the network and do not need to share local files, they belong in separate Pods.
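As a concrete illustration of the shared network namespace (a sketch, not a production manifest; the Pod and container names are made up), the busybox container below reaches the nginx container at 127.0.0.1:80 with no Service or DNS involved:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: localhost-demo
spec:
  containers:
  - name: web
    image: nginx:1.27      # listens on port 80
  - name: poller
    image: busybox:1.36
    command:
    - "sh"
    - "-c"
    - |
      # Same Pod = same network namespace: reach the nginx
      # container over loopback, no Service or DNS needed.
      while true; do
        wget -qO- http://127.0.0.1:80/ > /dev/null && echo "web is up"
        sleep 5
      done
```

Note that the two containers could not both be nginx on port 80: the shared IP means the second one would fail to bind.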


Multi-Container Design Patterns

Kubernetes formalizes three well-known patterns for multi-container Pods. Each addresses a different operational concern.

The Sidecar Pattern

A helper container that extends or enhances the main container without the main container knowing about it.

Native Sidecar Support (K8s 1.28+): You can now define sidecars as initContainers with restartPolicy: Always. This ensures they start before the main app (blocking until ready) but remain running throughout the Pod's life. This solves the long-standing "sidecar startup race condition."

Use cases: Log shipping, metrics export, configuration reloading, TLS proxy.

apiVersion: v1
kind: Pod
metadata:
  name: web-with-log-shipper
spec:
  initContainers:
  - name: log-shipper
    image: fluent/fluent-bit:3.0
    restartPolicy: Always  # This makes it a Native Sidecar!
    volumeMounts:
    - name: shared-logs
      mountPath: /var/log/nginx
      readOnly: true
  containers:
  - name: web-server
    image: nginx:1.27
    volumeMounts:
    - name: shared-logs
      mountPath: /var/log/nginx
  volumes:
  - name: shared-logs
    emptyDir: {}

The Nginx container writes access logs to /var/log/nginx. The Fluent Bit sidecar reads those same files (via the shared emptyDir volume) and ships them to your centralized logging stack. Nginx does not know or care that the sidecar exists.

The Ambassador Pattern

A proxy container that simplifies outbound connections for the main container. The main container connects to localhost, and the ambassador handles the complexity of connecting to the real external service (connection pooling, service discovery, retries).

Use cases: Database proxy (e.g., Cloud SQL Proxy, PgBouncer), API gateway for external services.

apiVersion: v1
kind: Pod
metadata:
  name: app-with-db-proxy
spec:
  containers:
  - name: app
    image: my-app:2.4
    env:
    - name: DB_HOST
      value: "127.0.0.1"  # App talks to localhost
    - name: DB_PORT
      value: "5432"
  - name: cloud-sql-proxy
    image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.11
    args:
    - "--structured-logs"
    - "my-project:us-central1:my-db"
    ports:
    - containerPort: 5432

The application is configured to connect to a PostgreSQL database at 127.0.0.1:5432. The Cloud SQL Proxy ambassador container handles authentication and encrypted tunneling to the real Cloud SQL instance.

The Adapter Pattern

A container that standardizes or transforms the output of the main container into a format that external systems expect.

Use cases: Prometheus metric export from legacy apps, log format normalization, protocol translation.

apiVersion: v1
kind: Pod
metadata:
  name: legacy-app-with-exporter
spec:
  volumes:
  - name: metrics-socket
    emptyDir: {}
  containers:
  - name: legacy-app
    image: my-legacy-app:1.0
    volumeMounts:
    - name: metrics-socket
      mountPath: /tmp/metrics
  - name: prometheus-adapter
    image: prom/statsd-exporter:v0.27.0
    ports:
    - containerPort: 9102
    volumeMounts:
    - name: metrics-socket
      mountPath: /tmp/metrics
      readOnly: true

The legacy app writes StatsD-format metrics to a Unix socket. The adapter container reads them and exposes them as Prometheus-format metrics on port 9102.


Init Containers

Init containers run before any app containers start. They run sequentially (init-1 completes, then init-2 starts, etc.), and the Pod's app containers will not start until every init container has exited successfully (exit code 0).

Why Init Containers?

  • Separation of concerns: Keep setup utilities (like database migration tools, schema validators, or certificate generators) out of your production application image.
  • Ordering guarantees: Wait for dependent services to be available before your app starts.
  • Security: Run one-time privileged setup (like adjusting sysctl parameters) in an init container, then run the main app as non-root.

Init Container Example

apiVersion: v1
kind: Pod
metadata:
  name: app-with-init
spec:
  initContainers:
  - name: wait-for-db
    image: busybox:1.36
    command:
    - "sh"
    - "-c"
    - |
      echo "Waiting for database to be ready..."
      until nc -z db-service 5432; do
        echo "  ...db not ready yet, retrying in 2s"
        sleep 2
      done
      echo "Database is ready!"
  - name: run-migrations
    image: my-app:2.4
    command: ["python", "manage.py", "migrate", "--noinput"]
    env:
    - name: DATABASE_URL
      valueFrom:
        secretKeyRef:
          name: db-credentials
          key: url
  containers:
  - name: app
    image: my-app:2.4
    ports:
    - containerPort: 8000
    env:
    - name: DATABASE_URL
      valueFrom:
        secretKeyRef:
          name: db-credentials
          key: url

This Pod does three things in strict order:

  1. wait-for-db: Loops until TCP port 5432 on db-service is reachable.
  2. run-migrations: Applies database schema migrations.
  3. app: Only after both init containers succeed does the Django application start.

If any init container fails, Kubernetes restarts it according to the Pod's restartPolicy. The app containers will not start until all init containers succeed.


1. The Startup Phase

When a Pod is created, it goes through several initialization steps before your application actually runs:

  1. Pending: The Pod has been accepted by the API Server and persisted to etcd. The Scheduler is evaluating which Node has enough CPU, memory, and other constraints (affinity, taints, topology spread) to place the Pod.
  2. InitContainers: If defined, these run sequentially to completion. Each must exit with code 0 before the next one starts. If an init container fails, the kubelet restarts it (subject to restartPolicy).
  3. ContainerCreating: The kubelet on the selected Node instructs the container runtime (containerd, CRI-O) to pull the image (unless it is cached with imagePullPolicy: IfNotPresent), create the container, and set up networking via the CNI plugin.
  4. Running: The container's ENTRYPOINT/CMD process is executing.
  5. PostStart Hook: (Optional) Executes immediately after the container starts. Warning: There is no guarantee it runs before the container's ENTRYPOINT. The PostStart handler and the ENTRYPOINT run concurrently. If the PostStart handler fails, the container is killed.

Image Pull Policies

The imagePullPolicy field on a container controls when the kubelet pulls the image:

  • Always: Pull the image every time the container starts. This is the default when you use the :latest tag.
  • IfNotPresent: Pull only if the image is not already cached on the Node. This is the default when you use a specific tag like :1.27.
  • Never: Never pull. The image must already exist on the Node.

Best practice: Always use a specific image tag (not :latest) and use imagePullPolicy: IfNotPresent. This ensures reproducible deployments and avoids unnecessary image pulls.
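Applied to the earlier Nginx container, that best practice looks like this (illustrative fragment):

```yaml
containers:
- name: app
  image: nginx:1.27.0              # immutable, specific tag (not :latest)
  imagePullPolicy: IfNotPresent    # explicit, though it is already the default for a pinned tag
```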


2. The Termination Phase (Graceful Shutdown)

This is the most misunderstood part of Kubernetes. When you delete a Pod (or scale down, or roll out an update), it does not just "die" instantly. Kubernetes orchestrates a careful shutdown sequence.

The Sequence of Events:

  1. API Server Update: The Pod's deletionTimestamp is set. The Pod state becomes "Terminating."
  2. Endpoint Removal (concurrent): The Endpoints controller removes the Pod's IP from all Service Endpoints. The kube-proxy on every Node updates its iptables/IPVS rules. This propagation takes time — during this window, some Nodes may still send traffic to the terminating Pod.
  3. PreStop Hook (concurrent): If defined, this command or HTTP request runs inside the container. The kubelet waits for it to complete before sending SIGTERM. This is your chance to trigger a clean shutdown in applications that do not handle UNIX signals.
  4. SIGTERM: The kubelet sends the SIGTERM signal to PID 1 in the container.
    • Your Application's Job: Catch this signal, stop accepting new requests, finish processing in-flight requests, close database connections, flush buffers, and exit cleanly.
  5. Grace Period: The kubelet waits for terminationGracePeriodSeconds (default: 30s). This timer starts when the termination process begins (step 1), not when SIGTERM is sent.
  6. SIGKILL: If the container is still running after the grace period, the kubelet sends SIGKILL. The process is forcibly killed by the kernel. No cleanup happens.

Why PreStop Hooks Matter

Steps 2 and 3 happen concurrently. This is critical. The endpoint removal (step 2) takes time to propagate across all Nodes in the cluster. If your application receives SIGTERM and shuts down instantly, some in-flight requests routed by Nodes that have not yet received the endpoint update will fail with connection errors.

The solution: Add a preStop hook with a short sleep to give endpoint changes time to propagate:

lifecycle:
  preStop:
    exec:
      command: ["sh", "-c", "sleep 5"]

This ensures your container stays alive and can serve any straggler requests for 5 seconds while the Service endpoint removal propagates across the cluster.
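Remember that the preStop hook spends part of the grace-period budget. A sketch of the arithmetic (values are illustrative):

```yaml
spec:
  terminationGracePeriodSeconds: 30  # total budget; the timer starts at deletion
  containers:
  - name: app
    image: my-app:2.4
    lifecycle:
      preStop:
        exec:
          command: ["sh", "-c", "sleep 5"]  # consumes 5s of the 30s budget
    # The application then has roughly 25s between SIGTERM and SIGKILL to drain.
```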

Graceful Shutdown: Handling SIGTERM in Your Application

Your application must handle SIGTERM to shut down cleanly. If it ignores the signal, Kubernetes waits the full grace period and then sends SIGKILL — killing the process with no opportunity to close connections, flush writes, or release locks. Every production service should implement a shutdown handler.

Node.js:

const server = http.createServer(app);

process.on('SIGTERM', () => {
  console.log('SIGTERM received, starting graceful shutdown...');

  // 1. Stop accepting new connections
  server.close(() => {
    console.log('HTTP server closed');

    // 2. Close database pool and other resources
    db.end().then(() => {
      console.log('DB pool drained');
      process.exit(0);
    });
  });

  // 3. Force exit if cleanup takes too long
  setTimeout(() => {
    console.error('Forced exit — cleanup exceeded timeout');
    process.exit(1);
  }, 25000); // Leave 5s buffer before SIGKILL
});

Go:

ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM)
defer stop()

srv := &http.Server{Addr: ":8080", Handler: mux}
go func() { srv.ListenAndServe() }()

<-ctx.Done() // Block until SIGTERM
log.Println("SIGTERM received, draining...")

shutdownCtx, cancel := context.WithTimeout(context.Background(), 25*time.Second)
defer cancel()

srv.Shutdown(shutdownCtx) // Drains in-flight requests
db.Close()
log.Println("Clean shutdown complete")

Python:

import signal, sys, threading

shutdown_event = threading.Event()

def handle_sigterm(signum, frame):
    print("SIGTERM received, shutting down...")
    shutdown_event.set()

signal.signal(signal.SIGTERM, handle_sigterm)

# In your main loop or server:
while not shutdown_event.is_set():
    # process requests...
    pass

# Cleanup: close DB connections, flush buffers
db.close()
sys.exit(0)

The PID 1 Problem

If your Dockerfile uses the shell form of CMD (e.g., CMD node server.js), the shell (/bin/sh -c) becomes PID 1 and does not forward signals to your application. Your app never receives SIGTERM and is always SIGKILL'd after the grace period.

Fix: Use the exec form (CMD ["node", "server.js"]) so your process is PID 1, or use a lightweight init like tini that properly forwards signals and reaps zombie processes.
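A minimal Dockerfile sketch using tini (assumes a Debian-based base image where the tini package installs /usr/bin/tini; server.js is a placeholder):

```dockerfile
FROM node:20-slim
RUN apt-get update && apt-get install -y --no-install-recommends tini \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY . .
# tini runs as PID 1, forwards SIGTERM to node, and reaps zombie processes
ENTRYPOINT ["/usr/bin/tini", "--"]
CMD ["node", "server.js"]
```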

PreStop Hook Patterns for Production

The preStop hook runs before SIGTERM is sent, giving you a chance to perform cleanup that your application cannot do itself (e.g., deregistering from external service discovery).

Common preStop patterns:

  • Sleep (simple delay): runs sleep 5. For most apps; gives endpoint removal time to propagate.
  • HTTP drain endpoint: runs curl -X POST localhost:8080/drain. For apps with a dedicated drain API that stops accepting new work.
  • Exec deregister: runs consul services deregister -id=my-svc. For apps registered in external service discovery (Consul, Eureka).

Pattern 1: Drain connections via HTTP endpoint

lifecycle:
  preStop:
    httpGet:
      path: /drain
      port: 8080

Your application's /drain endpoint sets a flag to stop accepting new requests and waits for in-flight requests to complete. The kubelet waits for the HTTP call to return before sending SIGTERM.

Pattern 2: Deregister from external service discovery

lifecycle:
  preStop:
    exec:
      command: ["sh", "-c", "curl -X PUT http://consul:8500/v1/agent/service/deregister/my-svc && sleep 3"]

This deregisters the instance from Consul and sleeps briefly to allow clients to receive the updated service list.

Pattern 3: PostStart for registration (the complement)

lifecycle:
  postStart:
    exec:
      command: ["sh", "-c", "curl -X PUT http://consul:8500/v1/agent/service/register -d @/etc/consul/service.json"]
  preStop:
    exec:
      command: ["sh", "-c", "curl -X PUT http://consul:8500/v1/agent/service/deregister/my-svc"]

Use postStart to register with service discovery when the container starts, and preStop to deregister before shutdown. This ensures the external registry always reflects reality.

FinOps Note

Long terminationGracePeriodSeconds values cost compute during rolling updates — every terminating Pod occupies a node slot until the grace period expires or the process exits. Set the grace period to your actual drain time + 5 seconds, not an arbitrarily large number.


3. Restart Policies

Defined by spec.restartPolicy at the Pod level (applies to all containers in the Pod):

  • Always (Default): If the container exits for any reason (even with exit code 0 — success), restart it. This is the correct policy for long-running servers.
  • OnFailure: Only restart if the exit code is non-zero. This is the correct policy for Jobs and batch workloads — you want the container to stay stopped once it completes successfully.
  • Never: Never restart. Useful for one-shot debugging pods where you want to inspect logs and state after the process exits.

The kubelet uses exponential backoff for restarts: 10s, 20s, 40s, ... up to 5 minutes. This is why you see the CrashLoopBackOff status — Kubernetes is waiting before trying again. The backoff timer resets after 10 minutes of successful running.
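The backoff schedule described above is easy to model (a sketch; the kubelet's exact implementation details may differ):

```python
def crash_loop_delays(restarts, base=10, cap=300):
    """Model the kubelet's restart backoff: the delay doubles after
    each crash, capped at 5 minutes (300s)."""
    return [min(base * 2 ** i, cap) for i in range(restarts)]

print(crash_loop_delays(7))  # [10, 20, 40, 80, 160, 300, 300]
```

After the sixth crash the Pod spends five minutes in CrashLoopBackOff between every attempt, which is why a persistently failing Pod appears to restart so slowly.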


4. Pod Networking Model

Understanding how Pods communicate is essential. Kubernetes enforces a simple, flat networking model with three rules:

  1. Every Pod gets its own unique IP address. There is no NAT between Pods.
  2. All Pods can communicate with all other Pods across any Node using the Pod's IP directly (without NAT).
  3. Agents on a Node (like the kubelet) can communicate with all Pods on that Node.

This means that from any container, you can reach any other Pod's IP:port combination directly. A CNI (Container Network Interface) plugin implements this model. Popular choices include Calico, Cilium, Flannel, and Weave Net.

DNS Within Pods

Every Pod gets DNS resolution configured automatically (via /etc/resolv.conf). CoreDNS provides cluster DNS:

  • Service DNS: my-service.my-namespace.svc.cluster.local resolves to the Service's ClusterIP.
  • Pod DNS (rarely used): 10-244-1-5.my-namespace.pod.cluster.local (the IP with dots replaced by dashes).
  • Short names: Within the same Namespace, my-service is sufficient (the search domain is prepended).
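The naming scheme above is mechanical; a tiny helper sketch makes it explicit (cluster.local is the default cluster domain and is configurable):

```python
def service_fqdn(service, namespace, cluster_domain="cluster.local"):
    """Build the fully qualified DNS name for a Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"

def pod_fqdn(pod_ip, namespace, cluster_domain="cluster.local"):
    """Pod DNS names replace the dots in the IP with dashes."""
    return f"{pod_ip.replace('.', '-')}.{namespace}.pod.{cluster_domain}"

print(service_fqdn("my-service", "my-namespace"))
# my-service.my-namespace.svc.cluster.local
print(pod_fqdn("10.244.1.5", "my-namespace"))
# 10-244-1-5.my-namespace.pod.cluster.local
```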

For details on how Services route traffic to Pods, see Services (Networking).


5. Resource Requests and Limits Basics

Every container in a Pod can (and should) declare how much CPU and memory it needs.

containers:
- name: app
  image: my-app:2.4
  resources:
    requests:
      cpu: "250m"      # 250 millicores = 0.25 CPU
      memory: "128Mi"  # 128 mebibytes
    limits:
      cpu: "500m"      # Hard ceiling: 0.5 CPU
      memory: "256Mi"  # Hard ceiling: 256 MiB

  • Requests are what the Scheduler uses to find a Node with enough capacity. If no Node can satisfy the request, the Pod stays Pending.
  • Limits are enforced at runtime by the Linux kernel. CPU limits cause throttling (your process gets fewer cycles). Memory limits cause OOMKill (your process is killed if it exceeds the limit).

The critical rule: requests <= limits. If you set limits without requests, Kubernetes sets requests equal to limits.
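To make the units concrete, here is a small parsing sketch (an illustration only; the real Kubernetes quantity grammar also accepts decimal suffixes like M and G):

```python
def parse_cpu(quantity):
    """'250m' -> 0.25 cores; '2' -> 2.0 cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)

def parse_memory(quantity):
    """Binary suffixes only: '128Mi' -> bytes."""
    units = {"Ki": 2 ** 10, "Mi": 2 ** 20, "Gi": 2 ** 30}
    for suffix, mult in units.items():
        if quantity.endswith(suffix):
            return int(quantity[:-2]) * mult
    return int(quantity)

# The critical rule from above: requests <= limits
assert parse_cpu("250m") <= parse_cpu("500m")
assert parse_memory("128Mi") <= parse_memory("256Mi")
```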

For a deep dive into resource management, QoS classes, and autoscaling, see Resources (HPA).


6. Security Context Basics

A securityContext lets you control privilege and access for a Pod or individual container. This is how you implement the principle of least privilege.

Pod-Level Security Context

Applies to all containers in the Pod:

apiVersion: v1
kind: Pod
metadata:
  name: secure-app
spec:
  securityContext:
    runAsUser: 1000     # Run all containers as UID 1000
    runAsGroup: 3000    # Primary GID 3000
    fsGroup: 2000       # Volumes are owned by GID 2000
    runAsNonRoot: true  # Refuse to start if image runs as root
  containers:
  - name: app
    image: my-app:2.4
    securityContext:
      allowPrivilegeEscalation: false  # Prevent setuid binaries
      readOnlyRootFilesystem: true     # Prevent writes to /
      capabilities:
        drop: ["ALL"]  # Drop all Linux capabilities

Key Security Context Fields

  • runAsUser (Pod or Container level): sets the UID the process runs as.
  • runAsNonRoot (Pod or Container level): blocks startup if the image would run as root (UID 0).
  • readOnlyRootFilesystem (Container level): makes the root filesystem read-only (writes must go to mounted volumes).
  • allowPrivilegeEscalation (Container level): prevents child processes from gaining more privileges than the parent.
  • capabilities.drop (Container level): drops Linux capabilities (use ["ALL"] and then add back only what you need).

Best practice: Start with the most restrictive settings (drop all capabilities, read-only root filesystem, non-root user) and relax only as needed. For cluster-wide enforcement, see Pod Security.


7. Static Pods

Most Pods are managed by the API Server. Static Pods are managed directly by the kubelet on a specific Node, without the API Server being involved in their creation.

  • How: Drop a YAML manifest into the kubelet's configured manifest directory (typically /etc/kubernetes/manifests/ on the Node).
  • Behavior: The kubelet watches this directory and automatically creates (and restarts) the Pod. It also creates a "mirror Pod" in the API Server so that kubectl get pods can show it, but you cannot edit or delete static Pods via the API.
  • Use Case: The Control Plane components — etcd, kube-apiserver, kube-scheduler, kube-controller-manager — often run as Static Pods on control plane Nodes. This is how a cluster bootstraps itself: the kubelet starts the API Server as a Static Pod before the API Server even exists.

8. Common Pitfalls

These are the mistakes that catch experienced engineers when they start working with Kubernetes Pods.

1. Running as Root

Many container images default to running as root (UID 0). This is a security risk. Always set runAsNonRoot: true and runAsUser in your security context, and ensure your Dockerfile uses a USER directive.

2. Not Handling SIGTERM

If your application does not catch SIGTERM, it will be forcibly killed after 30 seconds. This causes dropped connections, lost in-progress work, and data corruption in some cases. Every production application should gracefully handle SIGTERM.

3. Missing Resource Requests

Without resource requests, the Scheduler has no information to make placement decisions. Your Pod lands on a random Node and gets BestEffort QoS class, which means it is the first to be evicted under memory pressure.

4. Using :latest Tag

The :latest tag is mutable. Two Nodes might pull different versions of the "same" image. Always use immutable, specific tags (e.g., nginx:1.27.0 or a SHA digest like nginx@sha256:abc123...).

5. Ignoring the Endpoint Propagation Delay

As described in the Termination section, there is a race condition between SIGTERM and endpoint removal. Without a preStop sleep, you will see intermittent 502/503 errors during deployments.

6. Too Many Containers in One Pod

If two containers do not need to share localhost or local files, they should be in separate Pods. Over-packing a Pod prevents independent scaling — you cannot scale the web server without also scaling the background worker if they are in the same Pod.

7. Not Setting terminationGracePeriodSeconds

The default is 30 seconds. If your application needs more time to drain (e.g., it processes long-running requests or needs to flush large buffers), increase this value. If your app shuts down in 2 seconds, lower it to speed up deployments.

8. Shell Entrypoints That Don't Forward Signals

A common Dockerfile mistake causes containers to silently ignore SIGTERM:

# BAD: shell form — /bin/sh becomes PID 1 and absorbs SIGTERM
CMD java -jar app.jar

# GOOD: exec form — java becomes PID 1 and receives SIGTERM directly
CMD ["java", "-jar", "app.jar"]

When using the shell form, Docker wraps your command in /bin/sh -c "java -jar app.jar". The shell becomes PID 1 and does not forward SIGTERM to the child java process. Your application never sees the signal, runs for the full grace period, and is forcibly killed with SIGKILL.

Fixes:

  • Use the exec form in your Dockerfile (CMD ["executable", "arg1"]).
  • If you need shell features (variable expansion, pipes), use exec to replace the shell: CMD ["sh", "-c", "exec java -jar app.jar"].
  • Use tini as your entrypoint — it forwards signals and reaps zombie processes.

9. Troubleshooting: "My Pod is Stuck!"

Stuck in Pending?

  • Check Scheduling: kubectl describe pod <name>. Look in the Events section for messages like Insufficient cpu, Insufficient memory, or 0/3 nodes are available: 3 node(s) had taint {key=value}.
  • Check ResourceQuota: The Namespace might have a quota that is already full. Run kubectl describe resourcequota -n <namespace>.
  • Check PVC Binding: If the Pod uses a PersistentVolumeClaim, it will stay Pending until the PVC is bound to a PV. Run kubectl get pvc.

Stuck in ContainerCreating?

  • Image Pull Issues: Is the image tag correct? Is it a private registry (missing imagePullSecrets)? Run kubectl describe pod <name> and look for ImagePullBackOff or ErrImagePull events.
  • CNI Issues: The container runtime could not set up networking. Check the CNI plugin logs on the Node.

Stuck in Terminating?

Pods can get stuck in the Terminating state for several distinct reasons. Identifying the cause determines the correct resolution.

Cause 1: Finalizers

Finalizers are keys in metadata.finalizers that tell the API server "do not delete this object until the controller owning this finalizer has finished cleanup." If the responsible controller is broken, scaled to zero, or the finalizer references a controller that no longer exists, the Pod will never be deleted.

# Check for finalizers
kubectl get pod <name> -o jsonpath='{.metadata.finalizers}'

# Identify which controller owns the finalizer, then check its logs
kubectl logs -n <controller-namespace> <controller-pod>

Resolution: Fix the controller so it completes its cleanup and removes the finalizer. As a last resort, you can manually remove the finalizer by patching the Pod — but understand that this skips whatever cleanup the finalizer was meant to perform:

kubectl patch pod <name> -p '{"metadata":{"finalizers":null}}'

Cause 2: Unresponsive kubelet / Node failure

The API server sets deletionTimestamp, but the kubelet on the node is responsible for actually stopping the container. If the node is down or the kubelet is unresponsive, the API server cannot confirm the Pod has stopped. The node is marked NotReady after ~40 seconds, but Pods on that node are not automatically force-deleted — they remain Terminating indefinitely.

# Check if the node is Ready
kubectl get nodes
kubectl describe node <node-name> | grep -A5 Conditions

Resolution: If the node is truly dead and will not return, force-delete the Pod. Understand that this only removes the API object — if the node comes back, the container may still be running:

kubectl delete pod <name> --grace-period=0 --force

Cause 3: Long-running PreStop hook

If a preStop hook takes longer than terminationGracePeriodSeconds, the entire termination sequence stalls. The grace period timer starts when the termination process begins (step 1 in the sequence above), not when SIGTERM is sent. So if your preStop hook takes 25 seconds and the grace period is 30 seconds, your application only gets 5 seconds between SIGTERM and SIGKILL.

Resolution: Ensure your preStop hook completes well within the grace period. If it needs more time, increase terminationGracePeriodSeconds accordingly.

Cause 4: Container ignoring SIGTERM

If your container's PID 1 process does not handle SIGTERM (see the Graceful Shutdown section above), the container keeps running until the grace period expires and SIGKILL is sent. This is the most common cause of slow termination.

Force Delete: The Nuclear Option

kubectl delete pod <name> --grace-period=0 --force removes the Pod object from the API server immediately. However, it does not guarantee the container has actually stopped on the node. The kubelet will still attempt to stop the container, but you lose visibility into whether it succeeded.

Risks for StatefulSets: Force-deleting a StatefulSet Pod can cause two Pods with the same identity to run simultaneously if the original node is still alive. This can corrupt data in storage systems that assume single-writer semantics. Only force-delete StatefulSet Pods after confirming the original node is permanently gone.

CrashLoopBackOff?

  • View Logs: kubectl logs <pod> --previous shows logs from the last crashed instance.
  • Exit Code 137 (128 + 9 = SIGKILL): OOMKilled. The container exceeded its memory limit. Run kubectl describe pod <name> and look for OOMKilled in the Last State. Increase the memory limit or fix the memory leak.
  • Exit Code 1 or 255: Generic application error. Check configuration, environment variables, and command arguments.
  • Exit Code 126: Permission denied — the entrypoint binary is not executable.
  • Exit Code 127: Entrypoint binary not found — check your command or args fields.
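The "128 + signal" arithmetic behind exit code 137 generalizes to any signal; a small decoder sketch:

```python
import signal

def explain_exit_code(code):
    """Container exit codes above 128 mean 'killed by signal (code - 128)'."""
    if code > 128:
        return f"killed by signal {signal.Signals(code - 128).name}"
    return f"exited with code {code}"

print(explain_exit_code(137))  # killed by signal SIGKILL
print(explain_exit_code(143))  # killed by signal SIGTERM
print(explain_exit_code(1))    # exited with code 1
```

Exit code 143 (128 + 15) is the healthy case: the process received SIGTERM and exited during the grace period.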

Useful Debugging Commands

# Get Pod status with Node placement and IP
kubectl get pod <name> -o wide

# Detailed events and conditions
kubectl describe pod <name>

# Stream live logs
kubectl logs <name> -f

# Logs from a specific container in a multi-container Pod
kubectl logs <name> -c <container-name>

# Logs from the previous (crashed) instance
kubectl logs <name> --previous

# Open a shell inside a running container
kubectl exec -it <name> -- /bin/sh

# Inspect the full Pod spec as YAML
kubectl get pod <name> -o yaml

10. Hands-On Exercise

This exercise builds a multi-container Pod that demonstrates init containers, the sidecar pattern, shared volumes, and resource management.

Step 1: Create the Manifest

Save this as exercise-pod.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: pod-exercise
  labels:
    app: pod-exercise
spec:
  terminationGracePeriodSeconds: 15
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
  initContainers:
  - name: create-index
    image: busybox:1.36
    securityContext:
      allowPrivilegeEscalation: false
    command:
    - "sh"
    - "-c"
    - |
      echo "<html><body>" > /work-dir/index.html
      echo "<h1>Hello from the init container!</h1>" >> /work-dir/index.html
      echo "<p>Pod: $(hostname)</p>" >> /work-dir/index.html
      echo "<p>Generated at: $(date -u)</p>" >> /work-dir/index.html
      echo "</body></html>" >> /work-dir/index.html
      echo "Init container finished."
    volumeMounts:
    - name: web-content
      mountPath: /work-dir
  containers:
  - name: nginx
    image: nginxinc/nginx-unprivileged:1.27
    ports:
    - containerPort: 8080
    resources:
      requests:
        cpu: "100m"
        memory: "64Mi"
      limits:
        cpu: "200m"
        memory: "128Mi"
    volumeMounts:
    - name: web-content
      mountPath: /usr/share/nginx/html
      readOnly: true
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: false
  - name: content-refresher
    image: busybox:1.36
    securityContext:
      allowPrivilegeEscalation: false
    command:
    - "sh"
    - "-c"
    - |
      while true; do
        echo "Updating timestamp..."
        echo "<html><body>" > /work-dir/index.html
        echo "<h1>Hello from the sidecar!</h1>" >> /work-dir/index.html
        echo "<p>Pod: $(hostname)</p>" >> /work-dir/index.html
        echo "<p>Last updated: $(date -u)</p>" >> /work-dir/index.html
        echo "</body></html>" >> /work-dir/index.html
        sleep 10
      done
    resources:
      requests:
        cpu: "50m"
        memory: "32Mi"
      limits:
        cpu: "100m"
        memory: "64Mi"
    volumeMounts:
    - name: web-content
      mountPath: /work-dir
  volumes:
  - name: web-content
    emptyDir: {}

Step 2: Apply and Observe

# Create the Pod
kubectl apply -f exercise-pod.yaml

# Watch the Pod go through init -> running
kubectl get pod pod-exercise -w

# Check the init container logs
kubectl logs pod-exercise -c create-index

# Check the sidecar container logs
kubectl logs pod-exercise -c content-refresher

# Port-forward to see the web page
kubectl port-forward pod/pod-exercise 8080:8080

# In another terminal (or browser): curl http://localhost:8080
# Wait 10 seconds and curl again — the timestamp changes (sidecar is updating it)

Step 3: Observe Graceful Termination

# Delete the Pod and watch the termination sequence
kubectl delete pod pod-exercise &
kubectl get pod pod-exercise -w

# You should see: Running -> Terminating -> (gone)
# The 15s terminationGracePeriodSeconds gives containers time to shut down

What to Observe

  • The init container (create-index) runs first and creates the HTML file. Only after it exits successfully do the app containers start.
  • The Nginx container serves the HTML file from the shared volume (read-only mount).
  • The sidecar (content-refresher) overwrites the file every 10 seconds with a fresh timestamp.
  • Both app containers mount the same emptyDir volume at different paths.
  • Resource requests and limits are set on every container.
  • The Pod runs as non-root (UID 1000) with privilege escalation disabled.


What's Next?

Now that you understand Pods, continue building your Kubernetes knowledge:

  • Probes (Health Checks) — Configure liveness, readiness, and startup probes to let Kubernetes monitor your application's health.
  • Deployments — Manage Pods at scale with rolling updates, rollbacks, and declarative scaling.
  • Services — Give your Pods a stable network identity and load-balance traffic across replicas.
  • Storage (PV & PVC) — Persist data beyond the Pod lifecycle with PersistentVolumes.
  • Resources (HPA) — Deep dive into resource requests, limits, QoS classes, and horizontal autoscaling.
  • Scheduling & Affinity — Control which Nodes your Pods land on with node affinity, taints, and tolerations.
  • Pod Security — Enforce security standards across your cluster with Pod Security Admission.