> [!note]
> ### Most engineers cargo-cult their way through rollouts, copy YAML from Stack Overflow, and pray. This ends today.

## 1. Introduction

Let's be honest about something the Kubernetes community rarely admits out loud: **most engineers deploying to production don't actually understand what happens when they run `kubectl apply`.**

They know it _works_ — until it doesn't. Until a rollout stalls at 2 out of 4 replicas at 2 AM and nobody can explain why. Until a canary silently poisons 10% of your users for 45 minutes because nobody configured a readiness probe correctly. Until `kubectl rollout undo` saves the night and the engineer who ran it has no idea why it worked, just relief that it did.

This is not a beginner problem. It is a **systemic gap** in how the industry learns Kubernetes. We teach people to write Deployment YAML. We do not teach them what the Deployment controller actually _does_ with it. We teach strategies like Blue-Green and Canary as abstract concepts. We rarely explain that they are nothing more than policies for orchestrating Pod replacement — and that if you don't understand Pod replacement, you don't understand any of it.

Here is the uncomfortable truth: **Kubernetes does not update your application. It kills it and starts it again.** Every deployment strategy in existence — from a simple rolling update to a sophisticated progressive delivery pipeline — is just a different choreography for that same act of controlled destruction and recreation.

The engineers who sleep through incidents while their colleagues are frantically typing in Slack at 3 AM are not smarter. They are not luckier. They simply internalized this model early, and every Kubernetes behavior they encounter now has a clear mechanical explanation.

This guide will give you that model.
We cover both the strategic layer — the patterns your organization uses to manage risk during releases — and the mechanical layer — the 15 ways Kubernetes actually terminates and recreates Pods under the hood. By the end, these two layers will collapse into one coherent picture.

- **Deployment strategies** — from Recreate to Progressive Delivery — including trade-offs, diagrams, and when each one will betray you if misapplied
- **Pod replacement mechanisms** — the full catalog of how, when, and why Kubernetes replaces Pods, including the ones nobody documents until they cause an outage

No cargo-culted YAML. No hand-waving. Just the mechanics.

## 2. The Fundamental Kubernetes Principle: Pod Immutability

### Why Pods Are Immutable

In Kubernetes, a Pod is the smallest deployable unit — a wrapper around one or more containers sharing a network namespace and storage volumes. Once a Pod is scheduled and running, its specification is effectively immutable. You cannot meaningfully change the container image, injected environment variables, or resource limits of a running Pod by patching the Pod object itself.

This is by design. Kubernetes enforces immutability at the Pod level for several architectural reasons:

**Consistency and predictability.** A running Pod represents a known, tested configuration. Mutating it in place introduces state drift — the running container would diverge from the image it was built from.

**Stateless scheduling model.** The scheduler places Pods based on resource availability at scheduling time. Mutating a running Pod bypasses the scheduler entirely, potentially causing resource accounting errors.

**Container runtime limitations.** Container runtimes (containerd, CRI-O) do not support hot-swapping the root filesystem of a running container. A new image requires a new container process.

**Auditability.** Immutability creates a clear audit trail. Each Pod generation corresponds to a specific image digest and ConfigMap revision. Mutations obscure this.

### Recreate-and-Replace Over In-Place Modification

When you update a Deployment's container image, Kubernetes does not SSH into the node and pull the new image into the running container. Instead:

1. The Deployment controller detects a spec change.
2. It creates a new ReplicaSet with the updated Pod template.
3. New Pods are scheduled and started from the new template.
4. Old Pods are gracefully terminated.

This recreate-and-replace pattern is the atomic unit of all Kubernetes update operations. Every deployment strategy — from Rolling Update to Canary — is simply a different policy for _orchestrating_ this replacement sequence.

```
┌─────────────────────────────────────────────────────────────┐
│                   POD IMMUTABILITY MODEL                    │
│                                                             │
│  ❌ WRONG (not how Kubernetes works)                        │
│     Running Pod ──── patch image ────▶ Updated Pod          │
│                                                             │
│  ✅ CORRECT (actual Kubernetes behavior)                    │
│     Old Pod ──── terminate ──▶ [deleted]                    │
│     New Pod ◀─── create ────── new ReplicaSet               │
└─────────────────────────────────────────────────────────────┘
```

---

## 3. Major Deployment Strategies Used in Cloud Infrastructure

### 3.1 Recreate

The simplest strategy. All existing Pods are terminated before new Pods are created.

```
v1 ── [Pod][Pod][Pod] ──STOP──▶ []
                                 ↓   DOWNTIME WINDOW
v2                              [Pod][Pod][Pod]
─────────────────────────────────────────────────▶ time
```

**Use case:** Non-production environments, batch jobs, or stateful workloads that cannot run two versions simultaneously (e.g., database schema migrations).

**Kubernetes config:**

```yaml
strategy:
  type: Recreate
```

---

### 3.2 Rolling Update

The default Kubernetes strategy. Pods are replaced incrementally — new Pods come up, old Pods go down, controlled by `maxSurge` and `maxUnavailable`.
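The arithmetic behind those two knobs can be sketched in a few lines of Python. This is not the real controller code, just the documented rounding rules (a percentage `maxSurge` rounds up, a percentage `maxUnavailable` rounds down) applied to compute the window of allowed Pod counts during a rollout:

```python
import math

def rollout_bounds(replicas, max_surge, max_unavailable):
    """Return (min_available, max_total) Pods during a rolling update.

    Integer values are used as-is; percentage strings are resolved
    against the desired replica count. Per the Kubernetes docs,
    maxSurge rounds up and maxUnavailable rounds down.
    """
    def resolve(value, round_up):
        if isinstance(value, str) and value.endswith("%"):
            raw = int(value[:-1]) / 100 * replicas
            return math.ceil(raw) if round_up else math.floor(raw)
        return value

    surge = resolve(max_surge, round_up=True)
    unavailable = resolve(max_unavailable, round_up=False)
    return replicas - unavailable, replicas + surge

# The defaults (25% / 25%) on a 4-replica Deployment:
print(rollout_bounds(4, "25%", "25%"))   # (3, 5)

# maxSurge: 1, maxUnavailable: 0 — a zero-downtime profile:
print(rollout_bounds(4, 1, 0))           # (4, 5)
```

With `maxUnavailable: 0` the controller may never drop below the desired count, which is why that setting requires spare cluster capacity for the surge Pod.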
```
Step 1: [v1][v1][v1][v1]
Step 2: [v2][v1][v1][v1]   ← 1 new, 1 old terminated
Step 3: [v2][v2][v1][v1]
Step 4: [v2][v2][v2][v1]
Step 5: [v2][v2][v2][v2]   ← complete
```

**Use case:** Stateless services where both versions can serve traffic simultaneously.

**Kubernetes config:**

```yaml
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1
    maxUnavailable: 0
```

---

### 3.3 Blue-Green Deployment

Two identical environments run in parallel — "Blue" (current production) and "Green" (new version). Traffic is switched atomically at the load balancer or Kubernetes Service level.

```
        ┌──────────────────────────────────────┐
        │        LOAD BALANCER / INGRESS       │
        └────────────┬─────────────────────────┘
                     │
          ┌──────────▼────────┐
          │  Service selector │
          └────┬──────────────┘
               │
    ┌──────────▼──────────┐    ┌──────────────────────┐
    │  BLUE (v1) - LIVE   │    │ GREEN (v2) - STANDBY │
    │   [Pod][Pod][Pod]   │    │   [Pod][Pod][Pod]    │
    └─────────────────────┘    └──────────────────────┘

    After verification: flip selector to GREEN
```

**Kubernetes implementation:** Change the Service's `selector` label.

```yaml
# Blue Service (before)
selector:
  app: myapp
  version: blue

# Green Service (after cutover)
selector:
  app: myapp
  version: green
```

**Use case:** High-stakes releases requiring instant rollback capability. Common in regulated industries.

---

### 3.4 Canary Deployment

A small percentage of traffic is routed to the new version while the majority continues to the stable version. Traffic is shifted progressively as confidence builds.

```
USERS (100%)
  │
  ├──── 90% ───▶ [v1][v1][v1][v1][v1]  (stable)
  │
  └──── 10% ───▶ [v2]                  (canary)

After validation:
  ├──── 50% ───▶ [v1][v1][v1]
  └──── 50% ───▶ [v2][v2][v2]

Final:
  └── 100% ───▶ [v2][v2][v2][v2][v2]
```

**Kubernetes implementation:** Two Deployments, one Service, controlled by replica ratio or an ingress controller (NGINX, Traefik) with weight annotations.
```yaml
# Canary ingress annotations (NGINX)
nginx.ingress.kubernetes.io/canary: "true"
nginx.ingress.kubernetes.io/canary-weight: "10"
```

**Use case:** High-traffic production services where you need real user validation before full rollout.

---

### 3.5 A/B Testing

Similar to canary, but traffic splitting is based on **user attributes** (headers, cookies, geography, user segment) rather than weight percentage.

```
Request with header X-Beta: true ──▶ [v2]  (variant B)
Request without header           ──▶ [v1]  (variant A)
```

**Use case:** Product teams testing UX changes, feature variants, or pricing experiments on specific user cohorts.

---

### 3.6 Shadow Deployment

Production traffic is mirrored to the shadow environment. The shadow service processes requests but its responses are discarded. No user impact.

```
User Request
  │
  ├──▶ [v1 - LIVE]   ──▶ Response returned to user
  │
  └──▶ [v2 - SHADOW] ──▶ Response discarded (logged only)
```

**Use case:** Validating new service behavior under real production load without any user-facing risk.

---

### 3.7 Traffic Mirroring

A variant of Shadow Deployment implemented at the infrastructure layer (Istio, Envoy, NGINX). The service mesh or proxy duplicates requests asynchronously.

```yaml
# Istio VirtualService mirroring example
mirror:
  host: myapp-v2
mirrorPercentage:
  value: 100.0
```

**Use case:** Performance testing, data validation, ML model comparison under real traffic.

---

### 3.8 Feature Flags

Deployment is decoupled from feature activation. New code ships to all Pods, but features are toggled at runtime via a flag service (LaunchDarkly, Unleash, Flagsmith).

```
All Pods run v2 code
  │
  ├── Flag: new_checkout=OFF ──▶ old checkout flow
  └── Flag: new_checkout=ON  ──▶ new checkout flow
```

**Use case:** Gradual feature rollout to user segments without infrastructure changes. Enables an instant kill-switch for problematic features.
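A minimal sketch of the percentage-rollout check such a flag service performs. The `flag_enabled` helper here is hypothetical (real SDKs like LaunchDarkly or Unleash expose a call with a similar shape); hashing the flag name together with the user ID makes the decision sticky per user:

```python
import hashlib

def flag_enabled(flag: str, user_id: str, rollout: dict) -> bool:
    """Return True if `user_id` falls inside the flag's rollout percentage.

    `rollout` maps flag name -> percentage (0-100). A stable hash of
    (flag, user) picks a bucket from 0-99, so the same user gets the
    same answer until the percentage changes.
    """
    pct = rollout.get(flag, 0)  # unknown flags default to off
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < pct

rollout = {"new_checkout": 10}  # 10% of users see the new flow
user_sees_new_flow = flag_enabled("new_checkout", "user-42", rollout)
```

Because the decision is made per request and in process, setting the percentage to 0 acts as the kill-switch: no Pods are replaced, and traffic simply falls back to the old code path.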
---

### 3.9 Dark Launch

New features are deployed and execute in production, but results are hidden from users. Infrastructure and backend systems are exercised without UI exposure.

**Use case:** Pre-warming caches, validating backend integrations, testing third-party API performance.

---

### 3.10 Progressive Delivery

An umbrella strategy combining canary, automated metrics analysis, and automated promotion/rollback. Tools like Argo Rollouts and Flagger implement this natively in Kubernetes.

```
Deploy Canary (10%)
        │
        ▼
Automated Analysis (error rate, latency, custom metrics)
        │
        ├── Metrics OK  ──▶ Promote to 30% ──▶ 60% ──▶ 100%
        │
        └── Metrics BAD ──▶ Automatic Rollback
```

**Kubernetes tooling:** Argo Rollouts (`Rollout` CRD), Flagger with Prometheus/Datadog integration.

---

### 3.11 Strategy Comparison Table

|Strategy|Downtime Risk|Rollback Speed|Operational Complexity|Infrastructure Cost|Real Traffic Validation|
|---|---|---|---|---|---|
|Recreate|High|Fast|Low|Low|No|
|Rolling Update|None|Medium|Low|Low|Partial|
|Blue-Green|None|Instant|Medium|High (2x infra)|No|
|Canary|None|Fast|Medium|Low-Medium|Yes|
|A/B Testing|None|Fast|High|Medium|Yes|
|Shadow|None|N/A|High|High (2x traffic)|Yes (passive)|
|Traffic Mirroring|None|N/A|High|Medium-High|Yes (passive)|
|Feature Flags|None|Instant|Medium|Low|Yes|
|Dark Launch|None|Instant|Medium|Low|Partial|
|Progressive Delivery|None|Automatic|High|Medium|Yes|

---

## 4. How Kubernetes Actually Updates Applications

Kubernetes workload controllers are responsible for managing Pod lifecycle. Each controller has a distinct update mechanism.

### 4.1 Deployment

The most common workload type for stateless applications. A Deployment manages ReplicaSets, not Pods directly. When you update a Deployment spec:

1. A new ReplicaSet is created with the updated Pod template hash.
2. The Deployment controller scales up the new RS and scales down the old RS according to the rollout strategy (`RollingUpdate` or `Recreate`).
3. Old ReplicaSets are retained (by default, the last 10) to enable rollback via `kubectl rollout undo`.

### 4.2 ReplicaSet

A ReplicaSet maintains a stable number of Pod replicas. It does **not** manage rolling updates — that is the Deployment's responsibility. When a ReplicaSet's Pod template is changed directly, it only affects new Pods; existing Pods are not replaced until they fail or are manually deleted.

### 4.3 StatefulSet

For stateful applications (databases, message brokers, distributed systems). Key differences from Deployment:

- Pods have stable, persistent identities (`pod-0`, `pod-1`, `pod-2`).
- Updates proceed in **reverse ordinal order** (`pod-2` → `pod-1` → `pod-0`).
- Supports `RollingUpdate` and `OnDelete` strategies.
- The `partition` parameter allows staged rollouts — only Pods with an ordinal ≥ the partition value are updated.

```yaml
updateStrategy:
  type: RollingUpdate
  rollingUpdate:
    partition: 2   # Only pod-2 and above are updated
```

### 4.4 DaemonSet

Ensures one Pod runs on every (or selected) node. Common for node-level agents: log collectors, monitoring exporters, CNI plugins.

- `RollingUpdate`: Replaces Pods one node at a time, controlled by `maxUnavailable`.
- `OnDelete`: Pods are only replaced when manually deleted (for tight control during maintenance).

### 4.5 Job

Runs one-off batch workloads to completion. Pods created by Jobs are replaced if they fail, based on `restartPolicy` and `backoffLimit`. Not designed for rolling updates — a new Job manifest creates a new Job object.

### 4.6 CronJob

Manages time-based Job execution. Each scheduled trigger creates a new Job object, which creates new Pods. `concurrencyPolicy` (`Allow`, `Forbid`, `Replace`) controls behavior when a Job is still running at the next trigger time.

---

## 5. 15 Ways Kubernetes Replaces or Recreates Pods

|#|Mechanism|Controller|Trigger|New Pod Created|Old Pod Terminated|
|---|---|---|---|---|---|
|1|Deployment RollingUpdate|Deployment|Image/spec change|Yes (new RS)|Yes (gradually)|
|2|Deployment Recreate|Deployment|Image/spec change|Yes (after all old Pods terminated)|Yes (all at once)|
|3|ReplicaSet replacement|ReplicaSet|Pod count drift|Yes|Only if over desired count|
|4|StatefulSet RollingUpdate|StatefulSet|Image/spec change|Yes (reverse ordinal)|Yes (one at a time)|
|5|StatefulSet OnDelete|StatefulSet|Manual Pod deletion|Yes|Only on manual delete|
|6|DaemonSet RollingUpdate|DaemonSet|Image/spec change|Yes (per node)|Yes (controlled by maxUnavailable)|
|7|DaemonSet OnDelete|DaemonSet|Manual Pod deletion|Yes|Only on manual delete|
|8|Node drain|Node/kubelet|`kubectl drain`|Yes (rescheduled)|Yes (evicted)|
|9|Pod eviction|kubelet/API|Resource pressure, PodDisruptionBudget|Yes (if controller-managed)|Yes|
|10|Horizontal Pod Autoscaler|HPA|CPU/memory/custom metric threshold|Yes (scale out)|Yes (scale in)|
|11|Vertical Pod Autoscaler|VPA|Resource recommendation change|Yes (recreated with new limits)|Yes|
|12|Image update (manual)|Deployment|`kubectl set image`|Yes|Yes (per strategy)|
|13|Manual rollout restart|Deployment/DS/SS|`kubectl rollout restart`|Yes|Yes (per strategy)|
|14|Job restart|Job|Pod failure + backoffLimit|Yes|Previous failed Pod remains|
|15|CronJob execution|CronJob|Schedule trigger|Yes (new Job+Pod)|Previous Job Pods cleaned up|

### Key Mechanisms Explained

**Node drain (`kubectl drain`):** Marks a node as unschedulable (`cordon`) and evicts all Pods using the Eviction API. Controller-managed Pods are rescheduled to other nodes. PodDisruptionBudgets are respected during eviction.

**Pod eviction by kubelet:** When a node experiences memory or disk pressure, the kubelet evicts Pods based on QoS class. `BestEffort` Pods are evicted first, then `Burstable`, then `Guaranteed`.
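That QoS ordering can be made concrete with a small sketch. This models only the QoS tier; the real kubelet eviction manager additionally ranks Pods by how far their usage exceeds their requests and by Pod priority:

```python
# Eviction preference under node pressure: BestEffort first,
# then Burstable, then Guaranteed (as described above).
QOS_RANK = {"BestEffort": 0, "Burstable": 1, "Guaranteed": 2}

def eviction_candidates(pods):
    """Order (name, qos_class) pairs by how early the kubelet would
    consider evicting them. QoS tier only -- a simplification."""
    return sorted(pods, key=lambda pod: QOS_RANK[pod[1]])

pods = [
    ("postgres", "Guaranteed"),   # requests == limits
    ("api", "Burstable"),         # requests < limits
    ("batch-job", "BestEffort"),  # no requests or limits
]
print(eviction_candidates(pods))
# [('batch-job', 'BestEffort'), ('api', 'Burstable'), ('postgres', 'Guaranteed')]
```

The practical takeaway: setting requests equal to limits (`Guaranteed`) is the strongest protection a workload has against pressure-driven eviction.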
**Vertical Pod Autoscaler (VPA):** VPA in `Auto` mode terminates Pods and recreates them with updated resource requests. This is one of the more disruptive automatic mechanisms since it does not respect rolling update constraints by default.

**Manual rollout restart:**

```bash
kubectl rollout restart deployment/myapp
```

Injects a `kubectl.kubernetes.io/restartedAt` annotation into the Pod template, triggering a rolling replacement of all Pods even when the image has not changed. Useful for picking up updated Secrets or ConfigMaps mounted as volumes.

---

## 6. What Actually Triggers Pod Replacement

### Container Image Updates

The most common trigger. When the `image` field in a Pod template changes (detected by the Deployment controller via template hash comparison), a rollout begins.

```bash
kubectl set image deployment/myapp app=registry.example.com/myapp:v2.1.0
```

### Configuration Changes

Changes to `env`, `envFrom`, `args`, or `command` fields in the Pod template modify the template hash and trigger a rollout.

### Secret and ConfigMap Updates

**Important nuance:** Updating a Secret or ConfigMap object does **not** automatically trigger Pod replacement if the values are mounted as volumes or environment variables. The Pods must be restarted manually:

```bash
kubectl rollout restart deployment/myapp
```

Exception: If you reference a versioned Secret name (e.g., `myapp-config-v3`) and update the Deployment to reference the new name, that spec change triggers a rollout.

### Resource Limit Changes

Modifying `resources.requests` or `resources.limits` in the Pod template constitutes a spec change and triggers a rolling update.

### Node Failures

When a node becomes `NotReady`, the node lifecycle controller marks Pods on that node as `Unknown`. After a timeout (default 5 minutes), these Pods are forcibly deleted and rescheduled by the relevant controller.

### Scaling Events

HPA scale-out creates new Pods from the existing template. Scale-in terminates Pods according to the ReplicaSet down-scaling heuristics: not-yet-ready Pods go first, then Pods on nodes running more replicas of the same set, then the most recently created Pods.

### Summary Table: Triggers and Their Rollout Impact

|Trigger|Automatic Rollout|Manual Action Required|Notes|
|---|---|---|---|
|Image tag change|Yes|No|Via CI/CD or `kubectl set image`|
|Env var change|Yes|No|Pod template hash changes|
|ConfigMap value change|No|`rollout restart`|Volume-mounted values updated in ~60s without restart|
|Secret value change|No|`rollout restart`|Same as ConfigMap|
|Resource limits change|Yes|No|Template spec change|
|Node failure|Yes (automatic)|No|After `node-monitor-grace-period`|
|HPA scale event|Yes (automatic)|No|Based on metric thresholds|
|VPA recommendation|Yes (in Auto mode)|No|Disruptive, Pod terminated|
|Manual restart|Yes|`rollout restart`|Injects annotation|

---

## 7. Real Kubernetes Rollout Workflow

The following describes the complete internal workflow from a developer's `git push` to running Pods in production.

```
COMPLETE ROLLOUT PIPELINE

Developer
    │ git push
    ▼
Git repo ──▶ CI system ──▶ Build & Test (lint, unit, int.)
    │ docker build + image push
    ▼
Container Image Registry (Harbor/ECR)
    │
    ▼
CD System (ArgoCD/Flux) updates manifest
    │ kubectl apply
    ▼
Deployment Controller  ◀── API Server
    │ creates
    ▼
New ReplicaSet (v2 template)
    │ creates
    ▼
New Pods (Pending)
    │ scheduler assigns
    ▼
Node (kubelet pulls image, starts container)
    │ readinessProbe passes
    ▼
Pod: Running, Ready: true

Old ReplicaSet Pods terminate in parallel
```

### Step-by-Step Breakdown

1. **Git commit** — Developer pushes code. GitOps tools (ArgoCD, Flux) detect drift between desired state (Git) and current state (cluster).
2. **CI pipeline** — Runs tests, builds the container image, tags it with the commit SHA or a semantic version.
3. **Image push** — Image is pushed to the container registry with an immutable tag (never use `:latest` in production).
4. **Manifest update** — CI/CD updates the image tag in the Kubernetes manifests (Helm values, Kustomize overlay, or raw YAML).
5. **`kubectl apply`** — The new Deployment spec is submitted to the API server.
6. **Deployment controller** — Detects the spec change via template hash and creates a new ReplicaSet.
7. **ReplicaSet controller** — Creates new Pods from the updated template. New Pods enter the `Pending` state.
8. **Scheduler** — Assigns Pods to nodes based on resource requests, affinity rules, and taints.
9. **kubelet** — Pulls the container image from the registry and creates containers via the container runtime (containerd/CRI-O).
10. **readinessProbe** — The Pod passes its readiness check, is marked `Ready: true`, and is added to the Service endpoints.
11. **Scale-down** — The Deployment controller scales down the old ReplicaSet, terminating old Pods gracefully via `SIGTERM`.

---

## 8. Observability and Debugging Rollouts

### Core Rollout Commands

**Monitor rollout progress in real time:**

```bash
kubectl rollout status deployment/myapp
# Output: Waiting for deployment "myapp" rollout to finish: 2 out of 4 new replicas have been updated...
# Output: deployment "myapp" successfully rolled out
```

**View rollout history:**

```bash
kubectl rollout history deployment/myapp
# REVISION  CHANGE-CAUSE
# 1         kubectl apply --filename=deploy.yaml
# 2         kubectl set image deployment/myapp app=myapp:v2

kubectl rollout history deployment/myapp --revision=2
```

**Rollback to previous revision:**

```bash
kubectl rollout undo deployment/myapp
kubectl rollout undo deployment/myapp --to-revision=1
```

**Inspect Deployment state:**

```bash
kubectl describe deployment myapp
# Shows: events, conditions, replica counts, image, strategy config
```

**List all ReplicaSets (including old ones):**

```bash
kubectl get rs -l app=myapp
# NAME            DESIRED  CURRENT  READY  AGE
# myapp-7d9c4f8b  4        4        4      10m   ← current
# myapp-6b8c3a7c  0        0        0      2d    ← previous (retained for rollback)
```

**Watch Pod transitions:**

```bash
kubectl get pods -l app=myapp -w
```

**View cluster events for a namespace:**

```bash
kubectl get events --sort-by='.metadata.creationTimestamp' -n production
```

**Check rollout with detailed Pod conditions:**

```bash
kubectl get pods -l app=myapp -o wide
kubectl describe pod myapp-7d9c4f8b-xk2pj
```

### Debugging a Stuck Rollout

A rollout stalls when new Pods cannot reach `Ready: true`. Common diagnostic sequence:

```bash
# 1. Check rollout status
kubectl rollout status deployment/myapp --timeout=2m

# 2. Identify which Pods are not Ready
kubectl get pods -l app=myapp

# 3. Describe the failing Pod
kubectl describe pod <failing-pod-name>
# Look for: Events section, Conditions, Container state

# 4. Check container logs
kubectl logs <failing-pod-name> --previous        # if crashed
kubectl logs <failing-pod-name> -c <container>

# 5. Check ReplicaSet events
kubectl describe rs <new-replicaset-name>

# 6. If stuck, pause the rollout while investigating
kubectl rollout pause deployment/myapp

# 7. Resume after the fix
kubectl rollout resume deployment/myapp
```

---

## 9. Production Failure Scenarios

### Scenario 1: CrashLoopBackOff During Rollout

**Symptom:** New Pods start but immediately crash. Rollout stalls. Old Pods remain running (if `maxUnavailable: 0`).

**Cause:** Application fails to start — missing environment variable, bad configuration, OOM at startup.

**Detection:**

```bash
kubectl get pods
# NAME                  READY  STATUS            RESTARTS
# myapp-7d9c4f8b-xk2pj  0/1    CrashLoopBackOff  5

kubectl logs myapp-7d9c4f8b-xk2pj --previous
```

**Resolution:**

```bash
# Roll back immediately
kubectl rollout undo deployment/myapp

# Fix root cause (missing env var, bad secret, etc.)
# Redeploy with fix
```

---

### Scenario 2: Failing readinessProbe

**Symptom:** Pods are running but never become `Ready`. Rollout stalls indefinitely.

**Cause:** The application starts but the readiness endpoint returns non-2xx. Misconfigured probe path or port, or the application takes longer to initialize than `initialDelaySeconds` allows.
**Detection:**

```bash
kubectl describe pod <pod-name>
# Events:
#   Warning  Unhealthy  readiness probe failed: HTTP probe failed with statuscode: 503
```

**Resolution options:**

- Increase `initialDelaySeconds` or `failureThreshold`
- Fix the application health endpoint
- Roll back if urgent

```yaml
readinessProbe:
  httpGet:
    path: /health/ready
    port: 8080
  initialDelaySeconds: 30   # Increased
  periodSeconds: 10
  failureThreshold: 6       # More tolerance
```

---

### Scenario 3: Insufficient Cluster Resources

**Symptom:** New Pods remain in the `Pending` state. Rollout stalls. No error on the Deployment, but Pods cannot be scheduled.

**Cause:** The cluster has insufficient CPU or memory to schedule new Pods, especially with `maxSurge > 0`.

**Detection:**

```bash
kubectl describe pod <pending-pod-name>
# Events:
#   Warning  FailedScheduling  0/5 nodes are available:
#   3 Insufficient memory, 2 node(s) had taint that pod didn't tolerate.

kubectl top nodes
```

**Resolution:**

- Scale up the node group (cluster autoscaler, manual)
- Reduce `resources.requests` if over-specified
- Temporarily set `maxSurge: 0` and `maxUnavailable: 1` to avoid needing extra capacity

---

### Scenario 4: Rollout Stuck Due to Unavailable Pods

**Symptom:** Rollout progress does not advance. The Deployment shows fewer ready replicas than desired.

**Cause:** A PodDisruptionBudget blocks eviction, a node is NotReady, or affinity/anti-affinity rules prevent scheduling.

**Detection:**

```bash
kubectl describe deployment myapp
# Conditions:
#   Available    True   MinimumReplicasAvailable
#   Progressing  True   ReplicaSetUpdated (but no progress)

kubectl get pdb -n production
# NAME   MIN AVAILABLE  MAX UNAVAILABLE  ALLOWED DISRUPTIONS
# myapp  3              N/A              0   ← PDB blocking eviction
```

**Resolution:**

- Verify the PDB configuration is not overly restrictive
- Check that nodes are healthy: `kubectl get nodes`
- Check for pending Pods blocked by affinity: `kubectl describe pod`

---

## 10. Best Practices for Safe Kubernetes Deployments

### 10.1 Always Define readinessProbe and livenessProbe

Without probes, Kubernetes considers a Pod ready immediately after container startup — before the application is actually serving traffic.

```yaml
readinessProbe:
  httpGet:
    path: /health/ready
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5
  failureThreshold: 3

livenessProbe:
  httpGet:
    path: /health/live
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
  failureThreshold: 3
```

**Rule of thumb:**

- `readinessProbe`: "Is the app ready to receive traffic?"
- `livenessProbe`: "Is the app alive? Should it be restarted?"

### 10.2 Configure maxSurge and maxUnavailable Appropriately

```yaml
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1         # Allow 1 extra Pod during rollout
    maxUnavailable: 0   # Never reduce available capacity
```

|Goal|maxSurge|maxUnavailable|Trade-off|
|---|---|---|---|
|Zero downtime|1+|0|Needs extra capacity|
|Fast rollout|25%|25%|Brief capacity reduction|
|Resource-constrained|0|1|Slower, no extra capacity|
|High availability|2|0|Higher cost during rollout|

### 10.3 Use PodDisruptionBudget

PDBs protect against simultaneous voluntary disruptions (node drains, cluster upgrades, HPA scale-in).

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb
spec:
  minAvailable: 2   # or use maxUnavailable: 1
  selector:
    matchLabels:
      app: myapp
```

**Important:** A PDB only applies to voluntary disruptions. Node failures are involuntary and are not blocked by PDBs.

### 10.4 Pin Image Tags — Never Use `:latest`

```yaml
# Bad
image: myapp:latest

# Good
image: registry.example.com/myapp:v2.1.0

# Better (immutable)
image: registry.example.com/myapp@sha256:a1b2c3d4...
```

Using `:latest` means different nodes may pull different image versions, creating non-deterministic cluster state.

### 10.5 Set Resource Requests and Limits

Without resource requests, the scheduler makes poor placement decisions. Without limits, a single Pod can starve its neighbors.

```yaml
resources:
  requests:
    cpu: "250m"
    memory: "256Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"
```

### 10.6 Monitor Rollout Metrics

Track these signals during and after every rollout:

|Metric|Tool|Alert Threshold|
|---|---|---|
|HTTP error rate (5xx)|Prometheus + Grafana|> 1% for 2 minutes|
|Pod restart count|Kubernetes metrics|> 2 restarts in 5 min|
|Rollout duration|Argo Rollouts / CI|> 2x baseline|
|Request latency (p99)|Prometheus|> 2x baseline|
|Ready Pod count|kube-state-metrics|< minAvailable|

### 10.7 Use Deployment Annotations for Change Tracking

```bash
kubectl annotate deployment/myapp \
  kubernetes.io/change-cause="Release v2.1.0: fix payment timeout bug"
```

This populates `CHANGE-CAUSE` in `kubectl rollout history`, giving engineers context when rolling back.

### 10.8 Test Rollbacks Regularly

Rollback capability is only valuable if it has been tested. Include rollback verification in your release runbooks and periodically drill the procedure in staging.

```bash
# Full rollback drill
kubectl rollout undo deployment/myapp
kubectl rollout status deployment/myapp
kubectl get pods -l app=myapp
```

---

## 11. Conclusion

Kubernetes deployment strategies and Pod replacement mechanisms are two sides of the same coin. Every strategy — from the simplest Recreate to the most sophisticated Progressive Delivery pipeline — is ultimately an orchestration policy built on top of a single primitive: **terminate old Pod, create new Pod**.

Understanding this relationship equips engineers to make better operational decisions:

- When you choose **Blue-Green**, you are choosing to maintain two full ReplicaSets simultaneously and switch Service selectors atomically.
- When you choose **Canary**, you are managing two concurrent Deployments and controlling the traffic split at the ingress layer.
- When you tune `maxSurge` and `maxUnavailable`, you are directly controlling the rate and capacity impact of Pod replacement.
- When you define a `readinessProbe`, you are telling Kubernetes when a new Pod is eligible to replace an old one in the Service endpoints.
- When you configure a `PodDisruptionBudget`, you are constraining how many Pods can be voluntarily replaced at any given time.

The engineers who build the most reliable Kubernetes platforms are those who internalize this model: the cluster is a distributed system that achieves application updates through controlled Pod replacement, and every configuration decision — probes, strategies, budgets, resource limits — shapes how safely and quickly that replacement happens.

Master the mechanisms, and the strategies follow naturally.

---

## Quick Reference

### Essential kubectl Commands for Rollout Management

```bash
# Deploy
kubectl apply -f deployment.yaml
kubectl set image deployment/myapp app=myapp:v2

# Monitor
kubectl rollout status deployment/myapp
kubectl get pods -l app=myapp -w
kubectl get rs -l app=myapp

# Debug
kubectl describe deployment myapp
kubectl describe pod <pod-name>
kubectl logs <pod-name> --previous
kubectl get events --sort-by=.metadata.creationTimestamp

# Control
kubectl rollout pause deployment/myapp
kubectl rollout resume deployment/myapp
kubectl rollout undo deployment/myapp
kubectl rollout undo deployment/myapp --to-revision=2

# History
kubectl rollout history deployment/myapp
kubectl annotate deployment/myapp kubernetes.io/change-cause="v2.1.0"

# Restart (force new Pods without image change)
kubectl rollout restart deployment/myapp
kubectl rollout restart daemonset/node-exporter
kubectl rollout restart statefulset/postgres
```

---

_Article maintained at [doc.thedevops.dev](https://doc.thedevops.dev/) | Last updated: March 2026_