Project Lab 03: Zero-Downtime Rolling Update and Service Handoff
Achieving true zero-downtime deployments is a core requirement for production workloads. This lab demonstrates how to configure a Deployment to roll out an update without disruption, controlling the speed and safety of the rollout with the maxSurge and maxUnavailable strategy settings and a Readiness Probe.
Reference Material: docs/02-workloads-services/1-pods-rc-rs-deployment.md
1. OBJECTIVE: ROLLING UPDATE WITH 100% AVAILABILITY
The goal is to update a running application from version v1.0.0 to v2.0.0 while maintaining the full desired replica count, ensuring no user ever receives an error.
- Key Setting: We will use maxUnavailable: 0 to ensure no old Pod is terminated until its replacement is fully Ready.
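The availability guarantee follows from simple arithmetic: during the rollout the Deployment always keeps between replicas − maxUnavailable and replicas + maxSurge Pods. A quick sanity check with the values this lab uses (replicas=4 comes from the manifest in section 2.2):

```shell
# Rollout capacity bounds implied by the strategy settings in this lab
replicas=4; maxUnavailable=0; maxSurge=1

echo "minimum Ready pods: $(( replicas - maxUnavailable ))"   # prints "minimum Ready pods: 4"
echo "maximum total pods: $(( replicas + maxSurge ))"         # prints "maximum total pods: 5"
```

Because the minimum never drops below the desired count of 4, full capacity is maintained throughout the rollout.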
2. LAB SETUP: THE VERSIONED SERVICE
We use a simple NGINX deployment that exposes its application version via a custom header.
2.1 The Application Image (Conceptual)
We need two versions of an image (v1.0.0 and v2.0.0) that take time to start.
| Image Tag | Application State | Readiness Probe Path |
|---|---|---|
| v1.0.0 (Old) | Returns {"version": "v1.0.0"} | /ready returns 200 OK |
| v2.0.0 (New) | Returns {"version": "v2.0.0"} | /ready returns 200 OK only after ~30 seconds |
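One common way to implement the 30-second readiness delay is a flag file that the /ready handler serves only once startup work has finished. A minimal sketch of the pattern (compressed to a 1-second delay so it runs quickly; the flag path and delay are illustrative, not from the lab images):

```shell
# Simulate delayed readiness: a background task "finishes startup"
# after a delay, and only then creates the readiness flag file.
READY_FLAG=$(mktemp -u)
(sleep 1 && touch "$READY_FLAG") &

# Probe immediately after start: the flag does not exist yet.
[ -f "$READY_FLAG" ] && echo "ready" || echo "not ready"   # prints "not ready"

sleep 2   # wait past the simulated startup delay

# Probe again: startup has completed, so the flag exists.
[ -f "$READY_FLAG" ] && echo "ready" || echo "not ready"   # prints "ready"
rm -f "$READY_FLAG"
```

In the real image, the /ready HTTP handler would return 200 only when the flag (or an equivalent internal condition) is present.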
2.2 Initial Deployment Manifest (v1-initial.yaml)
We start with 4 replicas and use preferred Pod anti-affinity to spread them across nodes (a best practice; "preferred" means the scheduler tries to spread Pods but will still schedule them if it cannot).
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-router
  labels:
    app: api-router
    version: v1
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0   # CRITICAL: no Pods below the desired count at any time
      maxSurge: 1         # CRITICAL: allow 1 extra Pod during the rollout (4+1 = 5 max Pods)
  selector:
    matchLabels:
      app: api-router
  template:
    metadata:
      labels:
        app: api-router
        version: v1
    spec:
      # Spread replicas across different nodes
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: api-router
              topologyKey: "kubernetes.io/hostname"
      containers:
      - name: router-app
        image: custom-registry/router-app:v1.0.0   # old version
        ports:
        - containerPort: 8080
        resources:
          limits:
            cpu: "250m"
        readinessProbe:   # the key to a safe rollout
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
          failureThreshold: 3
```
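The probe values translate into rough timing bounds worth keeping in mind (a sketch; exact timing also depends on probe timeouts and kubelet scheduling):

```shell
# Probe timing implied by the readinessProbe settings above
initialDelaySeconds=5; periodSeconds=5; failureThreshold=3

# Seconds from container start until the probe has run and failed
# failureThreshold consecutive times (a Pod that never becomes ready):
echo $(( initialDelaySeconds + periodSeconds * failureThreshold ))   # prints 20

# Seconds for a previously Ready Pod that starts failing to be
# marked NotReady and pulled from the Service endpoints:
echo $(( periodSeconds * failureThreshold ))                         # prints 15
```

Since the v2.0.0 image takes ~30 seconds to report ready, expect each new Pod to sit NotReady for several probe periods before the rollout advances.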
2.3 Service Manifest (service.yaml)
The service only selects on the generic app: api-router label.
```yaml
apiVersion: v1
kind: Service
metadata:
  name: api-svc
spec:
  type: ClusterIP
  selector:
    app: api-router   # will target both v1 and v2 Pods during the rollout
  ports:
  - port: 80
    targetPort: 8080
```
3. EXECUTION AND AUDIT
3.1 Initial Deployment
Apply the initial manifests.
```shell
kubectl apply -f v1-initial.yaml
kubectl apply -f service.yaml

kubectl get pods -l app=api-router
kubectl get endpoints api-svc
```
Expected State: 4 Pods (v1 only), Service Endpoints = 4 IPs.
3.2 The Upgrade Command
Update the Pod template's image to the new version (v2.0.0) and change its version label to match. (The image change alone is enough to trigger a new ReplicaSet; the label change keeps the version label accurate.) Note that each command below triggers its own rollout; to apply both changes as a single rollout, run `kubectl rollout pause deployment/api-router` first and `kubectl rollout resume deployment/api-router` afterwards.

```shell
kubectl set image deployment/api-router router-app=custom-registry/router-app:v2.0.0
kubectl patch deployment api-router -p '{"spec":{"template":{"metadata":{"labels":{"version":"v2"}}}}}'
```
3.3 The Rolling Update Audit (The Core Test)
Use kubectl rollout status and watch the ReplicaSets in parallel.
```shell
# Terminal 1: watch the rollout status
kubectl rollout status deployment/api-router -w

# Terminal 2: watch the Pods and their readiness
kubectl get pods -l app=api-router -L version -w

# Terminal 3: the load test (the zero-downtime proof).
# The api-svc name only resolves inside the cluster, so run the loop
# from a Pod (e.g. an existing debug Pod):
kubectl exec -it debug-pod -- sh -c 'while true; do curl -s http://api-svc/version; sleep 0.1; done'
```
Expected Observation in Terminal 2 (The Rollout Sequence):
| Phase | Old RS (v1) | New RS (v2) | Status Change | Endpoints |
|---|---|---|---|---|
| Start | 4 | 0 | All Running/Ready | 4 IPs |
| Surge | 4 | 1 | New v2 Pod starts → Running | 4 IPs (v2 not Ready yet) |
| Ready | 4 | 1 | New v2 Pod → Ready (after ~30s) | 5 IPs |
| Drain | 3 | 1 | Old v1 Pod → Terminating | 4 IPs |
| Continue | 0 | 4 | Process repeats until completion | 4 IPs |
Expected Observation in Terminal 3 (Load Test):
The output should continuously show version strings (first v1.0.0 only, then a mix, finally v2.0.0 only) without any connection errors or empty responses. The Service guarantees this by routing traffic only to the 4 or 5 Pods that report Ready.
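To make the proof concrete, redirect the Terminal 3 loop's output to a file and count any lines that are not a version response. A sketch, assuming the output was captured to /tmp/loadtest.log (the sample lines below are stand-ins for real captured output):

```shell
# Hypothetical captured output from the curl loop
# (normally produced by redirecting the loop: ... > /tmp/loadtest.log)
cat > /tmp/loadtest.log <<'EOF'
{"version": "v1.0.0"}
{"version": "v1.0.0"}
{"version": "v2.0.0"}
EOF

# Count lines that are NOT a version response (errors, empty bodies).
# A zero-downtime rollout should report 0 here.
grep -vc '"version"' /tmp/loadtest.log || true   # prints 0
```

A nonzero count pinpoints exactly how many requests were dropped during the rollout.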
4. TROUBLESHOOTING AND NINJA COMMANDS
4.1 Checking Endpoint Stalling
If the rollout stalls, the Readiness Probe on the new Pods is likely failing, or the cluster cannot schedule the surge Pod (for example, insufficient capacity on the nodes).
```shell
# Check the Service endpoints (the count should never drop below 4)
kubectl get endpoints api-svc -o jsonpath='{.subsets[*].addresses[*].ip}'

# Or count them directly:
kubectl get endpoints api-svc -o jsonpath='{.subsets[*].addresses[*].ip}' | wc -w
```
4.2 Force a Failed Rollout Investigation
If the new Pods are stuck in CrashLoopBackOff, the Deployment stays in Progressing: True until progressDeadlineSeconds expires (default 600 seconds), at which point the Progressing condition flips to False with reason ProgressDeadlineExceeded. Kubernetes does not roll back automatically; you must intervene.
```shell
# Check the internal progress of the rollout
kubectl rollout history deployment api-router
kubectl rollout status deployment api-router
```
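To surface a failed rollout sooner, the Deployment's progressDeadlineSeconds can be lowered from its 600-second default. A sketch of the relevant spec fragment (the 120-second value is an illustrative choice, not taken from this lab):

```yaml
# Fragment of the Deployment spec: report rollout failure sooner.
# Note: Kubernetes only flags the Deployment as failed; it does NOT
# roll back automatically.
spec:
  progressDeadlineSeconds: 120   # default is 600 seconds
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
```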
Fix: Roll back immediately.

```shell
kubectl rollout undo deployment/api-router
```