Project Lab 03: Zero-Downtime Rolling Update and Service Handoff

Achieving true zero-downtime deployments is a core requirement for production workloads. This lab demonstrates how to configure a Deployment to perform an update in a non-disruptive, highly available manner, controlling the speed and safety of the rollout with maxUnavailable, maxSurge, and a Readiness Probe.

Reference Material: docs/02-workloads-services/1-pods-rc-rs-deployment.md


1. OBJECTIVE: ROLLING UPDATE WITH 100% AVAILABILITY

The goal is to update a running application from version v1.0.0 to v2.0.0 while maintaining the full desired replica count, ensuring no user ever receives an error.

  • Key Setting: We will use maxUnavailable: 0 to ensure no old pods are terminated until a new pod is fully Ready.

2. LAB SETUP: THE VERSIONED SERVICE

We use a simple NGINX deployment that exposes its application version via a custom header.

2.1 The Application Image (Conceptual)

We need two versions of an image (v1.0.0 and v2.0.0) that take time to start.

Image Tag       Application State               Readiness Probe Path
v1.0.0 (Old)    Returns {"version": "v1.0.0"}   /ready returns 200 OK
v2.0.0 (New)    Returns {"version": "v2.0.0"}   /ready returns 200 OK after 30 seconds
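One way to realize the 30-second readiness delay in the v2.0.0 image is an entrypoint script that creates the probe target only after a delay. This is a hypothetical sketch, not part of the lab images: the file path and the choice of serving /ready as a static NGINX file are assumptions.

#!/bin/sh
# Hypothetical entrypoint for custom-registry/router-app:v2.0.0.
# NGINX serves /ready as a static file; it is created only after 30s,
# so the readiness probe gets 404 (not Ready) until then.
rm -f /usr/share/nginx/html/ready
( sleep 30 && touch /usr/share/nginx/html/ready ) &
exec nginx -g 'daemon off;'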

2.2 Initial Deployment Manifest (v1-initial.yaml)

We start with 4 replicas and use preferred Pod Anti-Affinity to spread them across nodes (a best practice; because it is preferred rather than required, scheduling still succeeds on clusters with fewer nodes than replicas).

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-router
  labels:
    app: api-router
    version: v1
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0   # CRITICAL: No pods down at any time
      maxSurge: 1         # CRITICAL: Allow 1 extra pod during the rollout (4+1 = 5 max pods)
  selector:
    matchLabels:
      app: api-router
  template:
    metadata:
      labels:
        app: api-router
        version: v1
    spec:
      # Spread replicas across different nodes
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: api-router
              topologyKey: "kubernetes.io/hostname"
      containers:
      - name: router-app
        image: custom-registry/router-app:v1.0.0   # Old version
        ports:
        - containerPort: 8080
        resources:
          limits:
            cpu: "250m"
        readinessProbe:   # The key to a safe rollout
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
          failureThreshold: 3

2.3 Service Manifest (service.yaml)

The service only selects on the generic app: api-router label.

apiVersion: v1
kind: Service
metadata:
  name: api-svc
spec:
  type: ClusterIP
  selector:
    app: api-router   # Will target both v1 and v2 during the rollout
  ports:
  - port: 80
    targetPort: 8080

3. EXECUTION AND AUDIT

3.1 Initial Deployment

Apply the initial manifests.

kubectl apply -f v1-initial.yaml
kubectl apply -f service.yaml
kubectl get pods -l app=api-router
kubectl get endpoints api-svc

Expected State: 4 Pods (v1 only), Service Endpoints = 4 IPs.

3.2 The Upgrade Command

Update the Deployment's image to the new version (v2.0.0) and update the Pod template's version label so it stays accurate. Note that the image change alone is enough to trigger a new ReplicaSet; because the two commands below each modify the Pod template, they trigger two successive rollouts.

kubectl set image deployment/api-router router-app=custom-registry/router-app:v2.0.0
kubectl patch deployment api-router -p '{"spec":{"template":{"metadata":{"labels":{"version":"v2"}}}}}'
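If you prefer both template changes to roll out as a single new ReplicaSet rather than two back-to-back rollouts, one option (not required for this lab) is to pause the Deployment, make both edits, then resume:

kubectl rollout pause deployment/api-router
kubectl set image deployment/api-router router-app=custom-registry/router-app:v2.0.0
kubectl patch deployment api-router -p '{"spec":{"template":{"metadata":{"labels":{"version":"v2"}}}}}'
kubectl rollout resume deployment/api-router

While paused, template changes accumulate without triggering a rollout; the resume starts one combined rollout.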

3.3 The Rolling Update Audit (The Core Test)

Use kubectl rollout status and watch the ReplicaSets in parallel.

# Terminal 1: Watch the Rollout Status
kubectl rollout status deployment/api-router -w

# Terminal 2: Watch the Pods and their Readiness
kubectl get pods -l app=api-router -L version -w

# Terminal 3: The Load Test (The Zero-Downtime Proof)
# api-svc is a ClusterIP Service, so the loop must run inside the cluster,
# e.g. from a debug pod that has curl installed:
kubectl exec -it debug-pod -- sh -c 'while true; do curl -s http://api-svc/version; echo; sleep 0.1; done'

Expected Observation in Terminal 2 (The Rollout Sequence):

Phase      Old RS (v1)   New RS (v2)   Status Change                        Endpoints
Start      4             0             All Running/Ready                    4 IPs
Surge      4             1             New v2 Pod starts → Running          4 IPs (v2 Pod not Ready yet)
Ready      4             1             New v2 Pod → Ready (after 30s)       5 IPs
Drain      3             1             Old v1 Pod → Terminating             4 IPs
Continue   0             4             Process repeats until completion     4 IPs

Expected Observation in Terminal 3 (Load Test): The output should continuously show version strings (v1.0.0 or v2.0.0) without any connection errors or 502/503 responses. The service ensures that traffic is only sent to the 4 or 5 pods that report Ready.


4. TROUBLESHOOTING AND NINJA COMMANDS

4.1 Checking Endpoint Stalling

If the rollout stalls, the Readiness Probe on the new Pods is almost certainly failing: with maxUnavailable: 0, the Deployment will not terminate any old Pod until the surged new Pod reports Ready. (Note that setting both maxUnavailable and maxSurge to 0 is invalid and is rejected by the API server.)

# Check the status of the Service Endpoints (should never drop below 4)
kubectl get endpoints api-svc -o jsonpath='{.subsets[*].addresses[*].ip}'
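The bounds the Deployment controller enforces follow directly from the spec; a quick sanity check of what "should never drop below 4" means for this lab's settings:

```shell
# Rollout bounds implied by this lab's Deployment spec.
replicas=4
maxSurge=1         # absolute value from the manifest
maxUnavailable=0   # absolute value from the manifest

echo "max total pods during rollout: $((replicas + maxSurge))"
echo "min available pods during rollout: $((replicas - maxUnavailable))"
```

This prints a maximum of 5 total pods and a minimum of 4 available pods at any point in the rollout, matching the Endpoints counts observed in Terminal 2.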

4.2 Force a Failed Rollout Investigation

If the new Pods are stuck in CrashLoopBackOff, the rollout stalls: the old Pods keep serving (thanks to maxUnavailable: 0), but the Deployment never completes. After progressDeadlineSeconds (default 600 seconds) the Progressing condition is set to False with reason ProgressDeadlineExceeded; Kubernetes does not roll back automatically.

# Check the internal progress of the rollout
kubectl rollout history deployment api-router
kubectl rollout status deployment api-router
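To see why the rollout is stuck without scrolling through describe output, you can query the Progressing condition directly; on a stuck rollout past the progress deadline, the reason reads ProgressDeadlineExceeded:

kubectl get deployment api-router \
  -o jsonpath='{.status.conditions[?(@.type=="Progressing")].reason}{"\n"}'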

Fix: Rollback immediately.

kubectl rollout undo deployment/api-router