Pod Lifecycle: Termination, Restarts, and Pull Policies
1. POD TERMINATION: THE GRACEFUL RACE
When a Pod is marked for deletion, Kubernetes initiates two parallel workflows. A common production failure is not accounting for the race condition between these two.
1.1 The Internal Sequence
- State Change: The Pod's `metadata.deletionTimestamp` is set. Status becomes `Terminating`.
- Workflow A (Networking): The Endpoint and EndpointSlice controllers observe the deletion and remove the Pod's IP from all Endpoints/EndpointSlices.
- Workflow B (Node-Level):
  - preStop Hook: If defined, the Kubelet executes the `preStop` hook synchronously.
  - SIGTERM: The Kubelet sends `SIGTERM` (Signal 15) to PID 1 inside each container.
  - Grace Period: The Kubelet waits up to `terminationGracePeriodSeconds` (default 30s).
  - SIGKILL: If containers are still running, the Kubelet sends `SIGKILL` (Signal 9).
1.2 The Race Condition Pitfall
The Problem: Workflow A (removing IPs from Load Balancers) and Workflow B (killing the process) happen simultaneously. In highly distributed clusters, iptables/IPVS updates can take several seconds to propagate to all nodes.
The Result: A Load Balancer might send a request to a Pod that has already received SIGTERM and closed its listener, resulting in a 502 Bad Gateway.
The Bible-Grade Solution: Use a `preStop` hook to delay the `SIGTERM`, giving the network layer time to finish propagation.
```yaml
spec:
  terminationGracePeriodSeconds: 60
  containers:
  - name: app
    image: my-app:v1
    lifecycle:
      preStop:
        exec:
          # Wait 10 seconds for Endpoints to propagate before SIGTERM is sent
          command: ["/bin/sh", "-c", "sleep 10"]
```
2. RESTART POLICY & EXPONENTIAL BACKOFF
The restartPolicy (defined at spec.restartPolicy) determines how the Kubelet reacts to a container exit.
| Policy | Exit Code 0 (Success) | Exit Code >0 (Error) | Internal Logic |
|---|---|---|---|
| `Always` | Restart | Restart | Standard for Deployments/StatefulSets. |
| `OnFailure` | Do not restart | Restart | Standard for Jobs/CronJobs. |
| `Never` | Do not restart | Do not restart | One-off tasks/debug sessions. |
2.1 The Backoff Algorithm
To prevent a "Hot Loop" (draining CPU/Logs by restarting a crashing container thousands of times per second), Kubelet implements an Exponential Backoff.
- Initial delay: 10 seconds.
- Multiplier: 2x per subsequent failure.
- Maximum delay: 300 seconds (5 minutes).
- Reset: If a container runs successfully for 10 minutes, the Kubelet resets the backoff timer.
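The delay schedule above can be sketched as a one-liner (parameters mirror the values stated in the list; the function name is ours, not the Kubelet's):

```python
def backoff_delays(failures, base=10, cap=300):
    """Delay before the n-th restart: base * 2**(n-1) seconds, capped."""
    return [min(base * 2 ** (n - 1), cap) for n in range(1, failures + 1)]

print(backoff_delays(7))  # → [10, 20, 40, 80, 160, 300, 300]
```

Note the practical consequence: after about five consecutive crashes, every further restart attempt is 5 minutes apart, which is why a `CrashLoopBackOff` Pod can look "stuck".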
3. IMAGE PULL POLICY: MECHANICS & CACHING
Pulling images is often the slowest part of the Pod lifecycle.
| Policy | Technical Behavior |
|---|---|
| `Always` | Kubelet queries the container registry for the digest (SHA256). If the local digest doesn't match the remote, it pulls the image. |
| `IfNotPresent` | Kubelet checks the local node cache. If the tag exists locally, it skips the registry check entirely. |
| `Never` | Kubelet assumes the image is pre-loaded on the node (e.g., via AMI or specialized disk). Fails with `ErrImageNeverPull` if the image is missing. |
3.1 The :latest Trap
If you use the `:latest` tag (or no tag) and do not set `imagePullPolicy` explicitly, Kubernetes defaults the pull policy to `Always`. This introduces a dependency on the registry for every single Pod restart/scale event.
Production Requirement: Always use specific semantic versions (e.g., v1.2.3) to ensure IfNotPresent works as intended and to guarantee deterministic rollbacks.
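As a concrete sketch (image name and registry are hypothetical), pinning the tag and stating the policy explicitly makes the caching behavior visible in the manifest:

```yaml
containers:
- name: app
  image: registry.example.com/my-app:v1.2.3  # immutable tag, never :latest
  imagePullPolicy: IfNotPresent              # serve from the node cache when possible
```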
4. THE POD STATE MACHINE (Phases vs. Conditions)
A Pod's "Phase" is a high-level summary. For real debugging, you must look at Conditions.
4.1 Pod Phases
- Pending: The API Server accepted the Pod, but it hasn't been scheduled or the image is still pulling.
- Running: All containers are created; at least one is still running.
- Succeeded/Failed: Terminal states (Job completed or crashed).
- Unknown: Kubelet is unresponsive (Node loss).
4.2 Pod Conditions (The "Truth")
Conditions provide the "Why" behind the "Phase."
```shell
kubectl get pod <name> -o jsonpath='{.status.conditions[*]}'
```
| Condition | Meaning |
|---|---|
| `PodScheduled` | The Scheduler has assigned a Node. |
| `Initialized` | All Init Containers have finished successfully. |
| `ContainersReady` | All application containers have passed their Readiness Probes. |
| `Ready` | The Pod can serve traffic and is included in Service endpoints (i.e., behind the Load Balancer). |
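When debugging, the signal you want is usually "which conditions are not `True`, and why". A small sketch of that filter (the pod status JSON here is a hypothetical sample, shaped like the `conditions` array `kubectl get pod -o json` returns):

```python
import json

# Hypothetical sample of a Pod's .status.conditions array.
pod_status = json.loads("""
{
  "conditions": [
    {"type": "PodScheduled", "status": "True"},
    {"type": "Initialized", "status": "True"},
    {"type": "ContainersReady", "status": "False", "reason": "ContainersNotReady"},
    {"type": "Ready", "status": "False", "reason": "ContainersNotReady"}
  ]
}
""")

# Surface only the failing conditions -- the "why" behind the phase.
failing = [c for c in pod_status["conditions"] if c["status"] != "True"]
for c in failing:
    print(c["type"], "->", c.get("reason", "unknown"))
```

Here the output would point straight at a Readiness Probe problem (`ContainersReady` false), rather than leaving you staring at a generic `Running` phase.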
5. TROUBLESHOOTING & ARCHITECT COMMANDS
5.1 Inspecting Termination Reasons
If a Pod vanished or was killed, check the Last State.
```shell
kubectl get pod <pod-name> -o json | jq '.status.containerStatuses[0].lastState'
```
Sample Output (OOM):
```json
{
  "terminated": {
    "exitCode": 137,
    "reason": "OOMKilled",
    "startedAt": "2023-10-27T10:00:00Z",
    "finishedAt": "2023-10-27T10:05:00Z"
  }
}
```
5.2 Common Exit Codes for Senior Engineers
- 0: Graceful exit.
- 1: Application-level crash (check logs).
- 137: `SIGKILL` (likely OOMKilled or grace period expired).
- 139: Segmentation fault (`SIGSEGV`).
- 143: `SIGTERM` (normal shutdown).
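The pattern behind these numbers: an exit code above 128 encodes the fatal signal as `128 + signum`. A tiny decoder sketch (function name is ours):

```python
def decode_exit_code(code):
    """Container exit codes above 128 encode a fatal signal: 128 + signum."""
    names = {9: "SIGKILL", 11: "SIGSEGV", 15: "SIGTERM"}
    if code > 128:
        sig = code - 128
        return names.get(sig, f"signal {sig}")
    return "application exit"

print(decode_exit_code(137))  # → SIGKILL  (128 + 9)
print(decode_exit_code(139))  # → SIGSEGV  (128 + 11)
print(decode_exit_code(143))  # → SIGTERM  (128 + 15)
```

So 137 vs. 143 tells you at a glance whether the container was killed forcibly (grace period expired, or the OOM killer) or shut down on request.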
6. PRODUCTION CHECKLIST
- Handle SIGTERM: Ensure your application's entrypoint (PID 1) is not a shell script that swallows signals. Use `exec my-binary` in your shell wrapper, or use an init tool like `tini`.
- Match Grace Period to App: If your Java app takes 45 seconds to flush a buffer, set `terminationGracePeriodSeconds: 60`.
- Use preStop for Propagation: In high-traffic environments, a `sleep 5` in the `preStop` hook is the simplest way to ensure zero downtime during rolling updates.
- Avoid Latest: Use immutable tags so `IfNotPresent` can serve from the node cache, preventing `ImagePullBackOff` during regional registry outages when the image is already cached.