Advanced Ninja Commands: The Forensics and Recovery Toolkit
Mastering Kubernetes operations means knowing not just the everyday commands but also how to perform deep forensics and disaster recovery. This section consolidates advanced JSONPath queries, etcd recovery and maintenance procedures, and other advanced operational topics.
1. JSONPATH MASTERCLASS: MINING THE API
kubectl's built-in JSONPath output (-o jsonpath='...') extracts precise fields from API objects without external tools such as jq, and it behaves the same in any shell. It is the workhorse for health checks, scripting, and auditing.
1.1 Core Syntax Reference
| Component | Syntax | Purpose |
|---|---|---|
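| Root Object | `$` | The root of the JSON document. kubectl adds it implicitly, so `{.metadata.name}` and `{$.metadata.name}` are equivalent. |
| Array Element | `[n]` | Selects the n-th element of an array (zero-based). |
| Negative Index | `[-1]` | Selects the last element (e.g., the last condition). |
| Array Slice | `[start:end]` | Selects elements from `start` up to, but not including, `end`. |
| Wildcard | `[*]` | Selects all elements in an array or object. |
| Current Object | `@` | Refers to the element being tested inside a filter. |
| Filter | `[?(<expression>)]` | Filters an array based on a condition (e.g., `[?(@.status=="True")]`). |
| Range | `{range .items[*]}{...}{end}` | Loops over a list for custom, line-by-line output formatting. |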
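The components compose: `{range}` loops over items, `[*]` walks an array, and `[?()]` picks the matching element. The sketch below, assuming bash and a kubectl context with read access to Pods, prints each Pod's Ready condition and then uses awk to keep the Pods that are not ready. `pod_readiness` and `not_ready_pods` are hypothetical helper names, not kubectl features.

```shell
#!/usr/bin/env bash
# Hypothetical helpers combining {range}, [*], and a [?()] filter in one query.

# Print "namespace<TAB>pod<TAB>Ready status" for every Pod; extra args go to kubectl.
pod_readiness() {
  # {range .items[*]} loops over Pods; [?(@.type=="Ready")] selects the
  # Ready entry from each Pod's status.conditions array.
  kubectl get pods "$@" -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}'
}

# Keep only Pods whose Ready status is not "True" (False, Unknown, or missing).
not_ready_pods() {
  pod_readiness "$@" | awk -F'\t' '$3 != "True" { print $1 "/" $2 }'
}
```

Usage: `not_ready_pods -A` for the whole cluster, or `not_ready_pods -n kube-system` for one namespace. Filtering in awk rather than in JSONPath keeps the query simple and also catches Pods that report no Ready condition at all.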
1.2 Production Forensics Queries
| Goal | Command |
|---|---|
| Find Unhealthy (NotReady) Nodes | `kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Ready")].status}{"\t"}{.status.conditions[?(@.type=="Ready")].message}{"\n"}{end}' \| awk -F'\t' '$2 != "True"'` |
| Find Pods with High Restarts (> 5, first container only) | `kubectl get pods -A -o jsonpath='{range .items[?(@.status.containerStatuses[0].restartCount > 5)]}{.metadata.namespace}{"\t"}{.metadata.name}{"\n"}{end}'` |
| Audit Image IDs/Digests | `kubectl get pods -A -o jsonpath='{.items[*].status.containerStatuses[*].imageID}'` |
| Find Pods Missing Node Assignment | `kubectl get pods -A --field-selector=spec.nodeName= -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\n"}{end}'` |
| Decode A Secret | `kubectl get secret my-secret -o jsonpath='{.data.password}' \| base64 --decode` |
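The restart query in the table inspects only the first container of each Pod, because a kubectl JSONPath filter compares a single value at a time. A sketch that checks every container, assuming bash and awk: JSONPath prints each Pod's restart counts as a space-separated list, and awk applies the threshold. `high_restart_pods` is a hypothetical helper name, and init containers (`initContainerStatuses`) are not checked.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list Pods where ANY container restarted more than N times.
high_restart_pods() {
  local threshold="${1:-5}"
  # [*] on containerStatuses prints every container's count, space-separated.
  kubectl get pods -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.status.containerStatuses[*].restartCount}{"\n"}{end}' |
    awk -F'\t' -v t="$threshold" '{
      n = split($3, counts, " ")   # one entry per container
      for (i = 1; i <= n; i++)
        if (counts[i] + 0 > t + 0) { print $1 "\t" $2; break }
    }'
}
```

Usage: `high_restart_pods 5` prints `namespace<TAB>pod` for each offending Pod, once per Pod even if several containers exceed the threshold.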
2. ETCD DISASTER RECOVERY & MAINTENANCE
2.1 Full Etcd Restore Procedure (Single Master)
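Use this procedure when etcd data is corrupted or lost, or quorum cannot be recovered, and the cluster must be rolled back to a snapshot. It stops the control plane, restores the database from the backup, and starts the control plane again. Any changes made after the snapshot was taken are lost.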
- Stop Control Plane: Move all static pod manifests out of the kubelet's watch directory (/etc/kubernetes/manifests) so that the kubelet stops the control plane pods, including etcd.
sudo mkdir -p /root/etcd-backup
sudo mv /etc/kubernetes/manifests/*.yaml /root/etcd-backup/
# Wait until the containers have terminated (check with: sudo crictl ps)
- Execute Restore: Restore the snapshot into a NEW directory (/var/lib/etcd-new) so that it does not collide with the corrupted data. Keep --name, --initial-cluster, and --initial-advertise-peer-urls consistent with the member flags in etcd.yaml (kubeadm normally advertises the node IP rather than 127.0.0.1).
sudo ETCDCTL_API=3 etcdctl snapshot restore /tmp/snapshot.db \
  --data-dir=/var/lib/etcd-new \
  --name=master-node-01 \
  --initial-cluster='master-node-01=https://127.0.0.1:2380' \
  --initial-advertise-peer-urls=https://127.0.0.1:2380
- Update Manifest: Edit /root/etcd-backup/etcd.yaml and change the hostPath of the etcd data volume (named etcd-data in kubeadm clusters) to the new data directory, /var/lib/etcd-new. The container-side mountPath and the --data-dir flag stay unchanged.
- Restart Control Plane: Move the manifests back. The kubelet recreates the static pods, and etcd starts from the restored data.
sudo mv /root/etcd-backup/*.yaml /etc/kubernetes/manifests/
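The restore steps can be scripted so the wait and the manifest edit are not done by hand. This is a minimal sketch, assuming bash run as root on the single control plane node, a kubeadm layout, and crictl installed (its `--name` filter takes a regular expression). The function names are hypothetical, and the paths and member name mirror the example above and must be adapted. On etcd 3.5 and later, `etcdctl snapshot restore` is deprecated in favor of `etcdutl snapshot restore`, which accepts the same restore flags.

```shell
#!/usr/bin/env bash
# Sketch of the single-node restore; run as root. Function names, paths, and
# the member name are hypothetical and must match your etcd.yaml.
set -euo pipefail

MANIFESTS=/etc/kubernetes/manifests
BACKUP_DIR=/root/etcd-backup

# Poll crictl until no etcd or kube-apiserver container is running.
wait_for_stop() {
  local tries="${1:-60}"
  while [ -n "$(crictl ps --quiet --name 'etcd|kube-apiserver')" ]; do
    tries=$((tries - 1))
    if [ "$tries" -le 0 ]; then
      echo "control plane containers are still running" >&2
      return 1
    fi
    sleep 2
  done
}

# Point the etcd-data hostPath at the restored directory. Only lines of the
# form "<indent>path: /var/lib/etcd" change; mountPath and --data-dir do not.
# The .bak copy is named etcd.yaml.bak, so the *.yaml move below skips it.
use_data_dir() {
  local manifest="$1" new_dir="$2"
  sed -i.bak "s#^\([[:space:]]*\)path: /var/lib/etcd\$#\1path: ${new_dir}#" "$manifest"
}

restore_etcd() {
  local snapshot="$1" new_dir="$2" member="$3"
  mkdir -p "$BACKUP_DIR"
  mv "$MANIFESTS"/*.yaml "$BACKUP_DIR"/
  wait_for_stop
  ETCDCTL_API=3 etcdctl snapshot restore "$snapshot" \
    --data-dir="$new_dir" \
    --name="$member" \
    --initial-cluster="${member}=https://127.0.0.1:2380" \
    --initial-advertise-peer-urls=https://127.0.0.1:2380
  use_data_dir "$BACKUP_DIR/etcd.yaml" "$new_dir"
  mv "$BACKUP_DIR"/*.yaml "$MANIFESTS"/
}

# Usage (values from the example above):
# restore_etcd /tmp/snapshot.db /var/lib/etcd-new master-node-01
```

The sed edit is anchored to the start of the line so that `mountPath: /var/lib/etcd` in the same manifest is left alone.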
2.2 Maintenance: Etcd Defragmentation
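Etcd uses MVCC (Multi-Version Concurrency Control): every write creates a new revision, and old revisions accumulate until they are compacted. A growing database increases disk I/O and latency, and once it reaches the backend quota (2 GB by default) etcd raises a NOSPACE alarm and stops accepting writes.
- Logic: Compaction removes old revisions but leaves the freed pages inside the database file; defragmentation returns that space to the filesystem. The kube-apiserver already requests a compaction every 5 minutes by default (--etcd-compaction-interval), so defragmentation is the step operators typically run themselves.
- Command: Run this periodically, one member at a time and the leader last (a member cannot serve requests while it is being defragmented):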
etcdctl defrag --endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key
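In a multi-member cluster, a common approach is to defragment one member at a time and the leader last, since each member is blocked while it defragments. The sketch below lists members with `etcdctl endpoint status --cluster` and reorders them. It assumes bash, the kubeadm certificate paths used above, and the `-w simple` output format in which the fifth comma-separated column is IS LEADER; verify that column order for your etcdctl version. `etcdctl_tls`, `leader_last`, and `defrag_cluster` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: defragment every etcd member, leader last.

# etcdctl with the kubeadm client TLS flags used above.
etcdctl_tls() {
  etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key "$@"
}

# Read "endpoint, id, version, db size, is leader, ..." lines and print
# the endpoints with the leader moved to the end.
leader_last() {
  awk -F', ' '{ if ($5 == "true") leader = $1; else print $1 }
              END { if (leader != "") print leader }'
}

defrag_cluster() {
  local ep
  for ep in $(etcdctl_tls --endpoints=https://127.0.0.1:2379 endpoint status --cluster -w simple | leader_last); do
    echo "defragmenting $ep"
    etcdctl_tls --endpoints="$ep" defrag || return 1  # stop at the first failure
  done
}
```

Stopping at the first failure avoids defragmenting the leader while another member is already unhealthy.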
3. ADVANCED OPERATIONAL TOPICS
3.1 API Server Audit Logging (The Black Box)
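The Audit Log records requests made to the API Server, at the level of detail set by an audit policy; if no policy file is given, nothing is logged. It is the primary record for security forensics and compliance.
- Configuration: Enabled via flags on the kube-apiserver static pod:
--audit-policy-file=/etc/kubernetes/audit-policy.yaml
--audit-log-path=/var/log/kubernetes/audit.log
The policy file and the log directory must also be mounted into the pod as hostPath volumes; otherwise the API server cannot read or write them.
- Log Content: One JSON event per line, containing the user, the verb (get, list, create, delete, ...), the resource (Pod, Secret), and the source IPs.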
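The flags only take effect once a policy file exists. The sketch below, assuming bash, writes a minimal policy in the `audit.k8s.io/v1` format to the current directory, to be copied to /etc/kubernetes/audit-policy.yaml on the control plane node. The specific rules are illustrative choices, not a recommended baseline.

```shell
#!/usr/bin/env bash
# Write an illustrative audit policy to ./audit-policy.yaml.
cat > audit-policy.yaml <<'EOF'
apiVersion: audit.k8s.io/v1
kind: Policy
# Drop the RequestReceived stage to avoid an extra event per request
omitStages:
  - "RequestReceived"
rules:
  # Secrets and ConfigMaps: metadata only, so their values never reach the log
  - level: Metadata
    resources:
      - group: ""
        resources: ["secrets", "configmaps"]
  # Pod writes: include the request body
  - level: Request
    verbs: ["create", "update", "patch", "delete"]
    resources:
      - group: ""
        resources: ["pods"]
  # Everything else: who, what, when, and from where
  - level: Metadata
EOF
```

Rules are evaluated in order and the first match sets the level, which is why the Secret rule comes before the catch-all. Logging Secrets at Metadata rather than Request or RequestResponse keeps secret values out of the log file.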
3.2 Certificate Rotation (kubeadm)
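Certificates issued by kubeadm expire after one year (the CA after ten). kubeadm automates renewal of all control plane certificates, and kubeadm upgrade apply also renews them during upgrades.
- Check Status:
sudo kubeadm certs check-expiration
- Renewal:
sudo kubeadm certs renew all
- Note: This command rewrites the certificates in /etc/kubernetes/pki/ and the kubeconfig files kubeadm manages (such as admin.conf). Running control plane components do not pick up the new certificates on their own: restart the static pods, for example by moving their manifests out of /etc/kubernetes/manifests/, waiting about 20 seconds, and moving them back. Copies of admin.conf elsewhere (e.g., $HOME/.kube/config) and any other external kubeconfig files must be updated manually.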
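`kubeadm certs check-expiration` is the authoritative check on kubeadm clusters. As an independent cross-check, for example from a monitoring job, openssl can test each certificate file directly. A sketch assuming bash, openssl, and the kubeadm layout under /etc/kubernetes/pki; `expires_within_days` and `report_expiring` are hypothetical helper names, and the 30-day default is an arbitrary choice. It does not inspect certificates embedded in kubeconfig files such as admin.conf.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: flag certificates that expire within a number of days.

# Succeeds (exit 0) if the certificate file $2 expires within $1 days.
expires_within_days() {
  local days="$1" cert="$2"
  # -checkend exits non-zero when the certificate WILL expire within the window
  ! openssl x509 -checkend "$(( days * 86400 ))" -noout -in "$cert" >/dev/null
}

# Print a RENEW line for each expiring .crt in the PKI dir and its etcd/ subdir.
report_expiring() {
  local dir="${1:-/etc/kubernetes/pki}" days="${2:-30}" cert
  for cert in "$dir"/*.crt "$dir"/etcd/*.crt; do
    [ -f "$cert" ] || continue   # skip glob patterns that matched nothing
    if expires_within_days "$days" "$cert"; then
      echo "RENEW $cert $(openssl x509 -enddate -noout -in "$cert")"
    fi
  done
}
```

Usage: `report_expiring /etc/kubernetes/pki 30`. An empty result means no certificate in those directories expires within the window.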
3.3 The Descheduler (The Rebalancer)
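The default Kubernetes scheduler only makes a decision when a Pod is Pending (not yet bound to a node). Once placed, a Pod is never moved, so the scheduler does not correct imbalances that appear later as nodes are added, labels change, or resource usage shifts.
- Role: The Descheduler (a separate kubernetes-sigs project, not part of the core control plane) runs periodically and evicts Pods that violate its configured policies (e.g., Pods on over-utilized nodes, multiple replicas of an app on the same node).
- Logic: It evicts Pods through the Eviction API, which respects PodDisruptionBudgets. The owning controller (for example, a ReplicaSet) creates replacement Pods, and the default scheduler places them using the current state of the cluster.
- Goal: Restore the placement goals (affinity, topology spreading, balanced utilization) that have drifted since the Pods were first scheduled.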
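A sketch of a Descheduler policy enabling the behaviors listed above, assuming the `descheduler/v1alpha2` DeschedulerPolicy format used by recent kubernetes-sigs/descheduler releases (check the project README for your version). The thresholds are illustrative percentages computed from Pod requests, not live usage: a node below all `thresholds` counts as underutilized, a node above any `targetThresholds` counts as overutilized, and LowNodeUtilization evicts from the latter so Pods can land on the former. The script only writes the file; the descheduler reads it via `--policy-config-file`, commonly mounted from a ConfigMap.

```shell
#!/usr/bin/env bash
# Write an illustrative descheduler policy to ./descheduler-policy.yaml.
cat > descheduler-policy.yaml <<'EOF'
apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
profiles:
  - name: rebalance
    pluginConfig:
      - name: "RemoveDuplicates"
      - name: "RemovePodsViolatingInterPodAntiAffinity"
      - name: "RemovePodsViolatingTopologySpreadConstraint"
      - name: "LowNodeUtilization"
        args:
          thresholds:          # below ALL of these: underutilized
            "cpu": 20
            "memory": 20
            "pods": 20
          targetThresholds:    # above ANY of these: overutilized
            "cpu": 50
            "memory": 50
            "pods": 50
    plugins:
      balance:
        enabled:
          - "RemoveDuplicates"
          - "RemovePodsViolatingTopologySpreadConstraint"
          - "LowNodeUtilization"
      deschedule:
        enabled:
          - "RemovePodsViolatingInterPodAntiAffinity"
EOF
```

RemoveDuplicates covers the "multiple replicas on the same node" case, LowNodeUtilization the over-utilized nodes, and the two Violating plugins the affinity and spreading goals.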