Project Lab 02: Auditing the Control Plane Workflow & Security
Immediately after setting up a cluster (or joining a new one), an architect must verify the core components. This lab uses low-level host commands and cluster-internal services to prove the health and integrity of the Control Plane components and their communication workflow.
Reference Material: For a refresher on component roles, refer to docs/01-architecture-workflow/1-architecture.md.
1. OBJECTIVE: VALIDATE THE CONTROL LOOP
The core Kubernetes workflow is a Control Loop based on the relationship:
Etcd (Desired State) ↔ API Server (Gatekeeper) ↔ Controllers/Scheduler (Reconcilers).
We will validate the integrity of each connection link in this chain.
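The control loop above can be sketched in a few lines of shell. This is purely an illustration of the reconcile pattern (observe, diff, act), not real controller code; the variable names are invented for the sketch.

```shell
# Minimal reconcile-loop sketch: drive "actual" state toward "desired" state.
desired=3   # analogous to spec.replicas, the desired state stored in Etcd
actual=0    # analogous to status.replicas, the state observed in the cluster

while [ "$actual" -ne "$desired" ]; do
  if [ "$actual" -lt "$desired" ]; then
    actual=$((actual + 1))   # "create a pod"
  else
    actual=$((actual - 1))   # "delete a pod"
  fi
  echo "reconcile: actual=$actual desired=$desired"
done
echo "converged"
```

Every reconciler validated in Section 5 runs a loop of this shape against the API Server, which is why changing the desired state alone is enough to change the cluster.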
2. LAB SETUP & PRE-FLIGHT CHECK
Assumptions:
- You have a running, self-managed Kubernetes cluster (e.g., Kubeadm, Kind).
- You have SSH access to the Control Plane node.
- kubectl is configured and accessible.
2.1 Initial Health Check
Ensure all primary pods are running in the kube-system namespace.
kubectl get pods -n kube-system
Expected Output: All pods (kube-apiserver, etcd, kube-scheduler, kube-controller-manager, coredns, kube-proxy) should be in the Running status.
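To script this check, one sketch is to parse the STATUS column and flag anything not Running. The sample output below is inlined (with a hypothetical node name cp1) so the snippet is self-contained; against a real cluster, pipe `kubectl get pods -n kube-system` into the same awk filter instead.

```shell
# Sample output stands in for: kubectl get pods -n kube-system
pods_output='NAME                            READY   STATUS    RESTARTS   AGE
etcd-cp1                        1/1     Running   0          5d
kube-apiserver-cp1              1/1     Running   0          5d
kube-controller-manager-cp1     1/1     Running   0          5d
kube-scheduler-cp1              1/1     Running   0          5d
coredns-5d78c9869d-q2xkz        1/1     Running   2          5d'

# Column 3 is STATUS; skip the header row and count anything not Running.
not_running=$(printf '%s\n' "$pods_output" | awk 'NR > 1 && $3 != "Running"' | wc -l)
if [ "$not_running" -eq 0 ]; then
  echo "all kube-system pods Running"
else
  echo "WARNING: $not_running pod(s) not Running"
fi
```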
3. VALIDATING ETCD (THE SOURCE OF TRUTH)
The API Server must communicate with Etcd securely and ensure its quorum is healthy.
3.1 Etcd Health Check (From Host)
Etcd runs as a Static Pod. We must use its container's tools and certificates to query its health directly.
# 1. Find the etcd static pod name (adjust to your hostname)
ETCD_POD=$(kubectl get pods -n kube-system -l component=etcd -o jsonpath='{.items[0].metadata.name}')
# 2. Exec into the pod and run 'etcdctl endpoint health'.
# This confirms the etcd process is running and the data directory is available.
# etcd enforces mTLS even on localhost, so pass the certificates that kubeadm
# mounts into the static pod.
kubectl exec -n kube-system $ETCD_POD -- etcdctl \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/etcd/server.crt \
  --key /etc/kubernetes/pki/etcd/server.key \
  endpoint health
Expected Output:
https://127.0.0.1:2379 is healthy: successfully committed proposal: took = 2.45ms
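If you fold this check into a monitoring script, the commit latency in that line is worth extracting, since a rising "took" value is an early warning of slow disks. A sketch, parsing the sample line above (the sed expression assumes the output format shown):

```shell
# Sample line stands in for the output of: etcdctl endpoint health
health_output='https://127.0.0.1:2379 is healthy: successfully committed proposal: took = 2.45ms'

latency=$(printf '%s\n' "$health_output" | sed -n 's/.*took = \(.*\)/\1/p')
echo "etcd commit latency: $latency"
```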
3.2 Security Audit: Read-Only Access
Objective: Prove that the security boundary is intact—no component (or engineer) can access Etcd without a valid client certificate issued by the Etcd CA.
- Attempt Direct Access (Unauthenticated):
curl -k https://127.0.0.1:2379/v3/kv/range -X POST -d '{"key": "foo"}'
Expected Output: Client certificate required (or a similar TLS error).
- Conclusion: Etcd successfully enforces mTLS; unauthenticated access is blocked.
4. VALIDATING THE API SERVER (THE GATEKEEPER)
The API Server must prove it is reachable and has the correct security context.
4.1 Internal Health Check
Use the standard API endpoint to verify the health of the entire API Server process, bypassing kubectl overhead.
# The API server runs on port 6443. Use the local IP to bypass any external LB.
curl -k https://127.0.0.1:6443/readyz
Expected Output: ok
- Deep Dive: The /readyz endpoint checks not just the API Server process but also the health of its dependencies (e.g., its connection to Etcd).
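Appending ?verbose to the endpoint makes the API Server list every sub-check individually, each prefixed with [+] (passing) or [-] (failing). The exact check names vary by Kubernetes version; the sample below is a shortened, illustrative excerpt, parsed to count failures:

```shell
# Sample stands in for: curl -k https://127.0.0.1:6443/readyz?verbose
readyz_output='[+]ping ok
[+]etcd ok
[+]log ok
[+]shutdown ok
readyz check passed'

# Failing checks are prefixed with [-]; count them.
failed=$(printf '%s\n' "$readyz_output" | grep -c '^\[-\]' || true)
echo "failing readiness checks: $failed"
```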
4.2 Certificate Audit (PKI Flow)
Verify the API Server's certificate is valid and issued by the cluster root CA.
# 1. Extract and display certificate details
sudo openssl x509 -in /etc/kubernetes/pki/apiserver.crt -text -noout
Audit Checklist (in output):
- Subject: should contain CN=kube-apiserver.
- Issuer: should match the cluster root CA (e.g., CN=kubernetes).
- X509v3 Subject Alternative Name: must include 127.0.0.1, the Control Plane's private IP, and the kubernetes Service ClusterIP (e.g., 10.96.0.1).
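To see the same signing relationship end to end without touching the cluster's real PKI, you can reproduce it with a throwaway CA in a temp directory. This sketch mimics what kubeadm does when it issues apiserver.crt (the names and SANs are copied from the audit checklist; everything lives under a mktemp -d directory and is safe to delete):

```shell
workdir=$(mktemp -d)

# 1. A self-signed root CA standing in for the cluster CA (CN=kubernetes).
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -subj "/CN=kubernetes" \
  -keyout "$workdir/ca.key" -out "$workdir/ca.crt" 2>/dev/null

# 2. A key and CSR for the API Server identity (CN=kube-apiserver).
openssl req -newkey rsa:2048 -nodes \
  -subj "/CN=kube-apiserver" \
  -keyout "$workdir/server.key" -out "$workdir/server.csr" 2>/dev/null

# 3. Sign the CSR with the CA, attaching the SANs the audit expects.
printf 'subjectAltName=IP:127.0.0.1,IP:10.96.0.1,DNS:kubernetes\n' > "$workdir/san.cnf"
openssl x509 -req -in "$workdir/server.csr" -days 1 \
  -CA "$workdir/ca.crt" -CAkey "$workdir/ca.key" -CAcreateserial \
  -extfile "$workdir/san.cnf" -out "$workdir/server.crt" 2>/dev/null

# 4. Run the same audit as above: chain verification plus Subject/Issuer/SANs.
openssl verify -CAfile "$workdir/ca.crt" "$workdir/server.crt"
openssl x509 -in "$workdir/server.crt" -noout -subject -issuer -ext subjectAltName
```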
5. VALIDATING THE SCHEDULER & CONTROLLER MANAGER
These components run reconciliation loops. We validate them by checking their logs and health.
5.1 Leader Election Check
In an HA setup, only one instance of the Controller Manager and Scheduler is active. We check which one holds the Lease.
# Get the leader for the 'kube-scheduler'
kubectl get lease kube-scheduler -n kube-system -o jsonpath='{.spec.holderIdentity}'
# Get the leader for the 'kube-controller-manager'
kubectl get lease kube-controller-manager -n kube-system -o jsonpath='{.spec.holderIdentity}'
Expected Output: The output should be a single hostname/identity, proving the Leader Election mechanism is active and preventing "split-brain" syndrome.
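A leader is only as good as its Lease renewals: the holder must renew within spec.leaseDurationSeconds, or another replica takes over. Below is a sketch that extracts both fields; the Lease manifest is an inlined sample (with a hypothetical holder identity cp1_a1b2c3d4) standing in for the output of kubectl get lease kube-scheduler -n kube-system -o yaml:

```shell
lease_yaml='apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
  name: kube-scheduler
  namespace: kube-system
spec:
  holderIdentity: cp1_a1b2c3d4
  leaseDurationSeconds: 15
  renewTime: "2024-05-01T10:00:00.000000Z"'

holder=$(printf '%s\n' "$lease_yaml" | sed -n 's/^ *holderIdentity: //p')
duration=$(printf '%s\n' "$lease_yaml" | sed -n 's/^ *leaseDurationSeconds: //p')
echo "active leader: $holder (must renew every ${duration}s)"
```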
5.2 The Reconciliation Test
Objective: Verify that the Deployment Controller immediately reacts to a change in the desired state.
- Change Desired State: Scale a dummy deployment to 0.
kubectl scale deployment coredns -n kube-system --replicas=0
- Audit KCM Logs: The Controller Manager immediately detects the change in the Deployment object and scales the underlying ReplicaSet down.
kubectl logs -n kube-system -l component=kube-controller-manager | grep -i "scaling"
- Restore State: Scale it back up.
kubectl scale deployment coredns -n kube-system --replicas=2
- Conclusion: This confirms the Controller Manager is actively watching the API Server and reconciling the cluster state, thus completing the validation of the core workflow.
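Between steps, a small polling loop is handier than eyeballing the output to decide when the cluster has converged. check_replicas is a stub here (it pretends the rollout finishes on the third poll) so the sketch runs anywhere; against a real cluster, replace its body with the jsonpath query shown in the comment.

```shell
# Stub observer. Real version would run:
#   kubectl get deployment coredns -n kube-system -o jsonpath='{.status.readyReplicas}'
attempt=0
check_replicas() {
  if [ "$attempt" -ge 3 ]; then echo 2; else echo 0; fi
}

desired=2
until [ "$(check_replicas)" -eq "$desired" ]; do
  attempt=$((attempt + 1))
  echo "waiting for reconciliation (attempt $attempt)"
  sleep 0   # use a real interval, e.g. sleep 2, against a cluster
done
echo "reconciled: $desired ready replicas"
```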
6. COMMAND SUMMARY
| Component | Goal | Command |
|---|---|---|
| Etcd | Quorum Health | kubectl exec ... -- etcdctl endpoint health |
| API Server | Readiness | curl -k https://127.0.0.1:6443/readyz |
| Scheduler | Active Leader | kubectl get lease kube-scheduler -n kube-system |
| Controller | Active Reconciliation | kubectl scale deployment ... then check logs. |
| PKI | Cert Validity | sudo openssl x509 -in /etc/kubernetes/pki/apiserver.crt -text -noout |