Project Lab 02: Auditing the Control Plane Workflow & Security
Immediately after setting up a cluster (or joining a new one), an architect must verify the core components. This lab uses low-level host commands and cluster-internal services to prove the health and integrity of the Control Plane components and their communication workflow.
Reference Material: For a refresher on component roles, refer to docs/01-architecture-workflow/1-architecture.md.
1. OBJECTIVE: VALIDATE THE CONTROL LOOP
The core Kubernetes workflow is a Control Loop based on the relationship:
Etcd (Desired State) ↔ API Server (Gatekeeper) ↔ Controllers/Scheduler (Reconcilers).
We will validate the integrity of each connection link in this chain.
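The control loop above can be sketched in a few lines of shell. This is purely an illustration of the reconcile pattern (observe, diff, act), not real controller code; the variable names are invented for the sketch.

```shell
# Minimal reconcile-loop sketch: drive "actual" state toward "desired" state.
desired=3   # analogous to spec.replicas, the desired state stored in Etcd
actual=0    # analogous to status.replicas, the state observed in the cluster

while [ "$actual" -ne "$desired" ]; do
  if [ "$actual" -lt "$desired" ]; then
    actual=$((actual + 1))   # "create a pod"
  else
    actual=$((actual - 1))   # "delete a pod"
  fi
  echo "reconcile: actual=$actual desired=$desired"
done
echo "converged"
```

Every reconciler validated in Section 5 runs a loop of this shape against the API Server, which is why changing the desired state alone is enough to change the cluster.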
2. LAB SETUP & PRE-FLIGHT CHECK
Assumptions:
- You have a running, self-managed Kubernetes cluster (e.g., Kubeadm, Kind).
- You have SSH access to the Control Plane node.
- kubectl is configured and accessible.
2.1 Initial Health Check
Ensure all primary pods are running in the kube-system namespace.
kubectl get pods -n kube-system
Expected Output: All pods (kube-apiserver, etcd, kube-scheduler, kube-controller-manager, coredns, kube-proxy) should be in the Running status.
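To script this check, one sketch is to parse the STATUS column and flag anything not Running. The sample output below is inlined (with a hypothetical node name cp1) so the snippet is self-contained; against a real cluster, pipe `kubectl get pods -n kube-system` into the same awk filter instead.

```shell
# Sample output stands in for: kubectl get pods -n kube-system
pods_output='NAME                            READY   STATUS    RESTARTS   AGE
etcd-cp1                        1/1     Running   0          5d
kube-apiserver-cp1              1/1     Running   0          5d
kube-controller-manager-cp1     1/1     Running   0          5d
kube-scheduler-cp1              1/1     Running   0          5d
coredns-5d78c9869d-q2xkz        1/1     Running   2          5d'

# Column 3 is STATUS; skip the header row and count anything not Running.
not_running=$(printf '%s\n' "$pods_output" | awk 'NR > 1 && $3 != "Running"' | wc -l)
if [ "$not_running" -eq 0 ]; then
  echo "all kube-system pods Running"
else
  echo "WARNING: $not_running pod(s) not Running"
fi
```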
3. VALIDATING ETCD (THE SOURCE OF TRUTH)
The API Server must communicate with Etcd securely and ensure its quorum is healthy.
3.1 Etcd Health Check (From Host)
Etcd runs as a Static Pod. We must use its container's tools and certificates to query its health directly.
# 1. Find the etcd static pod name (adjust to your hostname)
ETCD_POD=$(kubectl get pods -n kube-system -l component=etcd -o jsonpath='{.items[0].metadata.name}')
# 2. Exec into the pod and run 'etcdctl endpoint health'.
# This confirms the etcd process is running and the data directory is available.
# etcd enforces mTLS even on localhost, so pass the certificates that kubeadm
# mounts into the static pod.
kubectl exec -n kube-system $ETCD_POD -- etcdctl \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/etcd/server.crt \
  --key /etc/kubernetes/pki/etcd/server.key \
  endpoint health
Expected Output:
https://127.0.0.1:2379 is healthy: successfully committed proposal: took = 2.45ms
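If you fold this check into a monitoring script, the commit latency in that line is worth extracting, since a rising "took" value is an early warning of slow disks. A sketch, parsing the sample line above (the sed expression assumes the output format shown):

```shell
# Sample line stands in for the output of: etcdctl endpoint health
health_output='https://127.0.0.1:2379 is healthy: successfully committed proposal: took = 2.45ms'

latency=$(printf '%s\n' "$health_output" | sed -n 's/.*took = \(.*\)/\1/p')
echo "etcd commit latency: $latency"
```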
3.2 Security Audit: Read-Only Access
Objective: Prove that the security boundary is intact—no component (or engineer) can access Etcd without a valid client certificate issued by the Etcd CA.
- Attempt Direct Access (Unauthenticated):
curl -k https://127.0.0.1:2379/v3/kv/range -X POST -d '{"key": "foo"}'
Expected Output: Client certificate required (or a similar TLS error).
- Conclusion: Etcd successfully enforces mTLS; unauthenticated access is blocked.
4. VALIDATING THE API SERVER (THE GATEKEEPER)
The API Server must prove it is reachable and has the correct security context.
4.1 Internal Health Check
Use the standard API endpoint to verify the health of the entire API Server process, bypassing kubectl overhead.
# The API server runs on port 6443. Use the local IP to bypass any external LB.
curl -k https://127.0.0.1:6443/readyz
Expected Output: ok
- Deep Dive: The /readyz endpoint checks not just the API Server process but also the health of its dependencies (e.g., its connection to Etcd).
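Appending ?verbose to the endpoint makes the API Server list every sub-check individually, each prefixed with [+] (passing) or [-] (failing). The exact check names vary by Kubernetes version; the sample below is a shortened, illustrative excerpt, parsed to count failures:

```shell
# Sample stands in for: curl -k https://127.0.0.1:6443/readyz?verbose
readyz_output='[+]ping ok
[+]etcd ok
[+]log ok
[+]shutdown ok
readyz check passed'

# Failing checks are prefixed with [-]; count them.
failed=$(printf '%s\n' "$readyz_output" | grep -c '^\[-\]' || true)
echo "failing readiness checks: $failed"
```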
4.2 Certificate Audit (PKI Flow)
Verify the API Server's certificate is valid and issued by the cluster root CA.
# 1. Extract and display certificate details
sudo openssl x509 -in /etc/kubernetes/pki/apiserver.crt -text -noout
Audit Checklist (in output):
- Subject: should contain CN=kube-apiserver.
- Issuer: should match the cluster root CA (e.g., CN=kubernetes).
- X509v3 Subject Alternative Name: must include 127.0.0.1, the Control Plane's private IP, and the kubernetes Service ClusterIP (e.g., 10.96.0.1).
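To see the same signing relationship end to end without touching the cluster's real PKI, you can reproduce it with a throwaway CA in a temp directory. This sketch mimics what kubeadm does when it issues apiserver.crt (the names and SANs are copied from the audit checklist; everything lives under a mktemp -d directory and is safe to delete):

```shell
workdir=$(mktemp -d)

# 1. A self-signed root CA standing in for the cluster CA (CN=kubernetes).
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -subj "/CN=kubernetes" \
  -keyout "$workdir/ca.key" -out "$workdir/ca.crt" 2>/dev/null

# 2. A key and CSR for the API Server identity (CN=kube-apiserver).
openssl req -newkey rsa:2048 -nodes \
  -subj "/CN=kube-apiserver" \
  -keyout "$workdir/server.key" -out "$workdir/server.csr" 2>/dev/null

# 3. Sign the CSR with the CA, attaching the SANs the audit expects.
printf 'subjectAltName=IP:127.0.0.1,IP:10.96.0.1,DNS:kubernetes\n' > "$workdir/san.cnf"
openssl x509 -req -in "$workdir/server.csr" -days 1 \
  -CA "$workdir/ca.crt" -CAkey "$workdir/ca.key" -CAcreateserial \
  -extfile "$workdir/san.cnf" -out "$workdir/server.crt" 2>/dev/null

# 4. Run the same audit as above: chain verification plus Subject/Issuer/SANs.
openssl verify -CAfile "$workdir/ca.crt" "$workdir/server.crt"
openssl x509 -in "$workdir/server.crt" -noout -subject -issuer -ext subjectAltName
```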
5. VALIDATING THE SCHEDULER & CONTROLLER MANAGER
These components run reconciliation loops. We validate them by checking their logs and health.
5.1 Leader Election Check
In an HA setup, only one instance of the Controller Manager and Scheduler is active. We check which one holds the Lease.
# Get the leader for the 'kube-scheduler'
kubectl get lease kube-scheduler -n kube-system -o jsonpath='{.spec.holderIdentity}'
# Get the leader for the 'kube-controller-manager'
kubectl get lease kube-controller-manager -n kube-system -o jsonpath='{.spec.holderIdentity}'
Expected Output: The output should be a single hostname/identity, proving the Leader Election mechanism is active and preventing "split-brain" syndrome.
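A leader is only as good as its Lease renewals: the holder must renew within spec.leaseDurationSeconds, or another replica takes over. Below is a sketch that extracts both fields; the Lease manifest is an inlined sample (with a hypothetical holder identity cp1_a1b2c3d4) standing in for the output of kubectl get lease kube-scheduler -n kube-system -o yaml:

```shell
lease_yaml='apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
  name: kube-scheduler
  namespace: kube-system
spec:
  holderIdentity: cp1_a1b2c3d4
  leaseDurationSeconds: 15
  renewTime: "2024-05-01T10:00:00.000000Z"'

holder=$(printf '%s\n' "$lease_yaml" | sed -n 's/^ *holderIdentity: //p')
duration=$(printf '%s\n' "$lease_yaml" | sed -n 's/^ *leaseDurationSeconds: //p')
echo "active leader: $holder (must renew every ${duration}s)"
```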
5.2 The Reconciliation Test
Objective: Verify that the Deployment Controller immediately reacts to a change in the desired state.
- Change Desired State: Scale a dummy deployment to 0.
kubectl scale deployment coredns -n kube-system --replicas=0
- Audit KCM Logs: The Controller Manager immediately detects the change in the Deployment object and scales the underlying ReplicaSet down.
kubectl logs -n kube-system -l component=kube-controller-manager | grep -i "scaling"
- Restore State: Scale it back up.
kubectl scale deployment coredns -n kube-system --replicas=2
- Conclusion: This confirms the Controller Manager is actively watching the API Server and reconciling the cluster state, thus completing the validation of the core workflow.
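Between steps, a small polling loop is handier than eyeballing the output to decide when the cluster has converged. check_replicas is a stub here (it pretends the rollout finishes on the third poll) so the sketch runs anywhere; against a real cluster, replace its body with the jsonpath query shown in the comment.

```shell
# Stub observer. Real version would run:
#   kubectl get deployment coredns -n kube-system -o jsonpath='{.status.readyReplicas}'
attempt=0
check_replicas() {
  if [ "$attempt" -ge 3 ]; then echo 2; else echo 0; fi
}

desired=2
until [ "$(check_replicas)" -eq "$desired" ]; do
  attempt=$((attempt + 1))
  echo "waiting for reconciliation (attempt $attempt)"
  sleep 0   # use a real interval, e.g. sleep 2, against a cluster
done
echo "reconciled: $desired ready replicas"
```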
6. COMMAND SUMMARY
| Component | Goal | Command |
|---|---|---|
| Etcd | Quorum Health | kubectl exec ... -- etcdctl endpoint health |
| API Server | Readiness | curl -k https://127.0.0.1:6443/readyz |
| Scheduler | Active Leader | kubectl get lease kube-scheduler -n kube-system |
| Controller | Active Reconciliation | kubectl scale deployment ... then check logs. |
| PKI | Cert Validity | sudo openssl x509 -in /etc/kubernetes/pki/apiserver.crt -text -noout |