Project Lab 05: Resource Governance & Elastic Scaling

In a shared cluster, "noisy neighbors" can starve other applications of CPU and Memory. Without proper health probes, a failing application might still receive traffic, leading to 500 errors. This lab focuses on building a self-healing, self-scaling, and governed environment.

Reference Material:

  • docs/04-resource-mgmt-probes/1-resource-limits-quotas.md
  • docs/04-resource-mgmt-probes/2-hpa-vpa.md
  • docs/04-resource-mgmt-probes/4-health-probes.md

1. OBJECTIVE: THE STABLE COMMERCE PLATFORM

The goal is to configure the e-commerce namespace so that:

  1. No team can deploy a Pod without resource limits.
  2. The entire namespace is capped to prevent cloud-bill spikes.
  3. The application scales horizontally when CPU exceeds 50%.
  4. The application survives a "heavy" startup phase (e.g., cache loading).

2. PHASE 1: NAMESPACE GOVERNANCE

Before deploying the app, we must set the "Rules of Engagement" for the namespace.

2.1 Create Namespace and ResourceQuota

The Quota acts as a hard ceiling for the aggregate resources in the namespace.

# 01-quota.yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: checkout-dept-quota
  namespace: e-commerce
spec:
  hard:
    requests.cpu: "2"
    requests.memory: "2Gi"
    limits.cpu: "4"
    limits.memory: "4Gi"
    pods: "10" # Max 10 pods allowed in this namespace
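Neither the namespace nor the quota exists yet, so a minimal bootstrap sequence (assuming the manifest is saved as 01-quota.yaml as shown) looks like:

```shell
# Create the namespace first -- the ResourceQuota references it
kubectl create namespace e-commerce
kubectl apply -f 01-quota.yaml

# Confirm the ceiling is registered (the Used column starts at 0)
kubectl describe quota checkout-dept-quota -n e-commerce
```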

2.2 Create LimitRange

The LimitRange ensures that every container has a default size if the developer forgets to specify one.

# 02-limitrange.yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: checkout-defaults
  namespace: e-commerce
spec:
  limits:
  - default:
      cpu: 500m
      memory: 512Mi
    defaultRequest:
      cpu: 250m
      memory: 256Mi
    type: Container
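To see the defaulting in action, you can launch a throwaway pod with no resources block and inspect what the LimitRange injected (the pod name probe-defaults is an arbitrary choice for this check):

```shell
kubectl apply -f 02-limitrange.yaml
kubectl run probe-defaults --image=nginx -n e-commerce

# Admission should have injected the default requests/limits
kubectl get pod probe-defaults -n e-commerce \
  -o jsonpath='{.spec.containers[0].resources}'

kubectl delete pod probe-defaults -n e-commerce
```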

3. PHASE 2: DEPLOYING THE RESILIENT WORKLOAD

We will deploy the checkout-api. It is slow to start (roughly 30s to initialize, e.g., cache loading) and CPU-intensive under load.

3.1 The Deployment Manifest (checkout-deploy.yaml)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout-api
  namespace: e-commerce
spec:
  replicas: 2
  selector:
    matchLabels:
      app: checkout
  template:
    metadata:
      labels:
        app: checkout
    spec:
      containers:
      - name: api
        image: registry.k8s.io/hpa-example # A lightweight image that allows CPU stress testing
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: "200m" # Baseline for HPA calculation
            memory: "256Mi"
          limits:
            cpu: "500m"
            memory: "512Mi"

        # 1. STARTUP PROBE: Handles the 30s initialization
        startupProbe:
          httpGet:
            path: /
            port: 80
          failureThreshold: 30
          periodSeconds: 1 # Allows up to 30s of startup time

        # 2. READINESS PROBE: Ensures traffic only hits healthy pods
        readinessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 5

        # 3. LIVENESS PROBE: Restarts the container if the process deadlocks
        livenessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 10
          periodSeconds: 20
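With the manifest saved under the file name above, a typical apply-and-verify sequence is:

```shell
# Apply the Deployment and wait for both replicas to become Ready
kubectl apply -f checkout-deploy.yaml
kubectl rollout status deployment/checkout-api -n e-commerce --timeout=120s

# Spot-check that the startup probe landed on the container spec
kubectl get deployment checkout-api -n e-commerce \
  -o jsonpath='{.spec.template.spec.containers[0].startupProbe.failureThreshold}'
```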

4. PHASE 3: ELASTIC SCALING (HPA)

We want the checkout service to scale out to handle surges in traffic.

4.1 Define the HPA

kubectl autoscale deployment checkout-api \
  --cpu-percent=50 \
  --min=2 \
  --max=8 \
  -n e-commerce

The Math: If current average CPU utilization exceeds 100m (50% of the 200m request), HPA will trigger a scale-out.
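The scale-out target follows the published HPA formula, desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization). A quick awk sketch of that arithmetic (the variable names are just for illustration):

```shell
# 2 replicas averaging 250% utilization against a 50% target:
# ceil(2 * 250 / 50) = 10, which the HPA then caps at --max=8.
current=2; util=250; target=50
awk -v c="$current" -v u="$util" -v t="$target" \
  'BEGIN { d = c * u / t; print ((d == int(d)) ? d : int(d) + 1) }'
```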


5. PHASE 4: THE LOAD TEST (VALIDATION)

5.1 Monitor the Scaling

Open two terminals.

# Terminal 1: Watch HPA
kubectl get hpa checkout-api -n e-commerce -w

# Terminal 2: Watch Pods
kubectl get pods -n e-commerce -w

5.2 Trigger the Surge

Run a "Generator" pod to bombard the service with requests. The wget target relies on a Service named checkout-api existing in the namespace, so expose the Deployment first if you have not already:

kubectl expose deployment checkout-api --port=80 -n e-commerce

kubectl run -i --tty load-generator --rm --image=busybox:1.28 --restart=Never -n e-commerce -- /bin/sh -c "while true; do wget -q -O- http://checkout-api; done"

Expected Observation:

  1. Terminal 1 will show CPU increasing (e.g., 250% / 50%).
  2. Terminal 2 will show new checkout-api-xxxx pods moving from Pending to Running.
  3. Pods will stay in 0/1 Running for 30 seconds (Startup Probe) before moving to 1/1 Running (Readiness Probe).

6. TROUBLESHOOTING & NINJA COMMANDS

6.1 Audit Quota Usage

If the HPA fails to create pods, check if you hit the Namespace Quota.

kubectl describe quota checkout-dept-quota -n e-commerce

Observation: If Used equals Hard for pods (10/10), the HPA will be blocked from scaling further.

6.2 Check Component Resource Usage

# Verify Metrics Server is working
kubectl top pods -n e-commerce

6.3 Identify Probe Failures

If a pod keeps restarting, find out which probe failed:

kubectl describe pod <pod-name> -n e-commerce | grep -i "probe failed"
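Probe failures are also recorded as events with reason Unhealthy, so an alternative namespace-wide view is:

```shell
# List all probe-failure events in the namespace, newest last
kubectl get events -n e-commerce \
  --field-selector reason=Unhealthy \
  --sort-by=.lastTimestamp
```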

7. ARCHITECT'S KEY TAKEAWAYS

  1. Requests are for Scheduling: The HPA uses the request value as the denominator for percentage calculations.
  2. Startup Probes save Liveness: Without a Startup probe, a slow-starting app might be killed by the Liveness probe before it ever finishes booting.
  3. Namespace Isolation: Quotas are the primary guardrail that prevents one team's auto-scaling from consuming the entire cluster's budget.
  4. Limits prevent Crashes: Memory limits are hard (OOMKill); CPU limits are soft (Throttling). Always provide both for production stability.