Project Lab 05: Resource Governance & Elastic Scaling

In a shared cluster, "noisy neighbors" can starve other applications of CPU and Memory. Without proper health probes, a failing application might still receive traffic, leading to 500 errors. This lab focuses on building a self-healing, self-scaling, and governed environment.

Reference Material:

  • docs/04-resource-mgmt-probes/1-resource-limits-quotas.md
  • docs/04-resource-mgmt-probes/2-hpa-vpa.md
  • docs/04-resource-mgmt-probes/4-health-probes.md

1. OBJECTIVE: THE STABLE COMMERCE PLATFORM

The goal is to configure the e-commerce namespace so that:

  1. No team can deploy a Pod without resource limits.
  2. The entire namespace is capped to prevent cloud-bill spikes.
  3. The application scales horizontally when CPU exceeds 50%.
  4. The application survives a "heavy" startup phase (e.g., cache loading).

2. PHASE 1: NAMESPACE GOVERNANCE

Before deploying the app, we must set the "Rules of Engagement" for the namespace.

2.1 Create Namespace and ResourceQuota

The Quota acts as a hard ceiling for the aggregate resources in the namespace.

# 01-quota.yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: checkout-dept-quota
  namespace: e-commerce
spec:
  hard:
    requests.cpu: "2"
    requests.memory: "2Gi"
    limits.cpu: "4"
    limits.memory: "4Gi"
    pods: "10" # Max 10 pods allowed in this namespace
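Neither the namespace nor the quota exists yet, so a minimal bootstrap sequence (assuming the manifest is saved as 01-quota.yaml as shown) looks like:

```shell
# Create the namespace first -- the ResourceQuota references it
kubectl create namespace e-commerce
kubectl apply -f 01-quota.yaml

# Confirm the ceiling is registered (the Used column starts at 0)
kubectl describe quota checkout-dept-quota -n e-commerce
```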

2.2 Create LimitRange

The LimitRange ensures that every container has a default size if the developer forgets to specify one.

# 02-limitrange.yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: checkout-defaults
  namespace: e-commerce
spec:
  limits:
  - default:
      cpu: 500m
      memory: 512Mi
    defaultRequest:
      cpu: 250m
      memory: 256Mi
    type: Container
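To see the defaulting in action, you can launch a throwaway pod with no resources block and inspect what the LimitRange injected (the pod name probe-defaults is an arbitrary choice for this check):

```shell
kubectl apply -f 02-limitrange.yaml
kubectl run probe-defaults --image=nginx -n e-commerce

# Admission should have injected the default requests/limits
kubectl get pod probe-defaults -n e-commerce \
  -o jsonpath='{.spec.containers[0].resources}'

kubectl delete pod probe-defaults -n e-commerce
```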

3. PHASE 2: DEPLOYING THE RESILIENT WORKLOAD

We will deploy the checkout-api. It is slow to start (roughly 30s to initialize, e.g., cache loading) and CPU-intensive under load.

3.1 The Deployment Manifest (checkout-deploy.yaml)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout-api
  namespace: e-commerce
spec:
  replicas: 2
  selector:
    matchLabels:
      app: checkout
  template:
    metadata:
      labels:
        app: checkout
    spec:
      containers:
      - name: api
        image: registry.k8s.io/hpa-example # A lightweight image that allows CPU stress testing
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: "200m" # Baseline for HPA calculation
            memory: "256Mi"
          limits:
            cpu: "500m"
            memory: "512Mi"

        # 1. STARTUP PROBE: Handles the 30s initialization
        startupProbe:
          httpGet:
            path: /
            port: 80
          failureThreshold: 30
          periodSeconds: 1 # Allows up to 30s of startup time

        # 2. READINESS PROBE: Ensures traffic only hits healthy pods
        readinessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 5

        # 3. LIVENESS PROBE: Restarts the container if the process deadlocks
        livenessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 10
          periodSeconds: 20
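With the manifest saved under the file name above, a typical apply-and-verify sequence is:

```shell
# Apply the Deployment and wait for both replicas to become Ready
kubectl apply -f checkout-deploy.yaml
kubectl rollout status deployment/checkout-api -n e-commerce --timeout=120s

# Spot-check that the startup probe landed on the container spec
kubectl get deployment checkout-api -n e-commerce \
  -o jsonpath='{.spec.template.spec.containers[0].startupProbe.failureThreshold}'
```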

4. PHASE 3: ELASTIC SCALING (HPA)

We want the checkout service to scale out to handle surges in traffic.

4.1 Define the HPA

kubectl autoscale deployment checkout-api \
  --cpu-percent=50 \
  --min=2 \
  --max=8 \
  -n e-commerce

The Math: If current average CPU utilization exceeds 100m (50% of the 200m request), HPA will trigger a scale-out.
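The scale-out target follows the published HPA formula, desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization). A quick awk sketch of that arithmetic (the variable names are just for illustration):

```shell
# 2 replicas averaging 250% utilization against a 50% target:
# ceil(2 * 250 / 50) = 10, which the HPA then caps at --max=8.
current=2; util=250; target=50
awk -v c="$current" -v u="$util" -v t="$target" \
  'BEGIN { d = c * u / t; print ((d == int(d)) ? d : int(d) + 1) }'
```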


5. PHASE 4: THE LOAD TEST (VALIDATION)

5.1 Monitor the Scaling

Open two terminals.

# Terminal 1: Watch HPA
kubectl get hpa checkout-api -n e-commerce -w

# Terminal 2: Watch Pods
kubectl get pods -n e-commerce -w

5.2 Trigger the Surge

Run a "Generator" pod to bombard the service with requests. The wget target relies on a Service named checkout-api existing in the namespace, so expose the Deployment first if you have not already:

kubectl expose deployment checkout-api --port=80 -n e-commerce

kubectl run -i --tty load-generator --rm --image=busybox:1.28 --restart=Never -n e-commerce -- /bin/sh -c "while true; do wget -q -O- http://checkout-api; done"

Expected Observation:

  1. Terminal 1 will show CPU increasing (e.g., 250% / 50%).
  2. Terminal 2 will show new checkout-api-xxxx pods moving from Pending to Running.
  3. Pods will stay in 0/1 Running for 30 seconds (Startup Probe) before moving to 1/1 Running (Readiness Probe).

6. TROUBLESHOOTING & NINJA COMMANDS

6.1 Audit Quota Usage

If the HPA fails to create pods, check if you hit the Namespace Quota.

kubectl describe quota checkout-dept-quota -n e-commerce

Observation: If Used equals Hard for pods (10/10), the HPA will be blocked from scaling further.

6.2 Check Component Resource Usage

# Verify Metrics Server is working
kubectl top pods -n e-commerce

6.3 Identify Probe Failures

If a pod keeps restarting, find out which probe failed:

kubectl describe pod <pod-name> -n e-commerce | grep -i "probe failed"
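Probe failures are also recorded as events with reason Unhealthy, so an alternative namespace-wide view is:

```shell
# List all probe-failure events in the namespace, newest last
kubectl get events -n e-commerce \
  --field-selector reason=Unhealthy \
  --sort-by=.lastTimestamp
```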

7. ARCHITECT'S KEY TAKEAWAYS

  1. Requests are for Scheduling: The HPA uses the request value as the denominator for percentage calculations.
  2. Startup Probes save Liveness: Without a Startup probe, a slow-starting app might be killed by the Liveness probe before it ever finishes booting.
  3. Namespace Isolation: Quotas are the primary guardrail that prevents one team's auto-scaling from consuming the entire cluster's budget.
  4. Limits prevent Crashes: Memory limits are hard (OOMKill); CPU limits are soft (Throttling). Always provide both for production stability.