Resource Management: Limits, Quotas & cgroups
In Kubernetes, managing compute resources is not just about avoiding node crashes; it is about predictable scheduling, preventing "noisy neighbors," and determining which Pods survive when a Node runs out of memory.
1. CORE CONCEPTS: REQUESTS & LIMITS
Every container in a Pod can specify two key parameters for CPU and Memory: Requests and Limits.
1.1 CPU and Memory Units
- CPU (Compressible Resource):
- Measured in cores or millicores (m).
- 1 CPU = 1 AWS vCPU = 1 GCP Core = 1 Intel Hyperthread.
- `1000m` = 1 CPU; `250m` = 1/4 of a CPU core.
- Memory (Incompressible Resource):
- Measured in bytes, usually expressed in Mebibytes (Mi) or Gibibytes (Gi).
- Bible Rule: Always use power-of-two units (`Mi`, `Gi`). Do not use `M` or `G` (power-of-ten): `512Mi` is 536,870,912 bytes, while `512M` is only 512,000,000 bytes, so mixing them causes a ~5% drift between what you think you requested and what the Linux kernel actually allocates.
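To make the drift concrete, here is a small sketch of the same memory request written both ways (the surrounding container is omitted):

```yaml
# Illustrative resources fragment: binary vs decimal memory units
resources:
  requests:
    memory: "512Mi"   # 512 * 1024^2 = 536,870,912 bytes (what you usually mean)
    # memory: "512M"  # 512 * 1000^2 = 512,000,000 bytes (~23Mi less)
```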
1.2 The "Request" (The Guarantee)
- Who uses it? The Kube-Scheduler.
- What does it do? It determines node placement. If a Pod requests `500m` CPU, the Scheduler finds a Node with at least `500m` of unallocated CPU capacity.
- Linux Implementation: Translates to `cpu.shares` in cgroups v1 (`cpu.weight` in cgroups v2). During CPU contention, this guarantees the container at least its proportional share of CPU time.
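As a sketch of the cgroup v1 translation, the kubelet scales shares at 1024 per core (shares = milliCPU × 1024 / 1000):

```yaml
# Sketch: how a CPU request maps to cgroup v1 cpu.shares
resources:
  requests:
    cpu: "500m"   # -> cpu.shares = 512
    # cpu: "2"    # -> cpu.shares = 2048
```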
1.3 The "Limit" (The Ceiling)
- Who uses it? The Kubelet (via Linux cgroups) on the Node.
- What does it do? It caps the maximum amount of resources the container can use.
1.4 What happens when you exceed them?
This is the most critical matrix in Kubernetes resource management:
| Resource | Exceeding Request | Exceeding Limit |
|---|---|---|
| CPU | Allowed (if node has idle CPU cycles). | Throttled. The container is paused by the Linux Completely Fair Scheduler (CFS) bandwidth controller until the next period. It does not crash, but latency spikes. |
| Memory | Allowed (if node has free memory). | OOMKilled. The Linux kernel instantly kills the primary process with Exit Code 137 (SIGKILL). Kubelet will restart it (CrashLoopBackOff). |
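The CPU throttling above comes from CFS bandwidth control: with the default 100ms period, the limit is converted into a quota of runnable microseconds per period. A rough sketch of the mapping:

```yaml
# Sketch: how a CPU limit maps to cgroup v1 CFS bandwidth
# (cpu.cfs_period_us = 100000 by default; quota = limit_in_cores * period)
resources:
  limits:
    cpu: "500m"   # -> cpu.cfs_quota_us = 50000 (runnable 50ms out of every 100ms)
```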
2. QUALITY OF SERVICE (QoS) CLASSES
Kubernetes automatically assigns a QoS class to every Pod based on its Requests and Limits. This class dictates which Pod gets evicted first when a Node experiences Memory Pressure.
| QoS Class | Condition | Eviction Priority |
|---|---|---|
| Guaranteed | Every container sets Requests EQUAL to Limits for both CPU and Memory. | Lowest Priority (Safest). These are the last pods to be killed. Use for production databases/APIs. |
| Burstable | Requests are LESS THAN Limits, or only some containers have them set. | Medium Priority. Killed if the node runs out of memory and no BestEffort pods exist. |
| BestEffort | NO Requests and NO Limits are set anywhere in the Pod. | Highest Priority (First to die). Under memory pressure, the kubelet evicts these first and the kernel's OOM killer targets them first. |
Production Pod Manifest (Guaranteed QoS)
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: payment-api
spec:
  containers:
  - name: app
    image: enterprise/payment:v2
    resources:
      # Requests == Limits ensures "Guaranteed" QoS
      requests:
        memory: "1Gi"
        cpu: "1000m"
      limits:
        memory: "1Gi"
        cpu: "1000m"
```
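For contrast, a minimal Burstable spec sets Requests below Limits. The name and image here are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: report-worker          # illustrative name
spec:
  containers:
  - name: app
    image: enterprise/report:v1   # illustrative image
    resources:
      requests:
        memory: "256Mi"   # the Scheduler places the Pod using these...
        cpu: "250m"
      limits:
        memory: "1Gi"     # ...but the container may burst up to these
        cpu: "1000m"
# status.qosClass will be "Burstable"
```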
3. LIMITRANGE (Namespace Defaulting)
The Problem: Developers forget to set Requests and Limits. If a developer deploys a BestEffort pod with a memory leak, it can crash the Node.
The Solution: A LimitRange automatically injects default CPU/Memory requests and limits into Pods created in a specific namespace. It can also enforce minimum and maximum sizes.
LimitRange Architecture
- Scope: Namespace-level.
- Enforcement Agent: Admission Controller.
- Behavior: If a Pod is submitted without a resources block, the LimitRange admission controller mutates the Pod spec to inject the `default` values before it is persisted to etcd.
Production LimitRange Manifest
```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: standard-tier-limits
  namespace: dev-team-alpha
spec:
  limits:
  - type: Container
    # 1. Hard Boundaries (Admission will REJECT pods outside these bounds)
    max:
      cpu: "2"
      memory: "4Gi"
    min:
      cpu: "100m"
      memory: "128Mi"
    # 2. Defaults (Injected if developer forgets to specify)
    default:
      cpu: "500m"       # Default Limit
      memory: "512Mi"   # Default Limit
    defaultRequest:
      cpu: "250m"       # Default Request
      memory: "256Mi"   # Default Request
    # 3. Ratio Control (Prevents someone requesting 100m but setting limit to 10 CPU)
    maxLimitRequestRatio:
      cpu: "4"
```
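Assuming a LimitRange with the defaults above is in place, a bare container submitted to the namespace is persisted with the defaults injected, roughly like this (the image name is illustrative):

```yaml
# Submitted:
# containers:
# - name: app
#   image: myapp:v1          # no resources block at all
#
# Persisted after LimitRange admission, approximately:
containers:
- name: app
  image: myapp:v1
  resources:
    requests:
      cpu: "250m"       # from defaultRequest
      memory: "256Mi"   # from defaultRequest
    limits:
      cpu: "500m"       # from default
      memory: "512Mi"   # from default
```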
4. RESOURCEQUOTA (Namespace Budgeting)
The Problem: A LimitRange ensures individual Pods are sized correctly, but what stops a developer from deploying 1,000 correctly-sized Pods and bankrupting your AWS account?
The Solution: A ResourceQuota puts a hard ceiling on the aggregate total of resources that can exist in a Namespace.
ResourceQuota Architecture
- Scope: Namespace-level.
- Enforcement Agent: Admission Controller.
- Behavior: If a new Pod would cause the Namespace to exceed its quota, the API Server rejects it with an HTTP 403 Forbidden (`Exceeded quota`).
Production ResourceQuota Manifest
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-team-alpha-budget
  namespace: dev-team-alpha
spec:
  hard:
    # Compute Quotas
    requests.cpu: "20"            # Max 20 total CPU requested across all pods
    requests.memory: "40Gi"
    limits.cpu: "40"
    limits.memory: "80Gi"
    # Object Count Quotas
    pods: "50"                    # Max 50 pods total
    services.loadbalancers: "2"   # Max 2 expensive cloud LoadBalancers
    persistentvolumeclaims: "10"
```
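Quotas can also be scoped to a QoS class. As a hedged sketch (the quota name is illustrative), this caps how many BestEffort pods can exist in the namespace at all:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: besteffort-cap   # illustrative name
  namespace: dev-team-alpha
spec:
  hard:
    pods: "5"            # at most 5 BestEffort pods
  scopes:
  - BestEffort           # this quota only counts BestEffort pods
```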
Quota Verification & Debugging
To see how much of the budget a team is using:
```bash
kubectl describe quota dev-team-alpha-budget -n dev-team-alpha
```
Sample Output:
```
Name:            dev-team-alpha-budget
Namespace:       dev-team-alpha
Resource         Used  Hard
--------         ----  ----
limits.cpu       12    40
limits.memory    24Gi  80Gi
pods             15    50
requests.cpu     6     20
requests.memory  12Gi  40Gi
```
5. MONITORING & TROUBLESHOOTING CHEATSHEET
5.1 Checking Live Usage (kubectl top)
The kubectl top command relies on the Metrics Server (which aggregates data from the Kubelet's cAdvisor).
Note: This shows actual usage, not the requested amount.
```bash
# See which Nodes are hottest
kubectl top nodes

# See which Pods are consuming the most CPU
kubectl top pods --sort-by=cpu -A

# See which Containers inside a specific Pod are eating memory
kubectl top pod my-database --containers
```
5.2 Debugging "Pending" Pods (Insufficient CPU/Memory)
If a Pod is stuck in Pending, the Scheduler cannot find a node with enough unallocated Requests.
- Check Pod Events:

```bash
kubectl describe pod <pod-name> | grep -A 5 Events:
# Output: Warning  FailedScheduling  0/5 nodes are available: 5 Insufficient cpu.
```

- Check Node Allocation:

```bash
kubectl describe node worker-01 | grep -A 8 "Allocated resources:"
```

Sample Output:

```
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource  Requests    Limits
  --------  --------    ------
  cpu       950m (95%)  2500m (250%)
  memory    6Gi (80%)   12Gi (150%)
```

Notice in the output above: the Node is "overcommitted" on Limits (250%), which is fine. But it is at 95% of CPU Requests. If your new pod requests `100m` CPU, it will not fit on this node.
5.3 Debugging OOMKills
If a Pod keeps restarting, check if the kernel killed it for memory usage:
```bash
kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
# If it returns "OOMKilled", raise the memory Limit in the Deployment YAML (or fix the leak).
```
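The usual fix is a Deployment fragment along these lines (a sketch; names reuse the earlier example): raise the memory Limit, and keep Requests equal to Limits if you want to preserve Guaranteed QoS:

```yaml
# Fragment of a Deployment's pod template; names are illustrative
containers:
- name: app
  image: enterprise/payment:v2
  resources:
    requests:
      memory: "2Gi"   # was 1Gi; raised after observing OOMKills
      cpu: "1000m"
    limits:
      memory: "2Gi"   # keep equal to requests for Guaranteed QoS
      cpu: "1000m"
```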