Resource Management: Limits, Quotas & cgroups

In Kubernetes, managing compute resources is not just about avoiding node crashes; it is about predictable scheduling, preventing "noisy neighbors," and determining which Pods survive when a Node runs out of memory.


1. CORE CONCEPTS: REQUESTS & LIMITS

Every container in a Pod can specify two key parameters for CPU and Memory: Requests and Limits.

1.1 CPU and Memory Units

  • CPU (Compressible Resource):
    • Measured in cores or millicores (m).
    • 1 CPU = 1 AWS vCPU = 1 GCP Core = 1 Intel Hyperthread.
    • 1000m = 1 CPU. 250m = 1/4 of a CPU core.
  • Memory (Incompressible Resource):
    • Measured in bytes, usually expressed in Mebibytes (Mi) or Gibibytes (Gi).
    • Bible Rule: Always use power-of-two units (Mi, Gi). Do not use M or G (power-of-ten): 1Mi = 1,048,576 bytes while 1M = 1,000,000 bytes, so mixing the two families silently skews your capacity math by roughly 5% at the Mi scale and 7% at the Gi scale.
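The divergence is easy to quantify. A hypothetical resources fragment, annotated with the exact byte counts behind each suffix:

```yaml
# Illustrative only: the byte values behind each unit family.
resources:
  requests:
    memory: "512Mi"    # 512 * 1024^2 = 536,870,912 bytes (power of two)
    # memory: "512M"   # 512 * 1000^2 = 512,000,000 bytes, about 4.6% less
  limits:
    memory: "1Gi"      # 1024^3 = 1,073,741,824 bytes
    # memory: "1G"     # 1000^3 = 1,000,000,000 bytes, about 6.9% less
```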

1.2 The "Request" (The Guarantee)

  • Who uses it? The Kube-Scheduler.
  • What does it do? It determines node placement. If a Pod requests 500m CPU, the Scheduler finds a Node with at least 500m of unallocated CPU capacity.
  • Linux Implementation: Translates to cpu.shares in cgroup v1 (cpu.weight in cgroup v2). During CPU contention, the container receives CPU time proportional to its request relative to the other containers on the node: a guaranteed floor, not a cap.

1.3 The "Limit" (The Ceiling)

  • Who uses it? The Kubelet (via Linux cgroups) on the Node.
  • What does it do? It caps the maximum amount of resources the container can use.

1.4 What happens when you exceed them?

This is the most critical matrix in Kubernetes resource management:

  • CPU:
    • Exceeding the Request: Allowed, as long as the node has idle CPU cycles.
    • Exceeding the Limit: Throttled. The container is paused by the Linux Completely Fair Scheduler (CFS) bandwidth controller until the next period. It does not crash, but latency spikes.
  • Memory:
    • Exceeding the Request: Allowed, as long as the node has free memory (though this makes the Pod an eviction candidate under node memory pressure).
    • Exceeding the Limit: OOMKilled. The Linux kernel kills the primary process with Exit Code 137 (SIGKILL). The Kubelet restarts it per the Pod's restartPolicy; repeated kills surface as CrashLoopBackOff.
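These behaviors come from the cgroup settings the Kubelet derives from your spec. A sketch of the standard cgroup v1 translations (cgroup v2 uses cpu.weight and cpu.max for the same purposes):

```yaml
resources:
  requests:
    cpu: "500m"      # -> cpu.shares = 512 (500 * 1024 / 1000), a relative weight
  limits:
    cpu: "500m"      # -> cpu.cfs_quota_us = 50000 per cpu.cfs_period_us = 100000,
                     #    i.e. 50ms of CPU time per 100ms window, then throttling
    memory: "256Mi"  # -> memory.limit_in_bytes = 268,435,456; exceeding it
                     #    triggers the kernel OOM killer (Exit Code 137)
```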

2. QUALITY OF SERVICE (QoS) CLASSES

Kubernetes automatically assigns a QoS class to every Pod based on its Requests and Limits. This class dictates which Pod gets evicted first when a Node experiences Memory Pressure.

  • Guaranteed: Every container sets Requests EQUAL to Limits for both CPU and Memory. Lowest eviction priority (safest); these are the last Pods to be killed. Use for production databases/APIs.
  • Burstable: Requests are LESS THAN Limits, or only some containers set them. Medium eviction priority; killed if the node runs out of memory and no BestEffort Pods remain.
  • BestEffort: NO Requests and NO Limits are set anywhere in the Pod. Highest eviction priority (first to die); these are targeted first under memory pressure.

Production Pod Manifest (Guaranteed QoS)

apiVersion: v1
kind: Pod
metadata:
  name: payment-api
spec:
  containers:
  - name: app
    image: enterprise/payment:v2
    resources:
      # Requests == Limits ensures "Guaranteed" QoS
      requests:
        memory: "1Gi"
        cpu: "1000m"
      limits:
        memory: "1Gi"
        cpu: "1000m"
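For contrast, a hypothetical Burstable manifest (requests strictly below limits) suits spiky workloads: the Scheduler reserves only the request, and the Pod may burst up to the limit when the node has headroom. The name and image below are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: batch-worker            # hypothetical name
spec:
  containers:
  - name: app
    image: enterprise/batch:v1  # hypothetical image
    resources:
      # Requests < Limits ensures "Burstable" QoS
      requests:
        memory: "256Mi"
        cpu: "250m"
      limits:
        memory: "1Gi"
        cpu: "1000m"
```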

3. LIMITRANGE (Namespace Defaulting)

The Problem: Developers forget to set Requests and Limits. If a developer deploys a BestEffort pod with a memory leak, it can crash the Node.

The Solution: A LimitRange automatically injects default CPU/Memory requests and limits into Pods created in a specific namespace. It can also enforce minimum and maximum sizes.

LimitRange Architecture

  • Scope: Namespace-level.
  • Enforcement Agent: Admission Controller.
  • Behavior: If a Pod is submitted without resources, the LimitRange mutates the Pod spec to inject the default values before the object is persisted to etcd.

Production LimitRange Manifest

apiVersion: v1
kind: LimitRange
metadata:
  name: standard-tier-limits
  namespace: dev-team-alpha
spec:
  limits:
  - type: Container
    # 1. Hard Boundaries (Admission will REJECT pods outside these bounds)
    max:
      cpu: "2"
      memory: "4Gi"
    min:
      cpu: "100m"
      memory: "128Mi"

    # 2. Defaults (Injected if developer forgets to specify)
    default:
      cpu: "500m"        # Default Limit
      memory: "512Mi"    # Default Limit
    defaultRequest:
      cpu: "250m"        # Default Request
      memory: "256Mi"    # Default Request

    # 3. Ratio Control (Prevents someone requesting 100m but setting limit to 10 CPU)
    maxLimitRequestRatio:
      cpu: "4"
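With that LimitRange in place, a container submitted with no resources block does not stay BestEffort. A sketch of what the admission controller persists, with the values injected from the defaults above:

```yaml
# Persisted container spec in dev-team-alpha after admission:
resources:
  limits:
    cpu: "500m"        # injected from .default
    memory: "512Mi"    # injected from .default
  requests:
    cpu: "250m"        # injected from .defaultRequest
    memory: "256Mi"    # injected from .defaultRequest
```

Note the result is Burstable QoS (requests below limits), not Guaranteed.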

4. RESOURCEQUOTA (Namespace Budgeting)

The Problem: A LimitRange ensures individual Pods are sized correctly, but what stops a developer from deploying 1,000 correctly-sized Pods and bankrupting your AWS account?

The Solution: A ResourceQuota puts a hard ceiling on the aggregate total of resources that can exist in a Namespace.

ResourceQuota Architecture

  • Scope: Namespace-level.
  • Enforcement Agent: Admission Controller.
  • Behavior: If a new Pod causes the Namespace to exceed its quota, the API Server rejects the Pod with an HTTP 403 Forbidden (Exceeded quota).

Production ResourceQuota Manifest

apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-team-alpha-budget
  namespace: dev-team-alpha
spec:
  hard:
    # Compute Quotas
    requests.cpu: "20"       # Max 20 total CPU requested across all pods
    requests.memory: "40Gi"
    limits.cpu: "40"
    limits.memory: "80Gi"

    # Object Count Quotas
    pods: "50"                    # Max 50 pods total
    services.loadbalancers: "2"   # Max 2 expensive cloud LoadBalancers
    persistentvolumeclaims: "10"

Quota Verification & Debugging

To see how much of the budget a team is using:

kubectl describe quota dev-team-alpha-budget -n dev-team-alpha

Sample Output:

Name:       dev-team-alpha-budget
Namespace:  dev-team-alpha
Resource         Used  Hard
--------         ----  ----
limits.cpu       12    40
limits.memory    24Gi  80Gi
pods             15    50
requests.cpu     6     20
requests.memory  12Gi  40Gi

5. MONITORING & TROUBLESHOOTING CHEATSHEET

5.1 Checking Live Usage (kubectl top)

The kubectl top command relies on the Metrics Server (which aggregates data from the Kubelet's cAdvisor). Note: This shows actual usage, not the requested amount.

# See which Nodes are hottest
kubectl top nodes

# See which Pods are consuming the most CPU
kubectl top pods --sort-by=cpu -A

# See which Containers inside a specific Pod are eating memory
kubectl top pod my-database --containers

5.2 Debugging "Pending" Pods (Insufficient CPU/Memory)

If a Pod is stuck in Pending, the Scheduler cannot find a node with enough unallocated Requests.

  1. Check Pod Events:
    kubectl describe pod <pod-name> | grep -A 5 Events:
    # Output: Warning FailedScheduling 0/5 nodes are available: 5 Insufficient cpu.
  2. Check Node Allocation:
    kubectl describe node worker-01 | grep -A 8 "Allocated resources:"
    Sample Output:
    Allocated resources:
      (Total limits may be over 100 percent, i.e., overcommitted.)
      Resource  Requests     Limits
      --------  --------     ------
      cpu       950m (95%)   2500m (250%)
      memory    6Gi (75%)    12Gi (150%)
    Notice in the output above: the Node is "overcommitted" on Limits (250%), which is fine. But it is already at 95% of allocatable CPU Requests, so if your new Pod requests 100m CPU, it will not fit on this node.

5.3 Debugging OOMKills

If a Pod keeps restarting, check if the kernel killed it for memory usage:

kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
# If it returns "OOMKilled", raise the memory Limit in the Deployment YAML,
# or fix the leak if the workload should not legitimately need that much.
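The remedy is a rollout with a larger ceiling. A minimal sketch, assuming a Deployment whose container was OOMKilled at a 1Gi limit (the 2Gi figure is illustrative; size it from observed usage plus headroom):

```yaml
# Deployment fragment: raise requests and limits together to keep Guaranteed QoS.
spec:
  template:
    spec:
      containers:
      - name: app
        resources:
          requests:
            memory: "2Gi"
          limits:
            memory: "2Gi"
```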