Resource Requests and Limits

“https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/”

Each container in a Pod can declare how much CPU and memory it needs. Two numbers: requests (what the container is guaranteed, used by the scheduler) and limits (the maximum the container is allowed to use, enforced at runtime). These are the most important numbers you set in k8s — they drive scheduling, HPA, VPA, eviction, and QoS.

The Basics — Requests vs Limits
CPU in Detail
Memory in Detail
Ephemeral Storage
Huge Pages and Extended Resources
How Requests Drive Scheduling
How Limits Are Enforced at Runtime
QoS Classes in Depth
The Init Container Scheduling Footgun
LimitRange — Defaults and Constraints
Pod-Level Resources (k8s 1.34+)
The “No Limits” Debate in Depth
Cgroup v1 vs cgroup v2
In-Place Pod Resize (k8s 1.33+)
Operations and Debugging
Gotchas and Common Mistakes

1. The Basics — Requests vs Limits

spec:
  containers:
  - name: app
    image: app:1.0
    resources:
      requests:
        cpu: 100m          # 0.1 CPU guaranteed
        memory: 128Mi      # 128 MiB guaranteed
      limits:
        cpu: 500m          # 0.5 CPU max
        memory: 256Mi      # 256 MiB max

requests — what the container is guaranteed to get. Used by the scheduler for placement. Counts against ResourceQuota. Drives HPA’s Utilization calculation.
limits — the maximum the container is allowed to use. Enforced at runtime by the kernel (CFS for CPU, cgroup memory limit for memory). Exceeding a memory limit = OOM-kill. Exceeding a CPU limit = throttled, not killed.

Requests are the floor; limits are the ceiling. A container with requests.cpu: 100m, limits.cpu: 500m will get 100m guaranteed but can burst up to 500m if the node has spare cycles.

2. CPU in Detail

2.1 Units

1 = 1 CPU = 1 vCPU on a cloud node.
1000m = 1 CPU (the m is millicores).
500m = half a CPU.
100m = 1/10th of a CPU.

In fractional cores:

0.1 = 100m.
0.25 = 250m.
0.5 = 500m.

The decimal and milli forms are equivalent — 0.5 and 500m are the same.

2.2 CPU is compressible

When a container exceeds its CPU limit, the kernel’s CFS (Completely Fair Scheduler) throttles the container. The container can still run, just slower. CPU throttling is not fatal.

This is different from memory. CPU is a “compressible” resource — the kernel can give the container less of it without killing the process.

2.3 CPU in clouds

1 AWS vCPU = 1 k8s core = 1000m.
1 GCP core = 1 k8s core = 1000m.
1 Azure vCPU = 1 k8s core = 1000m.

(Older AWS instance types had a 2:1 vCPU-to-physical-core ratio, but modern types are 1:1.)

2.4 Burstable CPU

A container with requests.cpu: 100m, limits.cpu: 1 is Burstable. In practice:

The scheduler places it based on requests (100m).
The container can use up to limits (1 core) if the node has spare cycles.
If the node is busy, the container is throttled back to 100m.

CFS quota enforcement is per-cgroup. The container is assigned a quota of 100m * period per period (default 100ms). If the container uses more than that quota in a period, it’s throttled until the next period.

2.5 The CPU limit trap

CPU limits can hurt latency-sensitive apps. A Java app with limits.cpu: 500m may be throttled during GC pauses or bursts, increasing tail latency. The choices:

Remove the CPU limit (only requests). The app can use whatever’s free. The downside: a misbehaving app can starve neighbors.
Set limits.cpu == requests.cpu (Guaranteed). The app gets exactly what it asks for, no throttling. The downside: no bursting.
Set a high limits.cpu (e.g. requests: 100m, limits: 4). Some burst room, less throttling. The downside: misbehaving apps can spike.

Most production setups use the third option. Some latency-critical apps (search, ML serving) use option 1 or 2.

3. Memory in Detail

3.1 Units

Memory is in bytes by default. Common suffixes:

Ki = 1024 bytes (kibibyte).
Mi = 1024 Ki = 1,048,576 bytes (mebibyte).
Gi = 1024 Mi (gibibyte).
K = 1000 bytes (kilobyte). Rare in k8s.
M = 1000 KB (megabyte). Rare.

Use the binary suffixes (Mi, Gi) — they’re standard in k8s and what most tools expect.

3.2 Memory is incompressible

When a container exceeds its memory limit, the cgroup memory limit triggers and the kernel OOM-kills the container. The container exits with code 137.

Memory is incompressible — the kernel can’t give the container less memory than it needs. The only options are “give it what it asked for” or “kill it”.

3.3 Memory accounting subtleties

The cgroup memory limit accounts for everything in the cgroup:

RSS (Resident Set Size) — actual physical memory used.
Page cache — files the process has read but may not need again.
Stack, heap, anonymous mappings.
Kernel memory (network buffers, etc.).
TCP/UDP socket buffers.

This means a process that reads a lot of files can hit its memory limit even if it doesn’t think it’s using that much memory. The page cache is counted.

For a database that does heavy reads, this is a real issue. You may need to set the memory limit higher than the app’s RSS would suggest.

3.4 The OOM-kill cascade

A container that’s OOM-killed is restarted by the kubelet (assuming restartPolicy: Always). If the OOM is chronic, the container restarts, OOM-kills again, restarts, OOM-kills again. This is the CrashLoopBackOff.

A Pod in CrashLoopBackOff is Running (the kubelet considers it running), but the container keeps crashing. The Pod is alive, but useless. HPA doesn’t scale up a CrashLoopBackOff Pod — it sees the Pod as running.

3.5 The JVM-Xmx trap

JVMs allocate a large heap (-Xmx) regardless of actual usage. A JVM with -Xmx=2g will use 2 GB of cgroup memory (because the heap is reserved), even if the app only uses 200 MB.

If the container’s memory limit is 1 GB, the JVM will OOM-kill at startup (the heap reservation alone exceeds the limit).

Always set the JVM’s -Xmx to be less than the container’s memory limit (with headroom for non-heap usage, native libs, etc.).

# in the container command
java -Xmx768m -jar app.jar
# container limit: 1Gi
# 768m heap + ~200m non-heap + ~30m headroom = ~1Gi

3.6 The Go runtime trap

Go doesn’t reserve memory the way JVM does — it uses what it needs. A Go app with limits.memory: 512Mi will use 512 MB and OOM if it actually exceeds.

But Go’s runtime doesn’t return memory to the OS by default. The Go runtime holds onto memory even after the heap shrinks (the “GC tuning” problem). A Go app that uses 200 MB at peak will keep that 200 MB reserved.

For Go apps, requests.memory is hard to set right. Run VPA in recommend mode for Go apps.

4. Ephemeral Storage

ephemeral-storage is the third resource that the kubelet accounts for. It covers:

The container’s writable layer (any file the container writes that isn’t in a volume).
Logs stored at /var/log/containers.
emptyDir volumes in the container.

resources:
  requests:
    ephemeral-storage: 1Gi
  limits:
    ephemeral-storage: 2Gi

4.1 When it kicks in

The kubelet monitors ephemeral storage usage. If a container exceeds limits.ephemeral-storage, it’s evicted (similar to OOM-kill). The Pod is rescheduled.

This is most often a problem for:

Log-spilling apps that write huge log files.
Apps that write temp files in the writable layer.
Image-extraction — large container images take up space in the kubelet’s storage.

4.2 The node’s ephemeral storage

The kubelet reserves some ephemeral storage for its own use (image cache, logs, etc.). The node’s allocatable.ephemeral-storage is what the scheduler sees.

A 100 GB node may have allocatable.ephemable-storage: 90 GB (10 GB reserved). The scheduler’s bin-packing uses this.

4.3 The “node disk full” gotcha

If a node’s ephemeral storage fills up, the kubelet starts evicting Pods — even if the Pods have low ephemeral storage usage. The eviction is by total usage, not by per-container usage. A Pod that doesn’t write much can be evicted because the node is full.

This is a common issue with emptyDir (logs, caches) and writable layers (large images).

5. Huge Pages and Extended Resources

5.1 Huge pages

Huge pages are a kernel feature for large memory allocations. k8s supports them as a special resource:

resources:
  requests:
    hugepages-2Mi: 1Gi
  limits:
    hugepages-2Mi: 1Gi

The container gets 1 GiB of 2 MiB huge pages. Huge pages are isolated — they’re not pageable, not swappable, and not shared with the kernel’s page cache.

Huge pages require the kubelet to be configured with --hugepages-2Mi=4 (or similar) on nodes that have the pages. The node advertises the huge pages in allocatable.

5.2 Extended resources

GPU, FPGA, InfiniBand, etc. — opaque resources reported by device plugins. The Pod requests them by name:

resources:
  limits:
    nvidia.com/gpu: 1

See Extended Resources for the full treatment.

5.3 The integer rule

Extended resources are integers only. You can’t ask for “0.5 GPUs”. Time-slicing changes this (with time-slicing, the device plugin reports multiple “GPU” units, and the Pod asks for an integer count of those units).

6. How Requests Drive Scheduling

The scheduler’s NodeResourcesFit plugin filters nodes by request sum:

For each node N:
  used_cpu = sum(requests.cpu for all Pods on N)
  free_cpu = allocatable.cpu - used_cpu
  if Pod's requests.cpu > free_cpu: drop N

The same for memory. The filter is hard — a Pod that doesn’t fit is dropped.

6.1 The bin-packing math

For a 4-CPU node with allocatable.cpu: 3800m (200m reserved):

Pods on the node	requests.cpu	remaining
(none)	0m	3800m
Pod A	500m	3300m
Pod B	1000m	2300m
Pod C	2000m	300m
Pod D (requests 500m)	—	won’t fit, dropped

Pod D is dropped from this node. The scheduler moves to the next node.

6.2 Overcommitment

The scheduler uses requests for placement, not limits. A cluster can have sum(requests) > sum(capacity) if the apps are bursty.

Node: 4 cores
Pod A: requests 1, limits 4
Pod B: requests 1, limits 4
Pod C: requests 1, limits 4
Pod D: requests 1, limits 4
Pod E: requests 1, limits 4

sum(requests) = 5, but only 4 cores
sum(limits) = 20, way over

The scheduler places all 5 (each gets 1 core "guaranteed").
At runtime, the kernel throttles any that try to use more than 1.

This is common and intentional. Apps burst up to their limits, the kernel enforces. CPU usage averages out below sum(requests).

6.3 Memory is not overcommitted

Memory is incompressible. The scheduler enforces sum(requests) <= capacity. A node with 16 GB and 16 GB of requests is full — no more Pods can be scheduled.

If you have 10 GB of requests and 5 GB free, a Pod asking for 6 GB doesn’t fit. The scheduler moves to the next node.

7. How Limits Are Enforced at Runtime

7.1 CPU: CFS throttling

The kernel’s CFS (Completely Fair Scheduler) is configured for the container’s cgroup:

cpu.cfs_quota_us — the time the container can use per period.
cpu.cfs_period_us — the period (default 100ms).

The container’s “CPU usage” is the sum of its threads’ CPU time. If it exceeds the quota in a period, it’s throttled until the next period.

Period = 100ms
Quota = 50ms (for limits.cpu: 500m, requests.cpu: 100m, on a 1-core node)

Container uses 80ms of CPU in the first 100ms period.
→ Container is throttled for the remaining 20ms.
→ New period starts. Container can use 50ms.

Throttling is per-period. The container isn’t killed; it just runs slower.

7.2 The CFS throttling gotcha

CFS throttling can cause latency spikes in CPU-bound apps. A Java app doing GC may spike to 200% CPU for 50ms, get throttled, and have its GC pause extended. The user sees the GC pause as latency.

This is why some teams remove CPU limits entirely. The trade-off:

With limits: predictable resource use, possible throttling.
Without limits: possible starvation, no throttling.

7.3 Memory: cgroup OOM

The cgroup memory limit triggers a kernel OOM-kill when exceeded. The container is killed with SIGKILL (exit code 137).

The kernel’s OOM-kill is fast and final. The container’s process tree is killed; if there are child processes, they’re killed too (unless they escaped the cgroup).

7.4 The OOM-kill signals

When the cgroup OOM-kills a container, the kubelet sees the exit and restarts it (per restartPolicy). The Pod’s events show:

Last State:     Terminated
Reason:         OOMKilled
Exit Code:      137

kubectl describe pod shows this in the container’s status.

7.5 The `node-level OOM`

If a node’s total memory usage exceeds the node’s capacity, the node-level OOM triggers. The kernel picks a process to kill (usually the largest), which may be a kubelet, a system process, or a container.

Node-level OOMs are bad. The node may become unresponsive. The Pods are rescheduled on other nodes.

To avoid node-level OOM: set requests accurately. The scheduler places Pods such that sum(requests) <= node.allocatable. The actual usage can burst, but the system can usually handle it.

8. QoS Classes in Depth

Every Pod has a QoS class based on its resource spec. The class affects eviction order when a node is under memory pressure.

8.1 Guaranteed

Every container has requests == limits for both CPU and memory.
Both requests and limits are set (not just one).
requests > 0 for both CPU and memory.

Example:

containers:
- name: app
  resources:
    requests: { cpu: 100m, memory: 128Mi }
    limits:   { cpu: 100m, memory: 128Mi }

Guaranteed Pods are evicted last (after Burstable and BestEffort).

8.2 Burstable

At least one container has a request or limit set.
Not all containers have requests == limits.

Examples:

# Burstable: has requests but no limits
containers:
- resources:
    requests: { cpu: 100m, memory: 128Mi }
    # no limits

# Burstable: requests != limits
containers:
- resources:
    requests: { cpu: 100m, memory: 128Mi }
    limits:   { cpu: 500m, memory: 256Mi }

Burstable Pods are evicted second (after BestEffort, before Guaranteed).

8.3 BestEffort

No container has any request or limit set.

containers:
- name: app
  image: app:1.0
  # no resources

BestEffort Pods are evicted first when the node is under memory pressure.

8.4 The eviction logic

When a node runs out of memory:

The kernel starts reclaiming page cache.
If still under pressure, the kernel OOM-kills cgroups.
The kubelet’s oom_watcher sees the cgroup OOMs.
The kubelet evicts Pods in order: BestEffort, Burstable, Guaranteed.
Within a class, the Pod that requested the most memory is evicted first.

Eviction is in a specific order. QoS class is the dominant factor.

8.5 QoS and the scheduler

The scheduler does not consider QoS for placement. A Burstable Pod and a Guaranteed Pod are placed the same way (based on requests). The QoS class only matters at eviction time.

9. The Init Container Scheduling Footgun

Init containers run before the main container. The scheduler uses the highest request of any init container or the main container for placement.

spec:
  initContainers:
  - name: migrate
    image: migrate:1.0
    resources:
      requests:
        cpu: 500m
        memory: 1Gi
  containers:
  - name: app
    image: app:1.0
    resources:
      requests:
        cpu: 100m
        memory: 128Mi

This Pod is scheduled as if it needs 500m CPU and 1 GiB memory (the init container’s request, which is higher than the main container’s).

The init container only runs once at startup. The Pod’s runtime usage is the main container’s. But the scheduler uses the init’s request for placement. This is correct behavior — you need the resources to run the init — but it can lead to “wasted” scheduling slots if the init is a one-time heavy task.

9.1 The fix

If the init container is one-time heavy, consider:

Running the migration in a separate Job, not as an init container.
Using a smaller init container (do the heavy work in a separate step).
Or: accept the over-allocation. The node is “using” those resources during init, then they’re free during runtime.

10. LimitRange — Defaults and Constraints

LimitRange is the namespace-level mechanism for setting defaults and constraints on Pods and Containers.

apiVersion: v1
kind: LimitRange
metadata:
  name: defaults
  namespace: prod
spec:
  limits:
  - type: Container
    default:
      cpu: 500m
      memory: 512Mi
    defaultRequest:
      cpu: 100m
      memory: 128Mi
    max:
      cpu: "2"
      memory: 4Gi
    min:
      cpu: 100m
      memory: 128Mi

default — applied if the Container doesn’t set limits.
defaultRequest — applied if the Container doesn’t set requests.
max — hard cap. Pods that exceed are rejected.
min — hard floor. Pods that don’t meet are rejected.

See ResourceQuota for the full treatment.

11. Pod-Level Resources (k8s 1.34+)

A newer feature (currently in alpha/beta as of 1.30) lets you set resources at the Pod level, not per-container:

apiVersion: v1
kind: Pod
spec:
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: 1
      memory: 2Gi
  containers:
  - name: app
    image: app:1.0

The Pod-level resources define the total across all containers. The per-container resources are still honored, but the Pod’s resources field is the overall cap.

This is useful for:

ResourceQuota constraints — a quota on requests.cpu checks the Pod’s total, not the per-container sum.
VPA recommendations — VPA in Pod mode recommends at the Pod level, not per-container.
Multi-container Pods — clear, total budgets.

As of 1.30, this is in beta. Adoption is early.

12. The “No Limits” Debate in Depth

Some teams run with no CPU limits. The argument:

Throttling hurts latency. A CPU limit causes throttling. For latency-sensitive apps (search, ML serving), the throttling is the bottleneck. Removing the limit removes the throttling.
The scheduler is enough. With requests set, the scheduler places Pods such that sum(requests) <= capacity. A misbehaving app can still spike, but the average case is fine.
CFS is conservative. The kernel’s CFS throttling is per-period, not per-second. A 100ms period with 50% quota means the container is throttled 50ms every 100ms. That’s noticeable.

The argument for limits:

Predictability. With limits, you know each container is capped.
Multi-tenancy safety. In a shared cluster, a misbehaving app can’t starve neighbors.
ResourceQuota safety. Quotas are about requests, not limits. But a runaway container can still OOM-kill a node.

The k8s official position is set both. The reality is more nuanced:

Latency-critical apps (search, ML serving) — no CPU limits, just requests. Burstable.
General services — set both. Burstable.
Critical services — set requests == limits. Guaranteed.
Background / batch — no limits or high limits. Burstable.

Tune with VPA in recommend mode to get data-driven values.

13. Cgroup v1 vs cgroup v2

k8s supports both cgroup v1 and cgroup v2. The difference matters for resource isolation.

13.1 cgroup v1 (legacy)

Separate cgroup hierarchies for CPU, memory, blkio, etc.
Resource limits set per-hierarchy.
More compatible with older kernels.
Less efficient — the kernel has to coordinate across hierarchies.

13.2 cgroup v2 (modern, k8s 1.25+ default)

Unified hierarchy. All controllers under one tree.
More efficient — the kernel can coordinate resource pressure.
Required for some features (e.g. PSI — Pressure Stall Information).
Default on new clusters since 1.25.

13.3 The migration

Most modern distros (kubeadm, EKS, GKE) use cgroup v2 by default. If you have an old cluster on cgroup v1, you can:

Set --cgroup-driver=cgroupfs (v1) or systemd (v2).
Migrate by draining nodes, switching the cgroup driver, rejoining.

The kubelet’s cgroup driver must match the container runtime’s. Docker uses cgroupfs, containerd uses systemd (typically). Mismatches cause Pods to fail.

13.4 cgroup v2 and resource isolation

cgroup v2 has better memory isolation — the kernel can throttle or kill a process based on memory pressure more accurately. cgroup v1 has more edge cases where a container can OOM the node.

14. In-Place Pod Resize (k8s 1.33+)

A newer feature (alpha in 1.33) lets you resize a Pod’s resources without restarting it:

# patch the Pod to change resources
kubectl patch pod <pod> -p '{"spec":{"containers":[{"name":"app","resources":{"requests":{"cpu":"500m"}}}]}}'

The kubelet updates the cgroup without restarting the container. The container doesn’t see the change (no SIGTERM, no restart), but the cgroup’s CPU/memory quota is updated.