L06 — Scheduling & Scaling

Once Pods exist, two questions remain: where should each Pod run, and how many should there be? L06 covers both: the scheduling primitives (where Pods land) and the scaling family (how many Pods run and how much each one gets).

What you’ll understand after this level

The kube-scheduler flow: PreFilter → Filter → PreScore → Score → Reserve → Permit → PreBind → Bind
Taints and tolerations: keeping Pods off specific nodes, or reserving nodes for the Pods that tolerate them
Node affinity and pod (anti-)affinity: placing Pods based on node labels or on the labels of Pods already running
Topology spread constraints: spreading replicas across zones and nodes
PriorityClass and preemption: priority, not QoS class, decides which lower-priority Pods the scheduler may preempt
Scheduling gates: holding a Pod back from scheduling until an external controller removes its gates
Resource requests vs limits: what each does, QoS classes, cgroups, the limits debate
HPA (scales replica count horizontally), VPA (resizes requests vertically), Cluster Autoscaler and Karpenter (add nodes), KEDA (event-driven scaling): what each does and how they fit together
PodDisruptionBudgets: keeping services available during voluntary disruptions
Restart policies: Always, OnFailure, Never, and when each applies
Extended resources: GPUs, FPGAs, and the device plugin model
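
To make the Filter and Score stages of the scheduler flow concrete, here is a minimal Python sketch, not the kube-scheduler's actual code. It assumes a deliberately simplified model: taints are reduced to bare keys, CPU is the only resource, and the score is a toy "most CPU left over" rule. The names `Node`, `Pod`, `filter_nodes`, `score_node`, and `schedule` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free_cpu_millis: int
    taints: set = field(default_factory=set)  # simplified: taint keys only


@dataclass
class Pod:
    name: str
    cpu_request_millis: int
    tolerations: set = field(default_factory=set)  # simplified: tolerated taint keys


def filter_nodes(pod, nodes):
    """Filter stage: drop nodes that cannot run the Pod at all."""
    return [
        n for n in nodes
        if n.free_cpu_millis >= pod.cpu_request_millis  # enough unreserved CPU
        and n.taints <= pod.tolerations                 # every taint is tolerated
    ]


def score_node(pod, node):
    """Score stage (toy rule): prefer the node with the most CPU left after placement."""
    return node.free_cpu_millis - pod.cpu_request_millis


def schedule(pod, nodes):
    """Return the name of the chosen node, or None if no node passes the filters."""
    feasible = filter_nodes(pod, nodes)
    if not feasible:
        return None  # the real scheduler would report the Pod as unschedulable
    return max(feasible, key=lambda n: score_node(pod, n)).name


nodes = [
    Node("node-a", free_cpu_millis=500),
    Node("node-b", free_cpu_millis=2000, taints={"gpu"}),
    Node("node-c", free_cpu_millis=1500),
]
print(schedule(Pod("web", 1000), nodes))
print(schedule(Pod("trainer", 1000, tolerations={"gpu"}), nodes))
```

The point of the split is that filtering is a hard yes/no per node, while scoring only ranks the nodes that survived. A toleration therefore makes a tainted node eligible but does nothing to attract the Pod to it.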

Notes in this level

Scheduling primitives

Note	Status	What’s in it
Scheduling	✅	Taints, tolerations, node/pod affinity, anti-affinity, topology spread, all the operator semantics
Priority & Preemption	✅	PriorityClass, preemption algorithm, system classes, the PDB deadlocks, QoS vs priority
Scheduler Internals	✅	The plugin pipeline, every default plugin, profiles, framework extensions, perf tuning
Scheduling Gates	✅	Pod scheduling readiness, holding Pods back, the StatefulSet join pattern
Extended Resources	✅	GPUs, device plugins, time-slicing, MIG, DRA, ResourceClaim, the integer rule

Resources and constraints

Note	Status	What’s in it
Resource Requests & Limits	✅	CPU/memory/ephemeral-storage, CFS throttling, OOM-kill, QoS classes, cgroup v2, the limits debate
Restart Policy	✅	Always / OnFailure / Never, the backoff algorithm, CrashLoopBackOff, exit codes, Job/CronJob behavior
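
As a rough illustration of how requests and limits map to QoS classes, the sketch below applies the documented rules: Guaranteed when every container has CPU and memory limits equal to its requests, BestEffort when no container sets any CPU or memory request or limit, and Burstable otherwise. The function name `qos_class` and the dict-based container shape are assumptions made for this example. Quantities are compared as literal strings, so `"500m"` and `"0.5"` would not be treated as equal here.

```python
def qos_class(containers):
    """Classify a Pod from its containers' resources.

    containers: list of dicts such as
        {"requests": {"cpu": "100m"}, "limits": {"cpu": "200m", "memory": "256Mi"}}
    """
    any_set = False
    guaranteed = True
    for c in containers:
        limits = c.get("limits", {})
        # When a limit is set without a request, the request defaults to the limit.
        requests = {**limits, **c.get("requests", {})}
        for resource in ("cpu", "memory"):
            if resource in requests or resource in limits:
                any_set = True
            # Guaranteed needs a limit for this resource and a request equal to it.
            if resource not in limits or requests.get(resource) != limits[resource]:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


print(qos_class([{}]))
print(qos_class([{"requests": {"cpu": "100m"}}]))
print(qos_class([{"limits": {"cpu": "500m", "memory": "256Mi"}}]))
```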

Scaling family

Note	Status	What’s in it
Scaling — overview	✅	The L06 hub: HPA / VPA / Karpenter / CA / KEDA at a glance, how they combine
HPA	✅	The autoscaling control loop, custom / external metrics, behavior settings, scaling math, the HPA controller
VPA	✅	VPA modes (Off / Initial / Auto), the recommender, VPA + HPA coexistence, the OOM pattern
Karpenter	✅	NodePools, EC2NodeClass, consolidation, spot, the modern alternative to Cluster Autoscaler
Cluster Autoscaler	✅	ASG / MIG / VMSS, scale-up and scale-down logic, the CA vs Karpenter decision
KEDA	✅	Event-driven autoscaling, 60+ scalers, scale to zero, the external metrics API
PodDisruptionBudget	✅	minAvailable / maxUnavailable, the eviction API, the HPA + PDB deadlock, unhealthyPodEvictionPolicy
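
The scaling math at the core of the HPA can be sketched in a few lines. The formula is the documented one, desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), with a tolerance band (0.1 by default) inside which no scaling happens. The function name `desired_replicas` and the `min_replicas` / `max_replicas` defaults are illustrative assumptions. The real controller also accounts for missing metrics, not-yet-ready Pods, and stabilization windows, none of which are modeled here.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Core HPA formula, clamped to the configured replica bounds."""
    ratio = current_metric / target_metric
    # Within the tolerance band the HPA leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


print(desired_replicas(4, current_metric=90, target_metric=60))   # scale up
print(desired_replicas(4, current_metric=63, target_metric=60))   # inside tolerance
print(desired_replicas(4, current_metric=20, target_metric=60))   # scale down
print(desired_replicas(8, current_metric=120, target_metric=60))  # capped at max
```

Because the result is rounded up and never drops below `min_replicas`, an HPA cannot shrink a workload under its floor.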

Where to go next

→ L07 — Security: with workloads scheduled and scaled, decide who can do what to them.
