L06 — Scheduling & Scaling

Once Pods exist, two questions follow: where should each Pod run, and how many should there be? L06 covers both: the scheduling primitives (where Pods land) and the scaling family (how many Pods run, and how much each one gets).

What you’ll understand after this level

  • The kube-scheduler flow: PreFilter → Filter → PreScore → Score → Reserve → Permit → PreBind → Bind
  • Taints and tolerations: keeping Pods off specific nodes unless they explicitly tolerate the taint
  • Node affinity / pod anti-affinity — schedule based on labels
  • Topology spread constraints — spread replicas across zones/nodes
  • PriorityClass and preemption — the only signal the scheduler uses to evict a lower-priority Pod
  • Scheduling gates — hold a Pod back from scheduling until an external signal
  • Resource requests vs limits — what each does, QoS classes, cgroups, the limits debate
  • HPA (horizontally scales the replica count), VPA (vertically resizes requests), Cluster Autoscaler + Karpenter (add nodes), KEDA (event-driven): what each does and how they fit
  • PodDisruptionBudgets — keep services available during voluntary disruption
  • Restart policies: Always, OnFailure, Never, and when each applies
  • Extended resources — GPUs, FPGAs, and the device plugin model
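As a preview of the requests-vs-limits material, here is a minimal sketch of how a Pod's QoS class follows from its containers' CPU and memory settings. The `qos_class` function and the dictionary layout are hypothetical, invented for illustration; quantities are compared as plain strings, so `"1"` and `"1000m"` would wrongly count as different here, which the real API does not do.

```python
from typing import Dict, List

# Hypothetical container shape: {"requests": {...}, "limits": {...}}
Container = Dict[str, Dict[str, str]]


def qos_class(containers: List[Container]) -> str:
    """Classify a Pod as Guaranteed, Burstable, or BestEffort.

    Simplification: quantities are compared as raw strings, not parsed.
    """
    any_set = False      # did any container set any CPU/memory value?
    guaranteed = True    # stays True only if every request equals its limit

    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # When only a limit is given, the request defaults to the limit.
            request = requests.get(resource, limit)
            if request is not None or limit is not None:
                any_set = True
            if limit is None or request != limit:
                guaranteed = False

    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


if __name__ == "__main__":
    pinned = {"requests": {"cpu": "500m", "memory": "256Mi"},
              "limits": {"cpu": "500m", "memory": "256Mi"}}
    print(qos_class([pinned]))                          # Guaranteed
    print(qos_class([{"requests": {"cpu": "100m"}}]))   # Burstable
    print(qos_class([{}]))                              # BestEffort
```

The class matters under node memory pressure: BestEffort Pods are the first candidates for eviction, Guaranteed Pods the last.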

Notes in this level

Scheduling primitives

Note | Status | What’s in it
Scheduling | | Taints, tolerations, node/pod affinity, anti-affinity, topology spread, all the operator semantics
Priority & Preemption | | PriorityClass, preemption algorithm, system classes, the PD deadlocks, QoS vs priority
Scheduler Internals | | The plugin pipeline, every default plugin, profiles, framework extensions, perf tuning
Scheduling Gates | | Pod scheduling readiness, holding Pods back, the StatefulSet join pattern
Extended Resources | | GPUs, device plugins, time-slicing, MIG, DRA, ResourceClaim, the integer rule
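To make the taint and toleration operator semantics concrete, the following sketch models the matching rule in Python. The `Taint` and `Toleration` classes and both helper functions are hypothetical names for illustration, not a Kubernetes client API. It assumes the usual rules: an empty toleration effect matches any effect, an empty key with `Exists` matches any key, and only `NoSchedule` and `NoExecute` taints block scheduling.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Taint:
    key: str
    effect: str          # "NoSchedule", "PreferNoSchedule", or "NoExecute"
    value: str = ""


@dataclass(frozen=True)
class Toleration:
    key: str = ""            # empty key + "Exists" matches every taint key
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""         # empty effect matches every effect


def tolerates(toleration: Toleration, taint: Taint) -> bool:
    """Return True if this single toleration covers this single taint."""
    if toleration.effect and toleration.effect != taint.effect:
        return False
    if toleration.key and toleration.key != taint.key:
        return False
    if toleration.operator == "Exists":
        return True
    return toleration.operator == "Equal" and toleration.value == taint.value


def can_schedule(node_taints: List[Taint], tolerations: List[Toleration]) -> bool:
    """A Pod fits if every hard taint is covered by at least one toleration."""
    hard_effects = ("NoSchedule", "NoExecute")  # PreferNoSchedule is only a soft preference
    return all(
        any(tolerates(tol, taint) for tol in tolerations)
        for taint in node_taints
        if taint.effect in hard_effects
    )


if __name__ == "__main__":
    gpu_taint = Taint(key="nvidia.com/gpu", effect="NoSchedule", value="true")
    print(can_schedule([gpu_taint], []))
    print(can_schedule([gpu_taint], [Toleration(key="nvidia.com/gpu", operator="Exists")]))
```

Note that a toleration only permits a Pod onto a tainted node; attracting the Pod to that node still takes node affinity or a node selector.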

Resources and constraints

Note | Status | What’s in it
Resource Requests & Limits | | CPU/memory/ephemeral-storage, CFS throttling, OOM-kill, QoS classes, cgroup v2, the limits debate
Restart Policy | | Always / OnFailure / Never, the backoff algorithm, CrashLoopBackOff, exit codes, Job/CronJob behavior

Scaling family

Note | Status | What’s in it
Scaling — overview | | The L06 hub: HPA / VPA / Karpenter / CA / KEDA at a glance, how they combine
HPA | | The autoscaling control loop, custom / external metrics, behavior settings, scaling math, the HPA controller
VPA | | VPA modes (Off / Initial / Auto), the recommender, VPA + HPA coexistence, the OOM pattern
Karpenter | | NodePools, EC2NodeClass, consolidation, spot, the modern alternative to Cluster Autoscaler
Cluster Autoscaler | | ASG / MIG / VMSS, scale-up and scale-down logic, the CA vs Karpenter decision
KEDA | | Event-driven autoscaling, 60+ scalers, scale to zero, the external metrics API
PodDisruptionBudget | | minAvailable / maxUnavailable, the eviction API, the HPA + PDB deadlock, unhealthyPodEvictionPolicy
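The "scaling math" in the HPA note comes down to one ratio. Here is a minimal sketch of the core formula, desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), with the default 10% tolerance band and min/max clamping. The function name and the default bounds are hypothetical, and the sketch ignores stabilization windows, behavior policies, and not-ready Pods.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,     # hypothetical bound for illustration
    max_replicas: int = 10,    # hypothetical bound for illustration
    tolerance: float = 0.1,    # HPA skips scaling when the ratio is within 10% of 1.0
) -> int:
    """Core HPA replica calculation for a single metric."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: leave the count alone
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 4 replicas averaging 90% CPU against a 60% target
    print(desired_replicas(4, 90, 60))
    # Within tolerance: no change
    print(desired_replicas(4, 63, 60))
```

Because the result is rounded up, HPA leans toward having slightly too many replicas rather than too few.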

Suggested reading order

Scheduling path

  1. Resource Requests & Limits — the foundation; everything else assumes you have this
  2. Scheduling — the YAML-level primitives (taints, affinity, topology)
  3. Restart Policy — short, foundational
  4. Scheduler Internals — the framework that enforces the above
  5. Priority & Preemption — when scheduling fails, what happens
  6. Scheduling Gates — advanced: hold a Pod back from scheduling
  7. Extended Resources — GPUs and the device plugin model

Scaling path

  1. Scaling — overview — at-a-glance comparison of all the scaling primitives
  2. HPA — the workhorse
  3. VPA — the right-sizing complement
  4. PDB — read before you start draining nodes
  5. Cluster Autoscaler — the older node provisioner
  6. Karpenter — the modern alternative
  7. KEDA — event-driven, the right answer for queue-based workloads

Where to go next

L07 — Security: with workloads scheduled and scaled, decide who can do what to them.