Scaling — The L06 Overview

A high-level overview of the scaling family in Kubernetes — HPA, VPA, Cluster Autoscaler, Karpenter, and KEDA. This is the hub: the deeper notes are linked below. If you’re looking for the “which autoscaler do I use” decision, this is the place to start.

The three scaling dimensions

When load on a Deployment changes, you can scale in three orthogonal ways:

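Dimension  | What scales               | Mechanism              | Affects existing Pods?
-----------|---------------------------|------------------------|---------------------------
Horizontal | Number of replicas (Pods) | Add / remove Pods      | No (new Pods have new IPs)
Vertical   | CPU / memory per Pod      | Resize requests/limits | Yes (Pods restart)
Cluster    | Number of nodes           | Add / remove nodes     | N/A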

The four autoscalers:

  • HPA — Horizontal Pod Autoscaler
  • VPA — Vertical Pod Autoscaler
  • CA — Cluster Autoscaler (or Karpenter, the modern alternative)
  • KEDA — Kubernetes Event-Driven Autoscaling (drives HPA, the only one that natively scales to zero from external sources)

The one-table comparison

                  | HPA | VPA | CA | Karpenter | KEDA
------------------|-----|-----|----|-----------|-----
What it scales    | Replicas (Pods) | Pod resource requests/limits | Nodes (ASG / MIG / VMSS) | Nodes (dynamic instance types) | Drives HPA; scales replicas from external sources
Driven by         | CPU / memory / custom / external | Historical usage | Pending Pods | Pending Pods | Kafka lag, queue depth, SQS, cron, Prometheus, 60+ others
Restarts Pods     | No (new Pods) | Yes (in Auto mode) | No | No | No (drives HPA)
Best for          | Stateless HTTP services | Stateful, single-replica | Stable, predictable workloads | Heterogeneous, fast-scaling | Event-driven, queue-based
Production-ready? | Yes | Yes (separate add-on, API autoscaling.k8s.io/v1) | Yes | Yes (v1 GA) | Yes
Scales to zero?   | Only with minReplicas: 0 behind the alpha HPAScaleToZero feature gate | No | No | No | Yes (built-in)
Common pairing    | CA / Karpenter | HPA on a different metric | HPA | HPA | HPA (it IS HPA's metrics source)

Deeper notes:

  • HPA: the workhorse
  • VPA: the right-sizing complement
  • Karpenter: the modern alternative to CA
  • Cluster Autoscaler: the older node provisioner
  • KEDA: event-driven scaling

How they combine

A typical production setup uses three or all four of them, alongside PodDisruptionBudgets (PDBs):

                ┌──────────────────────────────────────────────────┐
                │                     CLUSTER                      │
                │                                                  │
                │  ┌──────────────────────────────────────────┐    │
                │  │  Karpenter / Cluster Autoscaler          │    │
                │  │  Watches for Pending Pods, adds nodes    │    │
                │  └──────────────────────────────────────────┘    │
                │                    │ adds nodes                  │
                │  ┌─────────┐  ┌─────────┐  ┌─────────┐           │
                │  │ kubelet │  │ kubelet │  │ kubelet │  ...      │
                │  └────┬────┘  └────┬────┘  └────┬────┘           │
                │       │            │            │                │
                │  ┌────▼────┐  ┌────▼────┐  ┌────▼────┐           │
                │  │ Pod x N │  │ Pod x N │  │ Pod x N │  ...      │
                │  │ (HPA    │  │         │  │         │           │
                │  │  decides│  │         │  │         │           │
                │  │  N)     │  │         │  │         │           │
                │  └─────────┘  └─────────┘  └─────────┘           │
                │                                                  │
                │  VPA tunes requests (right-sizing)               │
                │  KEDA drives HPA from external sources           │
                │  PDB protects availability during drains         │
                └──────────────────────────────────────────────────┘

The standard pattern

  • HPA on CPU for stateless services (HTTP APIs, workers).
  • VPA in recommendation-only mode (updateMode: "Off") for right-sizing (you apply the recommendations manually).
  • Karpenter for node provisioning (replace CA on new clusters).
  • KEDA for event-driven workloads (Kafka, SQS, RabbitMQ).
  • PDB for availability during voluntary disruption.
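
To make "HPA on CPU" in the list above concrete, here is a minimal sketch of the replica calculation as the upstream HPA documentation describes it: the desired count is the current count scaled by the ratio of observed metric to target, rounded up, and skipped when the ratio is within a tolerance of 1. The function name, the default bounds, and the 0.1 tolerance default are assumptions for illustration; the real controller also handles unready Pods, missing metrics, and stabilization windows, none of which appear here.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Simplified HPA replica calculation (illustrative, not the controller)."""
    ratio = current_metric / target_metric
    # Close enough to the target: leave the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured minReplicas / maxReplicas bounds.
    return max(min_replicas, min(max_replicas, desired))


# 4 Pods averaging 90% CPU utilization against a 60% target.
print(desired_replicas(4, 90, 60))  # -> 6
```

Because the ratio is computed against the target, doubling the load roughly doubles the replica count in a single step, which is why the node autoscaler underneath has to keep up.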

What NOT to do

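  • HPA + VPA on the same metric. They fight: VPA changes the requests, which changes the utilization HPA computes, so each reacts to the other's adjustments. Use one on CPU, the other on memory.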
  • CA + Karpenter at the same time. They race for the same Pending Pods.
  • HPA + manual kubectl scale. Manual changes are overridden by HPA in seconds.
  • Tight PDBs with low replicas. minAvailable: 2 on a Deployment with 2 replicas allows zero voluntary evictions, so node drains block indefinitely.
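
The PDB item above comes down to simple arithmetic: the number of voluntary evictions a PDB permits is the count of healthy Pods minus minAvailable, floored at zero. A small sketch, with hypothetical function names, assuming an integer minAvailable rather than a percentage:

```python
def allowed_disruptions(healthy_pods, min_available):
    """Voluntary evictions a PDB with an integer minAvailable would permit."""
    return max(0, healthy_pods - min_available)


def drain_can_proceed(healthy_pods, min_available):
    """True if at least one Pod may be evicted right now."""
    return allowed_disruptions(healthy_pods, min_available) > 0


# The case from the list: 2 replicas, minAvailable: 2.
print(drain_can_proceed(2, 2))  # -> False
# One more replica leaves room for a single eviction at a time.
print(drain_can_proceed(3, 2))  # -> True
```

Either lowering minAvailable or running one extra replica leaves room for an eviction.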

When to use what (decision tree)

Stateless HTTP service, scale on CPU?
├── Yes → HPA on CPU + Karpenter
└── No, custom metric → HPA on custom + Karpenter
        (custom metric needs Prometheus Adapter or similar)

Stateless worker or consumer, scale on event (Kafka, SQS, RabbitMQ)?
└── KEDA on the queue metric + Karpenter

Stateful service, hard to add replicas (single DB)?
├── VPA in Auto mode on memory + manual replica count
└── OR: VPA in Off mode, apply recommendations manually

Stateful service, can add replicas (Cassandra, Kafka)?
└── HPA on CPU + careful state management (PDB matters here)

Batch / Job workloads?
└── Right-size the Job spec; no autoscaler

Dev / test environments?
└── KEDA on cron + scale to zero on idle

Multi-cluster?
└── Each cluster's autoscaler; cross-cluster is harder
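
The first four branches of the tree above can be written down as a small lookup function. This is only a sketch for illustration: the function name and its parameters are hypothetical, the returned strings are the tree's own recommendations, and the batch, dev/test, and multi-cluster branches are left out.

```python
def recommend(stateless, trigger="cpu", can_add_replicas=True):
    """Map a workload description to a recommendation from the decision tree.

    trigger is one of "cpu", "custom", or "event" and is only consulted
    for stateless workloads.
    """
    if trigger not in ("cpu", "custom", "event"):
        raise ValueError(f"unknown trigger: {trigger!r}")
    if stateless:
        if trigger == "event":
            return "KEDA on the queue metric + Karpenter"
        if trigger == "custom":
            return "HPA on custom + Karpenter"
        return "HPA on CPU + Karpenter"
    # Stateful workloads: the question is whether replicas can be added.
    if can_add_replicas:
        return "HPA on CPU + careful state management"
    return "VPA in Auto mode on memory + manual replica count"


print(recommend(stateless=True, trigger="event"))
print(recommend(stateless=False, can_add_replicas=False))
```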

The “production checklist” for autoscaling

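  • Set resource requests on every container. HPA computes utilization as a percentage of the request, so it cannot scale on CPU or memory utilization without them.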
  • Pick one autoscaler per metric. Don’t have HPA and VPA both touching CPU.
  • Use a PDB for any service with > 1 replica. Without it, a node drain can evict every replica at once.
  • Have a fallback for custom metrics. If Prometheus is down, HPA on custom can’t scale. Add HPA on CPU as a backup.
  • Monitor the autoscaler itself. Each autoscaler exposes Prometheus metrics — scrape them.
  • Test scale-up and scale-down in non-prod. A scale-up to 1000 Pods in 30s may break the apiserver, the CNI, the load balancer, etc. Tune the rate limits.
  • Document the autoscalers. The next operator will need to know which metric drives which Deployment.
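
For the rate-limit item in the checklist above, a simplified model shows how a scale-up policy bounds growth. The sketch assumes a policy that allows the larger of a percentage of current replicas or a fixed number of Pods per period, which is roughly how HPA's behavior.scaleUp policies combine under selectPolicy: Max. The defaults used here (100% or 4 Pods per 15-second period) are taken to match the upstream defaults, but treat them as an assumption and check your cluster version. The function name is hypothetical.

```python
def periods_to_reach(start, target, percent=100, pods=4):
    """Count rate-limit periods needed to grow from start to target replicas.

    Each period may add the larger of `percent` of the current replica
    count or `pods` Pods (a simplified model of a Max-select policy).
    """
    replicas, periods = start, 0
    while replicas < target:
        step = max(replicas * percent // 100, pods)
        if step <= 0:
            raise ValueError("policy allows no growth")
        replicas = min(target, replicas + step)
        periods += 1
    return periods


PERIOD_SECONDS = 15  # assumed period length
print(periods_to_reach(10, 1000) * PERIOD_SECONDS, "seconds")  # -> 105 seconds
```

Under these assumptions, going from 10 to 1000 replicas takes 7 periods, so a tighter policy (a lower percentage or a longer period) is the knob for protecting the apiserver, the CNI, and the load balancer from a sudden burst.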

Where to go next