Scaling — The L06 Overview
A high-level overview of the scaling family in Kubernetes — HPA, VPA, Cluster Autoscaler, Karpenter, and KEDA. This is the hub: the deeper notes are linked below. If you’re looking for the “which autoscaler do I use” decision, this is the place to start.
The three scaling dimensions
When load on a Deployment changes, you can scale in three orthogonal ways:
| Dimension | What scales | Mechanism | Affects existing Pods? |
|---|---|---|---|
| Horizontal | Number of replicas (Pods) | Add / remove Pods | No (new Pods have new IPs) |
| Vertical | CPU / memory per Pod | Resize requests/limits | Yes (Pods restart) |
| Cluster | Number of nodes | Add / remove nodes | N/A |
The four autoscalers:
- HPA — Horizontal Pod Autoscaler
- VPA — Vertical Pod Autoscaler
- CA — Cluster Autoscaler (or Karpenter, the modern alternative)
- KEDA — Kubernetes Event-Driven Autoscaling (drives HPA, the only one that natively scales to zero from external sources)
The one-table comparison
| | HPA | VPA | CA | Karpenter | KEDA |
|---|---|---|---|---|---|
| What it scales | Replicas (Pods) | Pod resource requests/limits | Nodes (ASG / MIG / VMSS) | Nodes (dynamic instance types) | Drives HPA — scales replicas from external sources |
| Driven by | CPU / memory / custom / external | Historical usage | Pending Pods | Pending Pods | Kafka lag, queue depth, SQS, cron, Prometheus, 60+ others |
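| Restarts Pods | No (new Pods) | Yes (in Auto mode) | Not on scale-up; scale-down evicts Pods from drained nodes | Not on scale-up; consolidation evicts Pods from drained nodes | No (drives HPA) |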
| Best for | Stateless HTTP services | Stateful, single-replica | Stable, predictable workloads | Heterogeneous, fast-scaling | Event-driven, queue-based |
| Production-ready? | Yes | Beta (since k8s 1.9) | Yes | Yes (v1 GA) | Yes |
| Scales to zero? | Only with minReplicas: 0, which requires the alpha HPAScaleToZero feature gate and an Object or External metric | No | No | No | Yes (built-in) |
| Common pairing | CA / Karpenter | HPA on a different metric | HPA | HPA | HPA (it IS HPA’s metrics source) |
- HPA: the workhorse
- VPA: the right-sizing complement
- Karpenter: the modern alternative to CA
- Cluster Autoscaler: the older node provisioner
- KEDA: event-driven scaling
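To make the "Driven by" row concrete for HPA, here is a minimal sketch of the per-metric replica calculation described in the Kubernetes HPA documentation: desired replicas = ceil(current replicas × current value / target value), skipped when the ratio is within a tolerance (0.1 by default). The function names are hypothetical, and the sketch ignores Pod readiness, missing metrics, and stabilization windows.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric, tolerance=0.1):
    """Replica count an HPA would propose for a single metric (simplified)."""
    ratio = current_metric / target_metric
    # Inside the tolerance band the HPA leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


def clamp(replicas, min_replicas, max_replicas):
    """Apply the minReplicas / maxReplicas bounds from the HPA spec."""
    return max(min_replicas, min(replicas, max_replicas))


# 4 replicas at 90% average CPU against a 60% target: ratio is 1.5.
print(desired_replicas(4, 90, 60))
# A large spike is still capped by maxReplicas.
print(clamp(desired_replicas(4, 300, 60), 2, 10))
```

With several metrics configured, the HPA computes a proposal per metric and uses the largest one.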
How they combine
A typical production setup uses three of the four:
┌────────────────────────────────────────────────┐
│                    CLUSTER                     │
│                                                │
│  ┌──────────────────────────────────────────┐  │
│  │      Karpenter / Cluster Autoscaler      │  │
│  │   Watches for Pending Pods, adds nodes   │  │
│  └──────────────────────────────────────────┘  │
│                      │ adds nodes              │
│  ┌─────────┐   ┌─────────┐   ┌─────────┐       │
│  │ kubelet │   │ kubelet │   │ kubelet │  ...  │
│  └────┬────┘   └────┬────┘   └────┬────┘       │
│       │             │             │            │
│  ┌────▼────┐   ┌────▼────┐   ┌────▼────┐       │
│  │ Pod x N │   │ Pod x N │   │ Pod x N │  ...  │
│  │ (HPA    │   │ (HPA    │   │         │       │
│  │  decides│   │         │   │         │       │
│  │  N)     │   │         │   │         │       │
│  └─────────┘   └─────────┘   └─────────┘       │
│                                                │
│  VPA tunes requests (right-sizing)             │
│  KEDA drives HPA from external sources         │
│  PDB protects availability during drains       │
└────────────────────────────────────────────────┘
The standard pattern
- HPA on CPU for stateless services (HTTP APIs, workers).
- VPA in recommendation-only mode (updateMode: Off) for right-sizing (you apply the recommendations manually).
- Karpenter for node provisioning (replace CA on new clusters).
- KEDA for event-driven workloads (Kafka, SQS, RabbitMQ).
- PDB for availability during voluntary disruption.
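The first two items of the pattern can be sketched as manifests. This is a minimal example, assuming a Deployment named "web" and illustrative values (2 to 10 replicas, 70% CPU target); none of these come from the notes. It builds the objects as Python dicts using the autoscaling/v2 HPA and autoscaling.k8s.io/v1 VPA schemas and prints them as a JSON List, which should be usable with kubectl apply -f -.

```python
import json


def hpa_on_cpu(name, min_replicas=2, max_replicas=10, target_cpu_percent=70):
    """HPA that scales a Deployment on average CPU utilization."""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": name},
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [{
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    "target": {"type": "Utilization", "averageUtilization": target_cpu_percent},
                },
            }],
        },
    }


def vpa_recommend_only(name):
    """VPA that only publishes recommendations and never evicts Pods."""
    return {
        "apiVersion": "autoscaling.k8s.io/v1",
        "kind": "VerticalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            "targetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": name},
            "updatePolicy": {"updateMode": "Off"},
        },
    }


# "web" is a hypothetical Deployment name.
manifests = {
    "apiVersion": "v1",
    "kind": "List",
    "items": [hpa_on_cpu("web"), vpa_recommend_only("web")],
}
print(json.dumps(manifests, indent=2))
```

Because the VPA is in Off mode, it never changes requests on its own, so it does not conflict with the HPA scaling on CPU.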
What NOT to do
- HPA + VPA on the same metric. They fight. Use one on CPU, the other on memory.
- CA + Karpenter at the same time. They race for the same Pending Pods.
- HPA + manual kubectl scale. HPA overrides manual changes at its next sync (every 15 seconds by default).
- Tight PDBs with low replicas. minAvailable: 2 on a Deployment with 2 replicas allows zero voluntary evictions, so node drains block indefinitely.
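The PDB deadlock in the last item is plain arithmetic: with minAvailable, the number of voluntary evictions allowed at any moment is the healthy Pod count minus minAvailable, floored at zero. A small sketch with hypothetical function names:

```python
def allowed_disruptions(healthy_pods, min_available):
    """Voluntary evictions a PDB with minAvailable permits right now."""
    return max(0, healthy_pods - min_available)


def drain_can_proceed(healthy_pods, min_available):
    """A drain can evict a Pod of this workload only if one disruption is allowed."""
    return allowed_disruptions(healthy_pods, min_available) > 0


# 2 replicas with minAvailable: 2 leaves no room for an eviction.
print(allowed_disruptions(2, 2), drain_can_proceed(2, 2))
# 3 replicas with minAvailable: 2 lets a drain evict one Pod at a time.
print(allowed_disruptions(3, 2), drain_can_proceed(3, 2))
```

Either run one replica more than minAvailable, or express the budget as maxUnavailable: 1.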
When to use what (decision tree)
Stateless HTTP service, scale on CPU?
├── Yes → HPA on CPU + Karpenter
└── No, custom metric → HPA on custom + Karpenter
(custom metric needs Prometheus Adapter or similar)
Stateless HTTP service, scale on event (Kafka, SQS, RabbitMQ)?
└── KEDA on the queue metric + Karpenter
Stateful service, hard to add replicas (single DB)?
├── VPA in Auto mode on memory + manual replica count
└── OR: VPA in Off mode, apply recommendations manually
Stateful service, can add replicas (Cassandra, Kafka)?
└── HPA on CPU + careful state management (PDB matters here)
Batch / Job workloads?
└── Right-size the Job spec; no autoscaler
Dev / test environments?
└── KEDA on cron + scale to zero on idle
Multi-cluster?
└── Each cluster's autoscaler; cross-cluster is harder
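The tree above can also be read as a lookup function. This is only a sketch of the branches as written; the function name and the kind / trigger labels are hypothetical, and the multi-cluster branch is left out because it has no single recommendation.

```python
def recommend_autoscaling(kind, trigger="cpu", can_add_replicas=True):
    """Return the setup the decision tree suggests.

    kind: "stateless", "stateful", "batch", or "dev"
    trigger: "cpu", "custom", or "event" (only used for stateless services)
    can_add_replicas: only used for stateful services
    """
    if kind == "stateless":
        if trigger == "cpu":
            return "HPA on CPU + Karpenter"
        if trigger == "custom":
            return "HPA on custom metric (needs a metrics adapter) + Karpenter"
        if trigger == "event":
            return "KEDA on the queue metric + Karpenter"
        raise ValueError(f"unknown trigger: {trigger!r}")
    if kind == "stateful":
        if can_add_replicas:
            return "HPA on CPU + PDB"
        return "VPA (Auto on memory, or Off with manual apply) + manual replica count"
    if kind == "batch":
        return "Right-size the Job spec; no autoscaler"
    if kind == "dev":
        return "KEDA on cron + scale to zero on idle"
    raise ValueError(f"unknown kind: {kind!r}")


print(recommend_autoscaling("stateless", trigger="event"))
print(recommend_autoscaling("stateful", can_add_replicas=False))
```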
The “production checklist” for autoscaling
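- Set resource requests on every container. HPA computes utilization as a percentage of the request, and the scheduler and node autoscalers place Pods and size nodes from requests.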
- Pick one autoscaler per metric. Don’t have HPA and VPA both touching CPU.
- Use a PDB for any service with > 1 replica. Without it, drains are dangerous.
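- Have a fallback for custom metrics. If Prometheus is down, an HPA that relies only on a custom metric can’t scale. Add CPU as a second metric on the same HPA (not a second HPA on the same Deployment) so scale-up still works.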
- Monitor the autoscaler itself. Each autoscaler exposes Prometheus metrics — scrape them.
- Test scale-up and scale-down in non-prod. A scale-up to 1000 Pods in 30s may break the apiserver, the CNI, the load balancer, etc. Cap the scaling rate, for example with the HPA’s behavior policies.
- Document the autoscalers. The next operator will need to know which metric drives which Deployment.
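The first checklist item is easy to automate. A minimal sketch, assuming the Deployment is already loaded as a dict (for example from kubectl get deployment -o json); the function name and the sample Deployment are hypothetical:

```python
def containers_missing_requests(deployment):
    """Names of containers that lack a CPU or memory request."""
    containers = deployment["spec"]["template"]["spec"]["containers"]
    missing = []
    for container in containers:
        requests = container.get("resources", {}).get("requests", {})
        if "cpu" not in requests or "memory" not in requests:
            missing.append(container["name"])
    return missing


# Hypothetical Deployment: the sidecar has no requests at all.
example = {
    "spec": {"template": {"spec": {"containers": [
        {"name": "app", "resources": {"requests": {"cpu": "250m", "memory": "256Mi"}}},
        {"name": "sidecar"},
    ]}}}
}
print(containers_missing_requests(example))
```

An HPA on CPU utilization cannot compute a percentage for Pods whose containers have no CPU request, so a check like this is worth running before enabling autoscaling.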
Where to go next
- HPA — the full deep dive
- VPA — the vertical counterpart
- Karpenter — node provisioning
- Cluster Autoscaler — the older node provisioner
- KEDA — event-driven scaling
- PDB — availability during disruption
- Resource Requests & Limits — the foundation