K8s clusters are easy to over-spend on. The default behavior is to provision conservatively (lots of headroom, big nodes, no spot), and bills grow linearly with the number of services. The good news: a few well-placed levers can cut your bill 50-70% without changing the workload.
Where the money goes
Typical k8s cluster cost breakdown:
┌──────────────────────────────────────────────────────────┐
│ │
│ 60-80% Compute (EC2/EKS nodes, GKE nodes, AKS VMs) │
│ 10-20% Control plane (EKS, GKE, AKS) │
│ 5-15% Storage (EBS, persistent disks, object storage)│
│ 2-5% Egress (cross-AZ, cross-region, internet) │
│ 1-5% Load balancers (NLB, ALB) │
│ <1% Other (API calls, CloudWatch, etc.) │
│ │
└──────────────────────────────────────────────────────────┘
The bulk of the bill is compute. Storage is a distant second. Egress can sneak up on you if you have a lot of cross-AZ traffic.
The cost optimization levers
In order of impact, easiest first:
1. Right-size pod requests (5-30% savings)
Most clusters have over-provisioned pods. Common patterns:
- 1GB memory request for a pod that uses 80Mi
- 1 CPU request for a pod that uses 50m
- Memory requests copied from the limit, not the actual need
How to find: deploy VPA in Off mode, look at recommendations.
kubectl get vpa -A
# NAME REFERENCE MODE CPU MEM PROVIDED
# web-vpa Deployment/web Off 80m 180Mi 1 1Gi
# HPA says "use 80m CPU, 180Mi memory"
# but you set requests to 1 CPU, 1Gi
# that's 12x overprovisioned on CPU, 5x on memoryFix: update the Deployment’s resources.requests to match actual usage.
resources:
requests:
cpu: 100m # was 1
memory: 256Mi # was 1Gi
limits:
cpu: 500m # was 2
memory: 512Mi # was 2GiWhy it matters: the scheduler uses requests to bin-pack pods onto nodes. If requests are too high, fewer pods fit per node, you need more nodes, more cost.
2. Use spot / preemptible instances (60-90% on those nodes)
Spot instances are 60-90% cheaper than on-demand. The catch: they can be reclaimed with 30s-2min notice.
Where spot works:
- Stateless services that can be rescheduled quickly
- Batch jobs (with checkpointing)
- Dev/staging environments
- Worker nodes behind a service mesh (rescheduling is fast)
Where spot doesn’t work:
- Stateful workloads with PVCs (re-attaching a volume takes minutes)
- Latency-sensitive services that can’t tolerate brief capacity loss
- Gossip-based systems (Consul, etcd — except as a quorum with on-demand)
Karpenter + spot is the modern approach:
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
name: spot
spec:
template:
spec:
requirements:
- key: karpenter.sh/capacity-type
operator: In
values: ["spot"]
- key: karpenter.k8s.aws/instance-category
operator: In
values: ["c", "m", "r"]
nodeClassRef:
name: default
limits:
cpu: "500"
disruption:
consolidationPolicy: WhenUnderutilized
# spot nodes can be reclaimed; Karpenter handles the disruptionFor mixed workloads, run critical pods on on-demand nodes, best-effort on spot. Use nodeSelectors, taints/tolerations, or topology spread.
3. Right-size cluster nodes (10-30%)
Many clusters use a single instance type (e.g., all m5.large). That bin-packs poorly — a pod needing 1.5GB memory won’t fit, even if the node has 1GB CPU free.
Karpenter solves this by right-sizing nodes per pod:
Pod asks for 1 CPU, 1Gi memory
↓
Karpenter picks the smallest instance that fits
↓
c6g.large (2 CPU, 4Gi) — 1 pod, 50% utilized
vs. CA / node groups:
All nodes are m5.large (2 CPU, 8Gi)
↓
Pod asks for 1 CPU, 1Gi
↓
Lands on m5.large, 50% utilized
↓
If 10 pods land here, you're at 100% memory — needs another node
↓
But if pods are smaller, the node is wasted
Instance family selection matters. Some instances are more cost-effective for certain workloads:
- Compute-optimized (c5, c6i, c7g) — CPU-bound, batch, video
- Memory-optimized (r5, r6i, x2) — caches, in-memory DBs
- General (m5, m6i) — web servers, APIs
- ARM (Graviton — c7g, m7g, r7g) — 20-40% cheaper per core, requires ARM-compatible images
4. Scale down off-hours (20-50% for non-prod)
Dev/staging clusters don’t need to run 24/7.
Pattern: scale to zero at night
# in CI / a CronJob
# scale all Deployments to 0 at 7pm
kubectl scale deploy --all -n dev --replicas=0
# scale back up at 8am
kubectl scale deploy --all -n dev --replicas=1Pattern: scale down cluster nodes (Karpenter can do this automatically)
disruption:
consolidationPolicy: WhenUnderutilized
expireAfter: 24hPattern: separate dev and prod clusters. Scale dev aggressively, leave prod stable.
5. Use committed-use / savings plans (20-40% on steady state)
AWS Compute Savings Plans, GCP Committed Use Discounts, Azure Reserved Instances. 1-year or 3-year commitments for 20-40% off on-demand prices.
Strategy: commit to your steady-state baseline. Use spot for burst.
# EKS example
# Baseline: 20 m5.large always running = $3,200/mo at on-demand
# 1-year Compute Savings Plan: $2,100/mo (35% off)
# Burst: 50 c5.xlarge spot for peak = $1,200/mo (vs $4,000 on-demand)6. Use cluster autoscaler / Karpenter aggressively (10-20%)
If your cluster doesn’t scale to zero or scale down aggressively, you’re paying for idle capacity.
Karpenter’s consolidationPolicy: WhenUnderutilized continuously rebalances pods to fewer, better-fit nodes. This is one of Karpenter’s biggest cost wins vs CA.
7. Storage cost (5-20%)
Persistent disks are billed by size and type. Optimization:
- Don’t over-provision PVCs. Many teams ask for 100Gi when they need 10Gi.
- Use the right storage class. gp3 is cheaper than io1. Standard disks are cheaper than SSD.
- Lifecycle old data. Move backups to S3 (or equivalent), not EBS.
- Use snapshots, not cloned volumes. Snapshots are incremental and cheap.
# bad: io2 with 1000 IOPS
storageClassName: io2
resources:
requests:
storage: 1Ti
# good: gp3 for most workloads
storageClassName: gp3
resources:
requests:
storage: 100Gi8. Egress cost (varies wildly)
Cloud egress is the silent killer. Each cloud’s pricing differs:
| Cloud | Egress cost |
|---|---|
| AWS | 0.01/GB cross-AZ, free in-region |
| GCP | 0.01-0.08/GB cross-region |
| Azure | 0.01/GB cross-AZ |
Common egress bombs:
-
Cross-AZ traffic. A pod in zone A talking to a Service that routes to a pod in zone B. Multiply by RPS, you can spend thousands.
# check cross-AZ traffic # (AWS: VPC Flow Logs; GCP: VPC Flow Logs; or your CNI's metrics) # if high, fix with topology-aware routing or affinityFix: pod topology spread constraints, or use a service mesh with locality-aware load balancing.
-
Cross-region replication. S3 cross-region replication, RDS cross-region replicas, etc. — billed by data transferred. Fix: aggressive lifecycle policies, only replicate what you need.
-
Internet egress from nodes. Pods talking to external APIs (github.com, registry.npmjs.org, etc.). Fix: NAT gateway is cheaper than per-pod NAT, but still expensive. Use VPC endpoints for AWS services (free).
9. NetworkPolicy for unused namespaces (1-5%)
Namespaces that aren’t actively used still get default policies and possibly DNS traffic. If a dev namespace is left running, pods there can still consume resources.
Pattern: TTL on dev namespaces
# using kubedb or a custom controller
apiVersion: dev.example.com/v1
kind: DevNamespace
metadata:
name: alice-experiment
spec:
ttl: 7d # auto-delete after 7 days10. Image optimization (5-10%)
Smaller images = faster pulls, less storage on nodes.
# 1.2 GB
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y python3 python3-pip
COPY . .
RUN pip install -r requirements.txt
CMD ["python3", "server.py"]
# 80 MB
FROM python:3.12-slim
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["python3", "server.py"]
# 30 MB
FROM python:3.12-alpine
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python3", "server.py"]
# 20 MB with multi-stage
FROM python:3.12-alpine AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --user -r requirements.txt
COPY . .
FROM python:3.12-alpine
WORKDIR /app
COPY --from=builder /root/.local /root/.local
COPY . .
ENV PATH=/root/.local/bin:$PATH
CMD ["python3", "server.py"]The 80/20 — what to do first
For a typical cluster, the biggest wins are:
- Right-size pod requests (VPA in
Offmode) — easy, 5-30% savings - Spot instances for non-prod and stateless workloads — 60-90% on those nodes
- Karpenter for node right-sizing — 10-30% on compute
- Committed-use / savings plans for steady state — 20-40% on the committed portion
- Scale down dev/staging off-hours — 20-50% on those clusters
Together: 50-70% reduction is realistic.
Tools
Kubecost (commercial, free tier)
The standard cost-monitoring tool for k8s. Shows:
- Per-namespace, per-Deployment, per-Label cost
- Right-sizing recommendations
- Spot vs on-demand breakdown
- Slack/Teams alerts on cost anomalies
# install
helm install kubecost cost-analyzer \
--repo https://kubecost.github.io/cost-analyzer \
--namespace kubecost --create-namespaceWeb UI at localhost:9090 (port-forward).
OpenCost (CNCF, open source)
CNCF version of kubecost. Less polished, but free and open.
# install
kubectl apply -f https://opencost.github.io/opencost-install.yamlPrometheus + Grafana integration.
Cloud-native tools
- AWS Cost Explorer — per-resource, per-tag cost
- GCP Billing Reports — BigQuery export for analysis
- Azure Cost Management — per-subscription, per-resource group
Pair these with k8s labels: tag every namespace, Deployment, and Service with team, project, environment. Then bills roll up by label.
Spot instance management
- AWS Spot Instance Advisor — interruption rates per instance type
- Spotinst / Spot.io (now AWS Spot) — managed spot orchestration
- Karpenter — built-in spot handling, no extra tool
Cost-aware cluster design
The “right-sized cluster” pattern
# baseline: 3 on-demand c6g.xlarge (steady state)
# burst: spot c6g.2xlarge, c6g.4xlarge, etc. (Karpenter-managed)
# all on ARM (Graviton) for 30% cost savingsThe “fleet of small clusters” pattern
Instead of one big shared cluster, run many small clusters:
- Per-environment: dev, staging, prod
- Per-region: us-east-1, eu-west-1
- Per-team: team-a-cluster, team-b-cluster
Pros: scale down to zero (dev), isolated blast radius, different node types per cluster. Cons: operational overhead, control plane cost per cluster ($73/mo each on EKS).
Rule of thumb: if the cluster runs 24/7 and serves production, one big cluster. If dev/staging, split it.
The “shared services” pattern
For multi-cluster setups, run shared services (logging, monitoring, ingress) in a dedicated cluster.
- Control plane cluster — Argo CD, Vault, monitoring, logging
- Workload clusters — your actual apps
Workload clusters can scale up/down without affecting the control plane.
Cost anomaly detection
Set up alerts for unexpected spikes:
# Prometheus alert
- alert: CostAnomaly
expr: |
sum(kube_pod_container_resource_requests{resource="cpu"})
> 1.5 * avg_over_time(sum(kube_pod_container_resource_requests{resource="cpu"})[7d])
for: 1h
annotations:
summary: "Cluster CPU requests 50% above 7-day average"Kubecost has built-in anomaly detection. Cloud-native tools (AWS Cost Anomaly Detection, GCP) catch billing anomalies at the account level.
Capacity planning
For predictable workloads, model the cost:
Workload X needs:
- 10 replicas
- 500m CPU each = 5 cores total
- 1Gi memory each = 10Gi total
- always-on (no scaling down)
- latency-sensitive (no spot)
Cost calculation:
- On-demand: 3 c6g.xlarge (4 CPU, 8Gi each) = 12 cores, 24Gi for 10 pods
= $0.076/hour * 3 * 730 = $166/mo
- 1-year savings plan: $110/mo
- 3-year savings plan: $85/mo
The “showback” vs “chargeback” question
Showback: tell teams what they’re spending. They get visibility, but no bill.
Chargeback: actually bill teams. Drives accountability but creates politics.
For most companies: showback first, chargeback later. Showback is a soft control; chargeback is a hard one. Once teams see their spend, they self-optimize.
# kubecost — per-namespace cost
kubectl get ns -o custom-columns=NAME:.metadata.name
# or via the kubecost UI:
# http://kubecost.monitoring:9090/allocationCost as a first-class SLO
Treat cost like any other SLO:
# Prometheus alerting rule
- alert: NamespaceCostAnomaly
expr: |
kubecost_namespace_cost
> 1.5 * avg_over_time(kubecost_namespace_cost[7d])
for: 1h
annotations:
summary: "Namespace {{ $labels.namespace }} spending 50% over 7-day average"
runbook: "https://wiki.example.com/runbooks/cost-anomaly"Tie cost anomalies to your SLO/SLA. The team that owns the namespace owns the bill.
The reserved vs on-demand vs spot decision tree
Q: Is the workload stateful with PVCs?
│
├── Yes → on-demand (or savings plan)
│ Spot risk = PVC re-attachment = data loss
│
└── No
│
Q: Is it latency-sensitive, <100ms p99?
│
├── Yes → on-demand (predictable performance)
│ Spot interruption = brief latency spike
│
└── No
│
Q: Can it tolerate 30s-2min interruption?
│
├── Yes → spot (60-90% savings)
│ Karpenter handles interruption
│
└── No → on-demand (or savings plan)
Cost optimization for specific workload types
Databases
- Right-size the instance type. Memory-optimized instances (r5, r7g) for in-memory DBs.
- Use Aurora / Cloud SQL / managed Postgres rather than self-managed.
- Tune the database itself. Slow queries waste CPU. Indexes, query plans.
- Connection pooling. PgBouncer, RDS Proxy. Avoids per-pod DB connection overhead.
- Read replicas for read-heavy workloads. Cheaper than scaling the primary.
Web servers
- ARM/Graviton for 30% savings (compatible workloads).
- Right-size requests (most common savings lever).
- CDN for static content. CloudFront, Cloudflare. Reduces egress and origin load.
- Caching layer (Redis, Memcached) for hot data.
Workers / batch
- Spot by default — workers are the spot-friendly workload.
- Pre-emptible VMs (GCP) or Spot VMs (Azure) for short jobs.
- Cluster autoscaler / Karpenter with mixed instance types.
- Run during off-hours if possible (scale to 0 at night).
ML/AI workloads
- Spot for training (interruptions OK, can resume from checkpoint).
- On-demand for inference (latency-sensitive).
- GPU sharing (MIG, time-slicing) for cost efficiency.
Common cost pitfalls
- Multi-AZ when you don’t need it. Multi-AZ adds 2x the data transfer cost for some setups.
- Cross-region replication for everything. Replicate only what you need to recover.
- Over-provisioned storage. Most teams ask for 100Gi when they need 10Gi.
- Snapshot sprawl. Snapshots pile up; lifecycle policies are essential.
- NAT gateway data processing. $0.045/GB processed by NAT. Use VPC endpoints for AWS services (free).
- Container image pull costs. With most clouds, pull is free, but egress from a registry to other regions isn’t.
- Logging everything at high verbosity. Ingest costs add up.
- Untagged resources. Cost allocation breaks. Tag everything.
- Long-lived dev clusters. Scale down or destroy.
- Egress to internet for AI/ML data. Move data with Snowball or DataSync.
The “show me the money” report
A monthly cost report should answer:
- What did we spend? Total, by team, by environment, by service.
- What changed? vs last month, with explanations.
- What’s the trend? Forecast for next quarter.
- What’s the saving opportunity? Items flagged by right-sizing, idle resources.
- Who owns the spend? Per-team accountability.
# kubecost has a built-in report
# /allocation?window=month&aggregation=namespace,team
# raw cost data
kubectl get ns -o json | jq '.items[].metadata.labels.team' | sort -u
# aws cost explorer
aws ce get-cost-and-usage \
--time-period Start=2024-01-01,End=2024-01-31 \
--granularity MONTHLY \
--metrics BlendedCost \
--group-by Type=DIMENSION,Key=LINKED_ACCOUNTA good cost report drives action. A bad one is just a number no one looks at.
Multi-tenant cost allocation
Tag everything:
apiVersion: v1
kind: Namespace
metadata:
name: team-a
labels:
team: team-a
cost-center: engineering
project: web-app
environment: production# kubecost
apiVersion: kubecost.com/v1alpha1
kind: AllocationTags
metadata:
name: default
spec:
tags:
- team
- cost-center
- project
- environmentNow your bill rolls up by team / project / environment.
The “cheap” gotchas
- Spot savings evaporate if your HPA can’t reschedule fast. If pods take 2 minutes to reschedule and your spot gets reclaimed every 5 minutes, you’re constantly restarting.
- Right-sizing too aggressively leads to OOMKills and CPU throttling. Profile before shrinking.
- Committed-use discounts lock you in. If you commit for 3 years and your workload shrinks, you still pay.
- Karpenter consolidation is disruptive. Always set PodDisruptionBudgets.
- Dev clusters scaled to zero still cost something. Cluster itself (control plane), persistent disks, etc.
- Egress is hard to predict. One misconfigured service can balloon your bill overnight.
- Storage costs don’t shrink with auto-scaling. PVCs persist even when pods are gone.
A worked example
Cluster: 50 nodes, mostly m5.2xlarge (8 CPU, 32Gi). 200 namespaces, mixed dev/staging/prod.
Bills:
- Compute: 50 * 8 * 5,600/mo
- Storage: 30 PVCs at 100Gi gp3 = $360/mo
- Egress: 2TB cross-AZ at 20/mo
- LB: 5 NLB = $90/mo
- Total: $6,070/mo
Optimization:
- Right-size pods. 30% of pods are over-provisioned by 2x. New node count: 35.
- Move dev to spot. 20 of 35 nodes are dev/staging, can be spot. New dev cost: 2,240).
- Karpenter for node right-sizing. Average node utilization goes from 30% to 65%. New node count: 25.
- 1-year Compute Savings Plan for steady-state prod (10 nodes). New prod compute: $1,440/mo.
- Drop idle namespaces. 5 namespaces with 0 Deployments still have over-provisioned limit ranges. Drop.
New bills:
- Compute (prod): $1,440/mo
- Compute (dev, spot): $560/mo
- Storage: $360/mo
- LB: $90/mo
- Total: $2,450/mo
Savings: 60%.
See also
- auto-scaling — HPA, VPA, Karpenter, KEDA
- performance-tuning — right-sizing
- high-availability — cost vs reliability tradeoffs
- backup-restore — storage costs