Cost Optimization

K8s clusters are easy to over-spend on. The default behavior is to provision conservatively (lots of headroom, big nodes, no spot), and bills grow linearly with the number of services. The good news: a few well-placed levers can cut your bill 50-70% without changing the workload.

Where the money goes

Typical k8s cluster cost breakdown:
┌──────────────────────────────────────────────────────────┐
│                                                          │
│  60-80%   Compute (EC2/EKS nodes, GKE nodes, AKS VMs)    │
│  10-20%   Control plane (EKS, GKE, AKS)                  │
│   5-15%   Storage (EBS, persistent disks, object storage)│
│   2-5%    Egress (cross-AZ, cross-region, internet)      │
│   1-5%    Load balancers (NLB, ALB)                      │
│   <1%     Other (API calls, CloudWatch, etc.)            │
│                                                          │
└──────────────────────────────────────────────────────────┘

The bulk of the bill is compute. Storage is a distant second. Egress can sneak up on you if you have a lot of cross-AZ traffic.

The cost optimization levers

In order of impact, easiest first:

1. Right-size pod requests (5-30% savings)

Most clusters have over-provisioned pods. Common patterns:

1GB memory request for a pod that uses 80Mi
1 CPU request for a pod that uses 50m
Memory requests copied from the limit, not the actual need

How to find: deploy VPA in Off mode, look at recommendations.

kubectl get vpa -A
# NAME      REFERENCE         MODE   CPU    MEM    PROVIDED
# web-vpa   Deployment/web    Off    80m    180Mi  1       1Gi
# HPA says "use 80m CPU, 180Mi memory"
# but you set requests to 1 CPU, 1Gi
# that's 12x overprovisioned on CPU, 5x on memory

Fix: update the Deployment’s resources.requests to match actual usage.

resources:
  requests:
    cpu: 100m       # was 1
    memory: 256Mi   # was 1Gi
  limits:
    cpu: 500m       # was 2
    memory: 512Mi   # was 2Gi

Why it matters: the scheduler uses requests to bin-pack pods onto nodes. If requests are too high, fewer pods fit per node, you need more nodes, more cost.

2. Use spot / preemptible instances (60-90% on those nodes)

Spot instances are 60-90% cheaper than on-demand. The catch: they can be reclaimed with 30s-2min notice.

Where spot works:

Stateless services that can be rescheduled quickly
Batch jobs (with checkpointing)
Dev/staging environments
Worker nodes behind a service mesh (rescheduling is fast)

Where spot doesn’t work:

Stateful workloads with PVCs (re-attaching a volume takes minutes)
Latency-sensitive services that can’t tolerate brief capacity loss
Gossip-based systems (Consul, etcd — except as a quorum with on-demand)

Karpenter + spot is the modern approach:

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: spot
spec:
  template:
    spec:
      requirements:
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["spot"]
      - key: karpenter.k8s.aws/instance-category
        operator: In
        values: ["c", "m", "r"]
      nodeClassRef:
        name: default
  limits:
    cpu: "500"
  disruption:
    consolidationPolicy: WhenUnderutilized
    # spot nodes can be reclaimed; Karpenter handles the disruption

For mixed workloads, run critical pods on on-demand nodes, best-effort on spot. Use nodeSelectors, taints/tolerations, or topology spread.

3. Right-size cluster nodes (10-30%)

Many clusters use a single instance type (e.g., all m5.large). That bin-packs poorly — a pod needing 1.5GB memory won’t fit, even if the node has 1GB CPU free.

Karpenter solves this by right-sizing nodes per pod:

Pod asks for 1 CPU, 1Gi memory
    ↓
Karpenter picks the smallest instance that fits
    ↓
c6g.large (2 CPU, 4Gi) — 1 pod, 50% utilized

vs. CA / node groups:

All nodes are m5.large (2 CPU, 8Gi)
    ↓
Pod asks for 1 CPU, 1Gi
    ↓
Lands on m5.large, 50% utilized
    ↓
If 10 pods land here, you're at 100% memory — needs another node
    ↓
But if pods are smaller, the node is wasted

Instance family selection matters. Some instances are more cost-effective for certain workloads:

Compute-optimized (c5, c6i, c7g) — CPU-bound, batch, video
Memory-optimized (r5, r6i, x2) — caches, in-memory DBs
General (m5, m6i) — web servers, APIs
ARM (Graviton — c7g, m7g, r7g) — 20-40% cheaper per core, requires ARM-compatible images

4. Scale down off-hours (20-50% for non-prod)

Dev/staging clusters don’t need to run 24/7.

Pattern: scale to zero at night

# in CI / a CronJob
# scale all Deployments to 0 at 7pm
kubectl scale deploy --all -n dev --replicas=0
# scale back up at 8am
kubectl scale deploy --all -n dev --replicas=1

Pattern: scale down cluster nodes (Karpenter can do this automatically)

disruption:
  consolidationPolicy: WhenUnderutilized
  expireAfter: 24h

Pattern: separate dev and prod clusters. Scale dev aggressively, leave prod stable.

5. Use committed-use / savings plans (20-40% on steady state)

AWS Compute Savings Plans, GCP Committed Use Discounts, Azure Reserved Instances. 1-year or 3-year commitments for 20-40% off on-demand prices.

Strategy: commit to your steady-state baseline. Use spot for burst.

# EKS example
# Baseline: 20 m5.large always running = $3,200/mo at on-demand
# 1-year Compute Savings Plan: $2,100/mo (35% off)
# Burst: 50 c5.xlarge spot for peak = $1,200/mo (vs $4,000 on-demand)

6. Use cluster autoscaler / Karpenter aggressively (10-20%)

If your cluster doesn’t scale to zero or scale down aggressively, you’re paying for idle capacity.

Karpenter’s consolidationPolicy: WhenUnderutilized continuously rebalances pods to fewer, better-fit nodes. This is one of Karpenter’s biggest cost wins vs CA.

7. Storage cost (5-20%)

Persistent disks are billed by size and type. Optimization:

Don’t over-provision PVCs. Many teams ask for 100Gi when they need 10Gi.
Use the right storage class. gp3 is cheaper than io1. Standard disks are cheaper than SSD.
Lifecycle old data. Move backups to S3 (or equivalent), not EBS.
Use snapshots, not cloned volumes. Snapshots are incremental and cheap.

# bad: io2 with 1000 IOPS
storageClassName: io2
resources:
  requests:
    storage: 1Ti
 
# good: gp3 for most workloads
storageClassName: gp3
resources:
  requests:
    storage: 100Gi

8. Egress cost (varies wildly)

Cloud egress is the silent killer. Each cloud’s pricing differs:

Cloud	Egress cost
AWS	$0.09/ GBt o in t er n e t,$ 0.01/GB cross-AZ, free in-region
GCP	$0.12/ GBt o in t er n e t,$ 0.01-0.08/GB cross-region
Azure	$0.087/ GBt o in t er n e t,$ 0.01/GB cross-AZ

Common egress bombs:

Cross-AZ traffic. A pod in zone A talking to a Service that routes to a pod in zone B. Multiply by RPS, you can spend thousands.
```
# check cross-AZ traffic
# (AWS: VPC Flow Logs; GCP: VPC Flow Logs; or your CNI's metrics)
# if high, fix with topology-aware routing or affinity
```
Fix: pod topology spread constraints, or use a service mesh with locality-aware load balancing.
Cross-region replication. S3 cross-region replication, RDS cross-region replicas, etc. — billed by data transferred. Fix: aggressive lifecycle policies, only replicate what you need.
Internet egress from nodes. Pods talking to external APIs (github.com, registry.npmjs.org, etc.). Fix: NAT gateway is cheaper than per-pod NAT, but still expensive. Use VPC endpoints for AWS services (free).

9. NetworkPolicy for unused namespaces (1-5%)

Namespaces that aren’t actively used still get default policies and possibly DNS traffic. If a dev namespace is left running, pods there can still consume resources.

Pattern: TTL on dev namespaces

# using kubedb or a custom controller
apiVersion: dev.example.com/v1
kind: DevNamespace
metadata:
  name: alice-experiment
spec:
  ttl: 7d  # auto-delete after 7 days

10. Image optimization (5-10%)

Smaller images = faster pulls, less storage on nodes.

# 1.2 GB
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y python3 python3-pip
COPY . .
RUN pip install -r requirements.txt
CMD ["python3", "server.py"]
 
# 80 MB
FROM python:3.12-slim
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["python3", "server.py"]
 
# 30 MB
FROM python:3.12-alpine
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python3", "server.py"]
 
# 20 MB with multi-stage
FROM python:3.12-alpine AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --user -r requirements.txt
COPY . .
 
FROM python:3.12-alpine
WORKDIR /app
COPY --from=builder /root/.local /root/.local
COPY . .
ENV PATH=/root/.local/bin:$PATH
CMD ["python3", "server.py"]

The 80/20 — what to do first

For a typical cluster, the biggest wins are:

Right-size pod requests (VPA in Off mode) — easy, 5-30% savings
Spot instances for non-prod and stateless workloads — 60-90% on those nodes
Karpenter for node right-sizing — 10-30% on compute
Committed-use / savings plans for steady state — 20-40% on the committed portion
Scale down dev/staging off-hours — 20-50% on those clusters

Together: 50-70% reduction is realistic.

Tools

Kubecost (commercial, free tier)

The standard cost-monitoring tool for k8s. Shows:

Per-namespace, per-Deployment, per-Label cost
Right-sizing recommendations
Spot vs on-demand breakdown
Slack/Teams alerts on cost anomalies

# install
helm install kubecost cost-analyzer \
  --repo https://kubecost.github.io/cost-analyzer \
  --namespace kubecost --create-namespace

Web UI at localhost:9090 (port-forward).

OpenCost (CNCF, open source)

CNCF version of kubecost. Less polished, but free and open.

# install
kubectl apply -f https://opencost.github.io/opencost-install.yaml

Prometheus + Grafana integration.

Cloud-native tools

AWS Cost Explorer — per-resource, per-tag cost
GCP Billing Reports — BigQuery export for analysis
Azure Cost Management — per-subscription, per-resource group

Pair these with k8s labels: tag every namespace, Deployment, and Service with team, project, environment. Then bills roll up by label.

Spot instance management

AWS Spot Instance Advisor — interruption rates per instance type
Spotinst / Spot.io (now AWS Spot) — managed spot orchestration
Karpenter — built-in spot handling, no extra tool

Cost-aware cluster design

The “right-sized cluster” pattern

# baseline: 3 on-demand c6g.xlarge (steady state)
# burst: spot c6g.2xlarge, c6g.4xlarge, etc. (Karpenter-managed)
# all on ARM (Graviton) for 30% cost savings

The “fleet of small clusters” pattern

Instead of one big shared cluster, run many small clusters:

Per-environment: dev, staging, prod
Per-region: us-east-1, eu-west-1
Per-team: team-a-cluster, team-b-cluster

Pros: scale down to zero (dev), isolated blast radius, different node types per cluster. Cons: operational overhead, control plane cost per cluster ($73/mo each on EKS).

Rule of thumb: if the cluster runs 24/7 and serves production, one big cluster. If dev/staging, split it.

The “shared services” pattern

For multi-cluster setups, run shared services (logging, monitoring, ingress) in a dedicated cluster.

Control plane cluster — Argo CD, Vault, monitoring, logging
Workload clusters — your actual apps

Workload clusters can scale up/down without affecting the control plane.

Cost anomaly detection

Set up alerts for unexpected spikes:

# Prometheus alert
- alert: CostAnomaly
  expr: |
    sum(kube_pod_container_resource_requests{resource="cpu"}) 
    > 1.5 * avg_over_time(sum(kube_pod_container_resource_requests{resource="cpu"})[7d])
  for: 1h
  annotations:
    summary: "Cluster CPU requests 50% above 7-day average"

Kubecost has built-in anomaly detection. Cloud-native tools (AWS Cost Anomaly Detection, GCP) catch billing anomalies at the account level.

Capacity planning

For predictable workloads, model the cost:

Workload X needs:
  - 10 replicas
  - 500m CPU each = 5 cores total
  - 1Gi memory each = 10Gi total
  - always-on (no scaling down)
  - latency-sensitive (no spot)

Cost calculation:
  - On-demand: 3 c6g.xlarge (4 CPU, 8Gi each) = 12 cores, 24Gi for 10 pods
    = $0.076/hour * 3 * 730 = $166/mo
  - 1-year savings plan: $110/mo
  - 3-year savings plan: $85/mo

The “showback” vs “chargeback” question

Showback: tell teams what they’re spending. They get visibility, but no bill.

Chargeback: actually bill teams. Drives accountability but creates politics.

For most companies: showback first, chargeback later. Showback is a soft control; chargeback is a hard one. Once teams see their spend, they self-optimize.

# kubecost — per-namespace cost
kubectl get ns -o custom-columns=NAME:.metadata.name
# or via the kubecost UI:
# http://kubecost.monitoring:9090/allocation

Cost as a first-class SLO

Treat cost like any other SLO:

# Prometheus alerting rule
- alert: NamespaceCostAnomaly
  expr: |
    kubecost_namespace_cost
    > 1.5 * avg_over_time(kubecost_namespace_cost[7d])
  for: 1h
  annotations:
    summary: "Namespace {{ $labels.namespace }} spending 50% over 7-day average"
    runbook: "https://wiki.example.com/runbooks/cost-anomaly"

Tie cost anomalies to your SLO/SLA. The team that owns the namespace owns the bill.

The reserved vs on-demand vs spot decision tree

Q: Is the workload stateful with PVCs?
│
├── Yes  →  on-demand (or savings plan)
│          Spot risk = PVC re-attachment = data loss
│
└── No
    │
    Q: Is it latency-sensitive, <100ms p99?
    │
    ├── Yes  →  on-demand (predictable performance)
    │          Spot interruption = brief latency spike
    │
    └── No
        │
        Q: Can it tolerate 30s-2min interruption?
        │
        ├── Yes  →  spot (60-90% savings)
        │          Karpenter handles interruption
        │
        └── No   →  on-demand (or savings plan)