Capacity Planning

Capacity planning answers: “Do we have enough resources to handle the expected load — now, and in the future?” It’s the bridge between business growth projections and infrastructure investment.

The Core Formula

For any resource, capacity is determined by:

Capacity = (Resource Amount) / (Resource Consumption per Unit of Work)

Example:

Fargate task: 1 vCPU, 2GB RAM
Avg request:50ms CPU, 128MB RAM working set
Max concurrent requests (CPU): 1000ms / 50ms = 20 concurrent per task
Max concurrent requests (Memory): 2048MB / 128MB = 16 concurrent per task

Bottleneck: Memory → 16 concurrent requests per task
If target: 1,000 concurrent users → need ceil(1000/16) = 63 tasks

The bottleneck is the resource that runs out first. Optimize the bottleneck, not everything.

Resource Dimensions

Compute (CPU)

Utilization target: 60-70% sustained (headroom for spikes)
Scaling trigger: > 70% sustained for 5+ minutes
Measurement: CPU steal (for cloud VMs), CPU credits (for burstable instances)

Memory

Working set — memory actively used (not total RSS)
OOM events — system kills process when memory exhausted
GC pressure — in managed languages (Java, Go), GC pauses increase with memory utilization

Storage

IOPS vs throughput — IOPS (random ops) vs throughput (sequential MB/s)
Disk queues — requests waiting for disk (indicator of saturation)
SSD vs HDD — SSD for random IOPS workloads, HDD for high-throughput sequential

Network

Bandwidth —饱和 at high fan-out (many services calling many others)
Connections — TCP connection limits (especially for connection-pooled protocols)
DNS query rate — often overlooked (many services resolve on every request)

Database Connections

Often the hidden bottleneck:

App servers: 50 instances × 100 connections each = 5,000 connections
DB max connections: 1,000
→ Gap: 4,000 connections short at peak

Solution: Connection pooling (PgBouncer, HikariCP) to multiplex many app connections onto fewer DB connections.

Forecasting

Linear Extrapolation

For predictable growth:

Current:10,000 RPS,30% CPU
Growth: 20% per quarter
Next quarter: 12,000 RPS
At 12,000 RPS, 30% CPU → CPU at 36% (still OK)
At 20,000 RPS, 30% CPU → CPU at 60% (need to scale)
So: scale before next quarter

Growth Curves

Not all growth is linear. Distinguish:

Linear — steady growth, predictable capacity needs
Step function — product launches, marketing campaigns cause sudden jumps
Exponential — viral growth, network effects (hardest to plan for)
Seasonal — daily peaks, monthly billing cycles, holiday spikes

Capacity Planning Process

1. Current state assessment
   → Measure actual resource consumption per component
   → Identify current bottlenecks

2. Growth projection
   → Business forecast (user growth, transaction growth)
   → Historical growth rate
   → Planned product changes (new features = new load patterns)

3. Headroom calculation
   → Current capacity × 1.3 (30% headroom minimum)
   → Factor in known upcoming events (product launch, peak season)

4. Gap analysis
   → Required capacity - Current capacity = Gap
   → Time to gap = timeline for procurement/deployment

5. Procurement and deployment
   → Lead time for new resources
   → Provision and test before you need them

Back-of-the-Envelope Calculations

Quick estimates for common scenarios:

API Server Capacity

Target: 10,000 RPS
Avg response time: 100ms
Concurrent requests at steady state: 10,000 × 0.1 = 1,000 concurrent
Each server handles: 500 concurrent (CPU-bound, not I/O bound)
Servers needed: ceil(1000/500) = 2 (use 4 for HA + headroom)

Database Capacity

Target: 5,000 writes/sec, 50,000 reads/sec
Write-heavy (Postgres):
  - Each write uses ~1ms CPU time
  - 8-core DB → ~8,000 writes/sec max (per instance)
  - Need:1 writer (can parallelize reads with replicas)

Read replicas:
  - Each replica handles ~2,000 reads/sec
  - Need: ceil(50,000/2,000) = 25 read replicas
  - Replication lag: ~100ms (acceptable for non-financial)

Cache Capacity

Working set: 10 million items
Avg item size: 2KB
Total working set: 20GB
Redis: 25GB allocated (some overhead, fragmentation)
Need: at least 25GB memory for working set

Cost Modeling

Every capacity decision has a cost dimension:

Cost Per User

Monthly infra cost: $50,000
Active users: 100,000
Cost per user per month: $0.50
Cost per user per year: $6.00
LTV: $500 → cost is 1.2% of LTV (healthy)

Cost Scaling Patterns

Scaling approach	Cost curve	Notes
Vertical (bigger instance)	Step function	Pay for idle capacity
Horizontal (more small instances)	Linear	Pay for what you use
Serverless (Lambda, Cloud Run)	Pay-per-use	Good for variable load
Reserved instances	30-60% savings	Commitment required

Right-Sizing

Most cloud workloads are over-provisioned by 2-4x. Regularly review:

Actual CPU utilization — if averaging 20%, you’re paying for 80% idle
Right-sizing recommendations — AWS Compute Optimizer, Azure Advisor
Scaling down — reduce instance sizes as load is characterized

Capacity and Performance Interaction

Capacity planning and performance are linked:

More capacity → lower latency (less queuing)
Better performance → more capacity (same hardware serves more)
Performance optimization → defer capacity purchase (cheaper than scaling)

The order of preference:

Optimize first — faster code, better caching, lower latency
Scale horizontally — add more machines
Scale vertically — bigger machines (last resort)

Monitoring for Capacity

Key signals that predict capacity exhaustion:

Signal	Threshold	Action
CPU > 70% sustained	Warning	Plan scale-up
CPU > 85%	Critical	Scale immediately
Memory > 80%	Warning	Investigate memory leak
Disk queue > 10	Warning	IO bottleneck
DB connections > 80% max	Warning	Connection pool or scale
P99 latency increasing	Any increase	Capacity constrained
Queue depth growing	Warning	Consumer lag

Common Capacity Planning Mistakes

No growth buffer — capacity plan for today, not tomorrow
Ignoring the data layer — scale app servers, forget the DB is the bottleneck
No connection pooling — finite DB connections, not scaled with app servers
Not measuring utilization — decisions based on gut, not data
Over-provisioning “to be safe” — wasted cost
Under-provisioning “to save money” — performance crises, emergency scaling
No cost monitoring — don’t discover over-provisioning in the bill

Back-of-the-Envelope Calculations — quick estimates
Scalability — scaling patterns
Performance — latency and throughput
Reliability — capacity for resilience