Cost Anomaly Detection

Cost Anomaly Detection uses machine learning to detect unusual spend patterns in your AWS account. Unlike budgets (which alert you when you hit a threshold you’ve set), anomaly detection alerts you when something unexpected happens — a cost spike that wasn’t planned for.

It is available at no additional charge in the Billing and Cost Management console, alongside Cost Explorer, and is not limited to Business or Enterprise support customers.

How It Works

AWS trains ML models on your historical spend patterns across:

Service-level spend (EC2, S3, RDS, etc.)
Linked account spend
Region-level spend
Time-of-day and day-of-week patterns

The model learns what “normal” looks like for your account. When spend deviates significantly from the predicted pattern, it generates an anomaly alert.

Key concept: The model understands seasonality, growth trends, and known recurring events (monthly billing cycles, product launches). A spike that follows a normal pattern doesn’t generate an alert. A spike that’s unexpected for your account’s pattern does.
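To make the seasonality point concrete, here is a toy sketch in Python. It is not AWS's model, which is not public. It compares today's spend with the history for the same weekday and flags a large upward deviation. The function name, the z-score rule, and the threshold of 3.0 are assumptions chosen for illustration.

```python
from statistics import mean, stdev


def is_anomalous(history, today_cost, weekday, z_threshold=3.0):
    """Flag today_cost if it sits far above the baseline for this weekday.

    history is a list of (weekday, cost) pairs, with weekday 0 = Monday.
    This is a simple z-score illustration, not the AWS algorithm.
    """
    same_day = [cost for day, cost in history if day == weekday]
    if len(same_day) < 2:
        return False  # not enough history to form a baseline
    expected = mean(same_day)
    spread = stdev(same_day)
    if spread == 0:
        return today_cost > expected
    return (today_cost - expected) / spread > z_threshold


# Weekday spend hovers around $100, weekend spend around $40.
history = [(0, 100), (0, 102), (0, 98), (0, 101),
           (5, 40), (5, 42), (5, 38), (5, 41)]
print(is_anomalous(history, 100, weekday=0))  # a normal Monday
print(is_anomalous(history, 100, weekday=5))  # the same spend on a Saturday
```

The same $100 day is unremarkable on a Monday but stands out on a Saturday, which is the behavior the key concept describes.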

Anomaly Alerts

When an anomaly is detected, you receive:

Anomaly alert: Which service/account/region has unusual spend
Root cause estimate: AWS’s ML estimate of what drove the spike
Cost impact: How much the anomaly added above normal

Alert example:

Anomaly Detected: Amazon EC2
Estimated monthly impact: $2,400 above normal
Detected at: 2024-06-15 14:00 UTC
Root cause estimate: New EC2 instances launched in us-east-1
Account(s) affected: 123456789012 (production)
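The same details can be pulled programmatically. The sketch below assumes the boto3 Cost Explorer client and its get_anomalies call, and it assumes the response fields used here (DimensionValue, AnomalyStartDate, Impact.TotalImpact, RootCauses); check the current API reference before relying on them. The helper names are hypothetical, and pagination through NextPageToken is left out for brevity.

```python
ROOT_CAUSE_KEYS = ("Service", "Region", "LinkedAccount", "UsageType")


def summarize_anomaly(anomaly):
    """Turn one anomaly record into a short, alert-style text summary."""
    impact = anomaly.get("Impact", {})
    lines = [
        f"Anomaly detected: {anomaly.get('DimensionValue', 'unknown')}",
        f"Total impact so far: ${float(impact.get('TotalImpact', 0.0)):,.2f}",
        f"Started: {anomaly.get('AnomalyStartDate', 'unknown')}",
    ]
    for cause in anomaly.get("RootCauses", []):
        parts = [f"{key}={cause[key]}" for key in ROOT_CAUSE_KEYS if cause.get(key)]
        lines.append("Root cause estimate: " + ", ".join(parts))
    return "\n".join(lines)


def fetch_recent_anomalies(start_date, end_date):
    """Fetch anomalies between two YYYY-MM-DD dates (needs AWS credentials)."""
    import boto3  # imported lazily so the formatter works without boto3

    ce = boto3.client("ce")
    response = ce.get_anomalies(
        DateInterval={"StartDate": start_date, "EndDate": end_date}
    )
    return response["Anomalies"]
```

A typical use would be printing summarize_anomaly(a) for each record returned by fetch_recent_anomalies("2024-06-01", "2024-06-30").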

Alert Subscriptions

Subscribe to anomaly alerts via:

Email: Simple alert to a specific address
SNS: Integrates with Slack, Teams, PagerDuty
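A subscription can also be created through the API. The following is a minimal sketch assuming boto3's Cost Explorer client with its create_anomaly_monitor and create_anomaly_subscription calls. The monitor name, subscription name, and the $100 minimum impact are placeholders, not values from this page.

```python
def monitor_request(name="service-monitor"):
    """Arguments for a monitor that tracks spend per AWS service."""
    return {
        "AnomalyMonitor": {
            "MonitorName": name,
            "MonitorType": "DIMENSIONAL",
            "MonitorDimension": "SERVICE",
        }
    }


def subscription_request(monitor_arn, sns_topic_arn, min_impact_usd=100.0):
    """Arguments for an SNS subscription that alerts above a dollar impact."""
    return {
        "AnomalySubscription": {
            "SubscriptionName": "cost-alerts",
            "MonitorArnList": [monitor_arn],
            "Subscribers": [{"Type": "SNS", "Address": sns_topic_arn}],
            "Frequency": "IMMEDIATE",
            "ThresholdExpression": {
                "Dimensions": {
                    "Key": "ANOMALY_TOTAL_IMPACT_ABSOLUTE",
                    "MatchOptions": ["GREATER_THAN_OR_EQUAL"],
                    "Values": [str(min_impact_usd)],
                }
            },
        }
    }


def create_alerting(sns_topic_arn):
    """Create the monitor and its subscription (needs AWS credentials)."""
    import boto3  # imported lazily so the builders work without boto3

    ce = boto3.client("ce")
    monitor_arn = ce.create_anomaly_monitor(**monitor_request())["MonitorArn"]
    response = ce.create_anomaly_subscription(
        **subscription_request(monitor_arn, sns_topic_arn)
    )
    return response["SubscriptionArn"]
```

Keeping the request builders separate from the API call makes the payload easy to review or unit-test before anything is created in the account. For an email subscriber, the Type would be "EMAIL" and the Address an email address.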

Slack integration (via SNS → Lambda):

# Lambda triggered by SNS anomaly alert
# Formats the alert into a Slack message with:
# - Service and estimated impact
# - Account(s) affected
# - Root cause estimate
# - Link to Cost Explorer for investigation
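One possible shape for that Lambda is sketched below. It assumes the SNS message body is JSON with the fields accountId, impact.totalImpact, rootCauses, and anomalyDetailsLink, and that a Slack incoming-webhook URL is supplied in a hypothetical SLACK_WEBHOOK_URL environment variable. Verify the payload against a real alert before relying on these field names.

```python
import json
import os
import urllib.request


def format_slack_message(anomaly):
    """Build a Slack webhook payload from one anomaly alert."""
    impact = float(anomaly.get("impact", {}).get("totalImpact", 0.0))
    lines = [
        "Cost anomaly detected",
        f"Estimated impact: ${impact:,.2f}",
        f"Account: {anomaly.get('accountId', 'unknown')}",
    ]
    for cause in anomaly.get("rootCauses", []):
        service = cause.get("service", "unknown service")
        region = cause.get("region", "unknown region")
        lines.append(f"Root cause estimate: {service} in {region}")
    link = anomaly.get("anomalyDetailsLink")
    if link:
        lines.append(f"Investigate: {link}")
    return {"text": "\n".join(lines)}


def lambda_handler(event, context):
    """Entry point: forward each SNS record to the Slack webhook."""
    webhook_url = os.environ["SLACK_WEBHOOK_URL"]
    for record in event["Records"]:
        anomaly = json.loads(record["Sns"]["Message"])
        body = json.dumps(format_slack_message(anomaly)).encode("utf-8")
        request = urllib.request.Request(
            webhook_url,
            data=body,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request, timeout=5) as response:
            response.read()
    return {"status": "ok"}
```

Only the standard library is used, so the function can be deployed without packaging extra dependencies.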

ChatOps setup:

Cost Anomaly Detection → SNS Topic → Lambda → Slack #cost-alerts channel

Investigation Workflow

When an anomaly alert fires:

Check the alert details — which service, which account, estimated impact
Check recent changes — new deployments, new accounts, infrastructure changes
Look at Cost Explorer — drill into the specific service and account
Check CloudTrail — API calls that might indicate new resource creation
Identify the root cause — was it planned (product launch) or unplanned (misconfigured automation)?
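The Cost Explorer drill-down step can be scripted. This sketch assumes boto3's get_cost_and_usage call, filtered to one service and one linked account and grouped by usage type. The helper names are hypothetical, and pagination is omitted.

```python
from collections import defaultdict


def drilldown_request(service, account_id, start, end):
    """Build get_cost_and_usage arguments for one service in one account."""
    return {
        "TimePeriod": {"Start": start, "End": end},  # End date is exclusive
        "Granularity": "DAILY",
        "Metrics": ["UnblendedCost"],
        "Filter": {
            "And": [
                {"Dimensions": {"Key": "SERVICE", "Values": [service]}},
                {"Dimensions": {"Key": "LINKED_ACCOUNT", "Values": [account_id]}},
            ]
        },
        "GroupBy": [{"Type": "DIMENSION", "Key": "USAGE_TYPE"}],
    }


def top_usage_types(response, limit=5):
    """Sum cost per usage type across all days and return the largest."""
    totals = defaultdict(float)
    for day in response.get("ResultsByTime", []):
        for group in day.get("Groups", []):
            amount = group["Metrics"]["UnblendedCost"]["Amount"]
            totals[group["Keys"][0]] += float(amount)
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


def investigate(service, account_id, start, end):
    """Run the query and rank usage types (needs AWS credentials)."""
    import boto3  # imported lazily so the helpers work without boto3

    ce = boto3.client("ce")
    request = drilldown_request(service, account_id, start, end)
    return top_usage_types(ce.get_cost_and_usage(**request))
```

The usage type that dominates the ranking usually points at what to look for in CloudTrail, for example instance launches or a surge in data transfer.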

Common causes:

New environment spun up and left running
Auto Scaling scaling up unexpectedly
Lambda function hitting a cold start storm
EBS volume snapshot accumulation
Data transfer spike from cross-region replication
Reserved Instance not covering new workload

Anomaly Detection vs Budgets

	Cost Anomaly Detection	Cost Budgets
Trigger	Unexpected spend spike	Planned threshold
Latency	Runs about three times a day; can lag usage by up to 24 hours	Updated up to three times a day
What it detects	Unexpected changes	Over-budget situations
Action	Alert only	Alert + optional automation

Use both: Anomaly detection catches surprises. Budgets catch gradual over-consumption that might not trigger an anomaly.

Coverage and Responsiveness

Anomaly detection monitors:

All services across all linked accounts
Spend by account, region, and service
Anomalies down to the linked account level

What it doesn’t detect:

Gradual cost growth (e.g., storage growing 5% per month as data accumulates)
Small anomalies below the alert threshold you configure
Anomalies in very new accounts (not enough history for ML model)

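Alert thresholds: You control alert volume through each subscription's threshold, expressed as an absolute dollar impact, a percentage impact, or both, rather than through a low/medium/high sensitivity dial. A lower threshold means more alerts but also more noise from small deviations.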

Cost Impact Estimation

The ML model estimates the monthly cost impact of the anomaly at the time of detection. This is an estimate — the actual impact might be higher or lower depending on whether the anomaly continues.

Why it matters: A $500 anomaly alert might actually be a $5,000/month problem if you don’t catch it. Set up response procedures so that anomaly alerts get investigated same-day.
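The arithmetic behind that warning is a simple run-rate projection. The sketch below is an illustration under the assumption that the anomaly continues at its observed daily rate. It is not the formula AWS uses, and the function name is hypothetical.

```python
def run_rate_projection(impact_so_far_usd, days_observed, days_in_month=30):
    """Project the monthly cost if the anomaly continues at its current rate."""
    if days_observed <= 0:
        raise ValueError("days_observed must be positive")
    return impact_so_far_usd * days_in_month / days_observed


# $500 of excess spend accumulated over the first 3 days of an anomaly
print(run_rate_projection(500, 3))
```

With $500 of excess spend over 3 days, the projection is $5,000 for a 30-day month. If the anomaly stops on its own, the real impact stays close to the $500 already incurred.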

References

Homepage: https://aws.amazon.com/cost-management/aws-cost-anomaly-detection/
Documentation: https://docs.aws.amazon.com/cost-management/latest/userguide/getting-started_cost_anomaly_detection.html
Pricing: https://aws.amazon.com/cost-management/aws-cost-anomaly-detection/pricing/

Pricing Examples

Scenario 1: A devops team sets up Cost Anomaly Detection with a weekly alert to their cost-alerts Slack channel. In week 2, an alert fires for a $3,200 unexpected charge on an S3 bucket. Investigation reveals a new Lambda function was writing 50 GB/day of debug logs to S3. Fixing the Lambda logging reduces the bill by $3,200/month.

Scenario 2: A company with a multi-account setup enables Cost Anomaly Detection at the payer level with alert subscriptions to finance@company.com. The system detects a $12K anomaly in account-123 that is $4K above expected spend. The root cause: an engineer ran a Glue job that scanned a 50TB DynamoDB table. They implement S3 prefix isolation and query-level cost controls, preventing a $48K monthly recurrence.

Nuggets & Gotchas

Cost Anomaly Detection uses ML and needs history: AWS documentation cites roughly 10 days of historical usage for a newly used service before anomalies can be detected for it. New accounts or accounts with rapidly changing spend patterns may see delayed or inaccurate anomaly detection.
Alert thresholds are set per subscription: Each subscription carries a threshold expressed as an absolute dollar impact, a percentage impact, or a combination of the two joined with AND or OR. There is no 1-10 sensitivity scale. An alert is sent only for anomalies that satisfy the subscription's threshold expression.
Anomaly detection itself is not billed per anomaly or per account: AWS documents the feature as available at no additional charge. At large scale (hundreds of linked accounts), the cost to watch is in the surrounding pieces, such as SNS deliveries and Lambda invocations used for alert handling.
The cost impact estimate is forward-looking but bounded: The model estimates the monthly cost if the anomaly continues at the current rate. If the anomaly stops after 3 days (e.g., a one-time large data transfer), the actual impact is much lower than the estimate.
Cost Anomaly Detection doesn’t block actions: It only alerts. You need SNS-triggered Lambda automation or manual investigation to remediate. An alert without a response procedure is just noise.
