AWS Monitoring

AWS monitoring is built around CloudWatch — a centralized service for metrics, logs, alarms, and dashboards. CloudWatch provides visibility into how your applications and infrastructure are performing.

Service Map

Service | What It Does | When to Use
Metrics | Time-series data for AWS and custom resources | Every service emits metrics; query them with GetMetricData
Logs | Centralized log storage and management | Every application should stream its logs here
Alarms | Alerting based on metric thresholds | Alert when latency spikes or the error rate rises
Dashboards | Custom metric visualization | Build custom views for business and SRE metrics
Events (now Amazon EventBridge) | Event-driven automation via rules | React to AWS API events and schedules
Logs Insights | Query language for CloudWatch Logs | Debug production issues and search logs
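To make the Logs Insights row concrete, here is a minimal Python sketch that runs a query through boto3's `logs` client: `start_query` starts it, and `get_query_results` is polled until the status leaves `Scheduled`/`Running`. The query string, the helper name `run_insights_query`, and any log group you pass are hypothetical. The client is passed in as a parameter so the polling logic can be exercised with a stand-in object rather than a live AWS account.

```python
import time
from typing import Any

# Hypothetical Logs Insights query: the 20 most recent lines containing "ERROR".
ERROR_QUERY = """fields @timestamp, @message
| filter @message like /ERROR/
| sort @timestamp desc
| limit 20"""


def run_insights_query(client: Any, log_group: str, start: int, end: int,
                       query: str = ERROR_QUERY,
                       poll_seconds: float = 1.0) -> list[dict[str, str]]:
    """Start a Logs Insights query and poll until it finishes.

    client is a boto3 "logs" client (or any object with the same two methods);
    start and end are epoch seconds.
    """
    query_id = client.start_query(logGroupName=log_group, startTime=start,
                                  endTime=end, queryString=query)["queryId"]
    while True:
        resp = client.get_query_results(queryId=query_id)
        if resp["status"] not in ("Scheduled", "Running"):
            break
        time.sleep(poll_seconds)
    if resp["status"] != "Complete":
        raise RuntimeError(f"query ended with status {resp['status']}")
    # Each result row is a list of {"field": ..., "value": ...} cells; flatten each row to a dict.
    return [{cell["field"]: cell["value"] for cell in row} for row in resp["results"]]
```

With credentials configured, you would pass `boto3.client("logs")` and a time window such as the last hour in epoch seconds.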

Three Pillars of Observability

Metrics (What happened?)
  → CloudWatch Metrics (numerical time-series data)
  → How many requests? What's p99 latency?

Logs (Why did it happen?)
  → CloudWatch Logs (application and infrastructure logs)
  → What error occurred? What was the request trace?

Traces (How did it happen?)
  → AWS X-Ray (distributed tracing, not in this section)
  → Request flow through services

Core Concepts

Namespaces

Metrics are organized by namespace — each AWS service has its own namespace:

AWS/EC2          → CPUUtilization, NetworkIn, DiskWriteBytes
AWS/RDS          → DatabaseConnections, CPUUtilization, FreeStorageSpace
AWS/Lambda       → Invocations, Duration, Errors
MyApp/Production → Custom metrics you define

Dimensions

A dimension is a name/value pair attached to a metric. Together, the namespace, metric name, and full set of dimensions uniquely identify a metric. Common dimensions:

InstanceId          → CPU for a specific EC2 instance
InstanceType        → CPU across all instances of a type
ServiceName         → Latency for a specific microservice
AvailabilityZone    → Load balancer request count per AZ
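Here is a minimal sketch of how a namespace and a dimension combine when you publish a custom metric with boto3's `put_metric_data`. The namespace `MyApp/Production` and the `ServiceName` dimension come from the lists above. The metric name `Latency`, the service name `checkout`, and both helper functions are hypothetical. The payload builder is kept separate from the API call so it can be inspected without AWS credentials.

```python
from typing import Any


def build_latency_datum(service_name: str, latency_ms: float,
                        high_resolution: bool = False) -> dict[str, Any]:
    """Build one MetricData entry for PutMetricData (hypothetical helper)."""
    return {
        "MetricName": "Latency",  # hypothetical metric name
        # Namespace + metric name + this exact dimension set identify the metric.
        "Dimensions": [{"Name": "ServiceName", "Value": service_name}],
        "Value": latency_ms,
        "Unit": "Milliseconds",
        # 1 = high-resolution (1-second) custom metric; 60 = standard resolution.
        "StorageResolution": 1 if high_resolution else 60,
    }


def publish(data: list[dict[str, Any]], namespace: str = "MyApp/Production") -> None:
    """Send the data points to CloudWatch (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    boto3.client("cloudwatch").put_metric_data(Namespace=namespace, MetricData=data)
```

Calling `publish([build_latency_datum("checkout", 42.5)])` would send one data point. For custom metrics, CloudWatch treats each unique dimension combination as a separate metric, so a query for `Latency` without the `ServiceName` dimension would not return these data points.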

Resolution

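Resolution | Retention
High resolution (1 sec, custom metrics only) | 3 hours
Detailed (1 min) | 15 days
Basic (5 min) | 63 days
Hourly rollup (1 hour) | 455 days (15 months)

Data is not simply deleted at the end of each window: CloudWatch aggregates it into the next coarser period. For example, 1-minute data is retrievable at 1-minute resolution for 15 days, then only at 5-minute resolution until day 63, then at 1-hour resolution.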
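CloudWatch keeps data points with periods under 60 seconds for 3 hours, 60-second points for 15 days, 300-second points for 63 days, and 3,600-second points for 455 days, aggregating finer data into coarser periods as it ages. The sketch below encodes that schedule to answer a practical question: what is the finest period you can still request for data of a given age? The function name and the `storage_resolution` parameter are illustrative, not part of any AWS API.

```python
from datetime import timedelta
from typing import Optional

# (maximum age, finest period in seconds still available at that age),
# following CloudWatch's documented retention schedule.
RETENTION_SCHEDULE = [
    (timedelta(hours=3), 1),
    (timedelta(days=15), 60),
    (timedelta(days=63), 300),
    (timedelta(days=455), 3600),
]


def finest_available_period(age: timedelta, storage_resolution: int = 60) -> Optional[int]:
    """Return the smallest period (seconds) still queryable for data of this age.

    storage_resolution is the resolution the metric was published at:
    1 for high-resolution custom metrics, 60 for most AWS metrics,
    300 for EC2 basic monitoring. Returns None once the data has expired.
    """
    for max_age, period in RETENTION_SCHEDULE:
        if age <= max_age:
            # Data can never be finer than the resolution it was published at.
            return max(period, storage_resolution)
    return None
```

For example, a 1-minute metric from 30 days ago comes back as 300 (5-minute granularity), and a high-resolution metric from one hour ago still comes back as 1.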

Architecture: Unified Monitoring Stack

Application (EC2/ECS/Lambda)
  │  (emit metrics via CloudWatch Agent / StatsD / PutMetricData)
  ↓
CloudWatch Metrics (AWS service metrics + custom metrics)
  │
  ├→ CloudWatch Alarms → SNS → Email/PagerDuty
  │
  └→ CloudWatch Dashboards → SRE / Business view

Application Logs
  │  (via CloudWatch Agent / SDK)
  ↓
CloudWatch Logs
  │
  ├→ CloudWatch Logs Insights (query)
  ├→ CloudWatch Contributor Insights → Top contributors
  ├→ Subscription Filters → Lambda / Kinesis / Firehose
  └→ Export tasks or Firehose → Central logs bucket (S3)
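The Subscription Filters branch of the diagram can feed a Lambda function. CloudWatch Logs delivers the batch in `event["awslogs"]["data"]` as base64-encoded, gzip-compressed JSON with `messageType`, `logGroup`, `logStream`, and `logEvents` fields. The following sketch decodes that payload and picks out error lines. Filtering on the substring `ERROR` and printing the matches are hypothetical choices standing in for whatever routing a real handler would do.

```python
import base64
import gzip
import json
from typing import Any


def decode_subscription_event(event: dict[str, Any]) -> dict[str, Any]:
    """Decode the payload CloudWatch Logs hands to a Lambda subscription target."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(compressed))


def lambda_handler(event: dict[str, Any], context: Any) -> dict[str, int]:
    payload = decode_subscription_event(event)
    # CONTROL_MESSAGE payloads are connectivity checks from CloudWatch Logs, not log data.
    if payload.get("messageType") != "DATA_MESSAGE":
        return {"forwarded": 0}
    errors = [e for e in payload["logEvents"] if "ERROR" in e["message"]]
    for e in errors:
        # Hypothetical routing: print; a real handler might write to S3 or a queue.
        print(payload["logGroup"], e["timestamp"], e["message"])
    return {"forwarded": len(errors)}
```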

AWS Services Organized by Category

CloudWatch Core

  • Metrics: time-series data, GetMetricData API, custom metrics
  • Logs: log groups, log streams, CloudWatch Agent, retention
  • Alarms: metric alarms, composite alarms, anomaly detection
  • Dashboards: custom widgets, live charts, cross-service views
  • Events: CloudWatch Events rules (the older name for Amazon EventBridge, which uses the same API)
  • Logs Insights: log query language, saved queries, query results added to dashboards
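To show how an Events (EventBridge) rule decides whether to fire, here is a deliberately simplified, hypothetical re-implementation of pattern matching. It covers exact-value matching only; real EventBridge patterns also support content filters such as prefix, numeric, anything-but, and exists. The example pattern matches the EC2 "EC2 Instance State-change Notification" event when an instance stops or terminates.

```python
from typing import Any

# Rule pattern: fire when an EC2 instance stops or terminates.
PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["stopped", "terminated"]},
}


def matches(pattern: dict[str, Any], event: dict[str, Any]) -> bool:
    """Simplified EventBridge-style matching (exact values only).

    Every pattern key must be present in the event. A list in the pattern
    matches if the event value (or any element of an event list) is in it;
    a nested dict recurses into the corresponding event object.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        else:
            values = actual if isinstance(actual, list) else [actual]
            if not any(v in expected for v in values):
                return False
    return True
```

Fields the pattern does not mention (such as `instance-id`) are ignored, which is why one rule can match events from every instance in the account.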

Cross-Account Observability

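CloudWatch cross-account observability lets a designated monitoring account view, query, and alarm on the metrics, logs, and traces that linked source accounts share with it. Source accounts can be linked one at a time or enrolled across an entire AWS Organizations organization.

Monitoring Account (any designated account, not necessarily the management account)
  └── Observability Access Manager (OAM) sink
        → Aggregates metrics, logs, and traces from source accounts
        → Single pane of glass: cross-account dashboards, alarms, and Logs Insights queries

Source Account (for example, an Organizations member account)
  └── OAM link to the monitoring account's sink
        └── Shares only the telemetry types selected on the link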
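CloudWatch cross-account observability is configured through the Observability Access Manager (OAM) API: a sink in the monitoring account (with a sink policy that admits the source accounts) and a link from each source account. The sketch below builds the parameters for boto3's `oam` `create_link` call, run in a source account. The helper names are hypothetical, the sink ARN must come from your own monitoring account, and the sink and its policy are assumed to exist already.

```python
from typing import Any

# Telemetry types a source account can share over an OAM link.
METRICS = "AWS::CloudWatch::Metric"
LOGS = "AWS::Logs::LogGroup"
TRACES = "AWS::XRay::Trace"


def build_link_request(sink_arn: str, resource_types: list[str],
                       label_template: str = "$AccountName") -> dict[str, Any]:
    """Parameters for oam.create_link, called from the *source* account."""
    return {
        "SinkIdentifier": sink_arn,        # ARN of the sink in the monitoring account
        "ResourceTypes": resource_types,   # only these telemetry types are shared
        "LabelTemplate": label_template,   # how this account is labeled in the monitoring account
    }


def create_link(sink_arn: str) -> str:
    """Link this account to the sink (requires boto3 and source-account credentials)."""
    import boto3  # lazy import so the builder above runs without boto3

    resp = boto3.client("oam").create_link(
        **build_link_request(sink_arn, [METRICS, LOGS, TRACES]))
    return resp["Arn"]
```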


Nuggets & Gotchas

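  • CloudWatch's finest metric resolution is 1 second, and only custom metrics get it: most AWS services publish at 1-minute or 5-minute granularity (EC2 basic monitoring is 5 minutes, detailed monitoring is 1 minute). For 1-second granularity, publish a custom metric with the PutMetricData API and StorageResolution set to 1. There is no sub-second resolution, so aggregate sub-second measurements in your application (for example, as a statistic set) before publishing.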
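  • CloudWatch Logs is priced per GB ingested plus per GB-month stored, so logging everything is expensive: at $0.50/GB ingested and $0.03/GB-month stored (Standard log class, us-east-1 list prices; check current pricing for your Region), an application generating 1 GB of logs per day costs $15.00 for ingestion plus $0.90 for the first month of storage, or $15.90/month. Ingestion dominates, so the biggest savings come from logging less (log levels, sampling) and setting retention policies. Subscription filters can route logs to S3 (via Firehose) or Lambda for cheaper long-term storage, but by then the ingestion charge has already been paid.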
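  • Batch metric queries with GetMetricData: a single call accepts up to 500 metric queries, including metric math expressions evaluated by CloudWatch, replacing hundreds of GetMetricStatistics round trips. It is not cheaper per metric, though: GetMetricData is billed per metric requested and GetMetricStatistics per request, at the same per-1,000 rate, so fetching 500 metrics costs roughly the same either way. The gains are fewer calls against API rate limits and server-side metric math.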
  • The CloudWatch Agent can act as a StatsD server, so any application can emit custom metrics: with a statsd section enabled under metrics_collected in the agent's configuration, the agent listens on UDP port 8125 by default. An application that sends nginx.requests:100|c gets a custom metric in CloudWatch (in the CWAgent namespace unless you configure a different one).
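  • Alarm periods cannot go below 10 seconds: an alarm on a high-resolution metric can use a 10- or 30-second period (high-resolution alarms cost more), and any other period must be a multiple of 60 seconds. For detection faster than that, alert outside CloudWatch, for example in the application itself or in a third-party monitoring tool such as Datadog or Grafana.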
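The logging-cost arithmetic in the gotcha above can be sketched as a small function. The $0.03/GB-month storage price comes from the text. The $0.50/GB ingestion price is an assumption (Standard log class, us-east-1 list price), chosen because it reproduces the $15.90 figure. Storage is computed on raw volume; archived logs are compressed, so real storage charges are typically lower.

```python
INGEST_PER_GB = 0.50       # USD per GB ingested (assumed Standard-class, us-east-1 price)
STORE_PER_GB_MONTH = 0.03  # USD per GB-month stored


def monthly_log_cost(gb_per_day: float, days: int = 30, months_retained: int = 1) -> float:
    """Approximate one month's CloudWatch Logs bill, in USD.

    months_retained is how many months of log data are sitting in storage:
    1 means only this month's data; without a retention policy it grows every month.
    """
    monthly_gb = gb_per_day * days
    ingestion = monthly_gb * INGEST_PER_GB
    storage = monthly_gb * months_retained * STORE_PER_GB_MONTH
    return round(ingestion + storage, 2)
```

`monthly_log_cost(1)` gives the $15.90 from the gotcha. After a year with no retention policy, `monthly_log_cost(1, months_retained=12)` gives $25.80, which shows why retention settings matter even though ingestion remains the larger cost.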
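Here is a minimal sketch of the GetMetricData batching and metric math gotcha: one request computes an error-rate percentage for several Lambda functions. The raw `Errors` and `Invocations` series are marked `ReturnData: False` so that only the math result comes back. The function names you pass and both helper functions are hypothetical. The query builder is separated from the boto3 call so it can be checked without credentials.

```python
from datetime import datetime, timedelta, timezone
from typing import Any

MAX_QUERIES = 500  # GetMetricData limit on MetricDataQuery entries per call


def lambda_error_rate_queries(function_names: list[str], period: int = 60) -> list[dict[str, Any]]:
    """Build GetMetricData queries computing an error-rate percentage per Lambda function."""
    queries: list[dict[str, Any]] = []
    for i, name in enumerate(function_names):
        dims = [{"Name": "FunctionName", "Value": name}]
        for metric_name, query_id in (("Errors", f"e{i}"), ("Invocations", f"n{i}")):
            queries.append({
                "Id": query_id,  # Ids must start with a lowercase letter
                "MetricStat": {
                    "Metric": {"Namespace": "AWS/Lambda", "MetricName": metric_name,
                               "Dimensions": dims},
                    "Period": period,
                    "Stat": "Sum",
                },
                "ReturnData": False,  # inputs only; return just the computed rate
            })
        queries.append({
            "Id": f"rate{i}",
            "Expression": f"100 * e{i} / n{i}",  # metric math, evaluated by CloudWatch
            "Label": f"{name} error rate (%)",
        })
    if len(queries) > MAX_QUERIES:
        raise ValueError(f"{len(queries)} queries exceed the {MAX_QUERIES}-query limit")
    return queries


def fetch_error_rates(function_names: list[str]) -> list[dict[str, Any]]:
    """Run the queries over the last hour (requires boto3 and AWS credentials)."""
    import boto3  # lazy import so the builder above runs without boto3

    end = datetime.now(timezone.utc)
    resp = boto3.client("cloudwatch").get_metric_data(
        MetricDataQueries=lambda_error_rate_queries(function_names),
        StartTime=end - timedelta(hours=1),
        EndTime=end,
    )
    return resp["MetricDataResults"]  # NextToken pagination omitted for brevity
```

Each function uses three queries, so one call covers up to 166 functions. For long time ranges, follow `NextToken` in the response to page through the results.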
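On the application side, the StatsD gotcha comes down to sending a single UDP datagram. This sketch formats the counter line from the example and sends it to port 8125 on localhost, assuming a CloudWatch Agent with StatsD enabled is running on the same host. The helper names are hypothetical. Because UDP is fire-and-forget, the send succeeds even if nothing is listening, so a missing agent fails silently.

```python
import socket


def statsd_counter(name: str, value: int = 1) -> bytes:
    """Format a StatsD counter line, e.g. b"nginx.requests:100|c"."""
    return f"{name}:{value}|c".encode("ascii")


def send_counter(name: str, value: int = 1,
                 host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget UDP send to a local StatsD listener (8125 is the agent's default)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(statsd_counter(name, value), (host, port))
```

`send_counter("nginx.requests", 100)` emits exactly the `nginx.requests:100|c` line from the gotcha.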
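The alarm-period rule can be captured in a small validator: PutMetricAlarm accepts a period of 10, 30, or any multiple of 60 seconds, and 10 or 30 only makes sense on a high-resolution metric. The sketch below also builds `put_metric_alarm` parameters for a hypothetical high-resolution metric in the `MyApp/Production` namespace. The alarm naming scheme, threshold, and helper names are illustrative assumptions.

```python
from typing import Any


def valid_alarm_period(period: int) -> bool:
    """PutMetricAlarm accepts 10, 30, or any positive multiple of 60 seconds."""
    return period in (10, 30) or (period > 0 and period % 60 == 0)


def high_res_alarm_request(metric_name: str, threshold: float,
                           period: int = 10) -> dict[str, Any]:
    """Parameters for cloudwatch.put_metric_alarm on a hypothetical custom metric."""
    if not valid_alarm_period(period):
        raise ValueError(f"invalid alarm period: {period}s")
    return {
        "AlarmName": f"{metric_name}-high",  # hypothetical naming scheme
        "Namespace": "MyApp/Production",
        "MetricName": metric_name,
        "Statistic": "Average",
        "Period": period,            # 10 or 30 requires a high-resolution metric
        "EvaluationPeriods": 3,      # 3 consecutive breaching periods before ALARM
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
    }
```

With a 10-second period and three evaluation periods, the alarm needs about 30 seconds of sustained breach before it fires, which is the practical floor for CloudWatch-native alerting.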