3.1 Operational Excellence

Observability

CloudWatch (Cheatsheet)

Feature | Function | Use Case
Logs | Centralized logs | App logs, Lambda logs, VPC Flow Logs.
Metrics | Numerical data points | CPU%, Disk I/O, Custom Metrics.
Alarms | Triggers based on metrics | Auto Scaling, SNS Notifications.
Events (EventBridge) | Real-time stream of system events | EC2 State Change -> Lambda.
Contributor Insights | High cardinality analysis | “Who are the top 10 talkers in VPC Flow Logs?”
Settings | Configuration | Retention policies.
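
To make the Alarms and Events rows concrete, here is a minimal sketch that builds the request data for a CPU alarm and an EC2 state-change rule. The function names, the alarm name, the rule name, the 80% threshold, and the example ARN are illustrative assumptions, not values from these notes; the dictionary keys follow the CloudWatch PutMetricAlarm parameters and the EventBridge event pattern format.

```python
import json


def build_cpu_alarm_params(instance_id: str, topic_arn: str) -> dict:
    """Keyword arguments for CloudWatch PutMetricAlarm (boto3: put_metric_alarm)."""
    return {
        "AlarmName": f"high-cpu-{instance_id}",   # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,                 # seconds per datapoint
        "EvaluationPeriods": 2,        # two consecutive breaching periods
        "Threshold": 80.0,             # assumed threshold, in percent
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],   # e.g. an SNS topic to notify
    }


def build_ec2_state_change_pattern(state: str) -> str:
    """EventBridge event pattern (JSON) matching EC2 instances entering `state`."""
    pattern = {
        "source": ["aws.ec2"],
        "detail-type": ["EC2 Instance State-change Notification"],
        "detail": {"state": [state]},
    }
    return json.dumps(pattern)


def apply_to_aws(alarm_params: dict, event_pattern: str) -> None:
    """Send both definitions to AWS. Requires boto3 and credentials; not called here."""
    import boto3  # imported lazily so the builders above run without it

    boto3.client("cloudwatch").put_metric_alarm(**alarm_params)
    # The rule still needs a target (for example a Lambda function) via put_targets.
    boto3.client("events").put_rule(
        Name="ec2-state-change",  # hypothetical rule name
        EventPattern=event_pattern,
    )


if __name__ == "__main__":
    params = build_cpu_alarm_params(
        "i-0123456789abcdef0",
        "arn:aws:sns:us-east-1:123456789012:ops-alerts",  # placeholder ARN
    )
    print(json.dumps(params, indent=2))
    print(build_ec2_state_change_pattern("stopped"))
```

The builders are kept separate from the AWS call so the request shapes can be inspected or unit-tested without credentials.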

AWS X-Ray

  • Function: Distributed Tracing. Visualizes the “Service Map”.
  • Integration: Works with EC2, ECS, Lambda, API Gateway, SNS, SQS.
  • Annotations vs Metadata:
    • Annotations: Indexed, Searchable (e.g., UserID).
    • Metadata: Not indexed, Extra data.
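
The difference between annotations and metadata is easiest to see in a raw segment document. The sketch below builds one with the standard library only; the service name, the `user_id` annotation key, the `debug` metadata key, and the helper names are hypothetical, while the field names and ID formats follow the X-Ray segment document layout.

```python
import json
import os
import time


def new_trace_id(start: float) -> str:
    """Trace ID: version 1, epoch seconds as 8 hex digits, then 24 random hex digits."""
    return f"1-{int(start):08x}-{os.urandom(12).hex()}"


def build_segment(name: str, user_id: str, debug_payload: dict,
                  start: float, end: float) -> dict:
    """Build a segment document that carries one annotation and one metadata entry."""
    return {
        "name": name,
        "id": os.urandom(8).hex(),          # 16 hex digit segment ID
        "trace_id": new_trace_id(start),
        "start_time": start,
        "end_time": end,
        # Indexed by X-Ray: usable in filter expressions when searching traces.
        "annotations": {"user_id": user_id},
        # Stored with the trace but not indexed: free-form extra data.
        "metadata": {"debug": debug_payload},
    }


if __name__ == "__main__":
    now = time.time()
    segment = build_segment("checkout-service", "u-123", {"cart_items": 3}, now, now + 0.25)
    print(json.dumps(segment, indent=2))
```

With a segment like this recorded, a filter expression such as `annotation.user_id = "u-123"` can find the trace, whereas nothing under `metadata` can be searched on.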

WARNING

Exam Gotcha: If you need to trace a request across multiple microservices and visualize latency bottlenecks, the answer is X-Ray. CloudWatch Logs is for text logs; X-Ray is for traces.

Automation

AWS Systems Manager (SSM)

  • Automation Documents (runbooks): JSON/YAML documents that define the steps of an action (e.g., “Restart instances”, “Create snapshot”).
  • Patch Manager: Automate patching for EC2 and On-prem (Hybrid).
  • Session Manager: Secure shell access without opening port 22 or managing SSH keys. Session activity can be logged to S3 or CloudWatch Logs.
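
As an illustration of what an Automation document contains, the following sketch assembles a small “restart instance” runbook as a Python dictionary and serializes it to JSON. The function name, step names, and description are assumptions made for this example; the `schemaVersion`, `parameters`, `mainSteps`, and `aws:changeInstanceState` fields follow the Automation document schema.

```python
import json


def build_restart_runbook() -> dict:
    """Return a minimal Automation runbook that stops and then starts one instance."""
    instance_ref = ["{{ InstanceId }}"]  # resolved by SSM from the runbook parameter
    return {
        "schemaVersion": "0.3",  # schema version used by Automation documents
        "description": "Stop and then start a single EC2 instance.",
        "parameters": {
            "InstanceId": {
                "type": "String",
                "description": "ID of the instance to restart.",
            },
        },
        "mainSteps": [
            {
                "name": "stopInstance",  # hypothetical step name
                "action": "aws:changeInstanceState",
                "inputs": {"InstanceIds": instance_ref, "DesiredState": "stopped"},
            },
            {
                "name": "startInstance",  # hypothetical step name
                "action": "aws:changeInstanceState",
                "inputs": {"InstanceIds": instance_ref, "DesiredState": "running"},
            },
        ],
    }


if __name__ == "__main__":
    # The JSON text is what would be supplied as the document content.
    print(json.dumps(build_restart_runbook(), indent=2))
```

The printed JSON could then be registered as a document of type Automation (for example through the SSM CreateDocument API) and executed against a specific instance ID; those calls need AWS credentials and are left out of the sketch.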

OpsCenter

  • Function: Central location to view, investigate, and resolve OpsItems (operational issues) related to AWS resources.