3.1 Operational Excellence
Observability
CloudWatch (Cheatsheet)
| Feature | Function | Use Case |
|---|---|---|
| Logs | Centralized logs | App logs, Lambda logs, VPC Flow Logs. |
| Metrics | Numerical data points | CPU%, Disk I/O, Custom Metrics. |
| Alarms | Triggers based on metrics | Auto Scaling, SNS Notifications. |
| Events (EventBridge) | Real-time stream of system events | EC2 State Change → Lambda. |
| Contributor Insights | High-cardinality analysis | “Who are the top 10 talkers in VPC Flow Logs?” |
| Settings | Configuration | Log retention policies (set per log group). |
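To make the "EC2 State Change → Lambda" row concrete, here is a minimal sketch of an EventBridge event pattern for EC2 state-change notifications. The `matches` helper is a hypothetical, heavily simplified stand-in for EventBridge's matching (exact values only) so the pattern can be tried locally; the instance ID is made up, and in a real setup the pattern would be attached to a rule whose target is the Lambda function.

```python
import json

# Event pattern: each key lists the values that are accepted for that field.
# "source" and "detail-type" are the values EC2 uses for state-change events.
EC2_STOPPED_PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["stopped", "terminated"]},
}


def matches(pattern, event):
    """Hypothetical local matcher: exact-value matching only.

    Real EventBridge patterns also support operators such as prefix and
    numeric matching, which this sketch deliberately ignores.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested object in the pattern: recurse into the event.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Assumed sample event; the instance ID is illustrative.
sample_event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "stopped"},
}

if __name__ == "__main__":
    print(json.dumps(EC2_STOPPED_PATTERN, indent=2))
    print(matches(EC2_STOPPED_PATTERN, sample_event))
```

Extra fields in the event (such as `instance-id`) do not prevent a match; only the fields named in the pattern are checked.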
AWS X-Ray
- Function: Distributed Tracing. Visualizes the “Service Map”.
- Integration: Works with EC2, ECS, Lambda, API Gateway, SNS, SQS.
- Annotations vs Metadata:
  - Annotations: Indexed, Searchable (e.g., UserID).
  - Metadata: Not indexed, Extra data.
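The annotation/metadata split can be illustrated with a rough sketch of an X-Ray segment document built by hand. The field names (`name`, `id`, `trace_id`, `start_time`, `end_time`, `annotations`, `metadata`) follow the segment document format; the helper names `new_trace_id` and `build_segment`, the service name, and the sample values are assumptions for illustration. In practice the X-Ray SDK produces these documents for you.

```python
import json
import os
import time


def new_trace_id(now=None):
    """Trace ID: version 1, 8 hex digits of epoch seconds, 24 random hex digits."""
    epoch = int(now if now is not None else time.time())
    return f"1-{epoch:08x}-{os.urandom(12).hex()}"


def build_segment(name, annotations, metadata, start, end):
    """Hypothetical helper that assembles a segment document as a dict."""
    # Annotations are indexed, so values are limited to simple scalars.
    for key, value in annotations.items():
        if not isinstance(value, (str, int, float, bool)):
            raise TypeError(f"annotation {key!r} must be a string, number, or boolean")
    return {
        "name": name,
        "id": os.urandom(8).hex(),       # 16 hex digits
        "trace_id": new_trace_id(start),
        "start_time": start,
        "end_time": end,
        "annotations": annotations,      # indexed: usable for searching traces
        "metadata": metadata,            # not indexed: arbitrary extra data
    }


if __name__ == "__main__":
    start = time.time()
    segment = build_segment(
        name="checkout-service",                      # assumed service name
        annotations={"user_id": "u-42"},              # small value you want to search on
        metadata={"cart": {"items": [1, 2, 3]}},      # larger debugging payload
        start=start,
        end=start + 0.25,
    )
    print(json.dumps(segment, indent=2))
```

A reasonable rule of thumb: put values you expect to filter on (user ID, tenant, order status) in annotations, and everything else in metadata.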
WARNING
Exam Gotcha: If you need to trace a request across multiple microservices and visualize latency bottlenecks, the answer is X-Ray. CloudWatch Logs is for text logs; X-Ray is for traces.
Automation
AWS Systems Manager (SSM)
- Automation Documents (runbooks): JSON/YAML documents that define a sequence of steps to run against AWS resources (e.g., “Restart instances”, “Create snapshot”).
- Patch Manager: Automate patching for EC2 and On-prem (Hybrid).
- Session Manager: Secure shell access without opening port 22 or managing keys. Logs to S3/CloudWatch.
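As a sketch of what an Automation document for the "Restart instances" example might look like, the snippet below assembles one as a Python dict and prints it as JSON. It assumes schema version 0.3 and the `aws:changeInstanceState` action; the function name `restart_runbook`, the step names, and the `InstanceId` parameter are illustrative choices, not taken from an AWS-managed document.

```python
import json


def restart_runbook():
    """Hypothetical runbook: stop one EC2 instance, then start it again."""
    return {
        "schemaVersion": "0.3",  # schema version used by Automation documents
        "description": "Stop and then start a single EC2 instance.",
        "parameters": {
            "InstanceId": {
                "type": "String",
                "description": "ID of the instance to restart.",
            }
        },
        "mainSteps": [
            {
                "name": "stopInstance",
                "action": "aws:changeInstanceState",
                # {{ InstanceId }} is substituted from the parameters at run time.
                "inputs": {"InstanceIds": ["{{ InstanceId }}"], "DesiredState": "stopped"},
            },
            {
                "name": "startInstance",
                "action": "aws:changeInstanceState",
                "inputs": {"InstanceIds": ["{{ InstanceId }}"], "DesiredState": "running"},
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(restart_runbook(), indent=2))
```

The resulting JSON could then be registered as a document of type `Automation` (for example with the SSM `create_document` API) and started with the instance ID as its only parameter.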
OpsCenter
- Function: Central location to view, investigate, and resolve OpsItems (operational issues) related to AWS resources.