CloudWatch Logs Insights
CloudWatch Logs Insights (CLI: aws insights) is a query language for analyzing CloudWatch Logs. It supports structured field extraction, aggregation, filtering, and visualization.
Core Concepts
Query Syntax
Logs Insights uses a SQL-like query language:
fields @timestamp, @message, level
| filter level = "ERROR"
| sort @timestamp desc
| limit 20Query Structure
fields → Extract specific fields
| filter → Filter log lines by condition
| sort → Order by field
| limit → Limit results
| stats → Aggregate (count, avg, sum, etc.)
| display → Rename fields for output
Log Format Example (JSON)
{"timestamp":"2024-01-15T10:30:00Z","level":"ERROR","service":"checkout","message":"Database connection failed","error":"Connection refused","duration_ms":5000}Query:
fields @timestamp, service, message, duration_ms
| filter level = "ERROR"
| sort duration_ms desc
| limit 10Field Reference
| Field | Description |
|---|---|
@timestamp | When the log event was ingested |
@ingestionTime | When CloudWatch received the log |
@message | The raw log message (string) |
@log | Full log group/stream path |
@logStream | The log stream name |
Query Commands
fields — Extract Fields
fields @timestamp, @message, level, servicefilter — Filter by Condition
fields @timestamp, @message
| filter level = "ERROR"
| filter service = "checkout"Filter operators:
=!=>=<=><likenot like(regex)containsnot containsandornot
fields @timestamp, @message
| filter level = "ERROR" and (service = "checkout" or service = "payment")stats — Aggregation
fields @timestamp, service, latency_ms
| filter service = "api"
| stats avg(latency_ms) as avg_latency,
max(latency_ms) as max_latency,
count() as request_count
by service
| sort avg_latency descAggregation functions:
count()— number of log linesavg()min()max()sum()percentile()distinct()sort asc/desc
sort and limit
fields @timestamp, @message
| filter level = "ERROR"
| sort @timestamp desc
| limit 50Common Query Patterns
Error Rate Over Time
fields @timestamp, level
| filter level = "ERROR"
| stats count() as error_count
by bin(5m)
| sort bin(5m) ascHTTP Request Latency Percentiles
fields @message
| filter @message like /GET|POST|PUT|DELETE/
| parse @message "* [*] *" as method, path, status
| stats avg(latency_ms) as p50,
percentile(latency_ms, 95) as p95,
percentile(latency_ms, 99) as p99
by serviceDatabase Query Analysis
fields @timestamp, query, duration_ms
| filter type = "db.query"
| stats avg(duration_ms) as avg_query_time,
max(duration_ms) as max_query_time,
count() as query_count
by query
| sort avg_query_time desc
| limit 20Failed Logins
fields @timestamp, src_ip, user, result
| filter event = "login" and result = "FAILED"
| stats count() as failed_attempts
by src_ip, user
| sort failed_attempts desc
| limit 10Parse JSON Logs
fields @message
| filter @message like /\{.*\}/
| parse @message '{"timestamp":"*","level":"*","service":"*","message":"*"}' as ts, level, service, message
| filter level = "ERROR"
| sort @timestamp descSaved Queries
Save frequently-used queries for reuse:
aws insights save-query \
--name "errors-last-hour" \
--query-string "fields @timestamp, @message | filter level = \"ERROR\" | sort @timestamp desc | limit 50" \
--log-group-name /aws/lambda/my-functionRun a saved query:
aws insights start-query \
--log-group-name /aws/lambda/my-function \
--start-time 1705312800 \
--end-time 1705316400 \
--query-string "fields @timestamp, @message | filter level = \"ERROR\" | sort @timestamp desc"Contribution Insights
Contributor Insights identifies top contributors to a metric:
fields src_ip
| filter status = 500
| sort count() desc
| limit 10Creating a Contributor Insight Rule
aws insights put-insight-rules \
--insight-rules '[{
"Name": "top-500-errors",
"Schema": {
"Name": "CloudWatchLogInsightsRule",
"MatchLogic": "And",
"Constraints": [{
"Dimensions": ["src_ip", "uri"],
"Type": "string"
}]
},
"FilterPattern": "{\"status\": [500]}",
"ReleaseBehind": "LatestRelease"
}]'Integration with Dashboards
Add Logs Insights queries to CloudWatch Dashboards:
{
"type": "logInsights",
"x": 0, "y": 0, "width": 24, "height": 9,
"properties": {
"title": "Error Rate",
"logGroupNames": ["/aws/lambda/checkout", "/aws/lambda/payment"],
"queries": [
{
"text": "fields @timestamp, level | filter level = \"ERROR\" | stats count() by bin(5m)",
"name": "Error Rate",
"visible": true
}
]
}
}Visualization
Insights results can be displayed as:
- Table — Default, rows and columns
- Line chart — Time series (when using
bin()) - Bar chart — Aggregation by category
- Pie chart — Percentage breakdown
Limits
| Resource | Limit |
|---|---|
| Query runtime | 60 minutes (async) |
| Results returned | 1,000 lines (sync), 10,000 (async) |
| Queries per account (concurrent) | 10 |
| Saved queries per account | 500 |
References
- Homepage: https://aws.amazon.com/cloudwatch/
- Documentation: https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/
- Pricing: https://aws.amazon.com/cloudwatch/pricing/
Pricing Examples
Scenario 1: A developer running 20 Logs Insights queries per day to debug production issues. 20 × 30 = 600 queries/month. Logs Insights: 3/month. Without Logs Insights, debugging would require SSH into instances and running grep — 30 minutes/day × 1,500/month in engineering time.
Scenario 2: A security team using Contributor Insights to identify top IPs hitting 500 errors. One Contributor Insight rule running continuously. 0.10/million = 10,000 in downtime costs.
Nuggets & Gotchas
- Logs Insights queries are not real-time — they query historical data: A query on the last 5 minutes of logs has a latency of 10-30 seconds. For real-time log analysis, use CloudWatch Contributor Insights or a third-party tool.
- The
parsecommand is powerful but slow on large log groups: If your query times out, simplify the parse pattern or narrow the time range. Complex parse patterns with multiple regex groups are expensive. - Queries with
*in field names are treated as glob, not regex: If you have a field namedhost.nameand you usefields host.*, it returns all fields starting withhost.. If you want regex, usefields regex "host\\..*". - The
bin()function requires the time series to have at least one data point per bin: If you bin by 5 minutes but your log group has no events in a 5-minute window, that bin is empty and doesn’t appear in the chart. Usestatswithfill()to handle gaps. - Query results are limited to 1,000 rows in the console (10,000 via API): For large-scale log analysis, use async query execution (
aws insights start-query) which can return up to 10,000 rows and stores results for 30 minutes.