AWS Macie

Macie uses machine learning to automatically discover, classify, and protect sensitive data in S3. It identifies PII (personally identifiable information), PHI (health records), credentials, and intellectual property.

What Macie Does

Macie scans S3:
  │
  ├── Classification
  │   ├── Sensitive data types (PII, PHI, credentials)
  │   ├── Custom data identifiers (regex, keywords)
  │   └── Pre-built data types (credit cards, SSNs, etc.)
  │
  └── Monitoring
      ├── S3 policy changes (bucket made public)
      ├── Access patterns (unusual data access)
      └── Data egress alerts

Macie Classic vs Macie v2

FeatureMacie ClassicMacie v2 (Current)
ScopeAll S3 buckets in accountSpecific buckets or all
ClassificationOne-time or scheduledContinuous (jobs)
PricingPer object scannedPer GB classified
Custom identifiersNoYes
IntegrationsCloudWatchSecurity Hub, EventBridge

Enabling Macie

# Enable Macie v2
aws macie2 enable
 
# Accept the service linked role
aws macie2 enable --no-ignore-previous-warnings

Creating a Classification Job

# One-time job (all S3, all objects)
aws macie2 create-classification-job \
  --name "full-scan" \
  --job-type ONE_TIME \
  --s3-job-definition '{
    "bucketDefinitions": [{"accountId": "123456789012", "buckets": ["my-bucket"]}]
  }'
 
# Scheduled job (weekly)
aws macie2 create-classification-job \
  --name "weekly-scan" \
  --job-type SCHEDULED \
  --schedule-frequency '{
    "daily": {}
  }' \
  --s3-job-definition '{
    "bucketDefinitions": [{"accountId": "123456789012", "buckets": ["my-bucket"]}]
  }'

Sensitivity Scoring

Macie assigns a sensitivity score (0-100) to each S3 object:

ScoreClassification
0-49Low — no sensitive data
50-69Medium — some sensitive data
70-89High — significant sensitive data
90-100Critical — large amounts of sensitive data

Finding Types

{
  "type": "SensitiveData:S3Object/PII",
  "severity": {
    "score": 85
  },
  "title": "PII detected in S3 object",
  "description": "Amazon Macie detected 5 occurrences of Social Security Numbers (US) in my-bucket/finance/employees.csv",
  "resourcesAffected": {
    "s3Bucket": {"name": "my-bucket", "arn": "arn:aws:s3:::my-bucket"},
    "s3Object": {"key": "finance/employees.csv", "size": 1024, "versionId": "xxx"}
  },
  "classificationDetails": {
    "detectedAt": "2024-01-15T10:00:00Z",
    "jobId": "xxxxx",
    "result": {
      "sensitiveData": [{"category": "PII", "occurrences": 5}],
      "customDataIdentifiers": null
    }
  }
}

Custom Data Identifiers

# Create custom regex identifier
aws macie2 create-custom-data-identifier \
  --name "employee-id" \
  --regex "EMP-[0-9]{6}" \
  --description "Employee ID pattern" \
  --keywords '["employee", "id"]'
 
# Create custom managed data identifier (keywords)
aws macie2 create-custom-data-identifier \
  --name "project-code" \
  --regex "[A-Z]{3}-[0-9]{4}" \
  --keywords '["project", "code"]'

S3 Policy Monitoring

# Get policy findings
aws macie2 list-findings \
  --finding-criteria '{
    "type": [{"eq": ["EXTERNAL_BUCKET"]}]
  }'

Macie alerts on:

  • S3 bucket made public
  • S3 bucket policy changed to allow external access
  • S3 bucket shared with another AWS account
  • S3 bucket ACL changed

Integration with Security Hub

# Macie sends findings to Security Hub automatically (when enabled)
aws securityhub enable-import-findings-for-product \
  --product-arn "arn:aws:securityhub:us-east-1::product/aws/macie"

Automated Response

# EventBridge rule for sensitive data findings
aws events put-rule \
  --name macie-sensitive-data \
  --event-pattern '{
    "source": ["aws.macie"],
    "detail": {
      "type": ["SensitiveData:S3Object/PII"]
    }
  }'
 
# Target: Lambda to remediate
aws events put-targets \
  --rule macie-sensitive-data \
  --targets '[{
    "Id": "remediate",
    "Arn": "arn:aws:lambda:us-east-1:123456789012:function:macie-remediate",
    "Input": "{\"detail\": <see full event>}"
  }]'

Example Remediation Lambda

def lambda_handler(event, context):
    bucket = event['detail']['resourcesAffected']['s3Bucket']['name']
    key = event['detail']['resourcesAffected']['s3Object']['key']
    
    # Block public access
    s3 = boto3.client('s3')
    s3.put_public_access_block(
        Bucket=bucket,
        PublicAccessBlockConfiguration={
            'BlockPublicAcls': True,
            'IgnorePublicAcls': True,
            'BlockPublicPolicy': True,
            'RestrictPublicBuckets': True
        }
    )
    
    # Move to quarantine prefix
    s3.copy_object(
        Bucket=bucket,
        Key=f"quarantine/{key}",
        CopySource={'Bucket': bucket, 'Key': key}
    )
    s3.delete_object(Bucket=bucket, Key=key)
    
    # Notify
    sns.publish(
        TopicArn='arn:aws:sns:us-east-1:123456789012:sensitive-data-alerts',
        Message=f"Sensitive data detected in {bucket}/{key}. Moved to quarantine."
    )

Pricing

ComponentCost
Classification (first 10GB/month)Free
Classification (after 10GB)$0.10/GB
Custom data identifiersFree
S3 policy monitoringFree

Limits

ResourceLimit
Classification jobs20 per account
Custom data identifiers50 per account
S3 bucketsUnlimited
Objects per jobUnlimited

References

Pricing Examples

Scenario 1: A company with 100GB of S3 data. First 10GB free. 90GB × 9/month. Macie is extremely cheap for typical workloads.

Scenario 2: A data analytics company with 5TB of S3 data. 5TB - 10GB free = 5TB × 512/month. For 5TB of data, consider whether continuous classification is necessary or if one-time classification jobs are sufficient.

Nuggets & Gotchas

  • Macie only scans S3 — it doesn’t scan RDS, DynamoDB, or EBS: Macie is specifically for object storage. For database-level sensitive data classification, use different tools (e.g., AWS CloudWatch Data Protection for logs).
  • Macie v2’s continuous classification runs on NEW and MODIFIED objects only — not all existing objects: If you need full initial classification, run a one-time job first, then continuous jobs will maintain it. Don’t assume continuous = all historical data classified.
  • Macie’s PII detection for custom formats (internal employee IDs, etc.) requires custom data identifiers ($0.10/GB still applies): Pre-built identifiers (SSN, credit card) are free. Custom regex/keyword identifiers cost the same as standard classification.
  • Macie findings auto-sent to Security Hub if both are enabled — no additional cost: If you already use Security Hub, enabling Macie is “free” in terms of integration effort. Just make sure to filter Macie findings in Security Hub.
  • Macie can trigger on EVERY new object in a busy S3 bucket — set up job frequency carefully: If you upload 100K objects/day, Macie will scan each one. For high-volume buckets, consider daily or weekly scheduled jobs instead of continuous.