Amazon S3 (Simple Storage Service)

S3 is the primary object storage service in AWS. Objects (files) are stored in buckets (containers) with a flat namespace identified by a globally unique key. Durability is 99.999999999% (11 9s) across multiple AZs.

Core Concepts

Buckets and Objects

Bucket: my-app-assets (globally unique name)
  └── Object
       ├── Key: "images/logo.png"
       ├── Value: [binary data]
       ├── VersionId: "abc123"
       ├── Metadata: {Content-Type: "image/png"}
       └── Tags: {Environment: "Production"}

S3 URI

s3://my-app-assets/images/logo.png
        ↑bucket         ↑key

Object Key Structure

S3 keys can simulate folder structure with prefixes:

logs/
  2024/
    01/
      15/
        app.log
        error.log
    02/
      app.log

The / in keys is just a character — S3 has no real directory hierarchy. Prefix logs/ groups all log objects.

Storage Classes

ClassDurabilityAvailabilityUse CaseCost (per GB/mo)
S3 Standard11 9s99.99%Hot data, frequent access~$0.023
S3 Intelligent-Tiering11 9s99.9%Unknown access patterns~$0.023 (monitoring)
S3 Standard-IA11 9s99.9%Infrequent (30+ days)~$0.0125
S3 Glacier IA11 9s99.99%Rare (90+ days)~$0.004
S3 Glacier11 9s99.99%Archive (180+ days)~$0.00099
S3 Glacier Deep Archive11 9s99.99%Very rare (365+ days)~$0.00099
S3 One Zone-IA11 9s99.5%Re-creatable infrequent~$0.01

Intelligent-Tiering

Automatically moves objects between tiers based on access patterns:

  • Frequent → Infrequent → Archive → Deep Archive
  • 0 monitoring/automation charge per object
  • Best for unpredictable access patterns

Creating a Bucket

aws s3 mb s3://my-unique-bucket-name

Enabling Versioning

aws s3api put-bucket-versioning \
  --bucket my-bucket \
  --versioning-configuration Status=Enabled

Enabling Encryption

aws s3api put-bucket-encryption \
  --bucket my-bucket \
  --server-side-encryption-configuration '{
    "Rules": [{
      "ApplyServerSideEncryptionByDefault": {
        "SSEAlgorithm": "AES256"
      }
    }]
  }'

Working with Objects

# Upload
aws s3 cp myfile.txt s3://my-bucket/
aws s3 cp ./local-dir s3://my-bucket/ --recursive
 
# Download
aws s3 cp s3://my-bucket/myfile.txt ./
aws s3 sync s3://my-bucket/ ./local-dir
 
# Delete
aws s3 rm s3://my-bucket/myfile.txt
 
# List
aws s3 ls s3://my-bucket/
aws s3 ls s3://my-bucket/logs/
aws s3api list-objects-v2 --bucket my-bucket

Access Control

Bucket Policies

{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "PublicRead",
    "Effect": "Allow",
    "Principal": "*",
    "Action": ["s3:GetObject"],
    "Resource": "arn:aws:s3:::my-public-bucket/*"
  }]
}

ACLs (less common now)

Private          → Owner only (default)
PublicRead       → Read object
PublicReadWrite  → Read and write
AuthenticatedRead → Any AWS authenticated user

Access Points

Simplify access for multiple applications:

aws s3control create-access-point \
  --account-id 123456789012 \
  --bucket my-bucket \
  --name mobile-app-access
# Access: s3://ap,移动应用-access.my-bucket.s3.amazonaws.com/

Lifecycle Rules

Automate storage class transitions:

aws s3api put-bucket-lifecycle-configuration \
  --bucket my-bucket \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "log-retention",
      "Prefix": "logs/",
      "Status": "Enabled",
      "Transitions": [
        {"Days": 30, "StorageClass": "STANDARD_IA"},
        {"Days": 90, "StorageClass": "GLACIER"},
        {"Days": 365, "StorageClass": "DEEP_ARCHIVE"}
      ],
      "Expiration": {"Days": 730}
    }]
  }'

Replication

Cross-Region Replication (CRR)

aws s3api put-bucket-replication \
  --bucket my-source-bucket \
  --replication-configuration '{
    "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
    "Rules": [{
      "ID": "replicate-to-us-west-2",
      "Status": "Enabled",
      "Destination": {
        "Bucket": "arn:aws:s3:::my-dest-bucket-us-west-2"
      }
    }]
  }'

Same-Region Replication (SRR)

For compliance, multi-account, or analytics.

Static Website Hosting

aws s3 website s3://my-bucket \
  --index-document index.html \
  --error-document error.html
 
# Make objects publicly readable
aws s3api put-bucket-policy \
  --bucket my-bucket \
  --policy '{
    "Version": "2012-10-17",
    "Statement": [{
      "Sid": "PublicReadGetObject",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::my-bucket/*"
    }]
  }'

Website endpoint: http://my-bucket.s3-website-us-east-1.amazonaws.com

S3 Transfer Acceleration

Speed up uploads via edge locations:

aws s3api put-bucket-accelerate-configuration \
  --bucket my-bucket \
  --accelerate-configuration Status=Enabled

Upload to: my-bucket.s3-accelerate.amazonaws.com

Performance Optimization

Prefix Partitioning

S3 auto-scales, but partitioning keys helps:

  • Good: logs/2024/01/15/app.log (year/month/day/hour)
  • Bad: logs/app.log (single hot prefix)

Multipart Upload

For files > 100MB:

aws s3 cp large-file.zip s3://my-bucket/  # SDK handles multipart automatically

S3 Select

Query CSV/JSON/Parquet without fetching the whole object:

aws s3 select-object-content \
  --bucket my-bucket \
  --key data.csv \
  --expression "SELECT * FROM s3object WHERE price > 100" \
  --input-serialization CSV \
  --output-serialization JSON

S3 Inventory

Audit all objects in a bucket:

aws s3api put-bucket-inventory-configuration \
  --bucket my-bucket \
  --inventory-configuration '{
    "Id": "full-inventory",
    "Enabled": true,
    "Destination": {
      "S3BucketDestination": {
        "Format": "Parquet",
        "Bucket": "arn:aws:s3:::my-inventory-bucket"
      }
    },
    "Schedule": {"Frequency": "Daily"},
    "IncludedObjectVersions": "All"
  }'

Event Notifications

Trigger Lambda or SNS on S3 events:

aws s3api put-bucket-notification-configuration \
  --bucket my-bucket \
  --notification-configuration '{
    "TopicConfiguration": [{
      "Topic": "arn:aws:sns:us-east-1:123456789012:my-topic",
      "Events": ["s3:ObjectCreated:*", "s3:ObjectRemoved:*"]
    }]
  }'

S3 Access Analyzer

Analyze access to prevent accidental exposure:

aws s3control get-access-point-policy \
  --account 123456789012 \
  --name mobile-app-access

Limits

ResourceLimit
Bucket name3-63 chars, globally unique, lowercase
Object size5TB (single PUT), 5GB (multipart)
Objects per bucketUnlimited
Buckets per account100 (soft limit)
PUT request5GB
Multipart upload10000 parts, 5MB-5GB per part

References

Pricing Examples

Scenario 1: A media application storing 100GB of images. 100GB = ~0.005/1K = 0.0004/1K = 0.09/GB = 7.50/month for storage and requests.

Scenario 2: A data lake storing 10TB of analytics data. With S3 Intelligent-Tiering (monitoring 2.50/month, storage 230/month). Total: ~230 + negligible = ~69/month.

Nuggets & Gotchas

  • S3 bucket names are globally unique — you can’t create a bucket with a name someone else already uses: Even if the bucket is in a different account. This is why terraform-state-bucket-prod style names with account IDs or random suffixes are common.
  • S3 DELETE is permanent — there is no recycle bin: Unless you have versioning enabled (then you can delete a version marker to recover). For production buckets, use lifecycle rules with NoncurrentVersionExpiration to clean up old versions instead of manually deleting.
  • S3’s eventual consistency applies to DELETE and PUT in certain regions — reads may return stale data briefly: For critical reads, use strong consistency (available since 2020). But note: strong consistency costs more and has higher latency.
  • S3 costs come from storage + requests + data transfer — storage is usually the smallest line item: A bucket with 10M small objects (0.23/month for 10GB) may cost 0.0004/1K). Monitor request costs separately.
  • S3 Transfer Acceleration can double data transfer costs — use it only for global uploads: Transfer Acceleration adds 0.08/GB on top of normal egress. For uploads from a single region, use standard S3. For global CDN origin pull, it’s worth it.