Amazon DynamoDB

DynamoDB is a fully managed NoSQL database with single-digit millisecond latency at any scale. It supports key-value and document data models. There are no servers to manage: partitioning is automatic, and capacity is either on-demand or provisioned with auto-scaling.

Core Concepts

Data Model

Table
  └── Item (row)
        ├── Attribute (column)
        ├── Partition Key (required, hash)
        └── Sort Key (optional, range)

Primary Key

Partition Key (PK) only:

UserID (PK) → Hash function → Partition

With a PK-only (simple) primary key, each PK value identifies exactly one item. The hash of the PK decides which partition stores that item.

Partition Key + Sort Key (SK):

UserID (PK) + OrderID (SK) → Partition

Items that share a PK are stored together and sorted by SK. This allows efficient range queries over the items of one PK.

Example Table: Orders

| UserID (PK) | OrderID (SK) | Date | Total | Status |
|---|---|---|---|---|
| user123 | order-001 | 2024-01-15 | 99.99 | shipped |
| user123 | order-002 | 2024-02-20 | 149.99 | pending |
| user456 | order-001 | 2024-01-10 | 29.99 | delivered |

Query: Get all orders for user123:

aws dynamodb query \
  --table-name Orders \
  --key-condition-expression "UserID = :uid" \
  --expression-attribute-values '{":uid": {"S": "user123"}}'

Creating a Table

aws dynamodb create-table \
  --table-name Orders \
  --attribute-definitions '[
    {"AttributeName": "UserID", "AttributeType": "S"},
    {"AttributeName": "OrderID", "AttributeType": "S"}
  ]' \
  --key-schema '[
    {"AttributeName": "UserID", "KeyType": "HASH"},
    {"AttributeName": "OrderID", "KeyType": "RANGE"}
  ]' \
  --billing-mode PAY_PER_REQUEST \
  --table-class STANDARD

With Provisioned Capacity

aws dynamodb create-table \
  --table-name Orders \
  --attribute-definitions '[...]' \
  --key-schema '[...]' \
  --provisioned-throughput '{
    "ReadCapacityUnits": 10,
    "WriteCapacityUnits": 5
  }'

Reading and Writing

PutItem (insert/replace)

aws dynamodb put-item \
  --table-name Orders \
  --item '{
    "UserID": {"S": "user123"},
    "OrderID": {"S": "order-001"},
    "Date": {"S": "2024-01-15"},
    "Total": {"N": "99.99"},
    "Status": {"S": "shipped"}
  }'
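
The --item payload above uses DynamoDB's typed attribute-value JSON, where every value is wrapped in a type descriptor such as S (string) or N (number, sent as a string). The following is a minimal sketch of producing that shape from plain Python values. The helper names to_attr and to_item are hypothetical and cover only a few types; floats are deliberately rejected because DynamoDB numbers are decimal.

```python
from decimal import Decimal


def to_attr(value):
    """Wrap a plain Python value in DynamoDB's typed JSON form."""
    if isinstance(value, bool):            # check bool first: bool is a subclass of int
        return {"BOOL": value}
    if isinstance(value, (int, Decimal)):
        return {"N": str(value)}           # numbers travel as strings
    if isinstance(value, str):
        return {"S": value}
    if value is None:
        return {"NULL": True}
    if isinstance(value, list):
        return {"L": [to_attr(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_attr(v) for k, v in value.items()}}
    raise TypeError(f"unsupported type: {type(value).__name__}")


def to_item(record):
    """Convert a flat or nested dict into an --item style payload."""
    return {name: to_attr(v) for name, v in record.items()}


order = {
    "UserID": "user123",
    "OrderID": "order-001",
    "Total": Decimal("99.99"),
    "Status": "shipped",
}
item = to_item(order)
print(item["Total"])  # {'N': '99.99'}
```

In real code, an SDK serializer would normally do this conversion; the sketch is only meant to make the wire format easier to read.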

GetItem (read by PK + SK)

aws dynamodb get-item \
  --table-name Orders \
  --key '{"UserID": {"S": "user123"}, "OrderID": {"S": "order-001"}}'

Query (range of items by SK within a partition)

aws dynamodb query \
  --table-name Orders \
  --key-condition-expression "UserID = :uid AND OrderID BETWEEN :start AND :end" \
  --expression-attribute-values '{
    ":uid": {"S": "user123"},
    ":start": {"S": "order-001"},
    ":end": {"S": "order-999"}
  }'

Scan (full table, expensive)

aws dynamodb scan \
  --table-name Orders \
  --filter-expression "#st = :status" \
  --expression-attribute-names '{"#st": "Status"}' \
  --expression-attribute-values '{":status": {"S": "shipped"}}'

Warning: Scan reads every item in the table. FilterExpression is applied after the items are read, so it reduces the response size but not the read capacity you pay for. Avoid scanning large tables on production request paths.
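
To make the cost difference concrete, here is a rough sketch of how read units add up for a Query versus a Scan. It assumes the standard 4KB read unit and halves the cost for eventually consistent reads; it ignores per-page rounding, so treat the results as approximations. The table size (10M items of 5KB) and the function name read_units are illustrative assumptions.

```python
import math

READ_UNIT_BYTES = 4 * 1024  # one read unit covers up to 4KB


def read_units(bytes_read, strongly_consistent=False):
    """Approximate read units consumed for a given amount of data read."""
    units = math.ceil(bytes_read / READ_UNIT_BYTES)
    return units if strongly_consistent else units / 2


ITEM_BYTES = 5 * 1024

# Query: reads only the three orders of one user.
query_cost = read_units(3 * ITEM_BYTES)

# Scan: reads all 10M items, no matter how few pass the filter.
scan_cost = read_units(10_000_000 * ITEM_BYTES)

print(query_cost)  # 2.0
print(scan_cost)   # 6250000.0
```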

Batch Operations

# BatchGetItem (up to 100 items)
aws dynamodb batch-get-item \
  --request-items '{
    "Orders": {
      "Keys": [
        {"UserID": {"S": "user123"}, "OrderID": {"S": "order-001"}},
        {"UserID": {"S": "user456"}, "OrderID": {"S": "order-001"}}
      ]
    }
  }'
 
# BatchWriteItem (up to 25 items, 16MB)
aws dynamodb batch-write-item \
  --request-items '{
    "Orders": [
      {"PutRequest": {"Item": {"UserID": {"S": "user789"}, "OrderID": {"S": "order-001"}}}}
    ]
  }'
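
Because BatchWriteItem accepts at most 25 requests per call, larger loads have to be split into chunks. The sketch below shows only the chunking step, building request-items payloads shaped like the one above. The generator name chunked and the 60 sample orders are hypothetical.

```python
def chunked(requests, size=25):
    """Yield consecutive slices of at most `size` requests."""
    for start in range(0, len(requests), size):
        yield requests[start:start + size]


# 60 hypothetical put requests for the Orders table.
put_requests = [
    {"PutRequest": {"Item": {
        "UserID": {"S": "user789"},
        "OrderID": {"S": f"order-{n:03d}"},
    }}}
    for n in range(1, 61)
]

# One --request-items payload per BatchWriteItem call.
batches = [{"Orders": chunk} for chunk in chunked(put_requests)]
print([len(b["Orders"]) for b in batches])  # [25, 25, 10]
```

A real loader would also resend whatever the service returns in UnprocessedItems, ideally with backoff, since a batch call can partially succeed.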

Secondary Indexes

Global Secondary Index (GSI)

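A GSI has a different PK and an optional SK from the base table. On a provisioned table it has its own read/write capacity, separate from the table's. On an on-demand (PAY_PER_REQUEST) table, omit ProvisionedThroughput from the index definition below.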

aws dynamodb update-table \
  --table-name Orders \
  --attribute-definitions '[{"AttributeName": "Status", "AttributeType": "S"}]' \
  --global-secondary-index-updates '[{
    "Create": {
      "IndexName": "StatusIndex",
      "KeySchema": [{"AttributeName": "Status", "KeyType": "HASH"}],
      "Projection": {"ProjectionType": "ALL"},
      "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5}
    }
  }]'

Query GSI:

aws dynamodb query \
  --table-name Orders \
  --index-name StatusIndex \
  --key-condition-expression "#st = :status" \
  --expression-attribute-names '{"#st": "Status"}' \
  --expression-attribute-values '{":status": {"S": "shipped"}}'

Local Secondary Index (LSI)

An LSI has the same PK as the base table but a different SK. It shares the base table's throughput and can only be defined when the table is created.

aws dynamodb create-table \
  --table-name Orders \
  --attribute-definitions '[
    {"AttributeName": "UserID", "AttributeType": "S"},
    {"AttributeName": "OrderID", "AttributeType": "S"},
    {"AttributeName": "Date", "AttributeType": "S"}
  ]' \
  --key-schema '[
    {"AttributeName": "UserID", "KeyType": "HASH"},
    {"AttributeName": "OrderID", "KeyType": "RANGE"}
  ]' \
  --local-secondary-indexes '[{
    "IndexName": "DateIndex",
    "KeySchema": [
      {"AttributeName": "UserID", "KeyType": "HASH"},
      {"AttributeName": "Date", "KeyType": "RANGE"}
    ],
    "Projection": {"ProjectionType": "ALL"}
  }]' \
  --billing-mode PAY_PER_REQUEST

GSI vs LSI

| | GSI | LSI |
|---|---|---|
| PK | Different from base table | Same as base table |
| SK | Different (optional) | Different |
| Throughput | Own provisioned capacity | Shares base table capacity |
| Size limit | No limit | 10GB per partition key value |
| Projections | ALL, KEYS_ONLY, INCLUDE | ALL, KEYS_ONLY, INCLUDE |
| Use when | Need different PK access patterns | Need different SK with same PK |

DynamoDB Streams

Capture item-level changes (insert, modify, remove):

aws dynamodb update-table \
  --table-name Orders \
  --stream-specification '{
    "StreamEnabled": true,
    "StreamViewType": "NEW_AND_OLD_IMAGES"
  }'

StreamViewType options:

  • KEYS_ONLY — only PK/SK
  • NEW_IMAGE — entire new item
  • OLD_IMAGE — entire old item
  • NEW_AND_OLD_IMAGES — both
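
A stream consumer (for example a Lambda function) receives records that carry an eventName plus the images selected by StreamViewType. The following is a small sketch of interpreting one such record for the Orders table, assuming NEW_AND_OLD_IMAGES. The function name describe_change and the sample record are illustrative.

```python
def describe_change(record):
    """Summarize one DynamoDB Streams record for the Orders table."""
    change = record["dynamodb"]
    order_id = change["Keys"]["OrderID"]["S"]
    event = record["eventName"]  # INSERT, MODIFY, or REMOVE
    if event == "INSERT":
        return f"{order_id}: created as {change['NewImage']['Status']['S']}"
    if event == "MODIFY":
        old = change["OldImage"]["Status"]["S"]
        new = change["NewImage"]["Status"]["S"]
        return f"{order_id}: status {old} -> {new}"
    if event == "REMOVE":
        return f"{order_id}: deleted"
    raise ValueError(f"unexpected eventName: {event}")


# Hypothetical record, trimmed to the fields used above.
sample = {
    "eventName": "MODIFY",
    "dynamodb": {
        "Keys": {"UserID": {"S": "user123"}, "OrderID": {"S": "order-002"}},
        "OldImage": {"Status": {"S": "pending"}},
        "NewImage": {"Status": {"S": "shipped"}},
    },
}
print(describe_change(sample))  # order-002: status pending -> shipped
```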

Time To Live (TTL)

Automatically delete items after expiration:

aws dynamodb update-time-to-live \
  --table-name Orders \
  --time-to-live-specification '{
    "Enabled": true,
    "AttributeName": "ExpiresAt"
  }'

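Set ExpiresAt to a Unix epoch timestamp in seconds, stored as a Number. Expired items are removed by a background process, typically within 48 hours of expiring rather than at the exact timestamp, so reads can still return them until then.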
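
A short sketch of computing the TTL value and of filtering out items that have expired but may not have been deleted yet. The helper names expires_at and is_expired are hypothetical, and the `now` parameter exists only to make the functions easy to test.

```python
import time

SECONDS_PER_DAY = 86_400


def expires_at(days_from_now, now=None):
    """Return an epoch-seconds timestamp suitable for the TTL attribute."""
    now = time.time() if now is None else now
    return int(now) + days_from_now * SECONDS_PER_DAY


def is_expired(item, now=None):
    """True if the item's ExpiresAt (DynamoDB JSON, type N) is in the past."""
    now = time.time() if now is None else now
    return int(item["ExpiresAt"]["N"]) <= now


# Attribute to merge into a put-item payload: expires in 30 days.
ttl_attribute = {"ExpiresAt": {"N": str(expires_at(30))}}
print(ttl_attribute)
```

Note that the value is written as a string inside {"N": ...}, matching the typed JSON used by the CLI examples.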

DAX (DynamoDB Accelerator)

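In-memory cache (read-through and write-through) for microsecond read latency:

# Create DAX cluster
aws dax create-cluster \
  --cluster-name my-dax \
  --node-type dax.r4.large \
  --replication-factor 2 \
  --iam-role-arn arn:aws:iam::123456789012:role/dax-role

# No table change is needed to use DAX.
# DAX is accessed via a separate endpoint (not the DynamoDB endpoint)

Note: DAX is both a read-through and a write-through cache. A read that misses the cache is fetched from DynamoDB and cached; a write goes to DynamoDB and then updates the cached item. Only eventually consistent reads are served from the cache; strongly consistent reads are passed through to DynamoDB. If you need caching that DAX does not cover, ElastiCache is an alternative managed at the application layer.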

Partition Behavior

DynamoDB distributes data across partitions by hashing the PK:

PK → internal hash function → hash value → Partition (the hash function is internal to DynamoDB, not a documented algorithm)

Each partition supports:

  • Up to 1,000 WCUs
  • Up to 3,000 RCUs
  • 10GB of data
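
These limits lead to a commonly cited rule of thumb for estimating how many partitions a table needs. The sketch below applies it; DynamoDB manages the actual partition count internally and does not expose it, so this is only an estimate for reasoning about per-partition throughput. The function name estimate_partitions is hypothetical.

```python
import math

MAX_RCU_PER_PARTITION = 3000
MAX_WCU_PER_PARTITION = 1000
MAX_GB_PER_PARTITION = 10


def estimate_partitions(rcu, wcu, size_gb):
    """Rough lower bound on partitions from throughput and storage."""
    by_throughput = math.ceil(rcu / MAX_RCU_PER_PARTITION + wcu / MAX_WCU_PER_PARTITION)
    by_size = math.ceil(size_gb / MAX_GB_PER_PARTITION)
    return max(by_throughput, by_size, 1)


# Storage-bound: 50GB needs 5 partitions even at modest throughput.
print(estimate_partitions(rcu=6000, wcu=1000, size_gb=50))  # 5

# Throughput-bound: 9000 RCU + 4000 WCU needs 7 partitions.
print(estimate_partitions(rcu=9000, wcu=4000, size_gb=20))  # 7
```

Dividing a table's capacity by this estimate gives a feel for how little throughput a single partition key can draw on when traffic is skewed.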

Hot Partitions

If one PK gets more traffic than others (celebrity problem):

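  • Use write sharding: make the PK userID + "#" + random(1-10) to spread writes across partitions; reads must then query every suffix and merge the results
  • Put the random suffix in the partition key, not the sort key: only the PK determines which partition an item lands in
  • Raising provisioned RCU/WCU does not lift the per-partition limits above; a single hot key still needs sharding or caching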
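
A minimal sketch of the write-sharding idea above: writers pick a random suffix, and readers enumerate every suffix so they can query each shard and merge the results. The shard count of 10 follows the random(1-10) example; the function names are hypothetical.

```python
import random

SHARD_COUNT = 10


def sharded_key(user_id, shard_count=SHARD_COUNT):
    """Partition key for a write: the base key plus a random shard suffix."""
    return f"{user_id}#{random.randint(1, shard_count)}"


def all_shard_keys(user_id, shard_count=SHARD_COUNT):
    """Every partition key a reader must query to see all of a user's items."""
    return [f"{user_id}#{n}" for n in range(1, shard_count + 1)]


write_key = sharded_key("user123")
read_keys = all_shard_keys("user123")
print(write_key in read_keys)  # True
print(len(read_keys))          # 10
```

The trade-off is that one logical read becomes shard_count queries, so this pattern fits write-heavy keys better than read-heavy ones.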

Pricing

On-Demand Mode

| | Cost |
|---|---|
| WCU (write) | $1.25 per million |
| RCU (read) | $0.25 per million ($0.125 per million for eventually consistent reads) |
| Data storage | $0.25/GB/month |

Provisioned Mode

| | Cost |
|---|---|
| WCU | $0.00065 per hour |
| RCU | $0.00013 per hour |
| Data storage | $0.25/GB/month |

Reserved Capacity

1 or 3 year commitment: 50-70% savings.

Pricing Examples

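Scenario 1: A table with 10M items, 5KB average item size. 1,000 writes/day, 10,000 strongly consistent reads/day. On-Demand: a 5KB write consumes 5 write units (1KB each) and a 5KB read consumes 2 read units (4KB each). Writes: 1,000 × 30 days × 5 = 150,000 units/month × $1.25/M = $0.19/month. Reads: 10,000 × 30 days × 2 = 600,000 units/month × $0.25/M = $0.15/month. Storage: 10M × 5KB = 50GB × $0.25 = $12.50/month. Total: ~$12.84/month, almost all of it storage.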

Scenario 2: Same table with heavy write load (1,000 writes/second, 24/7). Each 5KB write consumes 5 write units. Provisioned: 5,000 WCU × $0.00065/hour × 720 hours = $2,340/month. On-Demand: 1,000 writes/sec × 3,600 sec/hr × 24 hr × 30 days = 2.592 billion writes/month × 5 units = 12.96B units × $1.25/M = $16,200/month. Provisioned is about 6.9x cheaper for consistent high throughput.
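
The arithmetic behind such comparisons can be sketched as a small calculator. It uses the rates listed in the pricing tables above, rounds item sizes up to whole units (1KB per write unit, 4KB per read unit), and assumes strongly consistent reads and a 720-hour month. The function names are hypothetical, and real bills depend on Region and current prices.

```python
import math

WRITE_PER_MILLION = 1.25   # on-demand, per million write units
READ_PER_MILLION = 0.25    # on-demand, per million read units
STORAGE_PER_GB = 0.25      # per GB-month
WCU_PER_HOUR = 0.00065     # provisioned
RCU_PER_HOUR = 0.00013     # provisioned
HOURS_PER_MONTH = 720


def write_units(item_kb):
    return math.ceil(item_kb / 1)   # 1KB per write unit


def read_units(item_kb):
    return math.ceil(item_kb / 4)   # 4KB per strongly consistent read unit


def on_demand_monthly(writes, reads, item_kb, storage_gb):
    """Monthly on-demand cost for a number of writes and reads per month."""
    write_cost = writes * write_units(item_kb) / 1e6 * WRITE_PER_MILLION
    read_cost = reads * read_units(item_kb) / 1e6 * READ_PER_MILLION
    return write_cost + read_cost + storage_gb * STORAGE_PER_GB


def provisioned_monthly(wcu, rcu, storage_gb):
    """Monthly cost of capacity provisioned around the clock."""
    hourly = wcu * WCU_PER_HOUR + rcu * RCU_PER_HOUR
    return hourly * HOURS_PER_MONTH + storage_gb * STORAGE_PER_GB


# Light workload: 1,000 writes/day and 10,000 reads/day of 5KB items, 50GB stored.
light = on_demand_monthly(writes=1_000 * 30, reads=10_000 * 30, item_kb=5, storage_gb=50)

# Heavy workload: 1,000 writes/second of 5KB items, throughput cost only.
writes_per_month = 1_000 * 3_600 * 24 * 30
heavy_on_demand = on_demand_monthly(writes_per_month, 0, item_kb=5, storage_gb=0)
heavy_provisioned = provisioned_monthly(wcu=1_000 * write_units(5), rcu=0, storage_gb=0)

print(round(light, 2), round(heavy_on_demand), round(heavy_provisioned))
```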

Nuggets & Gotchas

  • DynamoDB has no schema — items in the same table can have completely different attributes: One item can have {"email": "x"} and another {"count": 42}. Enforce schema at the application layer or use a separate attribute to indicate type.
  • DynamoDB transactions (TransactWriteItems) are limited to 100 items and 4MB per request (the limit was 25 items before 2022), so you can't update 150 items atomically: If you need atomic updates across more items than the limit, use a Saga pattern (orchestrated compensation) instead of a single transaction.
  • GSIs have their own provisioned throughput, which can be changed, but a GSI's key schema cannot be changed after creation: If you need a different GSI key schema, create a new GSI (DynamoDB backfills it from the base table), switch your queries to it, and delete the old one. Plan your GSI design carefully.
  • On-demand DynamoDB is more expensive than provisioned for consistent high-throughput workloads: If you know your traffic pattern, use provisioned with auto-scaling. On-demand is for unpredictable, spiky, or low-traffic tables.
  • DynamoDB doesn’t support joins — you must denormalize or use multiple queries: If you need related data (orders + customer info), embed it in the item or make separate queries. For complex queries, consider using Elasticsearch or Athena.