AWS DynamoDB — NoSQL, Keys, Indexes, Streams, and DAX

A full walkthrough of AWS DynamoDB — tables, partition and sort keys, read/write capacity, GSIs, DynamoDB Streams, DAX, TTL, transactions, and design patterns

What is DynamoDB?

Amazon DynamoDB is a fully managed, serverless, key-value and document NoSQL database. It delivers single-digit millisecond performance at any scale, from a few requests per second to millions. There is no server to manage, no OS to patch, and (in on-demand mode) no capacity to pre-allocate.

DynamoDB is designed for workloads where:

  • You need consistent, low-latency reads and writes at any scale
  • Access patterns are known and relatively simple
  • You need a schemaless, flexible data model
  • You want zero operational overhead

It is not a replacement for relational databases — it has no joins, no aggregation functions, no complex queries across arbitrary columns.


Core Concepts

Term            │ Meaning
────────────────┼──────────────────────────────────────────────────────────────
Table           │ The container for data, equivalent to a database table
Item            │ A single record in a table, equivalent to a row
Attribute       │ A data field on an item, equivalent to a column, but schemaless
Partition Key   │ Required part of the primary key; its hash decides which partition stores the item
Sort Key        │ Optional second part of a composite key, used to sort items within a partition
Item size limit │ 400 KB per item

Primary Keys

The primary key uniquely identifies each item in a table. DynamoDB supports two types:

Simple Primary Key (Partition Key only)

Each item’s partition key must be unique.

Table: Users
Partition Key: user_id (String)

user_id       │ name         │ email
──────────────┼──────────────┼─────────────────
u_001         │ Alice        │ alice@example.com
u_002         │ Bob          │ bob@example.com

Composite Primary Key (Partition Key + Sort Key)

Multiple items can share the same partition key — but the combination of partition key + sort key must be unique. Items with the same partition key are stored together, sorted by sort key. This is the most powerful key design pattern.

Table: Orders
Partition Key: customer_id (String)
Sort Key: order_date (String, ISO format)

customer_id   │ order_date           │ total
──────────────┼──────────────────────┼──────
c_001         │ 2026-05-01T10:00:00  │ 49.99
c_001         │ 2026-05-15T14:30:00  │ 129.50
c_002         │ 2026-05-02T09:00:00  │ 75.00

With this design, you can query all orders for a customer (partition key = c_001) and filter by date range (sort key between dates) — efficiently, without scanning the whole table.
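
The date-range query works because the sort key is a string whose byte order matches chronological order, which zero-padded ISO 8601 timestamps provide. The following is a minimal local sketch of that property, not a DynamoDB call: it uses `sort` and `awk` in the C locale as a stand-in for DynamoDB's byte-wise string comparison, and the variable names are illustrative.

```shell
# Sort-key values from the Orders example, deliberately out of order.
order_dates='2026-05-15T14:30:00
2026-05-01T10:00:00
2026-05-02T09:00:00'

# LC_ALL=C forces byte-wise comparison, which mirrors how DynamoDB orders
# String sort keys (by their UTF-8 bytes).
sorted=$(printf '%s\n' "$order_dates" | LC_ALL=C sort)
printf '%s\n' "$sorted"

# Range check in the spirit of "order_date BETWEEN :start AND :end".
# awk compares these values as strings because they are not numeric.
in_range=$(printf '%s\n' "$sorted" |
  LC_ALL=C awk -v s='2026-05-01' -v e='2026-05-10' '$0 >= s && $0 <= e')
printf '%s\n' "$in_range"
```

The range check keeps the two orders from 1 and 2 May and drops the one from 15 May. The property depends on a single consistent format: an unpadded value such as 2026-5-2 would sort after 2026-05-15.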


Data Types

Scalar Types

Type           │ Example
───────────────┼──────────────────────────────────────────────
String (S)     │ "hello", "2026-05-01"
Number (N)     │ 42, 3.14 (sent as a string in API requests)
Binary (B)     │ Base64-encoded binary data
Boolean (BOOL) │ true, false
Null (NULL)    │ true (represents a null value)

Document Types

Type     │ Example
─────────┼──────────────────────────────────────────────────────────────
Map (M)  │ {"street": "123 Main St", "city": "London"} (nested object)
List (L) │ ["red", "green", "blue"] (ordered list, mixed types)

Set Types

Type            │ Example
────────────────┼──────────────────────────────────
String Set (SS) │ {"a", "b", "c"} (unique strings)
Number Set (NS) │ {1, 2, 3} (unique numbers)
Binary Set (BS) │ Set of unique binary values
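
As a rough illustration of how these type descriptors look in a request, the sketch below wraps a `put-item` call that uses several of them at once. Only the Users table and `user_id` come from this article; the helper name `put_typed_user` and the other attribute names are made up for the example.

```shell
# Hypothetical helper: writes one Users item that exercises several types.
# Attribute names other than user_id are invented for illustration.
put_typed_user() {
  aws dynamodb put-item \
    --table-name Users \
    --item '{
      "user_id":     {"S": "u_003"},
      "login_count": {"N": "17"},
      "active":      {"BOOL": true},
      "deleted_at":  {"NULL": true},
      "tags":        {"SS": ["admin", "beta"]},
      "scores":      {"NS": ["1", "2.5", "3"]},
      "address":     {"M": {"city": {"S": "London"}}}
    }'
}
```

Numbers and the members of a number set are written as quoted strings, and a set must contain at least one element.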

Read/Write Capacity Modes

On-Demand Mode

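Pay per request. DynamoDB scales automatically with no capacity planning and no minimum capacity. Throttling is uncommon but still possible, for example if traffic jumps to more than double the table's previous peak within 30 minutes or reaches a table-level throughput quota. More expensive per request than provisioned, but zero waste. Use for unpredictable traffic, new tables, or low-traffic workloads.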

# Create table in on-demand mode
aws dynamodb create-table \
  --table-name Orders \
  --attribute-definitions \
    AttributeName=customer_id,AttributeType=S \
    AttributeName=order_date,AttributeType=S \
  --key-schema \
    AttributeName=customer_id,KeyType=HASH \
    AttributeName=order_date,KeyType=RANGE \
  --billing-mode PAY_PER_REQUEST

Provisioned Mode

You set read and write capacity units (RCU/WCU) in advance. Traffic that exceeds provisioned capacity is throttled (requests fail with ProvisionedThroughputExceededException). Cheaper per unit than on-demand — cost-effective for predictable, consistent workloads. Use Auto Scaling with provisioned mode to adjust capacity automatically based on utilisation targets.

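Capacity Unit │ Definition
──────────────┼──────────────────────────────────────────────────────────────
1 RCU         │ 1 strongly consistent read per second (or 2 eventually consistent reads) for an item up to 4 KB
1 WCU         │ 1 write per second for an item up to 1 KB

# Create table with provisioned capacity (auto scaling is configured separately, via Application Auto Scaling)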
aws dynamodb create-table \
  --table-name Products \
  --attribute-definitions AttributeName=product_id,AttributeType=S \
  --key-schema AttributeName=product_id,KeyType=HASH \
  --billing-mode PROVISIONED \
  --provisioned-throughput ReadCapacityUnits=100,WriteCapacityUnits=50
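
Values such as ReadCapacityUnits=100 and WriteCapacityUnits=50 can be estimated from item size and request rate using the capacity unit definitions in this section. The sketch below is a back-of-the-envelope calculator, assuming 1 KB = 1024 bytes; the function names are illustrative, and it ignores indexes, transactions, and burst behaviour.

```shell
# Ceiling of integer division, e.g. ceil_div 6144 4096 gives 2.
ceil_div() { echo $(( ($1 + $2 - 1) / $2 )); }

# rcus_needed ITEM_BYTES READS_PER_SEC MODE   (MODE: strong | eventual)
rcus_needed() {
  units=$(ceil_div "$1" 4096)        # each read is billed in 4 KB steps
  total=$(( units * $2 ))
  if [ "$3" = "eventual" ]; then
    total=$(ceil_div "$total" 2)     # eventually consistent reads cost half
  fi
  echo "$total"
}

# wcus_needed ITEM_BYTES WRITES_PER_SEC
wcus_needed() {
  units=$(ceil_div "$1" 1024)        # each write is billed in 1 KB steps
  echo $(( units * $2 ))
}

rcus_needed 6144 100 strong     # 6 KB items, 100 reads/s   -> prints 200
rcus_needed 6144 100 eventual   # same load, eventual reads -> prints 100
wcus_needed 2560 50             # 2.5 KB items, 50 writes/s -> prints 150
```

Item size is rounded up per request before multiplying, which is why a 6 KB item costs as much to read as an 8 KB one.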

CRUD Operations

# Put item (create or replace)
aws dynamodb put-item \
  --table-name Orders \
  --item '{
    "customer_id": {"S": "c_001"},
    "order_date": {"S": "2026-06-01T10:00:00"},
    "total": {"N": "99.99"},
    "status": {"S": "pending"},
    "items": {"L": [
      {"M": {"sku": {"S": "ABC123"}, "qty": {"N": "2"}}}
    ]}
  }'

# Get item (by primary key — fastest operation)
aws dynamodb get-item \
  --table-name Orders \
  --key '{"customer_id": {"S": "c_001"}, "order_date": {"S": "2026-06-01T10:00:00"}}'

# Update a specific attribute (partial update — no full replace)
aws dynamodb update-item \
  --table-name Orders \
  --key '{"customer_id": {"S": "c_001"}, "order_date": {"S": "2026-06-01T10:00:00"}}' \
  --update-expression "SET #s = :status" \
  --expression-attribute-names '{"#s": "status"}' \
  --expression-attribute-values '{":status": {"S": "shipped"}}'

# Delete item
aws dynamodb delete-item \
  --table-name Orders \
  --key '{"customer_id": {"S": "c_001"}, "order_date": {"S": "2026-06-01T10:00:00"}}'

# Query: fetch all orders for a customer placed on or after 2026-05-01
aws dynamodb query \
  --table-name Orders \
  --key-condition-expression "customer_id = :cid AND order_date >= :start" \
  --expression-attribute-values '{
    ":cid": {"S": "c_001"},
    ":start": {"S": "2026-05-01"}
  }'

# Scan — reads every item (expensive — avoid in production)
aws dynamodb scan --table-name Products --filter-expression "price < :p" \
  --expression-attribute-values '{":p": {"N": "50"}}'

Query vs Scan: Query reads only items matching the partition key — efficient. Scan reads every item in the table — expensive and slow at scale. Design your keys and indexes so you never need a full table scan in your hot path.


Global Secondary Indexes (GSI)

A GSI lets you query on attributes other than the primary key. It’s a separate projection of the table with its own partition key and optional sort key. A table can have up to 20 GSIs.

Table: Orders (PK: customer_id, SK: order_date)
GSI:   StatusIndex (PK: status, SK: order_date)

→ Query: "give me all pending orders placed after May 1"
  → Uses StatusIndex: status = "pending" AND order_date > "2026-05-01"

# Create table with a GSI
aws dynamodb create-table \
  --table-name Orders \
  --attribute-definitions \
    AttributeName=customer_id,AttributeType=S \
    AttributeName=order_date,AttributeType=S \
    AttributeName=status,AttributeType=S \
  --key-schema \
    AttributeName=customer_id,KeyType=HASH \
    AttributeName=order_date,KeyType=RANGE \
  --global-secondary-indexes '[{
    "IndexName": "StatusIndex",
    "KeySchema": [
      {"AttributeName": "status", "KeyType": "HASH"},
      {"AttributeName": "order_date", "KeyType": "RANGE"}
    ],
    "Projection": {"ProjectionType": "ALL"},
    "ProvisionedThroughput": {"ReadCapacityUnits": 50, "WriteCapacityUnits": 25}
  }]' \
  --billing-mode PROVISIONED \
  --provisioned-throughput ReadCapacityUnits=100,WriteCapacityUnits=50
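
Once the index exists, the "pending orders placed after May 1" query can be sketched as below. The helper name `query_by_status` is hypothetical, and it interpolates its arguments straight into the JSON, so it assumes trusted values without quotes in them.

```shell
# query_by_status STATUS START_DATE
# Queries the StatusIndex GSI instead of the base table.
query_by_status() {
  aws dynamodb query \
    --table-name Orders \
    --index-name StatusIndex \
    --key-condition-expression "#s = :status AND order_date > :start" \
    --expression-attribute-names '{"#s": "status"}' \
    --expression-attribute-values "{
      \":status\": {\"S\": \"$1\"},
      \":start\":  {\"S\": \"$2\"}
    }"
}
```

A call would look like `query_by_status pending 2026-05-01`. The `#s` placeholder is needed because `status` is a DynamoDB reserved word, and reads from a GSI are always eventually consistent.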

Local Secondary Indexes (LSI)

An LSI shares the same partition key as the table but uses a different sort key. Must be defined at table creation — cannot add one later. A table can have up to 5 LSIs. LSIs share the table’s provisioned capacity (unlike GSIs which have their own).


DynamoDB Streams

DynamoDB Streams captures a time-ordered sequence of all item-level modifications in a table. Each record contains the old and/or new image of the item. Records are retained for 24 hours.

Use streams for:

  • Triggering Lambda on data changes (change data capture)
  • Replicating data to other tables or services
  • Building audit logs
  • Cache invalidation

📸 SCREENSHOT: DynamoDB → Table → Exports and streams → DynamoDB stream details. Show the stream enabled with “New and old images” view type, and the Lambda trigger attached to it.

# Enable streams on an existing table
aws dynamodb update-table \
  --table-name Orders \
  --stream-specification StreamEnabled=true,StreamViewType=NEW_AND_OLD_IMAGES

# Add a Lambda trigger (Lambda processes records from the stream)
aws lambda create-event-source-mapping \
  --event-source-arn arn:aws:dynamodb:eu-west-1:123456789012:table/Orders/stream/2026-06-01T00:00:00.000 \
  --function-name process-order-changes \
  --batch-size 100 \
  --starting-position LATEST
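
The stream ARN in the command above contains a timestamp that is only known after the stream is enabled, so in practice it is usually looked up rather than typed by hand. A minimal sketch, with hypothetical helper names, that reads `LatestStreamArn` from `describe-table` and feeds it to the same `create-event-source-mapping` call:

```shell
# latest_stream_arn TABLE_NAME: prints the ARN of the table's current stream.
latest_stream_arn() {
  aws dynamodb describe-table \
    --table-name "$1" \
    --query 'Table.LatestStreamArn' \
    --output text
}

# attach_stream_trigger TABLE_NAME FUNCTION_NAME
attach_stream_trigger() {
  arn=$(latest_stream_arn "$1") || return 1   # stop if the lookup fails
  aws lambda create-event-source-mapping \
    --event-source-arn "$arn" \
    --function-name "$2" \
    --batch-size 100 \
    --starting-position LATEST
}
```

For the example in this section the call would be `attach_stream_trigger Orders process-order-changes`.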

TTL — Time to Live

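TTL automatically deletes items once the Unix timestamp (in seconds) stored in a designated attribute has passed. Deletion is a background process and is not immediate: expired items are typically removed within a few days of expiry, so they can still appear in reads until then. There is no additional cost for TTL deletes.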

Use TTL for session data, temporary records, cache tables, or compliance data with retention limits.

# Enable TTL on a table (specify which attribute holds the expiry timestamp)
aws dynamodb update-time-to-live \
  --table-name Sessions \
  --time-to-live-specification Enabled=true,AttributeName=expires_at

# When writing an item, include the expiry timestamp (Unix epoch)
aws dynamodb put-item \
  --table-name Sessions \
  --item '{
    "session_id": {"S": "sess_abc123"},
    "user_id": {"S": "u_001"},
    "expires_at": {"N": "1780000000"}
  }'
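
Rather than hard-coding the epoch value, the expiry is normally computed at write time. The sketch below assumes a `date` that supports `+%s` (GNU and BSD both do); the helper names `expiry_epoch` and `put_session` are illustrative.

```shell
# expiry_epoch SECONDS_FROM_NOW: Unix time in seconds, the unit TTL expects.
expiry_epoch() { echo $(( $(date +%s) + $1 )); }

# put_session SESSION_ID USER_ID TTL_SECONDS
put_session() {
  aws dynamodb put-item \
    --table-name Sessions \
    --item "{
      \"session_id\": {\"S\": \"$1\"},
      \"user_id\":    {\"S\": \"$2\"},
      \"expires_at\": {\"N\": \"$(expiry_epoch "$3")\"}
    }"
}
```

For a one-hour session the call would be `put_session sess_abc123 u_001 3600`. The attribute must be a Number holding seconds; a millisecond timestamp would be read as a date far in the future and the item would not expire when intended.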

Transactions

DynamoDB supports ACID transactions across multiple items and tables in the same region. Use transactions when you need all-or-nothing writes — for example, transferring credits between two user accounts.

# Transact write — atomically update two items
aws dynamodb transact-write-items \
  --transact-items '[
    {
      "Update": {
        "TableName": "Accounts",
        "Key": {"account_id": {"S": "acc_001"}},
        "UpdateExpression": "SET balance = balance - :amount",
        "ConditionExpression": "balance >= :amount",
        "ExpressionAttributeValues": {":amount": {"N": "100"}}
      }
    },
    {
      "Update": {
        "TableName": "Accounts",
        "Key": {"account_id": {"S": "acc_002"}},
        "UpdateExpression": "SET balance = balance + :amount",
        "ExpressionAttributeValues": {":amount": {"N": "100"}}
      }
    }
  ]'

Transactions consume 2× the RCUs/WCUs of regular operations.


DynamoDB Accelerator (DAX)

DAX is an in-memory cache for DynamoDB with a DynamoDB-compatible API. It reduces read latency for cached items from milliseconds to microseconds. Applications switch to the DAX client SDK and point it at the cluster endpoint, with little other code change. DAX is ideal for read-heavy workloads (product catalogues, leaderboards, social feeds).

DAX does NOT help with:

  • Write-heavy workloads (writes pass through DAX to DynamoDB, so they are not faster)
  • Strongly consistent reads (DAX passes them through to DynamoDB and does not cache the result)
  • Table scans and queries across large datasets

Application → DAX Cluster → DynamoDB
                │ cache hit: <1ms
                └── cache miss: reads from DynamoDB and caches result

📸 SCREENSHOT: DynamoDB → DAX → Create cluster. Show the cluster configuration with node type, number of nodes, subnet group, and the security group selection.


Point-in-Time Recovery (PITR)

PITR continuously backs up your table and lets you restore it to any second within the last 35 days. Enable it on every production table.

# Enable PITR
aws dynamodb update-continuous-backups \
  --table-name Orders \
  --point-in-time-recovery-specification PointInTimeRecoveryEnabled=true

# Restore to a point in time (creates a new table)
aws dynamodb restore-table-to-point-in-time \
  --source-table-name Orders \
  --target-table-name Orders-restored-2026-06-01 \
  --restore-date-time 2026-06-01T12:00:00Z
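
Before choosing a restore time, it can help to check which window is actually restorable. A small sketch, with a hypothetical helper name, that reads the earliest and latest restorable timestamps from `describe-continuous-backups`:

```shell
# restorable_window TABLE_NAME: prints earliest and latest restorable times.
restorable_window() {
  aws dynamodb describe-continuous-backups \
    --table-name "$1" \
    --query 'ContinuousBackupsDescription.PointInTimeRecoveryDescription.[EarliestRestorableDateTime,LatestRestorableDateTime]' \
    --output text
}
```

Running `restorable_window Orders` on a table with PITR enabled should print two timestamps; the value passed to `--restore-date-time` needs to fall between them.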

DynamoDB Design Principles

DynamoDB rewards good key design and punishes bad key design. The goal is to spread traffic evenly across partitions — a hot partition (one partition getting all the traffic) causes throttling.

Access Pattern First

Unlike relational databases where you design the schema first, DynamoDB requires you to know your access patterns up front. Start with: “what queries does my application run?” — then design keys and indexes to serve those queries efficiently.

Single-Table Design

Advanced DynamoDB users often put multiple entity types in one table (called single-table design). Different item types use different prefixes in the key: USER#u_001, ORDER#o_001, PRODUCT#p_001. GSIs allow multiple access patterns across these entity types. Keeping related items in one table can reduce cost and latency, because a single Query can return them together instead of one request per entity type.
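
One way the prefix convention might look in practice is sketched below. It assumes a hypothetical table named AppTable with generic key attributes PK and SK, where a user's orders are stored in the user's partition with sort keys starting with ORDER#; none of these names come from the article.

```shell
# Key builders for the prefixes mentioned above.
user_pk()  { printf 'USER#%s'  "$1"; }
order_sk() { printf 'ORDER#%s' "$1"; }

# list_user_orders USER_ID
# Fetches every item in the user's partition whose sort key is an order.
list_user_orders() {
  aws dynamodb query \
    --table-name AppTable \
    --key-condition-expression "PK = :pk AND begins_with(SK, :prefix)" \
    --expression-attribute-values "{
      \":pk\":     {\"S\": \"$(user_pk "$1")\"},
      \":prefix\": {\"S\": \"ORDER#\"}
    }"
}
```

Here `user_pk u_001` yields USER#u_001 and `order_sk o_001` yields ORDER#o_001; the latter would be used as the sort key when writing an order item. The `begins_with` condition is what lets one partition hold several entity types and still return only one of them.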

Hot Partition Prevention

Avoid partition keys with low cardinality (e.g. status = "active" — all active items hit the same partition). Use high-cardinality keys (user IDs, UUIDs, order IDs). For write-heavy workloads, add a random suffix (sharding) to distribute writes: user_id#0, user_id#1, …, user_id#9.
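
The suffix scheme can be sketched as follows. Note that this uses a calculated suffix (a checksum of some per-item value such as an event ID) rather than the random suffix described above, so that a reader can recompute which shard holds a given item; the function names and the choice of `cksum` are assumptions made for the example.

```shell
# shard_for VALUE SHARD_COUNT: stable shard number in [0, SHARD_COUNT).
shard_for() {
  printf '%s' "$1" | cksum | awk -v n="$2" '{ print $1 % n }'
}

# sharded_key BASE_KEY VALUE SHARD_COUNT: e.g. u_001 plus "#" plus a digit.
sharded_key() { printf '%s#%s\n' "$1" "$(shard_for "$2" "$3")"; }

# all_shard_keys BASE_KEY SHARD_COUNT: every key a reader must query.
all_shard_keys() {
  i=0
  while [ "$i" -lt "$2" ]; do
    printf '%s#%s\n' "$1" "$i"
    i=$(( i + 1 ))
  done
}
```

A writer would use `sharded_key u_001 evt-42 10` as the partition key, while a reader that needs everything for u_001 runs one Query per line of `all_shard_keys u_001 10` and merges the results. That read-side fan-out is the cost of spreading the writes.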


Quick Reference

# Tables
aws dynamodb list-tables
aws dynamodb describe-table --table-name Orders
aws dynamodb describe-table --table-name Orders \
  --query 'Table.[TableName,TableStatus,ItemCount,TableSizeBytes]'

# Capacity
aws dynamodb describe-table --table-name Orders \
  --query 'Table.BillingModeSummary'

# Backup
aws dynamodb list-backups --table-name Orders
aws dynamodb create-backup --table-name Orders --backup-name orders-backup-$(date +%Y%m%d)

# Export to S3 (for analytics without consuming table capacity; requires PITR to be enabled)
aws dynamodb export-table-to-point-in-time \
  --table-arn arn:aws:dynamodb:eu-west-1:123456789012:table/Orders \
  --s3-bucket my-ddb-exports \
  --export-format DYNAMODB_JSON

# Check CloudWatch metrics
aws cloudwatch get-metric-statistics \
  --namespace AWS/DynamoDB \
  --metric-name ConsumedReadCapacityUnits \
  --dimensions Name=TableName,Value=Orders \
  --start-time $(date -d '1 hour ago' --iso-8601=seconds) \
  --end-time $(date --iso-8601=seconds) \
  --period 60 \
  --statistics Sum

When to use DynamoDB                   │ When to use RDS/Aurora
───────────────────────────────────────┼───────────────────────────────────────────
Single-digit ms latency at any scale   │ Complex queries, JOINs, aggregations
Known, simple access patterns          │ Ad-hoc querying and reporting
Schemaless, evolving data model        │ Strong relational integrity (foreign keys)
Serverless, event-driven architectures │ Existing relational application
Massive scale with auto-scaling        │ ACID transactions across many entities
This post is licensed under CC BY 4.0 by the author.