Designing Instagram — Photo Sharing Platform
⚡ Difficulty: Intermediate 🏷️ Topics: CDN, Object Storage, Fan-out, Media Processing, Feed Ranking 🏢 Asked at: Meta, Pinterest, Snap, Google, Amazon
1. Understanding the Problem
Instagram is a photo and video sharing platform where users upload media, follow other users, and consume a personalized feed of content. The system must handle billions of photo uploads, deliver images globally with low latency via CDN, and generate personalized feeds for hundreds of millions of users. The key challenges are: efficiently processing and storing media at scale, generating feeds without overwhelming the system, and delivering images fast regardless of user location.
1.5. Naive First Cut
flowchart LR
Client["Mobile App"]:::client
API["API Server"]:::service
DB["Postgres DB"]:::data
Disk["Local Disk Storage"]:::data
Client --> API
API --> DB
API --> Disk
classDef client fill:#4c3a5e,stroke:#818cf8,color:#e2e8f0
classDef service fill:#1a3a2a,stroke:#4ade80,color:#e2e8f0
classDef data fill:#3b3520,stroke:#fbbf24,color:#e2e8f0
How this breaks:
- Local disk storage can’t serve images globally — users in Tokyo wait 2+ seconds for images stored in US-East
- Single API server becomes bottleneck during upload spikes (New Year’s Eve, live events)
- No image resizing — phones download 12MP originals on 3G connections
- Feed generation via `SELECT * FROM posts WHERE user_id IN (following) ORDER BY time` kills the DB at scale
- No caching layer: every feed request hits the database
- Celebrity posts (100M followers) create thundering herd on reads
The rest of the doc evolves this into a globally distributed media platform with CDN delivery and intelligent feed generation.
1.7. Prior Art We’re Drawing From
- Instagram Engineering (Cassandra for Feed Storage) — Moved from Redis to Cassandra for feed storage to handle 500M+ users. Uses a hybrid fan-out approach (write for normal users, read for celebrities). (Instagram Engineering blog)
- Facebook TAO (Social Graph Cache) — Distributed graph-aware cache serving billions of queries/sec for social relationships. Demonstrates that the social graph must be cached separately from content. (Facebook TAO paper)
- Flickr Architecture (Image Serving) — Pioneered the multi-tier image serving pattern: upload → process → store in object storage → serve via CDN. Proved that separating upload and serving paths is essential. (Flickr architecture talk)
- Pinterest Image Processing Pipeline — Async image processing with multiple resolution generation, perceptual hashing for deduplication, and progressive JPEG delivery. (Pinterest Engineering blog)
- Twitter Fan-out Service: Demonstrates the fan-out-on-write vs fan-out-on-read tradeoff at scale. Twitter’s hybrid approach handles celebrities differently from normal users.
2. Functional Requirements
Core (Top 3)
- Upload photos and videos — users can upload media with captions, apply filters, and tag locations
- View personalized feed — users see a ranked feed of posts from people they follow
- Follow and unfollow users — build a social graph that drives feed generation
Below the Line
- Stories (24-hour ephemeral content)
- Direct messages
- Comments and likes
- Explore/discovery page
- Reels (short-form video)
3. Non-Functional Requirements
Core
| NFR | Target |
|---|---|
| Feed Latency | Feed load < 500ms P95 globally |
| Upload Latency | Photo upload completes < 3 seconds (user sees confirmation) |
| Availability | 99.99% — users expect Instagram to always be up |
| Scale | 2B monthly active users, 100M+ photos uploaded daily |
Below the Line
- Image deduplication (nice-to-have, saves storage cost)
- Multi-region disaster recovery
- Content moderation pipeline
Technology Choices
| Tier | Purpose | Stores | Access Pattern | Primary | Alternatives |
|---|---|---|---|---|---|
| Object Storage | Original and resized images | Raw media files (JPEG, MP4) | Write once, read many via CDN | S3 | GCS, Azure Blob |
| CDN | Global image delivery | Cached image variants | High-QPS reads, edge-cached | CloudFront or Cloudflare | Fastly, Akamai |
| Feed Store | Pre-computed user feeds | Ordered post IDs per user | Write-heavy (fan-out), sequential reads | Cassandra | ScyllaDB, DynamoDB |
| Post Metadata DB | Post details, captions, tags | Structured post data | Read-heavy, indexed queries | Postgres (sharded by user) | CockroachDB, Vitess |
| Social Graph DB | Follow relationships | Follower and following edges | High-QPS lookups (who follows whom) | Redis Cluster (adjacency sets) | Neo4j, TAO-style cache |
| Event Bus | Async processing pipeline | Upload events, feed fan-out events | Fan-out writes, ordered per user | Kafka | Redpanda, Kinesis |
| Cache | Hot feed data, user profiles | Serialized feed pages, profile JSON | High-QPS reads, TTL-based | Redis Cluster | Memcached |
| Media Processing Queue | Image resize jobs | Processing tasks | FIFO per upload | SQS or Kafka | RabbitMQ |
Why Cassandra for the feed store, not Postgres? Feed reads are sequential (give me the next 20 posts) and writes are massive during fan-out (one post fans out to as many as 500K follower feeds before the celebrity cutoff applies). Cassandra’s write-optimized LSM-tree storage and partition-key access pattern (userId → sorted posts) fit this workload well. Postgres would struggle with the write volume, since every fan-out row also has to go through the WAL and update B-tree indexes.
Why Redis for the social graph, not the main DB?
“Does user A follow user B?” is called on every feed request, every like, every comment. At 2B users, this needs sub-millisecond latency. Redis set commands such as SISMEMBER are O(1) and typically answer in well under a millisecond.
4. Core Entities
- User — profile info, follower count, following count, settings
- Post — media URL, caption, location, timestamp, author
- Feed — ordered list of post IDs for a user’s home timeline
- Follow — directed edge from follower to followee
- Media — physical file metadata: S3 key, dimensions, format, sizes generated
- Like — user + post association with timestamp
5. API / System Interface
POST /api/v1/posts
Body: { mediaFile (multipart), caption, location?, tags[]? }
Response: { postId, mediaUrl, status: "PROCESSING", timestamp }
Auth: JWT Bearer token
Note: Returns immediately; media processing happens async
GET /api/v1/feed?cursor=<timestamp>&limit=20
Response: { posts: [{ postId, authorId, mediaUrls, caption, likes, timestamp }], nextCursor }
Note: Cursor-based pagination for infinite scroll
POST /api/v1/users/{userId}/follow
Response: { status: "FOLLOWING", timestamp }
DELETE /api/v1/users/{userId}/follow
Response: { status: "UNFOLLOWED" }
GET /api/v1/users/{userId}/profile
Response: { userId, username, bio, postCount, followerCount, followingCount, posts[] }
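A minimal sketch of how the cursor in `GET /api/v1/feed` could behave, assuming the cursor is the timestamp of the last post returned and the feed is already sorted newest first. `Post`, `feed_page`, and the in-memory list are hypothetical names used only for illustration; they are not part of the API above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Post:
    post_id: str
    timestamp: int  # epoch seconds; assumed unique within a feed for this sketch


def feed_page(posts: List[Post], cursor: Optional[int], limit: int = 20
              ) -> Tuple[List[Post], Optional[int]]:
    """Return one page of posts strictly older than `cursor`, newest first.

    `posts` must already be sorted by timestamp descending.
    The returned cursor is None when there are no further pages.
    """
    older = [p for p in posts if cursor is None or p.timestamp < cursor]
    page = older[:limit]
    has_more = len(older) > limit
    next_cursor = page[-1].timestamp if page and has_more else None
    return page, next_cursor


# Usage: walk a 5-post feed two posts at a time.
feed = [Post(f"p{t}", t) for t in (50, 40, 30, 20, 10)]
page1, c1 = feed_page(feed, None, limit=2)
page2, c2 = feed_page(feed, c1, limit=2)
page3, c3 = feed_page(feed, c2, limit=2)
```

If two posts can share a timestamp, a compound cursor such as (timestamp, postId) would be needed so that no post is skipped or repeated across pages.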
6. High-Level Design
FR1: Upload Photos and Videos
When a user takes a photo and hits “Share,” we need to store the image, process it into multiple sizes, and make it available globally. The key insight: don’t make the user wait for processing. Accept the upload, confirm immediately, process in the background.
New components:
- API Gateway: Handles auth, rate limiting, routes requests. For uploads, it streams the file through to the Upload Service instead of buffering it, which avoids memory pressure on the gateway. (Deep Dive 1 removes this hop with pre-signed S3 URLs.)
- Upload Service — Validates the upload, generates a unique media ID, writes metadata to DB, and triggers async processing.
- Object Storage (S3) — Stores the original image. Write-once, read-many. Durable (11 nines).
- Media Processing Workers — Consume from a queue, generate thumbnails (150px, 320px, 640px, 1080px), compress, strip EXIF data, and write variants back to S3.
- Processing Queue (Kafka or SQS) — Decouples upload from processing. If workers are busy, uploads still succeed.
flowchart LR
App["Mobile App"]:::client
GW["API Gateway"]:::edge
US["Upload Service"]:::service
S3["S3 Object Store"]:::data
Q["Processing Queue"]:::async
MW["Media Workers"]:::service
DB["Post Metadata DB"]:::data
App --> GW
GW --> US
US --> S3
US --> DB
US --> Q
Q --> MW
MW --> S3
classDef client fill:#4c3a5e,stroke:#818cf8,color:#e2e8f0
classDef edge fill:#1e3a5f,stroke:#60a5fa,color:#e2e8f0
classDef service fill:#1a3a2a,stroke:#4ade80,color:#e2e8f0
classDef async fill:#3b1f5e,stroke:#c084fc,color:#e2e8f0
classDef data fill:#3b3520,stroke:#fbbf24,color:#e2e8f0
| Color | Meaning |
|---|---|
| 🟠 Purple | Client |
| 🔵 Blue | Edge / Gateway |
| 🟢 Green | Service |
| 🟣 Purple | Async (Queue / Kafka) |
| 🟡 Yellow | Data store |
Step-by-step flow:
- User selects photo, adds caption, hits “Share” → app uploads file via multipart POST to Gateway
- Gateway authenticates user, checks file size limits (max 50MB), streams file to Upload Service
- Upload Service generates a unique `mediaId`, uploads original to S3 at path `originals/{userId}/{mediaId}.jpg`
- Upload Service writes post metadata to DB (postId, userId, caption, mediaId, status=PROCESSING)
- Upload Service publishes a `media.uploaded` event to Processing Queue with mediaId and S3 path
- User gets `201 Created` with postId: upload is confirmed, processing hasn’t started yet
- Media Worker picks up the job: downloads original from S3, generates 4 size variants, converts to WebP, uploads variants to S3 at `processed/{mediaId}/{size}.webp`
- Worker updates post status to `PUBLISHED` and triggers feed fan-out
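The Upload Service steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the service itself: the dict `db` and list `queue` stand in for the Post Metadata DB and the Processing Queue, the S3 write is only marked by a comment, and `{size}` in the processed path is assumed to be the variant width in pixels.

```python
import uuid
from typing import Dict, List

VARIANT_WIDTHS = (150, 320, 640, 1080)  # sizes named in the component list


def original_key(user_id: str, media_id: str) -> str:
    """S3 key for the untouched upload."""
    return f"originals/{user_id}/{media_id}.jpg"


def variant_keys(media_id: str) -> List[str]:
    """S3 keys the Media Worker is expected to write (size = width, an assumption)."""
    return [f"processed/{media_id}/{w}.webp" for w in VARIANT_WIDTHS]


def handle_upload(user_id: str, caption: str,
                  db: Dict[str, dict], queue: List[dict]) -> dict:
    """Record the post, enqueue processing, and return the 201 response body."""
    media_id = uuid.uuid4().hex
    post_id = uuid.uuid4().hex
    key = original_key(user_id, media_id)
    # The real service would upload the original bytes to S3 under `key` here.
    db[post_id] = {"userId": user_id, "caption": caption,
                   "mediaId": media_id, "status": "PROCESSING"}
    queue.append({"type": "media.uploaded", "mediaId": media_id, "s3Key": key})
    return {"postId": post_id, "status": "PROCESSING"}


# Usage with in-memory stand-ins.
db: Dict[str, dict] = {}
queue: List[dict] = []
resp = handle_upload("u1", "sunset", db, queue)
```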
FR2: View Personalized Feed
Feed is the core experience. When a user opens Instagram, they need to see recent posts from people they follow, ranked by relevance. The challenge: a user following 500 people needs their feed assembled from 500 sources.
Two approaches: fan-out-on-write (pre-compute everyone’s feed when a post is created) vs fan-out-on-read (assemble the feed on demand). We use a hybrid — borrowing from Twitter and Instagram’s actual approach.
New components:
- Feed Service — Serves feed requests. Reads from pre-computed feed store for normal users, merges in celebrity posts on-read.
- Fan-out Service — When a post is published, pushes the postId to all followers’ feeds (Cassandra). Skips celebrities (>500K followers).
- Feed Store (Cassandra) — Each user has a feed partition: sorted list of postIds. Feed Service reads top N.
- CDN — Serves actual images. Feed Service returns URLs; the app fetches images from CDN edge nodes.
- Redis Feed Cache — Caches the top 200 posts for active users. Avoids hitting Cassandra on every scroll.
flowchart LR
App["Mobile App"]:::client
GW["Gateway"]:::edge
FS["Feed Service"]:::service
FC["Redis Feed Cache"]:::data
CASS["Cassandra Feed Store"]:::data
CDN["CDN Edge"]:::edge
S3["S3 Images"]:::data
FO["Fan-out Service"]:::service
KF["Kafka"]:::async
App --> GW
GW --> FS
FS --> FC
FS --> CASS
App --> CDN
CDN --> S3
KF --> FO
FO --> CASS
classDef client fill:#4c3a5e,stroke:#818cf8,color:#e2e8f0
classDef edge fill:#1e3a5f,stroke:#60a5fa,color:#e2e8f0
classDef service fill:#1a3a2a,stroke:#4ade80,color:#e2e8f0
classDef async fill:#3b1f5e,stroke:#c084fc,color:#e2e8f0
classDef data fill:#3b3520,stroke:#fbbf24,color:#e2e8f0
Step-by-step flow:
- User opens app → `GET /feed?cursor=&limit=20` hits Feed Service
- Feed Service checks Redis cache for user’s feed. Cache hit → return immediately
- Cache miss → query Cassandra feed partition for user (SELECT postIds WHERE userId=X ORDER BY timestamp DESC LIMIT 20)
- Feed Service enriches postIds with metadata (author name, caption, like count) from Post Metadata DB
- For celebrities the user follows (pre-flagged in social graph), Feed Service fetches their recent posts on-the-fly and merges into the sorted feed
- Response includes CDN URLs for each image variant (thumbnail for preview, full-res for detail view)
- App renders feed; each image `<img src>` points to CDN edge → CDN serves from cache or fetches from S3 origin
Why hybrid fan-out?
Pure fan-out-on-write: when a celebrity with 100M followers posts, we’d write 100M rows to Cassandra. That’s 100M writes per post — expensive and slow. Instead, we skip fan-out for celebrities and merge their posts at read time. This is the “celebrity problem” fix.
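The read-time merge can be sketched with a k-way merge over lists that are each already sorted newest first. This is a minimal illustration, assuming feed entries are `(timestamp, postId)` tuples; `merge_feed` and the sample data are hypothetical.

```python
import heapq
from itertools import islice
from typing import Iterable, List, Tuple

FeedEntry = Tuple[int, str]  # (timestamp, postId)


def merge_feed(precomputed: List[FeedEntry],
               celebrity_feeds: Iterable[List[FeedEntry]],
               limit: int = 20) -> List[str]:
    """Merge newest-first lists into one newest-first page of postIds."""
    merged = heapq.merge(precomputed, *celebrity_feeds,
                         key=lambda entry: entry[0], reverse=True)
    return [post_id for _, post_id in islice(merged, limit)]


# Usage: one pre-computed feed plus two followed celebrities.
home = [(90, "a3"), (60, "a2"), (10, "a1")]
celebs = [[(80, "c2"), (20, "c1")], [(70, "d1")]]
page = merge_feed(home, celebs, limit=4)
```

Because `heapq.merge` is lazy, only as many entries as the page needs are pulled from each source.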
FR3: Follow and Unfollow Users
The social graph drives everything — feed generation, suggestions, notifications. When user A follows user B, we need to update the graph and backfill A’s feed with B’s recent posts.
New components:
- Social Graph Service — Manages follow/unfollow operations. Stores bidirectional edges (A follows B, B is followed by A).
- Graph Store (Redis Sets): `following:{userId}` = set of users they follow. `followers:{userId}` = set of their followers. O(1) membership check.
- Feed Backfill Worker: When A follows B, fetches B’s last 10 posts and inserts into A’s feed in Cassandra.
flowchart LR
App["App"]:::client
GW["Gateway"]:::edge
SGS["Social Graph Service"]:::service
RG["Redis Graph Store"]:::data
DB["Graph DB Backup"]:::data
KF["Kafka"]:::async
BF["Backfill Worker"]:::service
CASS["Feed Store"]:::data
App --> GW
GW --> SGS
SGS --> RG
SGS --> DB
SGS --> KF
KF --> BF
BF --> CASS
classDef client fill:#4c3a5e,stroke:#818cf8,color:#e2e8f0
classDef edge fill:#1e3a5f,stroke:#60a5fa,color:#e2e8f0
classDef service fill:#1a3a2a,stroke:#4ade80,color:#e2e8f0
classDef async fill:#3b1f5e,stroke:#c084fc,color:#e2e8f0
classDef data fill:#3b3520,stroke:#fbbf24,color:#e2e8f0
Step-by-step flow:
- User A taps “Follow” on User B’s profile → `POST /users/{B}/follow`
- Social Graph Service adds B to `following:A` set in Redis, adds A to `followers:B` set
- Service persists the edge to durable Graph DB (Postgres) as backup, since Redis is fast but volatile
- Service publishes `user.followed` event to Kafka
- Backfill Worker consumes event: fetches B’s last 10 posts, inserts postIds into A’s Cassandra feed partition
- A’s next feed refresh shows B’s recent posts mixed in chronologically
- On unfollow: remove from Redis sets, publish `user.unfollowed` event. A lazy cleanup job removes B’s posts from A’s feed (or they just age out naturally)
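The follow and unfollow steps can be sketched with an in-memory stand-in for the two Redis sets. `GraphStore` is a hypothetical class for illustration; the comments name the Redis commands each line corresponds to, and the `events` list stands in for the Kafka topic.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set


class GraphStore:
    """In-memory stand-in for the Redis sets following:{id} and followers:{id}."""

    def __init__(self) -> None:
        self.sets: DefaultDict[str, Set[str]] = defaultdict(set)
        self.events: List[dict] = []  # stand-in for the Kafka topic

    def follow(self, a: str, b: str) -> None:
        self.sets[f"following:{a}"].add(b)   # SADD following:A B
        self.sets[f"followers:{b}"].add(a)   # SADD followers:B A
        self.events.append({"type": "user.followed", "follower": a, "followee": b})

    def unfollow(self, a: str, b: str) -> None:
        self.sets[f"following:{a}"].discard(b)  # SREM following:A B
        self.sets[f"followers:{b}"].discard(a)  # SREM followers:B A
        self.events.append({"type": "user.unfollowed", "follower": a, "followee": b})

    def is_following(self, a: str, b: str) -> bool:
        # SISMEMBER following:A B
        return b in self.sets.get(f"following:{a}", set())


# Usage
graph = GraphStore()
graph.follow("A", "B")
a_follows_b = graph.is_following("A", "B")
b_follows_a = graph.is_following("B", "A")
graph.unfollow("A", "B")
```

The edges are directed, so following B does not make B a follower of A.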
6.5. Core Flows
Flow 1: Photo Upload End-to-End
sequenceDiagram
participant User
participant GW as API Gateway
participant US as Upload Service
participant S3 as Object Storage
participant DB as Post Metadata DB
participant Queue as Processing Queue
participant MW as Media Worker
participant FO as Fan-out Service
participant CASS as Feed Store
User->>GW: POST /posts (multipart image + caption)
GW->>GW: Auth + rate limit + file size check
GW->>US: Forward upload
US->>S3: Upload original image
S3-->>US: 200 OK (S3 key)
US->>DB: INSERT post (status=PROCESSING)
US-->>User: 201 Created (postId)
US->>Queue: Publish media.uploaded
Queue->>MW: Consume job
MW->>S3: Download original
MW->>MW: Resize to 4 variants + WebP convert
MW->>S3: Upload processed variants
MW->>DB: UPDATE post status=PUBLISHED
MW->>Queue: Publish post.published
Queue->>FO: Consume post.published
FO->>CASS: Write postId to all follower feeds
Non-obvious failure path: If Media Worker crashes mid-processing, the job stays on the queue (visibility timeout). After timeout, another worker picks it up. Idempotent processing (check if variants already exist in S3 before re-generating) prevents duplicates. Posts stuck in PROCESSING > 10 minutes are flagged by a reconciler and re-queued.
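The idempotency check described above can be sketched as follows. It is an illustration only: the dict `store` stands in for S3, the resize is a placeholder, and `process_media` is a hypothetical name.

```python
from typing import Dict, List

WIDTHS = (150, 320, 640, 1080)


def process_media(media_id: str, store: Dict[str, bytes]) -> List[str]:
    """Generate only the missing variants; return the keys written on this run."""
    written = []
    for width in WIDTHS:
        key = f"processed/{media_id}/{width}.webp"
        if key in store:  # variant survived an earlier, interrupted run
            continue
        store[key] = f"resized-{width}".encode()  # placeholder for the real resize
        written.append(key)
    return written


# Usage: a previous worker crashed after writing only the 150px variant.
store = {"processed/m1/150.webp": b"from-first-attempt"}
first_run = process_media("m1", store)
second_run = process_media("m1", store)  # redelivery of the same job
```

A redelivered job then costs four existence checks and no duplicate writes.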
Flow 2: Feed Load
sequenceDiagram
participant User
participant FS as Feed Service
participant Redis as Feed Cache
participant CASS as Cassandra
participant Meta as Post Metadata DB
participant CDN
User->>FS: GET /feed?cursor=X&limit=20
FS->>Redis: Check cache (feed:userId:page)
alt Cache hit
Redis-->>FS: Return cached postIds
else Cache miss
FS->>CASS: SELECT postIds WHERE userId=X LIMIT 20
CASS-->>FS: PostIds
FS->>Redis: Cache for 60s
end
FS->>Meta: Batch fetch post metadata
Meta-->>FS: Posts with CDN URLs
FS-->>User: Feed response with image URLs
User->>CDN: Fetch images (parallel)
CDN-->>User: Images from edge cache
Non-obvious failure path: If Cassandra is temporarily down, Feed Service falls back to assembling the feed on-the-fly by querying the social graph (who does this user follow?) and then fetching recent posts for each followed user from that user’s shard of the Post Metadata DB. Slower (2-3s) but keeps the app functional.
Post Lifecycle State Machine
stateDiagram-v2
[*] --> UPLOADING : User selects media
UPLOADING --> PROCESSING : Upload complete
PROCESSING --> PUBLISHED : Variants generated
PROCESSING --> FAILED : Worker error
FAILED --> PROCESSING : Retry
PUBLISHED --> ARCHIVED : User deletes
PUBLISHED --> FLAGGED : Moderation trigger
FLAGGED --> REMOVED : Violation confirmed
FLAGGED --> PUBLISHED : Appeal approved
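One way to enforce the lifecycle above in code is a transition table that rejects any move the diagram does not allow. This is a minimal sketch; `TRANSITIONS` and `advance` are hypothetical names, and ARCHIVED and REMOVED are treated as terminal because the diagram shows no edges leaving them.

```python
from typing import Dict, Set

TRANSITIONS: Dict[str, Set[str]] = {
    "UPLOADING": {"PROCESSING"},
    "PROCESSING": {"PUBLISHED", "FAILED"},
    "FAILED": {"PROCESSING"},          # retry
    "PUBLISHED": {"ARCHIVED", "FLAGGED"},
    "FLAGGED": {"REMOVED", "PUBLISHED"},
    "ARCHIVED": set(),                 # terminal in the diagram
    "REMOVED": set(),                  # terminal in the diagram
}


def advance(current: str, target: str) -> str:
    """Return the new state, or raise if the diagram has no such edge."""
    if target not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {target}")
    return target


# Usage: a failed job is retried and then published.
state = "PROCESSING"
for nxt in ("FAILED", "PROCESSING", "PUBLISHED"):
    state = advance(state, nxt)
```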
7. Deep Dives
Deep Dive 1: Media Upload and Processing Pipeline
Bad: Process images synchronously during upload — user waits 15+ seconds while server resizes, compresses, and uploads variants. Timeouts on slow connections cause lost uploads.
Good: Accept upload, store original, process asynchronously. Notify user when done. But: single worker processes all images serially — backlog grows during peak hours.
Great: Multi-stage pipeline with auto-scaling worker pools:
- Upload stage: Client uploads to a pre-signed S3 URL directly (bypasses API server entirely for large files). Upload Service just validates and records metadata.
- Processing stage: Worker pool auto-scales based on queue depth. Each worker: download → resize (150, 320, 640, 1080px) → convert to WebP → strip EXIF → upload variants → update DB.
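- Optimization: Generate progressive JPEGs so a coarse full-frame preview appears early and sharpens as more data arrives, instead of the image rendering top-to-bottom. Store a compact blurred placeholder (a BlurHash string, typically 20 to 30 characters) in the post metadata for instant feed skeleton rendering.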
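The resize step needs an output size for each target width. A minimal sketch of that planning, assuming variants preserve aspect ratio and are never upscaled (both assumptions, since the text only names the widths); `plan_variants` is a hypothetical helper.

```python
from typing import Dict, Tuple

TARGET_WIDTHS = (150, 320, 640, 1080)


def plan_variants(width: int, height: int) -> Dict[int, Tuple[int, int]]:
    """Map each target width to an output (w, h), keeping the aspect ratio."""
    plan = {}
    for target in TARGET_WIDTHS:
        w = min(target, width)                  # never upscale a small original
        h = max(1, round(height * w / width))   # keep aspect ratio, at least 1px
        plan[target] = (w, h)
    return plan


# Usage: a 4000x2000 landscape photo and a small 500x500 image.
large = plan_variants(4000, 2000)
small = plan_variants(500, 500)
```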
flowchart LR
Client["Client"]:::client
PS["Pre-signed URL"]:::edge
S3O["S3 Originals"]:::data
Q["SQS Queue"]:::async
W1["Worker Pool"]:::service
S3P["S3 Processed"]:::data
CDN["CDN"]:::edge
Client --> PS
PS --> S3O
S3O --> Q
Q --> W1
W1 --> S3P
S3P --> CDN
classDef client fill:#4c3a5e,stroke:#818cf8,color:#e2e8f0
classDef edge fill:#1e3a5f,stroke:#60a5fa,color:#e2e8f0
classDef service fill:#1a3a2a,stroke:#4ade80,color:#e2e8f0
classDef async fill:#3b1f5e,stroke:#c084fc,color:#e2e8f0
classDef data fill:#3b3520,stroke:#fbbf24,color:#e2e8f0
Cost consideration: Processing 100M images/day at 4 variants each = 400M resize operations. Workers built on libvips rather than ImageMagick (libvips is a CPU library that streams the image and uses far less memory) cut processing time from about 2s to 200ms per image. Auto-scaling down during off-peak saves 60% compute cost.
Deep Dive 2: Feed Generation — Fan-out on Write vs Read
Bad: Fan-out-on-read only — every feed request queries 500 users’ posts, sorts, ranks. At 100M DAU opening feeds simultaneously, this is billions of queries per minute.
Good: Fan-out-on-write — when user posts, push postId to all followers’ feeds (Cassandra write). Feed reads become a single partition scan. But: celebrities with 100M followers generate 100M writes per post.
Great: Hybrid approach (borrowing from Instagram and Twitter):
- Normal users (< 500K followers): Fan-out-on-write. When they post, Fan-out Service writes their postId to all followers’ feed partitions.
- Celebrity users (> 500K followers): Skip fan-out. Feed Service merges their recent posts at read time. Since users follow only ~5-10 celebrities, merging 10 extra queries is acceptable.
- Feed ranking: After assembling candidates, a lightweight ML ranker scores posts by: recency (decay function), engagement signals (likes from mutual friends), content type preference, and relationship strength.
How Fan-out Service handles scale:
- Kafka partitions fan-out events by postId → single consumer per post
- Consumer reads follower list from Redis (SMEMBERS followers:{userId})
- Batches writes to Cassandra (1000 rows per batch, async)
- For 10K followers, fan-out completes in < 2 seconds
- Rate limiter ensures no single post’s fan-out starves others
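The fan-out loop above can be sketched as follows, with the 500K celebrity cutoff and the 1000-row batch size taken from the text. The dict `feeds` stands in for the Cassandra feed partitions, and `fan_out` and `chunks` are hypothetical names; a real consumer would issue each batch as asynchronous writes.

```python
from typing import Dict, Iterator, List, Sequence

CELEBRITY_THRESHOLD = 500_000  # follower cutoff from the text
BATCH_SIZE = 1000              # rows per batch from the text


def chunks(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fan_out(post_id: str, followers: List[str], feeds: Dict[str, List[str]]) -> int:
    """Push post_id onto each follower's feed; return the number of batches written."""
    if len(followers) > CELEBRITY_THRESHOLD:
        return 0  # celebrity: merged at read time instead
    batches = 0
    for batch in chunks(followers, BATCH_SIZE):
        for follower in batch:  # one batch of feed writes in the real system
            feeds.setdefault(follower, []).insert(0, post_id)
        batches += 1
    return batches


# Usage: 2,500 followers need three batches.
feeds: Dict[str, List[str]] = {}
followers = [f"u{i}" for i in range(2500)]
batch_count = fan_out("p1", followers, feeds)
```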
Deep Dive 3: CDN and Image Optimization
Bad: Serve all images from origin S3 directly — high latency for distant users (300ms+ for cross-continent), massive egress costs, origin overwhelmed.
Good: Put CloudFront/Cloudflare in front of S3 — cache at edge nodes. But: cache misses on first access, no adaptive quality based on connection speed.
Great: Multi-layer CDN strategy with client-driven quality selection:
- Edge caching (CDN): Images cached at 200+ PoPs globally. TTL = 1 year (images are immutable — new upload = new URL). Cache hit ratio > 95% for popular content.
- Client-driven quality: App detects network speed and requests the appropriate variant: `cdn.instagram.com/media/{id}/w640.webp` vs `w1080.webp`. Saves bandwidth on slow connections.
- Progressive loading: Feed shows BlurHash placeholder instantly → low-res thumbnail loads in 50ms → full resolution lazy-loads as user scrolls.
- Regional origin shields: Secondary cache layer between CDN edge and S3 origin. Reduces origin requests by another 80%.
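Client-driven quality selection could look roughly like this. The bandwidth thresholds (0.5 and 2 Mbps) and the `https://` scheme are assumptions made for the sketch, and `pick_width` and `variant_url` are hypothetical names; only the URL shape and the variant widths come from the text.

```python
VARIANT_WIDTHS = (150, 320, 640, 1080)


def pick_width(downlink_mbps: float, screen_width_px: int) -> int:
    """Smallest variant that covers the screen, capped by a network budget."""
    if downlink_mbps < 0.5:      # assumed threshold for very slow links
        cap = 320
    elif downlink_mbps < 2.0:    # assumed threshold for moderate links
        cap = 640
    else:
        cap = 1080
    for width in VARIANT_WIDTHS:
        if width >= screen_width_px or width >= cap:
            return width
    return VARIANT_WIDTHS[-1]


def variant_url(media_id: str, width: int) -> str:
    """Build the CDN URL for one variant."""
    return f"https://cdn.instagram.com/media/{media_id}/w{width}.webp"


# Usage: a slow connection on a large screen still gets the 320px variant.
slow = pick_width(0.3, 1080)
fast = pick_width(10.0, 400)
url = variant_url("abc", fast)
```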
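Cost at scale: 2B users × ~50 images/session × ~200KB avg = 20PB of egress for one session per user, which is a conservative floor for a month. CDN with committed-use discount: ~$0.02/GB = $400K/month. Without CDN (direct from S3 at $0.09/GB) = $1.8M/month. Both figures scale linearly with sessions per user, so the CDN stays about 4.5x cheaper.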
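The egress arithmetic can be checked with a few lines. The one-session-per-user-per-month figure is an assumption: it is the value that reproduces the 20PB number, and real usage would multiply both bills by the actual session count.

```python
USERS = 2_000_000_000
IMAGES_PER_SESSION = 50
AVG_IMAGE_BYTES = 200_000          # ~200KB
SESSIONS_PER_USER_PER_MONTH = 1    # assumption that reproduces the 20PB figure

egress_gb = (USERS * IMAGES_PER_SESSION * AVG_IMAGE_BYTES
             * SESSIONS_PER_USER_PER_MONTH) / 1e9   # decimal GB
cdn_cost = egress_gb * 0.02        # $/month at the committed-use CDN rate
s3_cost = egress_gb * 0.09         # $/month serving straight from S3
ratio = s3_cost / cdn_cost
```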
Deep Dive 4: Celebrity / Hot User Problem
Problem: When a celebrity (100M followers) posts, naive fan-out means 100M Cassandra writes. At 10 celebrity posts/hour, that’s 1B writes/hour just for fan-out — unsustainable.
Bad: Treat celebrities the same as everyone — fan-out to all followers. System collapses under write load.
Good: Skip fan-out entirely for celebrities. Merge their posts at read time. But: feed load latency increases because we now query celebrity posts on every feed request.
Great: Tiered hybrid with intelligent caching:
- Classify users: follower_count > 500K = “celebrity.” Flag in Redis graph store.
- Skip fan-out for celebrities: Their posts go to a special “celebrity posts” store (sharded by celebrityId, sorted by time).
- Feed assembly at read time: Feed Service fetches: (a) user’s pre-computed feed from Cassandra, (b) recent posts from celebrities they follow (max 10 celebrities × 5 posts = 50 posts to merge).
- Cache celebrity feeds aggressively: Redis caches each celebrity’s last 50 posts. Updated on new post. All followers read from same cache — millions of cache hits, one write.
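- Invalidate on post: When a celebrity posts, invalidate their Redis cache entry. The first reader triggers the cache fill and subsequent readers hit the cache. Guard the fill with a per-key lock (single-flight) so a burst of concurrent readers does not stampede the celebrity store.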
Net effect: A celebrity post costs 1 write to the celebrity store + 1 cache invalidation, versus 100M writes with naive fan-out. Read overhead: +5ms per celebrity merge (parallel Redis fetches).
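The cache behaviour described above (fill on first read, serve every later reader from cache, drop the entry when the celebrity posts) can be sketched as a read-through cache. `CelebrityPostCache` is a hypothetical in-memory stand-in for Redis, and `load_recent` stands in for a query against the celebrity posts store; the sketch is single-threaded and omits the locking a real fill would need.

```python
from typing import Callable, Dict, List


class CelebrityPostCache:
    """Read-through cache of each celebrity's most recent postIds."""

    def __init__(self, load_recent: Callable[[str], List[str]]) -> None:
        self._load_recent = load_recent          # reads the celebrity posts store
        self._cache: Dict[str, List[str]] = {}
        self.store_reads = 0                     # counts cache fills

    def recent(self, celebrity_id: str) -> List[str]:
        if celebrity_id not in self._cache:      # miss: first reader fills
            self.store_reads += 1
            self._cache[celebrity_id] = self._load_recent(celebrity_id)[:50]
        return self._cache[celebrity_id]

    def invalidate(self, celebrity_id: str) -> None:
        self._cache.pop(celebrity_id, None)      # called when the celebrity posts


# Usage: 1,000 follower reads cost a single store read.
posts_store = {"celeb": ["p2", "p1"]}
cache = CelebrityPostCache(lambda cid: list(posts_store[cid]))
for _ in range(1000):
    cache.recent("celeb")
reads_before_post = cache.store_reads
posts_store["celeb"].insert(0, "p3")             # the celebrity posts again
cache.invalidate("celeb")
latest = cache.recent("celeb")[0]
```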
Deep Dive 5: Feed Ranking and Relevance
Problem: Chronological feed shows everything in time order. But users follow 500 people and check the app 5x/day — they miss 80% of content. Need to surface the most relevant posts.
Bad: Pure chronological — users miss important posts from close friends buried under high-frequency posters.
Good: Simple scoring: score = recency_weight * time_decay + engagement_weight * (likes + comments). Better than chronological but doesn’t personalize.
Great: Lightweight ML ranker with candidate generation + ranking stages:
- Candidate generation: Pull 500 candidate posts (pre-computed feed + celebrity merge)
- Feature extraction: For each candidate, compute: time since posted, author-viewer relationship strength (interaction frequency), post engagement velocity (likes/min in first hour), content type match (does viewer prefer photos or videos?)
- Scoring: Simple logistic regression or small neural net predicts P(engagement). Trained offline on historical engagement data. Inference < 10ms for 500 candidates.
- Diversity injection: After ranking, ensure no more than 3 consecutive posts from same author. Mix in “discovery” posts (from friends-of-friends) at 10% ratio.
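The "no more than 3 consecutive posts from the same author" rule can be sketched as a greedy re-rank over the scored list. This is one possible approach rather than the ranker's actual method; `enforce_diversity` is a hypothetical name, and the 10% discovery mix is left out.

```python
from typing import List, Tuple

RankedPost = Tuple[str, str]  # (postId, authorId), best score first


def enforce_diversity(ranked: List[RankedPost], max_run: int = 3) -> List[RankedPost]:
    """Keep score order where possible, but cap consecutive posts per author."""
    pending = list(ranked)
    result: List[RankedPost] = []
    while pending:
        tail = result[-max_run:]
        run_is_full = len(tail) == max_run and len({a for _, a in tail}) == 1
        blocked = tail[0][1] if run_is_full else None
        # Take the best pending post from a non-blocked author; if every
        # remaining post is from the blocked author, fall back to the top one.
        idx = next((i for i, (_, a) in enumerate(pending) if a != blocked), 0)
        result.append(pending.pop(idx))
    return result


# Usage: four posts by author A in a row get broken up by B's post.
ranked = [("p1", "A"), ("p2", "A"), ("p3", "A"), ("p4", "A"), ("p5", "B"), ("p6", "A")]
diverse = enforce_diversity(ranked)
```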
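Why not a huge ML model? Feed ranking runs on every feed load for 500M daily users. At about 5 loads per user per day that is roughly 2.5B feed loads/day (about 29K per second on average), so a 50ms inference would keep roughly 1,400 model replicas busy around the clock. Keep the model small enough to score all 500 candidates in under 10ms on CPU. Heavy ML is for offline training, not online serving.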
Deep Dive 6: Storage and Data Lifecycle
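Problem: 100M photos/day × 4 variants × average 500KB = 200TB new storage per day. At $0.023/GB per month, each day of uploads adds about $4,600 to the monthly bill, so S3 Standard alone reaches $4.6M/month once roughly 200PB (about 1,000 days of uploads) has accumulated.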
Bad: Keep everything in S3 Standard forever — cost grows linearly, unbounded.
Good: Lifecycle policies: move to S3 Infrequent Access after 30 days, Glacier after 1 year.
Great: Intelligent tiering based on access patterns:
- Hot tier (S3 Standard): Posts < 7 days old. 80% of all accesses hit content from the last week.
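- Warm tier (S3 Standard-IA): Posts 7-90 days old. Occasionally accessed via profile views and search. S3 lifecycle rules only transition objects to Standard-IA after 30 days in Standard, so a 7-day cutoff would need an application-level copy rather than a lifecycle rule.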
- Cold tier (S3 Glacier Instant Retrieval): Posts > 90 days. Rare access but must still serve in < 100ms when profile is scrolled.
- Delete originals: After processed variants are confirmed, delete the original full-res upload (keep only the 1080px max). Saves 40% storage.
- Deduplication: Perceptual hash (pHash) on upload. If near-duplicate exists, store a reference instead of new file. Catches reposts and memes — saves ~15% storage.
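The age-based tiering above reduces to a small lookup. A minimal sketch using the cutoffs from the list (7 and 90 days) as defaults; `storage_tier` is a hypothetical helper, and the cutoffs are parameters so they can be moved if the storage provider constrains them.

```python
def storage_tier(age_days: float, warm_after: int = 7, cold_after: int = 90) -> str:
    """Return "hot", "warm", or "cold" for a post of the given age in days."""
    if age_days < warm_after:
        return "hot"    # S3 Standard
    if age_days <= cold_after:
        return "warm"   # infrequent-access class
    return "cold"       # Glacier Instant Retrieval


# Usage
tiers = [storage_tier(d) for d in (3, 7, 90, 91)]
```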
Cost after optimization: 200TB/day → 120TB/day (after dedup + original deletion). Tiered storage reduces effective cost from $0.023/GB to ~$0.008/GB average. For the same period of accumulated uploads, the monthly storage cost drops from $4.6M to $960K (0.6 × 0.008 / 0.023 ≈ 21% of the original bill).
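The storage figures can be reproduced with a short calculation. It assumes the $4.6M/month figure describes the bill after about 1,000 days of accumulated uploads, which is the reading under which the numbers are consistent; all variable names are illustrative.

```python
PHOTOS_PER_DAY = 100_000_000
VARIANTS = 4
AVG_VARIANT_BYTES = 500_000        # ~500KB
STANDARD_PRICE = 0.023             # $ per GB per month
TIERED_PRICE = 0.008               # $ per GB per month, blended

daily_gb = PHOTOS_PER_DAY * VARIANTS * AVG_VARIANT_BYTES / 1e9     # 200TB in decimal GB
monthly_cost_per_upload_day = daily_gb * STANDARD_PRICE            # added to the bill per day of uploads
days_to_reach_baseline = 4_600_000 / monthly_cost_per_upload_day   # days of accumulation behind $4.6M

volume_ratio = 120 / 200                                           # after dedup + original deletion
price_ratio = TIERED_PRICE / STANDARD_PRICE
optimized_bill = 4_600_000 * volume_ratio * price_ratio
```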
7.5. Design Self-Audit
| Question | Answer |
|---|---|
| Dedicated search index? | Not needed for core feed. Explore/discovery (below the line) would use Elasticsearch for hashtag and location search. |
| Stale reads after writes? | User who just posted sees their own post immediately (read-your-writes via write-DB check). Followers see it within 2-5s (fan-out delay). |
| Single points of failure? | Cassandra is multi-node with RF=3. S3 is 11-nines durable. Redis is clustered. Feed Service is stateless, horizontally scaled. |
| Dead-letter / reconciliation? | Failed media processing jobs → DLQ with 3 retries. Reconciler scans PROCESSING posts > 10min. |
| Data freshness across caches? | Feed cache TTL 60s + event-driven invalidation on new post. CDN images are immutable (cache forever). |
| Cost at scale? | S3 tiering + CDN = biggest cost drivers. Covered in Deep Dive 6. Fan-out Cassandra writes are the hot write tier — managed via celebrity exemption. |
8. Final Architecture
flowchart LR
subgraph Clients
MOB["Mobile App"]:::client
WEB["Web App"]:::client
end
subgraph Edge
LB["Load Balancer"]:::edge
GW["API Gateway"]:::edge
CDN["CDN Edge Nodes"]:::edge
end
subgraph Services
US["Upload Service"]:::service
FS["Feed Service"]:::service
SGS["Social Graph Service"]:::service
FO["Fan-out Service"]:::service
MW["Media Workers"]:::service
RANK["Feed Ranker"]:::service
end
subgraph Async
KF["Kafka"]:::async
PQ["Processing Queue"]:::async
end
subgraph Data
S3["S3 Object Store"]:::data
CASS["Cassandra Feed Store"]:::data
PG["Postgres Post Metadata"]:::data
RD["Redis Cluster"]:::data
end
MOB --> LB
WEB --> LB
LB --> GW
MOB --> CDN
WEB --> CDN
CDN --> S3
GW --> US
GW --> FS
GW --> SGS
US --> S3
US --> PG
US --> PQ
PQ --> MW
MW --> S3
MW --> KF
KF --> FO
FO --> CASS
FS --> RD
FS --> CASS
FS --> RANK
SGS --> RD
SGS --> PG
classDef client fill:#4c3a5e,stroke:#818cf8,color:#e2e8f0
classDef edge fill:#1e3a5f,stroke:#60a5fa,color:#e2e8f0
classDef service fill:#1a3a2a,stroke:#4ade80,color:#e2e8f0
classDef async fill:#3b1f5e,stroke:#c084fc,color:#e2e8f0
classDef data fill:#3b3520,stroke:#fbbf24,color:#e2e8f0
Want a deep dive on Stories (ephemeral content with TTL), Explore page (recommendation engine), or Direct Messages? Drop a comment below 👇
🎯 Key Takeaways
- Hybrid fan-out: write for normal users, read for celebrities (>500K followers)
- CDN + Object Storage for global image delivery — 95%+ cache hit ratio
- Async media pipeline: user doesn’t wait for image processing
- BlurHash placeholders for instant feed skeleton rendering
Related Designs
- Twitter Feed — fan-out patterns and timeline caching
- Notification System — push delivery for likes and follows
- Chat System — real-time messaging infrastructure