Home › Design Netflix — Video Streaming System Design Interview

Designing Netflix — Video Streaming Platform

⚡ Difficulty: Advanced 🏷️ Topics: Video Encoding Pipeline, CDN, Adaptive Bitrate Streaming, Recommendation Engine, Content Catalog 🏢 Asked at: Netflix, YouTube, Disney+, Amazon Prime Video, Hotstar

1. Understanding the Problem

Netflix is a video streaming platform that lets content creators upload video, encodes it into dozens of format/resolution/bitrate combinations, distributes it globally via CDN, and streams it to users with adaptive quality based on their connection speed. The system serves 200M+ concurrent users, each streaming a unique video at a unique bitrate. The hardest parts: encoding pipeline (converting one upload into 100+ playable variants), CDN distribution (getting the right bits to the right edge node before the user needs them), and recommendations (deciding what to show each user from a catalog of 15,000+ titles).

1.5. Naive First Cut

flowchart LR
    Client["Smart TV App"]:::client
    API["API Server"]:::service
    DB["Postgres DB"]:::data
    Files["File Server"]:::data

    Client --> API
    API --> DB
    API --> Files

    classDef client fill:#4c3a5e,stroke:#818cf8,color:#e2e8f0
    classDef service fill:#1a3a2a,stroke:#4ade80,color:#e2e8f0
    classDef data fill:#3b3520,stroke:#fbbf24,color:#e2e8f0

How this breaks:

Single file server can’t serve 200M concurrent streams — bandwidth alone would be 200 Tbps
No encoding pipeline — raw 4K master files are too large (50GB per hour of content) to stream directly
No adaptive quality — users on 3G get buffer wheels, users on fiber get potato quality
Centralized serving means 500ms+ latency for users far from origin (India streaming from US-West)
No personalization — everyone sees the same homepage, engagement drops
Single DB can’t handle 200M concurrent session lookups + play position tracking

The rest of the doc evolves this into a globally distributed streaming platform with intelligent encoding, edge delivery, and personalized recommendations.

1.7. Prior Art We’re Drawing From

Netflix Open Connect — Netflix’s own CDN: custom hardware appliances (OCAs) deployed inside ISP networks. Serves 95%+ of traffic from within the ISP, eliminating internet transit costs. Pre-positions content overnight based on popularity predictions. (Netflix Open Connect overview)
YouTube Vitess (Video Metadata at Scale) — Sharded MySQL via Vitess for video metadata, serving billions of queries/day. Demonstrates that video catalogs need a horizontally scalable metadata tier. (Vitess.io)
Netflix Zuul (API Gateway) — Edge gateway handling auth, routing, canary deployments, and load shedding for 200M+ users. Shows why a smart edge layer is critical for streaming. (Netflix Zuul blog)
Netflix Cosmos (Encoding Pipeline) — Microservice-based media processing platform replacing the monolithic encoder. Uses per-title encoding to optimize bitrate per scene complexity. (Netflix Tech Blog — Cosmos)
Netflix Recommendations (Two-Stage Ranking) — Candidate generation via collaborative filtering, then re-ranking via deep learning. Row-based homepage layout driven by ML. (Netflix Tech Blog — Recommendations)

2. Functional Requirements

Core (Top 3)

Upload and encode video content — content team uploads master files; system encodes into multiple resolutions, bitrates, and codecs
Stream video with adaptive bitrate — users watch content with quality that adjusts to their bandwidth in real-time
Personalized recommendations — homepage shows titles ranked by predicted relevance for each user

Below the Line

User profiles and parental controls
Offline downloads
Subtitles and multi-language audio
Watch party (synchronized playback)
Content licensing and regional availability

3. Non-Functional Requirements

Core

NFR	Target
Start Playback	Time to first frame < 2 seconds globally
Buffer-free	99% of streams play without rebuffering
Scale	200M+ concurrent streams during peak (Sunday evening)
Availability	99.99% — downtime during prime time is front-page news

Below the Line

Encoding pipeline completes within 4 hours of upload (not user-facing latency)
Recommendation freshness: incorporate new viewing signals within 1 hour
Multi-region disaster recovery for control plane

Technology Choices

Tier	Purpose	Stores	Access Pattern	Primary	Alternatives
Object Storage	Master files and encoded segments	Raw uploads + HLS/DASH segments	Write once, read via CDN	S3	GCS, Azure Blob
CDN	Video segment delivery	Encoded segments cached at edge	Ultra-high throughput reads	Open Connect (custom)	CloudFront, Akamai, Fastly
Content Catalog DB	Title metadata, episodes, licensing	Structured catalog data	Read-heavy, complex queries	Postgres (Vitess-sharded)	CockroachDB, Spanner
User Activity Store	Watch history, progress, ratings	Time-series user events	High-write (play events), read for recs	Cassandra	DynamoDB, ScyllaDB
Encoding Queue	Encoding job orchestration	Job state, dependencies, retries	Workflow orchestration	Temporal or Cadence	Step Functions, Airflow
Recommendation Cache	Pre-computed user recommendations	Ranked title lists per user	High-QPS reads on homepage load	Redis Cluster	Memcached, DynamoDB DAX
Event Bus	User events, encoding events	Streaming events	High-throughput append, multiple consumers	Kafka	Redpanda, Kinesis
Search Index	Title search, genre browse	Catalog text + facets	Full-text + filtered queries	Elasticsearch	OpenSearch, Meilisearch
Session Store	Playback state, DRM tokens	Ephemeral session data	High-QPS read/write	Redis	Memcached

Why a custom CDN (Open Connect), not CloudFront? At Netflix’s scale (15% of global internet traffic during peak), paying per-GB to a CDN provider is prohibitively expensive. Custom hardware appliances (Open Connect Appliances — OCAs) placed inside ISP networks cost $0.001/GB vs $0.02+/GB for commercial CDNs. Saves $1B+/year.

Why Temporal for encoding, not a simple queue? Video encoding is a multi-step workflow: split → encode each resolution → validate → package into HLS/DASH → DRM encrypt → publish. Steps have dependencies, retries, and can take hours. Temporal handles long-running workflows with checkpointing, retry policies, and visibility — far better than chaining SQS queues.

4. Core Entities

Title — movie or series metadata: name, genre, cast, rating, licensing regions
Video Asset — physical encoding: resolution, bitrate, codec, segment manifest
Encoding Job — workflow state for converting a master into playable assets
User Profile — preferences, watch history, ratings, maturity settings
Playback Session — active stream: title, position, quality level, device info
Recommendation — pre-computed ranked list of titles for a user

5. API / System Interface

POST /api/v1/content/upload
  Body: { titleId, masterFileUrl (S3 pre-signed), metadata }
  Response: { encodingJobId, status: "QUEUED", estimatedCompletion }
  Auth: Internal service token (content team only)

GET /api/v1/browse/home?profileId=<id>
  Response: { rows: [{ rowTitle, titles: [{ titleId, thumbnail, matchScore }] }] }
  Note: Personalized homepage with ranked rows

POST /api/v1/playback/start
  Body: { titleId, profileId, deviceType, preferredQuality }
  Response: { sessionId, manifestUrl (HLS/DASH), licenseUrl (DRM), resumePosition }
  Note: Returns manifest URL pointing to CDN

POST /api/v1/playback/heartbeat
  Body: { sessionId, currentPosition, currentBitrate, bufferHealth }
  Response: { continue: true }
  Note: Sent every 10s to track progress and detect abandonment

GET /api/v1/search?q=<query>&filters=genre,year
  Response: { results: [{ titleId, title, matchType, thumbnail }] }

6. High-Level Design

FR1: Upload and Encode Video Content

When a studio delivers a new movie to Netflix, it arrives as a massive master file (ProRes 4K, 50-100GB). We need to convert it into playable formats: multiple resolutions (240p to 4K), multiple bitrates per resolution, multiple codecs (H.264, H.265, VP9, AV1), and package them into streamable segments (HLS for Apple, DASH for everything else). This is a multi-hour pipeline.

New components:

Ingest Service — Receives upload notification, validates master file, creates encoding job. Think of it as the “front desk” for new content.
Encoding Orchestrator (Temporal) — Manages the multi-step encoding workflow. 💡 Temporal is a workflow engine that breaks complex jobs into steps, handles retries, and checkpoints progress. If a step fails after 2 hours of encoding, it retries just that step, not the whole job.
Encoding Workers — GPU-powered machines that do the actual transcoding. Each worker handles one resolution/bitrate variant.
Object Storage (S3) — Stores master files and all encoded segments. Organized as titles/{titleId}/assets/{resolution}_{bitrate}/{segment_000.ts}
Content Catalog DB — Stores metadata about what’s available: which encodings exist, which regions can play it, licensing windows.

flowchart LR
    Studio["Content Studio"]:::external
    IS["Ingest Service"]:::service
    S3M["S3 Masters"]:::data
    ORCH["Temporal Orchestrator"]:::async
    EW["Encoding Workers GPU"]:::service
    S3E["S3 Encoded Segments"]:::data
    CAT["Catalog DB"]:::data

    Studio --> IS
    IS --> S3M
    IS --> ORCH
    ORCH --> EW
    EW --> S3E
    EW --> ORCH
    ORCH --> CAT

    classDef client fill:#4c3a5e,stroke:#818cf8,color:#e2e8f0
    classDef edge fill:#1e3a5f,stroke:#60a5fa,color:#e2e8f0
    classDef service fill:#1a3a2a,stroke:#4ade80,color:#e2e8f0
    classDef async fill:#3b1f5e,stroke:#c084fc,color:#e2e8f0
    classDef data fill:#3b3520,stroke:#fbbf24,color:#e2e8f0
    classDef external fill:#4a1942,stroke:#f472b6,color:#e2e8f0

Color	Meaning
🟠 Purple	Client
🔵 Blue	Edge / Gateway
🟢 Green	Service
🟣 Purple	Async (Workflow / Queue)
🟡 Yellow	Data store
🔴 Pink	External

Step-by-step flow:

Studio uploads master file to a designated S3 bucket (pre-signed URL, resumable upload for 100GB files)
S3 event notification triggers Ingest Service. It validates: correct codec? correct resolution? audio tracks present?
Ingest Service creates an encoding job in Temporal with parameters: target resolutions (240p, 360p, 480p, 720p, 1080p, 4K), codecs (H.264, H.265, AV1), bitrate ladder
Temporal orchestrator splits the job into parallel tasks: one per resolution×codec combination (~20-30 tasks per title)
Each Encoding Worker picks a task: downloads the master (or a pre-split chunk), transcodes to target format using FFmpeg/x265/SVT-AV1, outputs 4-second segments
Workers upload segments to S3 Encoded bucket, report completion to Temporal
After all variants complete, Temporal triggers packaging: generate HLS manifests (.m3u8) and DASH manifests (.mpd) that list all available quality levels
Final step: update Catalog DB marking title as “playable” + push to CDN pre-positioning queue

FR2: Stream Video with Adaptive Bitrate

When a user hits “Play,” we need to start delivering video segments immediately and adapt quality to their bandwidth in real-time. If they’re on WiFi, stream 4K. If they switch to cellular, drop to 720p without buffering.

The key protocol: HLS (HTTP Live Streaming) or DASH (Dynamic Adaptive Streaming over HTTP). 💡 Both work the same way: the video is split into small segments (2-4 seconds each). A manifest file lists all available quality levels. The player fetches segments one at a time, choosing quality level based on current bandwidth.

New components:

Playback Service — Handles play requests. Resolves DRM license, generates manifest URL, records session start.
CDN Edge (Open Connect) — Serves video segments from ISP-local caches. The user’s player fetches segments directly from the nearest OCA.
ABR Client (in-player) — The adaptive bitrate algorithm running on the user’s device. It decides which quality to request next based on buffer level and throughput estimates.
Session Store (Redis) — Tracks active playback sessions: position, quality, buffer health. Used for resume and analytics.

flowchart LR
    TV["Smart TV"]:::client
    GW["API Gateway Zuul"]:::edge
    PS["Playback Service"]:::service
    DRM["DRM License Server"]:::external
    CDN["CDN Open Connect"]:::edge
    S3["S3 Segments"]:::data
    SS["Session Store Redis"]:::data

    TV --> GW
    GW --> PS
    PS --> DRM
    PS --> SS
    TV --> CDN
    CDN --> S3

    classDef client fill:#4c3a5e,stroke:#818cf8,color:#e2e8f0
    classDef edge fill:#1e3a5f,stroke:#60a5fa,color:#e2e8f0
    classDef service fill:#1a3a2a,stroke:#4ade80,color:#e2e8f0
    classDef async fill:#3b1f5e,stroke:#c084fc,color:#e2e8f0
    classDef data fill:#3b3520,stroke:#fbbf24,color:#e2e8f0
    classDef external fill:#4a1942,stroke:#f472b6,color:#e2e8f0

Step-by-step flow:

User hits “Play” on a title → app sends POST /playback/start to Playback Service
Playback Service checks: Is this title available in user’s region? Does user have an active subscription? What was their last watch position (resume)?
Service requests a DRM license from the License Server (Widevine for Android/Chrome, FairPlay for Apple, PlayReady for Windows)
Service returns: manifest URL (pointing to CDN), DRM license, resume position
Player downloads the manifest from CDN — it lists all quality levels (e.g., 240p@300kbps, 480p@1.5Mbps, 1080p@5Mbps, 4K@15Mbps)
ABR algorithm starts conservative: first segment at medium quality (480p). Measures download speed.
If segment downloaded faster than real-time → next segment at higher quality. If slower → drop quality.
Player fetches segments sequentially from CDN edge. CDN cache hit → instant. Cache miss → pull from S3 origin, cache locally.
Every 10 seconds, player sends heartbeat with current position + bitrate → Session Store updates.

FR3: Personalized Recommendations

When users open Netflix, they see a homepage with rows of titles (“Because you watched X,” “Trending Now,” “Top 10 in India”). Each user’s homepage is different. Recommendation quality directly drives engagement and retention — Netflix estimates their rec system is worth $1B/year in reduced churn.

New components:

Recommendation Service — Serves pre-computed recommendations for a user profile. Returns ranked rows of titles.
ML Recommendation Pipeline (offline) — Batch job that runs every few hours: takes all user viewing history, computes collaborative filtering + content-based signals, generates ranked lists per user.
Event Collector (Kafka) — Ingests real-time user events (play, pause, skip, rate, browse). Feeds both the offline pipeline and real-time signals.
Recommendation Cache (Redis) — Stores pre-computed homepage for each active user. Homepage load = one Redis GET.
Content Catalog + Search (Elasticsearch) — Powers the search bar and genre browsing with full-text + faceted search.

flowchart LR
    TV["Client App"]:::client
    GW["Gateway"]:::edge
    RS["Recommendation Service"]:::service
    RC["Recommendation Cache"]:::data
    KF["Kafka Events"]:::async
    ML["ML Pipeline Spark"]:::service
    ES["Elasticsearch"]:::data
    CAT["Catalog DB"]:::data

    TV --> GW
    GW --> RS
    RS --> RC
    RS --> CAT
    TV --> GW
    GW --> ES
    KF --> ML
    ML --> RC

    classDef client fill:#4c3a5e,stroke:#818cf8,color:#e2e8f0
    classDef edge fill:#1e3a5f,stroke:#60a5fa,color:#e2e8f0
    classDef service fill:#1a3a2a,stroke:#4ade80,color:#e2e8f0
    classDef async fill:#3b1f5e,stroke:#c084fc,color:#e2e8f0
    classDef data fill:#3b3520,stroke:#fbbf24,color:#e2e8f0

Step-by-step flow:

User opens Netflix app → GET /browse/home?profileId=X hits Recommendation Service
Service checks Redis cache for pre-computed homepage (key: recs:{profileId})
Cache hit (99% of cases) → return immediately. Homepage loads in < 200ms.
Cache miss (new user or cache expired) → fall back to popularity-based defaults while async job computes personalized recs
Meanwhile, every user action (play, pause at 5 min, finish, skip after 30s, add to list) → published to Kafka as events
ML Pipeline (runs on Spark every 4 hours): consumes all events, updates user embeddings (collaborative filtering), combines with content features (genre, cast, director), generates ranked homepage per user
Pipeline writes results to Redis cache. Next homepage load picks up fresh recommendations.
Search queries hit Elasticsearch: full-text on title/cast/director + filters (genre, year, rating)

6.5. Core Flows

Flow 1: Video Playback Start-to-Stream

sequenceDiagram
    participant User
    participant App as Player App
    participant GW as API Gateway
    participant PS as Playback Service
    participant DRM as DRM Server
    participant CDN as CDN Edge
    participant S3

    User->>App: Tap "Play"
    App->>GW: POST /playback/start
    GW->>PS: Route request
    PS->>PS: Check subscription + region + resume position
    PS->>DRM: Request license (Widevine or FairPlay)
    DRM-->>PS: License token
    PS-->>App: manifestUrl + license + resumeAt
    App->>CDN: GET manifest.m3u8
    CDN-->>App: Manifest (all quality levels)
    App->>App: ABR selects initial quality (480p)
    App->>CDN: GET segment_001_480p.ts
    alt CDN cache hit
        CDN-->>App: Segment (from edge cache)
    else CDN cache miss
        CDN->>S3: Fetch segment
        S3-->>CDN: Segment
        CDN-->>App: Segment (cached for next user)
    end
    App->>App: Decode + render first frame
    Note over User,App: Time to first frame target < 2s
    loop Every segment
        App->>App: Measure throughput
        App->>CDN: GET next segment at adapted quality
    end

Non-obvious failure path: If CDN edge is overloaded (flash crowd for a new release), the player’s ABR algorithm detects slow segment downloads and drops quality aggressively. If even the lowest quality buffers, the player switches to a different CDN PoP (DNS-based failover). Playback Service also maintains a “steering” endpoint that can redirect players away from saturated edges.

Flow 2: Encoding Pipeline

sequenceDiagram
    participant Studio
    participant IS as Ingest Service
    participant S3
    participant Temporal
    participant Worker as Encoding Worker
    participant CAT as Catalog DB
    participant CDN as CDN Pre-position

    Studio->>S3: Upload master file
    S3->>IS: S3 event notification
    IS->>IS: Validate format + metadata
    IS->>Temporal: Create encoding workflow
    Temporal->>Temporal: Split into parallel tasks
    loop For each resolution x codec
        Temporal->>Worker: Assign encoding task
        Worker->>S3: Download master chunk
        Worker->>Worker: Transcode (FFmpeg or SVT-AV1)
        Worker->>S3: Upload encoded segments
        Worker->>Temporal: Report completion
    end
    Temporal->>Temporal: All variants complete
    Temporal->>Temporal: Generate HLS and DASH manifests
    Temporal->>S3: Upload manifests
    Temporal->>CAT: Mark title as PLAYABLE
    Temporal->>CDN: Trigger pre-positioning for predicted-popular regions

Non-obvious failure path: If an encoding worker crashes mid-transcode (GPU OOM, spot instance reclaimed), Temporal automatically retries the failed task on a different worker. The worker resumes from the last completed segment (checkpointed in Temporal state). Partial segments in S3 are overwritten. The overall workflow only fails if a single task exceeds 5 retries — then it’s flagged for manual review.

Playback Session State Machine

stateDiagram-v2
    [*] --> INITIALIZING : Play pressed
    INITIALIZING --> BUFFERING : Manifest received
    BUFFERING --> PLAYING : Buffer full enough
    PLAYING --> BUFFERING : Buffer depleted
    PLAYING --> PAUSED : User pauses
    PAUSED --> PLAYING : User resumes
    PLAYING --> SEEKING : User scrubs timeline
    SEEKING --> BUFFERING : Seek position set
    PLAYING --> ENDED : Reached end
    PLAYING --> ERROR : Network failure
    ERROR --> BUFFERING : Retry succeeds
    ERROR --> ENDED : Max retries exceeded

7. Deep Dives

Deep Dive 1: Video Encoding Pipeline — Per-Title Encoding

Bad: Fixed bitrate ladder — encode every title at the same bitrates (e.g., 1080p always at 5Mbps). Animated movies (simple scenes) waste bits; action movies (complex scenes) look blocky.

Good: Per-resolution encoding — analyze content complexity and assign optimal bitrate per resolution. “My Little Pony” at 1080p needs only 2Mbps; “Mad Max” needs 8Mbps. But: analyzing the entire movie adds encoding time.

Great: Per-shot encoding (borrowing from Netflix Cosmos):

Scene detection: Split video into shots (scene changes detected via frame difference). Each shot gets its own optimal encoding parameters.
Convex hull optimization: For each shot, encode at multiple bitrate/quality points. Plot quality (VMAF score) vs bitrate. Find the convex hull — the set of points that gives maximum quality per bit.
Bitrate allocation: Given a target average bitrate for the stream, allocate more bits to complex shots and fewer to simple shots. Result: consistent visual quality throughout.
Codec selection: Encode each title in H.264 (compatibility), H.265 (50% more efficient, most devices), and AV1 (30% more efficient than H.265, newer devices). Client picks best codec their device supports.

flowchart LR
    Master["Master File"]:::data
    SD["Scene Detector"]:::service
    CH["Convex Hull Analyzer"]:::service
    ENC["Parallel Encoders"]:::service
    PKG["Packager HLS and DASH"]:::service
    S3["S3 Segments"]:::data

    Master --> SD
    SD --> CH
    CH --> ENC
    ENC --> PKG
    PKG --> S3

    classDef service fill:#1a3a2a,stroke:#4ade80,color:#e2e8f0
    classDef data fill:#3b3520,stroke:#fbbf24,color:#e2e8f0

Cost consideration: Per-shot encoding increases compute time by 3x vs fixed-ladder. But savings in storage and bandwidth (20-30% smaller files at same quality) pay back within weeks for popular titles. Netflix encodes each new title once; it’s streamed millions of times.

Deep Dive 2: CDN Architecture — Open Connect

Bad: Use a commercial CDN (CloudFront, Akamai) — at Netflix scale (15% of global internet traffic), per-GB pricing is ruinous ($0.02/GB × exabytes/month = billions/year).

Good: Build your own CDN with PoPs in major metros. Better cost, but ISP peering still adds latency and transit costs.

Great: Deploy custom appliances (OCAs) directly inside ISP networks:

Open Connect Appliances (OCAs): Custom servers with 100+ TB SSD storage + 100Gbps NIC. Deployed inside ISP data centers (Comcast, Vodafone, Jio, etc.).
Pre-positioning (proactive caching): Every night, a popularity prediction model identifies which titles will be watched tomorrow in each region. Those titles are pushed to local OCAs overnight (off-peak bandwidth is nearly free).
Fill strategy: Cache miss on an OCA → fetch from a parent OCA (regional hub) → only if parent misses → fetch from S3 origin. Three-tier hierarchy minimizes origin load.
Steering: Control plane selects the best OCA for each client based on: network proximity, current load, content availability. Uses DNS-based and HTTP redirect steering.

Result: 95%+ of bytes served from within the user’s ISP network. Latency < 5ms for segment fetch. ISPs benefit too (no transit costs for Netflix traffic), so they deploy OCAs for free.

Cache eviction: LRU with popularity weighting. A title watched by 1000 users/day stays cached over one watched by 10 users/day, even if the latter was accessed more recently.

Deep Dive 3: Adaptive Bitrate Streaming (ABR Algorithms)

Bad: Fixed quality — user selects “HD” and it stays there. When bandwidth drops, video buffers for 30 seconds while next segment downloads.

Good: Simple throughput-based ABR — measure download speed of last segment, choose next quality that fits within that bandwidth. But: highly reactive, causes frequent quality oscillation (480p → 1080p → 480p every few seconds). Jarring user experience.

Great: Buffer-based ABR with throughput smoothing (borrowing from Netflix’s practical ABR research):

Buffer-Based (BBA): Decision is driven by buffer level, not just throughput. If buffer is full (30s ahead), be aggressive (higher quality). If buffer is low (< 5s), be conservative (lower quality). Smooth transitions.
Throughput estimation: Use harmonic mean of last 5 segment downloads (not arithmetic mean — resists outliers from temporary spikes).
Startup optimization: During initial buffering, start at lowest quality for the first 2 segments (fast start, reduces time-to-first-frame). Then ramp up aggressively once buffer grows.
Quality lock: Once at a stable quality for 30+ seconds, don’t drop unless buffer is critically low. Prevents flicker.

Pseudocode for ABR decision:
  buffer_level = current_buffer_seconds
  throughput = harmonic_mean(last_5_segments_speed)
  
  if buffer_level > 30s:
      quality = highest where bitrate < throughput * 0.9
  elif buffer_level > 10s:
      quality = current_quality  (hold steady)
  elif buffer_level > 5s:
      quality = max(current_quality - 1, lowest)
  else:
      quality = lowest  (emergency drop)

Why not server-side ABR? The client knows its actual buffer state and network conditions in real-time. Server can’t know if the user is on WiFi or just entered a tunnel. Client-side ABR reacts in < 100ms; server-side would add round-trip latency to every quality decision.

Deep Dive 4: Recommendation Engine

Bad: Show everyone the same “Top 10” list — no personalization, engagement drops, churn increases.

Good: Collaborative filtering only — “users who watched X also watched Y.” Works well for popular content but fails for niche titles and new users (cold start problem).

Great: Two-stage pipeline with hybrid signals:

Stage 1 — Candidate Generation (offline, runs every 4 hours on Spark):

Collaborative filtering: Matrix factorization on user-item interaction matrix. Each user and title gets a 128-dimension embedding. Similarity in embedding space = likely interest.
Content-based: Encode title features (genre, director, cast, keywords, avg shot length) into embeddings. Match against user preference vector.
Output: 1000 candidate titles per user (from 15K catalog).

Stage 2 — Ranking (near-real-time, per request):

Features: Candidate title embedding + user context (time of day, device, recent watches, account age).
Model: Lightweight neural net (2-layer MLP) predicting P(watch > 70% of title).
Inference: < 10ms for 1000 candidates on CPU. Results sorted by score.
Row assembly: Group ranked titles into themed rows (“Because you watched Stranger Things,” “Trending,” “New Releases”). Each row is a retrieval source.

flowchart LR
    Events["User Events Kafka"]:::async
    Spark["Spark Pipeline"]:::service
    CF["Collaborative Filter"]:::service
    CB["Content-Based"]:::service
    CAND["Candidates 1K per user"]:::data
    Ranker["Neural Ranker"]:::service
    Cache["Redis Rec Cache"]:::data

    Events --> Spark
    Spark --> CF
    Spark --> CB
    CF --> CAND
    CB --> CAND
    CAND --> Ranker
    Ranker --> Cache

    classDef service fill:#1a3a2a,stroke:#4ade80,color:#e2e8f0
    classDef async fill:#3b1f5e,stroke:#c084fc,color:#e2e8f0
    classDef data fill:#3b3520,stroke:#fbbf24,color:#e2e8f0

Cold start (new user): First session uses onboarding quiz (pick 3 genres you like) + popularity-based defaults. After 5+ interactions, collaborative filtering kicks in.

Freshness vs compute tradeoff: Full pipeline rerun every 4 hours. But: real-time signals (user just finished a horror movie → boost horror in next homepage load) are injected via a lightweight “re-ranker” that adjusts cached scores using the last 30 minutes of activity.

Deep Dive 5: Content Pre-positioning and Cache Warming

Problem: When a new blockbuster drops (Stranger Things Season 5), millions of users hit play within minutes. If the content isn’t pre-cached on edge nodes, all requests slam the origin — massive latency spike.

Bad: Reactive caching — first user in each region triggers origin fetch. For a 2-hour movie at 4K, that’s a 20GB pull per OCA on first play. Multiply by 10,000 OCAs = 200TB origin egress in one hour.

Good: Push content to all OCAs 24 hours before release. But: OCA storage is limited (100TB). Pushing everything everywhere is wasteful — content that’s popular in India may get 0 plays in Brazil.

Great: Predictive pre-positioning with regional popularity models:

Popularity prediction: ML model trained on: historical premiere viewership, pre-release engagement (trailers watched, “remind me” clicks), genre popularity per region, time of year, marketing spend.
Regional allocation: Model predicts views-per-region for the first 48 hours. Content pushed to OCAs proportionally — more copies in high-demand regions, fewer in low-demand.
Fill during off-peak: Transfers happen 2AM-6AM local time when ISP bandwidth is idle. OCAs pull from regional hubs (not origin directly).
Dynamic rebalancing: After release, actual viewership data feeds back. If Brazil has unexpected demand, nearby OCAs serve while additional copies propagate.
Tiered encoding push: Push only the most popular bitrates first (1080p H.265, 720p H.264). Niche formats (4K AV1) stay at regional hubs until demanded.

Result: For a major premiere, 99%+ of first-play requests hit warm OCA cache. Origin egress stays flat regardless of demand spike.

Deep Dive 6: Playback Reliability and Error Resilience

Problem: Users are watching on unreliable networks (cellular, congested WiFi, train tunnels). Any interruption > 2 seconds and they close the app.

Bad: Single CDN endpoint — if it goes down, playback stops entirely.

Good: Retry on same CDN with exponential backoff. But: if the problem is the CDN node itself (hardware failure, network partition), retries never succeed.

Great: Multi-level resilience with fast failover:

Segment-level retry: If a segment fetch fails or times out (3s), immediately retry from a different CDN PoP (player knows 3+ candidate OCAs from steering).
Quality downshift on repeated failure: If 2 consecutive segments fail at current quality, drop to lowest bitrate (which has smaller segments, more likely to succeed on degraded network).
Buffer-based tolerance: ABR algorithm maintains a 30-second buffer when healthy. This “runway” absorbs network glitches — user doesn’t notice a 5-second outage if buffer covers it.
Session resume: If player crashes or device sleeps, resume from exact position (stored in Session Store every 10s). User re-opens app → continues from where they left off, no buffering.
CDN steering updates: Every 30 seconds, player pings the steering service for updated CDN PoP rankings. If their current OCA is degraded, next segment fetches from a better one.

Backstop: If all CDN PoPs in a region fail (rare, ISP-level outage), players fall back to fetching from a regional hub OCA in a neighboring ISP. Higher latency (50ms vs 5ms) but playback continues.

7.5. Design Self-Audit

Question	Answer
Dedicated search index?	✅ Elasticsearch for title/cast/genre search. Covered in FR3 components.
Stale reads after writes?	Encoding: title marked PLAYABLE only after all variants confirmed. Recs: 4-hour staleness acceptable for homepage; real-time re-ranker handles short-term signals.
Single points of failure?	CDN is massively distributed (10K+ OCAs). Catalog DB is sharded (Vitess). Temporal orchestrator has its own HA (multiple workers, durable state). Playback Service is stateless behind LB.
Dead-letter / reconciliation?	Encoding: failed tasks retry 5x in Temporal, then flag for review. Playback: heartbeat timeout after 5 minutes marks session as abandoned (for analytics).
Data freshness across caches?	Rec cache refreshed every 4 hours + real-time re-ranker. CDN content is immutable (new encoding = new URL). Session store is real-time (Redis).
Cost at scale?	CDN (Open Connect): dominant cost, mitigated by ISP partnerships. Encoding: GPU compute is expensive but one-time per title. Storage: tiered S3 (originals archived to Glacier after encoding). Rec pipeline: Spark cluster — scales with user base.

8. Final Architecture

flowchart LR
    subgraph Clients
        TV["Smart TV"]:::client
        MOB["Mobile App"]:::client
        WEB["Web Browser"]:::client
    end

    subgraph Edge
        DNS["DNS Steering"]:::edge
        GW["API Gateway Zuul"]:::edge
        CDN["CDN Open Connect OCAs"]:::edge
    end

    subgraph Services
        PS["Playback Service"]:::service
        RS["Recommendation Service"]:::service
        IS["Ingest Service"]:::service
        SS["Search Service"]:::service
    end

    subgraph Async and ML
        KF["Kafka"]:::async
        TMP["Temporal Orchestrator"]:::async
        EW["Encoding Workers GPU"]:::service
        ML["ML Pipeline Spark"]:::service
    end

    subgraph Data
        S3["S3 Video Segments"]:::data
        CAT["Catalog DB Vitess"]:::data
        CASS["Cassandra User Activity"]:::data
        RD["Redis Cluster"]:::data
        ES["Elasticsearch"]:::data
    end

    subgraph External
        DRM["DRM Widevine FairPlay"]:::external
    end

    TV --> DNS
    MOB --> DNS
    WEB --> DNS
    DNS --> GW
    DNS --> CDN
    GW --> PS
    GW --> RS
    GW --> SS
    PS --> DRM
    PS --> RD
    RS --> RD
    RS --> CAT
    SS --> ES
    CDN --> S3
    IS --> TMP
    TMP --> EW
    EW --> S3
    EW --> TMP
    TMP --> CAT
    KF --> ML
    ML --> RD
    TV --> CDN
    MOB --> CDN
    WEB --> CDN
    CASS --> KF

    classDef client fill:#4c3a5e,stroke:#818cf8,color:#e2e8f0
    classDef edge fill:#1e3a5f,stroke:#60a5fa,color:#e2e8f0
    classDef service fill:#1a3a2a,stroke:#4ade80,color:#e2e8f0
    classDef async fill:#3b1f5e,stroke:#c084fc,color:#e2e8f0
    classDef data fill:#3b3520,stroke:#fbbf24,color:#e2e8f0
    classDef external fill:#4a1942,stroke:#f472b6,color:#e2e8f0

Want a deep dive on DRM (content protection), offline downloads, multi-language audio switching, or live streaming (sports events)? Drop a comment below 👇

🎯 Key Takeaways

Per-shot encoding optimizes bitrate per scene complexity — saves 20-30% bandwidth
Open Connect CDN inside ISPs serves 95% of traffic locally
Adaptive Bitrate (ABR) driven by buffer level, not just throughput
Two-stage recommendation: collaborative filtering (offline) → neural ranker (online)

URL Shortener — CDN caching patterns and redirect optimization
Chat System — real-time delivery and session management
Notification System — push notification infrastructure

Designing Netflix — Video Streaming Platform

1. Understanding the Problem

1.5. Naive First Cut

1.7. Prior Art We’re Drawing From

2. Functional Requirements

Core (Top 3)

Below the Line

3. Non-Functional Requirements

Core

Below the Line

Technology Choices

4. Core Entities

5. API / System Interface

6. High-Level Design

FR1: Upload and Encode Video Content

FR2: Stream Video with Adaptive Bitrate

FR3: Personalized Recommendations

6.5. Core Flows

Flow 1: Video Playback Start-to-Stream

Flow 2: Encoding Pipeline

Playback Session State Machine

7. Deep Dives

Deep Dive 1: Video Encoding Pipeline — Per-Title Encoding

Deep Dive 2: CDN Architecture — Open Connect

Deep Dive 3: Adaptive Bitrate Streaming (ABR Algorithms)

Deep Dive 4: Recommendation Engine

Deep Dive 5: Content Pre-positioning and Cache Warming

Deep Dive 6: Playback Reliability and Error Resilience

7.5. Design Self-Audit

8. Final Architecture

🎯 Key Takeaways

Related Designs

💬 Comments