System Design Fundamentals
Everything you need to know before tackling specific HLD problems. These are the building blocks every design uses.
Scalability
Vertical scaling (scale up): Add more CPU/RAM to one machine. Simple but has a ceiling.
Horizontal scaling (scale out): Add more machines. Harder (state management, coordination) but practically unlimited.
| Β | Vertical | Horizontal |
|---|---|---|
| Cost | Expensive hardware | Cheap commodity servers |
| Limit | Hardware ceiling | Practically unlimited |
| Complexity | Low | High (distributed state) |
| Downtime | Requires restart | Zero-downtime rolling |
| When to use | DB primary, cache | Stateless services, web tier |
Load Balancing
Distributes traffic across multiple servers so no single server gets overloaded.
Where it sits:
Client β Load Balancer β Server 1
β Server 2
β Server 3
Algorithms:
| Algorithm | How it works | Best for |
|---|---|---|
| Round Robin | Rotate through servers 1β2β3β1ββ¦ | Equal-capacity servers |
| Weighted Round Robin | More traffic to stronger servers | Mixed hardware |
| Least Connections | Send to server with fewest active requests | Long-lived connections |
| IP Hash | Same client always hits same server | Session stickiness |
| Random | Pick randomly | Simple, surprisingly effective |
L4 vs L7:
- L4 (Transport): Routes based on IP/port. Fast, no content inspection. (e.g., AWS NLB)
- L7 (Application): Routes based on URL, headers, cookies. Smarter, slightly slower. (e.g., AWS ALB, Nginx)
Caching
Store frequently accessed data closer to the consumer. Trades freshness for speed.
Where to cache:
Client β CDN β API Gateway Cache β Application Cache β Database
β β β
static response-level object-level
assets (full response) (query results)
Cache strategies:
| Strategy | How | Best for |
|---|---|---|
| Cache-Aside | App checks cache, misses β read DB β write cache | General purpose, most common |
| Write-Through | Write to cache + DB together | Strong consistency needs |
| Write-Behind | Write to cache, async flush to DB later | High write throughput |
| Read-Through | Cache itself fetches from DB on miss | Simpler app code |
Cache eviction policies:
- LRU (Least Recently Used) β evict what hasnβt been touched longest. Most common.
- LFU (Least Frequently Used) β evict whatβs been used least overall.
- TTL (Time To Live) β expire after N seconds regardless of access.
Cache invalidation (the hard problem):
- TTL-based: simple but stale window exists
- Event-based: DB change β invalidate cache entry. Consistent but complex.
- Versioning: key includes version number. New version = cache miss.
Tools: Redis, Memcached, CDN (CloudFront, Cloudflare), local in-process (Caffeine, Guava)
Database Concepts
SQL vs NoSQL
| Β | SQL (Postgres, MySQL) | NoSQL (DynamoDB, Cassandra, MongoDB) |
|---|---|---|
| Schema | Fixed, enforced | Flexible, schema-on-read |
| Relationships | Joins, foreign keys | Denormalized, no joins |
| Scale | Vertical (hard to shard) | Horizontal (built for it) |
| Consistency | Strong (ACID) | Tunable (eventual to strong) |
| Best for | Transactions, complex queries | High throughput, simple access patterns |
Database Replication
Primary-Replica: One primary handles writes. Replicas handle reads. Read-heavy workloads scale horizontally.
Writes β Primary DB ββreplicatesβββ Replica 1 (reads)
β Replica 2 (reads)
β Replica 3 (reads)
Replication lag: Replicas might be a few ms behind primary. If you write then immediately read from a replica, you might not see your write. Solutions: read-your-writes consistency, sticky sessions to primary after write.
Database Sharding (Partitioning)
Split data across multiple databases by a shard key.
User ID 1-1M β Shard 1
User ID 1M-2M β Shard 2
User ID 2M-3M β Shard 3
Shard key choice is critical:
- Bad key (e.g., country) β one shard gets 80% of data (hot shard)
- Good key (e.g., user_id hash) β even distribution
Problems with sharding:
- Cross-shard queries are expensive
- Resharding (adding shards) is painful
- Transactions across shards are very hard
CAP Theorem
In a distributed system, you can only guarantee 2 of 3:
- C (Consistency): Every read sees the latest write
- A (Availability): Every request gets a response (might be stale)
- P (Partition Tolerance): System works despite network splits
In practice: Network partitions WILL happen. So you choose between:
- CP: Consistent + Partition-tolerant. During partition, some requests fail. (e.g., ZooKeeper, HBase)
- AP: Available + Partition-tolerant. During partition, you might read stale data. (e.g., Cassandra, DynamoDB)
Most systems are AP with tunable consistency β you choose per-operation whether you need strong or eventual consistency.
Consistency Models
| Model | Guarantee | Example |
|---|---|---|
| Strong | Read always sees latest write | Single-node DB, ZooKeeper |
| Eventual | Read will eventually see latest write | DynamoDB (default), Cassandra |
| Read-your-writes | YOU see your own writes immediately; others might not | Social media feeds |
| Causal | If A caused B, everyone sees A before B | Chat messages |
Message Queues
Decouple producers from consumers. Enable async processing.
Producer β Queue β Consumer
β
(buffer, retry, ordering)
Why use queues:
- Producer doesnβt wait for consumer (async)
- Consumer can be offline; messages wait
- Spike absorption (queue buffers during bursts)
- Retry on failure (message stays until acked)
Delivery guarantees:
- At-most-once: Fire and forget. Message might be lost. Fast.
- At-least-once: Retry until acked. Message might be delivered twice. Most common.
- Exactly-once: Hard to achieve. Usually βat-least-once + idempotent consumer.β
Tools: Kafka (log-based, ordered, high throughput), SQS (simple queue, managed), RabbitMQ (routing, priority)
Kafka vs SQS:
| Β | Kafka | SQS |
|---|---|---|
| Ordering | Per-partition guaranteed | FIFO queue or best-effort |
| Retention | Days/weeks (replay possible) | 14 days max, once consumed gone |
| Throughput | Millions/sec | Thousands/sec |
| Consumer model | Pull (consumer controls pace) | Pull (long-polling) |
| Use case | Event streaming, log aggregation | Task queues, decoupling |
API Design
REST
GET /users/123 β fetch user
POST /users β create user
PUT /users/123 β replace user
PATCH /users/123 β partial update
DELETE /users/123 β delete user
Key principles:
- Stateless (no server-side session)
- Resource-based URLs (nouns, not verbs)
- HTTP verbs for actions
- Proper status codes (200, 201, 400, 404, 500)
Rate Limiting
Protect services from abuse or thundering herds.
Algorithms:
- Token Bucket: Bucket fills at constant rate. Each request takes a token. Empty = rejected.
- Sliding Window: Count requests in last N seconds. Exceed limit = rejected.
- Fixed Window: Count per time window (e.g., per minute). Simpler but bursty at boundaries.
CDN (Content Delivery Network)
Cache static content at edge locations close to users.
User in India β CDN edge in Mumbai (cache hit) β fast!
β (cache miss)
Origin server in US β slow, but CDN caches for next time
What to put on CDN: Images, CSS, JS, videos, static HTML, API responses (with TTL)
Tools: CloudFront, Cloudflare, Fastly, Akamai
Consistent Hashing
Problem: you have N cache servers. hash(key) % N works until you add/remove a server β then ALL keys remap.
Consistent hashing: Only K/N keys remap when a server is added/removed.
How: place servers on a ring (0 to 2^32). Hash the key β walk clockwise β first server you hit owns that key. Adding a server only steals keys from its clockwise neighbor.
Used in: DynamoDB, Cassandra, Redis Cluster, load balancers
Idempotency
An operation is idempotent if doing it 1 time or N times produces the same result.
Why it matters: In distributed systems, retries happen. If βcharge $10β is retried, you donβt want to charge $20.
How to achieve:
- Client sends a unique
idempotency-keywith each request - Server checks: βdid I already process this key?β β yes: return cached result, no: process and store result
- Key stored with TTL (e.g., 24h)
Examples:
PUT /users/123 {name: "Alice"}β naturally idempotent (same result every time)POST /payments {amount: 100, idempotency_key: "abc"}β made idempotent via key
Heartbeat & Health Checks
How distributed systems detect dead nodes.
- Heartbeat: Node periodically sends βIβm aliveβ signal. No heartbeat for N seconds β considered dead.
- Health check: Load balancer pings
/healthendpoint. Non-200 β remove from rotation.
Failure detection trade-off:
- Aggressive (short timeout): detect failures fast, but false positives during GC pauses or network blips
- Conservative (long timeout): fewer false positives, but slow to detect real failures
Leader Election
When multiple nodes exist, sometimes one must be the βleaderβ (coordinates work, makes decisions).
Algorithms: ZooKeeper (ephemeral nodes), Raft (consensus), Bully algorithm
Why needed:
- Only one node should run a cron job (avoid duplicates)
- Only one node writes to a resource (avoid conflicts)
- Shard assignment coordination
Back-of-Envelope Estimation
Quick math to validate design decisions.
Key numbers to memorize:
| Operation | Time |
|---|---|
| L1 cache read | 1 ns |
| RAM read | 100 ns |
| SSD read | 100 ΞΌs |
| HDD seek | 10 ms |
| Network round-trip (same DC) | 0.5 ms |
| Network round-trip (cross-continent) | 150 ms |
Data size rules:
- 1 char = 1 byte (ASCII) or 2 bytes (Unicode)
- 1 KB = 1000 chars
- 1 million rows Γ 1 KB each = 1 GB
- 1 billion rows Γ 1 KB each = 1 TB
Traffic rules:
- 1 million DAU Γ 10 requests/user/day = 10M requests/day
- 10M requests/day Γ· 86400 = ~115 QPS average
- Peak = 2-5Γ average β ~230-575 QPS peak
How to Approach an HLD Interview
- Clarify requirements (2-3 min): functional + non-functional. Ask whatβs in scope.
- Back-of-envelope (2 min): traffic, storage, bandwidth estimates.
- High-level design (10-15 min): boxes and arrows. Client β LB β Service β DB.
- Deep dive (15-20 min): interviewer picks 2-3 areas. Show depth.
- Wrap up (2-3 min): trade-offs, what youβd change at 10Γ scale.
Donβt:
- Donβt jump into details before drawing the big picture
- Donβt pick a database without justifying why
- Donβt ignore non-functional requirements (latency, consistency, availability)
Next: pick a specific design problem and see these concepts applied.
π¬ Comments