Knowledge Hub

Interview-level system design guides

Real content with database schemas, API designs, capacity calculations, and the exact tradeoffs interviewers expect you to discuss.

Catalog snapshot

Articles: 62
Topics: 27
Reading time: 850 min

Beginner · Web Services
16 min read

Design a URL Shortener

Most candidates fail this problem not because URL shortening is hard, but because they over-design too early or make choices without justifying them.

Partitioning, Replication, Consistency, Availability

Intermediate · Social Media
11 min read

Design a Twitter/X Home Timeline

Goal: fast home feed reads at massive scale while handling write amplification and celebrity skew.

Partitioning, Consistency, Availability, Caching

Intermediate · Real-Time Messaging
14 min read

Design a WhatsApp-Style Chat System

This is a real-time system design problem where correctness, latency, and failure handling matter more than drawing many boxes.

Replication, Consistency, CDN, Pub/sub

Intermediate · Infrastructure
12 min read

Design a Rate Limiter

This problem is about protecting downstream services under load while preserving fair access.

Partitioning, Replication, Consistency, Availability

Advanced · Location Services
15 min read

Design a Ride-Hailing System (Uber-Style, Interview Walkthrough)

Designing ride hailing is about real-time decisions under uncertainty: live locations, fast matching, ETA quality, and failure-safe trip lifecycle.

Replication, Consistency, Availability, Redis

Intermediate · Notifications
11 min read

Design a Notification System

Goal: deliver notifications across multiple channels (push/email/SMS) reliably, with user preferences and retries.

Replication, Rate limiting, Circuit breaker, Kafka

Advanced · Advertising
14 min read

Design an Ad Click + Impression Tracking System

Ad measurement systems break when teams optimize ingestion throughput but ignore correctness contracts such as deduplication, attribution windows, and anti-fraud boundaries. This design prioritizes...

Partitioning, Replication, Consistency, Availability

Intermediate · Social Media · PRO
13 min read

Design an FB News Feed Platform

Facebook feed design is a ranking-heavy, read-dominant problem where product quality and system reliability are tightly coupled. The challenge is not only serving fresh posts quickly, but doing it ...

Partitioning, Replication, Consistency, Availability

Advanced · Video · PRO
16 min read

Design a Video Streaming Platform

A video streaming platform is three different systems that must behave like one product: upload ingestion, asynchronous transcoding, and low-latency playback delivery. The hard part is not drawing ...

Consistency, Availability, Caching, CDN

Advanced · Video · PRO
14 min read

Design a TikTok-like Short Video Platform

Short video platforms look like simple video hosting, but the real system problem is sub-200ms time-to-first-frame for infinite scroll, hyper-personalized recommendations that update in real-time b...

Consistency, Availability, Caching, CDN

Advanced · Real-Time Collaboration · PRO
16 min read

Design a Google Docs-like Real-time Document Platform

A real-time document platform is a consistency system disguised as a text editor. The product feels simple to users, but the core engineering challenge is preserving intent and convergence under con...

Replication, Consistency, Availability, Pub/sub

Intermediate · Search · PRO
12 min read

Design a Web Crawler

A crawler looks simple until scale and politeness collide: billions of URLs, duplicate pages, robots rules, and scheduling freshness.

Circuit breaker, SSE, BASE, SLO

Advanced · Storage · PRO
14 min read

Design a Cloud File Sync Service (Dropbox-like)

A cloud file sync system looks simple in demos and fails in production when clients treat sync as naive file upload/download. Real correctness is about version vectors, conflict policy, delta trans...

Partitioning, Replication, Availability, CDN

Intermediate · Search · PRO
10 min read

Design Search Autocomplete

Autocomplete is a latency problem with ranking constraints: users expect useful suggestions in <100ms while index freshness keeps changing.

Consistency, Availability, Caching, Rate limiting

Intermediate · Caching · PRO
10 min read

Design a Distributed Cache (Redis/Memcached-Style, Interview Walkthrough)

In this problem, the challenge is not only low latency. A cache must stay fast under pressure, predictable under failures, and safe for downstream databases during miss storms.

Sharding, Replication, Consistency, Availability

Intermediate · E-Commerce · PRO
13 min read

Design a Ticket Booking System

Goal: prevent double-booking under high concurrency while keeping booking flow responsive.

Consistency, Availability, SQL, Redis

Advanced · Infrastructure · PRO
12 min read

Design a Distributed Job Scheduler (Batch + Delayed + Recurring)

Schedulers fail when dispatch throughput is optimized but execution guarantees are vague: duplicate runs, starvation, runaway retries, and opaque recovery. This design focuses on deterministic sche...

Sharding, Partitioning, Replication, SSE

Intermediate · Infrastructure · PRO
11 min read

Design an API Gateway

An API gateway is a control point for routing, auth, retries, limits, and observability. The main risk is becoming both bottleneck and failure amplifier.

Consistency, API gateway, Rate limiting, Circuit breaker

Advanced · Payments · PRO
14 min read

Design a Payment System (Authorization + Capture + Ledger)

Payment systems fail less from single bugs and more from broken guarantees: duplicate charges, inconsistent ledgers, and weak reconciliation loops. This walkthrough focuses on correctness-first arc...

Partitioning, Replication, Consistency, Availability

Intermediate · Video · PRO
15 min read

Design YouTube Top-K Trending Videos

Trending is not just "most viewed". The ranking must balance freshness, momentum, spam resistance, and regional relevance, while remaining explainable enough to debug ranking anomalies. This design...

Replication, Consistency, Availability, Microservices

Advanced · Audio · PRO
14 min read

Design a Spotify-like Music Streaming System

Music streaming looks like simple audio file delivery, but the real system problem is seamless gapless playback across variable network conditions, hyper-personalized recommendations combining coll...

Availability, Caching, CDN, Microservices

Intermediate · Social Media · PRO
14 min read

Design an Instagram Live Comment System

Live comments look like a simple chat stream, but the real system problem is low-latency fanout under bursty write spikes, abuse resistance, ranking/relevance choices, and replay consistency when u...

Partitioning, Replication, Consistency, Availability

Advanced · Real-Time Messaging · PRO
15 min read

Design a Slack/Discord-like Workspace Messaging Platform

Workspace messaging looks like a simple chat app, but the real system problem is managing stateful WebSocket connections at scale, presence propagation across millions of users, message fanout to c...

Sharding, Consistency, Availability, WebSocket

Advanced · Real-Time Collaboration · PRO
14 min read

Design a Real-Time Collaborative Editor

Real-time collaborative editors are not just "WebSocket + save to DB." The hard part is preserving a responsive typing experience while converging to one consistent document under concurrent edits,...

Sharding, Replication, Consistency, Availability

Advanced · Gaming · PRO
17 min read

Design an Online Chess Platform

An online chess platform looks straightforward at product level, but interview depth comes from real-time correctness, fair matchmaking, anti-cheat enforcement, and low-latency analysis under high f...

Sharding, Replication, Availability, Pub/sub

Advanced · Location Services · PRO
14 min read

Design a Google Maps-like Platform (Map Tiles + Routing + ETA + Traffic Updates)

A maps platform is not one system. It is four coupled systems with different physics: low-latency map rendering (tiles), compute-heavy path search (routing), uncertainty-aware travel prediction (ET...

Partitioning, Consistency, Availability, Caching

Intermediate · E-Commerce · PRO
19 min read

Design an FB Marketplace-like Platform

An FB Marketplace-like platform looks simple at UI level, but interview depth comes from local relevance, listing integrity, buyer-seller messaging reliability, abuse control, and low-latency read ...

Replication, Consistency, Availability, Caching

Advanced · Fintech · PRO
17 min read

Design a Robinhood-like Trading Platform

A retail trading platform is primarily a correctness system with strict latency goals, not just a fast UI. The hardest part is balancing UX speed, market integrity, regulatory controls, and financi...

Horizontal scaling, Partitioning, Replication, Consistency

Advanced · Video · PRO
14 min read

Design a Twitch-like Live Streaming Platform

Live streaming platforms look like simple video broadcast, but the real system problem is sub-3-second glass-to-glass latency while scaling to millions of concurrent viewers, handling thousands of ...

Sharding, Availability, CDN, Pub/sub

Advanced · Video · PRO
14 min read

Design a Zoom/Google Meet Video Conferencing System

Video conferencing looks like simple media streaming, but the real system problem is managing real-time bidirectional media with sub-200ms latency, handling heterogeneous network conditions across ...

Replication, Availability, CDN, WebSocket

Advanced · Search · PRO
14 min read

Design a Distributed Search System (Index + Query Serving)

A distributed search system is a two-speed architecture: writes build and refresh indexes continuously, while reads must return relevant results under strict latency SLOs. This walkthrough focuses ...

Sharding, Consistency, Availability, Caching

Advanced · Storage · PRO
15 min read

Design an Object Storage Service (S3-like)

An object storage design interview tests whether you can separate data-plane scale from control-plane correctness: massive read/write throughput, durable bytes, cheap lifecycle management, and pred...

Partitioning, Replication, Consistency, Availability

Intermediate · Infrastructure · PRO
14 min read

Design a CDN

A CDN design interview is usually about one core skill: can you move bytes closer to users while keeping correctness, cost, and operational safety under control. This walkthrough focuses on cacheab...

Replication, Consistency, Availability, Caching

Advanced · Machine Learning · PRO
14 min read

Design a Recommendation Serving System

Recommendation serving is an online decision system: every request must return relevant items fast, safely, and with measurable business impact. This design focuses on low-latency personalized serv...

Sharding, Replication, Consistency, Availability

Intermediate · Notifications · PRO
16 min read

Design an Email Delivery Platform (transactional + bulk)

Email delivery looks simple at API level, but production reliability depends on policy enforcement, sender reputation, ISP feedback loops, and strict separation between urgent transactional traffic...

Availability, SSE, Idempotency, BASE

Advanced · Analytics · PRO
13 min read

Design a Real-Time Analytics Platform (Events -> Aggregates -> Queries)

Real-time analytics systems fail when teams optimize ingestion throughput but ignore semantic correctness, late data handling, and query isolation. This design prioritizes accurate, low-latency ins...

Consistency, Caching, SQL, Kafka

Advanced · Distributed Systems · PRO
15 min read

Design a Distributed Lock Service

A distributed lock service is not about storing a lock key; it is about preserving mutual exclusion and bounded wait under failures, retries, and clock drift. This design targets practical correctn...

Partitioning, Replication, Consistency, Availability

Advanced · Distributed Systems · PRO
14 min read

Design a Distributed Configuration Service

Distributed configuration services look easy until the first bad rollout. The hard part is not storing key-value pairs; it is keeping reads fast, updates safe, and behavior predictable during parti...

Partitioning, Replication, Consistency, Availability

Advanced · Monitoring · PRO
14 min read

Design a Distributed Metrics Platform (Prometheus-like + Long-Term Store)

A metrics platform fails in practice when teams optimize scrape and ingest throughput, but ignore cardinality controls, query isolation, and long-term retention economics. This design keeps fast op...

Sharding, Replication, Consistency, Availability

Intermediate · Infrastructure · PRO
11 min read

Design a Feature Flag Platform (Evaluation + Rollout + Experimentation)

Feature flag systems fail when teams focus on toggles but ignore consistency, blast-radius control, and lifecycle hygiene. This design prioritizes safe progressive delivery with low-latency runtime...

Consistency, Availability, gRPC, WebSocket

Advanced · Security · PRO
16 min read

Design a Fraud Detection Pipeline (Real-Time + Batch)

Fraud systems fail when teams optimize only model accuracy while ignoring serving latency, feedback loops, and false positive cost. This design keeps real-time blocking, near-real-time review, and ...

Replication, Consistency, Availability, API gateway

Advanced · Payments · PRO
15 min read

Design a Payment Reconciliation Engine

Payment API success is not financial truth. Money correctness is established only after internal records, PSP/acquirer outcomes, settlement files, fees, and chargeback events are reconciled into on...

Partitioning, Replication, Consistency, Availability

Advanced · Developer Tools · PRO
16 min read

Design an Online Code Execution Platform

An online code execution platform is a controlled untrusted-compute system: users submit code in many languages, the platform compiles/runs it safely, and returns output quickly without letting one...

Replication, Availability, Caching, SSE

Advanced · Real-Time Messaging · PRO
13 min read

Design a Global Chat Platform (1:1 + Group Messaging)

Global chat systems look simple at UI level, but the backend is a hard mix of low latency, durability, ordering boundaries, and regional failure handling.

Partitioning, Replication, Consistency, Availability

Advanced · Monitoring · PRO
11 min read

Design an Observability Platform (Logs + Metrics + Traces)

Observability systems fail when ingestion success is prioritized over query usability, or when cardinality explodes without guardrails.

Partitioning, Consistency, SSE, BASE

Intermediate · Social Media · PRO
12 min read

Design a Tinder-like Dating Platform

Dating apps are not just swipe APIs. The hard parts are low-latency candidate serving, exactly-once-ish interaction recording, consistent match creation, and safe real-time messaging under abuse co...

Sharding, Partitioning, Consistency, Availability

Intermediate · Social Media · PRO
12 min read

Design a Strava-like Fitness Platform (Activities + Social Feed + Leaderboards)

Fitness platforms combine telemetry ingestion, geospatial processing, social interactions, and fairness-sensitive leaderboards. The hard part is balancing fast uploads and feed freshness with corre...

Partitioning, Replication, Consistency, Microservices

Intermediate · Location Services · PRO
13 min read

Design a Nearby Discovery Platform (Users + Places + Events, Privacy-Aware)

Nearby discovery looks simple at product level ("show me what is near me"), but the hard part is balancing low-latency relevance with strong location privacy guarantees. The system must answer geo ...

Partitioning, Replication, Consistency, Availability

Intermediate · Productivity · PRO
12 min read

Design a Google Calendar-like Platform

Calendar systems look simple until recurrence, timezone shifts, invite workflows, and conflict-free sync collide at scale. The hard part is preserving scheduling correctness while keeping sharing a...

Consistency, Availability, Microservices, WebSocket

Intermediate · Social Media · PRO
13 min read

Design a Google News Feed Platform

News feed systems are latency-sensitive ranking systems under strict trust constraints. The hard part is not rendering cards; it is ingesting fast-changing publisher content, ranking per user inten...

Availability, Caching, Kafka, SSE

Intermediate · E-Commerce · PRO
12 min read

Design an Online Auction Platform

Auction systems are fairness-critical transaction systems disguised as simple listing apps. The hard parts are bid ordering correctness, auction-close race handling, anti-sniping policy, payment/se...

Partitioning, Consistency, Availability, Microservices

Intermediate · E-Commerce · PRO
15 min read

Design a Price Tracker Platform

A price tracker sounds simple at product level, but interview depth comes from handling massive crawl skew, noisy merchant feeds, duplicate product mappings, and user alert latency under bursty pri...

Replication, Consistency, Availability, Caching

Advanced · Gaming · PRO
15 min read

Design a Real-Time Multiplayer Game State Sync System

Multiplayer games look like a simple client-server exchange, but the real system problem is sub-50ms state propagation under adversarial latency, cheat-resistant authoritative logic, deterministic ...

Replication, Consistency, Availability, Load balancing

Intermediate · Real-Time · PRO
14 min read

Design a Live Sports Score/Commentary System

Live sports scores look like a simple data feed, but the real system problem is sub-second delivery to millions of concurrent users during peak moments, handling 100x traffic spikes during major ev...

Partitioning, Consistency, Availability, Load balancing

Intermediate · Audio · PRO
14 min read

Design a Podcast Platform

Podcast platforms look like simple audio hosting, but the real system problem is efficiently ingesting millions of RSS feeds with varying update frequencies, delivering large audio files (50MB-500M...

Caching, CDN, Rate limiting, SSE

Intermediate · Media Processing · PRO
14 min read

Design an Image Processing Pipeline (Cloudinary-like)

Image processing pipelines look like simple resize operations, but the real system problem is on-the-fly transformation at CDN edge with sub-100ms latency, supporting infinite transformation combin...

Consistency, Availability, Caching, CDN

Intermediate · Distributed Systems · PRO
13 min read

Design a Distributed Task Queue (Celery/SQS-like)

Distributed task queues look like simple message passing, but the real system problem is exactly-once task execution under worker failures, visibility timeout tuning that balances latency with reli...

Replication, Availability, Rate limiting, SQL

Intermediate · Security · PRO
13 min read

Design a Secret Management System (Vault-like)

Secret management systems look like simple key-value stores, but the real system problem is encryption at rest with secure unsealing, dynamic credential generation with automatic rotation, fine-gra...

Replication, Availability, Caching, SSE

Advanced · Infrastructure · PRO
13 min read

Design an A/B Testing / Experimentation Platform

Experimentation platforms look like simple feature flags with analytics, but the real system problem is statistically rigorous assignment with minimal latency, consistent user bucketing across sess...

Horizontal scaling, Consistency, Availability, Caching

Intermediate · Monitoring · PRO
13 min read

Design a Log Aggregation System (Splunk/ELK-like)

Log aggregation systems look like simple storage with search, but the real system problem is ingesting terabytes of logs per day without data loss, indexing for sub-second search across billions of...

Partitioning, Caching, Kafka, SSE

Advanced · Database · PRO
13 min read

Design a Database Connection Pooler (PgBouncer-like)

Database connection poolers look like simple proxies, but the real system problem is multiplexing thousands of application connections over limited database connections, choosing the right pooling ...

Sharding, Availability, Load balancing, Caching

Intermediate · Infrastructure · PRO
13 min read

Design a Canary Deployment System

Canary deployment systems look like simple traffic splitting, but the real system problem is progressively shifting traffic with automatic rollback on degradation, defining meaningful health metric...

Availability, SSE, BASE, SLO