

Design a Twitter/X Home Timeline (Interviewer Walkthrough)

Goal: fast home feed reads at massive scale while handling write amplification and celebrity skew.

---

0) Pre-Design Research Inputs

Key research:

  1. Feed systems typically compare fanout-on-write vs fanout-on-read.
  2. Redis sorted structures are common for fast recency-ordered retrieval.
  3. Ordering and queue partitioning decisions must align with consistency needs.

Design implications:

  • Hybrid feed strategy is required (not one-size-fits-all).
  • Celebrity accounts need special handling.
  • Timeline cache/store should optimize top-N recent reads.

---

1) Requirements

Now that we have problem context, let us pin down what this feed system must deliver before debating fanout strategy.

Functional

  1. User can post a tweet.
  2. User can follow/unfollow.
  3. User can load home timeline quickly.

Non-functional

  1. Home timeline p95 < 200ms.
  2. Very high read scale.
  3. Eventual consistency acceptable for timeline updates.
  4. System handles skew (celebrity followers).

---

2) Capacity Estimation

With requirements fixed, we quantify read/write asymmetry, because feed architecture is primarily decided by this ratio.

Assumptions:

  • DAU: 100M
  • Avg tweets/day: 500M
  • Home feed reads/day: 20B
  • Peak factor: 8x

Numbers:

  • Tweet writes avg: ~5.8k/s, peak ~46k/s
  • Feed reads avg: ~231k/s, peak ~1.85M/s

Implication:

  • Read path dominates by huge margin.
  • Precomputation/caching for home feed is essential.
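
The arithmetic above can be checked with a quick back-of-envelope script; the inputs are the assumptions stated in this section.

```python
# Back-of-envelope check of the read/write asymmetry described above.
# Inputs are the stated assumptions: 500M tweets/day, 20B reads/day, 8x peak.
SECONDS_PER_DAY = 86_400

tweets_per_day = 500_000_000
reads_per_day = 20_000_000_000
peak_factor = 8

write_avg = tweets_per_day / SECONDS_PER_DAY   # ~5.8k/s
read_avg = reads_per_day / SECONDS_PER_DAY     # ~231k/s

print(f"writes avg ~{write_avg / 1e3:.1f}k/s, peak ~{write_avg * peak_factor / 1e3:.0f}k/s")
print(f"reads  avg ~{read_avg / 1e3:.0f}k/s, peak ~{read_avg * peak_factor / 1e6:.2f}M/s")
print(f"read:write ratio ~{reads_per_day // tweets_per_day}:1")
```

The ~40:1 read:write ratio is the single number that justifies precomputation.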

---

2.1) Storage + Ops

Now let us estimate storage and write amplification impact, especially for fanout-heavy designs.

Assume timeline entry ~80B (tweetId + authorId + score/timestamp + metadata):

  • If fanout-on-write for all users, storage duplication is massive.

Example:

  • Avg followers 300 => one tweet can generate ~300 timeline writes.
  • At 46k tweet/s peak, naive fanout writes can approach 13.8M timeline writes/s.

Implication:

  • Full fanout-on-write for everyone is too expensive under skew.
  • Need hybrid strategy.
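
The write-amplification figures above can be verified the same way; the 80B entry size and follower counts are the assumptions from this section.

```python
# Write-amplification sketch for naive full fanout-on-write.
avg_followers = 300
peak_tweets_per_s = 46_000
timeline_entry_bytes = 80  # tweetId + authorId + score/timestamp + metadata

timeline_writes_per_s = avg_followers * peak_tweets_per_s    # ~13.8M/s
bandwidth_bytes_per_s = timeline_writes_per_s * timeline_entry_bytes

print(f"timeline writes at peak: ~{timeline_writes_per_s / 1e6:.1f}M/s")
print(f"timeline write bandwidth: ~{bandwidth_bytes_per_s / 1e9:.1f} GB/s")

# Skew case: a single post from a 20M-follower account.
celebrity_followers = 20_000_000
celeb_bytes = celebrity_followers * timeline_entry_bytes
print(f"one celebrity tweet => {celebrity_followers:,} inserts (~{celeb_bytes / 1e9:.1f} GB)")
```

One celebrity post costing ~1.6 GB of timeline writes is what motivates the hybrid strategy later.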

---

3) Core Entities v1

With scale pressure visible, we define entities that separate source-of-truth data from serving-model data.

  • User(userId, ...)
  • Tweet(tweetId, authorId, createdAt, text, mediaRef?)
  • FollowEdge(followerId, followeeId, createdAt)
  • HomeTimeline(userId, scoreTs, tweetId, authorId) (materialized)

Thought process:

  • Tweet is source of truth.
  • HomeTimeline is serving model optimized for read latency.
  • FollowEdge drives both write fanout and read assembly.

Functional requirement traceability:

  • FR1 (post tweet) -> Tweet persists canonical content.
  • FR2 (follow/unfollow) -> FollowEdge controls feed eligibility and fanout edges.
  • FR3 (load home timeline) -> HomeTimeline serves low-latency read path.

Why this mapping matters:

  • Feed systems fail when entities are generic; this mapping keeps model tied to product behavior.
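
The entity split above can be sketched as plain dataclasses. Field names follow the text; the types and `Optional` media reference are illustrative assumptions.

```python
# Minimal sketch of the v1 entities; types are illustrative, not a schema.
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Tweet:                      # source of truth for content
    tweetId: str
    authorId: str
    createdAt: float
    text: str
    mediaRef: Optional[str] = None

@dataclass(frozen=True)
class FollowEdge:                 # drives write fanout and read assembly
    followerId: str
    followeeId: str
    createdAt: float

@dataclass(frozen=True)
class HomeTimelineEntry:          # materialized serving model for fast reads
    userId: str
    scoreTs: float
    tweetId: str
    authorId: str
```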

---

4) API / Interface

Now that entities are set, we define minimal feed contracts and parameters that support low-latency pagination.

  • POST /v1/tweets
  • POST /v1/follows
  • DELETE /v1/follows/{followeeId}
  • GET /v1/home?cursor=...&limit=...

Parameter reasoning:

  • cursor supports pagination without expensive deep offsets.
  • limit bounded to protect tail latency and cache efficiency.
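
One hypothetical way to realize the cursor parameter is an opaque token encoding the last-seen (scoreTs, tweetId) pair, so the next page is a bounded range scan rather than a deep offset. The encoding and `MAX_LIMIT` value below are illustrative assumptions.

```python
# Hypothetical opaque-cursor scheme for GET /v1/home pagination.
import base64
import json

def encode_cursor(score_ts: float, tweet_id: str) -> str:
    raw = json.dumps({"ts": score_ts, "id": tweet_id}).encode()
    return base64.urlsafe_b64encode(raw).decode()

def decode_cursor(cursor: str):
    data = json.loads(base64.urlsafe_b64decode(cursor.encode()))
    return data["ts"], data["id"]

MAX_LIMIT = 100  # bound limit to protect tail latency and cache efficiency

def clamp_limit(limit: int) -> int:
    return max(1, min(limit, MAX_LIMIT))
```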

Functional requirement to API mapping:

  • FR1 -> POST /v1/tweets
  • FR2 -> POST /v1/follows, DELETE /v1/follows/{followeeId}
  • FR3 -> GET /v1/home?cursor=...&limit=...

Why this mapping matters:

  • API shape is justified by requirements; no unused endpoints in interview scope.

---

5) High-Level Design (progressive)

Now we build the architecture incrementally: first correctness, then read optimization, then skew handling, then resilience.

Step A: Correct baseline

User calls POST /v1/tweets to create a tweet, which is persisted in the tweet store. User calls GET /v1/home?cursor=...&limit=... to load their home timeline. The feed service must fetch the user's followees from the follow graph, retrieve recent tweets from each, merge-sort by time, and return paginated results. The question is: what baseline flow gives us correct timeline assembly before optimization?

Components:

  • Tweet service + tweet store
  • Follow graph store
  • Feed read service

Baseline behavior:

  • On read, fetch recent tweets from followees and merge-sort.

Why baseline:

  • Simple correctness demonstration before optimization.

Decision details (data stores in baseline):

  • Tweet store choice: Cassandra-style append-optimized store keyed by author/time.
  • Follow graph choice: relational/graph-friendly store keyed by followerId -> followeeId.
  • Why this split:

- tweet writes are high-volume appends;
- follow graph needs edge queries and consistency for follow/unfollow semantics.

  • Why not one store for both:

- access patterns differ significantly; single-model compromise hurts either write throughput or edge-query efficiency.
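
The baseline read path can be sketched as a k-way merge over per-followee streams. The dictionaries below are in-memory stand-ins for the follow graph and tweet store, with hypothetical data.

```python
# Step A baseline: fetch recent tweets per followee, k-way merge by recency.
import heapq

# In-memory stand-ins (hypothetical data): follow graph and tweet store.
follows = {"u1": ["a", "b", "c"]}  # followerId -> followeeIds
tweets_by_author = {               # authorId -> [(createdAt, tweetId)] newest-first
    "a": [(105, "t5"), (101, "t1")],
    "b": [(104, "t4"), (102, "t2")],
    "c": [(103, "t3")],
}

def home_timeline(user_id: str, limit: int = 10):
    streams = (tweets_by_author.get(f, []) for f in follows.get(user_id, []))
    # heapq.merge keeps already-sorted streams sorted; reverse=True => newest-first
    merged = heapq.merge(*streams, reverse=True)
    return [tweet_id for _, tweet_id in merged][:limit]

print(home_timeline("u1", 3))  # → ['t5', 't4', 't3']
```

This is correct but does per-request work proportional to followee count, which is exactly what Step B removes.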

Step B: Fanout-on-write for normal users

When a user calls POST /v1/tweets, fanout workers push the tweet entry into each follower's HomeTimeline store. When a follower calls GET /v1/home, the feed service reads precomputed entries instead of assembling from scratch. The question is: how do we make reads fast by precomputing timelines without exploding write costs?

Components added:

  • Fanout workers
  • Home timeline cache/store (e.g., Redis + persistent backing)

Decision:

  • Push tweets into followers' home timelines for non-celebrity authors.

Why:

  • Read volume is much higher than write; precompute to reduce read-time joins.

Choice details:

  • Use async fanout workers reading tweet events.
  • Materialize into HomeTimeline(userId, scoreTs, tweetId) serving model.
  • Keep timeline in Redis for hot reads with persistent backing store for recovery.

Why this solves:

  • Converts expensive read-time merges into cheap top-N reads for most users.
  • Keeps home endpoint p95 stable under heavy read load.

Why not fanout-on-read only:

  • At ~1.85M/s peak reads, per-request merge of many followees is too expensive.

How with numbers:

  • Reads peak 1.85M/s; serving from prebuilt timeline keeps feed request bounded to top-N lookup.
  • Example:

- if each read merged even 200 followee streams, backend query fanout would be huge.
- precomputed timeline reduces this to one or a few key-range reads.
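
The fanout-on-write path can be sketched with a dict of sorted lists standing in for Redis sorted sets; the comments note the Redis commands each step corresponds to. The cap of 800 entries is an illustrative assumption.

```python
# Step B sketch: materialize HomeTimeline entries on write, capped per user.
# A dict of sorted lists stands in for Redis sorted sets.
import bisect
from collections import defaultdict

TIMELINE_CAP = 800  # illustrative: keep only the most recent N entries

timelines = defaultdict(list)  # userId -> [(scoreTs, tweetId)] ascending

def fanout_tweet(author_id, tweet_id, score_ts, follower_ids):
    for follower in follower_ids:
        tl = timelines[follower]
        bisect.insort(tl, (score_ts, tweet_id))  # Redis: ZADD
        if len(tl) > TIMELINE_CAP:               # Redis: ZREMRANGEBYRANK trim
            del tl[0]

def read_timeline(user_id, limit=50):
    # top-N most recent entries; Redis: ZREVRANGE
    return [tid for _, tid in reversed(timelines[user_id][-limit:])]

fanout_tweet("alice", "t1", 100, ["u1", "u2"])
fanout_tweet("bob", "t2", 101, ["u1"])
print(read_timeline("u1", 2))  # → ['t2', 't1']
```

Reads become a single bounded key-range lookup regardless of how many accounts the user follows.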

Step C: Celebrity hybrid strategy

When a celebrity with millions of followers calls POST /v1/tweets, the fanout-on-write approach from Step B would generate millions of HomeTimeline writes. A single tweet from an account with 20M followers produces 20M timeline inserts, creating massive write amplification and potential fanout worker backlogs that degrade feed freshness for everyone.

Decision:

  • For high-follower accounts, skip full fanout-on-write; fetch on read (fanout-on-read for celebrity edges).

Why:

  • Prevent write explosion from celebrity posts.

Choice details:

  • Define dynamic threshold (e.g., follower count or historical fanout cost) for "celebrity mode."
  • For celebrity tweets, store canonical tweet and inject at read time for followers.

Why this solves:

  • Caps extreme write amplification from rare high-fanout authors.

Why not fully fanout-on-write:

  • One celebrity tweet can produce tens of millions of writes.
  • With burst posting, fanout workers backlog and degrade feed freshness for everyone.

How with numbers:

  • If account has 20M followers, one tweet => 20M timeline writes if fully fanned out.
  • Hybrid avoids this spike and shifts controlled merge cost to readers following celebrities.
  • Tradeoff:

- feed assembly path becomes heterogeneous (push + pull merge).
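
The heterogeneous assembly path can be sketched as a merge of the precomputed (push) timeline with celebrity tweets pulled at read time. All stores and the threshold value are illustrative stand-ins.

```python
# Step C sketch: hybrid read path = precomputed timeline + celebrity pull.
import heapq

CELEBRITY_THRESHOLD = 10_000  # illustrative follower-count cutoff

def load_home(user_id, pushed_timeline, celebrity_followees, recent_by_author, limit=50):
    """pushed_timeline: [(scoreTs, tweetId)] newest-first from the materialized store.
    celebrity_followees: this user's followees currently in "celebrity mode".
    recent_by_author: authorId -> [(scoreTs, tweetId)] newest-first (pull path)."""
    pull_streams = (recent_by_author.get(a, []) for a in celebrity_followees)
    # Merge push and pull sources; all inputs are already newest-first.
    merged = heapq.merge(pushed_timeline, *pull_streams, reverse=True)
    return [tid for _, tid in merged][:limit]

pushed = [(104, "t4"), (101, "t1")]                 # materialized entries
celeb = {"bigstar": [(103, "c3"), (102, "c2")]}     # pulled at read time
print(load_home("u1", pushed, ["bigstar"], celeb, limit=3))  # → ['t4', 'c3', 'c2']
```

The merge cost scales with the number of celebrity followees, which is small for most users, rather than with total followees.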

Step D: Ranking + caching + freshness

When GET /v1/home is called, the feed service may optionally call a ranking service to reorder tweets by engagement and personalization features before returning results. Cached timeline segments improve latency, but stale caches hurt freshness. The question is: how do we add ranking without making it a hard dependency that blocks feed availability?

Components added:

  • Ranking service (lightweight in interview scope)
  • Cache invalidation/refresh strategy

Decision:

  • Use recency-first baseline with optional ranking features.

Why:

  • Guarantees deterministic fallback if ranking service degrades.

Choice details:

  • Recency order is always available from timeline store.
  • Ranking service enriches top window only (not entire feed).
  • Cache ranked segments with short TTL.

Why this solves:

  • Personalization improves quality when healthy.
  • Service remains available when ranking/feature pipelines fail.

Why not ranking-hard-dependency:

  • adds failure coupling between feed availability and ML/feature services.

How with numbers:

  • Keep the home feed p95 target (<200ms) by capping ranking calls and falling back to recency when ranking latency breaches the budget.
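
The budget-and-fallback behavior can be sketched with a timed ranking call; the budget value and `rank` stand-in are assumptions, not the real ranking service.

```python
# Step D sketch: ranking as a soft dependency with a latency budget.
import concurrent.futures
import time

RANKING_BUDGET_S = 0.05  # illustrative cap within the 200ms p95 budget

def rank(tweet_ids):
    # Stand-in for the ranking-service call; pretend sorted = "ranked".
    return sorted(tweet_ids)

def get_home_feed(recency_ordered, executor, ranker=rank):
    window = recency_ordered[:50]            # enrich only the top window
    future = executor.submit(ranker, window)
    try:
        return future.result(timeout=RANKING_BUDGET_S)
    except Exception:
        return window                        # deterministic fallback: pure recency

with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
    print(get_home_feed(["t3", "t1", "t2"], pool))  # → ['t1', 't2', 't3']
    slow = lambda ids: (time.sleep(0.2), ids)[1]    # simulated latency spike
    print(get_home_feed(["t3", "t1", "t2"], pool, ranker=slow))  # → ['t3', 't1', 't2']
```

The recency order is always computable from the timeline store, so the fallback never depends on the ranking path being healthy.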

---

Core Entities v2

After these architecture changes, we refine entities so hybrid fanout and ranking controls are explicit in the data model.

  • HomeTimeline(..., sourceType) where sourceType=materialized|pull
  • UserFeedConfig(userId, rankingMode, language, contentPrefs)
  • TweetEngagement(tweetId, likes, replies, retweets) for ranking features

Why changed:

  • Added source-type to support hybrid feed merge.
  • Added config/engagement to evolve ranking without changing core tweet store.

---

6) Deep Dives (numeric + mechanism)

With the complete design in place, let us stress-test each high-risk tradeoff in the same order we introduced it.

Deep Dive 1: Fanout strategy

Now that we implemented hybrid fanout, we first verify why this tradeoff beats one-size-fits-all approaches.

Bad: full fanout-on-read only.

  • Why bad: every home read becomes expensive multi-source merge at huge read QPS.
  • Example: peak home reads ~1.85M/s.
  • How technically:

- each request must fetch recent tweets from many followees and merge-sort.
- backend query fanout and CPU per request increase sharply, hurting p95.

Good: full fanout-on-write for everyone.

  • Why good: home reads become cheap top-N lookups.
  • Why it breaks: celebrity posts cause write explosion.
  • Example: one user with 20M followers posts once -> 20M timeline inserts.

Great: hybrid fanout.

  • Strategy: fanout-on-write for normal accounts, fanout-on-read for celebrity edges.
  • Why great: balances read latency with write amplification control.
  • Tradeoff: feed assembly logic is more complex for users following celebrities.

Deep Dive 2: Timeline storage choice

Next, we validate storage-layer choices because feed latency is highly sensitive to serving model design.

Bad: query canonical tweet store on every home request.

  • Why bad: tweet store optimized for writes/history, not low-latency per-user timeline assembly.
  • Example: 500M tweets/day with high read fanout.
  • How: repeated wide reads + merge at request time increase DB and service load.

Good: materialized home timeline entries.

  • Why good: serving layer precomputes per-user candidates.
  • Example: read path becomes "get top 50 by scoreTs" instead of multi-followee merge each time.

Great: in-memory timeline cache (Redis) + durable backing store.

  • Why great: p95 is stabilized by cache; durable store enables recovery and backfill.
  • Tradeoff: invalidation and warmup complexity.

Deep Dive 3: Cache invalidation

With serving model selected, we now examine freshness and invalidation behavior under updates/deletes.

Bad: no invalidation strategy.

  • Why bad: stale or missing tweets persist in home timeline.
  • Example: tweet delete/unfollow not reflected quickly.
  • How: cache entries stay outdated until manual refresh, hurting trust.

Good: TTL-based refresh.

  • Why good: simple and bounded staleness.
  • Limitation: short TTL increases backend load; long TTL increases staleness.

Great: event-driven partial invalidation + cursor/version checks.

  • Example:

- delete event removes specific tweet from affected caches;
- follow/unfollow event invalidates only impacted segments.

  • Tradeoff: event plumbing and cache key design complexity.
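
Event-driven partial invalidation can be sketched as targeted handlers over an in-memory timeline stand-in; the event shapes and `author_of` lookup are illustrative assumptions.

```python
# Deep Dive 3 sketch: event-driven partial invalidation of cached timelines.
from collections import defaultdict

timelines = defaultdict(dict)  # userId -> {tweetId: scoreTs} (cache stand-in)

def on_tweet_deleted(tweet_id, affected_user_ids):
    # Targeted removal (Redis: ZREM by member); no full timeline rebuild.
    for uid in affected_user_ids:
        timelines[uid].pop(tweet_id, None)

def on_unfollow(follower_id, followee_id, author_of):
    # Invalidate only the segment authored by the unfollowed account.
    tl = timelines[follower_id]
    for tid in [t for t in tl if author_of(t) == followee_id]:
        del tl[tid]

timelines["u1"] = {"t1": 100, "t2": 101}
on_tweet_deleted("t1", ["u1"])
print(timelines["u1"])  # → {'t2': 101}
```

Both handlers touch only the affected keys, which is what bounds invalidation cost compared with TTL-driven full refresh.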

Deep Dive 4: Follow churn

Now that invalidation is addressed, we test follow/unfollow churn correctness in precomputed timelines.

Bad: ignore unfollow effects.

  • Why bad: user continues seeing content from accounts they unfollowed.
  • Example: user unfollows account at 10:00, still sees posts at 10:05.
  • How: stale precomputed entries remain unless explicitly filtered.

Good: read-time follow filter.

  • Why good: correctness at display time even if precomputed timeline is stale.
  • Tradeoff: extra check on read path.

Great: background cleanup + read-time safety filter.

  • Why great: cleanup reduces stale data while safety filter guarantees correctness.
  • Example: async worker removes obsolete entries; read path still validates follow edge for zero-leak behavior.
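
The read-time safety filter can be sketched as a final pass over precomputed entries against the current follow set; the entry tuple shape is an illustrative assumption.

```python
# Deep Dive 4 sketch: read-time safety filter for zero-leak behavior.
def safe_timeline(entries, followee_ids):
    """entries: [(scoreTs, tweetId, authorId)] from the precomputed timeline;
    followee_ids: the user's *current* follow set."""
    following = set(followee_ids)
    return [e for e in entries if e[2] in following]

# 'x' was unfollowed but a stale precomputed entry remains; the filter drops it.
entries = [(105, "t5", "a"), (104, "t4", "x"), (103, "t3", "b")]
print(safe_timeline(entries, ["a", "b"]))  # → [(105, 't5', 'a'), (103, 't3', 'b')]
```

The async cleanup worker keeps this filter cheap by shrinking the stale set it has to catch.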

Deep Dive 5: Reliability

Finally, we verify that the feed still works when ranking dependencies degrade.

Bad: ranking service is a hard dependency for every feed read.

  • Why bad: ranking outage takes down home feed availability.
  • Example: ranking feature store latency spike.
  • How:

- feed request waits on ranking call;
- p95 breaches SLA, timeouts increase.

Good: fallback to recency-only feed.

  • Why good: preserves core product behavior even without personalization.
  • Example: disable ranking enrichment toggle during incident.

Great: graceful degradation tiers.

  • Tier 1: full ranking.
  • Tier 2: lightweight ranking with cached features.
  • Tier 3: pure recency.
  • Why great: keeps service available while controlling latency/cost during incidents.
  • Tradeoff: temporary relevance drop in feed quality.

---

7) Common mistakes

  1. Picking only one fanout model for all users.
  2. No handling of celebrity skew.
  3. No cursor design for feed pagination.
  4. Tight coupling between ranking and feed availability.

---

8) Interviewer signals

  • Did you quantify read/write skew?
  • Did you recognize and solve celebrity write amplification?
  • Did you design fallback path when ranking/deep services fail?

---

9) References

  • Fanout strategy overview (industry discussion): https://www.systemdesignsandbox.com/learn/fan-out-strategies
  • Redis sorted set patterns: https://redis.io/learn/howtos/leaderboard

