Real-Time Messaging • Intermediate • 45 min read

Design a WhatsApp-Style Chat System

Asked at: Meta, Google, Microsoft, Slack
Tech: Horizontal scaling, Sharding, Replication, Consistency, CDN, Rate limiting

Why Chat Is Harder Than It Looks

Sending a message feels instant. You type "hello," tap send, and it appears on your friend's screen. Behind that simplicity is one of the hardest real-time systems to build correctly.

The challenge is not sending a single message. It is sending 463,000 messages per second at peak while guaranteeing that every message is delivered exactly once even when phones go offline, that two people in the same group chat always see messages in the same order, and that a user can switch between phone, tablet, and web browser without missing a single message.

This post walks through designing a WhatsApp-style chat system from first principles. Every decision, from the database to the message bus to the delivery protocol, will be grounded in specific research (RFCs, official docs) and justified by capacity math. We will build progressively: a minimal working chat first, then real-time fanout, offline recovery, and media handling, adding complexity only when a measurable problem demands it.

If you are reading this for the first time, follow the story in order. The requirements tell us what to build. The capacity math tells us how much pressure the system faces. The entities and APIs define the vocabulary. The high-level design assembles the machinery. And the deep dives break things on purpose to see how the design recovers.

Let us begin.

1. Requirements: Defining the Conversation

Before choosing any technology, we need to agree on exactly what this chat system does and, just as importantly, what it does not do. Chat systems are feature-rich products, and trying to design for everything at once guarantees you design nothing well.

Functional Requirements

Our system needs to support four core user actions:

  1. Create chats. Users can start one-on-one conversations and small group chats (up to 256 members, matching the group limit WhatsApp used for years; we will use this number later when calculating write amplification).
  2. Send and receive messages in near real-time. When both sender and recipient are online, messages should appear almost instantly.
  3. Offline message recovery. When a user is offline and comes back, they must see every message they missed: no gaps, no losses. This is the hardest requirement to get right.
  4. Media messages. Users can send images, videos, and documents. The system handles the metadata; the binary blobs live elsewhere.
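
To make "no gaps, no losses" concrete, here is a minimal client-side sketch of gap detection. It assumes the server assigns each chat a strictly increasing sequence number starting at 1. That assumption, and the names `ChatCursor`, `receive`, and `missing`, are illustrative and not taken from this design.

```python
from dataclasses import dataclass, field


@dataclass
class ChatCursor:
    """Tracks the highest contiguous sequence number seen in one chat.

    Assumption: the server numbers messages 1, 2, 3, ... per chat.
    """
    last_seq: int = 0
    # Messages that arrived ahead of a gap, keyed by sequence number.
    pending: dict = field(default_factory=dict)

    def receive(self, seq: int, body: str) -> list:
        """Accept one message; return the bodies now deliverable in order."""
        if seq <= self.last_seq or seq in self.pending:
            return []  # duplicate: already delivered or already buffered
        self.pending[seq] = body
        ready = []
        # Drain the buffer while the next expected number is available.
        while self.last_seq + 1 in self.pending:
            self.last_seq += 1
            ready.append(self.pending.pop(self.last_seq))
        return ready

    def missing(self) -> list:
        """Sequence numbers the client would ask the server to resend."""
        if not self.pending:
            return []
        upper = max(self.pending)
        return [s for s in range(self.last_seq + 1, upper) if s not in self.pending]
```

A client that receives message 3 before message 2 holds it back, reports 2 as missing, and releases both in order once 2 arrives. Duplicates are dropped, which is one way the client side can support the "exactly once" goal.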

Non-Functional Requirements

  1. Message delivery p95 under 500 ms for online, in-region recipients. Half a second feels instant in a conversation. Anything slower feels broken.
  2. Durable message storage. Once the server acknowledges a message, it must not be lost, whether during server crashes, deployments, or datacenter failovers.
  3. Scale to ~463,000 messages per second at peak with hundreds of millions of concurrent connections.
  4. Tolerate partial failures. A single chat server crashing must not lose messages or leave users in a broken state.
  5. Basic abuse protection. Rate limits prevent a single user or bot from degrading the system for everyone.
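
The rate-limit requirement can be illustrated with a token bucket, one common technique for this kind of protection. This is a sketch only: the text does not say which algorithm the design uses, and the class name, parameters, and injectable clock are assumptions made for illustration.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows bursts up to `capacity`, refilled at `rate_per_sec` tokens per second."""

    def __init__(self, rate_per_sec: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.clock = clock          # injectable so the limiter can be tested
        self.tokens = capacity      # start full: a new user may burst immediately
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True and spend `cost` tokens if the request is within the limit."""
        now = self.clock()
        elapsed = now - self.updated
        # Refill for the time that has passed, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

In a deployment of this kind there would typically be one bucket per user or per connection, for example `TokenBucket(rate_per_sec=2, capacity=3)` to allow a burst of three messages and two per second after that. Those numbers are placeholders, not limits from this design.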

Scope Control

In scope: text messaging, media metadata, offline sync, message ordering, delivery acknowledgment, multi-device support.

Out of scope: end-to-end encryption protocol details, voice/video calling, spam ML models, stories/status features.

Now that we know what we are building, the next question is: how much load does this system actually face? The answer will determine whether we need one database or a distributed fleet, whether we can afford per-recipient writes or need a cursor-based model, and whether our message bus needs 16 partitions or 128.
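
As a rough preview of why the per-recipient versus cursor question matters, the sketch below compares write rates under two simplified models. The peak rate and the group cap come from the requirements above. The function names and the assumption that a cursor model stores each message once per chat are illustrative, and the real traffic mix between one-on-one and group chats is not given here.

```python
PEAK_MESSAGES_PER_SEC = 463_000  # peak rate from the non-functional requirements
MAX_GROUP_SIZE = 256             # group cap from the functional requirements


def per_recipient_writes(messages_per_sec: int, recipients_per_message: int) -> int:
    """Writes per second if every message is copied into each recipient's inbox."""
    return messages_per_sec * recipients_per_message


def cursor_model_writes(messages_per_sec: int) -> int:
    """Writes per second if each message is stored once and readers track a cursor.

    Simplification: cursor updates are ignored because they are not per message.
    """
    return messages_per_sec


# Best case: every message goes to exactly one other person.
one_to_one = per_recipient_writes(PEAK_MESSAGES_PER_SEC, 1)

# Worst case bound: every message goes to a full group (sender excluded).
full_groups = per_recipient_writes(PEAK_MESSAGES_PER_SEC, MAX_GROUP_SIZE - 1)

print(f"one-to-one only:  {one_to_one:,} writes/s")
print(f"full groups only: {full_groups:,} writes/s")
print(f"cursor model:     {cursor_model_writes(PEAK_MESSAGES_PER_SEC):,} writes/s")
```

The full-group figure is an upper bound rather than a forecast. It shows that per-recipient writes can grow by a factor of up to 255 over the message rate, while the simplified cursor model stays flat.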
