Infrastructure • Intermediate • 36 min read

Design a Rate Limiter

Asked at: Google, Amazon, Stripe, Cloudflare, Netflix
Tech: Sharding, Consistency, Availability, Pub/sub, API gateway, Rate limiting

The Protector That Must Outrun the Flood

Every system has a breaking point. A database that handles 10,000 queries per second collapses at 50,000. An API server that serves 200 ms responses at normal load starts timing out at 3× traffic. The rate limiter exists to prevent that collapse: it stands at the gate and decides, for every single request, whether to let it through or turn it away.

Here is the paradox: the system protecting you from overload must itself handle more load than anything it protects. If your API processes 200,000 requests per second on average and peaks at 1 million, the rate limiter must make 1 million decisions per second, each in under 5 milliseconds, without becoming the bottleneck it was designed to prevent. If the limiter is slow, every request is slow. If the limiter is down, you choose between two bad options: let everything through (no protection) or block everything (total outage).

This post walks through designing a distributed rate limiter from first principles. We will start with the algorithm (how to decide allow or deny), build the storage layer (where to keep state), add distributed coordination (how to enforce limits across multiple servers), and harden for failure (what happens when the limiter itself breaks). Every decision will be driven by the numbers: 1 million decisions per second, 5 ms latency budget, 10 million active keys.
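As a quick sanity check on those headline figures, the arithmetic below derives three numbers from them. It is only a rough sketch: the constant names are illustrative, the in-flight estimate applies Little's law while treating the 5 ms budget as the time every decision takes (an upper-bound assumption), and the per-key figure is a plain average that says nothing about hot keys.

```python
# Figures quoted in the text above; the names are illustrative.
AVG_RPS = 200_000          # average requests per second
PEAK_RPS = 1_000_000       # peak requests (and limiter decisions) per second
LATENCY_BUDGET_MS = 5      # per-decision latency budget
ACTIVE_KEYS = 10_000_000   # keys with live limiter state

# How far the peak sits above the average load.
peak_to_avg = PEAK_RPS / AVG_RPS

# Little's law: in-flight work = arrival rate x time in system.
# Assumes every decision takes the full 5 ms, so this is an upper-bound estimate.
in_flight_at_peak = PEAK_RPS * LATENCY_BUDGET_MS // 1000

# Average decision rate per active key at peak (real traffic is far more skewed).
avg_decisions_per_key = PEAK_RPS / ACTIVE_KEYS

print(f"peak/average ratio:      {peak_to_avg:.1f}x")
print(f"in-flight decisions:     {in_flight_at_peak}")
print(f"avg decisions/s per key: {avg_decisions_per_key:.1f}")
```

Under these assumptions the limiter holds a few thousand decisions in flight at peak, while the average key is touched only about once every ten seconds.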

Let us begin.

1. Requirements: What the Limiter Must Enforce

A rate limiter is infrastructure, not a product. It sits in the request path, typically in the API gateway or as middleware, and makes a binary decision for every request: allow or deny. The decision must be fast, accurate, and consistent across all servers handling traffic for the same client.

Functional Requirements

  1. Enforce per-key rate limits. Each request is identified by a key (a user ID, an API key, an IP address, or a composite), and the limiter enforces a configured maximum request rate for that key. Different keys can have different limits.
  2. Support both burst tolerance and sustained-rate control. Real clients are bursty. A mobile app loading a screen might fire 15 requests in 200 milliseconds, then go quiet. The limiter should allow short bursts while enforcing an average rate over time, so that a legitimate page load is not punished.
  3. Return allow/deny decisions in real time. Every request needs a decision before it reaches the downstream service. The decision must include enough metadata (remaining budget, reset time) for clients to implement intelligent backoff.
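To make the three requirements concrete, here is a minimal single-process sketch using a token bucket, one common way to combine burst tolerance with a sustained rate. It is not necessarily the algorithm this post settles on, and every name in it (`Decision`, `TokenBucket`, `KeyedLimiter`, `retry_after_s`) is hypothetical. Time is passed in explicitly so the behavior is easy to follow.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    allowed: bool
    remaining: int        # whole requests still available right now
    retry_after_s: float  # seconds until the next request could pass (0.0 if allowed)


class TokenBucket:
    """One bucket per key: `burst` caps the spike, `rate_per_s` sets the sustained rate."""

    def __init__(self, rate_per_s: float, burst: int, now: float):
        self.rate_per_s = rate_per_s
        self.burst = burst
        self.tokens = float(burst)  # start full so a first burst is allowed
        self.last_seen = now

    def check(self, now: float) -> Decision:
        # Refill for the time elapsed since the last decision, capped at the burst size.
        elapsed = max(0.0, now - self.last_seen)
        self.tokens = min(float(self.burst), self.tokens + elapsed * self.rate_per_s)
        self.last_seen = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return Decision(True, int(self.tokens), 0.0)
        # Denied: report how long until one whole token has accumulated.
        return Decision(False, 0, (1.0 - self.tokens) / self.rate_per_s)


class KeyedLimiter:
    """Per-key limits: keys without an override use the default policy."""

    def __init__(self, default_rate_per_s: float, default_burst: int):
        self.default = (default_rate_per_s, default_burst)
        self.overrides = {}  # key -> (rate_per_s, burst) for keys with custom limits
        self.buckets = {}    # key -> TokenBucket, created on first use

    def check(self, key: str, now: float) -> Decision:
        bucket = self.buckets.get(key)
        if bucket is None:
            rate, burst = self.overrides.get(key, self.default)
            bucket = self.buckets[key] = TokenBucket(rate, burst, now)
        return bucket.check(now)


limiter = KeyedLimiter(default_rate_per_s=5.0, default_burst=15)
page_load = [limiter.check("user:1", now=0.0) for _ in range(15)]
extra = limiter.check("user:1", now=0.0)
```

With a burst size of 15, the 15-request page load from the example passes in full, while the 16th request at the same instant is denied and carries a `retry_after_s` value the client can use for backoff. This sketch keeps state in local memory only; enforcing the same limit across several servers is a separate problem.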

Non-Functional Requirements

  1. Decision latency p95 under 5 ms on the server side. The limiter adds overhead to every request. At 5 ms, it is invisible inside a 200 ms API response. At 50 ms, it doubles the latency of an endpoint that would otherwise respond in 50 ms.
  2. High availability. The limiter must not be a single point of failure. If the limiter goes down, the system must fall back to a defined behavior rather than an undefined one.
  3. Accuracy close to configured policy. If the policy says 100 requests per minute, actual enforcement should stay within a small margin of that figure, not drift to 200 because of distributed counting errors.
  4. Horizontal scalability to 1 million decisions per second. The limiter must scale with the traffic it protects.
  5. Tenant isolation. One noisy client hammering the limiter should not degrade decision quality or latency for other clients.
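The "defined fallback behavior" in the availability requirement can be sketched as an explicit policy wrapped around the limiter call. This is an illustrative sketch only: `Fallback`, `decide`, and the `check` callable are hypothetical names, and `check` stands in for whatever call reaches the limiter's state store.

```python
from enum import Enum
from typing import Callable


class Fallback(Enum):
    FAIL_OPEN = "fail_open"      # limiter unavailable: let requests through (no protection)
    FAIL_CLOSED = "fail_closed"  # limiter unavailable: reject requests (outage for clients)


def decide(check: Callable[[str], bool], key: str, fallback: Fallback) -> bool:
    """Return True to allow the request. `check` is assumed to raise if the limiter is down."""
    try:
        return check(key)
    except Exception:
        # The limiter itself failed, so apply the configured policy instead of guessing.
        return fallback is Fallback.FAIL_OPEN


def broken_store(key: str) -> bool:
    # Stand-in for a limiter backend that cannot be reached.
    raise ConnectionError("limiter state store unreachable")


allowed_when_open = decide(broken_store, "user:1", Fallback.FAIL_OPEN)
allowed_when_closed = decide(broken_store, "user:1", Fallback.FAIL_CLOSED)
```

Which policy fits depends on what sits behind the limiter: failing open keeps clients working but removes protection, and failing closed protects the backend at the cost of rejecting legitimate traffic.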

Scope Control

In scope: rate limiting algorithms, distributed state management, failure modes, deployment topology.

Out of scope: billing integration, user-facing analytics dashboards, long-term usage reporting.

Now that we know what the limiter must do, we need to understand the scale it must handle. These numbers will determine whether a simple counter suffices or we need a distributed, sharded state store.
