
Advertising • Advanced • 33 min read

Design an Ad Click and Impression Tracking System

Asked at: Google, Meta, Amazon, Twitter
Tech: Replication, Availability, CDN, Redis, Kafka, SSE

When Counting Is a Billion-Dollar Problem

Every time you see an ad on a website, an impression event fires. Every time you click one, a click event fires. These events look trivial: tiny JSON payloads, maybe 320 bytes each. But multiply that by 130 billion events per day, and you are processing roughly 41 terabytes of raw data daily. At that volume the system must make real-time deduplication decisions at 5 million events per second during peak, join clicks to their originating impressions across a 30-minute attribution window, and filter out bot traffic that could inflate an advertiser's bill by millions. It must also produce two different versions of the truth: a fast provisional count for real-time dashboards and a reconciled final count for billing.

Get the counting wrong by 1%, and at $10 billion in annual ad spend flowing through your platform, that is a $100 million discrepancy. Advertisers dispute. Publishers lose trust. Auditors investigate.
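
The figures above can be checked with a few lines of arithmetic. The sketch below assumes decimal terabytes (10^12 bytes) and, for the average rate, events spread evenly across the day. The variable names are illustrative only.

```python
# Back-of-envelope check of the figures quoted in the text.
EVENTS_PER_DAY = 130_000_000_000      # 130 billion events/day (from the text)
BYTES_PER_EVENT = 320                 # approximate payload size (from the text)
SECONDS_PER_DAY = 86_400
ANNUAL_AD_SPEND_USD = 10_000_000_000  # $10 billion (from the text)
COUNTING_ERROR = 0.01                 # a 1% miscount

# Raw daily volume, in decimal terabytes (assumption: 1 TB = 10**12 bytes).
daily_terabytes = EVENTS_PER_DAY * BYTES_PER_EVENT / 1e12

# Average rate, assuming events are spread evenly over the day.
avg_events_per_sec = EVENTS_PER_DAY / SECONDS_PER_DAY

# Dollar value of a 1% counting error.
billing_discrepancy_usd = ANNUAL_AD_SPEND_USD * COUNTING_ERROR

print(f"raw volume:   {daily_terabytes:.1f} TB/day")           # 41.6 TB/day
print(f"average rate: {avg_events_per_sec:,.0f} events/s")     # 1,504,630 events/s
print(f"1% miscount:  ${billing_discrepancy_usd:,.0f} per year")  # $100,000,000 per year
```

The average works out to about 1.5 million events per second, so the 5 million per second peak is a little over three times the daily average.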

This post walks through designing an ad event tracking system from first principles. We will build progressively: durable event ingestion first, then deduplication, attribution joins, real-time analytics, fraud filtering, and billing reconciliation. Every decision will be driven by the numbers, because in ad tech the numbers are not just engineering constraints; they are the product.

Let us begin.

1. Requirements: What Gets Counted and How

Ad tracking systems serve two masters with fundamentally different needs: product teams want real-time dashboards showing campaign performance right now, and finance teams want billing-grade numbers that can survive an audit. These needs conflict (speed versus correctness), and the architecture must serve both.

Functional Requirements

  1. Record ad impressions and clicks from web, mobile, and server-to-server channels. Publishers send batches of events; the system must ingest them durably at extreme throughput.
  2. Deduplicate retries and preserve attribution. Mobile SDKs retry on timeout. Server-to-server integrations replay on failure. Every duplicate must be suppressed. Every click must link to a prior eligible impression; without this link, the click is unbillable.
  3. Provide near-real-time metrics (impressions, clicks, CTR) sliced by campaign, ad, geo, and device. Dashboard freshness target: under 10 seconds.
  4. Produce billing-grade daily aggregates with invalid-traffic filtering and reconciliation. These numbers go on invoices. They must be versioned, auditable, and reproducible.
  5. Support reprocessing and backfill when fraud models are updated or attribution rules change. If a new model reclassifies 2% of traffic as invalid, the system must recompute affected billing periods without manual intervention.
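
To make requirement 2 concrete, here is a minimal single-process sketch of retry suppression and click-to-impression attribution. It is an illustration under stated assumptions, not the design this post arrives at: the `Event` fields (`event_id` as an idempotency key reused on retry, `impression_id` carried on clicks), the `Tracker` class, and the in-memory sets and dicts are all hypothetical. At the throughput described here, the state would have to live in a distributed store with expiry. The 30-minute window is the one quoted in the introduction.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set

ATTRIBUTION_WINDOW_SECONDS = 30 * 60  # 30-minute window quoted in the text


@dataclass(frozen=True)
class Event:
    event_id: str                        # hypothetical idempotency key, reused on retry
    kind: str                            # "impression" or "click"
    timestamp: float                     # event time, seconds since the epoch
    impression_id: Optional[str] = None  # hypothetical: set on clicks only


class Tracker:
    """In-memory illustration only; a real system needs shared, expiring state."""

    def __init__(self) -> None:
        self.seen_ids: Set[str] = set()          # every event_id accepted so far
        self.impressions: Dict[str, float] = {}  # impression event_id -> timestamp
        self.billable_clicks: List[Event] = []
        self.unbillable_clicks: List[Event] = []

    def ingest(self, event: Event) -> str:
        if event.kind not in ("impression", "click"):
            raise ValueError(f"unknown event kind: {event.kind!r}")

        # Retry suppression: an event_id is counted at most once.
        if event.event_id in self.seen_ids:
            return "duplicate"
        self.seen_ids.add(event.event_id)

        if event.kind == "impression":
            self.impressions[event.event_id] = event.timestamp
            return "impression"

        # Attribution: the click must follow a known impression within the window.
        shown_at = None
        if event.impression_id is not None:
            shown_at = self.impressions.get(event.impression_id)
        if shown_at is not None and 0 <= event.timestamp - shown_at <= ATTRIBUTION_WINDOW_SECONDS:
            self.billable_clicks.append(event)
            return "billable_click"
        self.unbillable_clicks.append(event)
        return "unbillable_click"


tracker = Tracker()
events = [
    Event("i1", "impression", 0.0),
    Event("i1", "impression", 0.0),      # SDK retry of the same impression
    Event("c1", "click", 600.0, "i1"),   # 10 minutes after the impression
    Event("c2", "click", 1801.0, "i1"),  # one second past the 30-minute window
]
print([tracker.ingest(e) for e in events])
# ['impression', 'duplicate', 'billable_click', 'unbillable_click']
```

Note that this sketch marks a click as unbillable if it arrives before its impression has been ingested. A pipeline that tolerates late or out-of-order events would need to buffer or re-evaluate such clicks rather than reject them immediately.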

Non-Functional Requirements

  1. Sustained ingest: 1.5 million events per second. Peak: 5 million events/s during major events (Super Bowl, Black Friday, election nights).
  2. Ingestion availability: 99.99%. Every lost event is lost revenue for a publisher or overbilling for an advertiser.
  3. Data loss under 0.01% for accepted events. Late events tolerated up to 24 hours.
  4. Dashboard freshness under 10 seconds. Billing closure within T+1 (numbers finalized by end of next business day).
  5. 90-day hot retention for raw events; archived to cold storage beyond that.
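
These targets translate into concrete budgets. The sketch below is rough arithmetic under assumptions that are not stated in the list: a 365-day year, the 130 billion events per day and roughly 320 bytes per event quoted in the introduction, every event counted as accepted, and storage measured before any compression or replication. The variable names are illustrative.

```python
# Rough budgets implied by the non-functional requirements.
EVENTS_PER_DAY = 130_000_000_000  # from the introduction
BYTES_PER_EVENT = 320             # from the introduction (approximate)
SUSTAINED_EPS = 1_500_000         # sustained events/s
PEAK_EPS = 5_000_000              # peak events/s
AVAILABILITY = 0.9999             # 99.99% ingestion availability
MAX_LOSS_FRACTION = 0.0001        # 0.01% data loss ceiling
HOT_RETENTION_DAYS = 90

# Downtime allowed per year (assumption: 365-day year).
minutes_per_year = 365 * 24 * 60
downtime_minutes_per_year = minutes_per_year * (1 - AVAILABILITY)

# Events that may be lost per day (assumption: all events count as accepted).
max_lost_events_per_day = EVENTS_PER_DAY * MAX_LOSS_FRACTION

# Hot raw-event storage in decimal petabytes, before compression or replication.
hot_storage_petabytes = EVENTS_PER_DAY * BYTES_PER_EVENT * HOT_RETENTION_DAYS / 1e15

# Headroom the ingest tier needs above its sustained rate.
peak_to_sustained = PEAK_EPS / SUSTAINED_EPS

print(f"downtime budget:  {downtime_minutes_per_year:.1f} min/year")    # 52.6 min/year
print(f"loss budget:      {max_lost_events_per_day:,.0f} events/day")   # 13,000,000 events/day
print(f"hot raw storage:  {hot_storage_petabytes:.2f} PB")              # 3.74 PB
print(f"peak / sustained: {peak_to_sustained:.2f}x")                    # 3.33x
```

Even a 0.01% loss ceiling still permits about 13 million lost events per day at this volume, which is why the loss target and the billing reconciliation requirement have to be read together.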

Scope Control

In scope: ingest API, streaming pipeline, deduplication, attribution join, real-time OLAP serving, billing aggregates, invalid-traffic scoring.

Out of scope: ad auction ranking logic, creative serving CDN, advertiser UI.

Now we need to understand the scale, because at 130 billion events per day, every architectural choice is load-bearing.
