System Design June 5, 2026 · 8 min read

System Design: How to Design a URL Shortener

“Design a URL shortener” (think Bitly or TinyURL) is one of the most common system design interview questions because it’s small enough to finish in 45 minutes yet rich enough to test the fundamentals: scale estimation, key generation, caching, sharding, and trade-offs. Here is a complete, structured walkthrough you can adapt.

In one sentence: a distributed URL shortener uses Base62-encoded keys, a sharded relational store for the mappings, a large cache plus a CDN for ultra-fast redirects, and an async pipeline for analytics — tuned for a read-heavy workload of billions of URLs and millions of redirects per second.

1. Functional requirements

  • Shorten a long URL into a short ~7-character Base62 code (e.g. bitly.com/abc123d).
  • Redirect a short code to its original URL.
  • Track analytics: clicks, referrers, devices, and geography.
  • Support custom aliases (e.g. bitly.com/mycode).
  • User accounts with a URL-management dashboard.
  • An API for bulk shortening and analytics queries.

2. Non-functional requirements

Requirement           Target
Read latency          < 100 ms (p99), served from cache
Write latency         < 500 ms (p99)
Availability          99.99% (multi-region failover)
Durability            99.999% (replicated storage)
Read:write ratio      ~1000:1 (mostly redirects)
Shorten throughput    ~10K QPS
Redirect throughput   ~10M QPS
5-year scale          100B+ URLs

The single most important takeaway: this is an overwhelmingly read-heavy system, so the whole design optimizes the redirect path.

3. Capacity estimation

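Writes (shortening). A sustained ~10K new URLs/sec is roughly 315 billion per year, or about 1.6 trillion over five years. Size for that upper bound; the 100B+ figure in the requirements is the floor if 10K QPS is only reached at peak.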

Reads (redirects). With a 1000:1 read:write ratio, redirects run at ~10M QPS.

Storage. Each record is ~200 bytes (a 7-byte code, a long URL that is capped at 2 KB but usually far shorter, metadata, and index overhead). At 1.6T URLs that’s ~320 TB raw, roughly 960 TB with 3× replication. The analytics log is separate and far larger: at ~100 bytes per redirect and 10M redirects/sec it grows by ~1 GB/s, which is about 86 TB per day or roughly 2.6 PB per month, so you archive cold data to object storage after a year.
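The estimates above are easy to sanity-check in a few lines. This is a minimal sketch that only plugs in the figures quoted in this section (10K writes/sec, 1000:1 ratio, ~200-byte records, ~100-byte click events, 3× replication); the variable names are illustrative. Using the unrounded URL count, raw storage lands slightly under the rounded ~320 TB.

```python
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = SECONDS_PER_DAY * 365

# Inputs taken from the requirements and estimates in this article.
write_qps = 10_000
read_write_ratio = 1_000
record_bytes = 200
click_event_bytes = 100
replication_factor = 3

# Writes and reads.
urls_per_year = write_qps * SECONDS_PER_YEAR
urls_5y = urls_per_year * 5
read_qps = write_qps * read_write_ratio

# Mapping storage (decimal terabytes).
raw_storage_tb = urls_5y * record_bytes / 1e12
replicated_storage_tb = raw_storage_tb * replication_factor

# Analytics log growth.
analytics_bytes_per_sec = read_qps * click_event_bytes
analytics_tb_per_day = analytics_bytes_per_sec * SECONDS_PER_DAY / 1e12
analytics_pb_per_month = analytics_tb_per_day * 30 / 1e3

print(f"URLs per year:        {urls_per_year:,}")
print(f"URLs over 5 years:    {urls_5y:,}")
print(f"Redirect QPS:         {read_qps:,}")
print(f"Raw storage:          {raw_storage_tb:.0f} TB")
print(f"Replicated storage:   {replicated_storage_tb:.0f} TB")
print(f"Analytics per day:    {analytics_tb_per_day:.1f} TB")
print(f"Analytics per month:  {analytics_pb_per_month:.2f} PB")
```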

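Cache. By the 80/20 rule, ~20% of URLs serve ~80% of traffic. Holding the hot set (code → URL pointers) needs on the order of 16 TB of RAM: 20% of 1.6T URLs is ~320B entries, which budgets about 50 bytes per entry. Shard that across 50–100 cache nodes, or roughly 160–320 GB per node.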
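The cache budget can be unpacked the same way. The sketch below assumes the rounded 1.6T total and the 16 TB figure from the paragraph above, and derives the per-entry and per-node budgets they imply; `ram_per_node_gb` is a hypothetical helper name.

```python
total_urls = 1_600_000_000_000      # rounded 5-year estimate
hot_percent = 20                    # 80/20 rule: 20% of URLs are "hot"
cache_budget_bytes = 16 * 10**12    # ~16 TB of RAM (decimal)

# Integer math avoids floating-point noise in the entry count.
hot_entries = total_urls * hot_percent // 100
bytes_per_entry = cache_budget_bytes / hot_entries


def ram_per_node_gb(nodes: int) -> float:
    """RAM each cache node must hold if the budget is spread evenly."""
    return cache_budget_bytes / nodes / 1e9


print(f"Hot entries:     {hot_entries:,}")
print(f"Bytes per entry: {bytes_per_entry:.0f}")
for nodes in (50, 100):
    print(f"{nodes} nodes -> {ram_per_node_gb(nodes):.0f} GB each")
```

A 50-byte entry is tight, so in an interview it is worth saying out loud that caching full long URLs for the whole hot set would need a larger budget or a smaller hot set.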

4. High-level architecture

A client hits the CDN; on a miss the load balancer routes to the redirect service, which checks the cache (a ~99% hit) and returns a 302 in under 100 ms. Shortening goes to a separate write service that generates a unique code via a distributed ID generator, persists it to a sharded database, and publishes an event to a queue for asynchronous analytics.

[Figure: URL shortener system architecture. Client to CDN, load balancer, shorten and redirect services, distributed ID generator, Redis cache, sharded MySQL with read replicas, Kafka queue, analytics service and store, and object storage.]

5. Core components

CDN (Cloudflare / Akamai). Caches redirects at the edge and serves them from the nearest point of presence, cutting redirect latency from ~200 ms to under 50 ms.

Load balancer (NGINX / HAProxy). Routes shorten requests to the write service and redirect requests to the read-optimized service, since the two paths have very different latency targets.

Shorten service (Go or Node). Accepts a long URL, obtains a numeric ID, encodes it to Base62, persists the mapping, and returns the short URL. It’s stateless and scales horizontally. Slow DB writes (~100–200 ms) are acceptable because the client can absorb that latency.
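The "encode it to Base62" step is small enough to show in full. This is a minimal sketch, not a production codec: the alphabet order (digits, then lowercase, then uppercase) is an assumption, and any fixed ordering of the 62 symbols works as long as encoder and decoder agree.

```python
import string

# Assumed alphabet order: 0-9, a-z, A-Z (62 symbols).
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)


def encode_base62(n: int) -> str:
    """Encode a non-negative integer ID as a Base62 string."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, rem = divmod(n, BASE)
        chars.append(ALPHABET[rem])
    # Digits come out least-significant first, so reverse them.
    return "".join(reversed(chars))


def decode_base62(code: str) -> int:
    """Inverse of encode_base62; raises ValueError on an invalid character."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n
```

Small IDs produce codes shorter than seven characters; left-padding with the zero symbol is one option if a fixed width matters.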

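Redirect service (Go). Takes a short code, looks it up (almost always a cache hit), and returns a 302. This path is effectively I/O-bound, since nearly all of its time goes to the network hop to the cache, so a runtime with cheap concurrency like Go handles very high throughput per machine, and its concurrent garbage collector keeps pauses short.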

Redis cache (code → URL). Holds the hot set so redirects avoid the database entirely. A sub-millisecond lookup replaces a 10–50 ms query and cuts database load by ~100×.

Sharded MySQL + read replicas. The source of truth. A relational store gives you a UNIQUE constraint on the short code (collision protection) and ACID guarantees, while read replicas absorb cache-miss reads.

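Distributed ID generator. Hands each service instance a reserved range of unique numeric IDs, so there’s no global counter bottleneck and IDs stay dense enough to fit in 7 Base62 characters. A Snowflake-style scheme (timestamp + machine + sequence) also avoids coordination, but its 64-bit IDs encode to as many as 11 Base62 characters, so it only fits if longer codes are acceptable.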
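Range reservation can be pictured with two small classes. This is a sketch under stated assumptions: `RangeAllocator` is a hypothetical in-process stand-in for whatever central coordinator hands out blocks (in production it would be a durable, shared counter), and `LocalIdGenerator` is what each service instance would hold.

```python
import threading


class RangeAllocator:
    """Hypothetical stand-in for the central coordinator that reserves ID blocks."""

    def __init__(self, block_size: int = 1000, start: int = 0) -> None:
        self._next = start
        self._block_size = block_size
        self._lock = threading.Lock()

    def reserve(self) -> range:
        """Atomically reserve the next block of IDs."""
        with self._lock:
            start = self._next
            self._next += self._block_size
        return range(start, start + self._block_size)


class LocalIdGenerator:
    """Per-instance generator: touches the allocator only when its block runs out."""

    def __init__(self, allocator: RangeAllocator) -> None:
        self._allocator = allocator
        self._block = iter(())  # empty until the first reservation
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            try:
                return next(self._block)
            except StopIteration:
                self._block = iter(self._allocator.reserve())
                return next(self._block)
```

With a block size of 1000, an instance contacts the coordinator once per thousand shortens. The trade-off is that IDs left in a block are lost if the instance crashes, which wastes a little of the keyspace but never produces duplicates.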

Message queue (Kafka). Buffers shorten/redirect events so the hot path never waits on analytics. It decouples writes from aggregation and lets you replay events.
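The decoupling idea can be shown without a broker. The sketch below uses Python's standard `queue.Queue` as an in-process stand-in for a Kafka topic, which is purely an illustration: it has none of Kafka's durability or replay. Dropping events when the buffer is full is also an assumption of this sketch, chosen so the hot path never blocks.

```python
import queue
import threading
from collections import Counter

# In-process stand-in for a Kafka topic (assumption: bounded, non-durable).
events = queue.Queue(maxsize=10_000)
click_counts = Counter()
_STOP = None  # sentinel that tells the worker to exit


def record_click(short_code: str) -> bool:
    """Hot path: enqueue and return immediately; never wait on analytics."""
    try:
        events.put_nowait({"type": "click", "code": short_code})
        return True
    except queue.Full:
        # Assumption: losing a click is preferable to slowing a redirect.
        return False


def analytics_worker() -> None:
    """Consumer: aggregates events off the hot path until it sees the sentinel."""
    while True:
        event = events.get()
        if event is _STOP:
            break
        click_counts[event["code"]] += 1
```

To try it, start `analytics_worker` in a `threading.Thread`, call `record_click` a few times, put `_STOP` on the queue, and join the thread; `click_counts` then holds the per-code totals.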

Analytics store (Cassandra / ClickHouse). A write-heavy, time-series store for clicks, referrers, geo, and device data, with cheap compression and easy horizontal scaling.

6. Key flows

Write path (shorten). The client posts a long URL → the load balancer routes to the shorten service → it requests a numeric ID, encodes it to Base62, inserts the row (the UNIQUE constraint on short_code rejects a duplicate, so no separate existence check is needed), writes the mapping to the cache with a TTL, and publishes an event to Kafka → it returns the short URL. A consumer later writes the event to the analytics store.

Read path (redirect). The client requests a short code → the CDN serves it from the edge on a hit; otherwise it forwards to the load balancer → the redirect service checks the cache (a ~99% hit); on a miss it reads a replica and populates the cache → it emits an async click event and responds with a 302 carrying the long URL in the Location header → the browser follows it.
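The origin side of the read path is a cache-aside lookup. In this minimal sketch, plain dictionaries stand in for Redis and a MySQL read replica, a list stands in for the event queue, and `RedirectService` and its fields are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class RedirectService:
    cache: dict = field(default_factory=dict)         # stand-in for Redis
    replica: dict = field(default_factory=dict)       # stand-in for a read replica
    click_events: list = field(default_factory=list)  # stand-in for the event queue

    def resolve(self, short_code: str) -> tuple:
        """Return (status, headers) for a redirect request."""
        long_url = self.cache.get(short_code)
        if long_url is None:
            # Cache miss: fall back to the replica and repopulate the cache.
            long_url = self.replica.get(short_code)
            if long_url is None:
                return 404, {}
            self.cache[short_code] = long_url
        # Record the click without doing any analytics work inline.
        self.click_events.append(short_code)
        return 302, {"Location": long_url}


service = RedirectService(replica={"abc123d": "https://example.com/a/long/path"})
print(service.resolve("abc123d"))  # first call misses the cache, then fills it
print(service.resolve("abc123d"))  # second call is served from the cache
```

Unknown codes return 404 here without being cached; caching negative results is a separate decision that this sketch leaves out.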

7. Database design

The mapping table in MySQL:

CREATE TABLE urls (
  id           BIGINT PRIMARY KEY,            -- from the distributed ID generator, not AUTO_INCREMENT
  short_code   VARCHAR(7)  NOT NULL UNIQUE,   -- Base62 code; UNIQUE already creates the lookup index
  long_url     VARCHAR(2048) NOT NULL,
  user_id      BIGINT NOT NULL,
  custom_alias VARCHAR(100) UNIQUE,           -- optional vanity code (UNIQUE is enforced per shard)
  expires_at   TIMESTAMP NULL,                -- optional expiry
  created_at   TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  INDEX idx_user_id (user_id),
  INDEX idx_created_at (created_at)
);

Shard by a hash of short_code so a lookup goes straight to the owning shard. Keep the schema lean — pushing title/description/tags into this table bloats writes; store extra metadata separately.
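Routing a lookup to its shard takes one hash and one modulo. The sketch below assumes 100 shards (the starting point suggested later in this article) and uses SHA-256 only because it is stable across processes and machines; Python's built-in `hash()` is randomized per process for strings and would route the same code differently on each server. `shard_for` is a hypothetical helper name.

```python
import hashlib

NUM_SHARDS = 100  # assumed starting shard count


def shard_for(short_code: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a short code to a shard index using a process-independent hash."""
    digest = hashlib.sha256(short_code.encode("utf-8")).digest()
    # The first 8 bytes are plenty of entropy for picking a shard.
    return int.from_bytes(digest[:8], "big") % num_shards


for code in ("abc123d", "mycode", "0000001"):
    print(code, "->", shard_for(code))
```

Plain modulo routing remaps most keys when the shard count changes, so growing the shard count needs a migration plan or a level of indirection such as logical shards mapped onto physical hosts.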

The analytics table in Cassandra is modeled for time-series reads:

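CREATE TABLE clicks (
  short_code TEXT,        -- partition key, part 1
  day        DATE,        -- partition key, part 2: a daily bucket bounds partition size
  ts         TIMEUUID,    -- clustering key, newest first; unique per click
  ip         TEXT,
  referrer   TEXT,
  country    TEXT,
  device     TEXT,
  PRIMARY KEY ((short_code, day), ts)
) WITH CLUSTERING ORDER BY (ts DESC);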

Cassandra fits because the workload is write-heavy, time-series, needs no joins, and tolerates eventual consistency. MySQL fits the mapping because uniqueness and ACID matter there.

8. Scalability

Scale horizontally at every layer. Shard the database (start at ~100 shards, grow toward ~1000), with each shard backed by replicas that absorb reads. Scale the stateless shorten and redirect services by adding instances. Shard the cache by code hash and add nodes as the hot set grows. Let the CDN absorb the bulk of redirect traffic so the origin sees only a fraction.
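The article does not say how codes map to cache nodes. Consistent hashing is one common choice, and it is an assumption here rather than part of the design above. Its appeal is that adding a node moves only the keys that the new node takes over. A minimal ring with virtual nodes might look like this (node names are made up):

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Stable 64-bit hash of a string."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring; each node owns several points (virtual nodes)."""

    def __init__(self, nodes, vnodes: int = 100) -> None:
        self._vnodes = vnodes
        self._ring = []  # sorted list of (point, node)
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        """Return the node owning the first ring point at or after the key's hash."""
        idx = bisect.bisect_left(self._ring, (_hash(key), ""))
        if idx == len(self._ring):
            idx = 0  # wrap around the ring
        return self._ring[idx][1]


ring = HashRing(["cache-1", "cache-2", "cache-3"])
print(ring.node_for("abc123d"))
```

After `ring.add("cache-4")`, every key that changes owner now belongs to `cache-4`; keys never shuffle between the existing nodes, so most of the warm cache survives a scale-out.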

Bottleneck            Symptom                 Mitigation
Cache eviction        Hit rate drops          Add cache nodes; raise TTL
MySQL write latency   Shorten requests slow   Shard further; batch writes
Replication lag       Stale reads             Add replicas; dedicate an analytics read tier
CDN origin load       Origin QPS spikes       Scale the stateless redirect tier
ID exhaustion         Codes collide           Widen the numeric ID space

9. Trade-offs worth naming out loud

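301 vs 302. A 301 (permanent) is cached by the browser, so repeat visits skip your servers: great for load, but you lose per-click analytics. A 302 (temporary) is not cached by browsers by default, so every click reaches your infrastructure, giving full analytics at the cost of more traffic. Most shorteners choose 302 because analytics is a core feature. The CDN can still absorb most of the load if you explicitly allow it to cache the redirect for a short time, but clicks served at the edge then have to be counted from CDN logs rather than at the origin.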

Base62 vs UUID. Seven Base62 characters yield 62^7 ≈ 3.5 trillion codes — short and shareable. A UUID is 36 characters; only reach for it when your infrastructure already standardizes on it.
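The keyspace numbers are worth checking against the write rate. This sketch computes the Base62 keyspace for a given length, the shortest code length that covers a given number of IDs, and how long seven characters last at a sustained 10K new URLs/sec; the helper names are illustrative.

```python
BASE = 62
SECONDS_PER_YEAR = 86_400 * 365


def keyspace(length: int) -> int:
    """Number of distinct Base62 codes of exactly this length."""
    return BASE ** length


def min_code_length(n_ids: int) -> int:
    """Shortest code length whose keyspace covers n_ids distinct IDs."""
    length = 1
    while keyspace(length) < n_ids:
        length += 1
    return length


write_qps = 10_000
years_until_exhausted = keyspace(7) / (write_qps * SECONDS_PER_YEAR)

print(f"62^7 = {keyspace(7):,}")
print(f"Length for 1.6T URLs: {min_code_length(1_600_000_000_000)}")
print(f"Length for 63-bit IDs: {min_code_length(2**63)}")
print(f"Years at 10K/sec:      {years_until_exhausted:.1f}")
```

Seven characters cover the five-year estimate with room to spare, but only when IDs are allocated densely; sparse 63-bit IDs need eleven characters.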

Hashing vs ID-based codes. Hashing the long URL invites collisions and collapses duplicate URLs into one code (breaking per-user analytics and expiry). Generating a unique ID and encoding it avoids both, with the DB UNIQUE constraint as a final safety net.

Cache invalidation. When a URL is deleted or expires, stale cache entries must go. Combine TTL expiry with active invalidation (publish a delete event so a worker clears the cache and DB), so correctness holds even if one path fails.
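The two invalidation paths can sit side by side in one small class. This is a sketch, not a Redis client: `TTLCache` is a hypothetical in-memory cache with lazy expiry, a dictionary stands in for the database, and `handle_delete_event` plays the role of the worker that consumes delete events.

```python
import time


class TTLCache:
    """In-memory cache with per-entry expiry plus explicit invalidation."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock    # injectable so expiry can be tested without sleeping
        self._entries = {}     # code -> (long_url, expires_at)

    def set(self, code: str, long_url: str) -> None:
        self._entries[code] = (long_url, self._clock() + self._ttl)

    def get(self, code: str):
        entry = self._entries.get(code)
        if entry is None:
            return None
        long_url, expires_at = entry
        if self._clock() >= expires_at:
            # Passive path: the entry aged out, so drop it lazily on read.
            del self._entries[code]
            return None
        return long_url

    def invalidate(self, code: str) -> None:
        # Active path: remove immediately; a no-op if the entry is already gone.
        self._entries.pop(code, None)


def handle_delete_event(code: str, cache: TTLCache, db: dict) -> None:
    """Worker for a delete event: remove the row, then clear the cache entry."""
    db.pop(code, None)
    cache.invalidate(code)
```

Deleting the row before clearing the cache means a concurrent cache miss cannot reload the entry from the primary. Replication lag on read replicas can still resurface it briefly, and the TTL is the backstop that bounds how long such an entry lives.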

Multi-region. Keep a primary region with read replicas in others; on failure, promote a replica. Asynchronous cross-region replication keeps write latency low and accepts a small recovery-point window.

10. Wrap-up

This design prioritizes redirect latency through caching and a CDN, accepts slower writes because the 1000:1 read:write ratio makes redirects the bottleneck, and scales horizontally at every layer to handle ~10M QPS and 100B+ URLs over five years. The repeatable shape — clarify, estimate, design the core, then deepen and defend trade-offs — generalizes to nearly every system design prompt.


The fastest way to internalize this is to talk through it out loud against a timer. A structured copilot like NostrobeAI can prompt the next step — requirements, estimate, core design, trade-offs — in real time while you build the muscle.
