Coding · June 13, 2026 · 9 min read

System Design Interview: A Framework for Any Question

The hardest part of a system design interview isn’t knowing what a load balancer does — it’s the blank page. The prompt is two words (“Design Twitter”), the clock is 45 minutes, and there’s no single correct answer. Unlike a coding round, the interviewer isn’t checking your solution against a key; they’re watching how you think under ambiguity.

The good news: every well-run system design interview follows the same arc. If you internalize one repeatable framework, you can walk into any prompt — a URL shortener, a ride-sharing app, a chat system — and have a confident first move. This post is that framework, step by step. (For a single problem worked end to end, see a full worked example: design a URL shortener.)

What the interviewer is actually grading

Before the framework, know the rubric. A system design interview is a 45–60-minute collaboration, and the interviewer is looking for:

Ambiguity navigation — can you turn “Design YouTube” into concrete, scoped requirements?
Trade-off awareness — can you justify SQL over NoSQL, or push over pull, with reasons?
Breadth — do you know how the standard building blocks (caches, queues, load balancers, databases) fit together?
Communication — do you think out loud, take hints, and keep the conversation moving?

You are not being graded on building a flawless system. You’re being graded on building a defensible one and explaining the trade-offs you made.

The framework: six steps, every time

Follow this order on every prompt. The biggest mistake candidates make is jumping straight to boxes and arrows. Resist it.

Step 1 — Clarify requirements and scope (5–10 min)

Never start designing until you’ve defined the boundaries out loud. Split requirements in two:

Functional — what the system does. For a ride-sharing app: riders request rides, drivers broadcast location, the system matches them.
Non-functional — the constraints: scale, availability, latency, consistency, durability.

Then deliberately cut scope. Say “I’ll focus on the core feed and skip notifications and DMs for now.” Narrowing the problem is a signal of seniority, not a cop-out.

Step 2 — Capacity estimates (5 min)

A few back-of-the-envelope numbers prove your design can carry the load and tell you where the pressure is. Estimate:

Traffic (QPS). 100M daily active users × 10 requests/day ≈ 1B requests/day. A day has 86,400 seconds (round up to ~100,000 for easy math), so that’s roughly 10,000 QPS on average (about 11,600 without rounding). Plan for peaks several times higher.
Storage. 10M photos/day × 1MB ≈ 10TB/day. Multiply out for a year to size your data store.
Read/write ratio. This single number often dictates the whole design. A 100:1 read-heavy system pushes you toward aggressive caching and read replicas.

Say your assumptions out loud and write them down — the interviewer grades the logic, and they can only follow it if you verbalize it.
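To make the arithmetic concrete, here is a minimal Python sketch of the two estimates above. The peak multiplier of 3 is an illustrative assumption, and 1MB is taken as 10^6 bytes.

```python
SECONDS_PER_DAY = 86_400


def average_qps(daily_active_users: int, requests_per_user_per_day: int) -> float:
    """Average requests per second, spread evenly over a full day."""
    return daily_active_users * requests_per_user_per_day / SECONDS_PER_DAY


def storage_per_day_bytes(items_per_day: int, bytes_per_item: int) -> int:
    """Raw bytes written per day, before replication or compression."""
    return items_per_day * bytes_per_item


qps = average_qps(100_000_000, 10)
peak_multiplier = 3  # assumption: peak traffic is ~3x the daily average
print(f"average QPS ~ {qps:,.0f}, planned peak ~ {qps * peak_multiplier:,.0f}")

per_day = storage_per_day_bytes(10_000_000, 1_000_000)  # 10M photos x 1MB
per_year = per_day * 365
print(f"storage ~ {per_day / 1e12:.0f} TB/day, ~ {per_year / 1e15:.2f} PB/year")
```

Dividing by the exact 86,400 gives about 11,600 QPS rather than 10,000. In an interview the order of magnitude is what drives the design, so rounding the divisor is fine as long as you say you did it.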

Step 3 — API design (5 min)

Sketch the handful of endpoints that satisfy your functional requirements. This pins down the contract before you draw infrastructure:

POST /posts — create a post
GET /feed?userId=&cursor= — fetch a paginated timeline

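Call out pagination and idempotency. Prefer cursors over offsets at scale: an offset makes the database skip past every earlier row, and pages shift when new posts arrive, while a cursor resumes from the last item the client saw. Require idempotency keys on write operations that clients may retry. A clean API keeps the rest of the design honest.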
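A minimal in-memory sketch of cursor pagination for the GET /feed endpoint, in Python. It assumes post IDs increase over time (for example, time-ordered IDs) and uses the raw last-seen ID as the cursor; a real API would usually wrap it in an opaque token. The Post type and fetch_feed function are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Post:
    post_id: int      # assumed to increase over time (e.g. a time-ordered ID)
    author_id: int
    text: str


def fetch_feed(posts: list[Post], limit: int,
               cursor: Optional[int] = None) -> tuple[list[Post], Optional[int]]:
    """Return one newest-first page and the cursor for the next page (None when done)."""
    ordered = sorted(posts, key=lambda p: p.post_id, reverse=True)
    if cursor is not None:
        # Resume strictly after the last post the client saw, like
        # WHERE post_id < :cursor ORDER BY post_id DESC LIMIT :limit
        ordered = [p for p in ordered if p.post_id < cursor]
    page = ordered[:limit]
    next_cursor = page[-1].post_id if len(page) == limit else None
    return page, next_cursor


posts = [Post(i, author_id=1, text=f"post {i}") for i in range(1, 8)]
page1, cursor = fetch_feed(posts, limit=3)                 # posts 7, 6, 5
posts.append(Post(8, author_id=2, text="new post"))        # a new post lands on top
page2, cursor = fetch_feed(posts, limit=3, cursor=cursor)  # posts 4, 3, 2: no repeats
```

With offset pagination, the second request (offset 3) would have returned posts 5, 4, 3 after the new post arrived, showing post 5 twice.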

Step 4 — Data model (5 min)

Define the core entities and their relationships, then choose a store because of your access patterns — not by habit:

Relational (PostgreSQL, MySQL) — structured data with relationships, strict ACID guarantees. Right for ledgers, payments, inventory where integrity is non-negotiable.
NoSQL (Cassandra, DynamoDB, MongoDB) — built for horizontal scale and high write velocity, usually trading strict consistency for eventual consistency. Right for feeds, chat, time-series metrics.

State the trade-off explicitly: “I’ll use a relational store for the user/payment data and a wide-column store for the feed, because the feed is write-heavy and tolerates eventual consistency.”
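One way to make “choose by access pattern” concrete is to write the entities down with their keys. The Python sketch below assumes a fan-out-on-write (push) feed and uses a toy in-memory class in place of a real wide-column store; User, FeedEntry, and FeedTable are hypothetical names, not any database’s API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    """Relational side: few writes, integrity matters (lives with payment data)."""
    user_id: int
    handle: str


@dataclass(frozen=True)
class FeedEntry:
    """Wide-column side: one row per (reader, post) in that reader's feed."""
    owner_id: int     # partition key: whose feed this row belongs to
    created_at: int   # clustering key: orders rows within a partition
    post_id: int


class FeedTable:
    """Toy in-memory stand-in for a wide-column table partitioned by owner_id."""

    def __init__(self) -> None:
        self._partitions: defaultdict[int, list[FeedEntry]] = defaultdict(list)

    def append(self, entry: FeedEntry) -> None:
        # A write touches exactly one partition, which keeps writes cheap.
        self._partitions[entry.owner_id].append(entry)

    def latest(self, owner_id: int, limit: int) -> list[FeedEntry]:
        # The hot read ("my newest N items") also hits a single partition.
        # A real store keeps rows sorted by the clustering key; this toy sorts on read.
        rows = self._partitions.get(owner_id, [])
        return sorted(rows, key=lambda e: e.created_at, reverse=True)[:limit]
```

The point of the key choice is that both the heavy write path and the hot read path land on one partition, which is what lets the feed store scale horizontally.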

Step 5 — High-level architecture (10 min)

Now draw the boxes. Start simple: client → API gateway → application servers → database. Confirm it satisfies the functional requirements before adding anything. A monolith that works is a better starting point than a premature microservice mesh.

Step 6 — Scale it (10–15 min)

This is where the interview is won. The interviewer asks “what happens at 100× traffic?” and you walk through the standard scaling levers, introducing each one only when a bottleneck justifies it.

The scaling toolbox

These are the moves you reach for in Step 6. Know the trade-offs, not just the names.

Caching

The fastest query is the one you never make. Cache frequently read data in memory (Redis, Memcached) to cut latency and database load.

Where: browser → CDN (static media, geographically close) → server/API response cache → database query cache.
Strategies: cache-aside (app reads cache, falls back to DB on miss), write-through (write cache + DB together — consistent, slower writes), write-back (write cache, sync to DB async — fast, risks loss on crash).
Eviction: LRU (least recently used — the default answer), LFU, FIFO. Memory is finite; something has to go.
Advanced flex: the thundering herd — when a hot key expires, thousands of concurrent misses stampede the database. Fix with a mutex so only the first request rebuilds the cache, or refresh the key asynchronously before it expires.
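A minimal single-process Python sketch of the read path described above: cache-aside lookups, LRU eviction via OrderedDict, and a per-key lock so that on a miss only one caller rebuilds a hot key while concurrent callers wait and then read the fresh value. load_from_db is a hypothetical stand-in for the database query, and TTL expiry is left out. Across many app servers the mutex would have to be a distributed lock (for example a Redis key set with NX and an expiry), which this sketch does not cover.

```python
import threading
from collections import OrderedDict
from typing import Any, Callable, Hashable


class CacheAside:
    """Cache-aside reads with LRU eviction and per-key locking (single process)."""

    def __init__(self, load_from_db: Callable[[Hashable], Any], capacity: int) -> None:
        self._load = load_from_db                 # hypothetical database query
        self._capacity = capacity
        self._data: OrderedDict = OrderedDict()   # key -> value, least recently used first
        self._lock = threading.Lock()             # guards _data and _key_locks
        self._key_locks: dict = {}                # one rebuild lock per key (never pruned here)

    def _lookup(self, key: Hashable) -> tuple[bool, Any]:
        with self._lock:
            if key in self._data:
                self._data.move_to_end(key)       # mark as most recently used
                return True, self._data[key]
            return False, None

    def _store(self, key: Hashable, value: Any) -> None:
        with self._lock:
            self._data[key] = value
            self._data.move_to_end(key)
            if len(self._data) > self._capacity:
                self._data.popitem(last=False)    # evict the least recently used key

    def get(self, key: Hashable) -> Any:
        hit, value = self._lookup(key)
        if hit:
            return value
        with self._lock:
            key_lock = self._key_locks.setdefault(key, threading.Lock())
        with key_lock:                            # only one caller rebuilds a missing key
            hit, value = self._lookup(key)        # someone may have filled it while we waited
            if hit:
                return value
            value = self._load(key)               # miss: fall back to the database
            self._store(key, value)
            return value
```

The second lookup inside the per-key lock is what stops the stampede: callers that queued behind the first rebuild find the value already cached and never reach the database.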

Load balancing

To scale horizontally (add servers) instead of vertically (bigger server), you need to spread traffic across the fleet. Almost always design for horizontal scaling in an interview — vertical scaling hits a hardware ceiling and leaves a single point of failure.

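Layer 4 balancers route on IP address and port: fast, but with no view of the request’s content. Layer 7 balancers route on HTTP headers and paths: slower, but able to make content-aware decisions, such as sending /api traffic to one server pool and /images to another.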
Mention algorithms: round robin, least connections, IP hashing (useful for sticky sessions).
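The three algorithms named above fit in a few lines each. This Python sketch only picks a server name; health checks, weights, and the actual proxying are left out, and the class and function names are hypothetical.

```python
import hashlib
import itertools


class RoundRobin:
    """Hand requests to servers in a fixed rotation."""

    def __init__(self, servers: list[str]) -> None:
        self._cycle = itertools.cycle(servers)

    def pick(self) -> str:
        return next(self._cycle)


class LeastConnections:
    """Send each request to the server with the fewest open connections."""

    def __init__(self, servers: list[str]) -> None:
        self.active = {server: 0 for server in servers}

    def pick(self) -> str:
        server = min(self.active, key=self.active.__getitem__)  # ties go to the earliest server
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        self.active[server] -= 1   # call when the connection closes


def ip_hash(client_ip: str, servers: list[str]) -> str:
    """Map a client IP to a server deterministically, which gives sticky sessions."""
    # hashlib rather than hash(): str hashes are randomized per Python process.
    digest = hashlib.sha256(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]
```

Because ip_hash uses a modulo, adding a server sends most clients to a different machine, the same hash(key) % N problem covered under sharding.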

Sharding

When one database can’t hold the data or serve the load, partition it:

Vertical partitioning splits by columns (auth data on one store, profiles on another).
Horizontal partitioning (sharding) splits by rows (user IDs 1 to 1M on shard A, 1M + 1 to 2M on shard B).

Watch for hot spots — shard by user ID and a single celebrity can swamp one shard. And don’t shard with naive hash(key) % N: adding a server changes N and forces a near-total reshuffle. Consistent hashing maps servers and keys onto a ring so adding or removing a node remaps only a small slice of keys.
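The difference is easy to measure. This Python sketch builds a hash ring with virtual nodes (the vnodes parameter, 100 here, is an illustrative choice) and compares how many of 10,000 hypothetical user keys move when a fifth node joins. Going from 4 to 5 nodes, hash % N keeps a key in place only when hash % 4 equals hash % 5, which holds for 4 of every 20 residues, so about 80% of keys move; on the ring, only the keys claimed by the new node (roughly a fifth) should move.

```python
import bisect
import hashlib


def stable_hash(value: str) -> int:
    """64-bit hash that is the same in every process (unlike the built-in hash())."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class ConsistentHashRing:
    """Hash ring with virtual nodes: a key belongs to the next node point clockwise."""

    def __init__(self, nodes: list[str], vnodes: int = 100) -> None:
        self._vnodes = vnodes
        self._ring: list[tuple[int, str]] = []   # sorted (point, node) pairs
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        for i in range(self._vnodes):
            bisect.insort(self._ring, (stable_hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [(point, n) for point, n in self._ring if n != node]

    def node_for(self, key: str) -> str:
        idx = bisect.bisect_left(self._ring, (stable_hash(key), ""))  # first point >= key hash
        return self._ring[idx % len(self._ring)][1]                  # wrap past the end


def modulo_node(key: str, nodes: list[str]) -> str:
    """The naive scheme: hash(key) % N."""
    return nodes[stable_hash(key) % len(nodes)]


keys = [f"user:{i}" for i in range(10_000)]
old_nodes = ["db1", "db2", "db3", "db4"]
new_nodes = old_nodes + ["db5"]

moved_modulo = sum(modulo_node(k, old_nodes) != modulo_node(k, new_nodes) for k in keys)

ring = ConsistentHashRing(old_nodes)
before = {k: ring.node_for(k) for k in keys}
ring.add("db5")
moved_ring = sum(before[k] != ring.node_for(k) for k in keys)

print(f"modulo moved {moved_modulo / len(keys):.0%} of keys")
print(f"ring moved {moved_ring / len(keys):.0%} of keys")
```

Virtual nodes give each physical server many small arcs of the ring instead of one large one, which evens out the load and spreads a departing node’s keys across all survivors.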

Asynchronous processing with queues

For slow work (video encoding, email blasts, PDF generation), don’t make the user wait. Drop a job on a message queue (Kafka, RabbitMQ, SQS), return success immediately, and let background workers process jobs at their own pace. This decouples components, smooths traffic spikes, and adds fault tolerance: if a worker dies before acknowledging a message, the broker keeps it so another worker can pick it up.
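A minimal Python sketch of that flow, using the standard library’s in-process queue.Queue and worker threads. The in-memory queue only stands in for a real broker: unlike Kafka, RabbitMQ, or SQS, it does not survive a crash or redeliver unacknowledged jobs. handle_upload and encode_video are hypothetical placeholders.

```python
import queue
import threading

jobs: queue.Queue = queue.Queue()   # stand-in for a broker such as Kafka, RabbitMQ, or SQS
encoded: list[str] = []
encoded_lock = threading.Lock()


def handle_upload(video_id: str) -> str:
    """Request handler: enqueue the slow work and answer right away."""
    jobs.put(video_id)
    return "202 Accepted"


def encode_video(video_id: str) -> None:
    """Hypothetical slow task (video encoding in the text)."""
    with encoded_lock:
        encoded.append(video_id)


def worker() -> None:
    """Background worker: pull jobs until it receives the None shutdown sentinel."""
    while True:
        video_id = jobs.get()
        try:
            if video_id is None:
                return
            encode_video(video_id)
        finally:
            jobs.task_done()        # mark the job finished so jobs.join() can return


workers = [threading.Thread(target=worker) for _ in range(3)]
for w in workers:
    w.start()
responses = [handle_upload(f"video-{i}") for i in range(10)]
jobs.join()                         # block until every queued job is processed
for _ in workers:
    jobs.put(None)                  # one shutdown sentinel per worker
for w in workers:
    w.join()
```

The handler returns before any encoding happens; adding workers raises throughput without touching the request path.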

Rate limiting and the API gateway

The gateway is your single entry point — auth, routing, SSL termination, and rate limiting to fend off abuse and runaway clients. Know a couple of algorithms: token bucket (tokens refill at a fixed rate; empty bucket drops the request) and leaky bucket (requests drain at a constant rate, smoothing bursts).
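The token bucket, as a minimal single-process Python sketch. The injectable clock parameter is an assumption added so the refill logic can be tested without sleeping. A gateway would keep one bucket per client key (API key or IP), and with several gateway nodes the counters would need to live in a shared store; this class is also not thread-safe.

```python
import time
from typing import Callable


class TokenBucket:
    """Token-bucket limiter: `rate` tokens are added per second, up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity             # start full so an initial burst is allowed
        self._clock = clock
        self._last = clock()

    def allow(self) -> bool:
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)  # refill
        if self.tokens >= 1:
            self.tokens -= 1               # spend one token on this request
            return True
        return False                       # bucket empty: reject (e.g. HTTP 429)
```

The capacity sets the largest burst a client can send at once, while the rate sets the sustained throughput. A leaky bucket instead queues requests and drains them at a constant rate, so it smooths bursts rather than permitting them.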

The trade-offs that separate seniors

Components are table stakes. For senior and staff roles, you’re expected to reason about the constraints behind them.

CAP theorem

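In a distributed store you can guarantee at most two of three properties: Consistency, Availability, and Partition tolerance. Network partitions are unavoidable on real networks, so you must keep P. The real choice is what the system does while a partition lasts: stay consistent (C) or stay available (A):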

CP (return an error rather than stale data): banking, ledgers.
AP (always answer, even if slightly stale): social feeds, shopping carts.

Monolith vs. microservices

A monolith is simple to build, test, and deploy, with no internal network hops — but it scales as one lump and one bad module can take down everything. Microservices scale independently and isolate failures, at the cost of real operational complexity: network latency, distributed transactions, heavier CI/CD. Don’t reach for microservices unless the scale justifies them.

Idempotency

If you design payments or checkout, this is mandatory. A dropped success response makes the client retry — and without protection, the user is charged twice. Have the client send a unique idempotency key; the server checks a fast cache for it and returns the cached result instead of reprocessing.
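A minimal single-process Python sketch of that check. The dict stands in for a fast shared store, the charges counter stands in for a hypothetical payment-provider call, and PaymentService is an illustrative name. The key detail is that checking for the key and recording the result must happen atomically; otherwise two concurrent retries could both charge.

```python
import threading
import uuid


class PaymentService:
    """Charges at most once per idempotency key (single-process sketch)."""

    def __init__(self) -> None:
        self._results: dict[str, str] = {}   # stand-in for a fast shared store such as Redis
        self._lock = threading.Lock()
        self.charges = 0                      # how many times money actually moved

    def charge(self, idempotency_key: str, amount_cents: int) -> str:
        # Check-and-record must be atomic, or two concurrent retries could both charge.
        # One global lock does that here but serializes every payment; a shared store
        # would use an atomic set-if-absent instead (e.g. Redis SET with NX) plus a TTL.
        with self._lock:
            if idempotency_key in self._results:
                return self._results[idempotency_key]   # retry: replay the stored result
            self.charges += 1                           # hypothetical payment-provider call
            receipt = f"charged {amount_cents} cents, receipt {uuid.uuid4()}"
            self._results[idempotency_key] = receipt
            return receipt


service = PaymentService()
key = str(uuid.uuid4())               # generated once by the client per checkout attempt
first = service.charge(key, 4999)
retry = service.charge(key, 4999)     # the success response was lost, so the client retries
```

The retry gets back the original receipt and the counter stays at one, which is exactly the “charged twice” failure the paragraph describes, prevented.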

Worked example and practice

A framework only sticks once you run it on a real prompt. Walk through the full URL shortener design to see all six steps applied end to end — requirements through sharding — on a single problem. Then drill the fundamentals with system design interview questions, and shore up the building blocks with DBMS and operating system question sets.

Five habits that win the room

State assumptions out loud. “I’m assuming 100:1 read-to-write.” The interviewer grades logic they can hear.
Take the hints. “Are you sure a relational DB is best here?” isn’t a gotcha — it’s a nudge. Pause and re-evaluate.
Don’t over-engineer. Start simple, add complexity only when a bottleneck demands it. A multi-region Kafka mesh for a URL shortener gets you penalized.
Keep talking while you draw. Going silent for five minutes loses the room. Narrate your thinking.
Rehearse out loud. A mock interview or even talking to yourself surfaces the gaps a silent read never will.

There’s no perfect system — every choice introduces a new bottleneck, failure mode, or cost. The goal isn’t a flawless design; it’s a defensible one you can reason about under pressure. Clarify, estimate, design the API and data model, draw the architecture, then scale the bottlenecks. That sequence works on any prompt they throw at you.

When the prompt goes sideways

A framework gets you through the questions you expect. The hard moments are the follow-ups you didn’t — an edge case in your sharding scheme, a consistency question you didn’t rehearse. That’s where a real-time copilot earns its place: NostrobeAI listens to the question and drafts a clear, structured answer on your screen — invisible on Zoom, Google Meet, and Microsoft Teams — with one-time pricing instead of a subscription. See how it stacks up against other AI interview tools, and walk into your next system design interview with the framework and a backstop.