What is load balancing?
A load balancer distributes incoming requests across multiple servers so no single server is overwhelmed. It is fundamental to scaling horizontally and to staying available: if one server fails, the balancer routes new requests to the remaining servers.
Common algorithms include round robin (rotate through servers), least connections (send to the server with the fewest active connections), and hashing on a key such as the client IP so the same client keeps landing on the same server (stickiness). Load balancers operate at the transport layer (L4, routing by IP/port) or the application layer (L7, routing by URL or headers), and they typically run health checks to stop sending traffic to unhealthy servers.
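The three selection strategies and the health-check filter can be sketched in a few lines. This is a toy in-memory model, not how any particular balancer is implemented: the `Balancer` class, its method names, and the server names are all hypothetical, and connection counts are set by hand rather than tracked from real traffic.

```python
import hashlib
from itertools import count


class Balancer:
    """Toy in-memory balancer (hypothetical names, illustration only)."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)             # updated by health checks
        self.active = {s: 0 for s in self.servers}   # open connections per server
        self._ticket = count()                       # round-robin counter

    def _pool(self):
        # Only servers that passed their last health check receive traffic.
        pool = [s for s in self.servers if s in self.healthy]
        if not pool:
            raise RuntimeError("no healthy servers")
        return pool

    def mark(self, server, ok):
        """Record the result of a health check."""
        if ok:
            self.healthy.add(server)
        else:
            self.healthy.discard(server)

    def round_robin(self):
        # Rotate through the healthy servers in order.
        pool = self._pool()
        return pool[next(self._ticket) % len(pool)]

    def least_connections(self):
        # Pick the healthy server with the fewest open connections.
        return min(self._pool(), key=lambda s: self.active[s])

    def ip_hash(self, client_ip):
        # Same client IP maps to the same server while the pool is unchanged.
        pool = self._pool()
        digest = hashlib.sha256(client_ip.encode()).digest()
        return pool[int.from_bytes(digest[:8], "big") % len(pool)]
```

Note that the plain modulo hash used here remaps many clients whenever the healthy pool changes size; consistent hashing is a common way to reduce that churn.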
In system design, a load balancer sits in front of your stateless app tier; you keep the app stateless (sessions in a shared store) so any server can handle any request. A strong answer mentions the balancer also enables zero-downtime deploys (take servers out of rotation one at a time while they are updated) and pairs with autoscaling to handle traffic spikes.
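To make the "stateless app, shared session store" idea concrete, here is a minimal sketch. `SessionStore` is a hypothetical in-process stand-in for what would normally be a networked store shared by all servers, and `AppServer` and `add_to_cart` are invented for illustration.

```python
class SessionStore:
    """Stand-in for a shared external session store (assumption: in-process dict)."""

    def __init__(self):
        self._data = {}

    def get(self, session_id):
        return self._data.get(session_id, {})

    def put(self, session_id, session):
        self._data[session_id] = session


class AppServer:
    """An app server that keeps no per-user state of its own."""

    def __init__(self, name, store):
        self.name = name
        self.store = store  # shared by every server behind the balancer

    def add_to_cart(self, session_id, item):
        # Read the session, update it, and write it back to the shared store.
        session = self.store.get(session_id)
        cart = session.get("cart", []) + [item]
        self.store.put(session_id, {**session, "cart": cart})
        return cart


store = SessionStore()
app1, app2 = AppServer("app-1", store), AppServer("app-2", store)

app1.add_to_cart("s1", "book")        # first request lands on app-1
print(app2.add_to_cart("s1", "pen"))  # next one lands on app-2: ['book', 'pen']
```

Because neither server holds the cart itself, the balancer is free to send each request to any healthy server, and a server can be removed or added without losing sessions.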