Scalability bottlenecks

The C10K problem

Web servers were not able to handle more than 10,000 concurrent connections

With each request that is currently active in the system, one thread is hogged up and competes for resources like CPU and memory with all other threads
The increased number of threads leads to increased context switching
Networking problems: Each new packet would have to iterate through all the 10K processes in the kernel to figure out which thread should handle the packet.
The select/poll constructs needed a linear scan to figure out what file descriptors had an event.

Based on this learning, web servers based on an entirely different programming model (called the eventing model) got designed. Nginx is an example:

It is designed for high performance and extensibility. It consists of a limited set of worker processes (usually one per CPU core) that route requests using non-blocking event-driven I/O (using the non-blocking provisions of the native kernel such as epoll and kqueue )
There are a static number of processes, typically one for each core
Each process handles a large number of concurrent connections using asynchronous I/O and without spawning separate threads. They manage a set of file descriptors (FDs)(one for each request) and handles events on these FDs using the efficient epoll system call.

Caches expires at the same time, resulting in a lot of simultaneous requests to the backend

One way to avoid this is to slightly randomize the cache TTLs