The Proposal Details
Linux Scalability & Web Server Optimizations: The C10K Problem
With ever-increasing Internet traffic and falling hardware costs, the day is not far off when web servers will be handling tens of thousands of clients simultaneously. This project primarily aims to configure the operating system, together with some network programming, so that web servers can handle such a large number of clients at once.
As stated, this project deals with the idiosyncrasies of various network applications at the level of socket programming. It is therefore advisable to make yourself comfortable with these concepts first; [1] is the right place to start, along with developing a toy networking application such as an FTP server. Developing a toy application helps you get acquainted with the problems faced during development and with the design goals to consider when dealing with network applications.
Designers of networking software have many options. Here are a few:
* Whether and how to issue multiple I/O calls from a single thread.
  o Don't; use blocking/synchronous calls throughout, and possibly use multiple threads or processes to achieve concurrency.
  o Use non-blocking calls to start I/O, and readiness notification to know when it's OK to start the next I/O on that channel.
  o Use asynchronous calls to start I/O, and completion notification to know when the I/O finishes.
* How to control the code servicing each client.
  o One process for each client (classic Unix approach, used since 1980 or so).
  o One OS-level thread handles many clients.
  o One OS-level thread for each client (e.g. classic Java with native threads).
  o One OS-level thread for each active client.
* Whether to use standard O/S services, or put some code into the kernel (e.g. in a custom driver, kernel module, or VxD).
Beyond the design options above, there are many other issues that should be looked into, such as:
* Kernel issues.
* Issues related to file handles (file descriptors).
* Exploring different kinds of servers.
* Library-related issues.
Many more such issues can be found in [2]. The prime goal of this project is to find the optimal configuration, at both the operating system level and the network application level, that minimizes latency while serving clients on the order of tens of thousands. This problem is popularly known as the C10K problem, where C stands for clients and 10K for ten thousand.
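The "non-blocking calls plus readiness notification" option, combined with "one OS-level thread handles many clients", can be sketched roughly as follows. This is a minimal illustration using Python's standard `selectors` and `socket` modules, not a tuned C10K server; the function names `run_echo_server` and `demo`, the echo behaviour, and the loopback address are assumptions made only for this example.

```python
import selectors
import socket
import threading


def run_echo_server(listener, stop_event):
    """Echo data back to every client on `listener` from a single thread."""
    sel = selectors.DefaultSelector()  # epoll/kqueue/select, whichever is available
    listener.setblocking(False)
    sel.register(listener, selectors.EVENT_READ)
    try:
        while not stop_event.is_set():
            # Readiness notification: returns only the sockets that can be read now.
            for key, _events in sel.select(timeout=0.1):
                sock = key.fileobj
                if sock is listener:
                    conn, _addr = listener.accept()
                    conn.setblocking(False)
                    sel.register(conn, selectors.EVENT_READ)
                    continue
                try:
                    data = sock.recv(4096)
                except ConnectionError:
                    data = b""
                if data:
                    # Simplification: a real server would buffer unsent bytes
                    # and wait for EVENT_WRITE instead of calling sendall().
                    sock.sendall(data)
                else:
                    sel.unregister(sock)
                    sock.close()
    finally:
        for key in list(sel.get_map().values()):
            if key.fileobj is not listener:
                key.fileobj.close()
        sel.close()


def demo(n_clients=50):
    """Connect n_clients at once and check that each one gets its echo back."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(128)
    stop = threading.Event()
    server = threading.Thread(target=run_echo_server, args=(listener, stop))
    server.start()
    clients = [socket.create_connection(listener.getsockname(), timeout=5)
               for _ in range(n_clients)]
    ok = True
    for i, client in enumerate(clients):
        msg = f"client {i}".encode()
        client.sendall(msg)
        ok = ok and client.recv(4096) == msg
    for client in clients:
        client.close()
    stop.set()
    server.join()
    listener.close()
    return ok
```

Calling `demo(50)` keeps fifty connections open against one server thread. Pushing the count towards ten thousand is where the file handle and kernel issues listed above start to matter.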
Along with this there is one more interesting problem, the "thundering herd", which also deals with network scalability; [3] covers it in detail. This is more of a research project with no definite deliverables, but completing it properly will require in-depth study of various aspects of networking applications and web servers.
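The general thundering-herd pattern can be illustrated with a small sketch: several workers wait on the same listening socket, one connection arrives, and more than one worker may wake up even though only one can accept it. The example below is an assumption-laden illustration, not a reproduction of anything in [3]: it uses threads and `select()` on a shared non-blocking listener, and the name `herd_demo` and its parameters are hypothetical.

```python
import select
import socket
import threading


def herd_demo(n_workers=4, timeout=1.0):
    """Count how many workers wake up for one incoming connection."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(16)
    listener.setblocking(False)
    stats = {"wakeups": 0, "accepted": 0}
    lock = threading.Lock()

    def worker():
        # Every worker waits for readiness on the same listening socket.
        readable, _, _ = select.select([listener], [], [], timeout)
        if not readable:
            return  # timed out: the connection was already taken
        with lock:
            stats["wakeups"] += 1
        try:
            conn, _addr = listener.accept()
        except BlockingIOError:
            return  # woken up for nothing: another worker won the race
        conn.close()
        with lock:
            stats["accepted"] += 1

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    client = socket.create_connection(listener.getsockname(), timeout=5)
    for t in threads:
        t.join()
    client.close()
    listener.close()
    return stats
```

`herd_demo()` always reports exactly one accepted connection, while the number of wakeups varies from run to run between one and `n_workers`. The wasted wakeups are the cost that grows with the number of waiting workers.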
1. Stevens, W. Richard, UNIX Network Programming, Volume 1: Networking APIs: Sockets and XTI, 2nd ed., Prentice-Hall, Inc., 1998
2. http://www.kegel.com/c10k.html
3. http://www.citi.umich.edu/projects/linux-scalability/reports/accept.html
4. http://www.citi.umich.edu/projects/linux-scalability/