The Proposal Details
Intelligent Cache management for Squid Proxy Server
Caching is not traditionally a function of proxy servers, but it is becoming an increasingly common and important feature. Performance improves because the contents of an accessed location are cached, so subsequent requests for the same location are served from the cache. Proxy servers such as Squid already include caching software, but it needs improvement to achieve higher caching efficiency. The current implementation of Squid decides when to refresh an object or purge it from disk based on very few things, chiefly the timestamp; the expires field in Squid's metadata is effectively unused. When an object is retrieved successfully, Squid checks whether it is private or public. The principal logic is to "save everything that comes in and see later whether that was effective". Once an object is in the cache, it cannot be selectively expired and thrown out. The cache manager has very little control over how objects are purged when space is needed, and the default LRU algorithm does not take site-specific preferences into account. These problems create a strong need for intelligent, transparent cache management. The main goal of the project is to patch the Squid proxy server so that it caches more efficiently.
Squid patch for optimizing cache performance: When an object is first fetched, estimate the penalty of the fetch (size, time needed, delay, hop count to the source) and the cost of retrieving it, using a pattern-configurable billing weight that may take the hop count into account. During reuse (normal operation), a pure hit increases the average hit rate and the hit count: the object has once more been worth keeping, and it has conserved bandwidth, time, and thus some money. A refresh or reload decreases or resets the object's usefulness and implicitly reduces the average hit rate and the money conserved. The last reference time is saved independently of the last refresh time to keep track of the object's useful reuse count. Usefulness can drop below 0, which is logically quite possible: consider objects that expire every 5 minutes (max-age of 5) and are requested no more than once every 30 minutes. Every request for such an object finds it already expired, so it is cheaper not to use disk space for it at all.
Squid patch for public/private data cache storage: While caching, an object must be checked to see whether it is private or public, and it must be stored only if it is public. In addition, a list of email sites and other private sites can be maintained, with Squid configured through access control lists not to cache them.
Content aliasing patch for Squid proxy server: Squid has a content aliasing problem: it cannot recognise duplicate content served from different URLs (mirrors), so it may download content that already exists in its cache under a different URL. This wastes bandwidth and cache space. When content is cached, it must be checked for presence under another URL, but in a way that does not increase overhead. This can be done by storing MD5 hash values of the page contents and changing Squid's cache policy to be based on hash values rather than URLs.
Exclusive deletion of objects from cache: The administrator must have a facility to delete specific objects from the cache. Otherwise, if a page is updated before its Squid cache timeout has occurred, Squid keeps presenting the old page to the end user. Deletion can be done by searching for the objects in the cache spool and explicitly deleting them and their references in an automated fashion.