from March 2011

The Many Layers of Caching Drupal


Today's Internet mantra may as well be, “fast is not fast enough.” We are all concerned, some may say obsessed, with our site's performance. With response times measured in milliseconds and penalties for slowness (be they objective like Google's PageRank or subjective like your users' dislike for delays), this focus is warranted and needed. In today's competitive environment, faster performance and more users are the bottom line.

One way to improve your site's performance is through the use of caching. Caching, at a high level, is saving an answer so that it is ready the next time the same question is asked. In the context of the web, it is saving web pages, or pieces of web pages, so they can be delivered to the next user without regenerating them. A typical site's home page may change infrequently, be that measured in minutes or days, so it can be generated once, cached, and then returned quickly to each user. The result of caching pages, from the system administrator's perspective, is reduced server load. More importantly, from the user's perspective the result is faster responses from the website.
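To make "saving an answer" concrete, here is a minimal sketch in Python rather than Drupal's PHP. The names TTLCache, get_or_build and render_home_page are hypothetical and exist only for this illustration. The sketch generates a page once and returns the stored copy until a time limit expires.

```python
import time


class TTLCache:
    """Tiny in-memory cache: each entry expires ttl seconds after it is stored."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get_or_build(self, key, build):
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: nothing is regenerated
        value = build()  # cache miss: do the expensive work once
        self._store[key] = (now + self.ttl, value)
        return value


renders = []  # records each time the page is really generated


def render_home_page():
    renders.append(1)
    return "<html>home</html>"


cache = TTLCache(ttl=300)  # keep the home page for five minutes
for _ in range(1000):  # a thousand visitors ask the same question
    page = cache.get_or_build("/", render_home_page)
```

After the loop, render_home_page has run a single time; the other 999 requests were answered from the stored copy.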

Drupal includes basic caching out of the box. There are also modules that you can use to help you with caching.

One of the most popular caching modules is Boost. If you do not have the ability to add caching layers outside of Drupal, it is a good option and starting point. Boost works by creating static (non-PHP) copies of Drupal pages that can be served more efficiently by the web server. If you are able to add caching layers in front of Drupal's web layer, these can be a better long-term option because they use fewer server resources, enabling your site to scale much further.

Outside of Drupal, there are many additional levels of caching possible. From the user to the server, they are:

  • Browser caching
  • Content Delivery Network (CDN)
  • Reverse caching proxies (Varnish, Squid, Nginx)
  • PHP APC extension
  • PHP memcache extension

All browsers implement a cache of their own which stores local copies of data—images, pages, metadata, etc.—on the user's machine. The browser cache will automatically be used when visiting Drupal sites unless it is explicitly turned off by a no-cache meta tag or header, or voided through the addition of a randomized string to the end of a URL.
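The two controls mentioned above, a cache header and a changing string on the end of a URL, can be sketched as follows. This is an illustration in Python, not Drupal code; the function names and the "v" query parameter are assumptions made for the example, while Cache-Control and its directives are standard HTTP.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def cache_headers(max_age):
    """Return response headers that allow or forbid browser caching."""
    if max_age <= 0:
        # no-cache forces revalidation; no-store forbids keeping a copy.
        return {"Cache-Control": "no-cache, no-store"}
    # The browser may reuse its local copy for max_age seconds.
    return {"Cache-Control": f"public, max-age={max_age}"}


def bust_cache(url, token):
    """Append a changing query parameter so the browser sees a new URL."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("v", token))  # "v" is an arbitrary name chosen here
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, bust_cache("http://example.com/logo.png", "20110301") gives "http://example.com/logo.png?v=20110301", which the browser treats as a resource it has never seen.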

A content delivery network (CDN) is a geographically dispersed network that stores content closer to your users. CDNs offer two distinct advantages for a developer. First, they reduce the load on a server by returning data to the user without requiring the user to interact with the primary server(s). Second, they increase performance and capacity by serving users from the closest geographic location. A typical CDN, for example, will have an East Coast and West Coast U.S. presence, so it can serve users in California from a West Coast datacenter and users in New York from an East Coast datacenter. Unlike the other caching options though, a CDN will always involve an additional financial cost.

The best caching level to implement in terms of bang for your buck is a reverse proxy such as Varnish. Varnish runs on layer 7 of the OSI model, in place of a regular HTTP server such as Apache on port 80, so it is the first “web server” the HTTP requests interact with on your server. Varnish is configured with Varnish Configuration Language (VCL) files that give a developer or administrator very granular control over how to handle the HTTP requests. Using Varnish, you can do everything from caching a specific page that was just listed on Slashdot or Drudge for a certain amount of time, to rewriting bad, old URLs, to embedding C code for handling complex GeoIP functionality or caching algorithms. Varnish is very fast at handling these requests, and can be made even faster if you store your cache in RAM, which you should always do if possible.
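The kind of rule an administrator expresses in VCL can be sketched in Python. This is not VCL and not Varnish's actual behavior; it only illustrates one common decision, namely whether a request may be answered from the cache at all. The function name is hypothetical, and treating any cookie whose name starts with "SESS" as a logged-in Drupal session is an assumption of this sketch.

```python
def is_cacheable(method, cookie_header):
    """Decide whether a caching proxy may answer this request from cache."""
    if method not in ("GET", "HEAD"):
        return False  # requests that change data must reach the backend
    for cookie in cookie_header.split(";"):
        name = cookie.strip().split("=", 1)[0]
        if name.startswith("SESS"):
            return False  # assumed marker of a logged-in session
    return True  # anonymous read: safe to serve a shared copy
```

Under these assumptions an anonymous GET is served from the cache, while a POST or a request carrying a session cookie is passed through to Drupal.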

If a user's request cannot be returned by any of the above caches, then the web server itself must process the request, generate the data, and return it to the user. If PHP needs to access the database or run complex functions, it adds to the resources and time required by PHP. The ideal situation is for a request to be handled without running any PHP. If that is not possible, you want to run as little PHP as possible, and even then preferably opcode cached PHP.

Inside the web server there are several options for caching PHP; the most popular is APC (Alternative PHP Cache), a PHP extension that caches PHP's compiled opcodes. This allows PHP to skip parsing and recompiling its code on every request. APC should be used whenever possible since no code changes are required to take advantage of the performance boost.

Memcached, a distributed memory object caching system, is another level of caching. The Memcache API module makes it easy to integrate Memcached with Drupal and provides some immediate results. The module uses Memcached in place of the cache_* tables in Drupal's database, so cached data is read from memory rather than queried from the database.

If the web browser, CDN, or Varnish doesn't have a cached response for the HTTP request, then Drupal itself will check its cache. If the answer is not there, it will gather all of the pieces of the response, combine them, and reply to the user. This process is more resource intensive than any of the options listed above, so you want to limit these requests as much as possible.
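The fall-through from one layer to the next can be sketched as a simple loop. This is an illustrative Python model, not how any of these products is implemented: each layer is represented by a plain dictionary, and the fetch function and layer names are hypothetical.

```python
def fetch(url, layers, generate):
    """Try each cache layer in order; build the page only if all of them miss.

    layers is a list of (name, dict) pairs ordered from nearest the user
    to nearest the application. Returns (body, name_of_source).
    """
    for index, (name, store) in enumerate(layers):
        if url in store:
            body = store[url]
            # Fill the layers nearer the user that missed on the way in.
            for _, nearer in layers[:index]:
                nearer[url] = body
            return body, name
    body = generate(url)  # most expensive path: a full page build
    for _, store in layers:
        store[url] = body
    return body, "generated"


layers = [("browser", {}), ("cdn", {}), ("varnish", {}), ("drupal", {})]
first = fetch("/node/1", layers, lambda url: f"page for {url}")
second = fetch("/node/1", layers, lambda url: f"page for {url}")
```

The first request misses every layer and is generated; the second is answered by the layer nearest the user without touching anything behind it.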

The number of cached requests a server can handle is much larger than the number of non-cached requests. By adding some or all of these caching levels you simultaneously increase your site’s performance and the number of concurrent users you can support on your site. Drupal 6's caching capabilities were greatly enhanced in the Pressflow distribution, and Drupal 7 has continued the improvements. Both Pressflow and Drupal 7 give you a solid caching foundation, enabling you to serve more users, faster, with fewer resources.

In today's environment, faster performance isn't just a nice-to-have; it's a business requirement.