From March 2011

Design Patterns of Drupal


"Good programmers are lazy," goes the saying. That doesn't mean that good programmers avoid doing their job; it means they avoid doing work they don't need to do. One of the best ways to do that is not to solve two separate problems, but to find their common elements and solve them together, once. This is usually called "code reuse", and it is a staple of any software professional's repertoire.

Even more valuable than code reuse, however, is concept reuse. There are certain problems that come up over and over in software development, regardless of the language or platform, and a great way to be productively lazy is to avoid reinventing a solution each time. In fact, software engineers have developed a common vocabulary for concept reuse: design patterns.

What is a Design Pattern?

A design pattern is not runnable code; it is a conceptual template for a given set of problems. Suppose you have a set of values and you need to apply the same logic to all of them. If you naturally jumped to "ah, I need an array and I need to iterate over it with a foreach() loop", then you're on the right track. Software design patterns are the same concept, writ large. Understanding common design patterns can save the time you would spend devising a solution, and can often give you a better solution than you would have come up with on your own.

On the flip side are anti-patterns. Where a design pattern is a template for a good, common approach to a problem space, an anti-pattern is a very common but still dumb approach to a problem. Knowing common anti-patterns is also important, as it helps head off stupid ideas before they get written.

Although design patterns are usually described in object-oriented terms, the concepts behind them are fairly universal. Drupal actually leverages a number of common design patterns, as well as a few idiomatic ones of its own. Let's look through Drupal 7 for some design patterns, and at how we can leverage the same concepts in our own code.

Observers, visitors, and peeping toms

Observer and Visitor are two of the most common design patterns discussed, and are extremely important. We mention them together as they are very similar concepts.

In the traditional Observer pattern, one object, called a subject, maintains a list of other objects, called observers, that it notifies when something happens (for some definition of "something"). That allows those observer objects to react to that event in some way. Most importantly, any object with the proper interface can observe another object. That means you can extend the behavior of a system without modifying existing code, which is one of the hallmarks of a good design.

Implemented in PHP, a trivial example might look like this:

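interface Observer { public function speak(); }
interface Observable { public function addObserver(Observer $o); }

class Subject implements Observable {
  protected $observers = array();
  public function addObserver(Observer $o) {
    $this->observers[] = $o;
  }
  public function sayHello() {
    // Notify every registered observer.
    foreach ($this->observers as $o) {
      $o->speak();
    }
  }
}

class France implements Observer {
  public function speak() {
    print "Bonjour!";
  }
}

$s = new Subject();
$s->addObserver(new France());
$s->sayHello(); // prints "Bonjour!"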

With this setup, we can add an unlimited number of actions that happen when the sayHello() method is called. This is the essence of event-driven programming; something happens and it causes some arbitrary number of other things to happen, which can be configured without rewriting the whole system.

The Visitor pattern leverages the same concept, but is used to extend the behavior of the subject rather than simply respond to events. In essence, you pass a visitor object to a subject and tell it "run this on yourself". In the simplest case, we would take the example above and have sayHello() pass $this to the France object, which in turn would call methods on the subject to change its state. Visitor is especially common in single-inheritance languages (such as PHP and Java) as a way to add functionality to an object at runtime, or at least to work around the language's inability to do so natively.
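A minimal sketch of Visitor, using hypothetical `Visitor`, `Greeting`, and `WorldVisitor` names that are not part of Drupal: the subject's `accept()` method hands `$this` to the visitor, which then changes the subject's state through its public methods.

```php
<?php
// Hypothetical Visitor sketch: the subject passes itself to the visitor,
// and the visitor modifies the subject through the subject's public API.
interface Visitor {
  public function visit(Greeting $subject);
}

class Greeting {
  protected $words = array('Hello');

  public function addWord($word) {
    $this->words[] = $word;
  }

  public function getText() {
    return implode(' ', $this->words);
  }

  // "Run this on yourself": hand $this to the visitor.
  public function accept(Visitor $v) {
    $v->visit($this);
  }
}

class WorldVisitor implements Visitor {
  public function visit(Greeting $subject) {
    $subject->addWord('world');
  }
}

$g = new Greeting();
$g->accept(new WorldVisitor());
print $g->getText(); // prints "Hello world"
```

The Greeting class never needed to know about WorldVisitor; any number of visitors can add behavior without touching it.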

Drupal is actually built on the twin patterns of Observer and Visitor, although it doesn't call them by name. Instead, it calls them hooks. hook_node_load(), hook_user_login(), and so forth are, in essence, observers on nodes and users. hook_form_alter(), hook_node_view(), and so forth are, in essence, visitors. Because Drupal does not differentiate between the two, some hooks could arguably be considered examples of both patterns, but the principle is the same. Rather than an active registration process as above, though, Drupal relies on magic function naming to "register" an observer or visitor. That makes sense in PHP, where we would otherwise have to actively re-register every hook on every page request.
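The naming convention can be sketched in a few lines. `example_invoke_all()` and the `france`, `germany`, and `spain` "modules" are hypothetical; Drupal's own module_invoke_all() also consults a cached list of implementations, so treat this only as an illustration of the idea that defining a correctly named function is the registration.

```php
<?php
// Hypothetical sketch: any function named MODULE_HOOK counts as an
// implementation of HOOK. No explicit registration call is needed.
function example_invoke_all(array $modules, $hook, array $args = array()) {
  $results = array();
  foreach ($modules as $module) {
    $function = $module . '_' . $hook;
    // The function's existence is the "registration".
    if (function_exists($function)) {
      $results[$module] = call_user_func_array($function, $args);
    }
  }
  return $results;
}

// Two hypothetical modules implementing a hypothetical "hello" hook.
function france_hello() {
  return 'Bonjour!';
}

function germany_hello() {
  return 'Guten Tag!';
}

// 'spain' defines no spain_hello(), so it is silently skipped.
$results = example_invoke_all(array('france', 'germany', 'spain'), 'hello');
print implode(' ', $results); // prints "Bonjour! Guten Tag!"
```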

Factories and commands

Drupal 7's new database layer implements a number of common design patterns in a more traditional way, as it is primarily an object-oriented system. Two in particular are worth mentioning, as they serve as an example for how Drupal's increasingly complex systems can be simplified: Factories and Command objects.

There are several variations on the Factory pattern, but they all boil down to the same idea: one object, a client, asks another object, a factory, for an appropriate implementation of a piece of logic, but doesn't care which implementation it gets. That's up to the factory to decide. Consider the db_insert() function, the important bits of which are shown below:

function db_insert($table, array $options = array()) {
  // ...
  return Database::getConnection($options['target'])->insert($table, $options);
}

The Database::getConnection() method is a factory that returns a connection object appropriate for this site. Many factors determine which object to return, such as whether we're dealing with a MySQL or SQLite database or whether a slave server is available. As the caller, we don't care. We just want "the right connection object", which the factory builds for us. (Actually, it keeps an index of connections and just returns the right one, but close enough. It's just a pattern, right?)
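To make the idea concrete, here is a hypothetical `ConnectionFactory` (not Drupal's Database class; the class names and configuration format are assumptions) that picks a connection class from a simple configuration array and hands back the same object on repeated calls.

```php
<?php
// Hypothetical factory sketch; not Drupal's actual Database class.
interface Connection {
  public function driver();
}

class MysqlConnection implements Connection {
  public function driver() { return 'mysql'; }
}

class SqliteConnection implements Connection {
  public function driver() { return 'sqlite'; }
}

class ConnectionFactory {
  protected $config;
  protected $connections = array();

  public function __construct(array $config) {
    // Maps target names to driver names, e.g. array('default' => 'mysql').
    $this->config = $config;
  }

  public function getConnection($target = 'default') {
    // Fall back to the default target if the requested one is not configured.
    if (!isset($this->config[$target])) {
      $target = 'default';
    }
    // Build each connection once, then keep returning the same object.
    if (!isset($this->connections[$target])) {
      $class = ($this->config[$target] == 'sqlite') ? 'SqliteConnection' : 'MysqlConnection';
      $this->connections[$target] = new $class();
    }
    return $this->connections[$target];
  }
}

$factory = new ConnectionFactory(array('default' => 'mysql', 'archive' => 'sqlite'));
$conn = $factory->getConnection('archive');
print $conn->driver(); // prints "sqlite"
```

The caller only depends on the Connection interface; which class comes back is entirely the factory's decision.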

We actually have a second factory here, too. The connection object's insert() method doesn't run an insert query. It creates a brand new InsertQuery object and returns it to us. Actually, it could return an instance of InsertQuery_mysql or InsertQuery_pgsql or any other subclass of InsertQuery; again, as the caller we are letting the factory decide what the appropriate object to return is. This is an excellent way to provide pluggable systems; as long as all possible objects we could get back have the same interface (ideally defined by the interface language construct in PHP), we don't care which class it is or what logic it has internally. Factories can also take care of complex object configuration for us, vastly improving the usability of an API.

InsertQuery itself is an example of the Command pattern. In the Command pattern, the process of "doing something" is wrapped up into an object. We then create a new instance of that object, configure it, and execute it, causing the action to take place.

Why not just do the action in the first place? There are many reasons. For one, if the "configuration" is complex, it can be far easier to specify how to run the action as a series of method calls that validate each configuration instruction than to pass a half-dozen parameters into a function, or to define a giant, difficult-to-document, difficult-to-debug array and hope we didn't make a typo somewhere. That is the case here: depending on the options, InsertQuery can insert one or many rows into a database table, and there are a number of ways to specify which rows to insert. The usability for the developer is far better with a command object than with a big, convoluted array.
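A stripped-down command object might look like the following. `ExampleInsertQuery` is hypothetical and only assembles SQL text with placeholders, so the sketch runs without a database; the point is the shape of the pattern: construct, configure through validating method calls, then execute.

```php
<?php
// Hypothetical command object, loosely inspired by InsertQuery.
// It builds SQL text instead of talking to a real database.
class ExampleInsertQuery {
  protected $table;
  protected $fields = array();
  protected $rows = array();

  public function __construct($table) {
    $this->table = $table;
  }

  // Configure the columns; input is validated as soon as it arrives.
  public function fields(array $fields) {
    if (empty($fields)) {
      throw new InvalidArgumentException('At least one field is required.');
    }
    $this->fields = $fields;
    return $this; // Returning $this allows method chaining.
  }

  // Add one row of values, matched by position to the fields.
  public function values(array $values) {
    if (count($values) != count($this->fields)) {
      throw new InvalidArgumentException('Value count does not match field count.');
    }
    $this->rows[] = $values;
    return $this;
  }

  // Running the command: here it just returns the SQL it would execute.
  public function execute() {
    $groups = array();
    foreach ($this->rows as $row) {
      $groups[] = '(' . implode(', ', array_fill(0, count($row), '?')) . ')';
    }
    return 'INSERT INTO ' . $this->table . ' (' . implode(', ', $this->fields) . ') VALUES ' . implode(', ', $groups);
  }
}

$query = new ExampleInsertQuery('example');
$sql = $query->fields(array('id', 'name'))
  ->values(array(1, 'One'))
  ->values(array(2, 'Two'))
  ->execute();
print $sql; // prints "INSERT INTO example (id, name) VALUES (?, ?), (?, ?)"
```

A typo such as a missing value is caught at the values() call that caused it, rather than somewhere deep inside a function processing a large array.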

Another benefit is flexibility. We can create a series of command objects without executing them and save them for later, or we can allow other systems to modify the command before it's run using the Observer or Visitor patterns above. In fact, the SelectQuery object, another command object, does exactly that: it passes itself through the hook_query_alter() visitor hook to allow other systems to modify the query before it's run.
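Letting other code modify a command before it runs can be sketched like this. `ExampleSelectQuery` and its static list of alter callbacks are assumptions made for illustration; Drupal itself finds hook_query_alter() implementations by function name rather than keeping a registry like this one.

```php
<?php
// Hypothetical sketch: a select-style command that lets registered
// "alter" callbacks visit and modify it just before it runs.
class ExampleSelectQuery {
  protected $table;
  protected $conditions = array();
  protected static $alterers = array();

  public function __construct($table) {
    $this->table = $table;
  }

  public static function addAlterer($callback) {
    self::$alterers[] = $callback;
  }

  public function condition($field, $value) {
    // var_export() quoting is for illustration only; real queries use placeholders.
    $this->conditions[] = $field . ' = ' . var_export($value, TRUE);
    return $this;
  }

  public function execute() {
    // Give every alterer a chance to change the query first.
    foreach (self::$alterers as $alter) {
      call_user_func($alter, $this);
    }
    $sql = 'SELECT * FROM ' . $this->table;
    if ($this->conditions) {
      $sql .= ' WHERE ' . implode(' AND ', $this->conditions);
    }
    return $sql;
  }
}

// A hypothetical access-control module that hides unpublished rows.
ExampleSelectQuery::addAlterer(function ($query) {
  $query->condition('status', 1);
});

$query = new ExampleSelectQuery('node');
$sql = $query->condition('type', 'page')->execute();
print $sql; // prints "SELECT * FROM node WHERE type = 'page' AND status = 1"
```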

Another benefit of command objects, one that Drupal does not currently leverage, is encapsulation of behavior. Once a command object is run we still have it, and while it executes it can record whatever we want within the object. One common use case is an Undo operation. A command object can know exactly what steps it took, and thereby what steps to un-take. That makes an undo command quite simple: ensure that all related actions (like SQL queries) are controlled by the command object, record in the object how to reverse them (if possible), and then keep the object around as long as you care to so you can call $command->undo() on it if necessary.
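Since Drupal does not do this, the following is purely a hypothetical sketch: `SetValueCommand` records the prior state of an in-memory store (an ArrayObject standing in for a database table) so that `undo()` can reverse exactly what `execute()` did.

```php
<?php
// Hypothetical undoable command; an ArrayObject stands in for a table.
class SetValueCommand {
  protected $store;
  protected $key;
  protected $newValue;
  protected $hadOldValue = FALSE;
  protected $oldValue;

  public function __construct(ArrayObject $store, $key, $value) {
    $this->store = $store;
    $this->key = $key;
    $this->newValue = $value;
  }

  public function execute() {
    // Record the prior state before changing anything.
    $this->hadOldValue = $this->store->offsetExists($this->key);
    if ($this->hadOldValue) {
      $this->oldValue = $this->store[$this->key];
    }
    $this->store[$this->key] = $this->newValue;
  }

  public function undo() {
    // Reverse exactly what execute() did.
    if ($this->hadOldValue) {
      $this->store[$this->key] = $this->oldValue;
    }
    else {
      unset($this->store[$this->key]);
    }
  }
}

$store = new ArrayObject(array('title' => 'Old title'));
$command = new SetValueCommand($store, 'title', 'New title');
$command->execute();
print $store['title']; // prints "New title"
$command->undo();
print $store['title']; // prints "Old title"
```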

Doctor Drupal's Dependency Injection

One of the most important design patterns for sustainable software is called Dependency Injection. As with Factory there are lots of variations, some considerably more complex than others, but all boil down to the same basic concept: A given algorithm should never have to request the resources it needs but should be given its resources. That is, its dependencies should be "injected" into it rather than it having to go out and get them.

At the simplest level, that is just proper use of function parameters. If a function needs to know the ID of a node on which to operate, it should take a $nid parameter rather than having to go out and find the right node ID in some magical location. That makes the code much more flexible.

As an example, back in the bad old days of Drupal 4.6 the way we would create a page callback that cared about a node was:

function example_page() {
  if (arg(0) == 'node' && is_numeric(arg(1))) {
    $node = node_load(arg(1));
    if (arg(2) == 'example') {
      // Do useful stuff here.
    }
  }
}

This is an example of a particular Drupal anti-pattern: arg() (properly pronounced "ARGH!"). While easy to write, it is horribly brittle. It hard-codes the function to be useful in exactly one place: on the path node/$nid/example. The problem is that the function actively requests information from global state. Global state is generally a horrid idea because it makes it impossible to encapsulate parts of the system and make them reusable.

If, for instance, we wanted to reuse the page callback above at some other path, the only option was to manually manipulate $_GET to pretend that we're at node/$nid/example and hope that nothing else breaks as a result. If you just threw up a little at the idea, good. It means you realize how bad that anti-pattern is.

The solution to such hard-coded logic is dependency injection. In Drupal 6, the menu system was gutted and rewritten to be a multi-step process. We would now implement the same page callback like this:

function example_menu() {
  $items['node/%node/example'] = array(
    'page callback' => 'example_page',
    'page arguments' => array(1),
    // ...
  );
  return $items;
}

function example_page($node) {
  // Do useful stuff here.
}

In this new setup the menu router gets more complex. In return, however, the page callback gets the node on which it depends passed to it, that is, injected into it. That gives us a number of benefits. For one, we can now move example_page() to a new path, or attach it to multiple paths, without modifying it. It is easier to read. It is also now possible to unit test, since it should vary only by the $node that is passed into it. That means we can call it with a fake $node object to make sure it works as expected.
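Because the dependency is a plain parameter, exercising the callback needs no Drupal bootstrap at all. A sketch, with a hypothetical body filled in for example_page() (real Drupal code would also escape the title, for example with check_plain()):

```php
<?php
// Hypothetical page callback body: the node is injected as a parameter,
// so nothing here reads arg() or $_GET.
function example_page($node) {
  return 'Viewing: ' . $node->title;
}

// A fake node is just an object with the properties the callback uses.
$fake_node = new stdClass();
$fake_node->nid = 42;
$fake_node->title = 'Test node';

$output = example_page($fake_node);
print $output; // prints "Viewing: Test node"
```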

That is a very simple example, however. What if example_page() depends on a number of other values besides a node? What if the values it depends on themselves depend on values in $node?

That's where dependency injection can get complicated. There are a number of ways to handle more complex use cases. The first is to switch from simple functions to objects. An object offers many more ways to pass in relevant information: we can specify a value in a constructor, or with a setX() method of some sort. And if there are a lot of dependent objects or values, we can simplify the process by using a Factory to do it for us.
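A small sketch of constructor and setter injection, wired up by a factory function. `Formatter`, `UppercaseFormatter`, `NodePage`, and `example_node_page_factory()` are hypothetical names chosen for illustration.

```php
<?php
// Hypothetical example: NodePage never looks up its own dependencies.
interface Formatter {
  public function format($text);
}

class UppercaseFormatter implements Formatter {
  public function format($text) {
    return strtoupper($text);
  }
}

class NodePage {
  protected $node;
  protected $formatter;
  protected $prefix = '';

  // Required dependencies arrive through the constructor.
  public function __construct($node, Formatter $formatter) {
    $this->node = $node;
    $this->formatter = $formatter;
  }

  // Optional dependencies can be supplied with a setter.
  public function setPrefix($prefix) {
    $this->prefix = $prefix;
    return $this;
  }

  public function render() {
    return $this->prefix . $this->formatter->format($this->node->title);
  }
}

// Hypothetical factory that wires the object up so callers don't have to.
function example_node_page_factory($node) {
  return new NodePage($node, new UppercaseFormatter());
}

$node = new stdClass();
$node->title = 'Hello';
$page = example_node_page_factory($node)->setPrefix('Title: ');
print $page->render(); // prints "Title: HELLO"
```

In a test, NodePage can be handed a stub Formatter directly, bypassing the factory.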

Going back to the database layer, that is exactly what it does. Earlier, we created a new InsertQuery object. That object requires a connection object, which it will call to actually execute the query. The insert() method of the connection object provides it with that connection without us having to worry about it. It looks like this:

public function insert($table, array $options = array()) {
  $class = $this->getDriverClass('InsertQuery', array(''));
  return new $class($this, $table, $options);
}

The first line is the factory logic to determine which version of the InsertQuery class we will use. The second line creates a new query object and injects into it all the information it will need: the connection object ($this), the table that it will be inserting into, and any user-specified options. In the common case, all of that work is handled for us by the factory, so module developers need only call

  ->fields(array('id' => 1, 'name' => 'Example'))

and everything just works.
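The class-selection step and the injection step can be seen together in a simplified, runnable sketch. `ExampleConnection` and its one-argument getDriverClass() are hypothetical; the real Drupal 7 method takes a second argument, as in the insert() method quoted earlier.

```php
<?php
// Hypothetical sketch: a factory method that chooses a class, then
// injects the dependencies the new object needs.
class InsertQuery {
  public $connection;
  public $table;
  public $options;

  public function __construct($connection, $table, array $options = array()) {
    // The query receives the connection it will use; it never looks one up.
    $this->connection = $connection;
    $this->table = $table;
    $this->options = $options;
  }
}

// A driver-specific override; drivers without one use the base class.
class InsertQuery_mysql extends InsertQuery {}

class ExampleConnection {
  protected $driver;

  public function __construct($driver) {
    $this->driver = $driver;
  }

  // Simplified: prefer "Class_driver" if such a class exists.
  protected function getDriverClass($class) {
    $driver_class = $class . '_' . $this->driver;
    return class_exists($driver_class) ? $driver_class : $class;
  }

  public function insert($table, array $options = array()) {
    $class = $this->getDriverClass('InsertQuery');
    return new $class($this, $table, $options);
  }
}

$mysql = new ExampleConnection('mysql');
$sqlite = new ExampleConnection('sqlite');
print get_class($mysql->insert('example'));  // prints "InsertQuery_mysql"
print get_class($sqlite->insert('example')); // prints "InsertQuery"
```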

Go for Broke(r)

What if we don't know in advance what dependent information, or context, we will need, or if it will vary? That makes dependency injection considerably more difficult. Currently Drupal has no examples of good solutions to this problem. Various other frameworks, PHP or otherwise, have some mechanism for this case, often called a Dependency Injection Container. These can vary from little more than an array of objects to a complex interconnected mess, depending on the implementation. In Java, it's not uncommon for a system to have an XML-based configuration for dependent context objects.
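At the "little more than an array" end of that spectrum, a container can be sketched in a couple dozen lines. `ExampleContainer` and `Greeter` are hypothetical and do not correspond to any particular framework's API; each service is described by a closure (PHP 5.3+) and built only once, on first request.

```php
<?php
// Hypothetical minimal container: named factory closures plus a cache
// of the objects they have already built.
class ExampleContainer {
  protected $factories = array();
  protected $instances = array();

  public function set($name, Closure $factory) {
    $this->factories[$name] = $factory;
  }

  public function get($name) {
    if (!isset($this->instances[$name])) {
      if (!isset($this->factories[$name])) {
        throw new InvalidArgumentException("No service named '$name'.");
      }
      // Factories receive the container, so services can depend on services.
      $factory = $this->factories[$name];
      $this->instances[$name] = $factory($this);
    }
    return $this->instances[$name];
  }
}

class Greeter {
  protected $siteName;

  public function __construct($site_name) {
    $this->siteName = $site_name;
  }

  public function greet($name) {
    return 'Welcome to ' . $this->siteName . ', ' . $name;
  }
}

$container = new ExampleContainer();
$container->set('config', function ($c) {
  return array('site_name' => 'Example site');
});
$container->set('greeter', function ($c) {
  $config = $c->get('config');
  return new Greeter($config['site_name']);
});

print $container->get('greeter')->greet('alice'); // prints "Welcome to Example site, alice"
```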

One common approach is some variation on the idea of a Broker or Mediator. In this design, an object doesn't request information from another object directly. Instead it asks an intermediary object, which may have been injected into it, to make such a request on its behalf. While it still means the first object has to actively request information, it is only tightly coupled to the mediator object and not to the myriad of other objects it may need to request information from. That means if those other systems change we need only update the mediator, not every system that touches it.
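A minimal broker sketch, with hypothetical `ContextBroker` and `Sidebar` classes: the sidebar asks the injected broker for named context values, and only the broker knows which callback supplies each one.

```php
<?php
// Hypothetical broker: consumers request named context values from it,
// and it forwards each request to whichever provider was registered.
class ContextBroker {
  protected $providers = array();

  public function addProvider($key, $callback) {
    $this->providers[$key] = $callback;
  }

  public function request($key) {
    if (!isset($this->providers[$key])) {
      throw new InvalidArgumentException("No provider for '$key'.");
    }
    return call_user_func($this->providers[$key]);
  }
}

// The consumer is coupled only to the broker, not to the providers.
class Sidebar {
  protected $broker;

  public function __construct(ContextBroker $broker) {
    $this->broker = $broker; // The broker itself is injected.
  }

  public function render() {
    return 'Hello, ' . $this->broker->request('user_name')
      . ' (' . $this->broker->request('language') . ')';
  }
}

$broker = new ContextBroker();
$broker->addProvider('user_name', function () { return 'alice'; });
$broker->addProvider('language', function () { return 'fr'; });

$sidebar = new Sidebar($broker);
print $sidebar->render(); // prints "Hello, alice (fr)"
```

If the way the current language is determined changes, only the 'language' provider registration needs updating; Sidebar stays the same.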

Patterns, not code

Remember that design patterns are a common vocabulary and stockpile of tried-and-tested ideas. They are not themselves code, although they will often suggest a given way of writing code. Most traditional Observer implementations in classic languages (Java, PHP, C++, etc.) will look very similar, even if they have subtle differences.

By studying design patterns, however, and learning how to apply them, we can save time by not having to over-think a program. Someone else already thought it through for us. Instead of reinventing the wheel, and probably making a number of mistakes along the way that cost time and result in limited flexibility, we can rely on a tried-and-tested approach that we know will have fewer gotchas than trying to invent it from scratch.

It also gives us a common vocabulary and background to understand code we've not seen before. If you're looking at Drupal 7's new database layer for the first time it may seem like a convoluted mess of objects passing around to each other needlessly. Why so complicated?

If, however, you understand the concepts of command objects, factories, and dependency injection, and are able to recognize their telltale signs, then everything comes into focus. The code itself is just as complex, but because it's a common and often-used type of complexity, it is easier to understand. Then, instead of trying to figure out why the developer bothered to pass this object to that object (the pattern calls for it, for reasons you already know), you can focus on figuring out what the developer is actually doing with that object. That's another good reason to rely on common patterns in your own code, too: it makes bringing on new people to help out much easier.

Further reading