fromMarch 2011
Column:

Drupal: Secure by Default, Inconsistent by Nature

0

There are lots of considerations to take into account when building a software platform that can be extended by third-party modules. One major goal is that the system should be secure from the minute you install it, and that’s pretty easy for a project with a lot of talented contributors like Drupal. But it’s not so easy to create a system that will be secure from install and stay secure as you change the configuration, add modules, write modules, and create your own theme.

What is a “secure” web application?

A web application is secure if private data is kept private, the site cannot be forced offline or into a degraded mode by a remote visitor, the site resources are used only for their intended purposes, and the site content can be edited only by appropriate users. - Chapter 1, Cracking Drupal by Greg James Knaddison

Most software projects don’t have to worry about this problem as much as a modular framework like Drupal does. They build software and it gets used. There is no need to make life easy on developers and themers. For Drupal, though, contributed or custom modules are key to a typical site and it is important that the core API is easy to learn and helps site builders keep their product safe.

Drupal has increasingly taken the stance that security should be built into the API and those who use it properly get protection as a hidden benefit. This means that sites are more likely to have safe code even if their owners don’t understand the complexities of security issues. However, many people both inside and outside the Drupal community feel that this is inconsistent “magic” and should be avoided, suggesting that those who don’t understand the issues are likely to accidentally introduce them if and when the API fails their use case.

Let’s review some examples of how Drupal’s API is safe by default and see if we can’t determine which is better.

Several “safe by default” functions

First, let’s take a look at the l() function (a lower case L), which takes several arguments and returns a formatted HTML link.
The shortest form follows and hasn’t changed from Drupal 4.6 until now:

l(‘Register’, ‘user/register’);

The result of that would be a string of HTML:

<a href="//www.drupalwatchdog.net/user/register">Register</a>

But sometimes one or both of the inputs to l() is a variable that contains user input. Fortunately, l() will run check_plain() on the link text (the first argument) and sends the URL (the second argument) through check_url(url()), which confirms that it’s a valid URL, urlencodes certain characters, and removes potentially harmful destinations using a whitelist of protocols considered to be safe.

A free benefit of this function using url() is that if you use aliases or the URL rewriting mechanisms of Drupal then the rendered HTML will be updated to reflect those values. You get properly formatted HTML and it happens to include text filtering to protect against Cross Site Scripting (XSS). In my opinion, this is an ideal situation!

Some similar features that follow this practice of filtering by default include:

  • t(), when used with the @ or % placeholders, is safe from XSS
  • the Drupal 6 and below db_query function is safe from SQL Injection as long as all user input is passed via the %s, %f, %d, %b style placeholders and not simple string concatenation
  • DBTNG - the Drupal 7 and above database API - is safe from SQL Injection as long as all user input is passed in via parameters and not via string concatenation
  • the Form API includes a token validation process that prevents cross site request forgeries
  • the Contact module provides flood protection limiting users to 3 messages (or 3 spams) per hour by default

If we look at the changes between the Drupal 6 and 7 database API, we see some real progress in both Developer Experience (DX) and security. Compare these two queries. The first is for Drupal 6 and has a security vulnerability; the second is for Drupal 7 and is safe.

$result = db_query("SELECT nid, title FROM {node} WHERE title LIKE %s", array(
  $user_submitted,
));

$result = db_query("SELECT nid, title FROM {node} WHERE title LIKE :title", array(
  ':title' => $user_submitted,
));

In Drupal 6 and below, developers needed to not only use the proper placeholder (%s, %f, %d, or %b) they also had to properly enclose them in single quotes if those were appropriate for the data type. Drupal 7 removes both the need to know which data type and the need to enclose the placeholder in single quotes. Again, this is a great example of combining functionality, ease of development, and security into one API.

For a final example, consider the Form API:

  $pants = drupal_map_assoc(array("Khaki", "Monkey", $user_submitted));
  $form['pants'] = array(
    '#type' => 'select',
    '#title' => t('Choose your pants'),
    '#default_value' => "Monkey",
    '#options' => $pants,
  );

If a user tries to inject JavaScript into the $user_submitted value, it will be sanitized by the Form API which uses check_plain() on both the value parameter and the user-facing text. This makes sense as HTML tags are never appropriate in a select option:

<option value="&lt;script&gt;alert(&#039;xss&#039;);&lt;/script&gt;">
&lt;script&gt;alert(&#039;xss&#039;);&lt;/script&gt;</option>

These are just a few examples of the many places that Drupal does extra work to make your site secure against a broad range of attacks.

Several functions aren’t safe by default

However, this philosophy of providing functions that are “safe by default” gets folks into trouble. More than once I’ve heard people say “Well, t($variable) is sanitized by the t function” or “db_query(“SELECT * FROM node WHERE title = “. $title)” is safe because it uses the database API.

Those examples are just poor code that ignore Drupal best practices. The developer who makes those kinds of mistakes is also unlikely to know how to write secure code in general. Can we make it easier for them to write code and write it securely? Probably so - DBTNG not only is easier to use for a variety of query types, but it is also more likely to be safe since it requires the developer to have even less knowledge of how to write proper queries.

Improper use of the Drupal API is a great way to introduce security vulnerabilities. What about some scenarios where the Drupal API is likely to trick you into making mistakes?

The drupal_set_title function in Drupal 6 and below takes a string of HTML and displays it to end-users untouched: unlike the l, t, and FAPI option elements, it does no filtering. In many situations, this is fine: the arguments to drupal_set_title() are likely to be the results of a call to t() so they should be safely sanitized. However... the talk.module in 2008 used drupal_set_title() on user-submitted node titles without any filtering.

Since the page title doesn’t usually contain HTML, it was changed in Drupal 7 to take a second, optional parameter to state what kind of filtering it should perform, if any. By default, it will run the text through check_plain() which matches the philosophy of “secure by default” while allowing a developer to do something different if they know that HTML is acceptable and they’ve done their own filtering. In this case, the developer experience has been sacrificed because they now have to remember the second function argument if they want non-default behavior and they may initially be confused why their HTML isn’t being printed as they expect it to. Safer by default, but some cost to our cognitive load.

In Drupal 6, a developer had to remember that drupal_set_title() and the similarly named drupal_set_message() both behaved the same in terms of text filtering. But a developer learning Drupal 7 is left with a situation where drupal_set_title() will escape her HTML unless she uses a weird parameter while drupal_set_message() cannot be coerced into doing any filtering along the way. That makes some sense because drupal_set_message() is very likely to contain HTML and it would be annoyingly repetitive to have to constantly give a second argument to drupal_set_message telling it to let HTML through. However, the lack of filtering in drupal_set_message could also lead to a situation where the developer assumes drupal_set_message() will do filtering just like drupal_set_title() and therefore not do their own filtering. The inconsistent API may make sense on deep reflection, but without that knowledge it can cause vulnerability-inducing mistakes.

Now we’ve seen how security improvements can introduce DX problems and situations where a logical API can be made inconsistent compared to other functions in ways that make it likely to confuse developers to the point that they create vulnerabilities. Given these examples, is it really a good idea to try to make Drupal magically safe? Should we instead focus on just educating folks about secure coding practices rather than trying to create more magical functions?

The Future: Consistency, Security and More Education

It’s hard to say exactly what the solution is here. Good API design is hard. I’d really like to think that with each iteration of Drupal we will improve both the consistency of the API and push the “secure by default” philosophy into more and more areas. These are the kinds of systemic changes we can undertake to make Drupal a winning choice when compared to other systems.