The Future of Drupal: Drupal 8 and Beyond

It's the year 2000. The NASDAQ is at its all-time high. However, ADSL is barely standardized yet, and cable broadband is still new and expensive. A Belgian student writes a piece of software to facilitate and make use of an ADSL connection shared between eight friends. The software moves to the Internet, where it powers a message board. When feature requests begin to pour in, Dries Buytaert, the author, open sources Drupal with a “here, do it yourself” attitude, engaging the community in development, and the community answers the challenge...

Technical:

The Drupal software is growing rapidly. The first version in CVS is less than 4,000 lines of code (plus one theme). In the beginning of 2005, Drupal 4.6 gets close to 30,000 lines (plus several themes). Drupal 7 in early 2011 gets released with close to 100,000 lines (plus themes and tests).

The software changes as it grows. It started as a message board and soon became a platform for experimentation. Support for aggregating third-party content via RSS was added in 2001 and Drupal was generating feeds even earlier – years before RSS became popular.

The world does not stand still. The semantic web is slowly becoming a reality. AJAX made the dream of desktop-like applications inside the browser a reality.

The Drupal software continues to widen its reach. AJAX has been supported for years now and Drupal 7 adds RDF support.

However, there are many problems we need to face. The organic growth of the software means many parts are very seriously outdated and no longer adequate.

To this day, input for Drupal mostly means a form filled in a browser. Output is almost always an XHTML page.

For many years, Drupal has worked like this: first, bootstrap Drupal fully, initializing a lot of data (current language, session, path, theme, user, etc.). Then decide which function handles the page and call it. Once that's done, bolt on the blocks and render an HTML page.

This does not work well if we want to use Drupal as a service platform or dynamic web application. Both of these are cases that require just a small piece of information encoded in JSON or XML as a reply, and not the whole page. Although Drupal 7 made it easier to send non-HTML replies, there is no facility to just send a small part.

Also, the hardwired workflow of the page makes it impossible to use a smarter layout manager. So, we plan to create a central “clearinghouse” for context information. Instead of pre-calculating all sorts of information in the beginning of every request, this system will most likely work on demand. Once that's done, parts of the page can be served individually.
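The on-demand idea can be illustrated with a minimal sketch. Everything in it is hypothetical: the SketchContext class, its register() and get() methods, and the example context names are assumptions made for illustration, not an existing or planned Drupal API. Each piece of context is registered as a resolver and only computed the first time something asks for it.

```php
<?php
// Hypothetical sketch: none of these names exist in Drupal core.
final class SketchContext {
  // Resolver callbacks keyed by context name.
  private $resolvers = array();
  // Values already computed during this request.
  private $values = array();

  // Register how to compute a piece of context; nothing runs yet.
  public function register($name, callable $resolver) {
    $this->resolvers[$name] = $resolver;
    unset($this->values[$name]);
  }

  // Compute the value on first use, then reuse it for the rest of the request.
  public function get($name) {
    if (!array_key_exists($name, $this->values)) {
      if (!isset($this->resolvers[$name])) {
        throw new InvalidArgumentException("Unknown context: $name");
      }
      $this->values[$name] = call_user_func($this->resolvers[$name], $this);
    }
    return $this->values[$name];
  }
}

$calls = 0;
$context = new SketchContext();
$context->register('language', function () use (&$calls) {
  $calls++;
  return 'en';
});
$context->register('theme', function (SketchContext $c) {
  // A resolver may depend on other context values.
  return $c->get('language') === 'en' ? 'bartik' : 'garland';
});

// A small JSON reply that only needs the language never runs the theme resolver.
$reply = json_encode(array('language' => $context->get('language')));
```

The point of the design is that a request for a small JSON fragment pays only for the context it actually touches, while a full page request can still ask for everything.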

For many years, we have discussed whether comments should be nodes. Instead, we have introduced entities in Drupal 7 and made comments and nodes (and users and taxonomy terms and files) into entities. Fields can be attached to entities and the Field API is relatively decent – but the entity API is almost nonexistent. You can load an entity, but there is no consistent API for saving or deleting one.

In Drupal 8, the legacy save functions will go away and be replaced by a consistent entity API. There are two competing approaches: CRUD (Create, Read, Update, Delete) and CRAP (Create, Read, Archive, Prune). Which approach we will take is currently being debated, but the latter would solve our issues with revisions and thus is my preference.
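To make the difference concrete, here is a minimal sketch of what an archive-style store could look like. It rests on an assumption about the semantics: “archive” is taken to mean appending a new revision instead of updating in place, and “prune” to mean discarding old revisions. The SketchArchiveStore class and its methods are hypothetical and not part of any Drupal API.

```php
<?php
// Hypothetical sketch of archive-style storage; not a Drupal API.
final class SketchArchiveStore {
  // Every saved version of every entity: $revisions[$id][] = $values.
  private $revisions = array();
  private $nextId = 1;

  // Create: store the first revision and hand back a new identifier.
  public function create(array $values) {
    $id = $this->nextId++;
    $this->revisions[$id] = array($values);
    return $id;
  }

  // Read: the newest revision by default, or a specific older one.
  public function read($id, $revision = NULL) {
    $all = $this->revisions[$id];
    return $all[$revision === NULL ? count($all) - 1 : $revision];
  }

  // Archive: instead of updating in place, append a new revision.
  // Keys in $changes win over the values of the current revision.
  public function archive($id, array $changes) {
    $this->revisions[$id][] = $changes + $this->read($id);
    return count($this->revisions[$id]) - 1;
  }

  // Prune: drop old revisions, keeping only the newest $keep (at least one).
  public function prune($id, $keep = 1) {
    $this->revisions[$id] = array_slice($this->revisions[$id], -max(1, $keep));
  }
}

$store = new SketchArchiveStore();
$id = $store->create(array('title' => 'Hello', 'body' => 'First draft'));
$rev = $store->archive($id, array('body' => 'Second draft'));
```

Because nothing is ever overwritten, revisions fall out of the storage model for free, which is the property that makes this approach attractive here.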

Translation:

The Drupal community is growing rapidly. By mid-2004, there are more than 10,000 users on Drupal.org. While the first face-to-face meeting in 2005 draws 30 people, in 2010 DrupalCon San Francisco is attended by 3,000.

The community changes as it grows. It started with people who knew each other in real life but soon it became an international community where the only connection is Drupal and almost all communication happens online. It was no longer a given that a user could hack PHP – on the contrary.

Of course, the world did not stand still. The vision of ubiquitous Internet became a reality as end users have migrated from clunky desktops to laptops and then to smartphones.

The Drupal community continues to widen its reach. Among the Drupal 7 contributors we can find renowned designer Mark Boulton, well-known security expert solardiz, and so on.

However, there are many problems we need to face. The organic growth of the community means many processes are very seriously outdated and no longer adequate.

To this day, changes to Drupal core are mostly made through ad-hoc patches. Anyone can write a patch, and anyone can review it. The so-called core maintainers are often consulted only when a core committer seeks them out just before the patch is committed.

The problem with this is almost obvious: the expert feedback needs to happen early on, preferably in the planning stages. It is neither possible to design after the fact nor to add a solid architecture for backend changes after the fact.

And then of course there is the fact that the patch-by-patch workflow does not work well for bigger changes or for design work. It's simply not how designers work. If we want more designers to contribute to core (yes, we do!) then we need to figure out new ways to work with them.

Another recent discussion in the community was around the buzzword “Drupal App Store” – somehow selling Drupal “Apps”, whatever those might be. Here are some excerpts from Alex Bronstein's most excellent post on this topic:

“Those of us who enjoy spending time working on Drupal core and contrib want to do more of it, but we also want to have shelter [and] food.”

“Software development and maintenance costs money, that there's nothing wrong with considering different ideas for how to fund it, including funding models where lots of users each pay a very small fraction of the total cost.”

You can read more opinions on the whole topic and some other funding models at http://tinyurl.com/drupalappstore-opinion-alex. Finding good funding models is indeed the challenge of the year(s) to come. We need to be innovative – as taxes have proven over the centuries (cowardice tax, beard tax, etc.), there are certainly a good number of ways to skin a cat.

While certainly even more could be said about the community's problems and challenges ahead, we will move on to focus on the technical aspects. So far, we have discussed the need for a central context “clearinghouse”, the ability to deliver information in more than HTML and a consistent entity manipulation API.

One more fundamental problem is deployability. A rallying cry over the years has been “deprecated”: first the CMS made the HTML-writing webmaster deprecated, then the rich ecosystem of Drupal made the developer deprecated by allowing the user to “click together” a site. However, this means that the traditional deployment techniques (push out code) do not work well. Instead, we have configuration and content mixed up in the database, which we cannot version and deploy. Among the solutions often heard and thought of is removing auto-increment identifiers in favor of (or in addition to) machine names and UUIDs. Also, we need to have a solid export API that copies the configuration data into code which then can be easily deployed. I think the CRAP model is more deploy-friendly than CRUD because of the lack of an update operation.
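A minimal sketch shows why machine names matter for an export API. The function name sketch_export_config() and the record layout (an 'id' column plus a 'machine_name' column) are assumptions made for illustration only. The export drops the auto-increment ID and keys everything by machine name, so two sites holding the same configuration produce identical code even if their serial IDs differ.

```php
<?php
// Hypothetical sketch; the function name and record layout are assumptions.

// Turn database rows into deployable PHP code, keyed by machine name so the
// result does not depend on site-specific auto-increment IDs.
function sketch_export_config(array $rows) {
  $export = array();
  foreach ($rows as $row) {
    $name = $row['machine_name'];
    unset($row['id'], $row['machine_name']);
    // Sort the remaining keys so column order cannot change the output.
    ksort($row);
    $export[$name] = $row;
  }
  ksort($export);
  return '<?php return ' . var_export($export, TRUE) . ';';
}

// The same two content types, created in a different order on two sites.
$dev = array(
  array('id' => 3, 'machine_name' => 'article', 'label' => 'Article'),
  array('id' => 7, 'machine_name' => 'page', 'label' => 'Basic page'),
);
$prod = array(
  array('id' => 12, 'machine_name' => 'page', 'label' => 'Basic page'),
  array('id' => 15, 'machine_name' => 'article', 'label' => 'Article'),
);

// Write the export to a file and load it back, as a deployment would.
$file = tempnam(sys_get_temp_dir(), 'cfg');
file_put_contents($file, sketch_export_config($dev));
$config = include $file;
unlink($file);
```

Since the exported file is plain PHP, it can live in version control and be diffed like any other code.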

In Drupal 7, we have a ton of pluggable subsystems. A pluggable subsystem is used where a certain functionality is necessary, for example the set/get/clear of a cache, but there is a need for different implementations. For example, the cache_get()/cache_set()/cache_get_multiple() functions are called the same way regardless of where the cache is actually stored (popular options include SQL databases, memcache and MongoDB) and how it's accessed.

The queue system very closely mimics the architecture of the cache system. For both, there is an interface, the various implementations are in classes and a variable controls which class is instantiated for a given cache bin or queue. Here is an excerpt from the _cache_get_object function, known as a factory:

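// Look for a class configured for this specific bin first.
$class = variable_get('cache_class_' . $bin);
if (!isset($class)) {
  // Otherwise fall back to the site-wide default class.
  $class = variable_get('cache_default_class', 'DrupalDatabaseCache');
}
// Instantiate the chosen class and remember it for this bin.
$cache_objects[$bin] = new $class($bin);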

After this, cache_get() simply returns _cache_get_object($bin)->get($cid). For example, the install process uses $conf['cache_default_class'] = 'DrupalFakeCache', which allows the install to operate without any caching. If you want to use Varnish, you can use the same fake class but only for the page cache bin: add $conf['cache_class_cache_page'] = 'DrupalFakeCache'; to settings.php. The field storage subsystem is very similar, but not identical – what matters is that you can specify per field where to store the data and there is a default (SQL).
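The interface-class-factory pattern is easier to see in a version that runs on its own. In the sketch below only the factory logic mirrors the excerpt above; everything else is a stand-in invented for illustration: sketch_variable_get() replaces variable_get(), and SketchCacheInterface, SketchArrayCache and SketchNullCache are hypothetical classes, not Drupal's real cache classes.

```php
<?php
// Stand-in for $conf in settings.php: the page bin gets the do-nothing class.
$GLOBALS['sketch_conf'] = array('cache_class_cache_page' => 'SketchNullCache');

// Stand-in for variable_get(), reading from the array above.
function sketch_variable_get($name, $default = NULL) {
  $conf = $GLOBALS['sketch_conf'];
  return isset($conf[$name]) ? $conf[$name] : $default;
}

interface SketchCacheInterface {
  public function get($cid);
  public function set($cid, $data);
}

// Keeps entries in memory for the duration of the request.
class SketchArrayCache implements SketchCacheInterface {
  private $data = array();
  public function get($cid) {
    return isset($this->data[$cid]) ? $this->data[$cid] : FALSE;
  }
  public function set($cid, $data) {
    $this->data[$cid] = $data;
  }
}

// Stores nothing, similar in spirit to DrupalFakeCache.
class SketchNullCache implements SketchCacheInterface {
  public function get($cid) {
    return FALSE;
  }
  public function set($cid, $data) {
  }
}

// The factory: a per-bin setting wins, otherwise the default class is used.
function sketch_cache_object($bin) {
  static $objects = array();
  if (!isset($objects[$bin])) {
    $class = sketch_variable_get('cache_class_' . $bin);
    if (!isset($class)) {
      $class = sketch_variable_get('cache_default_class', 'SketchArrayCache');
    }
    $objects[$bin] = new $class();
  }
  return $objects[$bin];
}

sketch_cache_object('cache')->set('greeting', 'hello');
sketch_cache_object('cache_page')->set('front', '<html>');
```

Callers never name a class: they ask the factory for a bin, and one configuration value decides which implementation answers.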

Several include files are even loaded by variable name allowing for another kind of replacement. For example, the session subsystem is included as follows: require_once DRUPAL_ROOT . '/' . variable_get('session_inc', 'includes/session.inc');. Session, lock, path, menu and user password are pluggable this way. This is not as nice as the interface-class-factory pattern as you must implement all the functions that are (or might be!) called from outside of these include files.

It's very likely that in Drupal 8 this will be generalized. There will be slots: the cache subsystem is one slot, the queue is another, and so on. The name of the cache bin, the queue or the field will be called a target. Those plugins that are currently loaded from a variable-named include file in Drupal 7 likely will not only have a default target but will be converted to the interface-factory pattern. More and more plugins will be done this way – I would like to see everything that currently uses an SQL string be converted to a plugin. Examples include variable_get() / variable_set() / variable_del(), system_list(), and so on.

There are two advantages to this approach: testing becomes possible without re-installing Drupal over and over again. Right now we need to reinstall Drupal because there is no good way to persist data between requests and then be able to reliably clean up. Not to mention what happens if a test fails and wipes the data... The solution could be that we write a mock DBTNG driver that stores everything in a big PHP array. In Drupal 7, this was not really feasible because every bootstrap ran a number of fixed SQL string based queries – but if those are in plugins then we can easily mock them. This might seem like overkill – why not just use the query builder to kill SQL strings? The query builder is slow while the plugin system's overhead seems to be very, very small. As a side benefit, we can use a NoSQL data store as our primary database the same way – all we would need is one DBTNG driver and the plugin implementations.
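As a rough illustration of the testing benefit, here is a sketch of a variable storage plugin backed by a plain PHP array. The interface and class names (SketchVariableStore, SketchMemoryVariableStore) and the helper sketch_site_name() are hypothetical; they only show the shape such a plugin might take, assuming the variable functions were moved behind an interface.

```php
<?php
// Hypothetical sketch: the interface and class names are not Drupal APIs.
interface SketchVariableStore {
  public function get($name, $default = NULL);
  public function set($name, $value);
  public function del($name);
}

// Array-backed implementation: nothing touches a database, so a test can
// start from a clean state simply by creating a new object.
final class SketchMemoryVariableStore implements SketchVariableStore {
  private $values = array();

  public function get($name, $default = NULL) {
    return array_key_exists($name, $this->values) ? $this->values[$name] : $default;
  }

  public function set($name, $value) {
    $this->values[$name] = $value;
  }

  public function del($name) {
    unset($this->values[$name]);
  }
}

// Code under test depends on the interface, not on SQL.
function sketch_site_name(SketchVariableStore $variables) {
  return $variables->get('site_name', 'Drupal');
}

$variables = new SketchMemoryVariableStore();
$before = sketch_site_name($variables);
$variables->set('site_name', 'Watchdog');
$after = sketch_site_name($variables);
$variables->del('site_name');
```

Cleaning up after a test then means discarding one object rather than reinstalling the whole site.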

We already talked about fields and entities – but there is more. There is the minor detail that the current “field hooks” really should be field type callbacks, a minor change in the structure returned by hook_field_info(). More importantly, we could have a relationship and a hierarchy API in core – both plugins, of course. There is a relation module in contrib which might or might not be mature enough to be included in Drupal 8. As for hierarchy, while menu links provide a performant hierarchy storage model, it's too limited even for taxonomy. Taxonomy also provides hierarchy storage but it's not performant. A challenge to solve, for sure. I have little doubt that hook_node_info and the related “node hooks” will be retired in favor of entities / fielded nodes. Whether there is a point in keeping node API (and user API and comment API and...) will be a matter of debate; my cautious opinion is that it'd be better if we could get rid of them in favor of a uniform API.

The comment module also needs to be cleaned up. One thing that can be abstracted out is the anonymous user handling. I have suggested adding a 'bundle' column to the users table and storing anonymous users there in one bundle and normal users in another. This way, anonymous users can have different fields (home page for example) than normal users. Another useful addition would be what I called “small entities”. These are basically the abstraction of aggregator items and comments: they have an author, a created property, a title and a body. After this, comment module only needs to place the entity form where it needs to be (bottom of the page, reply, etc) and thread the comments.

Profile module was only saved from the axe in Drupal 7 because the update path to the Field API was not finished in time; it's not even shown to new users any more. It will definitely be on the chopping block for Drupal 8. The menu system is still as it was originally cobbled together for Drupal 5 – one system doing links, tabs, routing and breadcrumbs (and in D7, local actions too). In Drupal 8, there will be a hook_router subsystem working similarly to hook_menu now – but the information will only be used for page routing. If you need a link, save it – in Drupal 7 it's menu_link_save() but we may turn links into a fully fledged fielded entity, so we can store information associated with the link. Certainly one might want to attach images to links for icons. Tabs are being reworked into their own hook at http://drupal.org/node/484234. Many other cleanups are being discussed, but these three areas have mass agreement and will likely make it into Drupal 8.

I have floated the idea to get rid of maintenance mode – if we mandate SQLite support then installation and all other operations not being able to reach the main SQL database could work from an SQLite database file. One might argue in light of the plugin/DBTNG driver scenario above, we could write a maintenance version of our plugins and a maintenance DB driver. In my opinion this is a lot harder than just using an SQLite database.

Another problem regarding maintenance mode is upgrading from one major Drupal version to the next. While the upgrades are running, there is no working Drupal instance yet and we are changing the database as we go and hoping at the end it all works out. Instead, we could switch to a migrate model where a freshly installed Drupal reads the old database and copies the data over while massaging it into the new format as necessary. The advantage of this is crystal clear: the Drupal APIs could be used as there is a working Drupal instance the entire time. The disadvantage would be the work this entails.
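The migrate model can be sketched in a few lines. The row layout, the function names sketch_transform_user() and sketch_migrate(), and the idea of passing the save operation as a callback are all assumptions made for illustration; a real implementation would read from the old database and save through the new site's APIs.

```php
<?php
// Hypothetical sketch; the row layouts and function names are assumptions.

// Massage one row from the old schema into the shape the new site expects.
// The old serial uid is dropped; the new site would assign its own.
function sketch_transform_user(array $old) {
  return array(
    'name' => $old['name'],
    'mail' => strtolower($old['mail']),
    'created' => (int) $old['created'],
  );
}

// Read rows from the old database (a plain array here) and save each one
// through the new site's API (a callback here), which works the whole time.
function sketch_migrate($oldRows, callable $transform, callable $save) {
  $count = 0;
  foreach ($oldRows as $row) {
    call_user_func($save, call_user_func($transform, $row));
    $count++;
  }
  return $count;
}

$old = array(
  array('uid' => 1, 'name' => 'admin', 'mail' => 'Admin@Example.com', 'created' => '1104537600'),
);
$new = array();
$migrated = sketch_migrate($old, 'sketch_transform_user', function (array $user) use (&$new) {
  $new[] = $user;
});
```

The old database is only ever read, so a failed run can simply be repeated against a fresh install.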

All in all, Drupal 8 will be a release with a double focus on cleaning up the APIs and paying back the “architectural debt” we accumulated over the years. The effort will be backed up by the huge number of tests we wrote for Drupal 7. We'll also keep improving the user experience which made nothing short of a quantum leap in Drupal 7 – but that's for another article.

On the web

There are a number of blog posts and discussions about the future of Drupal 8. One of the best is Jeff Eaton's http://angrylittletree.com/11/01/drupal-8-road-ahead.

About the author:

Károly (Chx) Négyesi

Károly Négyesi is an internaut. He spent the Nineties reporting on using the Web, and the last 10 years forming it by becoming one of the most prolific contributors to Drupal. An always curious mind helps him in scaling websites, and trying to understand other curious minds via cognitive sciences.