Write a Migrate Process Plugin, Learn Drupal 8

Károly (Chx) Négyesi

A few of us were coaching Campbell Vertesi on porting the CSV source to Drupal 8, and he asked, as an aside, how to map US states he had in a taxonomy vocabulary to taxonomy term IDs during a migration. Glad you asked! The answer gives us an example of quite a few concepts in Drupal 8, so let’s dig in! We will go over the code line by line.

Plugins

This particular class is a plugin. Plugins are normal objects whose classes live in a predefined directory and carry a little metadata. For example, field widgets and formatters are plugins: they get a field and they return a form or a render array. We can swap the formatter freely; only the type and meaning of the inputs and the output are fixed. Image effects are another good example. Migrate uses plugins for everything: sources, processing, destinations. See more.

Namespaces, PSR-4

Line 8 contains a namespace declaration: the first part is Drupal, then the module name migrate_plus, then the rest. For a plugin, the rest typically consists of Plugin, then the name of the module defining the plugin type (migrate), and finally the plugin type (process) if the defining module has several. Not every plugin type requires such a long namespace; for example, entities simply use Entity after the module name: Drupal\taxonomy\Entity. Drupal 8 will look for classes of the migrate_plus module under modules/migrate_plus/src (and all the other usual places for modules), and the rest of the path is the same as the rest of the namespace -- this is specified by the PSR-4 standard. So this class is in the directory modules/migrate_plus/src/Plugin/migrate/process (sneak preview: a few lines later we will find the class name is TermReference, so the filename is TermReference.php).
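To make the mapping concrete, here is a minimal sketch of the PSR-4 rule in plain PHP. The function name psr4_path is hypothetical and exists only for this illustration; Drupal's real class loading is done by an autoloader, not by a helper like this. It only shows how the namespace prefix is swapped for the module's src directory.

```php
<?php

/**
 * Maps a fully qualified class name to a file path, PSR-4 style.
 *
 * Hypothetical helper for illustration only, not part of Drupal.
 */
function psr4_path($class, $prefix, $base_dir) {
  $prefix = trim($prefix, '\\') . '\\';
  $class = ltrim($class, '\\');
  // The class must live under the given namespace prefix.
  if (strpos($class, $prefix) !== 0) {
    return NULL;
  }
  // Everything after the prefix becomes the path below the base directory.
  $relative = substr($class, strlen($prefix));
  return rtrim($base_dir, '/') . '/' . str_replace('\\', '/', $relative) . '.php';
}

echo psr4_path(
  'Drupal\migrate_plus\Plugin\migrate\process\TermReference',
  'Drupal\migrate_plus',
  'modules/migrate_plus/src'
), "\n";
// Prints: modules/migrate_plus/src/Plugin/migrate/process/TermReference.php
```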

Use Statements

Lines 10-16 contain use statements. Writing use some\namespace\class allows us to write just class in later code, and the Drupal coding standards require this. It really is just syntactic sugar: you can even use non-existing classes. As an aside, many of us have found the PhpStorm IDE very convenient for Drupal 8 development: for example, it takes care of the file placement and naming from the previous section and adds these use statements automatically for you.
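The "syntactic sugar" claim is easy to check in plain PHP. In this small sketch the imported class name is made up and does not exist anywhere; the use statement still compiles, because it only creates an alias for the short name.

```php
<?php

// Importing a class that does not exist is not an error:
// "use" only sets up a compile-time alias for the short name.
use Some\Imaginary\Package\Widget;

// The short name expands to the fully qualified name.
echo Widget::class, "\n";

// Nothing was loaded by the use statement: the class still does not exist.
var_dump(class_exists(Widget::class));
```

An error only appears once code actually tries to instantiate or call the missing class.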

Annotations

Lines 21-23 contain an annotation. Annotations are a very useful feature in sane languages (like Python), so much so that the PHP community has implemented them in user space… several times. As such, Drupal 8 uses the annotation syntax of Doctrine on classes and PHPUnit annotations on tests. Doctrine annotations are pretty close to a PHP array, except that {} is used instead of array(). We can see a very simple example here: this is using the MigrateProcessPlugin annotation, and the plugin definition is array('id' => 'term_reference'). Every plugin must have at least an id. In previous versions of Drupal you would’ve used a hook_migrate_process_info returning an array keyed by the same id and some data. Although the info hooks are gone, the alter hooks are still here: for example, hook_migrate_process_info_alter is a valid hook (although at this moment undocumented, as its utility is severely limited). Other similar hooks, however, are much more useful, for example hook_entity_info_alter.

MigrateProcessPlugin itself is a class in the Drupal\migrate\Annotation namespace and it’s useful to know this because this class is the nexus of information about process plugins.
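To see what "implemented in user space" means, here is a small plain PHP sketch. The class name TermReferenceStandIn is hypothetical, and the regular expression is only a toy: Doctrine uses a real parser, and nothing here is Drupal's actual discovery code. The point is that, to PHP itself, the annotation is just a docblock comment that a library reads back as text.

```php
<?php

/**
 * A stand-in class carrying the annotation discussed above.
 *
 * @MigrateProcessPlugin(id = "term_reference")
 */
class TermReferenceStandIn {}

// To PHP the annotation is only a comment; reflection returns it as text.
$doc = (new ReflectionClass('TermReferenceStandIn'))->getDocComment();

// Toy extraction of the plugin id (an assumption for illustration,
// not how Doctrine or Drupal parse annotations).
preg_match('/@MigrateProcessPlugin\(\s*id\s*=\s*"([^"]+)"/', $doc, $matches);
echo $matches[1], "\n";
// Prints: term_reference
```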

Classes, Base Classes and Interfaces

Line 25 contains the class name, a base class and an interface. Interfaces are one of the fundamental building blocks of Drupal 8. An interface is a contract: classes that implement it agree to provide certain functionality, so they can be used the same way as any other class implementing that interface. In other words, every implementing class will have certain methods which take a certain kind of input and provide a certain kind of output. Interfaces are absolutely fundamental to plugins, since any code interacting with a plugin will only know about the methods the interface requires and nothing about the details of the plugin itself. Because of this, plugin types can require their plugins to implement a specific interface, and Drupal will throw an exception if they don’t.

Base classes are not a language feature, but they are typical of Drupal 8: these classes contain some useful common logic for implementing an interface. Extending a base class instead of implementing the interface from scratch is very strongly recommended (although not mandatory at all). Some interfaces do not have a base class, for example ContainerFactoryPluginInterface.

Services, Injection

We will skip the constructor for now and talk about the create method starting on line 40, which is required for implementing ContainerFactoryPluginInterface. Then we will cover the constructor briefly.

Previous versions of Drupal were often strongly coupled: hardwired function calls were the norm. In Drupal 8 a lot of functionality is provided by so-called services. There is a service for all sorts of things: working with entities, logging information, installing modules etc. Services are obtained from the service container. The container itself is an object, and its most used method (by far) is get, as visible on line 46. You can find the services provided by core here. Because the container provides so many things, it is not good practice to pass the container to an object and store it there. Doing so makes a class harder to understand (and to test), as it can depend on basically anything. Instead, only the static create method gets the container; it passes the necessary services to the constructor, and the class itself now has clean dependencies.

By far the most commonly used service is the entity manager: the getDefinition method gives us the entity type object, the equivalent of entity_get_info in Drupal 7. The getStorage method gives us the storage object, which in turn can query and load entities of a particular type. (Then the entity objects can save themselves.) If we are not coding a nice little plugin, then the entity manager can also be accessed at \Drupal::entityManager(). The Drupal class has methods for the most common functionality. Most of these methods are just wrapping a $container->get() call, so this list is also useful as a list of services. See more on services.

So the create method grabs the taxonomy term storage object and passes it to the constructor. The constructor in turn calls the base class constructor, which initializes the common plugin properties. Our constructor then initializes our own properties: most importantly, the term storage is now available to every method in the class.
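The pattern can be tried out without Drupal. The following plain PHP sketch uses made-up stand-ins (ToyContainer, ToyTermStorage, StateLookup and the service id toy.term_storage are all hypothetical, not Drupal APIs); it only mirrors the shape described above: a static create method is the single place that touches the container, and the constructor receives exactly one dependency.

```php
<?php

/** Hypothetical stand-in for the service container. */
class ToyContainer {
  protected $services = array();

  public function set($id, $service) {
    $this->services[$id] = $service;
  }

  public function get($id) {
    return $this->services[$id];
  }
}

/** Hypothetical stand-in for a term storage service. */
class ToyTermStorage {
  public function idByName($name) {
    $ids = array('Alabama' => 1, 'Alaska' => 2);
    return isset($ids[$name]) ? $ids[$name] : NULL;
  }
}

class StateLookup {
  protected $termStorage;

  // The constructor only sees the one dependency it needs.
  public function __construct(ToyTermStorage $term_storage) {
    $this->termStorage = $term_storage;
  }

  // Only this static factory ever touches the container.
  public static function create(ToyContainer $container) {
    return new static($container->get('toy.term_storage'));
  }

  public function lookup($name) {
    return $this->termStorage->idByName($name);
  }
}

$container = new ToyContainer();
$container->set('toy.term_storage', new ToyTermStorage());
$lookup = StateLookup::create($container);
echo $lookup->lookup('Alaska'), "\n";
// Prints: 2
```

A test can skip the container entirely and call new StateLookup(...) with a fake storage, which is the practical payoff of keeping the container out of the object.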

Entity Query

We have a getTermId helper method, not required by any interface -- it cannot be, as interfaces have public methods only. This method queries the term storage for the terms in the specified vocabulary. This perhaps looks familiar -- almost like a database query in Drupal 7. This, however, is for entities only, and the condition method is extremely powerful. For example, to find nodes posted by users who joined in the last hour: condition('uid.entity.created', REQUEST_TIME - 3600, '>'). Also, in general, using SQL queries was already discouraged in Drupal 7, but in Drupal 8 it’s safe to assume that accessing the database directly is just doing it wrong.

The entity query returns a list of entity ids, and then we load those terms. The next interesting tidbit is $term->name->value. This is one of the ways to access a field value in D8, but it’s mostly just for demo; using a proper method, $term->label(), is strongly preferred. This $entity->fieldname->propertyname chain can continue: we can write $node->uid->entity->created->value to get the created time for the node author.

The entity query condition closely mirrors this syntax: change the arrows to dots, optionally drop the main property (in this case value), and you get the previously mentioned condition('uid.entity.created', ... to query the same. The Entity API is a really powerful feature of Drupal 8.

Process Plugins

Finally we arrive at the transform method, which is the only method required from a process plugin. Migrate works by reading a row from a source plugin, running each property through a pipeline of process plugins, and then handing the resulting row to a destination plugin. Each process plugin gets the current value and returns a value. Core provides quite a number of these; a list can be found here. Most process plugins are really small: the average among the core process plugins is a mere 58 LoC (lines of code), and there is only one above 100 LoC: the migration process plugin, which is used to look up previously migrated identifiers, and even that is only 196 LoC.
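As a rough mental model of that flow, here is a plain PHP sketch. It is a deliberate simplification with hypothetical names (toy_process_row and the $state_ids lookup table are made up): the steps are plain callables rather than plugin objects, and source and destination properties share the same name, which real migrations do not require.

```php
<?php

/**
 * Runs every property of a source row through its pipeline of callables.
 *
 * Hypothetical helper; real process plugins are objects with a transform()
 * method, not bare callables.
 */
function toy_process_row(array $source_row, array $pipelines) {
  $destination_row = array();
  foreach ($pipelines as $property => $steps) {
    $value = isset($source_row[$property]) ? $source_row[$property] : NULL;
    foreach ($steps as $step) {
      // Each step gets the current value and returns the next one.
      $value = call_user_func($step, $value);
    }
    $destination_row[$property] = $value;
  }
  return $destination_row;
}

// Made-up lookup table standing in for the taxonomy vocabulary.
$state_ids = array('alaska' => 2);

$pipelines = array(
  'state' => array(
    'trim',
    'strtolower',
    function ($name) use ($state_ids) {
      return isset($state_ids[$name]) ? $state_ids[$name] : NULL;
    },
  ),
);

print_r(toy_process_row(array('state' => '  Alaska '), $pipelines));
```

The last step of the pipeline plays the role that the term lookup plugin plays in the migration discussed here.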

In our case the actual functionality is just one line of code after all this setup. Of course this doesn’t include error handling etc.
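Putting the pieces together, a plugin along these lines might look like the sketch below. This is a reconstruction from the description in this post, not the original listing, so the line numbers quoted above will not match it. The vocabulary configuration key is an assumption made for the sketch. It uses the entity.manager service as described here; later Drupal releases provide the same storage through entity_type.manager. The file only runs inside a Drupal 8 site with the Migrate module enabled.

```php
<?php

namespace Drupal\migrate_plus\Plugin\migrate\process;

use Drupal\Core\Entity\EntityStorageInterface;
use Drupal\Core\Plugin\ContainerFactoryPluginInterface;
use Drupal\migrate\MigrateExecutableInterface;
use Drupal\migrate\ProcessPluginBase;
use Drupal\migrate\Row;
use Symfony\Component\DependencyInjection\ContainerInterface;

/**
 * Looks up a taxonomy term ID by term name (illustrative sketch).
 *
 * @MigrateProcessPlugin(
 *   id = "term_reference"
 * )
 */
class TermReference extends ProcessPluginBase implements ContainerFactoryPluginInterface {

  /**
   * The taxonomy term storage.
   *
   * @var \Drupal\Core\Entity\EntityStorageInterface
   */
  protected $termStorage;

  public function __construct(array $configuration, $plugin_id, $plugin_definition, EntityStorageInterface $term_storage) {
    // The base class sets up the common plugin properties.
    parent::__construct($configuration, $plugin_id, $plugin_definition);
    $this->termStorage = $term_storage;
  }

  public static function create(ContainerInterface $container, array $configuration, $plugin_id, $plugin_definition) {
    // Only this method sees the container; the constructor gets the storage.
    return new static(
      $configuration,
      $plugin_id,
      $plugin_definition,
      $container->get('entity.manager')->getStorage('taxonomy_term')
    );
  }

  /**
   * Finds the ID of the term with the given name, or NULL if there is none.
   */
  protected function getTermId($name) {
    // 'vocabulary' is an assumed configuration key for this sketch.
    $tids = $this->termStorage->getQuery()
      ->condition('vid', $this->configuration['vocabulary'])
      ->execute();
    foreach ($this->termStorage->loadMultiple($tids) as $term) {
      if ($term->label() == $name) {
        return $term->id();
      }
    }
    return NULL;
  }

  public function transform($value, MigrateExecutableInterface $migrate_executable, Row $row, $destination_property) {
    // The single line of actual functionality; no error handling.
    return $this->getTermId($value);
  }

}
```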

So there you have it: in order to be able to run this single line of code, we needed to put a file in the right directory, containing the right namespace and classname, implement the right interfaces, get a service from the container and run an entity query.