From June 2014

Migrate API

Technically Speaking


The Migrate API is built on plugins and stores the configuration for those plugins in a configuration entity. Several plugin types are offered; the most important are source, process, and destination. A source plugin merely provides an iterator and identifiers, and the destination plugins provided by core are adequate most of the time, so this article focuses on process plugins.

Process plugins

Nothing gets into the destination unless it is specified under the top-level process key of the configuration entity. Each key under process is a destination property, and its value is a process pipeline. Each "stage" of this pipeline is a plugin that receives the output of the previous stage as input, transforms it, and produces the new value of the pipeline.

Some plugins do use only the pipeline value as input. For example, the machine_name plugin transliterates the input (presumably a human-readable name) and replaces non-alphanumeric characters with underscores. However, if that were all plugins could do, they would not be very useful. Instead, every plugin receives the whole row and the name of the destination property currently being created.

Each stage in the process pipeline is described by an array in which the plugin key is mandatory and everything else is plugin configuration. For example:

process:
  vid:
    -
      plugin: machine_name
      source: name
    -
      plugin: dedupe_entity
      entity_type: taxonomy_vocabulary
      field: vid

The machine_name transformation mentioned above runs on name, and then the dedupe_entity plugin adds a numeric postfix to ensure that the vid field of the taxonomy_vocabulary entity is unique. This is the canonical format of the process pipeline.
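To make the stage-by-stage flow concrete, here is a minimal standalone PHP sketch of a pipeline runner. It is not Drupal's implementation: stages are plain closures, and `machine_name()`, `dedupe()`, `run_pipeline()` and the `$existing` list of vocabulary IDs are hypothetical names invented for illustration.

```php
<?php

// Hypothetical stand-in for the machine_name plugin: lowercase the value and
// collapse every run of characters outside a-z, 0-9 and _ into one underscore.
// The real plugin transliterates first; that step is skipped here.
function machine_name(string $value): string {
    $value = strtolower($value);
    $value = preg_replace('/[^a-z0-9_]+/', '_', $value);
    return preg_replace('/_+/', '_', $value);
}

// Hypothetical stand-in for dedupe_entity: append 1, 2, ... until the value
// no longer collides with an identifier in $existing.
function dedupe(string $value, array $existing): string {
    $candidate = $value;
    for ($i = 1; in_array($candidate, $existing, true); $i++) {
        $candidate = $value . $i;
    }
    return $candidate;
}

// Run the stages in order. Each stage receives the previous stage's output
// and the whole source row, and returns the new pipeline value.
// The pipeline value starts as NULL.
function run_pipeline(array $stages, array $row) {
    $value = null;
    foreach ($stages as $stage) {
        $value = $stage($value, $row);
    }
    return $value;
}

$existing = ['tags']; // made-up list of vocabulary IDs that already exist
$stages = [
    // Plays the role of "source: name" followed by machine_name.
    fn($value, array $row) => machine_name($row['name']),
    // Plays the role of dedupe_entity; it only looks at the pipeline value.
    fn($value, array $row) => dedupe($value, $existing),
];

$vid = run_pipeline($stages, ['name' => 'Tags']); // 'tags' is taken, so 'tags1'
```

Passing the whole row to every stage, rather than only the previous value, is what lets later stages look beyond the pipeline value when they need to.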

Often, however, a single plugin is enough:

process:
  body.format:
    plugin: migration
    migration: d6_filter_format
    source: format

In this case, a single plugin is given instead of a list of plugins. Note the dot in body.format: the system supports dot notation for destination properties, so this is the equivalent of $destination['body']['format'] = ...;
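Dot notation amounts to walking (and creating) nested arrays. The following standalone sketch shows that idea; `set_destination_property()` is a hypothetical helper, not Drupal's code, and the format ID `filtered_html` is a made-up sample value.

```php
<?php

// Hypothetical helper: write $value into $destination at a dot-separated path,
// creating intermediate arrays as needed, so 'body.format' lands in
// $destination['body']['format'].
function set_destination_property(array &$destination, string $path, $value): void {
    $ref = &$destination;
    foreach (explode('.', $path) as $key) {
        if (!isset($ref[$key]) || !is_array($ref[$key])) {
            $ref[$key] = [];
        }
        $ref = &$ref[$key];
    }
    $ref = $value;
}

$destination = ['body' => ['value' => 'Hello']];
set_destination_property($destination, 'body.format', 'filtered_html');
```

Existing siblings such as body.value are left alone; only the addressed leaf is written.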

Finally, very often a value just needs to be copied. This is written as:

process:
  nid: nid
  langcode: language

The first line is roughly equivalent to $destination['nid'] = $source['nid']; the second to $destination['langcode'] = $source['language']. Internally, both shortcuts shown here are translated into the canonical format of the first example. Here is langcode: language in the canonical format:

process:
  langcode:
    -
      plugin: get
      source: language

Any time the source key is used, the system inserts an additional pipeline stage that runs the get plugin. In case you are wondering what value the pipeline starts with: it always starts with NULL. Most often the first stage has a source key, and get provides a value based on that source.
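The normalization described above can be sketched in a few lines. This is a hypothetical standalone function, `normalize_pipeline()`, that turns all three notations into a list of stages and puts a get stage in front of any stage carrying a source key; it mirrors the behavior the text describes, not Drupal's actual code.

```php
<?php

// Hypothetical normalizer for the three notations shown above:
//   'language'                      -> string shortcut: copy a source property
//   ['plugin' => ...]               -> a single plugin
//   [['plugin' => ...], ...]        -> canonical list of stages
function normalize_pipeline($config): array {
    if (is_string($config)) {
        // "langcode: language" means: get the "language" source property.
        $config = [['plugin' => 'get', 'source' => $config]];
    } elseif (isset($config['plugin'])) {
        // A single plugin becomes a one-element list.
        $config = [$config];
    }
    $stages = [];
    foreach ($config as $stage) {
        // A "source" key means an extra get stage runs first.
        if (isset($stage['source'])) {
            $stages[] = ['plugin' => 'get', 'source' => $stage['source']];
        }
        // A get stage is fully covered by the line above.
        if ($stage['plugin'] !== 'get') {
            $stages[] = $stage;
        }
    }
    return $stages;
}

$langcode = normalize_pipeline('language');
$format = normalize_pipeline(['plugin' => 'migration', 'migration' => 'd6_filter_format', 'source' => 'format']);
```

After normalization every destination property has the same shape, so the code that runs pipelines only ever has to deal with a list of stages.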

Now let's look at the relevant parts of a process plugin:

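namespace Drupal\migrate\Plugin\migrate\process;

use Drupal\Core\Language\Language;
use Drupal\migrate\MigrateExecutable;
use Drupal\migrate\ProcessPluginBase;
use Drupal\migrate\Row;

/**
 * @MigrateProcessPlugin(
 *   id = "machine_name"
 * )
 */
class MachineName extends ProcessPluginBase {
  public function transform($value, MigrateExecutable $migrate_executable, Row $row, $destination_property) {
    // getTransliteration() is a helper of this class, not shown here.
    $new_value = $this->getTransliteration()->transliterate($value, Language::LANGCODE_DEFAULT, '_');
    $new_value = strtolower($new_value);
    // Replace runs of non-alphanumeric characters, then collapse repeated underscores.
    $new_value = preg_replace('/[^a-z0-9_]+/', '_', $new_value);
    return preg_replace('/_+/', '_', $new_value);
  }
}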

Because this is a plugin, the Drupal\modulename\Plugin\migrate\process namespace is mandatory, and so is the @MigrateProcessPlugin annotation. These plugins typically extend ProcessPluginBase. The interface specifies only one method: transform(). $value is the current value of the pipeline, and the return value becomes the new value of the pipeline.

While this particular plugin depends only on $value, the system would be seriously limited if the other parameters were not available. Thankfully, it is possible to peek at other values in the row: $row->getSourceProperty($property) returns other source values, and already calculated destination values are available through $row->getDestinationProperty($property). The plugin configuration is available as the array $this->configuration. For example, in the second example (body.format), $this->configuration['migration'] would be d6_filter_format.
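Here is a standalone sketch of a plugin that uses more than $value. It is meant to run outside Drupal, so `Row` below is a hypothetical, stripped-down stand-in that only has the two getters mentioned above, the MigrateExecutable parameter is dropped, and `FallbackWithType` with its `fallback` configuration key is invented for illustration.

```php
<?php

// Hypothetical, stripped-down stand-in for Drupal's Row class, exposing only
// the two lookups discussed above. Not the real class.
class Row {
    public function __construct(
        private array $source,
        private array $destination = []
    ) {}

    public function getSourceProperty(string $property) {
        return $this->source[$property] ?? null;
    }

    public function getDestinationProperty(string $property) {
        return $this->destination[$property] ?? null;
    }
}

// Hypothetical process plugin: falls back to another source property when the
// pipeline value is empty, then prefixes the result with an already calculated
// destination value. $destination_property is unused here but available.
class FallbackWithType {
    public function __construct(private array $configuration) {}

    public function transform($value, Row $row, string $destination_property) {
        if ($value === null || $value === '') {
            // Which source property to fall back to comes from the configuration.
            $value = $row->getSourceProperty($this->configuration['fallback']);
        }
        return $row->getDestinationProperty('type') . '_' . $value;
    }
}

$row = new Row(['title' => 'hello'], ['type' => 'article']);
$plugin = new FallbackWithType(['fallback' => 'title']);
$name = $plugin->transform(null, $row, 'name'); // 'article_hello'
```

The sketch requires PHP 8 for constructor property promotion.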

Handling Lists

All the examples above are simple scalars: there is one body and it has a single format; every vocabulary has a single identifier; and so on. Sometimes, however, the value of a source property is an array. There are two kinds:

  1. A simple list of scalars. Typical examples are the permissions of a role or the recipients of a contact category. The system handles these automatically: if the source is a single property but its value is an array, the system iterates over the array and runs the pipeline for every value. The process plugin therefore does not need to handle this case itself; it can just transform scalars. However, if a plugin actually wants to handle arrays (we will see an example in a minute), it can do so by adding multiple = TRUE to its annotation.
  2. A list of arrays, for example the filters of a text format: every filter has a module, a delta, settings, and so on. For this case there is an iterator plugin which, as the name suggests, iterates over the value and runs a process pipeline for every property of the current filter.
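The automatic iteration for lists of scalars can be sketched like this. `apply_stage()` and its `$handles_multiples` parameter are hypothetical; the boolean plays the role of the annotation flag described in the first list item, and the permission strings are sample values.

```php
<?php

// Hypothetical sketch: if the pipeline value is an array and the stage does not
// declare that it handles arrays itself, apply the stage to each element.
function apply_stage(callable $transform, bool $handles_multiples, $value) {
    if (is_array($value) && !$handles_multiples) {
        return array_map($transform, $value);
    }
    return $transform($value);
}

$permissions = ['Access Content', 'Post Comments'];

// A scalar-only stage is called once per permission.
$lowered = apply_stage('strtolower', false, $permissions);

// A stage that wants the whole array, here to count it, sets the flag.
$count = apply_stage('count', true, $permissions);
```

The scalar-only stage stays trivial; only stages that genuinely need the whole list have to opt in.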

This essentially makes the process subsystem recursive with the usual advantages and disadvantages of recursion: it’s harder to understand but insanely powerful.

There are two more features I'd like to draw attention to; both appear in the key: @id line. First, the iterator plugin can change the key: the source simply gives us a list of filters, but Drupal 8 expects the filter plugin ID to be the key. Second, the @id notation is usable not just for the key but also as the value of any source key. Specifically, @id means: use the already calculated id destination value. This works because the key is calculated last.
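The iterator and @id ideas can be sketched together: run a small pipeline per element, then compute the key last from an already calculated property. `iterate()` and the sample filter rows are hypothetical, and the module/delta to ID mapping is made up; it does not reproduce Drupal 6's real filter IDs.

```php
<?php

// Hypothetical sketch of the iterator idea: run one pipeline per destination
// property for each element of a list of arrays, then compute the key last so
// it can reuse an already calculated property, as "key: @id" does.
function iterate(array $items, array $process, string $key_property): array {
    $result = [];
    foreach ($items as $item) {
        $new_item = [];
        foreach ($process as $property => $transform) {
            $new_item[$property] = $transform($item);
        }
        // The key comes from the freshly calculated $key_property value.
        $result[$new_item[$key_property]] = $new_item;
    }
    return $result;
}

$filters = [
    ['module' => 'filter', 'delta' => 0, 'weight' => 10],
    ['module' => 'filter', 'delta' => 2, 'weight' => 0],
];
$process = [
    // Made-up mapping from module/delta to an ID, for illustration only.
    'id' => fn(array $f) => $f['module'] . '_' . $f['delta'],
    'weight' => fn(array $f) => $f['weight'],
];
$by_id = iterate($filters, $process, 'id');
```

Because the key is derived after the per-element pipelines finish, any calculated property can serve as the key without being computed twice.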

What in the world are those dashes?

YAML needs some way to tell the stages of the pipeline apart, and each dash starts a new list item. In PHP you are looking at array(array('plugin' => ...), array('plugin' => ...)) rather than a single array('plugin' => ...).
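For comparison, here are the machine_name/dedupe_entity pipeline and the body.format example written as the PHP arrays a YAML parser would produce (short array syntax). `is_single_stage()` is a hypothetical helper showing how the two shapes can be told apart.

```php
<?php

$process = [
    // With dashes: a numerically indexed list, one element per stage.
    'vid' => [
        ['plugin' => 'machine_name', 'source' => 'name'],
        ['plugin' => 'dedupe_entity', 'entity_type' => 'taxonomy_vocabulary', 'field' => 'vid'],
    ],
    // Without dashes: a single associative array, i.e. one plugin.
    'body.format' => ['plugin' => 'migration', 'migration' => 'd6_filter_format', 'source' => 'format'],
];

// Hypothetical helper: a value with a 'plugin' key is one stage;
// otherwise it is a list of stages.
function is_single_stage(array $pipeline): bool {
    return isset($pipeline['plugin']);
}
```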