On Feb. 11, 2014, Drupal.org – flagship site of the Drupal project – joined thousands of other websites in a campaign against state Internet surveillance dubbed “The Day We Fight Back.”
In announcing Drupal.org’s participation in the campaign, leading Drupal developer Larry Garfield made a strong link between free software and digital freedom: “Both the American and British governments have been found violating the digital privacy of millions of people in their own countries and around the world. That is exactly the sort of attack on individual digital sovereignty that Free Software was created to combat.”
What are the implications of recent surveillance revelations for Drupal site owners? What can and should Drupal site builders and developers be doing to protect user privacy? To find out, I spoke with analysts and developers both within and outside the Drupal community.
User Data and Threat Modeling
“Contemporary websites have almost innumerable places where information can be entered, logged, and accessed, by either the first party or third parties.”
That’s the frank assessment of Chris Parsons, a postdoctoral fellow at The Citizen Lab at the University of Toronto’s Munk School of Global Affairs. Parsons’ current research focus is on state access to telecommunications data, through both overt mechanisms and signals intelligence – covert surveillance.
Parsons recommends an approach to user data protection called threat modeling. “So who are you concerned about, what do you believe your ethical duties of care are, and then how do you both defend against your perceived attackers and apply your duty of care?”
Parsons suggests, “The first step is really just information inventory: what’s collected, why, where’s it going, for how long.”
For Parsons, having strong protections for user data is critical, and not merely from a privacy perspective. Rather, privacy protection is just sound business practice. Imagine this scenario, he suggests: “One of your core databases with customer information gets compromised.” Then, “If you have an auditor that comes in, or if you have the press pounding on your door, you don’t want to be telling either of those parties, ‘Yeah, that’s a good question. I don’t know where any of our data is. We don’t know what we lost.’”
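The information inventory Parsons describes can be kept as simply as a structured list. The following is a minimal sketch in Python, not a tool from the Drupal ecosystem; the field names, the sample entries, and the `needs_review` helper are all hypothetical, chosen only to mirror his four questions (what, why, where, how long).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InventoryEntry:
    name: str                      # what is collected
    purpose: str                   # why it is collected
    destinations: List[str]        # where it goes (first or third parties)
    retention_days: Optional[int]  # how long it is kept; None = undecided


# Hypothetical entries for an imaginary site.
INVENTORY = [
    InventoryEntry("email", "account login and notifications",
                   ["site database"], 365),
    InventoryEntry("ip_address", "abuse prevention",
                   ["site database", "web server logs"], 30),
    InventoryEntry("page_views", "traffic statistics",
                   ["third-party analytics"], None),
]


def needs_review(entries: List[InventoryEntry]) -> List[str]:
    """Return names of items that leave the site or have no retention limit."""
    flagged = []
    for entry in entries:
        leaves_site = any(d.startswith("third-party") for d in entry.destinations)
        if leaves_site or entry.retention_days is None:
            flagged.append(entry.name)
    return flagged
```

With an inventory like this on hand, the auditor's question about what was lost has at least a starting answer.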
Third Party Services: Analytics
The intersection of privacy protection and Drupal programming is a daily concern for Mark Burdett, senior web developer at Electronic Frontier Foundation (EFF).
Burdett has been a Drupal user and contributor since 2004. These days, his efforts support EFF’s work at the leading edge of digital rights protection.
Burdett highlights third party integration as a key point for data leakage. As embedded widgets spread throughout the internet, logging detailed data to companies like Google, Facebook, and Twitter, these companies are able to compile an ever more complete picture of individuals and their online activities. Intelligence agencies, in turn, can benefit from “one stop shopping” for user data. Burdett notes that the problem is by no means unique to Drupal: “WordPress is certainly just as bad.”
The usage statistics on Drupal.org show that over 300,000 Drupal sites – close to a third of all reporting sites – are running Google Analytics, the most used third party service.
In Burdett’s view, “You only want to record information about your users that you really need. The only way to safely do that is to not rely on third party services that we can’t control, like Google Analytics.”
Parsons is more pragmatic, acknowledging that when it comes to analytics the battle has already been lost, if it even happened at all. Still, he points to the practical advantages of maintaining your own statistics. “I often avoid using Google Analytics, in part because more and more people are blocking Doubleclick [and other Google] cookies.” Instead, Parsons opts for self-hosted solutions because, “I find that the truth that comes through them can be more useful.”
A leading alternative to Google Analytics is Piwik, an open source analytics package that you can install alongside Drupal, even on shared hosting. The Piwik Web Analytics module integrates Piwik with your Drupal site.
As well as storing data on your own server, where you have control over it, Burdett points out, “You can configure Piwik to only log as much information as you really need.”
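One common way to log only as much as you need is to truncate visitor IP addresses before they are stored. The sketch below illustrates the idea in Python; it is not Piwik's own code, and the function name and the default prefix lengths are assumptions made for illustration.

```python
import ipaddress


def anonymize_ip(address: str, keep_bits_v4: int = 24, keep_bits_v6: int = 48) -> str:
    """Zero out the low-order bits of an IP address before logging it."""
    ip = ipaddress.ip_address(address)
    keep = keep_bits_v4 if ip.version == 4 else keep_bits_v6
    # strict=False lets us pass a host address and get its enclosing network.
    network = ipaddress.ip_network(f"{address}/{keep}", strict=False)
    return str(network.network_address)
```

For example, `anonymize_ip("203.0.113.77")` yields `203.0.113.0`: enough to estimate rough geography or spot abuse patterns, but no longer tied to one connection.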
Social Media and Alternatives
Beyond analytics, various types of social media integrations also present concerns.
“We shouldn’t really think of Facebook and Twitter as sort of neutral information carriers,” Burdett cautions. “It’s easy to do so because we use those services to communicate with our friends and family, or fans in the case of Twitter.”
But in reality, “Their whole business model is to show ads to people, and, in the course of doing that, to record as much information about people as possible, in order to show them ads most effectively.” Amassing user data isn’t incidental to the operations of the major social media corporations – it’s the whole point.
One module that EFF sponsored for use on their own website is MyTube. “When we post a YouTube video in a blog, it won’t load anything from youtube.com until you click on it,” Burdett explains.
Parsons similarly recommends a tool called Social Share Privacy, which has an associated Drupal module. Like MyTube, Social Share Privacy communicates with the third party website only if a user first clicks a link. Parsons comments, “If your content is really great – and most people hope it is – I don’t think that one extra click is going to doom the ability to share [it].”
Burdett points to emerging privacy-conscious social media alternatives like Diaspora, pump.io, and Status.net. While he’s interested in their potential, he’s also frank about the challenges they face. Diaspora, for example, has lots of advantages: “It’s decentralized and federated and you can run your own Diaspora node.”
However, “If you’re just waiting for your friends to magically start using it, that could take a lifetime.”
Meantime, to Burdett, it’s worth asking just how essential share buttons and social media feeds are to your mission. “Website real estate is really important. If you take up a quarter of your page with some kind of Facebook feed, you need to figure out if it’s really worth it for you.”
User Data Security on Drupal.org
It’s on a particular website that abstract concerns about user privacy and data security meet the hard realities of specific threats. To get a feel for Drupal user privacy in action, what better example than Drupal.org itself?
Enter Neil Drumm, senior technologist at the Drupal Association.
Drumm co-founded CivicSpace – the first Drupal shop – and went on to be a lead architect of the Drupal core update and install systems and core maintainer for Drupal 5. Today, he works with a global team to build and maintain the Drupal community’s main collaboration spaces.
One area of concern for Drumm is how to deal with user data when providing development versions of the Drupal.org site to a global team of developers. “We want volunteers to work on the site,” Drumm explains, but “to properly work on a Drupal site, you need actual data.”
To meet this need, the Drupal.org team maintains custom data sanitization scripts. Before a copy of the site database is made available to developers, potentially sensitive user information is automatically removed.
Beyond Drupal.org, while nondisclosure agreements and server and laptop security measures provide protections for user data, some companies are taking the extra step of introducing data sanitization into their everyday workflow – just in case.
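The Drupal.org sanitization scripts are not reproduced here. As a generic sketch of the technique, the Python function below replaces sensitive values in a user record with harmless placeholders while leaving the rest intact for development use. The column names (`mail`, `pass`, `hostname`) are assumptions, not a statement of what the Drupal.org scripts actually touch.

```python
def sanitize_user(row: dict) -> dict:
    """Return a copy of a user record with sensitive values replaced."""
    clean = dict(row)  # never modify the original record
    uid = clean["uid"]
    if "mail" in clean:
        # Keep addresses unique and well-formed, but undeliverable.
        clean["mail"] = f"user{uid}@example.invalid"
    if "pass" in clean:
        # Drop the password hash entirely.
        clean["pass"] = ""
    if "hostname" in clean:
        # Remove the last-seen IP address.
        clean["hostname"] = "0.0.0.0"
    return clean
```

Keeping non-sensitive fields such as the username means developers still get realistic data to work against.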
Building a web application often involves collecting and displaying potentially sensitive user data. One such use case came up in a 2010 redesign of Drupal.org, which called for an animated front page map of user contributions.
“We had some country data,” Drumm explains, “but that’s not really interesting for contributions from larger countries like the US, Canada, Russia, because we’re not just one point on a map.”
Because locational data can be sensitive, the Drupal Association made privacy a top priority in designing a solution: “From the start, we explicitly made that opt-in, not opt-out, so you have to check the box on your user profile, and then on top of that the browser will ask you to confirm that.”
Users may choose to share their location, but no data is collected without their consent.
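The double opt-in Drumm describes reduces to a simple rule: store a location only when both the profile checkbox and the browser prompt have said yes. A minimal sketch in Python, with hypothetical parameter names, might look like this:

```python
from typing import Optional, Tuple

Coords = Tuple[float, float]  # (latitude, longitude)


def location_to_store(profile_opt_in: bool,
                      browser_granted: bool,
                      coords: Optional[Coords]) -> Optional[Coords]:
    """Return coordinates only if the user consented at both steps."""
    if profile_opt_in and browser_granted and coords is not None:
        return coords
    # Default is to collect nothing.
    return None
```

The important design choice is the default: absent an explicit yes at every step, nothing is recorded.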
When it comes to protecting user data and interactions, both Burdett and Parsons emphasize one key toolset: secure connections.
Originally designed to meet the data protection needs of e-commerce sites, HTTPS – HTTP carried over secure sockets layer (SSL) or its successor, transport layer security (TLS) – encrypts all communications between a web browser and a server, giving them a level of protection from prying eyes.
“The ideal thing is to use HTTPS all the time,” says Burdett, “for all of your users, for every interaction they have with the website, so that their reading habits, or the comments they’re posting – or anything else – can’t be swept up in blanket surveillance, and then analyzed later for whatever patterns the NSA might be looking for.”
While the Edward Snowden leaks detailed ways the NSA is working to undermine SSL, Parsons concurs that HTTPS remains an indispensable tool. He admits though, ruefully, that he has yet to enable it on his own website.
Back at Drupal.org, HTTPS plays a central role in data security. It was introduced first on the Drupal Association and DrupalCon conference sites, due to the need to support credit card transactions. The plan was to extend HTTPS to the rest of the Drupal.org sites, but there were performance implications.
In May 2013, the Drupal.org security team and infrastructure team discovered an unauthorized access incident in which user data was compromised on Drupal.org and the Drupal groups site, groups.drupal.org.
“We were all at DrupalCon Portland,” Drumm recalls. While the problem was traced to third-party software rather than Drupal itself, the incident pushed security hardening of the Drupal.org infrastructure to the top of the agenda. “We decided at that point that, yeah, we should go ahead and push forward with SSL and take the performance hit.”
Beyond just introducing SSL, there are additional details to consider. One is the method of encryption.
Burdett explains that while a standard HTTPS setup relies on a single long-term server key to protect every session, there is a newer method called forward secrecy: “[It] means that a unique key is generated for each HTTPS session.” If you run an e-commerce bookshop and receive a law enforcement subpoena relating to a particular customer, Parsons says, “You as a bookshop seller do not want to be in a situation where you’re disclosing the decryption key for every person – or every IP address, rather – that has looked at your website and what books they’ve looked at.” Forward secrecy ensures there is no single key that decrypts all users’ communications.
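In practice, forward secrecy is usually a matter of server configuration: the server is told to accept only cipher suites that negotiate a fresh, ephemeral key for each session. Most sites would set this in their web server rather than in application code, so the Python sketch below is only an illustration, using the standard library's `ssl` module; the helper names are hypothetical.

```python
import ssl


def forward_secret_context() -> ssl.SSLContext:
    """Build a server-side TLS context limited to ephemeral key exchange."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # For TLS 1.2 and earlier, allow only ECDHE (ephemeral elliptic-curve
    # Diffie-Hellman) suites with AES-GCM. TLS 1.3 suites are not affected
    # by set_ciphers(), and they always use ephemeral key exchange.
    ctx.set_ciphers("ECDHE+AESGCM")
    # A real server would also call ctx.load_cert_chain(...) with its
    # certificate and private key before accepting connections.
    return ctx


def cipher_names(ctx: ssl.SSLContext) -> list:
    """List the names of the cipher suites the context will offer."""
    return [cipher["name"] for cipher in ctx.get_ciphers()]
```

Calling `cipher_names(forward_secret_context())` lets you inspect which suites remain enabled on your own system.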
Also, if you stop short of using SSL site-wide, modules like Secure Login (which Burdett co-maintains) and HTTP Strict Transport Security provide out of the box handling to ensure that login information is protected and to prevent “man in the middle” attacks by instructing the browser to always reconnect via HTTPS.
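Under the hood, HTTP Strict Transport Security is a single response header that tells the browser to use HTTPS for all future visits. The sketch below shows roughly what such a header looks like; it is not the Drupal module's code, and the function names and the one-year default are assumptions.

```python
from typing import Dict, Tuple


def hsts_header(max_age_seconds: int = 31536000,
                include_subdomains: bool = True) -> Tuple[str, str]:
    """Build the Strict-Transport-Security header as a (name, value) pair."""
    value = f"max-age={max_age_seconds}"
    if include_subdomains:
        value += "; includeSubDomains"
    return ("Strict-Transport-Security", value)


def add_hsts(headers: Dict[str, str], is_https: bool) -> Dict[str, str]:
    """Return a copy of the response headers with HSTS added on HTTPS only."""
    result = dict(headers)
    # Browsers ignore this header on plain HTTP responses, so only send it
    # when the request actually arrived over HTTPS.
    if is_https:
        name, value = hsts_header()
        result[name] = value
    return result
```

Once a browser has seen this header over HTTPS, it will rewrite later `http://` requests to the site as `https://` until the max-age expires.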
To Parsons, protecting user information should be anything but an afterthought. “Certainly, if there’s any sort of commercial or business interest involved, I think this just flows out of the business plan that you’ve probably developed.”
“The main point to drive home,” Burdett concludes, “is that all users deserve a reasonable degree of security and privacy in their online lives. And as new revelations come out – for example, covert surveillance of WikiLeaks readers – it’s clear that governments around the world are using increasingly powerful tools to track and analyze us.”
While sites with user submissions, like comments, may be especially important to secure, Burdett emphasizes that no site is totally exempt. “We never know which site we visit or article we read could make us fall under suspicion – now or in the future – by someone who monitors the network, whether an employer, school, or government.”
It’s a sobering thought for anyone setting up a Drupal site. But the good news is that simple practices like an information inventory, along with tools like SSL and meaningful privacy policies, can go a long way toward protecting user privacy, as can thoughtful choices about what information to collect – and what not to.
For Drupal.org and related sites, the Drupal Association follows a simple precept: “We don’t collect data we don’t use for something,” Drumm says.
No one can steal what you don’t have in the first place.
Drupal modules for privacy protection
Support Secure Connections
- Cryptolog enhances user privacy by logging ephemeral identifiers rather than actual client IP addresses in Drupal's database tables and syslog.
- HTTP Strict Transport Security ensures browsers reconnect using HTTPS, avoiding “man in the middle” attacks.
- Secure Login enables the user login and other forms to be submitted securely via HTTPS, thus preventing passwords and other private user data from being transmitted in clear text.
Minimize User Data Logging and Leakage
- IP Anonymize periodically scrubs user IP addresses from the Drupal database.
- No Referrer enhances privacy by allowing users to avoid leaking referrer information when they click on links to external sites.
- Piwik Web Analytics integrates the Piwik web statistics software, which can be locally hosted and configured to collect only what data is required.
Detect and Address Security Issues
- Paranoia reduces the potential impact of an attacker gaining elevated permission on a Drupal site.
- Security Review automates testing for many of the easy-to-make mistakes that render your site insecure.
Integrate Social Media Without Exposing User Data
- Follow generates social media follow links without embeds.
- MyTube minimizes user tracking by embedding video only when a user clicks on a thumbnail image.
- Service Links minimizes user tracking by generating share links without embedding.
- Social Share Privacy minimizes user tracking by providing share buttons that connect to social media sites only when a user clicks them.
- Twitter Pull, Facebook Pull, and Feeds Facebook display Twitter or Facebook updates without embeds or cookies.
Provide Cookie Warning
Free Software Vs. Surveillance: Acquia Certification
The relationship between software freedom and surveillance technology came into sharp focus in community response to the Acquia Certified Developer exam announced this spring. Promoted as a means of certifying expertise in Drupal, the downloadable exam, it turned out, required software that ran only on Windows or Mac. Having to use a proprietary operating system to evaluate free software skills? As one blogger put it: “WTF!”
At issue was not simply the feature set of one or another software product, but the meaning of free software itself. One commenter noted the exam requires “software that monitors all your actions without you being able to tamper with it.” Another observed, “Open source Linux is never going to work for that.”
Grounded in principles of sharing and cooperation, free software like Linux and Drupal includes built-in protections that work against external control and surveillance.
The Acquia exam raises important questions: as free software users and advocates, what are our responsibilities when it comes to practising and defending software freedom – and, by extension, users’ digital rights?