Many of the technologies and ideas contained in the Spy Files are obscure and unfamiliar. Privacy International has compiled this glossary, in which experts explain what these technologies are, how they work, and the implications they carry for privacy and civil liberties.
- Digital forensics
- Deep packet inspection
- Social network analysis
- Data mining
- Backdoor trojans
- Open source intelligence and social media monitoring
- Webmail interception
- Speech and voice recognition
- Safe cities
- IMSI catchers
- Facial recognition
- Mobile monitoring
Digital forensics
The ability to analyse the contents of computers, cellphones and digital storage devices, to make use of communications data and to intercept traffic is essential to modern law enforcement.
The word ‘forensic’ implies that the ultimate purpose of the activity is legal proceedings, and therefore that the authority to deploy specific digital forensic tools is granted within a strong framework of lawful powers and proper independent scrutiny of specific warrants, against the tests of necessity and proportionality. It suggests that the activities are subject to audit and produce evidence that can be exhaustively tested.
However, when the underlying technologies are deployed without these controls they can become unconstrained instruments of surveillance. Digital forensic tools have considerable capacity for intrusion.
The most widely used category of digital forensics tool makes a complete copy of a hard disk or other storage medium, including ‘deleted’ areas, and then exhaustively analyses it – a forensic disk image. The tools can: use keyword search for content, recover deleted files, create chronologies of events, locate passwords and, if more than one person uses the computer, attempt to attribute actions to a specific person.
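As a rough illustration of why keyword search over a disk image reaches 'deleted' areas, the sketch below scans raw bytes rather than a list of live files. The image contents, the `LIVE:` and `DEL:` markers and the function name are invented for this example and do not come from any real forensic tool.

```python
def find_keyword(image: bytes, keyword: bytes) -> list[int]:
    """Return every byte offset at which keyword occurs in the raw image."""
    offsets = []
    pos = image.find(keyword)
    while pos != -1:
        offsets.append(pos)
        pos = image.find(keyword, pos + 1)
    return offsets


# Hypothetical image: one live file, some empty space, and leftover bytes
# from a file the user believes was deleted.
image = b"LIVE:meeting notes" + b"\x00" * 8 + b"DEL:meeting cancelled"

hits = find_keyword(image, b"meeting")
```

Because the scan ignores the file system and reads the copy byte by byte, text left behind by a removed file is found just as readily as text in a file that still exists.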
A well-designed encryption system is difficult to break, but many computer users deploy poorly designed programs, or use good ones without the necessary discipline, so that unencrypted versions of files or passphrases can be found on a disk or other device.
Professor Peter Sommer – information security expert, London School of Economics and Open University
Deep packet inspection
Deep packet inspection (DPI) equipment is used to mine, mediate and modify data packets that are sent across the internet. By analysing the packets that carry conversations and commercial transactions and enable online gaming and video-watching, the content of each communication and the medium can be identified.
With this information, the equipment can then apply network-based rules that enable or disable access to certain content, certain communications systems and certain activities. Such rules may prioritise, deprioritise or even block particular communications paths.
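The rule-based behaviour described above can be sketched as a small lookup over packet contents. This is a toy model only: the `Packet` class, the rule table and the action names are assumptions made for illustration, and real DPI equipment reassembles flows and uses far richer signatures.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    dst_port: int
    payload: bytes


# Hypothetical rule table: (label, payload test, action).
# The tests look inside the payload, not just at the port number.
RULES = [
    # A TLS handshake record begins with content type 0x16, major version 0x03.
    ("tls", lambda p: p.payload[:2] == b"\x16\x03", "block"),
    ("http", lambda p: p.payload.startswith((b"GET ", b"POST ")), "prioritise"),
    ("sip-voip", lambda p: p.payload.startswith(b"INVITE sip:"), "deprioritise"),
]


def classify(packet: Packet) -> tuple[str, str]:
    """Return (traffic label, action) for the first rule that matches."""
    for label, matches, action in RULES:
        if matches(packet):
            return label, action
    return "unknown", "allow"


web = classify(Packet(80, b"GET /index.html HTTP/1.1\r\n"))
encrypted = classify(Packet(8080, b"\x16\x03\x01\x02\x00"))
other = classify(Packet(5000, b"\x00\x01\x02"))
```

Note that the encrypted packet is recognised even though it travels on an unusual port, which is what distinguishes inspection of content from simple port filtering.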
As an example, when these rules are applied to a Voice over Internet Protocol (VoIP) telephony system such as Skype, the system may receive excellent service, have its service intentionally degraded or be blocked entirely if it encrypts communications. DPI can also modify communications traffic by adding data to, or removing data from, each packet.
Additions may enable corporate or state-backed surveillance of an individual’s actions and interactions online; subtractions include removing or replacing certain words in an email message, removing certain attachment types from email, or preventing packets from reaching their destinations.
This technology raises considerable privacy and security concerns, especially when employed by repressive governments. It enables covert surveillance of communications and has been used in Iran to ‘encourage’ citizens to use non-encrypted communications by blocking some data encryption and anonymisation services.
After shifting users towards non-encrypted, less secure, communications channels, the DPI equipment can identify log-ins, passwords, and other sensitive information, and can compromise subsequent encrypted communications paths between parties.
While DPI equipment is often sold under the guise of ‘simply’ ensuring efficient distribution of bandwidth between Internet subscribers or maintaining standard ‘lawful intercept’ powers, many of these devices can be re-provisioned for highly invasive mass surveillance, giving the state unparalleled insight into citizens’ communications and activities online.
Christopher Parsons is a PhD student at the University of Victoria who specialises in privacy and the internet
Social network analysis
Social network analysis treats an individual’s social ties as a kind of social graph, such that an analysis of the ties between different members of a social graph can reveal local and large-scale social structures.
Such analysis may focus on the number of connections between different individuals or groups, the proximity of different individuals or groups to one another, or the intensity of these connections (taking the frequency of interaction, for example, as a proxy). It can also reveal the degree of influence a person might have on his or her community, or the manner in which behaviours and ideas propagate across a network.
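The three measures mentioned above (number of connections, proximity, and intensity) can be sketched with a few lines of code. The names and the interaction log are invented for this example, and counting repeated contacts is only one possible proxy for the intensity of a tie.

```python
from collections import Counter, deque

# Hypothetical log of observed interactions between pairs of people.
interactions = [
    ("ana", "ben"), ("ana", "ben"), ("ana", "cy"),
    ("cy", "dee"), ("dee", "eli"),
]


def tie_strength(log):
    """Intensity: how many times each pair interacted."""
    return Counter(frozenset(pair) for pair in log)


def adjacency(log):
    """Build an undirected graph as a dict of neighbour sets."""
    graph = {}
    for a, b in log:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degree(graph):
    """Number of distinct connections per person."""
    return {person: len(neighbours) for person, neighbours in graph.items()}


def hops(graph, start, goal):
    """Proximity: fewest ties separating two people (breadth-first search)."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbour in graph[node] - seen:
            seen.add(neighbour)
            queue.append((neighbour, dist + 1))
    return None  # no path between the two people


graph = adjacency(interactions)
```

Here `degree(graph)` gives the connection counts, `tie_strength(interactions)` the interaction frequencies, and `hops(graph, "ana", "eli")` the distance between two members of the graph.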
Police, security, and military analysts tend to rely on this kind of analysis to discover the collaborators of known criminals and adversaries. But less directed searches are also possible, in which analysts simply search for unusual structures or patterns in the network that seem to suggest illicit activity.
Solon Barocas is a PhD student at New York University with interests in IT ethics and surveillance studies
Data mining – Solon Barocas
Data mining refers to a diverse set of computational techniques that automate the process of generating and testing hypotheses, with the goal of discovering non-intuitive patterns and structure in a dataset.
These discoveries can be findings in their own right because they reveal latent correlations, dependencies, and other relationships in the data—findings that support efforts like predictive policing. But these findings can also serve as a basis upon which to infer additional facts from the available data (a person’s preferences or propensities, for instance), acting as rules in programs that automatically distinguish between cases of interest.
Such programs have been especially attractive in the intelligence community because they would seem to hold the promise of automating the process of evaluating the significance of data by identifying cases of the relevant activity in the flow of information.
Such applications have been met with fierce opposition due to privacy and due process concerns (famously in the case of Total Information Awareness). Critics have also disputed data mining’s efficacy, highlighting problems with false positive rates. A 2008 National Academies report went so far as to declare that, because of inherent limitations, “pattern-seeking data-mining methods are of limited usefulness” in counterterrorism operations.
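The false positive problem is largely a matter of arithmetic: when the activity being sought is rare, even an accurate test flags mostly innocent cases. The sketch below works through one example; every figure in it is hypothetical and chosen only to make the effect visible.

```python
def flagged_precision(population, true_cases, sensitivity, false_positive_rate):
    """Share of flagged people who are genuine cases."""
    true_positives = true_cases * sensitivity
    false_positives = (population - true_cases) * false_positive_rate
    return true_positives / (true_positives + false_positives)


# Hypothetical scenario: 100 genuine cases among one million people,
# a system that catches 99% of them and wrongly flags 1% of everyone else.
precision = flagged_precision(
    population=1_000_000,
    true_cases=100,
    sensitivity=0.99,
    false_positive_rate=0.01,
)
```

Under these assumptions roughly 99 genuine cases are flagged alongside about 9,999 innocent people, so just under 1% of those flagged are actually cases of interest.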
Backdoor trojans
These permit covert entry into and remote control of any computer connected to the internet. They can be combined with disk analysis tools, so that an entire computer can be searched remotely for content, including passwords.
It is also possible to abuse such techniques to alter and plant files and to masquerade as the legitimate user – such techniques can, however, often be detected by forensic examination. There are a huge number of examples, and many face the technical problem of how to hide their presence on a system.
Professor Peter Sommer
Open source intelligence and social media monitoring
Several tools have been developed to retrieve online images, video and text to help law enforcement or intelligence agencies to discover, track and analyse ‘terrorist content’, the users who post such content and the network in which this content circulates. Such content is defined by the user of these tools, and can consist for instance of text or video instructions on how to make an improvised explosive device (IED), Al Qaeda propaganda videos and images, or threats on an online forum.
The ongoing data deluge results in an ever-increasing demand for and production of such data-mining software that is able to automatically collect, search, and analyse words, sounds, and even sentiments, from open sources such as internet forums and social media.
Some tools claim, for instance, to be able to determine whether a poster on an online forum is getting more aggressive over time, by looking at the combination of writing style, word usage, use of special characters, punctuation and dozens more factors.
The focus on preventing terrorism, rather than investigating past crimes, along with the perceived tendency for terrorist groups to be organized in decentralized networks of ‘cells’, led to an increased interest in the use of social network analysis tools as well. Such tools support statistical investigation of the patterns of communication in groups, or the position of particular individuals in a network.
They can be used, for instance, to distinguish ‘opinion leaders’ and their fans on jihadi forums from other posters, but they can also be used to determine who has suspicious patterns of contacts with a known or suspected terrorist, or with any member of a known or suspected group of criminals from a large database of contacts.
All these technologies aim to discover patterns and content which would otherwise remain unnoticed, often in order to identify potential terrorists and possibly prevent them from committing a crime. While this is a legitimate goal for a society, these tools can easily be abused since the end-user can define what exactly the technology should be looking for.
Automatic text analysis tools can be used to find ‘terrorist content’, or attribute an ancient manuscript to a modern-day writer, but they can equally be used to track regime critics, or members of a suppressed religious or ethnic minority in authoritarian regimes.
It is not difficult to envisage the potential dangers of all these separate technologies, but the danger of abuse becomes even higher when they are combined with each other. Even when there is a legitimate aim, there is always a high risk of false positives attached to the use of this software, which can lead to innocent people being watched and tracked by government authorities.
It is also important to stress that open source information does not only consist of information you choose to make public about yourself online: it can also consist of information that you cannot avoid making public to a governmental authority (such as an application for a visa). All this information can be automatically mined and analysed, and eventually result in an action that ultimately limits your rights.
Mathias Vermeulen is a researcher with an interest in human rights and detection technologies
Webmail interception
Hundreds of millions of people around the world communicate using free email services provided by firms such as Google, Yahoo, Microsoft, and others. Although these companies offer similar email services, the security and privacy protections vary in important ways.
Specifically, Google uses HTTPS encryption by default, which ensures that emails and other information stay secret as they are transmitted between Google’s servers and the laptop or mobile phone of a user. In contrast, Yahoo, Microsoft and most other providers do not use encryption by default (or offer it at all in some cases).
The use of encryption technology significantly impacts the ability of law enforcement and intelligence agencies to obtain the emails of individuals under surveillance. Government agencies that wish to obtain emails stored by companies that use HTTPS by default, like Google, must contact the email provider and try to force them to deliver the communications data.
These service providers can, and sometimes do, push back against specific requests if they feel that the request is unlawful. They also generally ignore all requests from some repressive regimes, such as Iran or North Korea, when they do not have an office in that country.
In contrast, when governments wish to obtain emails sent via services like Yahoo and Hotmail that do not use HTTPS encryption, they can either ask the webmail company for the emails, or, because no encryption is used, directly intercept the communications data as it is transmitted over the network.
This requires the assistance of a local internet service provider, but such assistance is often far easier to obtain, particularly if the webmail company has no local office and is refusing to comply with requests. As a result, the decision to use a webmail service that encrypts data in transit can significantly impact your privacy – particularly if surveillance requests from your government would normally be ignored by foreign webmail providers.
Although HTTPS encryption can protect email communications against passive network surveillance by government agencies, this technology does not provide a 100% guarantee that the police or intelligence agencies are not listening. Web encryption technologies depend upon trusted third parties, known as ‘certificate authorities’ which allow browsers to identify the servers with which they are communicating.
Several of these certificate authorities are controlled by governments, and many others can be coerced by governments into creating false credentials that can be used to decode otherwise encrypted communications. Several commercial internet surveillance products specifically advertise their compatibility with such false credentials.
Chris Soghoian is a Washington, DC based Graduate Fellow at the Center for Applied Cybersecurity Research
Speech and voice recognition
In recent decades, security and law enforcement agencies have shown a keen interest in developing technologies that can identify individual voices (speaker recognition) and that can understand and automatically transcribe conversations (speech recognition).
The US National Security Agency currently funds the National Institute of Standards and Technology (NIST) to conduct annual reviews of cutting-edge speech software. Security analysts believe the agency and its overseas partners have used such systems for many years over public communications networks to detect target keywords and to transcribe conversations into text.
Commercial speech recognition systems are widely used but their accuracy reached a plateau more than a decade ago, and accuracy remains barely better than 50%.
However, newer systems being marketed to security agencies employ characteristics such as rhythm, speed, modulation and intonation, based on personality type and parental influence; and semantics, idiolects, pronunciations and idiosyncrasies related to birthplace, socio-economic status, and education level.
The result is a powerful technique that can deliver high accuracy at a speed that makes the software operationally viable for even the smallest agencies.
Simon Davies is director-general of Privacy International.
Safe cities
We all want to feel safe. Our ability to use and enjoy our cities and the relationships we establish with one another depend on this. Moreover, unsafe environments always exclude the most vulnerable.
However, anxiety over urban insecurities is on the increase worldwide, and the attention to risk and efforts to minimise insecurity at the local level seem to be failing to foster trust, security and co-operation.
In order to combat this, many governments have made security and community safety their top priority. Elections are won and lost over this issue, and once in office, the management of fear can make you or break you.
This concern over global and local threats and the perception of risk has taken the issue of security out of police stations, borders or critical infrastructures to embed security in all corners of local policy: urbanism, transport, traffic management, open data and more. The drive to minimise the unexpected and convey a sense of control is today a fixed item in the political agenda.
This increased attention to risk, however, is failing to increase feelings of security. It seems evident that the attention to these issues has failed to make us better at calculating and reacting to potential dangers, and current security policy at the local level is more based on adopting prescriptions from other cities than on an actual diagnosis of the specific sources of danger and insecurity in a given territory.
Moreover, local governments are using security technology traditionally deployed in foreign policy and border control to monitor city life as a way to show political muscle and authority, often overlooking the social and legal consequences of surveillance technologies and the normalisation of control.
Gemma Galdon Clavell is a researcher based at the Universitat Autònoma de Barcelona, where she focuses on public policy, community safety, surveillance and public space
IMSI catchers
An IMSI catcher is an eavesdropping device used to intercept mobile phone communications. These portable devices, now as small as a fist, pretend to be a legitimate cell phone tower, emitting a signal that dupes thousands of mobile phones in a targeted area. Authorities can then intercept SMS messages, phone calls and phone data, such as the unique IMSI and IMEI identity codes that allow authorities to track phone users’ movements in real time, without having to request location data from a mobile phone carrier.
In addition to intercepting calls and messages, the system can be used to effectively cut off phone communication for crowd control during demonstrations and riots where participants use phones to organise.
It is unclear how the use of IMSI catchers can be justified legally, and while evidence that they are a common tool for law enforcement is growing, few agencies have come clean on their use.
In repressive regimes, IMSI catchers are especially concerning, given the ease with which these technologies could unmask and identify thousands of people demonstrating. In the UK, excessive and disproportionate use of IMSI catchers is likely to have chilling effects on the freedom of association and legal public protest.
Eric King is director of policy at Privacy International
Facial recognition technology
Facial recognition technology automates the identification of people – or verification of a claimed identity. It uses sophisticated computer algorithms to measure the distances between various features of a person’s face and compares them to a gallery of images on a database, or, in the case of 1-to-1 verification, against the image of a ‘suspect’.
A camera first captures an image of a person’s face, which sometimes happens surreptitiously and presents serious civil liberties concerns. This image is called the ‘probe’ image. The software then processes this probe image to extract the relevant features of the face, which is compared against a database of previously collected images, or a single image in the case of 1-to-1 verification.
As facial recognition is inherently probabilistic, the algorithm produces a ‘score’ that depicts the likelihood of a match. A human operator is therefore usually required to decide whether the algorithmic match is ‘true’.
While the facial recognition process is largely automated, humans are still required to confirm or reject a potential hit, which may introduce bias into the system.
There are many factors that affect the performance, effectiveness and reliability of facial recognition systems. First, the system can only recognise people whose images are enrolled in the gallery. So an outstanding concern is how databases are populated with records. Second, image quality is a major issue.
Poor lighting conditions, extreme camera angles, camera distance, old images, large gallery size, and obscured faces will all affect the system’s performance. Third, facial recognition systems have a ‘sensitivity threshold’ that must be set by the operator.
A low threshold will produce false positives – an innocent person will be subjected to increased scrutiny based on resemblance to a person on the suspect list. And setting the threshold too high will result in false negatives. Facial recognition systems cannot be set to simultaneously give both fewer false positives and fewer false negatives. So error is necessarily a part of facial recognition systems.
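The trade-off can be seen by counting errors at different thresholds. In the sketch below the similarity scores are invented for illustration; a real system would produce them by comparing probe images against the gallery.

```python
# Hypothetical similarity scores between a probe image and gallery images.
genuine = [0.91, 0.84, 0.77, 0.69, 0.58]   # pairs showing the same person
impostor = [0.72, 0.61, 0.49, 0.35, 0.20]  # pairs showing different people


def error_counts(threshold):
    """Return (false positives, false negatives) at a given threshold."""
    false_positives = sum(score >= threshold for score in impostor)
    false_negatives = sum(score < threshold for score in genuine)
    return false_positives, false_negatives


low = error_counts(0.5)
middle = error_counts(0.7)
high = error_counts(0.8)
```

Because the two sets of scores overlap, raising the threshold removes false positives only by adding false negatives, and no setting in this example eliminates both.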
The use of these technologies presents numerous privacy concerns. Because images can be collected at a distance, subjects may not be aware that their images are being recorded and included in a gallery. Surreptitious use of facial recognition makes a mockery of the concept of consent, for example when it is combined with video surveillance technology (CCTV).
The use of facial recognition systems in public places, such as during peaceful public protests or marches, may have a chilling effect on free speech and political association. And where operators are poorly trained or systems are misused, there is a risk that people are falsely identified as a criminal or wrongdoer and may face difficulties explaining that the facial recognition software got it wrong.
Aaron K Martin is a privacy and IT policy researcher in the Information Systems and Innovation Group at the London School of Economics.
Mobile monitoring
Mobile phone communication can be monitored either on the device side, using software installed clandestinely on the mobile device, or on the network side, by monitoring the activity of a particular user.
On the device side, there are numerous vendors who supply ‘spyware’ that, when installed directly or remotely on a person’s mobile device, can track calls, SMS, email, contacts, location, photos, videos, events, tasks and memos, and can even remotely activate a mobile’s microphone to act as a clandestine listening device. Data is secretly sent back from the device to the vendor’s servers, where it resides.
While buying this kind of software is not illegal in many countries, using it without a court order may violate wiretapping laws. The use of this ‘spyware’ on devices also raises significant ethical and privacy concerns.
On the network side, mobile operators routinely log information about users, their devices, and their behaviour on the network. Much of this is due to the fact that mobile operators bill their customers for services rendered. In other words, monitoring is inherently built into mobile networks’ core business model.
Network operators log the unique serial number of the user’s device (IMEI), the SIM card number (IMSI) (which may be uniquely tied to the identity of a user due to SIM card registration requirements now ubiquitous in repressive regimes), call and SMS logs, the location of the device on the network either by way of tower triangulation or GPS, and a host of other data.
Much of the data flowing over the network is in plain text (SMS, for instance), so it can be easily and inexpensively monitored by a network operator or by an intelligence agency in cooperation with the network operator. Operators, by their licence terms, in most countries are obliged to provide information for ‘lawful intercept’ purposes, which in many repressive regimes is a fungible concept.
Katrin Verclas is the co-founder and editor of MobileActive.org, an organisation exploring the ways in which mobile phones can be used for activism and other social activity
The Bureau of Investigative Journalism is a not for profit organisation based at City University in London