CHAPTER 2
Terminology
Before getting into the substance of the matter, it is necessary to define precisely the main concepts involved in a privacy risk analysis. Indeed, technical terms are not always used in a consistent way in this area and different authors sometimes use the same words with different meanings. The objective of this chapter is to set the scene and introduce the terminology used throughout this book.
In the following subsections, we define successively the notions of:
1. personal data, which is the object of protection;
2. stakeholders, the entities to which personal data relates or which handle it at various stages of its lifecycle;
3. risk sources, which may cause privacy breaches;
4. feared events, which may lead to privacy harms; and
5. privacy harms, which are the impacts of privacy breaches on individuals, groups of individuals or society as a whole.
Some of these notions, such as privacy harms, have been extensively discussed by legal scholars even though they have received less attention from lawmakers. Others, such as personal data, are defined by privacy laws and regulations. Still others, such as feared events, have been used only by certain data protection authorities. However, even for well-discussed terms, there is generally no single interpretation of their meaning. Therefore, in the following sections we provide a concise definition of each of these terms (which will be further discussed in the next chapters). For some of them, we adopt one of the existing definitions, while for others we provide our own and justify our choice. In the rest of the book, unless otherwise mentioned, these terms will be used in the sense defined in this chapter.
2.1 PERSONAL DATA
Both the European Union (EU) and the United States (U.S.) privacy regulations rely on notions of “data” or “information” but they follow different approaches. While the EU defines the notion of “personal data,” the U.S. refers to “personally identifiable information” (or “PII”). The use of these terms reveals substantial differences in the ways of considering privacy on each side of the Atlantic.
The notion of personal data used in this book is mainly inspired by the definitions provided by the EU Data Protection Directive (“EU Directive” in the sequel) [47] and the EU General Data Protection Regulation (“GDPR” in the sequel) [48]. The primary reason for this choice is that the EU provides a single, uniform definition, which contrasts with the multiple, competing attempts at defining PII in the U.S. [134, 135].
Article 4(1) of the GDPR [48] defines personal data as follows:
“‘Personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person.”
The GDPR (Recital 26) adds a clarification about pseudonymization and identification: “Personal data which has undergone pseudonymisation, which could be attributed to a natural person by the use of additional information, should be considered to be information on an identifiable natural person. To determine whether a natural person is identifiable, account should be taken of all the means reasonably likely to be used, such as singling out, either by the controller or by another person to identify the natural person directly or indirectly.”
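To make Recital 26 concrete, the following minimal Python sketch (our illustration, not part of the GDPR; all names and values are invented) shows why pseudonymized data should still be considered personal data: the mapping table is precisely the “additional information” that allows re-attribution to a natural person.

```python
import secrets

# The mapping table is the "additional information" of Recital 26,
# typically kept separately by the data controller.
pseudonym_table = {}

def pseudonymize(name):
    """Replace a direct identifier with a random token."""
    token = secrets.token_hex(8)
    pseudonym_table[token] = name  # the link back to the person is retained
    return token

record = {"patient": pseudonymize("Alice Martin"), "diagnosis": "asthma"}

# Anyone with access to the table can undo the pseudonymization:
print(pseudonym_table[record["patient"]])  # -> Alice Martin
```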
The position expressed in Recital 26 is inspired by the Working Party 291 Opinion 08/2012 [10] suggesting that “any information allowing a natural person to be singled out and treated differently” should be considered as personal data. Our definition of personal data is in line with the approaches followed by the GDPR and the Working Party 29.
Definition 2.1 Personal Data [10, 47, 48]. Personal data is any information relating to an identified or identifiable natural person2 and any information allowing such a person to be singled out or treated differently.
Considering the fact that a person can be singled out or treated differently makes it possible to take into account data processing that can have privacy impacts, such as discriminatory treatments (e.g., discriminatory ads [38]), without necessarily identifying any individual.
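As a hypothetical illustration of singling out (the scenario and identifiers below are invented), an ad server can treat a browser differently based only on a cookie identifier, without ever learning who the person is; under Definition 2.1, that identifier is therefore personal data.

```python
# Invented profiles keyed by cookie identifiers; no name, address or other
# direct identifier is ever collected.
browsing_profiles = {
    "cookie-7f3a": {"visited_loan_sites": True},
    "cookie-91bc": {"visited_loan_sites": False},
}

def select_ad(cookie_id):
    """Single out a browser and treat it differently, identity unknown."""
    profile = browsing_profiles.get(cookie_id, {})
    if profile.get("visited_loan_sites"):
        return "high-interest credit offer"
    return "standard offer"

print(select_ad("cookie-7f3a"))  # -> high-interest credit offer
print(select_ad("cookie-91bc"))  # -> standard offer
```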
The different approaches followed for the definition of personal data in the EU and the U.S. are further discussed in Chapter 4.
2.2 STAKEHOLDERS
The term “stakeholder” is commonly used in the literature, generally without definition. Even though its meaning may look obvious, we define it as follows to avoid any ambiguity.
Definition 2.2 Stakeholder. A stakeholder is any entity (individual or organization) to which a piece of data relates or that processes3 or accesses (legally or not) a piece of data at any stage of its lifecycle.
The EU Directive provides comprehensive definitions of different types of stakeholders, whereas the U.S. privacy laws and regulations rely on sectoral definitions. In this book, we follow the same approach as the EU Directive and consider the following stakeholders:
• data controllers,
• data subjects,
• data processors and
• third parties.
We also chose to use definitions inspired by the EU Directive for these terms.
Definition 2.3 Data Subject [10, 32, 47, 48]. A data subject is an identified or identifiable natural person to whom the personal data relates.
Definition 2.4 Data Controller [32, 47]. A data controller is an entity (individual or organization) that, alone or jointly with others, determines the purpose, conditions and means of processing of personal data.
Definition 2.5 Data Processor [47]. A data processor is an entity (individual or organization) that processes personal data on behalf of the data controller.
Definition 2.6 Third Party [47]. A third party is an entity (individual or organization) other than the data subject, the controller, the processor and the persons who, under the direct authority of the controller or the processor, are authorized to process the data.
Typical examples of third parties include ad brokers installing cookies on the computer of the data subject, marketing companies receiving personal data from the data controller, or peers in a social network.
Some difficulties may arise when applying these definitions to practical scenarios, especially those involving multi-party processing arrangements and cloud computing. In some cases, the roles of data controller and data processor cannot be easily distinguished.4
The roles defined above are not mutually exclusive. For example, a data controller for one set of data or operations may act as a data processor for another set of data or operations. Moreover, consistently with the approach followed in the EU Directive, the above definitions do not imply the lawfulness of the actions of any entity. A data controller, for example, may legally or illegally process data; it may process data without any legitimate purpose or collect more data than necessary for the purpose. This is in agreement with the opinion of the Working Party 29 [8] clarifying that the data controller only “determines” rather than “lawfully determines” the purpose and the means for data processing.
2.3 RISK SOURCES
One of the first steps in a risk analysis is to identify the potential sources of risks, that is to say the entities whose actions can lead to a privacy breach. These entities are often referred to as “adversaries” or “attackers” in the security literature but we prefer to use the term “risk source” here as it is less security-laden and it is not limited to malicious actors. We define a risk source as follows:
Definition 2.7 Risk source. A risk source is any entity (individual or organization) that may process (legally or illegally) personal data related to a data subject and whose actions may directly or indirectly, intentionally or unintentionally lead to privacy harms.
Any of the stakeholders, apart from the data subject himself,5 may be a risk source. Each risk source should be associated with a number of attributes, including its capabilities, background information, motivations, etc. We discuss risk sources and their attributes in Chapter 6.
2.4 FEARED EVENTS
A feared event is a technical event in the processing system that can lead to a privacy harm. An unauthorized party getting access to the health data of a patient or a controller re-identifying a person from an allegedly anonymized dataset are examples of feared events. The occurrence of a feared event depends on the existence of weaknesses (of the system or the organization), which we call privacy weaknesses, and the ability of the risk sources to exploit them.
Definition 2.8 Feared Event. A feared event is an event of the processing system that may lead to a privacy harm.
Definition 2.9 Privacy weakness. A privacy weakness is a weakness in the data protection mechanisms (whether technical, organizational or legal) of a system or lack thereof.
As an illustration, a weak encryption algorithm used to protect personal data is a privacy weakness; a weak anonymization algorithm is another example. The term “vulnerability” is often used with a close meaning in the area of computer security, but we choose the expression “privacy weakness” here because in some cases privacy harms can stem from the functionality of the system itself6 (which would probably not be considered a vulnerability in the usual sense of the word). For the same reason, we use the expression “harm scenario” to denote the succession of events leading to a feared event, which is often referred to as an “attack” in the security literature. In the simplest cases (for example, an unauthorized employee getting access to unprotected data), the exploitation of the privacy weakness is the feared event itself and the harm scenario boils down to a single event. A more complex harm scenario would be a succession of access attempts using passwords from a dictionary, leading to the discovery of the correct password and access to the personal data.
Definition 2.10 Harm scenario. A harm scenario is a succession of events or actions leading to a feared event.
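The following sketch (our notation, not the book’s) expresses Definitions 2.8–2.10 as simple data structures, using the dictionary-attack example above: the harm scenario is a succession of events whose last step brings about the feared event.

```python
from dataclasses import dataclass

@dataclass
class PrivacyWeakness:
    description: str      # e.g., weak encryption or weak anonymization

@dataclass
class FearedEvent:
    description: str      # technical event that may lead to a privacy harm

@dataclass
class HarmScenario:
    exploits: PrivacyWeakness
    steps: list           # succession of events or actions
    outcome: FearedEvent

# The dictionary-attack example from the text above:
scenario = HarmScenario(
    exploits=PrivacyWeakness("password guessable from a dictionary"),
    steps=[
        "repeated access attempts with passwords from a dictionary",
        "discovery of the correct password",
    ],
    outcome=FearedEvent("access to the personal data"),
)
```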
2.5 PRIVACY HARMS
Feared events denote events (in a technical sense) that have to be avoided. The ultimate goal of a privacy risk analysis is the study of the impacts of these events on individuals, groups or society, which we call the “privacy harms.” For instance, the unauthorized access to health data (a feared event) by a risk source may cause privacy harms such as discrimination (against a patient or a group of patients) or psychological distress. Similarly, the illegal access to location data such as home address may lead to economic or physical injury (e.g., burglary or murder7).
The characterization of privacy harms is not an easy task as it may depend on many contextual factors (cultural, social, personal, etc.). Obviously, societies in different parts of the world follow different sets of unwritten rules and norms of behavior. For example, a data subject belonging to a certain society may feel uneasy if his religious beliefs (or lack thereof) or sexual preferences are revealed. “Acceptance in society” is generally an important factor for individual well-being and should be considered in the risk analysis.
The definition of privacy harms adopted in this book is inspired by Solove’s vivid description of how feared events may affect individuals and society as a whole [140]. It also bears close similarities with the definition of harms proposed by the Center for Information Policy Leadership (CIPL) [26].
Definition 2.11 Privacy Harms. A privacy harm is a negative impact of the use of a processing system on a data subject, a group of data subjects, or society as a whole, from the standpoint of physical, mental or financial well-being, reputation, dignity, freedom, acceptance in society, self-actualization, domestic life, freedom of expression or any fundamental right.
The above definition takes the impact on society into consideration because certain harms, such as those caused by surveillance, are bound to have global impacts such as a chilling effect or a loss of creativity, which concern society as a whole, not just individuals. As discussed in Chapter 1, this definition of privacy harms does not concern the impacts on the data controllers or the data processors themselves, which could be considered in a second stage (as indirect consequences of privacy harms) but are not included in the scope of this book.8
2.6 PRIVACY RISKS
The word “risk” is used in this book (as often in the risk management literature) as a contraction of “level of risk.” Levels of risk are generally defined by two values [17, 32, 55]: likelihood and severity.9
The GDPR also refers explicitly to these two dimensions in its Recital 76:
“The likelihood and severity of the risk to the rights and freedoms of the data subject should be determined by reference to the nature, scope, context and purposes of the processing. Risk should be evaluated on the basis of an objective assessment, by which it is established whether data processing operations involve a risk or a high risk.”
In the context of privacy, the likelihood characterizes the probability that a privacy harm may be caused by the processing system, and the severity represents the magnitude of the impact on the victims. The likelihood should combine the probability that a risk source will initiate a harm scenario, the probability that it will be able to carry out the necessary tasks (i.e., perform the scenario, including the exploitation of the privacy weaknesses of the system, to bring about a feared event) and the probability that the feared event will cause a harm [17]. The likelihood and the severity can be defined in a quantitative or qualitative manner (for example, using a fixed scale such as “low,” “medium,” “high”). Risks are often pictured in two-dimensional spaces [33] or matrices [17]. They are also sometimes reduced to a single value through rules computing the product of likelihood and impact [55].
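As a hedged illustration (the probabilities, scale thresholds and matrix entries below are invented, not taken from [17, 33, 55]), the likelihood can be computed as the product of the three probabilities named above and then combined with the severity through a qualitative risk matrix:

```python
def harm_likelihood(p_initiate, p_succeed, p_harm):
    """Combine: P(risk source initiates the scenario) x
    P(scenario is carried out successfully) x
    P(the feared event causes a harm)."""
    return p_initiate * p_succeed * p_harm

def to_scale(p):
    """Map a probability to an illustrative qualitative scale."""
    return "low" if p < 0.1 else "medium" if p < 0.5 else "high"

# Illustrative matrix reducing (likelihood, severity) to a risk level:
RISK_MATRIX = {
    ("low", "low"): "negligible",    ("low", "high"): "limited",
    ("medium", "low"): "limited",    ("medium", "high"): "significant",
    ("high", "low"): "significant",  ("high", "high"): "maximum",
}

likelihood = to_scale(harm_likelihood(0.8, 0.6, 0.9))  # 0.432 -> "medium"
print(RISK_MATRIX[(likelihood, "high")])               # -> significant
```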
2.7 PRIVACY RISK ANALYSIS
The first goals of a privacy risk analysis are the identification of the privacy harms that may result from the use of the processing system and the assessment of their severity and likelihood. Based on this analysis, decision makers and experts can then decide which risks are not acceptable and select appropriate measures10 to address them. The risk analysis can be iterated to ensure that the risks have been reduced to an acceptable level. Considering that risk analyses always rely on certain assumptions (e.g., about the state of the art of the technology or the motivations of the potential risk sources), they should be maintained and repeated on a regular basis. Among the challenges facing the analyst, particular attention must be paid to two main difficulties:
1. the consideration of all factors that can have an impact on privacy risks and
2. the appropriate assessment of these impacts and their contribution to the assessment of the overall risks.
To discuss these issues in a systematic way, we propose in the next chapters a collection of six components (processing system, personal data, stakeholders, risk sources, feared events and privacy harms), each of which is associated with:
1. categories of elements to be considered for the component11 and
2. attributes which have to be defined and taken into account for the evaluation of the risks.12
Even though they are not necessarily comprehensive, categories are useful to minimize the risks of omission during the analysis. They take the form of catalogues, typologies or knowledge bases in existing methodologies [33, 55]. For their part, attributes help analysts identify all relevant factors for each component. The use of templates in certain methodologies [33] fulfills a similar role. Table A.1 in Appendix A provides a summary of the categories and the attributes suggested for each component.
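To fix ideas, here is a hypothetical sketch of such a template (the component names follow the book; the concrete categories and attributes echo footnotes 11 and 12, but the structure itself is our own illustration, not a prescribed format):

```python
from dataclasses import dataclass

@dataclass
class Component:
    name: str
    categories: list     # catalogue entries to minimize risks of omission
    attributes: dict     # factors to take into account in the evaluation

personal_data = Component(
    name="personal data",
    categories=["health data", "contact data", "identification data", "genetic data"],
    attributes={"precision of location data": "GPS-level"},
)

risk_sources = Component(
    name="risk sources",
    categories=["data controller", "data processor", "third party"],
    attributes={"level of motivation": "high", "background information": "public records"},
)
```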
1The Working Party 29, or Article 29 Working Party, is a group set up under the EU Directive. It includes a representative from each European data protection authority. One of its missions is to provide recommendations to the European Commission and to the public with regard to data protection and the implementation of the EU Directive.
2This person is the “data subject” defined in Definition 2.3.
3Here we define “processing” in the same way as the EU Directive, as “any operation or set of operations which is performed upon personal data, whether or not by automatic means, such as collection, recording, organization, storage, adaptation or alteration, retrieval, consultation, use, disclosure by transmission, dissemination or otherwise making available, alignment or combination, blocking, erasure or destruction.”
4This issue is further discussed in Section 5.1.
5However, a data subject may act as a risk source for another data subject.
6For example, in the case of video-surveillance systems or location-based services.
7This happened, for example, in the case of the murder of actress Rebecca Schaeffer in 1989, where the murderer obtained her home address from the Department of Motor Vehicles records [104, 140].
8This phase can typically take the form of a more traditional risk/benefit analysis considering the potential consequences of privacy harms for the controller (mostly in financial, reputational and legal terms).
9The severity is sometimes called the “impact” or “adverse impact” [17, 55].
10In general, the decision can be to accept a risk, to avoid or mitigate it, or to share or transfer it. Mitigation or avoidance measures can be combinations of technical, organizational and legal controls.
11For example, the categories of data being processed by a health information system may include health data, contact data, identification data, genetic data, etc.
12For example, the level of motivation of a risk source or the level of precision of location data.