100 Questions (and Answers) About Research Ethics - Emily E. Anderson

Question #19 What Makes Data De-Identified?

Datasets that have been stripped of all personal identifiers are considered to be de-identified. The federal research regulations do not list specific personal identifiers. Instead, they loosely define identifiable to mean that “the identity of the subject is or may readily be ascertained by the investigator or associated with the information” (45 C.F.R. § 46.102(e)(5)). Although a universal list of personal identifiers does not exist, the 18 identifiers listed in the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule (such as participant name and date of birth) are reasonable identifiers that researchers should consider removing from their datasets when de-identifying them (USHHS, 2015a, 2015b).

The premise of de-identifying datasets is that, once all personal identifiers are removed, those who see the data likely cannot determine the participants’ identities. Even after datasets are de-identified, however, a slight risk remains that participants could be re-identified by someone with the interest and the means to do so. Researchers and ethicists therefore debate the extent to which data can truly be de-identified.

Typically, researchers must de-identify their datasets when they plan to share them with researchers outside the original study team (such as for secondary data analysis), when the data are to be made publicly available, or when they prepare data for long-term storage. Adequately de-identifying datasets may take considerable effort, depending on the type of information collected. Numerous procedures exist for removing or masking identifiers in quantitative datasets. For example, research that must follow the HIPAA Privacy Rule requires a specific process for removing all HIPAA identifiers from quantitative datasets (USHHS, 2015a, 2015b).
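As a rough illustration of removing or masking identifiers in a quantitative dataset, the following Python sketch drops a few direct identifiers from one record and coarsens the date of birth to a birth year. The field names (`name`, `email`, `phone`, `date_of_birth`) and the function `deidentify_record` are hypothetical, chosen only for this example. The short identifier set is not the HIPAA list and is not a substitute for the process the Privacy Rule requires.

```python
from datetime import date

# Hypothetical field names, for illustration only. A real study would review
# every field against the identifiers its regulations and IRB require removing.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def deidentify_record(record):
    """Return a copy of one record with direct identifiers removed.

    The full date of birth is replaced by the birth year alone.
    """
    # Keep only the fields that are not listed as direct identifiers.
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Coarsen the date of birth instead of keeping the exact date.
    dob = cleaned.pop("date_of_birth", None)
    if dob is not None:
        cleaned["birth_year"] = dob.year
    return cleaned


record = {
    "participant_id": "P001",
    "name": "Jane Doe",
    "email": "jane@example.org",
    "phone": "555-0100",
    "date_of_birth": date(1980, 5, 17),
    "score": 42,
}
cleaned = deidentify_record(record)
```

The function returns a new dictionary and leaves the original record untouched, so the identified data can still be stored separately under whatever safeguards the study requires.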

Processes for de-identifying qualitative data are not as straightforward, and fully de-identifying such data is very difficult. Researchers typically modify easily identifiable data in interview transcripts. For example, proper names said by the participant, such as “my friend Bob,” are removed and replaced with a general description (“my friend”) or a pseudonym. However, that step alone likely does not make qualitative data de-identified. Larger segments, including very specific or unusual experiences, may need to be redacted from transcripts to protect participants’ identities. Social and behavioral scientists must therefore be mindful of the quality of their data, both quantitative and qualitative, when a large amount of stripping is needed to de-identify them, and must consider whether enough context will remain for others to make valid interpretations.
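The name-replacement step described above can be sketched in Python as follows. This is a minimal example under stated assumptions: the function `pseudonymize`, the sample sentence, and the pseudonym “Alex” are all hypothetical, and the researcher is assumed to supply the list of names to replace. It handles only exact names and does nothing about the unusual experiences or contextual details that may also need redaction.

```python
import re


def pseudonymize(text, replacements):
    """Replace each listed name with its pseudonym, matching whole words only."""
    for original, substitute in replacements.items():
        # \b keeps "Bob" from matching inside a longer word such as "Bobby".
        pattern = r"\b" + re.escape(original) + r"\b"
        # A function replacement inserts the pseudonym literally.
        text = re.sub(pattern, lambda match: substitute, text)
    return text


transcript = "I told my friend Bob about it, and Bob's sister agreed."
masked = pseudonymize(transcript, {"Bob": "Alex"})
```

Because the mapping is supplied by hand, a transcript still needs a careful read-through afterward; a script like this cannot recognize nicknames, misspellings, or identifying stories.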

When data are de-identified for sharing or storage, the master list linking personal identifiers to the study data does not necessarily have to be destroyed. Institutional review boards often allow the original researcher to keep the master list that links participants’ names to their identification numbers, but that list must be stored securely and not shared.
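One way to picture the master list is as a separate table, split off from the study data. The Python sketch below assumes each record carries a hypothetical `name` field, and it assigns a random identification number with the standard `secrets` module. The function `split_master_list` and the ID format are illustrative assumptions, not a prescribed procedure.

```python
import secrets


def split_master_list(records, name_field="name"):
    """Separate names from study data, linking them only through random IDs."""
    master_list = {}  # study ID -> participant name; store securely, never share
    study_data = []   # rows that carry the study ID but no name
    for record in records:
        study_id = "P" + secrets.token_hex(4)
        while study_id in master_list:  # regenerate on the rare collision
            study_id = "P" + secrets.token_hex(4)
        master_list[study_id] = record[name_field]
        row = {k: v for k, v in record.items() if k != name_field}
        row["study_id"] = study_id
        study_data.append(row)
    return master_list, study_data


records = [
    {"name": "Jane Doe", "score": 42},
    {"name": "John Roe", "score": 37},
]
master_list, study_data = split_master_list(records)
```

Only `study_data` would be shared or archived; `master_list` stays with the original researcher under whatever storage safeguards the institutional review board approves.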

More questions? See #18, #20, and #24.
