2.2.2 Formal Privacy Models

Computer scientists define a privacy-protected query system as one in which all analyses of the confidential data are passed through a noise-infusion filter before they are published. Some of these systems use input noise infusion: the confidential data are permanently altered at the record level, and all analyses are then performed on the protected data. Other formally private systems apply output noise infusion, perturbing the results of statistical analyses before they are released.
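To make output noise infusion concrete, the sketch below implements the Laplace mechanism for a counting query, a standard building block of differentially private query systems. The function name and the choice of ε here are illustrative assumptions, not features of any particular agency system.

    import numpy as np

    def laplace_count(data, predicate, epsilon):
        # A counting query has sensitivity 1: adding or removing one
        # record changes the true count by at most 1, so Laplace noise
        # with scale 1/epsilon yields epsilon-differential privacy
        # (Dwork and Roth 2014, Chapter 3).
        true_count = sum(1 for row in data if predicate(row))
        noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
        return true_count + noise

    # Illustration: a noisy count of respondents older than 65,
    # answered at a per-query privacy loss of epsilon = 0.1.
    ages = [23, 67, 45, 71, 34, 80]
    print(laplace_count(ages, lambda age: age > 65, epsilon=0.1))

Because the noise scale is 1/ε, a smaller per-query ε (a stronger privacy guarantee) produces a noisier answer.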

All formal privacy models define a cumulative, global privacy loss associated with all of the publications released from a given confidential database, called the total privacy-loss budget. The budget can then be allocated across the released queries; once it is exhausted, no further analysis can be conducted. The researcher must decide how much of the privacy-loss budget to spend on each query: noisy answers to many queries, or sharp answers to a few. The agency must decide the total privacy-loss budget for all queries and how to allocate it among competing potential users.
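Basic sequential composition makes this trade-off explicit: the ε values spent on individual queries sum to the global privacy loss. The allocation below is a hypothetical illustration, assuming sensitivity-1 queries answered with the Laplace mechanism sketched above.

    # Hypothetical ways to spend a total privacy-loss budget of 1.0.
    # Under basic sequential composition, per-query epsilons sum to the
    # global privacy loss; the Laplace noise scale for a sensitivity-1
    # query is 1/epsilon, so smaller shares mean noisier answers.
    total_epsilon = 1.0

    allocations = {
        "one sharp query": [1.0],
        "four moderate queries": [0.25] * 4,
        "twenty noisy queries": [0.05] * 20,
    }

    for plan, shares in allocations.items():
        assert abs(sum(shares) - total_epsilon) < 1e-9  # budget fully spent
        print(plan, "-> noise scale per answer:", 1.0 / shares[0])

Spreading the same budget over twenty queries inflates the noise scale of each answer from 1 to 20.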

An increasing number of modern statistical disclosure limitation (SDL) and formal privacy procedures replace methods like deterministic suppression and targeted random swapping with some form of noisy query system. Over the last decade these approaches have moved to the forefront because they give the agency a formal method for quantifying the global disclosure risk in the output and for evaluating the quality of the published data along broadly relevant dimensions.

Relatively recently, formal privacy models have emerged from the literature on database security and cryptography. In formal privacy models, the data are distorted by a randomized mechanism prior to publication. The goal is to explicitly characterize, given a particular mechanism, how much private information is leaked to data users.

Differential privacy is a particularly prominent and useful approach to characterizing formal privacy guarantees. Briefly, a formal privacy mechanism that grants ε-differential privacy places an upper bound, parameterized by ε, on the ability of a user to infer from the published output whether any specific data item, or response, was in the original, confidential data (see Dwork and Roth 2014 for an in-depth discussion).
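For reference, the standard definition (Dwork and Roth 2014) makes this bound explicit: a randomized mechanism M satisfies ε-differential privacy if, for every pair of databases D and D′ that differ in a single record, and for every set S of possible outputs,

    Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D′) ∈ S].

Smaller values of ε force the output distribution to be nearly the same whether or not any one record is present, which is precisely the inference bound described above.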

Formal privacy models are appealing because they solve two key challenges for disclosure limitation. First, formal privacy models by definition provide provable guarantees on how much privacy is lost, in a probabilistic sense, in any given data publication. Second, the privacy guarantee does not require that the implementation details, specifically the parameter ε, be kept secret. This allows researchers using data published under formal privacy models to conduct fully SDL-aware analysis. That is not the case with many traditional disclosure limitation methods, which require that key parameters, such as the swap rate, suppression rate, or variance of the infused noise, not be made available to data users (Abowd and Schmutte 2015).
