Administrative Records for Survey Methodology

2.A.2 Concepts

 Analytical validity: Holds when, at a minimum, estimands can be estimated without bias and their confidence intervals (or the nominal significance levels of hypothesis tests) can be stated accurately (Rubin 1987). The estimands can be summaries of the univariate distributions of the variables, bivariate measures of association, or multivariate relationships among all variables.

 Coarsening: A method for protecting data that maps confidential values into broader categories, e.g. the bins of a histogram.
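
As a minimal sketch of coarsening, assuming integer-valued ages and a fixed bin width of 10 (both choices, and the names `coarsen` and `ages`, are illustrative and not taken from the text), exact values can be replaced by bin labels before tabulation:

```python
from collections import Counter

def coarsen(value, width):
    """Map an integer value to the label of the width-wide bin that contains it."""
    lo = (value // width) * width          # lower edge of the bin
    return f"{lo}-{lo + width - 1}"        # e.g. 23 with width 10 becomes "20-29"

# Hypothetical confidential values; only the bin counts would be published.
ages = [23, 27, 31, 38, 45, 47, 52]
histogram = Counter(coarsen(age, 10) for age in ages)
```

Wider bins reveal less about each individual value but also discard more detail, so the bin width is the parameter that would have to be chosen.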

 Confidentiality: A “quality or condition accorded to information as an obligation not to transmit […] to unauthorized parties” (Fienberg 2005, as quoted in Duncan, Elliot, and Salazar-González 2011). Confidentiality addresses data already collected, whereas privacy (see below) addresses the right of an individual to consent to the collection of data.

 Data swapping: Sensitive data records (usually households) are identified based on a priori criteria and matched to “nearby records.” The values of some of the variables, usually the geographic identifiers, are then swapped between the matched records, effectively relocating each record to the other’s location.
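
The procedure can be sketched as follows. This is an illustration under stated assumptions, not a description of any agency's production method: the record layout (`id`, `geo`, `size`), the sensitivity rule, and the matching rule are all hypothetical stand-ins for the "a priori criteria" and the notion of "nearby."

```python
def swap_geography(records, is_sensitive, is_match):
    """Return a copy of records in which each sensitive record trades its
    'geo' value with the first not-yet-swapped record accepted by is_match."""
    out = [dict(r) for r in records]       # leave the input untouched
    used = set()                           # indices that already took part in a swap
    for i, rec in enumerate(out):
        if i in used or not is_sensitive(rec):
            continue
        for j, cand in enumerate(out):
            if j != i and j not in used and is_match(rec, cand):
                rec["geo"], cand["geo"] = cand["geo"], rec["geo"]
                used.update((i, j))
                break
    return out

# Hypothetical households: unusually large ones are treated as sensitive.
households = [
    {"id": 1, "geo": "A", "size": 9},
    {"id": 2, "geo": "B", "size": 8},
    {"id": 3, "geo": "A", "size": 3},
    {"id": 4, "geo": "B", "size": 3},
]
swapped = swap_geography(
    households,
    is_sensitive=lambda r: r["size"] >= 9,
    # Assumed notion of "nearby": a different area and a similar household size.
    is_match=lambda r, c: r["geo"] != c["geo"] and abs(r["size"] - c["size"]) <= 1,
)
```

Because geographic identifiers are exchanged pairwise, the number of records in each area is unchanged; only the attributes attached to each location move.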

 Differential privacy: A class of formal privacy mechanisms. For instance, ε-differential privacy places an upper bound, parameterized by ε, on the ability of a user to infer from the published output whether any specific data item, or response, was in the original, confidential data (Dwork and Roth 2014).
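
One standard way to obtain ε-differential privacy for a counting query is the Laplace mechanism, which adds noise with scale equal to the query's sensitivity divided by ε. The sketch below is illustrative only: the entry above does not name a particular mechanism, the function names are hypothetical, and floating-point sampling of this kind is not suitable for production releases.

```python
import random

def laplace_noise(scale, rng):
    """Draw from Laplace(0, scale) as the difference of two independent
    exponential draws that each have mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon, rng):
    """Return a noisy count of the records that satisfy `predicate`."""
    true_count = sum(1 for r in records if predicate(r))
    sensitivity = 1.0   # adding or removing one record changes a count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)

# Hypothetical data; a smaller epsilon means more noise and stronger protection.
rng = random.Random(2024)
ages = [23, 27, 31, 38, 45, 47, 52]
noisy = private_count(ages, lambda age: age >= 40, epsilon=0.5, rng=rng)
```

The noise has mean zero, so repeated releases would average toward the true count; this is why each release has to be charged against an overall privacy budget.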

 Dirichlet-multinomial distribution: A family of discrete multivariate probability distributions on a finite support of nonnegative integers. The probability vector p of the better-known multinomial distribution is obtained by drawing from a Dirichlet distribution with parameter α.
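
The two-stage construction in this definition can be sketched directly: draw p from a Dirichlet distribution (here by normalizing independent gamma draws), then draw multinomial counts given p. The function names and the example parameter values are illustrative assumptions.

```python
import random
from collections import Counter

def sample_dirichlet(alpha, rng):
    """Draw a probability vector p ~ Dirichlet(alpha) by normalizing
    independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_dirichlet_multinomial(alpha, n, rng):
    """Draw category counts for n trials: first p ~ Dirichlet(alpha),
    then counts ~ Multinomial(n, p)."""
    p = sample_dirichlet(alpha, rng)
    picks = Counter(rng.choices(range(len(alpha)), weights=p, k=n))
    return [picks[i] for i in range(len(alpha))]

rng = random.Random(0)
counts = sample_dirichlet_multinomial([0.5, 2.0, 3.0], n=50, rng=rng)
```

The result is a vector of nonnegative integers that sums to n, which is the finite support the definition refers to.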

 Input noise infusion: Distorting the value of some or all of the inputs before any publication data are built or released.
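
As a hedged illustration, one possible form of input noise infusion multiplies each confidential input by a random factor that is kept away from 1, so that no released value rests on an undistorted input. The bounds `a` and `b`, the uniform draw, and the function names are assumptions made for this sketch, not parameters given in the text.

```python
import random

def noise_factor(rng, a=0.05, b=0.15):
    """Draw a multiplicative factor from [1 - b, 1 - a] or [1 + a, 1 + b],
    so every input is distorted by at least a and at most b (as a fraction)."""
    magnitude = rng.uniform(a, b)
    sign = rng.choice((-1, 1))
    return 1.0 + sign * magnitude

def infuse(values, rng, a=0.05, b=0.15):
    """Distort every input value before any table is built from the values."""
    return [v * noise_factor(rng, a, b) for v in values]

# Hypothetical establishment payrolls; tables are built from the noisy values only.
rng = random.Random(42)
payroll = [100.0, 2500.0, 40.0]
noisy_payroll = infuse(payroll, rng)
published_total = sum(noisy_payroll)
```

Because the distortion is applied once, to the inputs, every table later built from the same noisy values is consistent with every other one.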

 Posterior predictive distribution (PPD): In Bayesian statistics, the distribution of possible unobserved values conditional on the observed values.
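
A small worked case may help. Assuming binary outcomes with a Beta(a, b) prior on the success probability (a conjugate setup chosen here purely for illustration), observing s successes in n trials gives the posterior Beta(a + s, b + n - s), and the posterior predictive probability that the next outcome is a success is (a + s) / (a + b + n). The function name is hypothetical.

```python
from fractions import Fraction

def predictive_prob_success(successes, trials, a=1, b=1):
    """Posterior predictive probability that the next Bernoulli outcome is a
    success, under a Beta(a, b) prior and the observed successes and trials."""
    return Fraction(a + successes, a + b + trials)

# With a uniform Beta(1, 1) prior and 7 successes in 10 trials:
prob = predictive_prob_success(7, 10)   # (1 + 7) / (1 + 1 + 10) = 2/3
```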

 Privacy: “An individual’s freedom from excessive intrusion in the quest for information and […] ability to choose [… what …] will be shared or withheld from others” (Duncan, Jabine, and de Wolf 1993, quoted in Duncan, Elliot, and Salazar-González 2011). See also confidentiality, above.

 Sampling: As an SDL method, publishes only a fraction of the data.

 Statistical confidentiality or statistical disclosure limitation (SDL): Can be viewed as “a body of principles, concepts, and procedures that permit confidentiality to be afforded to data, while still permitting its use for statistical purposes” (Duncan, Elliot, and Salazar-González 2011, p. 2).

 Suppression: The removal of cells from a published table when their publication would pose a high risk of disclosure.
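
A minimal sketch of suppression, assuming that disclosure risk is judged by a simple minimum-count rule (the threshold of 3 and the function name are illustrative assumptions, not rules stated in the text):

```python
def suppress_small_cells(table, threshold=3):
    """Replace every positive cell count below `threshold` with None,
    which marks the cell as withheld from publication."""
    return {
        cell: (None if 0 < count < threshold else count)
        for cell, count in table.items()
    }

# Hypothetical counts by category; "B" falls below the threshold.
counts = {"A": 12, "B": 2, "C": 0, "D": 3}
published = suppress_small_cells(counts)
```

If row or column totals are also published, a single suppressed cell can be recovered by subtraction, so in practice additional (complementary) cells usually have to be withheld as well.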
