Administrative Records for Survey Methodology - Various authors - Page 23

1.3.2 Uncertainty Evaluation: A Case of Two-Way Data


Let $a = 1, \dots, A$ and $j = 1, \dots, J$ form a two-way classification of interest. For example, $a$ may stand for ethnicity (White, Black, Others) and $j$ for the party voted for in an election (Democratic, Republican, Others). Or let $a$ index a large number of local areas and $j$ the household types, such as single-person, couple without children, couple with children, etc. Let $X = \{X_{aj}\}$ be a known register-based proxy table that is unacceptable as a “direct tabulation” of the target table $Y = \{Y_{aj}\}$.

For the asymmetric-linked setting, suppose that an observed sample two-way classification of $(a, j)$ is available. For survey weighting, let $s$ denote the sample and let $d_i = 1/\pi_i$ be the sampling weight of unit $i \in s$, where $\pi_i$ is the inclusion probability. Let $y_i(a, j) = 1$ if sample unit $i \in s$ has classification $(a, j)$ according to the target measure and $y_i(a, j) = 0$ otherwise; let $x_i(a, j) = 1$ if it has classification $(a, j)$ according to the proxy measure and $x_i(a, j) = 0$ otherwise. Post-stratification with respect to $X$ then yields the poststratification weight, say, $w_i = d_i \sum_{a,j} x_i(a, j) X_{aj} / \hat{X}_{aj}$, where

$$\hat{X}_{aj} = \sum_{i \in s} d_i x_i(a, j).$$

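This is problematic when there are empty or very small sample cells $(a, j)$. The raking ratio weight can then be given by $w_i = d_i \sum_{a,j} x_i(a, j) \tilde{X}_{aj} / \hat{X}_{aj}$, where $\tilde{X} = \{\tilde{X}_{aj}\}$ is derived by iterative proportional fitting (IPF) of the weighted sample table $\hat{X} = \{\hat{X}_{aj}\}$ to the row and column totals $X_{a+}$ and $X_{+j}$, respectively. Deville, Särndal, and Sautory (1993) provide an approximate variance of the raking ratio estimator, say, $\hat{Y} = \{\hat{Y}_{aj}\}$, where

$$\hat{Y}_{aj} = \sum_{i \in s} w_i y_i(a, j).$$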
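The two weighting schemes can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy sample of six units, the proxy table `X`, and the helper names `ipf` and `weighted_table` are hypothetical and do not come from the text. The sketch assumes that every sample cell is non-empty, which is exactly the condition that fails in the problematic cases described above.

```python
import numpy as np

def ipf(seed, row_totals, col_totals, tol=1e-10, max_iter=1000):
    """Iterative proportional fitting of a strictly positive seed table
    to the given row and column totals (hypothetical helper)."""
    fitted = np.asarray(seed, dtype=float).copy()
    row_totals = np.asarray(row_totals, dtype=float)
    col_totals = np.asarray(col_totals, dtype=float)
    for _ in range(max_iter):
        fitted *= (row_totals / fitted.sum(axis=1))[:, None]  # match rows
        fitted *= (col_totals / fitted.sum(axis=0))[None, :]  # match columns
        if np.allclose(fitted.sum(axis=1), row_totals, rtol=0, atol=tol):
            break
    return fitted

def weighted_table(weights, cells, shape):
    """Sum unit weights into the (a, j) cells given by `cells`."""
    table = np.zeros(shape)
    np.add.at(table, (cells[:, 0], cells[:, 1]), weights)
    return table

# Hypothetical toy data: 6 sample units, A = 2 rows, J = 2 columns.
d = np.full(6, 10.0)                                   # sampling weights d_i
x_cell = np.array([[0, 0], [0, 0], [0, 1], [1, 0], [1, 1], [1, 1]])  # proxy
y_cell = np.array([[0, 0], [0, 1], [0, 1], [1, 0], [1, 1], [1, 0]])  # target
X = np.array([[25.0, 15.0], [12.0, 18.0]])             # register proxy table

rows, cols = x_cell[:, 0], x_cell[:, 1]
X_hat = weighted_table(d, x_cell, X.shape)             # weighted sample table

# Post-stratification: scale d_i by X_aj / X_hat_aj of the unit's proxy cell.
w_post = d * X[rows, cols] / X_hat[rows, cols]

# Raking: fit X_hat to the margins of X only, then scale d_i accordingly.
X_tilde = ipf(X_hat, X.sum(axis=1), X.sum(axis=0))
w_rake = d * X_tilde[rows, cols] / X_hat[rows, cols]

# Weighted estimates of the target table under each scheme.
Y_post = weighted_table(w_post, y_cell, X.shape)
Y_rake = weighted_table(w_rake, y_cell, X.shape)
```

In this sketch the poststratification weights reproduce every cell of `X` when applied to the proxy classification, whereas the raking weights reproduce only its row and column totals.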

A drawback of the weighting approach above is that no estimate of $Y_{aj}$ is available when the sample cell $(a, j)$ is empty, and the estimate has a large sampling variance when the sample cell $(a, j)$ is small. This is typically the situation in small area estimation, where, e.g. $a$ is the index of a large number of local areas. Zhang and Chambers (2004) and Luna-Hernández (2016) develop a prediction modeling approach.

The within-area composition (Ya1, Ya2, …, YaJ) is related to the corresponding proxy composition (Xa1, Xa2, …, XaJ) by means of a structural equation


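$$\alpha_a^Y = \beta \alpha_a^X$$

where $\alpha_a^Y = (\alpha_{a1}^Y, \dots, \alpha_{aJ}^Y)^T$ is the area-vector of interactions on the log scale, i.e.

$$\alpha_{aj}^Y = \log \theta_{aj}^Y - \frac{1}{J} \sum_{l=1}^{J} \log \theta_{al}^Y - \frac{1}{A} \sum_{b=1}^{A} \log \theta_{bj}^Y + \frac{1}{AJ} \sum_{b=1}^{A} \sum_{l=1}^{J} \log \theta_{bl}^Y,$$

where $\theta_{aj}^Y = Y_{aj} / Y_{a+}$, and similarly for $\alpha_a^X$, and $\beta$ is a matrix of unknown coefficients that sum to zero by row and by column.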
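As a rough numerical illustration of the interactions on the log scale, the following Python sketch computes them for a proxy table. It rests on assumptions: the interactions are taken to be the double-centred logarithms of the within-area proportions, the table `X` is invented, and the coefficient matrix `beta` is a hypothetical choice (a scalar `b` times the centring matrix) picked only because its rows and columns sum to zero.

```python
import numpy as np

def log_interactions(table):
    """Double-centred log within-area proportions (assumed definition)."""
    table = np.asarray(table, dtype=float)
    theta = table / table.sum(axis=1, keepdims=True)   # within-area shares
    mu = np.log(theta)
    return (mu
            - mu.mean(axis=1, keepdims=True)           # remove row effects
            - mu.mean(axis=0, keepdims=True)           # remove column effects
            + mu.mean())                               # add back overall mean

# Hypothetical proxy table with A = 3 areas and J = 3 categories.
X = np.array([[50.0, 30.0, 20.0],
              [10.0, 60.0, 30.0],
              [25.0, 25.0, 50.0]])
alpha_X = log_interactions(X)       # row a holds the area-vector for area a

# Hypothetical coefficient matrix: b times the J x J centring matrix.
J = X.shape[1]
b = 0.8
beta = b * (np.eye(J) - np.ones((J, J)) / J)

# Apply the structural equation area by area: alpha_Y[a] = beta @ alpha_X[a].
alpha_Y = alpha_X @ beta.T
```

Because each area-vector of interactions sums to zero, this particular `beta` simply rescales the proxy interactions by `b`; a general coefficient matrix would also mix the categories.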

The structural equation can be used to specify a generalized linear model of the observed sample cell counts, or their weighted totals, which allows one to estimate β and Y. It is further possible to develop the mixed-effects modeling approach that is popular in small area estimation, by introducing the mixed structural equation


$$\alpha_a^Y = \beta \alpha_a^X + u_a$$

with the same quantities and the additional random effects $u_a = (u_{a1}, \dots, u_{aJ})^T$, where $\sum_{j=1}^{J} u_{aj} = 0$. The associated uncertainty is then evaluated under the postulated model. The prediction modeling approach can thus improve on the survey weighting approach in the presence of empty and very small sample cells.

For an example under the asymmetric-unlinked setting, consider the Norwegian register-based household statistics. When the household register was first introduced for the year 2005, about 6% of persons in the Central Population Register still had a missing dwelling identification. As the missing rate differed across local areas as well as household types, direct tabulation did not yield acceptable results compared to the Census 2001 outputs. The IPF was applied to the sub-population of households with a dwelling identification to yield a weight for every such household. The method falls under the benchmarked adjustment approach. However, direct evaluation of the associated uncertainty is not straightforward. Zhang (2009b) extends the prediction modeling approach above to accommodate the informative missing data. By comparison with the model-based predictions, one is able to assess the benchmarked adjustment results indirectly.

Using the IPF for small area estimation is known as structure preserving estimation (SPREE, Purcell and Kish 1980). The model underpinning SPREE is a special case of the prediction models mentioned above, i.e. by setting β = 1. It does not require linkage between the proxy data $X$ and the data that yield the benchmarks $Y_{a+}$ and $Y_{+j}$. While this is convenient for deriving the estimates, a difficulty arises when it comes to uncertainty evaluation directly under the SPREE model. See also Dostál et al. (2016) for a benchmarked adjustment method based on the chi-squared measure in this respect.
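The structure-preserving property can be checked numerically with a short Python sketch. The proxy table and the benchmark margins below are invented for illustration, and the helpers `ipf` and `log_interactions` are hypothetical names; the interactions are assumed to be double-centred log within-area proportions. The sketch fits the proxy table to the benchmark margins by the IPF and compares the interactions before and after.

```python
import numpy as np

def ipf(seed, row_totals, col_totals, tol=1e-10, max_iter=1000):
    """Iterative proportional fitting of a strictly positive seed table."""
    fitted = np.asarray(seed, dtype=float).copy()
    row_totals = np.asarray(row_totals, dtype=float)
    col_totals = np.asarray(col_totals, dtype=float)
    for _ in range(max_iter):
        fitted *= (row_totals / fitted.sum(axis=1))[:, None]  # match rows
        fitted *= (col_totals / fitted.sum(axis=0))[None, :]  # match columns
        if np.allclose(fitted.sum(axis=1), row_totals, rtol=0, atol=tol):
            break
    return fitted

def log_interactions(table):
    """Double-centred log within-area proportions (assumed definition)."""
    table = np.asarray(table, dtype=float)
    mu = np.log(table / table.sum(axis=1, keepdims=True))
    return (mu - mu.mean(axis=1, keepdims=True)
               - mu.mean(axis=0, keepdims=True) + mu.mean())

# Hypothetical proxy table X and benchmark margins for the target table Y.
X = np.array([[50.0, 30.0, 20.0],
              [10.0, 60.0, 30.0],
              [25.0, 25.0, 50.0]])
Y_row = np.array([120.0, 80.0, 100.0])   # benchmarks for the area totals
Y_col = np.array([90.0, 130.0, 80.0])    # benchmarks for the category totals

# SPREE-type estimate: the proxy table adjusted to the benchmark margins.
Y_spree = ipf(X, Y_row, Y_col)

# The IPF only rescales rows and columns, so the interactions are unchanged.
same_structure = np.allclose(log_interactions(Y_spree), log_interactions(X))
```

Here `same_structure` evaluates to `True`: the adjusted table meets both sets of benchmarks while keeping the interactions of the proxy table, which is the sense in which the proxy structure is preserved.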

Finally, let the table of interest $Y$ be ethnicity by party votes. Suppose one can obtain $Y_{a+}$ and $Y_{+j}$ in an election, but there are no joint observations of the cells $(a, j)$. This can be framed as a problem of statistical matching. Given a proxy table $X$, say, ethnicity by party membership, the IPF can be applied to obtain an estimated table $\hat{Y}$. Zhang (2015a) develops an uncertainty measure that combines the identification uncertainty and the sampling uncertainty in this context, which enables one to quantify the relative efficiency of the proxy data $X$, compared to statistical matching without $X$. The application of the IPF here is an example of the benchmarked adjustment approach.
