Administrative Records for Survey Methodology
1.1.1 A Multisource Data Perspective
Under the presumption that the target units and measures are collected in survey data, register data traditionally have two principal uses: to provide the frames for sampling and estimation, and to provide the auxiliary data for reducing both sampling and non-sampling survey errors (Särndal, Swensson, and Wretman 1992). The term auxiliary data conveys that register data play a helpful supporting role but are ultimately not indispensable. A broader view is necessary in order to cover the full range of approaches for combining register and survey data, where the two types of data are on an equal footing.
Let us first clarify what we mean by register and survey data. We shall simply refer to statistical data arising from administrative sources as register data. On the one hand, this extends the narrow interpretation of the term register as an authoritative list of objects; on the other hand, it implies that generally some processing may be required in order to transform “raw” administrative data into a state that permits them to be utilized for statistical purposes. Next, we shall simply refer to statistical data collected from samples and censuses as survey data. Our usage of the term survey here is conventional and more limiting, e.g. compared to that of Statistics Canada (2015), where it is used generically to cover any activity that collects or acquires statistical data, including administrative records and estimated data. We do not wish to contest the general interpretation, but we adopt the convention to facilitate the discussion that follows. A central distinction between what we call register and survey data is that the survey data are purposely designed and collected for statistical uses, whilst the register data are originally generated and recorded for purposes other than making statistics. This is also the reason why we refer to both survey sampling and census data as survey data, rather than taking on an even narrower interpretation which equates survey data with survey sampling data.
Brackstone (1987) classifies the uses of administrative records, i.e. register data, into (i) direct tabulation, (ii) indirect estimation, (iii) survey frames, and (iv) survey evaluation. To appreciate what we shall refer to as the multisource data perspective and by way of introduction, let us consider the following question: Are the four uses (i)–(iv) of register data equally applicable to survey data?
Direct tabulation refers to the situation where statistics are produced based on the relevant register data without any explicit use of survey data. The scope of such register-based statistics has increased greatly in recent decades. A prominent example is the latest round of register-based census-like statistics in a number of European countries (UNECE 2014). See Wallgren and Wallgren (2014) for many other examples. As Zhang and Giusti (2016) point out and illustrate, sometimes relevant survey data are available and used implicitly to define the processing rules or to assess the accuracy of the register data, but do not enter the statistics directly. Clearly, in this sense, one can equally speak of direct tabulation based on survey data, such as the use of the Horvitz–Thompson estimator in survey sampling, or direct census enumeration of the population size.
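The parallel between the two kinds of direct tabulation can be made concrete with a minimal sketch. It assumes a register that records the variable of interest for every unit, and a probability sample with known inclusion probabilities; the function name and all numbers are hypothetical and chosen only for illustration.

```python
def horvitz_thompson_total(values, inclusion_probs):
    """Estimate a population total as the sum of y_i / pi_i over sampled units."""
    if len(values) != len(inclusion_probs):
        raise ValueError("values and inclusion_probs must have equal length")
    if any(not 0 < p <= 1 for p in inclusion_probs):
        raise ValueError("inclusion probabilities must lie in (0, 1]")
    return sum(y / p for y, p in zip(values, inclusion_probs))


# Direct tabulation from register data: sum over every recorded unit.
# (Hypothetical register with four units.)
register_values = [10.0, 20.0, 30.0, 40.0]
register_total = sum(register_values)

# Direct tabulation from survey data: weight each sampled unit by 1 / pi_i.
# (Hypothetical sample of three units, each included with probability 0.25.)
sample_values = [10.0, 20.0, 30.0]
sample_probs = [0.25, 0.25, 0.25]
survey_total = horvitz_thompson_total(sample_values, sample_probs)
```

In both cases a single source is tabulated on its own; the only difference is that the survey-based total carries design weights, whereas the register-based total treats the register as a complete enumeration.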
Brackstone (1987) includes, under indirect estimation, the cases where register data “comprise one of the inputs into an estimation process.” In the split-population or split-data approach (UNECE 2011), register and survey data supplement each other literally. A practical example of the split-population approach is the Unified Enterprise Survey at Statistics Canada, where register data are used for over half of the smaller enterprises with simple structures, and survey data are collected from the remaining units with more complex structures. Under the split-data approach, register data would provide some but not all of the required variables for the whole population, which otherwise would have to be collected in survey questionnaires. For example, at Statistics Norway, it is possible to derive income and education level data from statistical registers, so that these variables are not collected in the European Union Statistics on Income and Living Conditions (EU-SILC) and other social surveys. Imputation for survey nonresponse using register data can be viewed as a hybrid approach, where the units and variables to be substituted are determined post hoc after survey data collection. Indirect estimation beyond the split-population/split-data approach will be discussed in detail later on, after we have explained the concept of proxy variables in Section 1.1.2.
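The two approaches described above can be sketched as follows. This is an illustration under simplifying assumptions, not a description of how any statistical office implements them: the split-population total assumes the register part is complete and error-free and that the surveyed part is a probability sample with known inclusion probabilities, and the split-data merge assumes units can be linked exactly by a shared identifier. All names (`split_population_total`, `attach_register_variables`, `unit_id`) and values are hypothetical.

```python
def split_population_total(register_values, survey_values, survey_probs):
    """Split-population: register total for one part of the population
    plus a design-weighted survey estimate for the remaining part."""
    register_part = sum(register_values)
    survey_part = sum(y / p for y, p in zip(survey_values, survey_probs))
    return register_part + survey_part


def attach_register_variables(survey_records, register_by_id):
    """Split-data: add register-derived variables to each survey record,
    linking on 'unit_id'. Survey values win if a variable appears in both."""
    merged = []
    for record in survey_records:
        register_vars = register_by_id.get(record["unit_id"], {})
        merged.append({**register_vars, **record})
    return merged


# Hypothetical split-population example: three simple units taken from the
# register, two complex units sampled with inclusion probability 0.5.
total = split_population_total([1.0, 2.0, 3.0], [100.0, 200.0], [0.5, 0.5])

# Hypothetical split-data example: income comes from the register, so the
# questionnaire only needs to collect the remaining variable.
register_by_id = {"a": {"income": 50000}, "b": {"income": 42000}}
survey_records = [{"unit_id": "a", "housing_cost": 900},
                  {"unit_id": "b", "housing_cost": 750}]
combined = attach_register_variables(survey_records, register_by_id)
```

The split-population function partitions the units between the sources, while the split-data function partitions the variables; imputing register values for nonrespondents would mix the two, since which units and variables are substituted is only known after data collection.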
Regarding the use of register data to create, supplement, or update frames for sample surveys and censuses, it takes only a moment of reflection to realize that exactly the same can be said of survey data. For instance, a census can be used to create, supplement, or update frames for postcensal sample surveys. The yearly Structural Business Survey and specific quality assurance surveys are used to verify or update the Business Register. As a noteworthy special case, one may include here population size estimation based on Census and Census Coverage Surveys (Nirel and Glickman 2009).
Survey evaluation covers the use of register data for checking, validating, or assessing survey data, whether they are collected in a sample or census. This may be done at both individual and aggregate levels. Conversely, using survey estimates for external validation of register-based statistics has been a natural approach from early on (Myrskyla 1991). A quality survey in a census year is another common approach in Scandinavia (Axelson et al. 2020), which is usually not directed at the population coverage errors of the Central Population Register in those countries, but at the various classification and measurement errors in the register data. In addition, as mentioned above, survey data are commonly used implicitly to define the processing rules or to assess the accuracy of the register data.
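The distinction between evaluation at the individual and at the aggregate level can be illustrated with a small sketch. It assumes that register and survey records for the same units have already been linked by a common identifier and that both sources record the same categorical variable; the function names and the data are hypothetical.

```python
from collections import Counter


def individual_agreement(register_by_id, survey_by_id):
    """Share of linked units whose register and survey values coincide."""
    linked = register_by_id.keys() & survey_by_id.keys()
    if not linked:
        raise ValueError("no linked units to compare")
    matches = sum(register_by_id[u] == survey_by_id[u] for u in linked)
    return matches / len(linked)


def aggregate_differences(register_by_id, survey_by_id):
    """Register count minus survey count per category, over linked units."""
    linked = register_by_id.keys() & survey_by_id.keys()
    reg_counts = Counter(register_by_id[u] for u in linked)
    svy_counts = Counter(survey_by_id[u] for u in linked)
    categories = reg_counts.keys() | svy_counts.keys()
    return {c: reg_counts[c] - svy_counts[c] for c in categories}


# Hypothetical employment status for four linked persons.
register_status = {1: "employed", 2: "unemployed", 3: "employed", 4: "unemployed"}
survey_status = {1: "employed", 2: "employed", 3: "unemployed", 4: "unemployed"}

agreement = individual_agreement(register_status, survey_status)
differences = aggregate_differences(register_status, survey_status)
```

In this constructed example half of the units disagree at the individual level, yet the category counts are identical, which shows why checks at the two levels can lead to different conclusions about the same data. The comparison is symmetric: it does not say which source is in error, so it serves equally for evaluating survey data against a register or the reverse.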
In summary, one can speak of a multisource data perspective for combining register and survey data on at least two different levels. In the wider sense, it is possible to characterize equally the uses of both register and survey data into four broad categories: (i) single-source estimation, (ii) multisource estimation, (iii) frames, and (iv) evaluation. Both can be treated as statistical data and used as such. In a narrower sense, one can greatly extend the scope of “indirect estimation” under the multisource data perspective, where register and survey data each may comprise part of the inputs on an equal footing provided the proxy variables are present. Indirect estimation will be discussed in more detail in Section 1.3. But first we shall explain below what we mean by proxy variables.