Administrative Records for Survey Methodology

1.1.2 Concept of Proxy Variable


According to Upton and Cook (2008), a proxy variable is “a measured variable that is used in place of a variable that cannot be measured.” We make two observations. Firstly, one may distinguish between the case where the ideal measure is unobservable in principle and the case where it is merely unavailable under the circumstances. For example, per-capita gross domestic product (GDP) is sometimes used as a proxy measure of living standard, and it seems reasonable to acknowledge that the latter is unobservable in principle. As a contrasting example, country of birth can generate a proxy for mother tongue, by referring to the official language of that country. In this case the ideal measure is unavailable only due to circumstances. Secondly, in order for a proxy to be used in place of the ideal measure, the two should have the same support. In the previous example, it is not the birth country that is a proxy for the true mother tongue, but the official language of that country; the common support of the proxy and the ideal measure is the set of all existing languages.
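The birth-country example can be sketched in a few lines of Python. This is only an illustration: the lookup table `OFFICIAL_LANGUAGE` and the function `mother_tongue_proxy` are hypothetical names introduced here, and the three countries were chosen because each has a single dominant official language. A real application would need a maintained reference table and a rule for countries with several official languages.

```python
# Illustrative lookup only (an assumption, not data from the book).
OFFICIAL_LANGUAGE = {
    "Norway": "Norwegian",
    "France": "French",
    "Japan": "Japanese",
}


def mother_tongue_proxy(birth_country):
    """Return the official language of the birth country, or None if unknown."""
    return OFFICIAL_LANGUAGE.get(birth_country)


# The register variable (birth country) takes values in the set of countries,
# so it cannot itself stand in for mother tongue.
birth_countries = ["Norway", "Japan", "France", "Norway"]

# The derived variable takes values in the set of languages,
# which is the support of the ideal measure.
proxy = [mother_tongue_proxy(c) for c in birth_countries]
print(proxy)
```

The point of the sketch is the change of support: the derived list contains languages, not countries, and it is this derived variable that can be used in place of mother tongue.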

Zhang (2015a) defines a proxy variable as one that is similar in definition to the target variable and has the same support. It follows that one can regard two variables as proxies for each other, without having to specify one of them as the target (or ideal) measure. Variables such as age, sex, education, and income can be useful auxiliary variables, but they are not proxy variables for the binary International Labour Organization (ILO) unemployment status. In particular, sex is not a proxy despite being binary and thus having the same support as the unemployment status, because the two do not have similar definitions. The binary register-based job-seeker status is a proxy, and the ILO unemployment status does not have to be the ideal measure for every conceivable purpose. However, the job-seeker status is not a proxy variable for the activity status defined as (employed, unemployed, inactive), because the two have different supports.
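The two conditions in this definition can be made explicit in a small sketch. It rests on a strong simplification: whether two variables are "similar in definition" is a subject-matter judgment, and here it is reduced to an analyst-assigned `concept` label. The class `Variable`, the function `is_proxy`, and the labels are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    name: str
    concept: str        # analyst-assigned label; stands in for "similar in definition"
    support: frozenset  # the set of values the variable can take


def is_proxy(a: Variable, b: Variable) -> bool:
    """Symmetric check: same concept label and same support."""
    return a.concept == b.concept and a.support == b.support


BINARY = frozenset({0, 1})

ilo_unemployed = Variable("ILO unemployment status", "labour market status", BINARY)
job_seeker = Variable("register job-seeker status", "labour market status", BINARY)
sex = Variable("sex", "sex", BINARY)
activity = Variable(
    "activity status",
    "labour market status",
    frozenset({"employed", "unemployed", "inactive"}),
)

print(is_proxy(job_seeker, ilo_unemployed))  # True
print(is_proxy(ilo_unemployed, job_seeker))  # True: neither has to be the target
print(is_proxy(sex, ilo_unemployed))         # False: same support, different definition
print(is_proxy(job_seeker, activity))        # False: different support
```

Because the check is symmetric, it reflects the remark that two variables can be proxies for each other without either being designated the ideal measure.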

Proxy variables can arise from survey data. For example, indirect interviews, in which household members respond on behalf of absentees, yield proxy measures (Thomsen and Villund 2011). Data collected in different modes can be proxies for each other. A variable collected in a census can be a proxy for the same variable, or a similarly defined one, in the postcensal years. Synthetic datasets released for research can contain proxy variables for the target measures from which the synthetic values are modeled and generated. Register data are perhaps the richest source of proxy variables. It is often possible to have both complete coverage and concurrency, or nearly so. Common examples of register proxy variables include economic activity status, education level, income, and family and housing conditions in social statistics; and value-added tax (VAT) based turnover, exports and imports, house prices, animal holdings, fishing and hunting figures, arable soils, and vegetation in economic and environmental statistics.

Finally, it is useful to reflect on the relationship between a proxy variable and a variable affected by measurement errors, since one can always envisage a proxy variable as an attempt to measure the target variable, whether the effort is real or imaginary. Measurement errors are commonly decomposed into two components: random errors and systematic errors. By definition, random errors occur by chance and have zero expectation. Insofar as one considers random measurement errors to be unavoidable and omnipresent, any measured variable can only be a proxy for the ideal measure. In contrast, many proxy variables will remain proxies even when it is acceptable to disregard the potential random errors for practical purposes. Systematic errors due to discrepancies in definition, instrument, time point, etc. are then the cause of the imperfect measure, including when the ideal measure is unobservable in principle. Notice that this interpretation of systematic errors differs from the usage of the term in statistical data editing (de Waal, Pannekoek, and Scholtus 2011), where a systematic error is one for which a plausible cause can be detected, and where knowledge of the underlying error mechanism then enables a satisfactory treatment in an unambiguous, deterministic manner. Examples of such systematic errors are typographical errors, measurement unit errors, and sign errors. In summary, regardless of whether proxy variables arise from measurement errors, we are concerned here with proxy variables that cannot be corrected by data editing methods.
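For a numeric variable, the decomposition can be written out under an additive error model. The additive form and the symbols are assumptions introduced here for illustration and are not taken from the text: $y_i$ denotes the target value for unit $i$, $x_i$ the measured value, $b_i$ the systematic component, and $\varepsilon_i$ the random error.

```latex
x_i = y_i + b_i + \varepsilon_i ,
\qquad
\mathrm{E}(\varepsilon_i \mid y_i) = 0 ,
\qquad\text{so that}\qquad
\mathrm{E}(x_i \mid y_i) = y_i + \mathrm{E}(b_i \mid y_i) .
```

Under this sketch, $x_i$ still differs from $y_i$ when $\varepsilon_i$ is negligible, unless $b_i = 0$, which corresponds to a proxy that remains a proxy after random errors are disregarded. For categorical variables such as unemployment status an additive form does not apply, and the analogous description would be in terms of misclassification.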

