Sampling and Estimation from Finite Populations - Yves Tille
1.7 Modeling the Population
The absence of tangible general results concerning certain classes of estimators led to the development of population modeling by means of a so-called “superpopulation” model. In this model‐based approach, it is assumed that the values taken by the variable of interest on the observation units of the population are realizations of random variables. The superpopulation model defines a class of distributions to which these random variables are supposed to belong. The sample is then derived from a double random experiment: a realization of the model that generates the population, followed by the selection of the sample. The idea of modeling the population was present in Brewer (1963a), but it was developed by Royall (1970b, 1971, 1976b) (see also Valliant et al., 2000; Chambers & Clark, 2012).
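The double random experiment can be sketched in a few lines of Python. This is only an illustration under assumed settings: the working model y_k = beta * x_k + eps_k with normal errors, the auxiliary values `x`, the parameter values, and the function names `generate_population` and `select_sample` are all hypothetical choices, not taken from the text.

```python
import random


def generate_population(x, beta, sigma, rng):
    """Stage 1: one realization of the assumed model y_k = beta * x_k + eps_k."""
    return [beta * xk + rng.gauss(0.0, sigma) for xk in x]


def select_sample(population_size, n, rng):
    """Stage 2: simple random sampling without replacement of n unit labels."""
    return sorted(rng.sample(range(population_size), n))


rng = random.Random(42)                # fixed seed so the run is reproducible
x = [float(k) for k in range(1, 51)]   # hypothetical auxiliary variable, N = 50

# First random experiment: the model generates the population values.
y = generate_population(x, beta=2.0, sigma=1.0, rng=rng)

# Second random experiment: the sampling design selects the units.
sample = select_sample(len(x), n=10, rng=rng)

# Only the values of the selected units are observed.
observed = [(k, y[k]) for k in sample]
```

In a design-based analysis, `y` is treated as fixed and only `sample` is random; in the model-based analysis, `sample` is treated as fixed once selected and the randomness of `y` carries the inference.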
Drawing on the fact that the random sample is an “ancillary” statistic, Royall proposed to work conditionally on it. In other words, he considered that once the sample is selected, the choice of units is no longer random. This new approach gave rise to a distinct school of research. The model must express a known and previously accepted relationship. According to Royall, if the superpopulation model “adequately” describes the population, the inference can be conducted with respect to the model alone, conditionally on the sample selection. The use of the model then allows us to determine an optimal estimator.
One can object that a model is always an approximate representation of the population. However, the model is not built to be tested against the data but to “assist” the estimation. If the model is correct, then Royall's method will provide a powerful estimator. If the model is false, the bias may be so large that the confidence intervals built for the parameter are not valid. This is essentially the critique stated by Hansen et al. (1983).
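The sensitivity to a false model can be illustrated with a small simulation. The sketch below assumes, purely for illustration, the working model y_k = beta * x_k + eps_k with error variance proportional to x_k, a classic case in which the ratio estimator of the total is the best linear unbiased predictor. The sample is held fixed (the 20 units with the largest x), and only the model is re-drawn. The intercept `alpha`, the population, and all function names are hypothetical.

```python
import random


def ratio_estimator(y_sample, x_sample, x_total):
    """Ratio estimator of the population total of y."""
    return sum(y_sample) / sum(x_sample) * x_total


def model_bias(x, sample, alpha, beta, sigma, n_rep, seed):
    """Average of (estimate - true total) over model realizations, sample fixed."""
    rng = random.Random(seed)
    x_total = sum(x)
    x_s = [x[k] for k in sample]
    total_error = 0.0
    for _ in range(n_rep):
        # True data-generating process; alpha != 0 violates the working model.
        y = [alpha + beta * xk + rng.gauss(0.0, sigma * xk ** 0.5) for xk in x]
        estimate = ratio_estimator([y[k] for k in sample], x_s, x_total)
        total_error += estimate - sum(y)
    return total_error / n_rep


x = [float(k) for k in range(1, 101)]   # hypothetical population, N = 100
sample = list(range(80, 100))           # fixed sample: the 20 largest x values

# Working model holds (no intercept): the conditional bias is close to zero.
bias_ok = model_bias(x, sample, alpha=0.0, beta=2.0, sigma=1.0, n_rep=2000, seed=1)

# Working model is false (intercept of 50): the conditional bias is large.
bias_bad = model_bias(x, sample, alpha=50.0, beta=2.0, sigma=1.0, n_rep=2000, seed=1)

# Model expectation of the error given the sample: alpha * N * (xbar_U / xbar_s - 1).
xbar_u = sum(x) / len(x)
xbar_s = sum(x[k] for k in sample) / len(sample)
theoretical_bias = 50.0 * len(x) * (xbar_u / xbar_s - 1.0)
```

Under these assumptions, `bias_ok` stays near zero while `bias_bad` approaches `theoretical_bias`, which does not shrink as the number of model realizations grows. A confidence interval centered on the ratio estimator would then be centered in the wrong place, whatever its width.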
The debate is interesting because the arguments are not in the domain of mathematical statistics. Mathematically, the design-based and model-based theories are both obviously correct. The argument relates to how adequately the formalization matches reality and is therefore necessarily external to the mathematical aspect of statistical development. In addition, the modeling proposed by Royall is of a particular kind: above all, it makes it possible to break a theoretical impasse and therefore to provide optimal estimators. However, the relevance of modeling is questionable and will be considered in a completely different way depending on whether one takes the arguments of sociology, demography or econometrics, three disciplines that are intimately related to the methodology of statistics. A comment from Dalenius (see Hansen et al., 1983, p. 800) highlights this problem:
That is not to say that the arguments for or against parametric inference in the usual statistical theory are not of interest in the context of the theory of survey sampling. In our assessment of these arguments, however, we must pay attention to the relevant specifics of the applications.
According to Dalenius, it is therefore in the discipline in which the theory of survey sampling is applied that useful conclusions should be drawn concerning the adequacy of a superpopulation model.
The statistical theory of surveys is mainly applied in official statistics institutes. These institutes do not develop a science but carry out a mission assigned by their states. The heads of national statistical institutes put forward a fairly standard argument: the use of a superpopulation model in an estimation procedure is a breach of a principle of impartiality which is part of the ethics of statisticians. This argument comes directly from the current definition of official statistics. The principle of impartiality is part of this definition, just as the principle of accuracy was part of it in the 19th century. While modeling a population is easily conceived as a research tool or as a predictive tool, it remains fundamentally questionable in the field of official statistics.