Sampling and Estimation from Finite Populations - Yves Tille
1.6 The Statistical Theory of Survey Sampling
The establishment of a new scientific consensus in 1925 and the identification of major lines of research in the following years led to a very rapid development of survey theory. During the Second World War, research continued in the United States. Important contributions are due to Deming & Stephan (1940), Stephan (1942, 1945, 1948) and Deming (1948, 1950, 1960), especially on the question of adjusting statistical tables to census data. Cornfield (1944) proposed using indicator variables for the presence of units in the sample. Cochran (1939, 1942, 1946, 1961) and Hansen & Hurwitz (1943, 1949) showed the value of unequal probability sampling with replacement. Madow (1949) proposed unequal probability systematic sampling (see also Hansen et al., 1953a,b). It was quickly established that unequal probability sampling with fixed size and without replacement is a complex problem. Narain (1951), Horvitz & Thompson (1952), Sen (1953), and Yates & Grundy (1953) presented several methods with unequal probabilities in articles that are certainly among the most cited in this field. Although devoted to the examination of several designs with unequal probabilities, these texts are cited mainly for the general estimator of the total (the expansion estimator), which they also propose and discuss. The expansion estimator is, in fact, a general unbiased estimator applicable to any sampling design without replacement. However, the proposed variance estimator has a flaw. Yates & Grundy (1953) showed that the variance estimator proposed by Horvitz and Thompson can be negative. They proposed a valid variant for samples of fixed size and gave sufficient conditions for it to be positive. As early as the 1950s, the problem of sampling with unequal probabilities attracted considerable interest, which was reflected in the publication of more than 200 articles. Before turning to rank statistics, Hájek (1981) discussed the problem in detail. A synthesis by Brewer & Hanif (1983) was devoted entirely to this subject, which seems far from exhausted, as evidenced by regular publications.
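The behaviour of the expansion estimator and of the two variance estimators mentioned above can be checked by brute force on a very small example. The sketch below is only an illustration under stated assumptions: the population values `y` and the sampling design `design` (a fixed-size design on four units, given as a table of sample probabilities) are invented for the purpose, and the function names are hypothetical. It enumerates every sample, so the expectations are exact rather than simulated.

```python
# Toy population and design: both are assumptions made for illustration only.
y = [2.0, 5.0, 6.0, 9.0]

# Fixed-size (n = 2) design without replacement: sample -> probability.
design = {
    (0, 1): 0.1, (0, 2): 0.1, (0, 3): 0.1,
    (1, 2): 0.2, (1, 3): 0.2, (2, 3): 0.3,
}
N = len(y)


def incl_prob(*units):
    """Probability that all the given units are in the sample."""
    return sum(p for s, p in design.items() if all(u in s for u in units))


# First-order and joint inclusion probabilities (pi2[k, k] equals pi[k]).
pi = [incl_prob(k) for k in range(N)]
pi2 = {(k, l): incl_prob(k, l) for k in range(N) for l in range(N)}


def ht_total(s):
    """Expansion (Horvitz-Thompson) estimator of the total."""
    return sum(y[k] / pi[k] for k in s)


def ht_variance(s):
    """Variance estimator in the Horvitz-Thompson form."""
    return sum(
        (pi2[k, l] - pi[k] * pi[l]) / pi2[k, l] * (y[k] / pi[k]) * (y[l] / pi[l])
        for k in s for l in s
    )


def syg_variance(s):
    """Variance estimator in the Sen-Yates-Grundy form (fixed sample size)."""
    return 0.5 * sum(
        (pi[k] * pi[l] - pi2[k, l]) / pi2[k, l] * (y[k] / pi[k] - y[l] / pi[l]) ** 2
        for k in s for l in s if k != l
    )


def expectation(f):
    """Exact expectation of f(sample) under the design."""
    return sum(p * f(s) for s, p in design.items())


total = sum(y)
true_var = expectation(lambda s: (ht_total(s) - total) ** 2)

print("E[HT total]:", expectation(ht_total), "true total:", total)
print("true variance:", true_var)
for s in design:
    print(s, "HT var est:", ht_variance(s), "SYG var est:", syg_variance(s))
```

In this toy design every unit and every pair of units has a positive inclusion probability, so the expansion estimator and both variance estimators are unbiased. The per-sample values nevertheless differ: the first variance estimator takes negative values on several samples, while the Sen-Yates-Grundy form stays nonnegative here because every pair satisfies pi_k pi_l >= pi_kl.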
The theory of survey sampling, which makes abundant use of probability calculus, attracted the attention of university statisticians, who very quickly reviewed all aspects of the theory that are of mathematical interest. A coherent mathematical theory of survey sampling was constructed. The statisticians very quickly came up against a difficult problem: surveys with finite populations. The proposed model postulated the identifiability of the units. This component of the model makes the application of reduction by sufficiency and of the maximum likelihood method irrelevant. Godambe (1955) showed that there is no optimal linear estimator. This result is one of many pieces of evidence showing the impossibility of defining optimal estimation procedures for general sampling designs in finite populations. Next, Basu (1969) and Basu & Ghosh (1967) demonstrated that reduction by sufficiency is limited to the suppression of the information concerning the multiplicity of the units, and that this method is therefore not operational. Several approaches were examined, including one from decision theory. New properties, such as hyperadmissibility (see Hanurav, 1968), were defined for estimators applicable in finite populations.
A purely theoretical school of survey sampling developed rapidly. This theory attracted the attention of researchers specializing in mathematical statistics, such as Debabrata Basu, who was interested in the specifics of the theory of survey sampling. However, many of the proposed results were theorems on the nonexistence of optimal solutions. Research on the foundations of inference in survey theory became so important that it was the subject of a symposium in Waterloo, Canada, in 1971. At this symposium, the address by Calyampudi Radhakrishna Rao (1971, p. 178) began with a very pessimistic statement:
I may mention that in statistical methodology, the existence of uniformly optimum procedures (such as UMV unbiased estimator, uniformly most powerful critical region for testing a hypothesis) is a rare exception rather than a rule. That is the reason why ad hoc criteria are introduced to restrict the class of procedures in which an optimum may be sought. It is not surprising that the same situation is obtained in sampling for a finite situation. However, it presents some further complications which do not seem to exist for sampling from infinite populations.
This introduction announced the direction of current research.
In survey sampling theory, there is no theorem showing the optimality of an estimation procedure for general sampling designs. Optimal estimation methods can only be found by restricting them to particular classes of procedures. Even if one limits oneself to a particular class of estimators (such as the class of linear or unbiased estimators), it is not possible to obtain interesting results. One possible way out of this impasse is to change the formalization of the problem, for example by assuming that the population itself is random.