Sampling and Estimation from Finite Populations — Yves Tillé
2.8 Sampling with Replacement
Sampling designs with replacement should not be used except in very special cases, such as indirect sampling, described in Section 8.4, page 187 (Deville & Lavallée, 2006; Lavallée, 2007), adaptive sampling, described in Section 8.4.2, page 188 (Thompson, 1988), or capture–recapture techniques, also called capture–mark sampling, described in Section 8.5, page 191 (Pollock, 1981; Amstrup et al., 2005).
In a sampling design with replacement, the same unit can be selected several times in the sample. The random sample can then be written as a vector $\mathbf{a} = (a_1, \dots, a_N)^\top$, where $a_k$ represents the number of times that unit $k$ is selected in the sample. The $a_k$ can therefore take any non‐negative integer value.
The vector $\mathbf{a}$ is therefore a discrete random vector with non‐negative integer components. The value $\mu_k = \mathrm{E}_p(a_k)$ is the expectation under the design of the number of times that unit $k$ is selected. We also write $\mu_{k\ell} = \mathrm{E}_p(a_k a_\ell)$, $\Sigma_{kk} = \mathrm{var}_p(a_k)$, and $\Sigma_{k\ell} = \mathrm{cov}_p(a_k, a_\ell) = \mu_{k\ell} - \mu_k \mu_\ell$. The expectation $\mu_k$ is not a probability, since $a_k$ can take values larger than 1. We assume that $\mu_k > 0$ for all $k \in U$. Under this assumption, the Hansen–Hurwitz (HH) estimator is
$$\widehat{Y}_{HH} = \sum_{k\in U} \frac{y_k a_k}{\mu_k},$$
and is unbiased for the total $Y = \sum_{k\in U} y_k$ (Hansen & Hurwitz, 1949). The demonstration is the same as for Result 2.4.
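As a numerical illustration, the unbiasedness of the HH estimator can be checked by simulation. The sketch below assumes the simplest with-replacement design, $m$ independent equal-probability draws (so that $\mu_k = m/N$), and a small hypothetical population of $y$-values chosen only for the example:

```python
import random

random.seed(42)

# Hypothetical population; these y-values are illustrative only.
y = [3.0, 7.0, 1.0, 9.0, 4.0, 6.0, 2.0, 8.0]
N = len(y)
Y = sum(y)          # true total
m = 5               # number of independent draws with replacement

def hansen_hurwitz(y, m):
    """Draw m units with equal probabilities, with replacement, and
    return the Hansen-Hurwitz estimate of the total.
    Under equal probabilities, mu_k = m / N for every unit."""
    N = len(y)
    mu = m / N
    a = [0] * N                      # a_k: number of times unit k is drawn
    for _ in range(m):
        a[random.randrange(N)] += 1
    return sum(y[k] * a[k] / mu for k in range(N))

# Monte Carlo check of unbiasedness: the average of Y_HH approaches Y.
reps = 200_000
mean_hh = sum(hansen_hurwitz(y, m) for _ in range(reps)) / reps
print(round(mean_hh, 2), "vs true total", Y)
```

Averaged over many replications, the estimate settles near the true total, as unbiasedness requires.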
The variance is
$$\mathrm{var}_p\left(\widehat{Y}_{HH}\right) = \sum_{k\in U} \sum_{\ell\in U} \frac{y_k y_\ell}{\mu_k \mu_\ell}\, \Sigma_{k\ell}.$$
If $\mu_{k\ell} > 0$ for all $k, \ell \in U$, this variance can be unbiasedly estimated by
$$\widehat{\mathrm{var}}\left(\widehat{Y}_{HH}\right) = \sum_{k\in U} \sum_{\ell\in U} \frac{y_k y_\ell}{\mu_k \mu_\ell}\, \frac{\Sigma_{k\ell}}{\mu_{k\ell}}\, a_k a_\ell.$$
Indeed,
$$\mathrm{E}_p\left\{\widehat{\mathrm{var}}\left(\widehat{Y}_{HH}\right)\right\} = \sum_{k\in U} \sum_{\ell\in U} \frac{y_k y_\ell}{\mu_k \mu_\ell}\, \frac{\Sigma_{k\ell}}{\mu_{k\ell}}\, \mathrm{E}_p(a_k a_\ell) = \sum_{k\in U} \sum_{\ell\in U} \frac{y_k y_\ell}{\mu_k \mu_\ell}\, \Sigma_{k\ell} = \mathrm{var}_p\left(\widehat{Y}_{HH}\right).$$
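Under an equal-probability design with replacement (an assumption of this sketch, not of the text), the vector $\mathbf{a}$ is multinomial, so $\Sigma_{k\ell}$ and $\mu_{k\ell}$ have closed forms, and the unbiasedness of the variance estimator can itself be checked by simulation on a hypothetical population:

```python
import random

random.seed(1)

# Hypothetical population used only for illustration.
y = [3.0, 7.0, 1.0, 9.0, 4.0, 6.0, 2.0, 8.0]
N, m = len(y), 5
mu = m / N                                  # mu_k = m/N for every unit

# Multinomial moments under equal-probability sampling with replacement.
def Sigma(k, l):                            # Sigma_{kl} = cov(a_k, a_l)
    return mu * (1 - 1 / N) if k == l else -m / N ** 2

def mu2(k, l):                              # mu_{kl} = E(a_k a_l)
    return Sigma(k, l) + mu * mu

# Coefficients y_k y_l / (mu_k mu_l) * Sigma_{kl} / mu_{kl}, computed once.
c = [[y[k] * y[l] / (mu * mu) * Sigma(k, l) / mu2(k, l) for l in range(N)]
     for k in range(N)]

true_var = sum(y[k] * y[l] / (mu * mu) * Sigma(k, l)
               for k in range(N) for l in range(N))

def var_hat():
    a = [0] * N
    for _ in range(m):
        a[random.randrange(N)] += 1
    return sum(c[k][l] * a[k] * a[l] for k in range(N) for l in range(N))

# The average of var_hat over many samples approaches the true variance.
reps = 100_000
mean_vhat = sum(var_hat() for _ in range(reps)) / reps
print(round(true_var, 1), round(mean_vhat, 1))
```

Note that individual realizations of the estimator can be negative; only its expectation matches the true variance.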
There are two other possibilities for estimating the total without bias. To do this, we use the reduction function $r(\cdot)$ from $\mathbb{N}^N$ to $\{0,1\}^N$:
$$r(\mathbf{a}) = \mathbf{s} = (s_1, \dots, s_N)^\top, \quad \text{where } s_k = \begin{cases} 1 & \text{if } a_k > 0, \\ 0 & \text{if } a_k = 0. \end{cases} \tag{2.8}$$
This function removes the multiplicity of units in the sense that units selected more than once in the sample are kept only once.
We then write $\pi_k$, the first‐order inclusion probability
$$\pi_k = \Pr(a_k > 0) = \mathrm{E}_p(s_k),$$
and $\pi_{k\ell}$, the second‐order inclusion probability
$$\pi_{k\ell} = \Pr(a_k > 0 \text{ and } a_\ell > 0) = \mathrm{E}_p(s_k s_\ell).$$
By keeping only the distinct units, we can then simply use the expansion estimator:
$$\widehat{Y} = \sum_{k\in U} \frac{y_k s_k}{\pi_k}.$$
Obviously, even if the design with replacement is of fixed sample size, in other words, if
$$\sum_{k\in U} a_k = m$$
for some fixed $m$, the sample of distinct units does not necessarily have a fixed sample size. The expansion estimator is therefore not necessarily more accurate than the Hansen–Hurwitz estimator.
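For a hypothetical equal-probability design with replacement, $\pi_k = 1 - (1 - 1/N)^m$, and a short simulation can check both that the expansion estimator over distinct units is unbiased and that the number of distinct units varies even though the number of draws $m$ is fixed:

```python
import random

random.seed(7)

# Hypothetical population; values are illustrative only.
y = [3.0, 7.0, 1.0, 9.0, 4.0, 6.0, 2.0, 8.0]
N, m = len(y), 5
Y = sum(y)
pi = 1 - (1 - 1 / N) ** m     # P(a_k > 0) under equal-probability draws

def one_draw():
    a = [0] * N
    for _ in range(m):
        a[random.randrange(N)] += 1
    s = [min(ak, 1) for ak in a]          # reduction function r(a)
    expansion = sum(y[k] * s[k] / pi for k in range(N))
    return expansion, sum(s)              # estimate, number of distinct units

reps = 200_000
ests, sizes = zip(*(one_draw() for _ in range(reps)))
print(round(sum(ests) / reps, 2), "vs", Y)
# Even with m fixed, the number of distinct units is random:
print(sorted(set(sizes)))
```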
A third solution consists of calculating the so‐called Rao–Blackwellized estimator. Without going into the technical details, it is possible to show that, in design‐based theory, the minimal sufficient statistic can be constructed by removing the information concerning the multiplicity of units. In other words, if a unit is selected several times in the sample with replacement, it is kept only once (Basu & Ghosh, 1967; Basu, 1969; Cassel et al., 1977, 1993; Thompson & Seber, 1996, p. 35). Knowing a minimal sufficient statistic, one can then calculate the augmented estimator (also called the Rao–Blackwellized estimator) by conditioning an estimator with respect to the minimal sufficient statistic.
Concretely, we calculate the conditional expectation $\mathrm{E}_p(a_k \mid \mathbf{s})$ for all $k$ where $s_k = 1$. Since $s_k = 0$ implies that $a_k = 0$, we can define the Rao–Blackwellized estimator (RB):
$$\widehat{Y}_{RB} = \mathrm{E}_p\left(\widehat{Y}_{HH} \mid \mathbf{s}\right) = \sum_{k\in U} \frac{y_k\, \mathrm{E}_p(a_k \mid \mathbf{s})}{\mu_k}. \tag{2.9}$$
This estimator is unbiased because $\mathrm{E}_p(\widehat{Y}_{RB}) = \mathrm{E}_p\{\mathrm{E}_p(\widehat{Y}_{HH} \mid \mathbf{s})\} = \mathrm{E}_p(\widehat{Y}_{HH}) = Y$. Moreover, since
$$\mathrm{var}_p\left(\widehat{Y}_{HH}\right) = \mathrm{var}_p\left\{\mathrm{E}_p\left(\widehat{Y}_{HH} \mid \mathbf{s}\right)\right\} + \mathrm{E}_p\left\{\mathrm{var}_p\left(\widehat{Y}_{HH} \mid \mathbf{s}\right)\right\}$$
and
$$\mathrm{E}_p\left\{\mathrm{var}_p\left(\widehat{Y}_{HH} \mid \mathbf{s}\right)\right\} \ge 0,$$
we have
$$\mathrm{var}_p\left(\widehat{Y}_{RB}\right) \le \mathrm{var}_p\left(\widehat{Y}_{HH}\right).$$
The Hansen–Hurwitz estimator should therefore, in principle, never be used. The Hansen–Hurwitz estimator is said to be not admissible in the sense that it can always be improved by calculating its conditional expectation. However, this conditional expectation can sometimes be very complex to calculate. Rao–Blackwellization is at the heart of the theory of adaptive sampling, which can lead to multiple selections of the same unit in the sample (Thompson, 1990; Félix‐Medina, 2000; Thompson, 1991a; Thompson & Seber, 1996).
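Under equal-probability draws (an assumption of this sketch), symmetry gives $\mathrm{E}_p(a_k \mid \mathbf{s}) = m/n_d$ for each of the $n_d$ distinct units, so the Rao–Blackwellized estimator reduces to $N$ times the mean of the distinct $y$-values. This closed form is special to that design; it is used here only to illustrate the variance reduction over the Hansen–Hurwitz estimator:

```python
import random

random.seed(3)

# Hypothetical population; values are illustrative only.
y = [3.0, 7.0, 1.0, 9.0, 4.0, 6.0, 2.0, 8.0]
N, m = len(y), 5
Y = sum(y)

def estimates():
    draws = [random.randrange(N) for _ in range(m)]
    hh = (N / m) * sum(y[k] for k in draws)          # Hansen-Hurwitz
    distinct = set(draws)
    # By symmetry, E(a_k | s) = m / n_d for the n_d distinct units, so
    # Rao-Blackwellization keeps each distinct unit once:
    rb = (N / len(distinct)) * sum(y[k] for k in distinct)
    return hh, rb

reps = 200_000
hh_vals, rb_vals = zip(*(estimates() for _ in range(reps)))

def mean(v):
    return sum(v) / len(v)

def var(v):
    mv = mean(v)
    return sum((x - mv) ** 2 for x in v) / len(v)

print(round(mean(hh_vals), 2), round(mean(rb_vals), 2), "both near", Y)
print(var(rb_vals) <= var(hh_vals))   # Rao-Blackwell improvement
```

Both estimators average to the true total, but the Rao–Blackwellized one has the smaller Monte Carlo variance, as the inequality above guarantees.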