2.7 ESTIMATION AND ESTIMATORS
The goal of statistical inference is, in general, to estimate parameters of a population. We distinguish between point estimators and interval estimators. A point estimator is a function of a sample and is used to estimate a parameter in the population. Because estimates generated by estimators will vary from sample to sample, and thus have a probability distribution associated with them, estimators are themselves random variables. For example, the sample mean ȳ is an estimator of the population mean μ. However, if we sample a bunch of ȳ values from a population for which μ is the actual population mean, we know, both from experience and statistical theory, that ȳ will vary from sample to sample. This is why the estimator is a random variable: its values will each have associated with them a given probability (density) of occurrence. When we use the estimator to obtain a particular number, that number is known as an estimate. An interval estimator provides a range of values within which the true parameter is hypothesized to lie with some probability. A popular interval estimator is that of the confidence interval, a topic we discuss later in this chapter.
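To make the distinction concrete, here is a minimal sketch in Python (using NumPy; the population values μ = 100, σ = 15 and the sample size are arbitrary choices for illustration, not from the text). Repeated sampling gives a different point estimate ȳ each time, while an interval estimator brackets μ with a stated level of confidence.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical population values chosen only for illustration.
mu, sigma, n = 100.0, 15.0, 25

# Point estimation: the estimator ybar is a random variable and
# takes a different value in each sample drawn from the population.
ybars = [rng.normal(mu, sigma, n).mean() for _ in range(5)]
print("Five point estimates of mu:", [round(v, 2) for v in ybars])

# Interval estimation: an approximate 95% confidence interval for mu
# from a single sample, using the normal approximation.
sample = rng.normal(mu, sigma, n)
ybar = sample.mean()
se = sample.std(ddof=1) / np.sqrt(n)
print("Approximate 95% CI for mu: (%.2f, %.2f)" % (ybar - 1.96 * se, ybar + 1.96 * se))
```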
More generally, if T is some statistic, then we can use T as an estimator of a population parameter θ. Whether the estimator T is any good depends on several criteria, which we survey now.
On average, in the long run, the statistic T is considered to be an unbiased estimator of θ if

E(T) = θ
That is, an estimator is considered unbiased if its expected value is equal to that of the parameter it is seeking to estimate. The bias of an estimator is measured by how much E(T) deviates from θ. When an estimator is biased, then E(T) ≠ θ, or, we can say E(T) − θ ≠ 0. The bias E(T) − θ may be either positive or negative, depending on whether T tends to overestimate or underestimate θ.
Good estimators are, in general, unbiased. The most popular example of an unbiased estimator is that of the arithmetic sample mean, since it can be shown that:

E(ȳ) = μ
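As a quick empirical check rather than a proof, the following sketch (Python/NumPy, with arbitrary illustration values) approximates E(ȳ) by averaging the sample means of many simulated samples; the average lands very close to μ.

```python
import numpy as np

rng = np.random.default_rng(2)

# Arbitrary illustration values.
mu, sigma, n, reps = 50.0, 10.0, 20, 100_000

# Approximate E(ybar) by averaging the sample mean over many simulated samples.
ybars = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
print("Average of the sample means:", round(ybars.mean(), 3), "  (mu =", mu, ")")
```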
An example of an estimator that is biased is the uncorrected sample variance S² (computed with divisor n), as we will soon discuss, since it can be shown that

E(S²) = [(n − 1)/n] σ²
However, S² is not asymptotically biased. As sample size increases without bound, E(S²) converges to σ². Once the sample variance is corrected via the following, it leads to an unbiased estimator, even for smaller samples:

s² = Σ(yᵢ − ȳ)² / (n − 1)
where now,

E(s²) = σ²
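The bias of the uncorrected variance, and how dividing by n − 1 removes it, can likewise be seen by simulation. The sketch below (arbitrary population values; a deliberately small n makes the bias visible) compares the long-run averages of S² and s².

```python
import numpy as np

rng = np.random.default_rng(3)

# A deliberately small n makes the bias of the uncorrected variance visible.
mu, sigma, n, reps = 0.0, 2.0, 5, 200_000

samples = rng.normal(mu, sigma, size=(reps, n))
s2_uncorrected = samples.var(axis=1, ddof=0)  # divide by n
s2_corrected = samples.var(axis=1, ddof=1)    # divide by n - 1

print("True sigma^2:              ", sigma ** 2)
print("Mean of S^2 (divisor n):   ", round(s2_uncorrected.mean(), 3))  # about (n-1)/n * sigma^2
print("Mean of s^2 (divisor n-1): ", round(s2_corrected.mean(), 3))    # about sigma^2
```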
Consistency of an estimator means that as sample size increases indefinitely, the variance of the estimator approaches zero; that is, σ²_T → 0 as n → ∞. We could also write this using a limit concept:

lim_(n→∞) σ²_T = 0
which reads “the variance of the estimator T as sample size n goes to infinity (grows without bound) is equal to 0.” Fisher called this the criterion of consistency, informally defining it as “when applied to the whole population the derived statistic should be equal to the parameter” (Fisher, 1922a, p. 316). The key to Fisher's definition is whole population, which means, theoretically at least, an infinitely large sample, or analogously, n → ∞. More pragmatically, it describes the situation in which we have the entire population.
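A small simulation makes the consistency idea visible: the variance of ȳ, which equals σ²/n, shrinks toward zero as n grows. The values below are arbitrary illustration choices.

```python
import numpy as np

rng = np.random.default_rng(4)

# Arbitrary population values; reps controls how precisely we estimate var(ybar).
mu, sigma, reps = 0.0, 5.0, 2_000

for n in (10, 100, 1_000, 10_000):
    ybars = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
    print(f"n = {n:>6}:  var(ybar) = {ybars.var():.5f}   (theory sigma^2/n = {sigma**2 / n:.5f})")
```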
An estimator is regarded as more efficient the lower its mean squared error. Estimators with lower variance are more efficient than estimators with higher variance. Fisher called this the criterion of efficiency, writing “when the distributions of the statistics tend to normality, that statistic is to be chosen which has the least probable error” (Fisher, 1922a, p. 316). Efficient estimators are generally preferred over less efficient ones.
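A standard illustration of relative efficiency, not taken from the text, compares the sample mean and the sample median as estimators of μ for normally distributed data: both center on μ, but the median has the larger sampling variance (roughly πσ²/(2n) versus σ²/n), so the mean is the more efficient estimator. The sketch below checks this by simulation with arbitrary values.

```python
import numpy as np

rng = np.random.default_rng(5)

# Arbitrary values; odd n so the sample median is a single order statistic.
mu, sigma, n, reps = 0.0, 1.0, 51, 50_000

samples = rng.normal(mu, sigma, size=(reps, n))
means = samples.mean(axis=1)
medians = np.median(samples, axis=1)

# Both estimators center on mu, but the median varies more, so it is less efficient.
print("var(sample mean)  :", round(means.var(), 5), "  (theory sigma^2/n =", round(sigma**2 / n, 5), ")")
print("var(sample median):", round(medians.var(), 5), "  (theory ~ pi*sigma^2/(2n) =", round(np.pi * sigma**2 / (2 * n), 5), ")")
```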
An estimator is regarded as sufficient for a given parameter if the statistic “captures” everything we need to know about the parameter and our knowledge of the parameter could not be improved if we considered additional information (such as a secondary statistic) over and above the sufficient estimator. As Fisher (1922a, p. 316) described it, “the statistic chosen should summarize the whole of the relevant information supplied by the sample.” More specifically, Fisher went on to say:
If θ be the parameter to be estimated, θ1 a statistic which contains the whole of the information as to the value of θ, which the sample supplies, and θ2 any other statistic, then the surface of distribution of pairs of values of θ1 and θ2, for a given value of θ, is such that for a given value of θ1, the distribution of θ2 does not involve θ. In other words, when θ1 is known, knowledge of the value of θ2 throws no further light upon the value of θ.
(Fisher, 1922a, pp. 316–317)
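To convey the flavor of sufficiency by simulation (an illustration of our own, not from Fisher or the text): for Bernoulli data, the total number of successes is sufficient for p, so once we condition on that total, the remaining behavior of the sample carries no further information about p. Below, two very different values of p yield essentially the same conditional probability that the first observation is a success, once the total is fixed.

```python
import numpy as np

rng = np.random.default_rng(6)

# Bernoulli samples of size n; we condition on the sufficient statistic (the total).
n, k, reps = 10, 4, 200_000

for p in (0.3, 0.7):
    data = rng.random((reps, n)) < p        # reps Bernoulli(p) samples of size n
    keep = data.sum(axis=1) == k            # retain samples whose total equals k
    # Conditional on the total k, P(y1 = 1) is k/n no matter what p is.
    print(f"p = {p}:  P(y1 = 1 | total = {k}) ≈ {data[keep, 0].mean():.3f}   (theory: {k / n})")
```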