End-to-end Data Analytics for Product Development - Chris Jones

Stat Tool 1.14 Estimation of Population Parameters and Confidence Intervals


Let's introduce the problem of the estimation of a population parameter.

Because it is often impractical or impossible to gather data on the entire population, we must estimate the population parameters using sample statistics.

Statistics, such as the sample mean and standard deviation, are called point estimators.

A point estimate is a single sample value that approximates the true unknown value of a population parameter.

 Point estimators:

sample mean x̄, sample proportion p, sample standard deviation S

 Population parameters:

population mean μ, population proportion π, population standard deviation σ
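As a minimal sketch of how the three point estimators are computed, the snippet below uses a small, made-up list of satisfaction scores. The data, the threshold of 7 used to define a "satisfied" respondent, and the variable names are assumptions for illustration only and do not come from the book.

```python
import statistics

# Hypothetical sample of satisfaction scores (0-10); not data from the book.
scores = [7, 6, 8, 5, 9, 7, 6, 8, 7, 5]

# Sample mean: point estimate of the population mean (mu).
x_bar = statistics.mean(scores)

# Sample standard deviation (n - 1 denominator): point estimate of sigma.
s = statistics.stdev(scores)

# Sample proportion: point estimate of the population proportion (pi).
# "Satisfied" is assumed here to mean a score of 7 or more.
p = sum(score >= 7 for score in scores) / len(scores)

print(f"sample mean = {x_bar:.2f}")
print(f"sample standard deviation = {s:.2f}")
print(f"sample proportion = {p:.2f}")
```

Each printed value is a single number computed from the sample. None of them says how far it might be from the corresponding population parameter.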

Point estimates, such as the sample mean or standard deviation, provide a lot of information, but they don't give us the full picture.

Because it is highly unlikely that, for example, the sample mean and standard deviation we obtain are exactly equal to the population parameters, we can use confidence intervals to get a better sense of the true population values.

A confidence interval is a range of likely values for a population parameter, such as the population mean or standard deviation.

Usually, a confidence interval is a range centered on the point estimate:

point estimate ± margin of error

Using confidence intervals, we can say that it is likely that the population parameter is somewhere within this range.

 Example 1.3. To illustrate this point, suppose that a research team wants to know the mean satisfaction score (from 0: not at all satisfied, to 10: completely satisfied) for the population of people who use a new formulation of a product. From a random sample of consumers, the sample mean is 6.8, and the confidence interval is CI = (6.2; 7.4).

Mean satisfaction score (population parameter) = ?

So the true unknown population mean satisfaction score is likely to be somewhere between 6.2 and 7.4. The central point of the confidence interval is the sample mean: x̄ = 6.8 (point estimate of μ).
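The arithmetic behind Example 1.3 can be checked with a short sketch. The helper name `summarize_interval` is hypothetical. It assumes a symmetric interval, so the midpoint is the point estimate and half the width is the distance from the estimate to either limit.

```python
def summarize_interval(lower, upper):
    """Return the midpoint and half-width of a symmetric interval."""
    center = (lower + upper) / 2      # point estimate at the center
    half_width = (upper - lower) / 2  # distance from center to each limit
    return center, half_width

# Limits taken from Example 1.3: CI = (6.2; 7.4).
center, half_width = summarize_interval(6.2, 7.4)
print(f"center = {center:.1f}, half-width = {half_width:.1f}")
# prints: center = 6.8, half-width = 0.6
```

So the interval in the example can be read as 6.8 ± 0.6.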

There's always a chance that the confidence interval won't contain the true population mean.

When we use confidence intervals, we must decide how sure we need to be that the confidence interval contains the actual population parameter value, taking into account that we cannot be 100% sure.

We quantify how sure we need to be with a value called the confidence level, usually denoted by (1 − α).

The confidence level is set by the researcher before calculation of a confidence interval.

The most common confidence level is 95% (0.95). Other common levels are 90% and 99%.
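To make the link between the confidence level (1 − α) and the width of an interval more concrete, here is a rough sketch. It assumes an interval built from the standard normal distribution, which the text above does not specify. The helper name `z_critical` is hypothetical. A higher confidence level gives a larger multiplier and therefore a wider interval.

```python
from statistics import NormalDist

def z_critical(confidence_level):
    """Two-sided standard normal critical value for a given confidence level."""
    alpha = 1 - confidence_level
    # Leave alpha / 2 in each tail of the standard normal distribution.
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"confidence level {level:.0%}: "
          f"alpha = {1 - level:.2f}, z = {z_critical(level):.3f}")
```

For small samples a t-distribution critical value is normally used instead. The normal version is chosen here only because it needs nothing beyond the Python standard library.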

The confidence level expresses how sure we are that the confidence interval contains the actual population parameter value: it is the proportion of intervals, over many repeated samples, that would contain it.

 Example 1.4. To illustrate the meaning of the confidence level, let's return to the previous example and suppose we drew 100 samples from the same population and calculated the confidence interval for each sample. If we used 95% confidence intervals, on average 95 out of 100 of the confidence intervals would contain the population parameter, while 5 out of 100 would not. In practice, when we calculate a 95% confidence interval for our sample, we are confident that our sample is one of the 95% of samples for which the CI covers the true parameter value.
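The repeated-sampling idea in Example 1.4 can be imitated with a small simulation. Everything specific here is assumed for illustration: a normally distributed population with mean 6.8 and standard deviation 1.5, samples of 50, a normal-approximation interval for the mean, and the helper name `coverage_rate`. The sketch counts how often the interval covers the true mean.

```python
import random
from statistics import NormalDist, fmean, stdev

def coverage_rate(true_mean, true_sd, n, repetitions,
                  confidence_level=0.95, seed=0):
    """Share of simulated confidence intervals that contain true_mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    covered = 0
    for _ in range(repetitions):
        # Draw one random sample from the assumed population.
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        x_bar = fmean(sample)
        # Normal-approximation margin using the sample standard deviation.
        margin = z * stdev(sample) / n ** 0.5
        if x_bar - margin <= true_mean <= x_bar + margin:
            covered += 1
    return covered / repetitions

rate = coverage_rate(true_mean=6.8, true_sd=1.5, n=50, repetitions=2000)
print(f"share of intervals covering the true mean: {rate:.3f}")
```

With these settings the printed share should land close to 0.95, though not exactly on it. The gap comes from random variation and from using the normal approximation with an estimated standard deviation.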

