Applied Biostatistics for the Health Sciences - Richard J. Rossi
1.2 Populations, Samples, and Statistics
In every biomedical study there will be research questions that define the particular population being studied. This population is called the target population. The target population must be well defined so that it is possible to collect representative data that provide information about the answers to the research questions. Finding the actual answer to a research question requires that the entire target population be observed, which is usually impractical or impossible. Biomedical researchers therefore use only a subset of the population units in their research study. A subset of the population is called a sample, and a sample may provide information about the answer to a research question but cannot definitively answer the question itself. That is, complete information on the target population is required to answer the research question, and because a sample is only a subset of the target population, it can only provide information about the answer. For this reason, statistics is often referred to as “the science of describing populations in the presence of uncertainty.”
The first thing a biostatistician generally must do is take the research question and determine a particular set of characteristics of the target population that are related to it. The biostatistician must then determine the relevant statistical questions about these population characteristics that will provide answers, or the best available information, about the research questions. A characteristic of the target population that can be summarized numerically is called a parameter. For example, in a study of the body mass index (BMI) of teenagers, the average BMI value for the target population is a parameter, as is the percentage of teenagers having a BMI value less than 25. The parameters of the target population are based on information about the entire population, and hence their values will be unknown to the researcher.
To have a meaningful statistical analysis, a researcher must have well-defined research questions, a well-defined target population, a well-designed sampling plan, and an observed sample that is representative of the target population. When the sample is representative of the target population, the resulting statistical analysis will provide useful information about the research questions. When the observed sample is not representative, the analysis will often lead to misleading or incorrect inferences about the target population and hence about the research questions. Thus, one of the goals of a biostatistician is to obtain a sample that is representative of the target population for estimating or testing the unknown parameters.
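The effect of a non-representative sample can be illustrated with a small simulation. The sketch below is not from the text: the population, the split into a general group and a weight-management clinic group, the BMI distributions, and the function names are all hypothetical assumptions chosen only for illustration. It compares a simple random sample of the whole population with a convenience sample drawn only from the clinic group.

```python
import random
import statistics


def simulate_population(rng, n_general=8000, n_clinic=2000):
    """Return (bmi, from_clinic) pairs for a made-up teenage population.

    Assumption: clinic attendees have a higher typical BMI than the rest.
    """
    general = [(rng.gauss(22.0, 3.0), False) for _ in range(n_general)]
    clinic = [(rng.gauss(30.0, 3.0), True) for _ in range(n_clinic)]
    return general + clinic


def compare_samples(seed=1, n=200):
    """Return (population mean, random-sample mean, convenience-sample mean)."""
    rng = random.Random(seed)
    population = simulate_population(rng)

    # In a simulation the parameter is known; in a real study it is not.
    pop_mean = statistics.mean([bmi for bmi, _ in population])

    # Simple random sample: every unit has the same chance of selection.
    srs = rng.sample(population, n)
    srs_mean = statistics.mean([bmi for bmi, _ in srs])

    # Convenience sample: only units recruited through the clinic.
    clinic_units = [unit for unit in population if unit[1]]
    convenience = rng.sample(clinic_units, n)
    conv_mean = statistics.mean([bmi for bmi, _ in convenience])

    return pop_mean, srs_mean, conv_mean


pop_mean, srs_mean, conv_mean = compare_samples()
print(f"population mean BMI:     {pop_mean:.2f}")
print(f"simple random sample:    {srs_mean:.2f}")
print(f"clinic-only convenience: {conv_mean:.2f}")
```

Under these assumptions the clinic-only sample overstates the population mean because it excludes most of the population, whereas the simple random sample typically lands close to it. Taking a larger clinic-only sample would not remove this bias.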
Once a representative sample is obtained, any quantity computed from the information in the sample and known values is called a statistic. Because any estimate of an unknown parameter is based only on the information in the sample, the estimates are also statistics. Statements about the parameters of the population made by extrapolating from the sample information (i.e., statistics) are called statistical inferences. Good statistical inferences rest on sound statistical and scientific reasoning, so the methods a biostatistician uses for making inferences must rest on the same footing. Furthermore, statistical inferences are meaningful only when they are based on data that are truly representative of the target population. Statistics computed from a sample are often used for estimating the unknown values of the parameters of interest, for testing claims about the unknown parameters, and for modeling the unknown parameters.
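The distinction between a parameter and a statistic can be made concrete with simulated data. The following is a minimal sketch, assuming a hypothetical population of teenage BMI values drawn from a normal distribution; the distribution, the sample size, and the function names are illustrative assumptions, not part of the text. The two parameters are the ones used in the BMI example: the population mean and the proportion of the population with BMI less than 25.

```python
import random
import statistics


def proportion_below(values, cutoff):
    """Fraction of values strictly less than the cutoff."""
    return sum(v < cutoff for v in values) / len(values)


def parameters_and_statistics(seed=7, pop_size=50000, n=100):
    """Compute two parameters and the matching sample statistics."""
    rng = random.Random(seed)

    # Hypothetical target population; in practice it is never fully observed.
    population = [rng.gauss(23.0, 4.0) for _ in range(pop_size)]

    # Simple random sample of n units drawn without replacement.
    sample = rng.sample(population, n)

    # Parameters summarize the entire population.
    parameters = {
        "mean_bmi": statistics.mean(population),
        "prop_below_25": proportion_below(population, 25.0),
    }
    # Statistics are computed from the sample alone and estimate the parameters.
    stats = {
        "mean_bmi": statistics.mean(sample),
        "prop_below_25": proportion_below(sample, 25.0),
    }
    return parameters, stats


parameters, stats = parameters_and_statistics()
for name in parameters:
    print(f"{name}: parameter = {parameters[name]:.3f}, statistic = {stats[name]:.3f}")
```

Rerunning with a different `seed` changes the statistics from sample to sample, while the parameters of a given population stay fixed. That sample-to-sample variation is the uncertainty that statistical inference has to account for.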