Biostatistics Decoded - A. Gouveia Oliveira
1.4 Sampling
Sampling is such a central issue in biostatistics that an entire chapter of this book is devoted to discussing it. This is necessary for two main reasons: first, because an understanding of the statistical methods requires a clear understanding of the sampling phenomena; second, because most people do not understand the purpose of sampling at all.
Sampling is a relatively recent addition to statistics. For almost two centuries, statistical science was concerned only with the census, the study of entire populations. Nearly a century ago, however, people realized that populations could be studied more easily, more quickly, and more economically if observations were used from only a small part of the population, a sample of the population, instead of the whole population. The basic idea was that, provided a sufficient number of observations were made, the patterns of interest in the population would be reproduced in the sample. The measurements made in the sample would then mirror the measurements in the population.
The primary objective of this approach to sampling was to obtain a miniature version of the population. The assumption was that the observations made in the sample would reflect the structure of the population. This is very much like going to a store and asking for a sample taken at random from a piece of cloth. Later, by inspecting the sample, one would remember what the whole piece was like. By looking at the colors and patterns of the sample, one would know what the colors and patterns were in the whole piece (Figure 1.5).
Now, if the original piece of cloth had large, repetitive patterns but the sample was only a tiny piece, by looking at the sample one would not be able to tell exactly what the original piece was like. This is because not every pattern and color would be present in the sample, and the sample would be said not to be representative of the original cloth. Conversely, if the sample was large enough to contain all the patterns and colors present in the piece, the sample would be said to be representative (Figure 1.6).
This is very much the reasoning behind the classical approach to sampling. The concept of representativeness of a sample was tightly linked to its size: large samples tend to be representative, while small samples give unreliable results because they are not representative of the population. The fragility of this approach, however, is its lack of objectivity in the definition of an adequate sample size.
Figure 1.5 Classical view of the purpose of sampling.
Figure 1.6 Relationship between representativeness and sample size in the classical view of sampling. The concept of representativeness is closely related to sample size.
Some people might say that the sample size should be in proportion to the total population. If so, this would mean that an investigation on the prevalence of, say, chronic heart failure in Norway would require a much smaller sample than the same investigation in Germany. This makes little sense. Now suppose we want to investigate patients with chronic heart failure. Would a sample of 100 patients with chronic heart failure be representative? What about 400 patients? Or do we need 1000 patients? In each case, the sample size is always an almost insignificant fraction of the whole population.
If it does not make much sense to think that the ideal sample size is a certain proportion of the population (even more so because in many situations the population size is not even known), would a representative sample then be one that contains all the patterns that exist in the population? If so, how many people would we have to sample to make sure that all possible patterns in the population also exist in the sample? For example, some findings typical of chronic heart failure, like an S3‐gallop and alveolar edema, are present in only 2 or 3% of patients, and the combination of these two findings (assuming they are independent, and taking 2% for each, so that 0.02 × 0.02 = 0.0004) should exist in only about 1 out of 2500 patients. Does this mean that no study of chronic heart failure with fewer than 2500 patients should be considered representative? And what should be done when the structure of the population is unknown?
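The arithmetic behind the 1 in 2500 figure can be checked with a short sketch. It assumes a prevalence of 2% for each finding (the value that reproduces the figure quoted in the text) and independence between the two findings; the function names and the sample sizes tried are illustrative choices, not part of the original text.

```python
# Assumed prevalence of each finding among patients with chronic heart failure.
# The text says "2 or 3%"; 2% is the value that yields "1 out of 2500".
p_s3_gallop = 0.02
p_alveolar_edema = 0.02


def joint_probability(p_a, p_b):
    """Probability that both findings occur together, assuming independence."""
    return p_a * p_b


def prob_at_least_one(p, n):
    """Probability that a random sample of n patients contains at least
    one patient showing a pattern that has prevalence p."""
    return 1.0 - (1.0 - p) ** n


p_both = joint_probability(p_s3_gallop, p_alveolar_edema)
print(f"P(both findings) = {p_both:.4f} (about 1 in {round(1 / p_both)})")

# Illustrative sample sizes: the three mentioned in the text, plus 2500.
for n in (100, 400, 1000, 2500):
    chance = prob_at_least_one(p_both, n)
    print(f"n = {n:>4}: P(at least one patient with both findings) = {chance:.3f}")
```

Under these assumptions, even a sample of 2500 patients has only about a 63% chance of including a single patient with both findings, so requiring a sample to contain every pattern in the population quickly becomes impractical.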
The problem of lack of objectivity in defining sample representativeness can be circumvented if we adopt a different reasoning when dealing with samples. Let us accept that we have no means of knowing what the population structure truly is, and all we can possibly have is a sample of the population. Then, a realistic procedure would be to look at the sample and, by inspecting its structure, formulate a hypothesis about the structure of the population. The structure of the sample constrains the hypothesis to be consistent with the observations.
Taking the above example on the samples of cloth, the situation now is as if we were given a sample of cloth and asked what the whole piece would be like. If the sample were large, we probably would have no difficulty answering that question. But if the sample were small, something could still be said about the piece. For example, if the sample contained only red circles over a yellow background, one could say that the sample probably did not come from a Persian carpet. In other words, by inspecting the sample one could say that it was consistent with a number of pieces of cloth but not with other pieces (Figure 1.7).
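The cloth reasoning can be sketched as a simple consistency check. This is a deliberately simplified illustration: the candidate pieces of cloth, their motifs, and the function name are all hypothetical, and consistency is treated as all-or-nothing (every motif seen in the sample must occur in the piece), whereas the text speaks of what the sample "probably" came from.

```python
# Hypothetical pieces of cloth, each described by the set of
# (figure, background) motifs it contains. These are made up for illustration.
candidate_cloths = {
    "polka-dot fabric": {("red circle", "yellow"), ("blue circle", "yellow")},
    "plain yellow fabric": {("none", "yellow")},
    "Persian carpet": {("floral medallion", "dark red"), ("vine border", "navy")},
}


def consistent_cloths(sample_motifs, cloths):
    """Return the names of the cloths that contain every motif seen in the sample."""
    return [name for name, motifs in cloths.items() if sample_motifs <= motifs]


# A small sample showing only red circles over a yellow background.
small_sample = {("red circle", "yellow")}
print(consistent_cloths(small_sample, candidate_cloths))
```

Even this one-motif sample rules out some candidates, including the Persian carpet, while leaving others possible. A larger sample would only narrow the list further.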
Therefore, the purpose of sampling is to provide a means of evaluating the plausibility of several hypotheses about the structure of the population, through a limited number of observations and assuming that the structure of the population must be consistent with the structure of the sample. One immediate implication of this approach is that there are no sample size requirements in order to achieve representativeness.
Let us verify the truth of this statement and see if this approach to sampling is still valid in the extreme situation of a sample size of one. We know that with the first approach we would discard such a sample as non‐representative. Will we reach the same conclusion with the current approach?
Figure 1.7 Modern view of the purpose of sampling. The purpose of sampling is the evaluation of the plausibility of a hypothesis about the structure of the population, considering the structure of a limited number of observations.