A Companion to Medical Anthropology

Sample Size

Sample size, Bernard (2018, p. 127) notes, is a function of four things: (1) how much variation exists in the population, (2) the number of subgroups you want to compare, (3) how big the differences are between subgroups, and (4) how precise your estimates need to be. These principles apply to studies large and small and are relevant to collecting either attribute or cultural data.
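As a rough illustration of how these factors interact, the sketch below uses the standard textbook formula for estimating a mean under simple random sampling, n = (z·σ/E)², where σ is the population standard deviation and E is the acceptable margin of error. The formula, the function names, and the example numbers are assumptions introduced here for illustration, not taken from Bernard. The sketch covers variation, number of subgroups, and precision; detecting differences of a given size between subgroups calls for a power analysis, which it does not attempt.

```python
from math import ceil
from statistics import NormalDist


def n_for_mean(sd: float, margin: float, confidence: float = 0.95) -> int:
    """Sample size needed to estimate a mean within +/- margin.

    Assumes simple random sampling and a known (or guessed) standard deviation.
    """
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sd / margin) ** 2)


def n_for_design(sd: float, margin: float, n_subgroups: int = 1,
                 confidence: float = 0.95) -> int:
    """Total sample size if every subgroup needs the same precision."""
    return n_subgroups * n_for_mean(sd, margin, confidence)
```

With these hypothetical inputs, doubling the standard deviation or halving the margin of error roughly quadruples the required sample, and each additional subgroup to be estimated separately adds another full subsample.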

Procedures for estimating sample size in confirmatory survey or experimental research are well established (Cohen 1992). In exploratory research, the theoretical and empirical basis for evaluating sample size is relatively less developed (Onwuegbuzie and Leech 2007) but actively under development. Until recently, all we had were rules of thumb. Morse (1994) proposed sample sizes of 5–50 informants, depending on the purpose of the study. Charmaz (2014) suggested 20–30 for a grounded theory study. Creswell (2007, pp. 126–128) recommended one or two participants in narrative research, 3–10 in phenomenological research, 20–30 in grounded theory research, 4–5 cases in case study research, and “numerous artifacts, observations, and interviews…until the workings of the cultural-group are clear” in ethnography (p. 128).

As Creswell’s advice for ethnographers suggests, a guiding principle is theoretical saturation: Your sample is large enough when you stop getting new information. But how can you estimate in advance how large that will be? Guest et al. (2006) addressed this issue in a study of HIV prevention in Ghana and Nigeria. They interviewed a total of 60 female sex workers. After every six interviews, they tracked which new themes appeared, how frequently each theme occurred, and how much codebook definitions changed. By these measures, Guest et al. reached saturation after only 12 interviews. This finding is consistent with rule-of-thumb guidelines and with predictions from cultural consensus theory (Romney et al. 1986). But Guest et al. note two important caveats. First, the semistructured interview guide was narrowly focused, and all women answered the same questions. In fully unstructured interviews, it would be harder to reach saturation, because new themes would appear as researchers introduced new questions over time. Second, the sample included only one relatively homogeneous subgroup: young, urban, female sex workers. Because sample size is a function of heterogeneity in the phenomenon of interest, adding other subgroups likely would have increased the sample size necessary to reach saturation.
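The batch-by-batch tracking described above can be sketched in a few lines. This is a minimal illustration, not Guest et al.'s actual procedure: it counts only newly appearing themes per batch of six interviews and ignores theme frequency and codebook changes. The function name and the data layout (one set of theme codes per interview) are assumptions made for the example.

```python
from typing import List, Set


def new_themes_per_batch(interviews: List[Set[str]],
                         batch_size: int = 6) -> List[int]:
    """Count themes that appear for the first time in each batch.

    `interviews` holds one set of coded themes per interview, in the
    order the interviews were conducted.
    """
    seen: Set[str] = set()
    counts: List[int] = []
    for start in range(0, len(interviews), batch_size):
        batch = interviews[start:start + batch_size]
        batch_themes = set().union(*batch)
        # Themes in this batch that no earlier batch produced.
        counts.append(len(batch_themes - seen))
        seen |= batch_themes
    return counts
```

A run of counts that drops to zero or near zero for successive batches is one simple signal that additional interviews are yielding little new information.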

Hagaman and Wutich (2016) showed this to be the case. They analyzed semistructured ethnographic interviews from a cross-cultural study on water issues in four research sites: one each in Bolivia, Fiji, New Zealand, and the United States. The question was how many interviews were necessary to reach data saturation for themes and metathemes within and across sites. Hagaman and Wutich operationalized saturation as having identified a theme in three separate interviews. Most themes appeared for the first time quickly – within only 3–5 interviews – but it generally took up to 10 interviews for the second instance of a theme and 16 for the third. These numbers are just averages. Hagaman and Wutich found that even 30 interviews were not enough in the U.S. site and that, to identify metathemes cross-culturally, it took up to 39 interviews. These findings underscore the principle that the more heterogeneous the population, the larger the sample you will need.
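The operational definition used here, a theme counted as saturated once it has been identified in three separate interviews, is easy to express in code. The sketch below is an illustration under assumed inputs (one set of theme codes per interview, in interview order), with a hypothetical function name; it is not the authors' analysis code.

```python
from typing import Dict, List, Set


def interviews_to_kth_instance(interviews: List[Set[str]],
                               k: int = 3) -> Dict[str, int]:
    """Return, for each theme, the interview number (1-based) at which
    the theme was identified for the k-th time.

    Themes that never reach k instances are left out of the result.
    """
    counts: Dict[str, int] = {}
    reached: Dict[str, int] = {}
    for number, themes in enumerate(interviews, start=1):
        # Each theme counts at most once per interview because themes is a set.
        for theme in themes:
            counts[theme] = counts.get(theme, 0) + 1
            if counts[theme] == k:
                reached[theme] = number
    return reached
```

Calling the function with k=1, k=2, and k=3 and averaging the results across themes gives the kind of first-, second-, and third-instance figures reported above. Themes missing from the k=3 result are those that had not reached saturation by this definition.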

Cultural consensus theory (Romney et al. 1986) formalizes the relationship between heterogeneity and sample size. The theory draws on a cognitive view of culture as shared and socially transmitted knowledge; it then provides a formal model for measuring the extent to which knowledge is shared or contested. The implication for sample size is that the higher the sharing, the smaller the sample necessary to detect consensual beliefs. If we wanted to know how Americans carve up the calendar into days of the week, a handful of informants would do, because this cultural knowledge is widely shared. But if we wanted to understand how days of the week relate to more complex domains – eating, drinking, family life, or sources of stress – we would need a larger sample to capture the variation. Consensus theory formalizes this intuition, and Weller (2007) provides tables for calculating necessary sample sizes to achieve desired levels of accuracy and validity, given varying levels of agreement among informants. To use these tables in designing a study, you’d have to make some assumptions about how much agreement you expect to find.
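The intuition that more sharing means a smaller sample can be shown with a deliberately simplified calculation. The sketch assumes a single true/false item, informants who all have the same competence (the probability of knowing the culturally correct answer, with a coin-flip guess otherwise), and a simple majority rule for aggregating answers. It does not reproduce the procedure behind the published tables, so its numbers should not be expected to match them. All function names are hypothetical.

```python
from math import comb
from typing import Optional


def p_correct(competence: float) -> float:
    """Chance one informant answers a true/false item correctly:
    knows the answer with probability `competence`, otherwise guesses."""
    return competence + (1 - competence) / 2


def majority_accuracy(n_informants: int, competence: float) -> float:
    """Probability that a strict majority of informants is correct."""
    p = p_correct(competence)
    need = n_informants // 2 + 1
    return sum(comb(n_informants, k) * p ** k * (1 - p) ** (n_informants - k)
               for k in range(need, n_informants + 1))


def smallest_sample(competence: float, target: float,
                    max_n: int = 201) -> Optional[int]:
    """Smallest odd sample size whose majority answer is correct with
    probability >= target, or None if max_n is not enough."""
    for n in range(1, max_n + 1, 2):  # odd sizes avoid tied votes
        if majority_accuracy(n, competence) >= target:
            return n
    return None
```

Under these assumptions, comparing `smallest_sample(0.7, 0.99)` with `smallest_sample(0.3, 0.99)` shows the expected pattern: the lower the competence (that is, the less the knowledge is shared), the more informants are needed before the aggregated answer can be trusted.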

Baer et al. (2003) used this approach to calculate subsample sizes in their cross-cultural study of nervios. They anticipated a moderate level of consensus (.50) and used stringent criteria for accuracy (.95) and level of confidence (.999). Using these conservative assumptions, the tables in Weller (2007) show that at least 29 informants were necessary in each research site. Baer et al. went a bit beyond the minimum and set subsample sizes at 40 per site “to be sure that we had sufficient individuals for comparative purposes within samples” (p. 323).

Table 4.2 shows how Christopher McCarty and I incorporated consensus theory and emerging evidence about saturation into a quota sampling design for semistructured interviews with African Americans in Tallahassee, FL. The goal of this study was to identify how the experience of racism and other social stressors shapes the risk of high blood pressure among African Americans. The exploratory phase included a round of semistructured interviews with a projected sample size of 48. We arrived at this sample size by identifying four individual attributes related to the experience of racism among African Americans: gender, age, skin tone, and socioeconomic status (SES). We then set quotas for all possible combinations of these attributes, treating them simplistically as dichotomous variables. The resulting sample size allows for comparisons between groups of 8–12 informants with different attributes relevant to experiences of racism. This work led to a new measure of vicarious racism, which we subsequently found to be associated with blood pressure through different biological pathways than for one’s own, direct experience of discrimination (Quinlan et al. 2016).

Table 4.2 Stratified quota sampling design for semistructured interviews on experiences of racism among African Americans in Tallahassee, FL

             Age 25–34      Age 35–54      Age 55–65
             Dark  Light    Dark  Light    Dark  Light    Total
Men
  Low SES      2     2        2     2        2     2        12
  High SES     2     2        2     2        2     2        12
Women
  Low SES      2     2        2     2        2     2        12
  High SES     2     2        2     2        2     2        12
Total          8     8        8     8        8     8        48
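The quota grid in Table 4.2 can be generated by crossing the four attributes and assigning the same quota to every cell. The sketch below is one way to do that and to check the marginal totals. The variable and function names are illustrative, and the category labels are copied from the table.

```python
from itertools import product
from typing import Dict, List, Tuple

# Attribute categories as they appear in Table 4.2.
ATTRIBUTES: Dict[str, List[str]] = {
    "gender": ["Men", "Women"],
    "ses": ["Low SES", "High SES"],
    "age": ["25–34", "35–54", "55–65"],
    "skin_tone": ["Dark", "Light"],
}
PER_CELL = 2  # informants per combination of attributes


def build_quotas(attributes: Dict[str, List[str]],
                 per_cell: int) -> Dict[Tuple[str, ...], int]:
    """Assign the same quota to every combination of attribute categories."""
    return {combo: per_cell for combo in product(*attributes.values())}


def marginal_totals(quotas: Dict[Tuple[str, ...], int],
                    attributes: Dict[str, List[str]],
                    name: str) -> Dict[str, int]:
    """Sum the quotas over all cells sharing each category of one attribute."""
    position = list(attributes).index(name)
    totals: Dict[str, int] = {}
    for combo, n in quotas.items():
        totals[combo[position]] = totals.get(combo[position], 0) + n
    return totals


quotas = build_quotas(ATTRIBUTES, PER_CELL)
```

Here `sum(quotas.values())` gives the projected sample of 48 across 24 cells, and `marginal_totals` shows how many informants fall into each category of a single attribute when planning comparisons.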

Guest and colleagues (2006) stimulated an explosion of research on the meaning and measurement of saturation. In subsequent work, they extended the question to focus groups (Guest et al. 2017; see also Hennink et al. 2019) and recently proposed a simple measure for reporting on saturation (Guest et al. 2020). Others have produced their own estimates of minimum sample size (Francis et al. 2010), examined different conceptual dimensions of saturation (Hennink et al. 2016; Weller et al. 2018), developed statistical approaches for estimating saturation (Galvin 2015), and questioned whether saturation is the right criterion at all (Braun and Clarke 2021; Leese et al. 2021; Sebele-Mpofu 2020). Work in this area is certain to continue in coming years.
