An Introduction to Text Mining - Gabe Ignatow

Deductive Logic

Deductive logic is the form of inferential logic most closely associated with the scientific method. Deductive research designs start with theoretical abstractions (see Figure 4.2), derive hypotheses from those theories, and then set up research projects that test the hypotheses on empirical data. The purest form of a deductive research design is the laboratory experiment, which in principle allows the researcher to control all variables except for those of theoretical interest and then to determine unequivocally whether hypotheses derived from a theory are supported or not.

Deductive inferential logic has been applied in many text mining studies. An early example is Hirschman’s (1987) study “People as Products,” which tested an established theory of resource exchange on male- and female-placed personal advertisements. Hirschman derived 16 hypotheses from this theory and tested them on a year’s worth of personal dating advertisements collected from New York and Washingtonian magazines. She selected at random 100 male-placed and 100 female-placed advertisements, as well as 20 additional advertisements that she used to establish content categories for the analysis. One male and one female coder coded the advertisements in terms of the categories derived from the 20 additional advertisements. The data were transformed to represent the proportionate weight of each resource category coded (e.g., money, physical status, occupational status) for each sample, and the data were analyzed with a 2 × 2 analysis of variance (ANOVA) procedure. As is discussed in Appendix I, ANOVA is a collection of statistical models used to analyze variation between groups. In Hirschman’s 1987 study, gender of advertiser (male or female) and city (New York or Washington) were the factors analyzed in the ANOVA procedure. Cunningham, Sagas, Sartore, Amsden, and Schellhase (2004) likewise used ANOVAs, in their case to compare news coverage of women’s and men’s athletic teams.
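To make the 2 × 2 ANOVA procedure more concrete, the following is a minimal Python sketch of the F statistics for a balanced two-factor design with gender of advertiser and city as the factors. The function name `two_by_two_anova`, the helper `_mean`, and all of the scores are hypothetical illustrations and are not taken from Hirschman's study. The sketch assumes equal cell sizes and reports F statistics only; obtaining p-values would require the F distribution from a statistics library.

```python
def _mean(values):
    values = list(values)
    return sum(values) / len(values)


def two_by_two_anova(cells):
    """F statistics for a balanced 2 x 2 between-groups ANOVA.

    `cells` maps (level_of_factor_a, level_of_factor_b) to a list of scores.
    Returns the F values and the within-cell (error) degrees of freedom.
    """
    levels_a = sorted({a for a, _ in cells})
    levels_b = sorted({b for _, b in cells})
    sizes = {len(scores) for scores in cells.values()}
    if len(cells) != 4 or len(levels_a) != 2 or len(levels_b) != 2 or len(sizes) != 1:
        raise ValueError("expected a balanced 2 x 2 design")
    n = sizes.pop()
    if n < 2:
        raise ValueError("need at least two scores per cell")

    grand = _mean(x for scores in cells.values() for x in scores)
    cell_mean = {key: _mean(scores) for key, scores in cells.items()}
    # With equal cell sizes, a marginal mean is the mean of its cell means.
    mean_a = {a: _mean(cell_mean[(a, b)] for b in levels_b) for a in levels_a}
    mean_b = {b: _mean(cell_mean[(a, b)] for a in levels_a) for b in levels_b}

    # Sums of squares for the two main effects, the interaction, and error.
    ss_a = 2 * n * sum((mean_a[a] - grand) ** 2 for a in levels_a)
    ss_b = 2 * n * sum((mean_b[b] - grand) ** 2 for b in levels_b)
    ss_ab = n * sum(
        (cell_mean[(a, b)] - mean_a[a] - mean_b[b] + grand) ** 2
        for a in levels_a
        for b in levels_b
    )
    ss_within = sum(
        (x - cell_mean[key]) ** 2 for key, scores in cells.items() for x in scores
    )
    df_within = 4 * (n - 1)
    if ss_within == 0:
        raise ValueError("no within-cell variation; F is undefined")
    ms_within = ss_within / df_within

    # Each effect in a 2 x 2 design has 1 degree of freedom, so MS equals SS.
    return {
        "a": ss_a / ms_within,
        "b": ss_b / ms_within,
        "interaction": ss_ab / ms_within,
        "df_within": df_within,
    }


# Hypothetical percentage weights of one resource category per advertisement.
scores = {
    ("female", "New York"): [20.0, 22.0, 24.0],
    ("female", "Washington"): [21.0, 23.0, 25.0],
    ("male", "New York"): [10.0, 12.0, 14.0],
    ("male", "Washington"): [11.0, 13.0, 15.0],
}
result = two_by_two_anova(scores)
print(result)
```

In this made-up example the first factor (gender) yields a large F statistic, while the second factor (city) and the interaction do not, which is the pattern one would expect if the resource category differed by gender of advertiser but not by city.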


Figure 4.2 ∎ Deductive Logic

Another example of the use of deductive inferential logic in a text mining project is the 2001 study by management researchers Gibson and Zellmer-Bruhn of concepts of teamwork across national organizational cultures. The study’s goal was to test an established theory of the influence of national culture on employees’ attitudes. Gibson and Zellmer-Bruhn tested this theory on data from four organizations in four different countries (France, the Philippines, Puerto Rico, and the United States), conducting interviews that they transcribed to form their corpora. They used QSR NUD*IST (which subsequently evolved into NVivo; see Appendix D) and TACT (Popping, 1997) to organize qualitative coding of five frequently used teamwork metaphors (see Chapter 12), which were then used to create dependent variables for hypothesis testing using multinomial logit and logistic regression (multiple regression).

Cunningham and colleagues’ (2004) analysis of coverage of women’s and men’s sports in the newsletter NCAA (National Collegiate Athletic Association) News is another example of a deductive research design. Cunningham and his colleagues tested theories of organizational resource dependence on data from 24 randomly selected issues of the NCAA News. One issue was selected from each month of 1999 and of 2001 (see Chapter 5 on systematic sampling). From these issues, the authors chose to analyze only articles specifically focused on athletics, coaches, or their teams, excluding articles focused on committees, facilities, and other topics (see Chapter 5 on relevance sampling). Two researchers independently coded each of 5,745 paragraphs in the sample for gender and for the paragraph’s content and location within the issue. Reliability coefficients including Cohen’s kappa and the Pearson product-moment coefficient were calculated. As is discussed in Appendix I, reliability coefficients are used to measure the degree of agreement among raters. Interrater reliability is useful for determining whether a particular scale is appropriate for measuring a particular variable. If the raters do not agree, either the scale is defective or the raters need retraining.
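As an illustration of how agreement between two coders can be quantified, the following minimal Python sketch computes Cohen's kappa, which compares the observed proportion of agreement with the agreement expected by chance from each coder's label frequencies. The function name `cohens_kappa` and the ten paragraph codes are hypothetical and are not data from the Cunningham study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who each assign one nominal label per item."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of marginal proportions.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single label")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical gender codes assigned to ten paragraphs by two coders.
coder_1 = ["women", "men", "men", "both", "women", "men", "men", "women", "both", "men"]
coder_2 = ["women", "men", "women", "both", "women", "men", "men", "men", "both", "men"]
kappa = cohens_kappa(coder_1, coder_2)
print(round(kappa, 3))
```

Here the coders agree on 8 of 10 paragraphs, but because some of that agreement would be expected by chance, kappa is lower than the raw agreement rate of 0.80.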

Cunningham and his colleagues also calculated word use frequencies, ANOVA, and chi-square statistics. The chi-square statistic, also discussed in Appendix I, is a very useful statistic in text mining research. It allows for comparisons of observed versus expected word frequencies across documents or groups of documents that may differ in size.
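The comparison of observed and expected word frequencies can be sketched as a Pearson chi-square test on a 2 × 2 table that cross-classifies two groups of documents by whether each token is the word of interest. The function name `chi_square_word` and the counts below are hypothetical. The sketch applies no continuity correction and returns only the statistic, which with 1 degree of freedom can be compared with the critical value of 3.841 at the .05 level.

```python
def chi_square_word(count_a, total_a, count_b, total_b):
    """Pearson chi-square (1 df) for one word's frequency in two corpora.

    count_*: occurrences of the word; total_*: all tokens in that corpus.
    """
    observed = [
        [count_a, total_a - count_a],
        [count_b, total_b - count_b],
    ]
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("chi-square is undefined when a row or column total is zero")
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if the word were used at the same rate in both corpora.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


# Hypothetical counts: 30 uses in a 1,000-token corpus vs. 20 uses in a 4,000-token corpus.
chi2 = chi_square_word(30, 1000, 20, 4000)
print(round(chi2, 2))
```

Because expected counts are scaled by each corpus's total, the two corpora need not be the same size; a word used at an identical rate in both yields a statistic of zero.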

The extreme complexity of user-generated textual data poses challenges for the use of deductive logic in social science research. One cannot perform laboratory experiments on the texts that result from interactions among members of large online communities, and it is difficult, and often unethical, to use manipulation to perform field experiments in online communities (see Chapter 3). Moreover, even researchers who are immersed in the relevant literatures in their field may not know precisely what they want to look for when they begin their analysis. For these reasons, many researchers who work with text mining tools advocate for abductive inferential logic, a more forensic logic that is commonly used not only in social science research but also in natural science fields such as geology and astronomy, where experiments are rarely performed.
