An Introduction to Text Mining
Table of Contents
Gabe Ignatow. An Introduction to Text Mining
An Introduction to Text Mining
Brief Contents
Detailed Contents
Acknowledgments
Preface
Note to the Reader
About the Authors
1 Text Mining and Text Analysis
Learning Objectives
Introduction
Predicting the Stock Market With Twitter
Six Approaches to Text Analysis
Conversation Analysis
Analysis of Discourse Positions
Critical Discourse Analysis
Combining Critical Discourse Analysis and Corpus Linguistics
Content Analysis
Foucauldian Analysis
Analysis of Texts as Social Information
Challenges and Limitations of Using Online Data
Social Surveys
Ethnography
Historical Research Methods
Key Terms (see Glossary)
Review Questions
Discussion Questions
2 Acquiring Data
Learning Objectives
Introduction
Online Data Sources
Advantages and Limitations of Online Digital Resources for Social Science Research
Examples of Social Science Research Using Digital Data
Key Term
Discussion Questions
3 Research Ethics
Learning Objectives
Introduction
Respect for Persons, Beneficence, and Justice
Ethical Guidelines
Institutional Review Boards
Privacy
Informed Consent
Manipulation
Publishing Ethics
Scenario 1
Scenario 2
Scenario 3
Key Terms
Review Questions
Discussion Questions
4 The Philosophy and Logic of Text Mining
Learning Objectives
Introduction
Ontological and Epistemological Positions
Correspondence Theory
Coherence Theory
Pragmatism
Constructionism
Critical Realism
Metatheory
Grand Theory and Philosophical Positions
Meso Theory
Models
Substantive Theory
Making Inferences
Inductive Logic
An Inductive Approach to Media Framing
Deductive Logic
Abductive Logic
Key Terms
Discussion Questions
5 Designing Your Research Project
Learning Objectives
Introduction
Critical Decisions
Idiographic and Nomothetic Research
Levels of Analysis
The Textual Level
The Contextual Level
The Sociological Level
Texts as Social Information
Texts as Ideological Products
Qualitative, Quantitative, and Mixed Methods Research
Discourse Analysis
Content Analysis
Mixed Methods
Choosing Data
Data Selection
Data Sampling
Formatting Your Data
Key Terms
Review Questions
Discussion Questions
6 Web Scraping and Crawling
Learning Objectives
Introduction
Web Statistics
Web Crawling
Processing Steps in Web Crawling
Traversal Strategies
Crawler Politeness
Web Scraping
Software for Web Crawling and Scraping
Key Terms
Discussion Questions
7 Lexical Resources
Learning Objectives
Introduction
WordNet
WordNet Domains
WordNet-Affect
Roget’s Thesaurus
Linguistic Inquiry and Word Count
General Inquirer
Wikipedia
Wiktionary
BabelNet
Key Terms
8 Basic Text Processing
Learning Objectives
Introduction
Basic Text Processing
Tokenization
Stop Word Removal
Stemming and Lemmatization
Language Models and Text Statistics
Language Models
Text Statistics
More Advanced Text Processing
Part-of-Speech Tagging
Collocation Identification
Syntactic Parsing
Named Entity Recognition
Word Sense Disambiguation
Word Similarity
Key Terms
Discussion Topics
9 Supervised Learning
Learning Objectives
Introduction
Feature Representation and Weighting
Feature Weighting
Supervised Learning Algorithms
Regression
Decision Trees
Instance-Based Learning
Support Vector Machines
Deep Learning With Neural Networks
Evaluation of Supervised Learning
Key Terms
Discussion Topics
10 Analyzing Narratives
Learning Objectives
Introduction
Approaches to Narrative Analysis
Planning a Narrative Analysis Research Project
Analyzing Relationship Breakups
Qualitative Narrative Analysis
Mixed Methods and Quantitative Narrative Analysis Studies
Key Terms
Review Questions
11 Analyzing Themes
Learning Objectives
Introduction
How to Analyze Themes
Coulson’s Studies of Online Support Groups
Analyzing Climate Change Doubt
Examples of Thematic Analysis
Key Terms
Review Questions
12 Analyzing Metaphors
Learning Objectives
Introduction
Cognitive Metaphor Theory
Approaches to Metaphor Analysis
Qualitative, Quantitative, and Mixed Methods
Qualitative Methods Studies
Metaphors in Leadership Communication
Mixed Methods Studies
Quantitative Methods Studies
Key Terms
Review Questions
13 Text Classification
Learning Objectives
Introduction
What Is Text Classification?
A Brief History of Text Classification
Applications of Text Classification
Topic Classification
E-Mail Spam Detection
Sentiment Analysis/Opinion Mining
Gender Classification
Deception Detection
Other Applications
Approaches to Text Classification
Representing Texts for Supervised Text Classification
Feature Weighting and Selection
Text Classification Algorithms
Naive Bayes
Rocchio Classifier
Bootstrapping in Text Classification
Evaluation of Text Classification
Key Terms
Discussion Topics
14 Opinion Mining
Learning Objectives
Introduction
What Is Opinion Mining?
Studying Mood in the Humanities
Resources for Opinion Mining
Lexicons
Corpora
Eshbaugh-Soha’s Study of Presidential News Coverage
Approaches to Opinion Mining
Hand Coding Sentiment in Media
Key Terms
15 Information Extraction
Learning Objectives
Introduction
Entity Extraction
Relation Extraction
Web Information Extraction
Template Filling
Key Terms
16 Analyzing Topics
Learning Objectives
Introduction
What Are Topic Models?
Comparing the Language of Politicians and the Public
Studying Psychological Adaptation to Extreme Environments
How to Use Topic Models
Examples of Topic Modeling
Digital Humanities
Journalism Research
Political Science
Sociology
Key Terms
Review Questions
17 Writing and Reporting Your Research
Learning Objectives
Introduction: Academic Writing
Evidence and Theory
The Structure of Social Science Research Papers
Introduction
Literature Review
Methods
Results
Discussion
Conclusion
References
Appendices
Key Terms
General Undergraduate Research Journals
Anthropology Undergraduate Research Journals
Political Science Undergraduate Research Journals
Psychology Undergraduate Research Journals
Sociology Undergraduate Research Journals
Appendix A Data Sources for Text Mining
The American Presidency Project
arXiv Bulk Data Access
Category:Dataset
CMU Movie Summary Corpus
Congressional and Federal Government Web Harvests
Congressional Record
Consumer Complaint Database
Corpus of Contemporary American English
DocumentCloud
EBSCO Newspaper Source
GloWbE: Corpus of Global Web-Based English
HathiTrust
Internet Archive
JSTOR for Research
LexisNexis Academic
Observatory on Social Media
OpenLibrary
Public.Resource.Org
PubMed
Robots Reading Vogue
Text Creation Partnership
the @unitedstates project
University of Oxford Text Archive
Yahoo Webscope Program
Appendix B Text Preparation and Cleaning Software
Find and Replace
Regexes
Software
Adobe Acrobat
BBEdit
OpenRefine
TextCleanr
TextPipe
TextSoap
Trifacta Wrangler
UltraEdit
Appendix C General Text Analysis Software
Leximancer
Linguistic Inquiry and Word Count
RapidMiner
TextAnalyst
WordStat
Using TextAnalyst to Study Collective Identity
Appendix D Qualitative Data Analysis Software
Commercial Software
ATLAS.ti
Dedoose
f4analyse
HyperRESEARCH
Kwalitan
MAXQDA
NVivo
QDA Miner
Qualrus
Quirkos
Free and Open Source Qualitative Data Analysis Software
AQUAD
Cassandre
Coding Analysis Toolkit
CATMA
Compendium
FreeQDA
libreQDA
Open Code
QDA Miner Lite
RQDA
Saturate
Text Analysis Markup System
Text Analysis Markup System Analyzer
QDAS Tips
Internet Resources
CAQDAS Networking Project
Loughborough University’s CAQDAS Site
Appendix E Opinion Mining Software
Lexicoder
OpinionFinder
RapidMiner Sentiment Analysis
SAS Sentiment Analysis Studio
Appendix F Concordance and Keyword Frequency Software
Adelaide Text Analysis Tool
AntConc
Simple Concordance Program
TextSTAT
Wmatrix
WordSmith
Appendix G Visualization Software
Word Clouds
Word Trees and Phrase Nets
Matrices and Maps
Internet Resources
The Collaboration Site of Viégas and Wattenberg
“Visualizing the Future of Interaction Studies”
The Word Tree, an Interactive Visual Concordance
Wordle
TagCrowd
Appendix H List of Websites
General Text Mining Websites
The DiRT Directory
Loughborough University’s CAQDAS Site
The National Centre for Text Mining
The QDAS Networking Project
Text Analysis Portal for Research
Social Science Ethics Websites
Ethical Decision-Making and Internet Research: Recommendations From the AoIR Ethics Working Committee
The American Psychological Association Report Psychological Research Online: Opportunities and Challenges
The British Psychological Society’s Ethics Guidelines for Internet-Mediated Research
The Davis–Madsen Ethics Scenarios From the Academy of Management Blog Post “Ethics in Research Scenarios: What Would YOU Do?”
The Ethicist Blog From the Academy of Management
The Office of Research Integrity, U.S. Department of Health and Human Services
Social Science Writing Websites
The Social Science Writing Project
“What Is a Social Science Essay?”
“Becoming a ‘Stylish’ Writer: Attractive Prose Will Not Make You Appear Any Less Smart”
Open Access Journal Articles
“Opening up to Big Data: Computer-Assisted Analysis of Textual Data in Social Sciences”
“Hypertextuality, Complexity, Creativity: Using Linguistic Software Tools to Uncover New Information about the Food and Drink of Historic Mayans”
“Text Mining Tools in the Humanities: An Analysis Framework”
“Mapping Texts: Visualizing American Newspapers”
Appendix I Statistical Tools
Reliability Coefficients
Analysis of Variance
Chi-Square Tests
Regression
Glossary
References
Index
Excerpt From the Book
Research Design, Data Collection, and Analysis
Last but by no means least, we thank our spouses and children Neva, Alex, and Sara, and Mihai, Zara, and Caius, for their patience with us and their encouragement over the many years of research, writing, and editing that went into this textbook.
.....
The philosopher and historian Foucault (1973) developed an influential conceptualization of intertextuality that differs significantly from Fairclough’s conceptualization in critical discourse analysis (CDA). Whereas Fairclough’s approach identifies the influence of external discourses within a text, for Foucault the meaning of a text emerges in relation to the discourses with which it engages in dialogue. These engagements may be explicit or, more often, implicit. In Foucauldian intertextual analysis, the analyst must ask of each text what it presupposes and with which discourses it is in dialogue. The meaning of a text therefore derives from its similarities to and differences from other texts and discourses, and from implicit presuppositions within the text that can be recognized through historically informed close reading.
Foucauldian analysis of texts is used in many theoretical and applied research fields. For instance, a number of studies have used Foucauldian intertextual analysis to examine forestry policy (see Winkel, 2012, for an overview). Researchers working in Europe (e.g., Berglund, 2001; Franklin, 2002; Van Herzele, 2006), North America, and developing countries (e.g., Asher & Ojeda, 2009; Mathews, 2005) have used this approach to study policy discourses on forest management, forest fires, and corporate responsibility.
.....