An Introduction to Text Mining - Gabe Ignatow

Advantages and Limitations of Online Digital Resources for Social Science Research


The use of online digital resources, and of social media in particular, comes with both advantages and drawbacks. Salganik (in press) provided a useful summary of the characteristics of big data in general, many of which also apply to social media. He grouped these characteristics into those that are good for research and those that are not.

Among the characteristics that make big data good for research are (a) its size, which can allow researchers to observe rare events, to draw causal inferences, and generally to apply more advanced statistical methods than are possible with small data sets; (b) its “always-on” property, which adds a time dimension to the data and makes them suitable for studying unexpected events and producing real-time measurements (e.g., capturing people’s reactions during a tornado by analyzing tweets from the affected area); and (c) its nonreactive nature: because people are usually not aware that their data are being captured (unlike survey respondents, who know they are being studied), they tend to behave more naturally.
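To make the “always-on” property more concrete, here is a minimal Python sketch that turns a stream of post timestamps into counts per fixed time window, the simplest kind of real-time measurement. It assumes the timestamps (for example, creation times of tweets from an affected area) have already been collected as `datetime` objects; the function name `counts_per_window`, the ten-minute window, and the toy timestamps are illustrative assumptions, not part of Salganik's discussion.

```python
from collections import Counter
from datetime import datetime, timedelta


def counts_per_window(timestamps, origin, window=timedelta(minutes=10)):
    """Count posts per consecutive time window starting at `origin`.

    Hypothetical helper: `timestamps` is any iterable of datetime objects
    (e.g. creation times of geotagged tweets). Returns a dict mapping each
    window's start time to the number of posts in it, in time order.
    Windows that contain no posts are omitted.
    """
    counts = Counter()
    for ts in timestamps:
        # timedelta // timedelta gives the (floored) index of the window.
        index = (ts - origin) // window
        counts[origin + index * window] += 1
    return dict(sorted(counts.items()))


# Toy, invented timestamps; not real data.
origin = datetime(2024, 5, 1, 15, 0)
toy = [
    datetime(2024, 5, 1, 15, 3),
    datetime(2024, 5, 1, 15, 7),
    datetime(2024, 5, 1, 15, 12),
]
for start, n in counts_per_window(toy, origin).items():
    print(start.strftime("%H:%M"), n)
# 15:00 2
# 15:10 1
```

Because the data keep arriving, the same function can simply be rerun on each new batch of timestamps; a real study would also have to deal with time zones and gaps in collection.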

There are also characteristics that make big data less appealing for research, such as (a) its incompleteness: digital data collections often lack demographic or other information that is important for social studies; (b) its inherent bias, since the contributors to such online resources are not a random sample of the population (people who post many tweets a day and people who choose never to tweet represent different populations, with different interests, personalities, and values, and even the largest collection of tweets will not capture the behavior of people who do not use Twitter); (c) its drift over time, both in users (who generates social media data and how) and in platforms (how the data are captured), which makes longitudinal studies difficult; and (d) its susceptibility to algorithmic confounds, that is, properties that seem to belong to the phenomenon being studied but are in fact caused by the system used to collect the data. A good example is the seemingly magic number of 20 friends that many people appear to have on Facebook, which turns out to be an effect of the platform itself: Facebook actively encourages people to make friends until they reach 20 (Salganik, in press). In addition, some types of digital data are inaccessible, for example e-mails, queries sent to search engines, and phone calls, which makes it difficult to conduct research on the behaviors associated with those data types.
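An algorithmic confound such as the Facebook example can show up as an implausible spike in a distribution. The minimal Python sketch below flags such spikes in a histogram of, say, friend counts by comparing each value with its immediate neighbors. The function `flag_spikes`, the threshold `ratio=2.0`, and the toy histogram are hypothetical illustrations, not Facebook data and not a method proposed by Salganik.

```python
def flag_spikes(histogram, ratio=2.0):
    """Return values whose count is far above that of their two neighbours.

    Hypothetical heuristic: `histogram` maps a value k (e.g. number of
    friends) to the number of users with that value. k is flagged when its
    count exceeds `ratio` times the mean count at k - 1 and k + 1. Values
    without both neighbours in the histogram are skipped.
    """
    flagged = []
    for k, count in sorted(histogram.items()):
        if k - 1 not in histogram or k + 1 not in histogram:
            continue
        baseline = (histogram[k - 1] + histogram[k + 1]) / 2
        if baseline > 0 and count > ratio * baseline:
            flagged.append(k)
    return flagged


# Invented toy histogram: a smooth decline with an artificial bump at 20.
toy = {k: 1000 - 30 * k for k in range(10, 31)}
toy[20] *= 3
print(flag_spikes(toy))  # [20]
```

A flagged value is only a prompt for further investigation: deciding whether a spike reflects user behavior or a platform design choice still requires knowing how the platform collects and shapes the data.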
