An Introduction to Text Mining - Gabe Ignatow

Key Term

 Unstructured data

Highlights

 Social science research has traditionally relied on surveys, but new computational approaches make it possible to use unstructured data sources to learn about people.

 Surveys are structured data sets that include clear, targeted information collected in controlled settings. They have the disadvantage of being expensive to run, which limits the frequency and number of surveys that can be conducted for a study.

 Unstructured data sets are very large, “always on,” naturally occurring digital resources from which information about people can be extracted or inferred. Their disadvantages are that the information obtained from them is often inexact and incomplete, and that it is subject to the biases of the groups of people who generate the data.

 Digital resources can be accessed as collections available through institutional memberships (e.g., LexisNexis), via APIs provided by various platforms (e.g., Twitter API), or through scraping and crawling, as described in Chapter 6.
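
As a rough illustration of the scraping route mentioned in the last highlight, the sketch below pulls the visible text out of an HTML page using only the Python standard library. It is not taken from the book: the class `TextExtractor`, the function `extract_text`, and the `SAMPLE_PAGE` string are hypothetical names and data invented for this example, and a real project would also need to fetch pages and respect each site's terms of use.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents.

    Hypothetical helper for illustration; not from the book.
    """

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html):
    """Return the visible text of an HTML string as one line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Made-up page standing in for a downloaded document.
SAMPLE_PAGE = """
<html><head><style>p { color: red; }</style></head>
<body><p>Surveys are expensive.</p><script>var x = 1;</script>
<p>Digital traces are &quot;always on&quot;.</p></body></html>
"""

if __name__ == "__main__":
    print(extract_text(SAMPLE_PAGE))
```

The extracted string is the kind of unstructured text that later analysis steps would work on; markup, scripts, and styling are discarded because they carry no information about the people who wrote the content.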
