Innovations in Digital Research Methods

2.4.2 Data Quality – Reliability, Validity and Generalizability

Data quality is a key issue in any form of social science research. Data quality includes the reliability, validity and completeness of the data. In the rush to use new data there is a risk that the core values of social science, including rigorous research design and hypothesis testing, are put to one side. Orthodox social science research has developed quality control mechanisms over the long term to test the reliability, validity and generalizability of its explanations but at present these mechanisms do not easily extend to many new data types.

Reliability and Validity. A key data quality issue relates to understanding the motivations of the producers of the data and how accurate the data is in relation to its use and the claims that are made from it. For example, a tweet might be generated for fun, to provide information or to persuade or mislead; the motivation obviously affects the meaning of the tweet. With survey data and even, to some extent, administrative data, the impact of respondent motivations is, at least in principle, structured by (or perhaps mediated by) the data collection instrument itself (see Chapter 4). Thus, a well-designed social science research instrument can constrain motivational impact. But this is not so with Twitter data; here people’s motivations are given full rein – a tweet might be designed to manipulate or obfuscate, to attract truth or to repel it. It might be designed to fantasize or ‘try out an opinion’, to provoke a response or simply to create controversy.

As we have outlined above, the issue of the interpretability of tweets is subject to some debate. Verification techniques can be used to check the quality of the data or to profile a person’s tweets in order to assess their veridicality, and some media are already wise to this.[72] This can involve collating and analysing individual people’s tweets over time to look for consistency and changes in attitudes.
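The idea of profiling an account over time can be made concrete with a minimal sketch. It assumes that each tweet has already been coded by the researcher with a stance label (the labels `"pro"` and `"anti"`, the account names and the function `profile_accounts` are all hypothetical and not taken from any Twitter API). For each account it reports how often the tweets match the account's most common stance and how many times the stance switches between consecutive tweets.

```python
from collections import Counter, defaultdict
from datetime import date


def profile_accounts(tweets):
    """Summarise stance consistency per account.

    `tweets` is an iterable of (author, day, stance) tuples, where
    `stance` is a label assigned beforehand by the researcher
    (an assumption of this sketch, not something Twitter provides).
    """
    by_author = defaultdict(list)
    for author, day, stance in tweets:
        by_author[author].append((day, stance))

    profiles = {}
    for author, items in by_author.items():
        # Order each account's tweets chronologically.
        items.sort(key=lambda item: item[0])
        stances = [stance for _, stance in items]
        # Share of tweets that carry the account's most common stance.
        _, modal_count = Counter(stances).most_common(1)[0]
        # Number of times the stance differs from the previous tweet.
        switches = sum(1 for a, b in zip(stances, stances[1:]) if a != b)
        profiles[author] = {
            "n_tweets": len(stances),
            "consistency": modal_count / len(stances),
            "switches": switches,
        }
    return profiles


# Invented illustrative data: one steady account, one that keeps switching.
example_tweets = [
    ("user_a", date(2013, 1, 5), "pro"),
    ("user_a", date(2013, 2, 1), "pro"),
    ("user_a", date(2013, 3, 9), "pro"),
    ("user_b", date(2013, 1, 7), "pro"),
    ("user_b", date(2013, 1, 20), "anti"),
    ("user_b", date(2013, 2, 2), "pro"),
    ("user_b", date(2013, 2, 15), "anti"),
]

profiles = profile_accounts(example_tweets)
for author, summary in profiles.items():
    print(author, summary)
```

A low consistency score or frequent switching does not by itself show that an account is unreliable; it only flags accounts whose tweets may deserve closer reading before they are used as evidence of attitudes.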

Generalizability. A common concern for social science researchers is what can be claimed on the basis of the data and, specifically, the question of generalizability. Since the development of sampling theory, it has been clear that more data is not necessarily better in terms of its explanatory power. A good illustration of this is the development of random sample opinion polls, in particular by the Gallup Organization in the USA in the early twentieth century, which led to greater accuracy in estimating public opinion. Gallup’s market share grew because a random sample survey of several thousand voters predicted election results better than the Literary Digest’s straw poll of millions of respondents, in which no particular sampling strategy was in place. In the same way, at present, Twitter data is, at best, only representative of Twitter users (including fake accounts and performative issues) rather than a wider population. As such, depending on the research question, Twitter data can be either very useful or potentially misleading.
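The point that a small random sample can outperform a very large self-selected one can be illustrated with a short simulation. This is a sketch under invented assumptions, not a reconstruction of any historical poll: the population size, the share of "readers" and the support levels in each group are all hypothetical parameters chosen only to make the bias visible.

```python
import random


def simulate_polls(population_size=200_000, reader_share=0.30,
                   support_readers=0.35, support_others=0.65,
                   random_sample_size=3_000, seed=42):
    """Compare a small random sample with a large self-selected one.

    All parameter values are hypothetical. 'Readers' stand in for any
    subgroup that differs systematically from the wider population.
    """
    rng = random.Random(seed)

    # Build a synthetic population of (is_reader, supports_candidate) pairs.
    population = []
    for _ in range(population_size):
        is_reader = rng.random() < reader_share
        p_support = support_readers if is_reader else support_others
        population.append((is_reader, rng.random() < p_support))

    true_support = sum(s for _, s in population) / population_size

    # Small simple random sample drawn from the whole population.
    sample = rng.sample(population, random_sample_size)
    random_estimate = sum(s for _, s in sample) / random_sample_size

    # Very large sample, but drawn only from readers.
    readers = [s for is_reader, s in population if is_reader]
    biased_estimate = sum(readers) / len(readers)

    return {
        "true_support": true_support,
        "random_estimate": random_estimate,
        "random_sample_size": random_sample_size,
        "biased_estimate": biased_estimate,
        "biased_sample_size": len(readers),
    }


result = simulate_polls()
for name, value in result.items():
    print(f"{name}: {value:.3f}" if isinstance(value, float) else f"{name}: {value}")
```

Under these assumptions the reader-only sample is many times larger than the random sample, yet its estimate stays close to the readers' own level of support rather than the population's. Adding more readers would not correct this, because the error comes from who is sampled, not from how many.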

A tweet in 2013 by a researcher reads: ‘Twitter is of great value to historians as you can analyse and archive public reaction to events’. The question is: which public’s reaction? Estimates suggest that over 7 million adults in the UK (14 per cent) have never used the Internet, and this is particularly evident amongst those aged 65 years and over and those on lower incomes (ONS, 2013; Ofcom, 2010). Only a proportion of Internet users are regular Twitter users (there are 15 million Twitter users in the UK; see Curtis, 2013; Wang, 2013), and hence tweets must be used with great care if bias is to be avoided. Conversely, for a study of young people’s attitudes towards drug use, Twitter postings might provide a useful resource for framing a more conventional study involving interviews or a questionnaire survey. Looking forward, it is reported that nearly half of teenagers in the UK have a smartphone and this figure continues to increase (Ofcom, 2011), suggesting that social media usage might become more prevalent in the future.
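The coverage problem can be put in rough numerical terms using only the figures quoted above. The sketch below derives the implied size of the UK adult population and an upper bound on the share that Twitter users could represent. The derived values are not reported in the cited sources, and the calculation assumes that every one of the 15 million Twitter users is a UK adult and that the figures refer to the same period, which is unlikely to hold exactly.

```python
# Figures quoted in the text (ONS, 2013; Ofcom, 2010; Curtis, 2013; Wang, 2013).
never_online_million = 7.0     # UK adults who have never used the Internet
never_online_share = 0.14      # the same group as a share of all UK adults
twitter_users_million = 15.0   # reported UK Twitter users

# Implied totals, derived here rather than taken from the sources.
adults_million = never_online_million / never_online_share
ever_online_million = adults_million - never_online_million

# Upper bounds on coverage, ASSUMING every Twitter user is a UK adult.
twitter_share_of_adults = twitter_users_million / adults_million
twitter_share_of_online = twitter_users_million / ever_online_million

print(f"Implied adult population: {adults_million:.0f} million")
print(f"Adults who have been online: {ever_online_million:.0f} million")
print(f"Twitter users as share of adults: {twitter_share_of_adults:.0%}")
print(f"Twitter users as share of online adults: {twitter_share_of_online:.0%}")
```

Even on these generous assumptions, Twitter users amount to about 30 per cent of adults, and regular users will be fewer still, which is why claims about ‘public reaction’ need to specify which public is meant.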

In this context, orthodox forms of social science data will continue to be important for research questions that rely on statistical inference, for in-depth studies requiring intensive qualitative techniques, and for topics and populations that new types of data and data generation processes do not cover (see Chapter 4). It is important to understand that, even in the age of data, there are still gaps in the evidence base and there is still a need for purpose-specific data and bespoke research design including for hard-to-reach groups. It is also notable that in a recent consultation with over 300 social science researchers in the UK, nearly three quarters thought that methods such as surveys would not be used any less in the future (Elliot et al., 2013).
