Privacy

In 1996, the Internet researchers Sudweeks and Rafaeli argued that social scientists should treat “public discourse on computer-mediated communication as just that: public” and that, therefore, “such study is more akin to the study of tombstone epitaphs, graffiti, or letters to the editor. Personal? Yes. Private? No” (p. 121). Sudweeks and Rafaeli’s position may be convenient for the practice of research, but it has not always proved sufficient for research that uses data from contemporary social media platforms. In many cases there is no consensus about whether people who have posted messages on the Internet should be considered “participants” in research or whether research that uses their messages as data should be viewed as the analysis of secondary data that already exist in the public domain.

Some researchers have argued that publicly available data carry no expectation of privacy, while many researchers who have carried out studies of online messages (e.g., Attard & Coulson, 2012; Coulson, Buchanan, & Aubeeluck, 2007) have deemed the data to be in the public domain yet have sought IRB approval from within their own institutions anyway.

A number of Internet researchers have concluded that where data can be accessed without site membership, such data can be considered to be in the public domain (Attard & Coulson, 2012; Haigh & Jones, 2005, 2007). Therefore, if data can be accessed by anyone without registering on the website, it is reasonable to consider those data part of the public domain of the Internet.

There appears to be agreement that websites that require registration, and password-protected data, should be treated as private domain (Haigh & Jones, 2005), because users who post on password-protected websites are likely to have expectations of privacy. Websites that require registration are also often copyrighted, which raises the legal question of who owns the data and whether posts and messages can be legally and ethically used for research purposes.

The Cornell–Facebook study is widely seen as having invaded the privacy of Facebook users. Some websites and social media platforms have privacy policies that set expectations for users’ privacy, and researchers can use these as guidelines for deciding whether it is ethical to treat the site’s data as being in the public domain or whether informed consent is required. In most cases, however, such guidelines are insufficient: at best they provide minimum standards that may fall short of those set by universities’ IRBs. For example, the European Union has stringent privacy laws that may have been violated by the Facebook study. Adding to the difficulties for researchers attempting to follow privacy laws, it is often unclear which jurisdiction’s data protection laws apply: that of the place where research participants reside, where the researchers reside, where the IRB is located, where the server is located, where the data are analyzed, or some combination of these.

Because acquiring users’ textual data from online sources is a passive method of information gathering that generally involves no interaction with the individual about whom data are being collected, text mining research is, for the most part, less ethically challenging than experiments and other methods that recruit participants and may involve deception. Nevertheless, universities’ IRBs increasingly require participant consent (see the next section) in cases where users can reasonably expect that their online discussions will remain private. At the very least, in almost all cases social scientists are required to anonymize users’ data, replacing user names and full names with pseudonyms.
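As a minimal illustration of this anonymization step (a sketch, not a procedure described in the book), the following Python snippet replaces user names with stable pseudonyms before analysis; the function name, the salt, and the sample posts are all hypothetical.

```python
import hashlib

def pseudonymize(username: str, salt: str = "project-specific-secret") -> str:
    """Map a user name to a stable pseudonym such as 'user_3fa2c1d0'.

    Hashing with a project-specific salt keeps the mapping consistent
    across a corpus while making it hard to recover the original name;
    the salt itself should never be published alongside the data.
    """
    digest = hashlib.sha256((salt + username).encode("utf-8")).hexdigest()
    return f"user_{digest[:8]}"

# Hypothetical example: strip identifying names from collected posts.
posts = [
    {"author": "jane_doe42", "text": "The support group really helped me."},
    {"author": "jane_doe42", "text": "Thanks, everyone, for the advice."},
]
anonymized = [{"author": pseudonymize(p["author"]), "text": p["text"]} for p in posts]
```

A hashed pseudonym is only a minimum safeguard: verbatim quotations can still identify a user through a search engine, so researchers often paraphrase quoted text or report only aggregate results.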

It has also been suggested that although publicly available online interactions exist within the public domain, site members may nonetheless view those interactions as private. Hair and Clark (2007) have warned researchers that members of online communities often have no expectation that their discussions are being observed and may not welcome being observed.

In order for text mining research using user-generated data to progress, researchers must make several determinations. First, they must use all available evidence to determine whether the data should be considered to be in the public or private domain. Second, if data are in the public domain, the researcher must determine whether users have a reasonable expectation of privacy. In order to make these determinations, researchers should note whether websites, apps, and other platforms require member registration and whether they include privacy policies that specify users’ privacy expectations.
