Social Monitoring for Public Health - Michael J. Paul

CHAPTER 1

A New Source of Big Data

We can only see a short distance ahead, but we can see plenty there that needs to be done.

Alan Turing

Protecting Health, Saving Lives—Millions at a Time

Mission of the Johns Hopkins Bloomberg School of Public Health

You’ve likely seen a public health awareness campaign. Perhaps you’ve seen an advertisement from New York Health (the Department of Health and Mental Hygiene) on the subway warning about the dangers of synthetic drugs. Maybe you’ve seen a billboard in Baltimore warning that children with influenza should stay home from school. You may have seen a social media advertisement from Los Angeles’s “Break Up With Tobacco” campaign.

These are just some of the advertisements you may come across as part of public health awareness campaigns. These programs promote breast cancer screenings, testing for HIV, and counseling for depression. Public health awareness campaigns are organized efforts to promote awareness of a health issue through the use of advertising, news and social media. There are hundreds of public health awareness campaigns organized every year, from well-known topics like “World Immunization Week,” “World AIDS Day” or “The Great American Smokeout,” to lesser-known ones like “Global Handwashing Day” or the “National Bone Health Campaign.” All share the same goal: increase awareness in the hopes of combating a public health problem. A simple question: do these campaigns work?

For the moment, let’s consider another topic: vaccines. One of the great public health victories of the last century has been the development and dissemination of a wide range of vaccines. Thanks to vaccines, we’ve saved 5 million lives a year by eradicating smallpox. We’ve essentially eliminated many other diseases in the developed world, including diphtheria, whooping cough, measles and polio. In the United States, with the introduction of the first measles vaccine in 1963, the number of measles cases went from roughly half a million a year to only a handful by the end of the 20th century [Orenstein et al., 2004].

Yet this great public health victory is slowly being eroded with an uptick in cases over the past 5 years, including 667 measles infections in 2014 [1]. The return of measles can be attributed to the growing vaccine refusal movement, which advocates against childhood vaccination, including the MMR vaccine (measles, mumps, and rubella). While many of us have heard the arguments of this movement against vaccines, why are they so effective with a small but significant fraction of parents? What reasons for skipping childhood vaccines are most convincing to different types of parents? How can physicians best address the concerns of parents?

One final topic. One of the leading causes of death in the United States is suicide. The figure is staggering: over 40,000 Americans die by suicide each year [2]. While our understanding of mental health disorders and factors that influence suicide has advanced tremendously, we remain especially poor at predicting who will follow through on a suicide attempt. We have been unable to identify unique predictors of suicide [Murphy, 1984]. Instead, we can identify a large at-risk population, a small percentage of whom will actually attempt suicide. Treating this group is generally effective for suicide prevention, but too many cases are missed since we cannot further focus our efforts. With such a large number of deaths each year, it is natural to ask: are there other unknown predictors of suicide we are missing?

These are just a few of the numerous questions for which we need better answers. Given the importance of these public health topics, issues that affect millions of lives, why don’t we have an answer? Why can’t we do the research necessary to provide actionable information?

Like all scientific pursuits, our ability to answer health questions depends on our access to relevant data. Without evidence from data, we can’t provide meaningful answers. What about “big data” research, the popular buzzword that encompasses all manner of new research efforts from physics to psychology, from linguistics to literature? Where might we find big data for public health?

A patient visits a doctor, and the interaction is documented in a clinical record. This interaction happens over a billion times in the United States each year [3]. Surely this is enough to qualify as big data! These clinical records taken together have the potential to answer many important questions in medicine. Among the many goals of the Affordable Care Act passed by the United States Congress in 2010 was to digitize these records by incentivizing physicians to switch to electronic health records (EHRs). While the primary goal of the initiative was to reduce costs, an additional goal was to create a vast digital resource for health research [Adler-Milstein et al., 2014]. In large part, this has worked: the number of physician offices using EHRs has grown from around 50% in 2010 [Hsiao et al., 2012] to nearly 87% in 2015 [4]. Millions of digital records for patients throughout the United States have created opportunities for secondary use of electronic medical records [Safran et al., 2007] that can help answer questions about adverse drug events or measure the quality of health care delivery.

Yet even if we had full access to an EHR with a billion clinical visits each year, we may not be able to answer the questions for the three topics posed above. Increased awareness of a health topic doesn’t necessitate a clinical visit, parents come to believe in the dangers of vaccines outside of doctors’ offices, and the indicators that may suggest suicide are likely not being recorded by a health professional. Where can we find big data to answer these and many other public health questions? What digital records can be analyzed to support research on these topics?

Perhaps surprisingly, we already have a large source of patient information outside of the doctor’s office: user-generated content from the Web. This type of data includes, but is not limited to, blogs and microblogs, forum discussions, online reviews of products and services, and queries issued to search engines. But how does social media tell us anything about health? How can any of these online activities be used to answer important public health questions?

That is the topic of this book: how can large quantities of (often freely and publicly accessible) social media data inform public health? Public health—the area of medicine focused on the health of a population as a whole—depends on people’s behaviors: what people do in their everyday lives. Public health topics are often more about what happens outside than inside of a doctor’s office. Social media chronicles the lives of a population, recording their beliefs, attitudes, and behaviors on a wide variety of topics. Since health is an important part of people’s lives, social media reflects these health topics. By analyzing social media we can gain new insights into public health.

Who is this Book for?

Analyzing social media for public health requires two broad areas of expertise: computer science and public health. We hope that academics, researchers, and practitioners from both areas will find value in this book. Maybe you’re a data scientist who knows machine learning or natural language processing and wants to learn how to apply it to public health, or a health informaticist who wants to learn more about harnessing social media as an alternative data source, or a public health researcher who wants to learn about how new technologies offer new research possibilities. If so, you’re the intended audience for this book.

For computer scientists, we expect that Chapter 2 will provide a summary of the core principles of public health, and Chapter 5 will survey the areas of public health most suited for work in social monitoring. For public health experts, we hope that Chapters 3 and 4 will summarize the major types of social media data and relevant analytics. All readers should benefit from Chapter 6, which describes limitations and concerns of this type of research. Of course, we encourage you to read the entire book and share in our amazement over what has been achieved so far, and what new research may yield.

We expect that you’re coming into this field with one set of training and expertise, either on the computational side or public health side, and want to start learning more about the other area. This book is aimed at people in this stage, who want to know a little bit about the other side and how it can intersect with their own background. What this book will not do is make you an expert in a new area—this field is too broad and diverse to cover everything comprehensively in one book. For instance, this book won’t teach you enough to go off and build a machine learning system if you don’t already have that expertise—but it will introduce you to the common types of tools that are available and how they are used in social monitoring, which in turn will inform you about solutions available for your problems. And while this book can’t possibly do justice to decades of public health research in so many areas, it will at least make you aware of the major areas of public health, why they are important, and how social media can help. The goal is to equip you with enough knowledge to start thinking and having conversations about how you can benefit from, or contribute to, this rapidly growing field.

Why a Book? Why Now?

The field of social monitoring for public health is quite new, with the earliest foundational papers barely ten years old. In fact, many of the data sources we discuss in this book haven’t even been around for that long. So why write a book now? While research in this area is fast paced, with new avenues of research yet unexplored, clear patterns have emerged to form a recognizable research landscape. We have some idea of what works and what doesn’t: which characteristics of public health questions are best suited to social media analysis, and which computational tools are best suited to answering those questions. Our goal is to provide a firm footing on which new researchers, as well as experienced experts, can base new research projects that build on what we’ve learned so far. We cannot possibly foresee all of the exciting new advances in this field, but we hope this book provides a basis on which these advances can start.

Another goal of this book is to promote rigor when working with social data. Methods for careful study design and validation that are common in traditional public health research have sometimes been ignored in research using social media, especially in earlier work, in part due to disciplinary differences in methodologies and a lack of community norms and expectations about how this kind of research should be done. The entire field came under scrutiny after it was noticed in a widely publicized study that Google Flu Trends, a popular digital flu monitoring system that we discuss throughout this book, had started performing inaccurately and severely misfired in a recent year [Lazer et al., 2014b]. Researchers have made a lot of progress in addressing the limitations of social data, but there are unresolved concerns about reliability, validation, and ethics with this kind of research. We raise these issues in this book, particularly in Chapter 6, and we hope our discussion of these issues will encourage more thoughtful work in this area.

The Scope of this Book

This book focuses on public health surveillance applications: tasks in which we can learn about public health topics by passively analyzing existing social media data. We term this social monitoring, a term that is inclusive of a wide range of online data sources, from new social media platforms, to more traditional web forums, and to search engine queries.

There is a growing and promising area of research that examines how social media and electronic interventions can change health behaviors and improve health outcomes. However, while related in spirit, the tools, topics, and approaches of interventions differ significantly from those of public health surveillance and social monitoring. This book focuses on the latter to ensure a more comprehensive presentation.

[1] http://www.cdc.gov/mmwr/publications/index.html

[2] http://www.cdc.gov/nchs/fastats/leading-causes-of-death.htm

[3] http://www.cdc.gov/nchs/fastats/physician-visits.htm

[4] http://www.cdc.gov/nchs/fastats/electronic-medical-records.htm
