Read the book: Do No Harm - Matthew Webster - Page 50

Big Data

Big data has been around since at least 1937, when Franklin Roosevelt's administration launched a project under the Social Security Act to keep track of 26 million Americans. IBM developed the punch-card system used to keep track of the process.20 It was not until 2005 that the term “big data” was coined by Roger Mougalas.21 Big data is exactly what you might think it is: very large sets of data. From a hyperconnected perspective, it ties together many different data sets. The more data from more sources, as long as it is accurate, the better the discoveries that can be made from analyzing it. It is generally accepted that four Vs go along with big data: volume, velocity, variety, and veracity. All are critically important to the accuracy of information and to helping advertisers target individuals more accurately.

Volume is critical to big data because the more information you have, the better your chance of having a particular piece of data. Think about it from a COVID-19 perspective. If you had data on only two people and both died as a result of COVID-19, you might come to the erroneous conclusion that COVID-19 was 100% fatal. While that example is absurd, having a large volume of data helps to weed out the statistical flukes that a small volume of data might suggest. The larger the data set, the more reliable that data tends to be.
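The two-person example can be made concrete with a confidence interval. The sketch below is an illustration only, not a method taken from the book: it uses the Wilson score interval (a standard technique chosen here as an assumption) to show how much uncertainty surrounds a rate estimated from two cases compared with a rate estimated from a hypothetical sample of 2,000. The function name and the larger sample's counts are invented for the example.

```python
import math
from typing import Tuple


def wilson_interval(events: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% Wilson score interval for the proportion events / n."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = events / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the boundaries.
    return max(0.0, center - half_width), min(1.0, center + half_width)


# The example from the text: two people, both died (observed rate of 100%).
small_sample = wilson_interval(events=2, n=2)

# Hypothetical larger sample (made-up counts, for illustration only).
large_sample = wilson_interval(events=20, n=2000)

print("n=2:    ", small_sample)
print("n=2000: ", large_sample)
```

With only two cases, the interval spans most of the range between roughly one third and one, so the observed 100% says very little. With 2,000 cases, the interval narrows to about one percentage point, which is the sense in which volume weeds out misleading results.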

Velocity, generally speaking, centers on the analysis of streaming data. The more sources of information there are (the more sensors on a person, patient or not), the better the overall picture that data brokers or hospitals have of that person. The closer to real time the data is, the more useful it can be to an organization, because near-real-time judgment calls can be made. The store-and-forward technique discussed in the previous chapter means that decision making has a lag and may be less relevant, depending on the circumstances. When people talk about the instantaneity of the world, this is what they mean.

Variety is also key from a big data perspective. Having a single type of data source is good, but having more data sources is even better. Let us use COVID-19 data as an example. If all we had was the data on young children, our view on the disease would be different. We know that it disproportionately affects the elderly in terms of severity. The larger the variety of sources, the better analysis we have overall.

Veracity pertains to the accuracy of data. If our data set were very diverse when analyzing COVID-19, but so inaccurate that it looked like everyone was affected the way the elderly are, we would probably be taking very different actions. Having accurate data really matters. If any one of the four Vs fails, we are left with less than optimal information.

Today we have data scientists who work with these large volumes of data to extract patterns and knowledge. The buzzword for what they do is “data mining.” While technically inaccurate, it is the most common and easiest way to explain to a general audience what data scientists do. In reality, data mining is an interdisciplinary field that combines statistics and computer science. While a host of other processes go into what they do, sifting through that data to build accurate data models and identify trends is crucial. Visualization of that data is the ultimate goal because they need to communicate to others what trends they are discovering.

Big data offers tremendous advantages outside of healthcare as well. Cost savings alone is a very strong motivator. Big data is used to identify better ways of doing business. Quick, actionable information is at the heart of many businesses. From a marketing perspective, it can help companies understand market conditions and the sentiment of people online. Toward this end, companies can better target marketing strategies to help boost customer acquisition and retention. All of these can be used to fuel better product innovations.22 These are just the beginning, however. Almost every industry is reaping the rewards of big data. In 2017, Forbes reported that 53% of companies were adopting big data analytics.23

Healthcare tends to be a little less mature in its data analysis techniques, but richer in its data sources, especially when considering IoMT devices.24 Now many of those data-rich healthcare companies are eager to utilize that data, not only to improve their own practices and knowledge, but also to sell it. In fact, the data sources that IoMT brings to the table have seen an explosive 878% growth since 2016.25 With 80% of healthcare executives investing in big data, big data is not going away without outside intervention. In fact, there is a hefty supply of big data, some of which has been in place for decades.

QuintilesIMS, a company dedicated to improving patient outcomes through the analysis of data, was created in the 1950s and now collects data on most prescription sales in the United States and many other countries.26 Health insurance companies are also involved in selling this data. Blue Health Intelligence, part of Blue Cross Blue Shield, has data on at least 165 million people dating back to 2005 and helps to supply QuintilesIMS. Big data also pulls data from IoMT, EHRs, providers, patient registries, private players, government health plan claims, and pharmacy claims.27

Today, anonymized health data is being bought, sold, and used by large corporations to get more information and improve their products and/or services. What is concerning about the data brokers is that they are able to add disparate pieces of information to the anonymized data collection that allow big data companies to determine an individual's identity.28 Anonymizing the data in the fashion that HIPAA requires is simply insufficient in today's world.
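How adding disparate pieces of information can undo anonymization is easiest to see in a toy example. The sketch below is a minimal illustration under stated assumptions, not a description of any real broker's process: every record, field name, and the `link_records` helper are hypothetical. It joins a nameless health data set to a named auxiliary list on shared quasi-identifiers (a three-digit ZIP prefix, birth year, and sex) and reports the records that match exactly one person.

```python
# Hypothetical "anonymized" health records: names removed, quasi-identifiers kept.
health_records = [
    {"zip3": "054", "birth_year": 1961, "sex": "F", "diagnosis": "diabetes"},
    {"zip3": "054", "birth_year": 1984, "sex": "M", "diagnosis": "asthma"},
    {"zip3": "056", "birth_year": 1961, "sex": "F", "diagnosis": "hypertension"},
]

# Hypothetical auxiliary data with names attached (for example, a marketing list).
marketing_list = [
    {"name": "Person A", "zip3": "054", "birth_year": 1961, "sex": "F"},
    {"name": "Person B", "zip3": "054", "birth_year": 1984, "sex": "M"},
    {"name": "Person C", "zip3": "056", "birth_year": 1961, "sex": "F"},
    {"name": "Person D", "zip3": "056", "birth_year": 1961, "sex": "F"},
]

QUASI_IDENTIFIERS = ("zip3", "birth_year", "sex")


def link_records(anonymized, auxiliary, keys=QUASI_IDENTIFIERS):
    """Map each anonymized record index to a name when exactly one person matches."""
    # Index the named list by its combination of quasi-identifier values.
    index = {}
    for person in auxiliary:
        index.setdefault(tuple(person[k] for k in keys), []).append(person["name"])

    matches = {}
    for i, record in enumerate(anonymized):
        candidates = index.get(tuple(record[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match re-identifies the record
            matches[i] = candidates[0]
    return matches


reidentified = link_records(health_records, marketing_list)
for i, name in reidentified.items():
    print(name, "->", health_records[i]["diagnosis"])
```

In this toy data, the first two records are unique on the three fields and are tied back to a name, while the third matches two people and stays ambiguous. Each additional data source a broker can join on shrinks those ambiguous groups.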

What is concerning is that there are no federal laws against re-identification of information.29 From a HIPAA standpoint, once data is anonymized, it is no longer HIPAA data. If the data is de-anonymized, it has the same structure as HIPAA data, but it no longer carries the HIPAA compliance requirements, even if that data has all the same elements. For many, this is a concerning loophole. Many organizations, even if they legally anonymize the data, are, in effect, giving out HIPAA data. They follow the letter of the law, but not the spirit of the law. The law was intended to keep people's data private, but with modern data mining techniques that data is no longer protected. What is worse, that data may be bought, sold, and traded without consent or even anyone's knowledge that this is going on.

Brokers now sell very detailed information about population segments, including name, address, phone number, and email address, along with such information as whether people have cancer, erectile dysfunction, bladder control problems, STDs, etc. Not all brokers provide information that is this specific, but it does allow for targeted advertising campaigns.30

It will be interesting to see how this plays out as more and more people become aware of the roles of data brokers. Many of these data brokers were unknown until Vermont created a law to govern them in 2018.31 Since then, we have started to discover many of the companies that are in the data broker market. By March of 2019, 121 companies had been identified, although obviously not all of them are interested in HIPAA data. Knowing where your data is has become much more challenging than ever before because that data could literally be anywhere on planet Earth.

If only anonymized data were the only concern. Kevin O'Reilly, a news reporter for the American Medical Association, reported on Project Nightingale, which puts patient data from the 2,600 hospitals that are part of Ascension Health into the hands of Google. Google's intent is to use artificial intelligence on the data. In fact, prior to that it spent $2.1 billion to acquire healthcare data on its users.32 Of course, data provided in this form is HIPAA data, and the requirements for HIPAA must be followed. From a privacy standpoint, that data is provided without informed consent, so people do not have control of their own data.33 Given some of the flagrant violations of data usage by Facebook and other technology giants, there are understandably some concerns related to that data.

In the previous section we talked about medical applications that are not validated by science or that contain false information. Given that volume is important, the more sources of data presented, the better. If one of those applications sends erroneous data, the data stream may be polluted. Remember that volume, velocity, and variety are extremely important from a sales standpoint. Veracity, to an extent, can be asserted simply by stating that the data is top notch. Undoubtedly, when it comes to buying data, some companies will do a better job than others of validating the data prior to purchase. Given the volume of potential data sources, this can be a daunting task for many organizations.

Another challenge is that oftentimes this means sharing data globally, which means data can literally be anywhere. Health data can physically be located in any country. Although frowned upon, there is no law requiring U.S. health data to remain in the United States. Oftentimes, depending on the platform, that is exactly what happens. Some data brokers, not all, send data throughout the planet to ensure that, in case of an emergency, it is backed up. Unless a thorough investigation is performed about the platform and someone thinks to ask that question, the hospital or doctor's office may be blissfully unaware that the data is being spread throughout the world.

In the end, big data is about sharing of data and aggregating the right data sets in the right way. That data may or may not be HIPAA data, but may have all the markers of HIPAA data. The data may be collected from applications and shared in ways that we, as consumers, may not be aware of. It also holds the promise of expanding our scientific understanding and taking us into future directions we have only begun to imagine today. Big data is not about the data itself. There are goals and objectives from many different angles that make it important. There are also tools that data scientists use to sort through the volumes of data.
