The Tao of Statistics - Dana K. Keller

3. Fodder—Data


 Observe

 Record

 More

Data are what we hear, see, smell, taste, touch, and more. Data can even be what we sense. Data can represent anything and everything that we can discriminate well enough to distinguish from something else. In short, if it can be perceived, it can be coded and used as data.

Data are the fodder of measurement, the backbone of statistics. Through a context, data become transformed into information. That context is a fusion of substantive knowledge of a topic with a methodological approach to gathering the data and the statistics used to derive meaning. A large part of the misuse of statistics is a nonreflective, uncritical crunching of numbers (i.e., data) to generate other, somewhat context-free, numbers. These uncritically examined results are then granted trusted status based on unfounded validity (discussed later in some detail). The result could be a poor decision or an ineffective policy, yet the statistics eventually are blamed. To become useful information, good data need to be placed in relevant contexts, with clear understandings of the strengths and weaknesses of the statistics and results.

This relevant context is the frame of reference from which relative meaning is derived. To know whether something is big or small, we have to ask, compared with what? A blue whale is small compared with the planet. An ant is huge compared with an atom. This same need for a frame of reference, a comparison point, is important to most types of knowledge that might be acquired through statistics. Several types of frames of reference exist in statistics, as we will see.

One brief side note on the word data: Data is a plural word. Until very recently, its only grammatically proper use was as a plural noun, like geese. Correctly, then, data are transformed through a context into information. A single piece of data is called a datum. With all that said, a recent English dictionary has recognized the common use of data as a singular noun and accepts that use as a secondary preference.

Modern databases can contain dozens of gigabytes of information—an amount that is truly staggering to consider. High-speed office computers can need hours just to run through the data once. Census data are now available across the Internet. From course catalogs to recent golf scores to real-time stock prices, data surround us as oceans surround fish. Data are everywhere and generally too common even to notice.

Here is where the tao of statistics starts to take shape. Curiosity gives birth to questions that create the need for data that come from measures that people design to create meaning. We open our eyes with questions and perceive contextually rich data as probabilistic answers. Depending on how we ask our questions, how we look for and process the data, and how we place results in a meaningful context, the tone and the texture of the results will differ. Even with the most evenhanded intentions, unconscious biases can creep into the best of research designs and processes. We will touch on this point several times in later chapters.

The high school principal has student records in an electronic form, meaning that his data collection will be inexpensive. Having electronic student records also means that the principal has access to a wide variety of data for his students. Over the years, the school system has collected demographic data on its students. The principal also has the funds for conducting a survey of his essentially captive audience. Although he is not new as a principal, the extent of the electronic data available has him a bit intimidated. When he used to have to get the data from students’ physical files in the office, his “research” questions were quite modest and constrained. Now that he can get hundreds of times the amount of information with only a few mouse clicks, he is somewhat more reflective, less impulsive, and less likely to “just run the data” than he had expected to be.

The director of public health has all of the state’s Medicaid information available to her electronically, and the amount of that information has grown greatly since the 2014 provisions of the ACA were implemented. She also is authorized to conduct a single, limited survey if it can be seamlessly appended to one that is currently required by the state. She has less information on each person than does the high school principal, but the information she has is for a much larger number of people. When she accesses the data warehouse, she always pulls highly detailed (i.e., disaggregated) data. She knows that she can always collapse (i.e., aggregate) them later, but she cannot do the reverse.
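Why collapsing only works in one direction can be seen in a small sketch. The records, field names, and the helper `aggregate_visits` below are hypothetical, invented purely for illustration; they are not drawn from any real Medicaid file.

```python
from collections import defaultdict

# Hypothetical disaggregated records: one row per county and age group.
claims = [
    {"county": "A", "age_group": "0-17", "visits": 2},
    {"county": "A", "age_group": "18-64", "visits": 3},
    {"county": "B", "age_group": "0-17", "visits": 1},
    {"county": "B", "age_group": "65+", "visits": 3},
]

def aggregate_visits(records, key):
    """Collapse the records by summing visits within each value of `key`."""
    totals = defaultdict(int)
    for record in records:
        totals[record[key]] += record["visits"]
    return dict(totals)

# From the detailed rows, either summary can be produced on demand.
by_county = aggregate_visits(claims, "county")
by_age = aggregate_visits(claims, "age_group")

print(by_county)
print(by_age)
```

Starting from the detailed rows, the director can produce the county totals or the age-group totals as needed. Had she pulled only the county totals, nothing in them would reveal how the visits were split across age groups, so the second summary could not be rebuilt.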

Having data for large numbers of people and access to computers allows her to address important public health questions that would have gone unanswered not many years ago. Just as the principal has access to far more information than he used to have, the director of public health had that increase in access several years earlier. She is used to the amount and has started to understand the data’s strengths and limitations.

Both the principal and the director of public health face the issue of data privacy for the individuals for whom they have data. Being in public health, the director is under far more scrutiny for confidentiality than the principal is, due to the ever more challenging provisions of the Health Insurance Portability and Accountability Act (HIPAA). Well-established protocols exist for the proper handling of these issues for the principal, but the director finds herself needing to update her protocols almost annually. Remember, data privacy has ethical and legal standing. Expected processes and procedures exist for research involving people and their data, and they are regularly updated. Keep the importance of this issue in mind when using or reporting human subjects’ data. Ignorance is not a valid excuse, and the penalties for knowingly, or even unknowingly, releasing personal health information or personally identifiable information can be severe.

