Becoming a Data Head, by Alex J. Gutman
Observational vs. Experimental Data
Data can be described as observational or experimental, depending on how it's collected.
Observational data is collected based on what's seen or heard by a person or computer passively observing some process.
Experimental data is collected following the scientific method using a prescribed methodology.
Most of the data in your company, and in the world, is observational. Examples of observational data include visits to a website, sales on a given date, and the number of emails you receive each day. Sometimes it's saved for a specific purpose; other times, for no purpose at all. We've also heard the phrase “found data” used to refer to this type of data; it's often created as a byproduct of things like sales transactions, credit card payments, Twitter posts, or Facebook likes. In that sense, it's sitting in a database somewhere, waiting to be discovered and used for something. Sometimes observational data is collected because it's free and easy to collect. But it can also be deliberately collected, as with customer surveys or political polls.
Experimental data, on the other hand, is not passively collected. It's collected deliberately and methodically to answer specific questions. For these reasons, experimental data represents the gold standard of data for statisticians and researchers. To collect experimental data, you must randomly assign a treatment to someone or something. Clinical drug trials are a common example of a process that generates experimental data. Patients are randomly split into two groups, a treatment group and a control group, and the treatment group is given the drug while the control group is given a placebo. The random assignment of patients should balance out characteristics that are not the focus of the study (age, socioeconomic status, weight, etc.) so that the two groups are as similar as possible in every way, except for the application of the treatment. This allows researchers to isolate and measure the effect of the treatment, without having to worry about potential confounding features that might influence the outcome of the experiment.²
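To see why random assignment tends to balance the two groups, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the patients are simulated, their ages are made up, and the helper name `randomly_assign` is hypothetical. It only shows that shuffling and splitting a large group usually yields two halves with similar average ages, even though age was never used to form the groups.

```python
import random
import statistics


def randomly_assign(patients, seed=0):
    """Shuffle a copy of the patient list and split it into two halves.

    Returns (treatment_group, control_group). Hypothetical helper for
    illustration only, not a procedure taken from the book.
    """
    rng = random.Random(seed)
    shuffled = patients[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Simulated patients with made-up ages between 18 and 90 (an assumption).
data_rng = random.Random(42)
patients = [{"id": i, "age": data_rng.randint(18, 90)} for i in range(10_000)]

treatment, control = randomly_assign(patients, seed=1)

mean_age_treatment = statistics.mean(p["age"] for p in treatment)
mean_age_control = statistics.mean(p["age"] for p in control)

print(f"Mean age, treatment group: {mean_age_treatment:.1f}")
print(f"Mean age, control group:   {mean_age_control:.1f}")
```

With thousands of patients, the two printed averages should land close to each other. With only a handful of patients, the balance can be much rougher, which is one reason sample size matters in experiments.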
This setup spans industries, from drug trials to marketing campaigns. In digital marketing, web designers frequently experiment on us by designing competing layouts or advertisements on web pages. When we shop online, a coin flip happens behind the scenes to determine which of two advertisements we are shown, call them A and B. After several thousand unknowing guinea pigs visit the site, the web designers see which ad led to more “click-throughs.” And because ads A and B were shown randomly, it's possible to determine which ad was better with respect to click-through rates because all other potential confounding features (time of day, type of web surfer, etc.) have been balanced out through randomization. You might hear experiments like this called “A/B tests” or “A/B experiments.”
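The coin flip behind an A/B test can be sketched in a few lines of Python. This is a simulation under stated assumptions, not how any particular website implements it: the function name `run_ab_test` is hypothetical, and the "true" click rates of 3% and 6% are invented so the simulation has something to measure. In a real test those rates are unknown, and estimating them is the whole point.

```python
import random


def run_ab_test(n_visitors, true_rate_a, true_rate_b, seed=0):
    """Simulate visitors who are each randomly shown ad A or ad B.

    Returns (click_through_rates, times_shown), both keyed by ad name.
    The true rates are made-up inputs used only to drive the simulation.
    """
    rng = random.Random(seed)
    shown = {"A": 0, "B": 0}
    clicks = {"A": 0, "B": 0}

    for _ in range(n_visitors):
        # The coin flip: each visitor has a 50/50 chance of seeing either ad.
        ad = "A" if rng.random() < 0.5 else "B"
        shown[ad] += 1

        # Simulate whether this visitor clicks on the ad they were shown.
        rate = true_rate_a if ad == "A" else true_rate_b
        if rng.random() < rate:
            clicks[ad] += 1

    rates = {ad: clicks[ad] / shown[ad] for ad in shown}
    return rates, shown


rates, shown = run_ab_test(20_000, true_rate_a=0.03, true_rate_b=0.06, seed=7)

for ad in ("A", "B"):
    print(f"Ad {ad}: shown {shown[ad]} times, click-through rate {rates[ad]:.3f}")
```

Because the coin flip ignores everything about the visitor, each ad should be shown to roughly half of the visitors, and the two audiences should look alike on average. A real analysis would also ask whether the gap between the two rates is larger than chance alone could plausibly produce.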
We will talk more about why this distinction matters in Chapter 4, “Argue with the Data.”