Applied Univariate, Bivariate, and Multivariate Statistics Using Python, by Daniel J. Denis
1.7 Using Abstract Systems to Describe Physical Phenomena: Understanding Numerical vs. Physical Differences
One of the key starting points to using and applying statistics to real phenomena is to understand and appreciate the difference between the tool you are using and the “stuff” you are applying it to. They are often not one-to-one. Simply because we represent a difference numerically does not imply that the difference exists on a physical level. Making this distinction is extremely important, especially in today’s age, where everything is about “data” and hence it is simply taken for granted that what we choose to measure is “real” and that our measuring tool and system can capture such differences. In some cases it can, but in others, automatically equating numerical differences with actual substantive differences is foolish.
As an example, suppose I developed a questionnaire to assess your degree of pizza preference. Suppose I scaled the questionnaire from 0 to 10, where “0” indicates a dislike for pizza and “10” indicates a strong preference. Suppose you circle “7” as your choice and your friend circles “5.” Does that mean you prefer pizza more than your friend? Not necessarily. Simply because you have selected a higher number may not mean you enjoy pizza more. It may simply mean you selected a higher number. The measured distance between 5 and 7 may not equate to an actual difference in pizza preference.
Scales of measurement (Stevens, 1946) were developed to try to highlight these and other issues, but, as we will see, they are far from adequate in solving the measurement problem. Everything we measure is based on a scale. We attempt to capture a phenomenon and assign a numerical measurement to it. A nominal scale is one in which labels are simply given to values of the variable. For example, “short” vs. “tall” when measuring height would represent a variable measurable on a nominal scale. However, we can do better. Since “tall” presumably contains more height than “short,” we can say tall > short (i.e., tall is greater than short), and the variable is then measurable on an ordinal scale. The next level of measurement is the interval scale, in which distances between values on the scale are presumed to be equal. For example, the difference in the number of coins in my pocket from 0 to 5 is the same as the difference from 5 to 10. If the scale also has an absolute zero point, meaning that a measurement of “0” actually means “zero coins,” then the scale takes on the extra property of being a ratio scale.
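The four scales can be sketched in Python. The sketch below uses hypothetical labels and coin counts; pandas’ ordered categoricals are one convenient way to encode the nominal vs. ordinal distinction.

```python
import pandas as pd

# Nominal: labels only; no ordering is implied among the categories.
eye_color = pd.Categorical(["brown", "blue", "green"], ordered=False)

# Ordinal: "tall" > "short", but the distance between them is undefined.
height = pd.Categorical(["short", "tall", "short"],
                        categories=["short", "tall"], ordered=True)

# Interval/ratio: coin counts have equal spacing between values, and
# 0 genuinely means "no coins," so the scale is ratio.
coins = pd.Series([0, 5, 10])
equal_intervals = (coins[1] - coins[0]) == (coins[2] - coins[1])

print(height.min())     # ordinal comparisons are legitimate: 'short'
print(equal_intervals)  # True: the distance 0 to 5 equals 5 to 10
```

Note that pandas will refuse operations such as `min()` on an unordered categorical, which mirrors the measurement-theoretic point: the arithmetic you may perform depends on the scale.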
A lot has been made of scales of measurement historically, and their importance is probably overstated in the literature. Where they are especially useful is in helping the researcher understand and better appreciate that simply because they obtain a number to represent something, or a difference in numbers to represent a difference in that “something,” it does not necessarily mean a precise correspondence between numbers and reality has occurred. In many social sciences especially, assuming that such a correspondence exists is very unrealistic. While it is true that a difference in weight from 100 to 150 pounds represents the same distance, both numerically and physically, as a difference from 150 to 200 pounds, for many social variables this correspondence likely does not exist or, at a minimum, is tremendously difficult to justify. What is more, associating change on an x-axis with change on a y-axis can be done quite easily numerically, but whether it means something physically is an entirely different question.
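The last point can be made concrete: a correlation is trivially computable for any two columns of numbers (the values below are hypothetical), and the arithmetic is entirely indifferent to whether the relationship means anything physically.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])              # arbitrary codes on an x-axis
y = np.array([2.0, 4.1, 5.9, 8.2, 10.0])   # arbitrary measurements

# Pearson correlation between the two columns.
r = np.corrcoef(x, y)[0, 1]
print(r > 0.99)  # True: a near-perfect correlation, numerically speaking
```

The software happily returns a near-perfect coefficient; whether the association corresponds to anything substantive is a question the computation cannot answer.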
As an example, in chemistry and nutrition, the oxidative stability of an oil is a measure of how quickly the oil starts to degrade when heated and exposed to light. Presumably, consumers would prefer, on this basis, an oil with more oxidative stability than less (frying at very high temperatures can apparently degrade the oil). In a recent study (Guillaume and Ravetti, 2018), it was found that the oxidative stability of olive oil was higher than that of, say, sunflower oil. Hence, one might be tempted to select olive oil over sunflower oil on this basis. However, does the difference in oxidative values translate into anything meaningful, or is it simply a numerical difference that for all practical purposes is somewhat academic? Olive oil may be more stable, but is that “more” really worth giving up sunflower oil if you indeed prefer sunflower? When analyzing and interpreting data, it is very easy to fall into the ranking trap, in which the mere fact that one element ranks higher than another is taken to imply a pragmatic or even meaningful difference on a physical level. The headline may be that “Olive oil is #1,” but is #10 for all practical purposes much the same, or is the difference between the oils large enough to influence one’s decision? The ranking differences may be inconsequential to the decision. For example, if I told you your primary doctor ranked 100th out of 100 in his or her graduating class, you might at first assume your doctor is not very good. However, the differences between the ranked quantities may be so slight that, on a practical level, they do not matter at all or are, at a minimum, negligible. The differences may even be due to measurement error and hence not exist beyond chance. Likewise, the pilot of your aircraft may be virtually as competent as the best pilot out there, yet still rank lower on an imperfect measure.
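The ranking trap is easy to simulate. In the sketch below, the scores are fabricated for illustration: 100 hypothetical graduates whose underlying scores are nearly identical still produce ranks 1 through 100, so rank 100th looks dramatic while the underlying difference is tiny.

```python
import numpy as np

rng = np.random.default_rng(0)
scores = 92.0 + rng.uniform(0, 0.5, size=100)  # hypothetical exam scores
ranks = (-scores).argsort().argsort() + 1      # rank 1 = highest score

best = scores[ranks == 1][0]
worst = scores[ranks == 100][0]
print(best - worst < 0.5)  # True: a rank gap of 99, but a tiny score gap
```

The ranks span the full range from 1 to 100 even though every score lies within half a point of every other, which is precisely the situation of the 100th-ranked doctor described above.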
Do not simply assume that a numerical change in what is being assessed represents a meaningful difference on a scientific (as opposed to numerical) level. Numerical differences do not necessarily equate to equivalent physical changes. Instead of being eager to pack a host of measures into your thesis, dissertation, or publication, a better idea might be to work on, and deeply validate, what is being measured in the first place. Can something like self-esteem be measured? That is not a small or inconsequential question. You can pick up an existing questionnaire that purports to measure it, or you can first critically evaluate whether it is measurable at all. Correlating it with an existing measure does not provide fundamental validity; it provides only statistical validity. The ultimate psychometric issue may still remain. For instance, how will you convince your committee that what you have measured is actually a good measure of self-esteem?