Applied Univariate, Bivariate, and Multivariate Statistics - Daniel J. Denis

1.2 WHAT IS A “MODEL”?

The word “model” is perhaps the most popular word featured in textbooks, tutorials, and lectures having anything to do with the application of quantitative methods. Defining just what a model is in statistics can be a bit challenging. We discuss the concept by referring to Everitt's definition:

A description of the assumed structure of a set of observations that can range from a fairly imprecise verbal account to, more usually, a formalized mathematical expression of the process assumed to have generated the observed data. The purpose of such a description is to aid in understanding the data.

(Everitt, 2002, p. 247)

Models are, essentially and perhaps somewhat crudely, equations. They are equations fit to data that attempt to account for how the data came about or were generated in the first place. For example, if every hour a student studied for an exam corresponded to exactly a 1‐point increase in that student's grade, the model that would best explain how these data were generated would be a linear model. Even if the relationship between hours studied and grade were not perfect, a straight line might still be the “best” summary. Models are often used to account for messy or imperfect data.
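The hours-studied example can be sketched in a few lines of code. This is only an illustrative sketch: the data values and the function name `fit_line` are hypothetical and do not come from the text. It fits a straight line by ordinary least squares, first to data that follow the 1‐point‐per‐hour rule exactly and then to a messier version of the same data.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares about the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: hours studied and exam grades (not from the book)
hours = [0, 1, 2, 3, 4, 5]
perfect_grades = [60 + h for h in hours]   # exactly 1 point per hour
noisy_grades = [61, 60, 63, 62, 65, 64]    # same trend, imperfect data

intercept_p, slope_p = fit_line(hours, perfect_grades)
intercept_n, slope_n = fit_line(hours, noisy_grades)

print(intercept_p, slope_p)  # 60.0 1.0
print(round(intercept_n, 2), round(slope_n, 2))
```

In the perfect case the fitted line reproduces every observation. In the noisy case no line passes through all the points, yet the fitted line still summarizes the upward trend.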


Figure 1.3 Hebbian Yerkes–Dodson performance–arousal curve.

Source: Diamond et al. (2007). Licensed under CC BY 3.0.

Another example of a model is the classic Hebbian version of the Yerkes–Dodson curve expressing the relationship between performance and arousal, depicted in Figure 1.3.

The curve is an inverted “U” shape (an approximate parabola) that provides a useful model relating these two attributes (i.e., performance and arousal). If one exhibits very low arousal, performance will be minimal. If one exhibits a very high degree of arousal, performance will likely also suffer. However, if one exhibits a moderate level of arousal, performance will likely be optimal. The model in this case, as in most cases, does not account for all the data one might collect. The extent to which it accounts for most of the data is the extent to which the model may be, in general, deemed “useful.” The usefulness of a model is also enhanced if it can make accurate predictions of future behavior.
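An inverted “U” of this kind can be approximated by fitting a parabola. The sketch below is illustrative only: the arousal and performance scores are invented, and `fit_quadratic` is a hypothetical helper that solves the least-squares normal equations directly rather than calling any particular statistics library.

```python
def fit_quadratic(x, y):
    """Least-squares fit of y = a + b*x + c*x**2 via the normal equations."""
    # Power sums S[k] = sum(x**k) and moment sums T[k] = sum(y * x**k)
    S = [sum(xi ** k for xi in x) for k in range(5)]
    T = [sum(yi * xi ** k for xi, yi in zip(x, y)) for k in range(3)]
    # Augmented 3x3 linear system for the coefficients (a, b, c)
    A = [[S[i + j] for j in range(3)] + [T[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(3):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [v - factor * p for v, p in zip(A[r], A[col])]
    return tuple(A[i][3] / A[i][i] for i in range(3))

# Hypothetical scores (not from the book): arousal level and performance
arousal = [1, 2, 3, 4, 5, 6, 7, 8, 9]
performance = [10, 15, 22, 23, 26, 23, 22, 15, 10]

a, b, c = fit_quadratic(arousal, performance)
peak_arousal = -b / (2 * c)  # vertex of the fitted parabola

print(c < 0)                   # True: the curve opens downward
print(round(peak_arousal, 2))  # 5.0: best performance at moderate arousal
```

The fitted parabola does not pass through every point, but it captures the overall pattern: performance rises, peaks at a moderate level of arousal, and then falls.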

As another example of a model, consider the number of O‐ring incidents on NASA's space shuttle (the fleet is officially, and sadly, retired now) as a function of temperature (Figure 1.4). At very low or high temperatures, the number of incidents appears to be elevated. A quadratic function seems to adequately model the relationship. Does it account for all points? No. Nonetheless, it provides a fairly good summary of the available data. Some have argued that had NASA had such a model (i.e., essentially the curve joining the points) available before Challenger was launched on January 28, 1986, the launch might have been delayed and the shuttle and crew saved from disaster. We feature these data in our chapter on logistic regression.


Figure 1.4 Number of O‐ring incidents on boosters as a function of temperature.

Why did George Box say that all models are wrong but some are useful? The reason is that even if we obtain a perfectly fitting model, there is nothing to say that this is the only model that will account for the observed data. Some, such as Fox (1997), even encourage divorcing statistical modeling from the idea of accounting for deterministic processes. In discussing the determinants of one's income, for instance, Fox remarks:

I believe that a statistical model cannot, and is not literally meant to, capture the social process by which incomes are “determined” … No regression model, not even one including a residual, can reproduce this process … The unfortunate tendency to reify statistical models – to forget that they are descriptive summaries, not literal accounts of social processes – can only serve to discredit quantitative data analysis in the social sciences. (p. 5)
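The point that a perfect fit does not single out one model can be shown with a toy case. The two functions and the three observations below are invented purely for illustration: both functions reproduce every observed point exactly, yet they disagree as soon as we move outside the observed range.

```python
def model_a(x):
    """A straight line: y = x."""
    return x

def model_b(x):
    """A cubic that agrees with model_a at x = 0, 1, and 2."""
    return x + x * (x - 1) * (x - 2)

# Hypothetical observations (x, y); not taken from any real data set
observed = [(0, 0), (1, 1), (2, 2)]

fits_a = all(model_a(x) == y for x, y in observed)
fits_b = all(model_b(x) == y for x, y in observed)

print(fits_a, fits_b)          # True True: both fit the data perfectly
print(model_a(3), model_b(3))  # 3 9: they diverge at an unobserved point
```

The observed data alone cannot tell us which of the two functions generated them. That choice has to come from theory, from further data, or from both.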

Indeed, psychological theory, for instance, has advanced numerous models of behavior just as biological theory has advanced numerous theories of human functioning. Two or more competing models may each explain observed data quite well. Sometimes, and unfortunately, the model we adopt may have more to do with our sociological (and even political) preferences than with whether one is more “correct” than the other. Science (and mathematics, for that matter) is a human activity, and often theories that are deemed valid or true have much to do with the spirit of the times (the so‐called Zeitgeist) and what the scientific community will actually accept and tolerate as being true. Of course, this is not the case in all circumstances, but you should be aware of the factors that make theories popular, especially in fields such as social science where “hard evidence” can be difficult to come by. The reason the experiment is often considered the “gold standard” for evidence is that it often (but not always) helps us narrow down narratives to a few compelling possibilities. In strictly correlational research, isolating the correct narrative can be exceedingly difficult or nearly impossible, regardless of which narrative we wish upon our data the most. Good science requires a very critical eye. Whether the theory is that of the Big Bang, the determinants of cancer, or bystander intervention, all of these are narratives to help account for observed data.
