1.6 Continuous vs. Discrete Variables
A variable in mathematics is usually represented by a symbol such as x or y. It is usually indexed by a subscript such as “i” to indicate that it represents the complete set of possible values the symbol can take on. For example, for a variable xi, the “i” implies that the variable in question can take on any of the values i = 1, 2, … in the given set of possibilities. For instance, for a sample of 10 individuals in a room, weight is a variable. It is a variable because not everyone in the room has the same weight. Hence, xi in this case implies that there are i = 1 to i = 10 values for weight, not all necessarily distinct from one another. Now, if everyone in the room had the same weight, such that weight was a constant instead of a variable, then the “sub-i” index would not be required, at least not for describing this particular set of individuals.
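To make the indexing concrete in Python, a variable such as weight can be stored as an array whose positions play the role of the subscript i. The following is a minimal sketch; the weights shown are hypothetical and chosen only for illustration.

import numpy as np

# Hypothetical weights (in kg) for a sample of 10 individuals. The array index
# plays the role of the subscript i in xi (Python counts from i = 0 to i = 9).
weight = np.array([61.5, 72.0, 68.3, 90.1, 55.4, 77.8, 64.2, 83.0, 70.6, 59.9])
print(weight[0], weight[9])        # first and last values of the variable
print(np.unique(weight).size)      # many distinct values, so weight varies

# If everyone had the same weight, weight would be a constant rather than a
# variable, and no index would be needed to describe this set of individuals.
constant_weight = np.full(10, 70.0)
print(np.unique(constant_weight).size)   # exactly one distinct value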
There are two types of variables in mathematics that are requisite knowledge for understanding applied statistics, especially when it comes to building statistical models and conducting moderate to advanced techniques. A variable is generally considered to be either discrete or continuous. A discrete variable, crudely defined, can take on only certain values, such that no values are possible in between them. For example, a variable taking on the values 0, 1, and 2 is discrete if there is no possibility of values between 0 and 1 or between 1 and 2. If, on the other hand, values between 0 and 1 or between 1 and 2 are theoretically possible, then the variable is no longer discrete. Rather, it is continuous. For a continuous variable, then, any values are possible, even if only theoretically, as captured in the more formal definition of continuity discussed below.
Graphically, continuity is said to exist at a given point f(x0) on the y-axis if, for small changes on the x-axis either above or below x0 (i.e. x0 + δ or x0 − δ), we have a correspondingly small change on the y-axis (i.e. f(x0) + ε or f(x0) − ε). These changes can be made extremely small, in fact as small as we wish on a theoretical level. That is, the changes delta (δ) and epsilon (ε) can be made infinitesimally small right up to the point f(x0). Informally, continuity implies a sense of narrowing in infinitely on a given point, such that we can make smaller and smaller, in fact infinitesimally smaller, divisions.

Though we have only skimmed the formal definition of continuity here, this general idea is the most rigorous definition that currently exists for what continuity is mathematically. The point is to emphasize that true continuity is something that exists in theory only, and is, or at least can be, rigorously defined. For further details on this precise definition of continuity, see Bartle and Sherbert (2011), who discuss the foundations of calculus in much more detail; these foundations come generally under the branch of mathematics called real analysis. The essence of continuity does not lie in its mathematical definition, however; the idea has likely existed since early thought. Mathematics provided a precise definition for it so that it could be used by, and communicated among, mathematicians and scientists. Continuity, however, is first and foremost an idea. If you understand what we are getting at here, you will start to see mathematics in many cases as a surface layer over deeper concepts, rather than simply the “mathematics” you may have encountered earlier in your studies.
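For reference, the informal description above corresponds to the standard ε–δ definition (see Bartle and Sherbert, 2011), which can be stated as follows:

A function f is continuous at the point x0 if, for every ε > 0, there exists a δ > 0 such that whenever |x − x0| < δ, it follows that |f(x) − f(x0)| < ε.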
Now, in practice, one cannot measure to an infinite number of decimal places. One can never keep narrowing in on a given value by an infinite number of refined slices. Hence, while researchers may sometimes like to believe their variables have an underlying continuity to them, measured variables are always far from being truly continuous. Only philosophically speaking (read: theoretical mathematics and its underlying philosophy) can a variable be continuous. So why is this brief discussion of continuity vs. discreteness important to the scientist? It is important because the starting point of any statistical analysis is determining whether one’s variables are best considered discrete or continuous. Though there is much more flexibility in statistical models than the following may suggest, it is nonetheless useful to give a broad overview of where traditional statistical models use continuous vs. discrete variables. When using z-tests and t-tests for means, as well as most ANOVA-type models, it is understood that the dependent or response variable is continuous in nature, or at minimum has a sufficiently fine-grained distribution that we may treat it as approximately continuous. The independent variables in these models are discrete or categorical, indicating the different populations being compared on the response.
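In Python, a quick first pass at this decision can be made by inspecting how each variable is stored and how many distinct values it takes. The following is a minimal sketch on a small simulated data frame; the column names and values are hypothetical.

import numpy as np
import pandas as pd

# Hypothetical data: 'group' is a discrete (categorical) independent variable,
# 'score' is a response we are willing to treat as approximately continuous.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "group": rng.choice(["treatment", "control"], size=20),
    "score": rng.normal(loc=50, scale=10, size=20),
})

print(df.dtypes)               # object vs. float storage is a first hint
print(df["group"].nunique())   # few distinct values -> best treated as discrete
print(df["score"].nunique())   # many distinct values -> treat as continuous

# Declaring the grouping variable as categorical makes its discrete status
# explicit to downstream modeling functions.
df["group"] = df["group"].astype("category")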
As an example, suppose we wanted to measure the VO2-max of participants treated with a new COVID-19 medication vs. those not treated. VO2-max is essentially a measure of maximal oxygen uptake during intense exercise (e.g. a Tour de France cyclist has a higher VO2-max than you or I do). The VO2-max variable is the response, which is considered continuous, as a function of the independent variable, treatment vs. control. For this, we are in the realm of z-tests or t-tests for means, or we could also perform an ANOVA on these variables. A regression analysis is also an option, since we can operationalize the independent variable as a binary dummy-coded predictor. When we flip things around, such that the grouping variable is now the response and VO2-max is the predictor, we are in the realm of discriminant analysis or logistic regression on two groups. Here, we would like to predict group membership based on the continuous predictor. Notice that these models answer different research questions, yet at their core they must share great technical similarity. As we will see as we progress, indeed they do. A t-test, for example, can be considered, at least on a conceptual level, to house a very primitive discriminant function! When doing a t-test, we don’t “see” the idea of a discriminant function simply because it is not a question we are asking. Nonetheless, it is there in concept, underlying the technique. Once you understand the commonality of what underlies virtually all of these models, they will quickly lose their mystery. You will be less inclined to survey a decision tree of statistical methods and see a collection of different procedures. What you will see instead is one larger model with special cases and peculiarities in each method.
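The sketch below runs through this example with simulated data, assuming scipy and statsmodels are available; the group means, sample sizes, and column names are hypothetical. The same two variables appear first with the continuous response and the discrete predictor (t-test and dummy-coded regression), and then flipped, with group membership predicted from VO2-max by logistic regression.

import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.formula.api as smf

# Simulated, entirely hypothetical VO2-max values for treated vs. control participants.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "group": np.repeat(["treatment", "control"], 15),
    "vo2max": np.concatenate([rng.normal(48, 5, 15), rng.normal(44, 5, 15)]),
})

# Continuous response, discrete independent variable: t-test for a difference in means.
res = stats.ttest_ind(df.loc[df["group"] == "treatment", "vo2max"],
                      df.loc[df["group"] == "control", "vo2max"])
print(res.statistic, res.pvalue)

# The same comparison cast as a regression with a dummy-coded binary predictor.
ols_fit = smf.ols("vo2max ~ C(group)", data=df).fit()
print(ols_fit.params)

# Flipping things around: predict group membership from the continuous predictor
# via logistic regression (discriminant analysis would be another option here).
df["treated"] = (df["group"] == "treatment").astype(int)
logit_fit = smf.logit("treated ~ vo2max", data=df).fit()
print(logit_fit.params)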
Most statistical models, even if used for different research purposes and to answer different research questions, are technically quite similar at their core. One of the goals of learning and understanding statistical modeling is to grasp this similarity as quickly as possible, so that you realize that differences in approach often have more to do with differences in research questions than with differences in underlying technical details.