Applied Univariate, Bivariate, and Multivariate Statistics - Daniel J. Denis

2.1 DENSITIES AND DISTRIBUTIONS


When we speak of density as it relates to distributions in statistics, we are referring generally to theoretical distributions having area under their curves. There are numerous probability distributions or density functions. Empirical distributions, on the other hand, rarely go by the name of densities. They are in contrast “real” distributions of real empirical data. In some contexts, the identifier normal distribution may be given without reference as to whether one is referring to a density or to an empirical distribution. It is usually evident by the context of the situation which we are referring to. We survey only a few of the more popular densities and distributions in our discussion that follows.

The univariate normal density is given by:

$$ f(x_i) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x_i - \mu)^2 / 2\sigma^2} $$
where,

 μ is the population mean for the given density,

 σ² is the population variance,

 π is a constant equal to approximately 3.14,

 e is a constant equal to approximately 2.71,

 x_i is a given value of the independent variable, assumed to be a real number.

When μ is 0 and σ² is 1, which implies that the standard deviation σ is also equal to 1 (i.e., σ = √1 = 1), the normal distribution is given a special name. It is called the standard normal distribution and can be written more compactly as:

$$ f(x_i) = \frac{1}{\sqrt{2\pi}}\, e^{-x_i^2 / 2} \tag{2.1} $$

Notice that in (2.1) the expression simplifies because μ is now 0 and σ² is now 1. Note as well that the density depends only on the absolute value of x_i, because both x_i and −x_i give the same value of x_i²; the greater x_i is in absolute value, the smaller the density at that point, because the constant e is raised to the negative power −x_i²/2.
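Both properties are easy to check numerically. The chapter's own examples use R; the following is an equivalent sketch in Python, using only the standard library (the function name is chosen for this illustration):

```python
import math

def std_normal_density(x):
    """Standard normal density: f(x) = exp(-x**2 / 2) / sqrt(2 * pi)."""
    return math.exp(-x ** 2 / 2) / math.sqrt(2 * math.pi)

# The density depends only on |x|: f(1.5) and f(-1.5) are identical
print(std_normal_density(1.5) == std_normal_density(-1.5))  # True

# The density shrinks as |x| grows
print(std_normal_density(0) > std_normal_density(1) > std_normal_density(2))  # True
```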

The standard normal distribution is the classic z‐distribution whose areas under the curve are given in the appendices of most statistics texts, and are more conveniently computed by software. An example of the standard normal is featured in Figure 2.1.

Scores in research often come in their own units, with distributions having means and variances different from 0 and 1. We can transform a score coming from a given distribution with mean μ and standard deviation σ by the familiar z‐score:

$$ z = \frac{x_i - \mu}{\sigma} $$
A z‐score is expressed in units of the standard normal distribution. For example, a z‐score of +1 denotes that the given raw score lies one standard deviation above the mean. A z‐score of −1 means that the given raw score lies one standard deviation below the mean. In some settings (such as school psychology), T‐scores are also useful, having a mean of 50 and a standard deviation of 10. In most contexts, however, z‐scores dominate.
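The transformation itself is a one-line computation. As a minimal Python sketch (the function name and the IQ-style numbers below are illustrative, not from the text):

```python
def z_score(x, mu, sigma):
    """Standardize a raw score x from a distribution with mean mu and standard deviation sigma."""
    return (x - mu) / sigma

# A raw score one standard deviation above a mean of 100 (sd = 15) gives z = +1
print(z_score(115, 100, 15))  # 1.0

# One standard deviation below the mean gives z = -1
print(z_score(85, 100, 15))   # -1.0
```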


Figure 2.1 Standard normal distribution with shaded area from −1 to +1 standard deviations from the mean.

A classic example of the utility of z‐scores typically goes like this. Suppose two sections of a statistics course are being taught. John is a student in section A and Mary is a student in section B. On the final exam for the course, John receives a raw score of 80 out of 100 (i.e., 80%). Mary, on the other hand, earns a score of 70 out of 100 (i.e., 70%). At first glance, it may appear that John was more successful on his final exam. However, raw scores, considered absolutely, do not allow us to compare each student's performance relative to their class distribution. For instance, if the mean in John's class was equal to 85% with a standard deviation of 2, then John's z‐score is:

$$ z = \frac{80 - 85}{2} = -2.5 $$
Suppose that in Mary's class, the mean was equal to 65%, also with a standard deviation of 2. Mary's z‐score is thus:

$$ z = \frac{70 - 65}{2} = 2.5 $$
As we can see, relative to their particular distributions, Mary greatly outperformed John. Assuming each distribution is approximately normal, the density under the curve for a normal distribution with mean 0 and standard deviation of 1 at a score of 2.5 is:

> dnorm(2.5, 0, 1)
[1] 0.017528

where dnorm gives the density, that is, the value of f(x), at the score of 2.5. What then is the probability of scoring 2.5 or greater? To get the cumulative probability up to 2.5, we compute:

> pnorm(2.5, 0, 1)
[1] 0.9937903

The given area is represented in Figure 2.2. The area we are interested in is that at or above 2.5 (the area where the arrow is pointing). Since we know the area under the normal density is equal to 1, we can subtract pnorm(2.5, 0, 1) from 1:

> 1-pnorm(2.5, 0, 1)
[1] 0.006209665


Figure 2.2 Shaded area under the standard normal distribution at a z‐score of up to 2.5 standard deviations.

We can see then that the percentage of students scoring higher than Mary is approximately 0.6% (i.e., multiply the proportion by 100). What proportion of students scored better than John in his class? Recall that his z‐score was equal to −2.5. Because we know the normal distribution is symmetric, the area lying below −2.5 is the same as that lying above 2.5. This means that approximately 99.38% of students scored higher than John. Hence, we see that Mary drastically outperformed her colleague when we consider their scores relative to their classes. Be careful to note that in drawing these conclusions, we had to assume each score (that of John and that of Mary) came from a normal distribution. The mere fact that we transformed their raw scores to z‐scores in no way normalizes their raw distributions. Standardization standardizes, but it does not normalize.

One can also easily verify that approximately 68% of cases in a normal distribution lie within −1 and +1 standard deviations, while approximately 95% of cases lie within −2 and +2 standard deviations.
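The verification is straightforward in any environment with a normal CDF. As a self-contained Python sketch (the `pnorm` name mirrors the R function used above; the erf-based formula is a standard identity for the normal CDF):

```python
import math

def pnorm(z):
    """Standard normal cumulative probability, computed via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Area within one standard deviation of the mean (approximately 68%)
print(round(pnorm(1) - pnorm(-1), 4))  # 0.6827

# Area within two standard deviations (approximately 95%)
print(round(pnorm(2) - pnorm(-2), 4))  # 0.9545

# Symmetry: the area below -2.5 equals the area above +2.5
print(math.isclose(pnorm(-2.5), 1 - pnorm(2.5)))  # True
```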

