Applied Univariate, Bivariate, and Multivariate Statistics - Daniel J. Denis
2.1.1 Plotting Normal Distributions
We can plot a normal density in R by generating a sequence of values between lower and upper limits on the abscissa and evaluating the density at each one:
> x <- seq(from = -3, to = +3, length.out = 100)
> plot(x, dnorm(x))
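The plot above shows the standard normal density, because dnorm() defaults to mean = 0 and sd = 1. As a sketch of what dnorm() computes, the code below writes out the normal density formula f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi)) by hand, checks it against dnorm(), and redraws the curve as a line (type = "l") rather than as points. The helper name normal_pdf and the dashed comparison curve with sd = 1.5 are illustrative choices, not part of the original example.

```r
# Grid of 100 evenly spaced points on the abscissa, from -3 to +3
x <- seq(from = -3, to = 3, length.out = 100)

# Hypothetical helper: the normal density written out from its formula
normal_pdf <- function(x, mu = 0, sigma = 1) {
  exp(-(x - mu)^2 / (2 * sigma^2)) / (sigma * sqrt(2 * pi))
}

# dnorm() defaults to the standard normal (mean = 0, sd = 1)
y <- dnorm(x)

# The hand-written formula and dnorm() agree up to floating-point tolerance
stopifnot(isTRUE(all.equal(y, normal_pdf(x))))

# Draw the density as a smooth line instead of individual points
plot(x, y, type = "l", xlab = "x", ylab = "Density",
     main = "Standard Normal Density")

# Illustrative overlay: a wider normal curve (sd = 1.5), drawn dashed
lines(x, dnorm(x, mean = 0, sd = 1.5), lty = 2)
```

Changing the mean and sd arguments of dnorm() shifts and stretches the curve; the grid in x only controls where, and how finely, the density is evaluated.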
Distributions (and densities) of a single variable typically go by the name of univariate distributions to distinguish them from distributions of two (bivariate) or more variables (multivariate).
For example, we consider some of Galton's data on parent and child heights (the heights of the children were measured when they were adults, not as toddlers). Some of Galton's data appears below, retrieved from the HistData package (Friendly, 2014) in R:
> install.packages("HistData")
> library(HistData)
> attach(Galton)
> Galton
   parent child
1    70.5  61.7
2    68.5  61.7
3    65.5  61.7
4    64.5  61.7
5    64.0  61.7
6    67.5  62.2
7    67.5  62.2
8    67.5  62.2
9    66.5  62.2
10   66.5  62.2
We first install the package using the install.packages function. The library statement loads the package HistData and places it on R's search path. From there, we attach the Galton data frame, which adds its variables (parent and child) to the search list so that they can be referred to directly by name. We then generate a histogram of parent height:
> hist(parent, main = "Histogram of Parent Height")
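attach() is convenient, but a variable such as parent then stays visible until detach() is called, and it can mask (or be masked by) other objects with the same name. A common alternative is to refer to columns explicitly with $ or to scope them temporarily with with(). The sketch below rebuilds the ten rows printed above as a small data frame (galton_head is a hypothetical name) so that it runs without HistData, and shows that the three approaches give the same result.

```r
# The ten rows of Galton's data printed above, rebuilt as a small data frame
galton_head <- data.frame(
  parent = c(70.5, 68.5, 65.5, 64.5, 64.0, 67.5, 67.5, 67.5, 66.5, 66.5),
  child  = c(61.7, 61.7, 61.7, 61.7, 61.7, 62.2, 62.2, 62.2, 62.2, 62.2)
)

# 1. Explicit column access with $
m_dollar <- mean(galton_head$parent)

# 2. with() evaluates one expression with the columns in scope
m_with <- with(galton_head, mean(parent))

# 3. attach() puts the columns on the search path until detach() is called
attach(galton_head)
m_attach <- mean(parent)
detach(galton_head)

# All three are 66.85
c(dollar = m_dollar, with = m_with, attach = m_attach)

# with() also works for the histogram, without modifying the search path
with(galton_head, hist(parent, main = "Histogram of Parent Height (first 10 rows)"))
```

The same calls work on the full data set, for example with(Galton, hist(parent)), once HistData is loaded.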
One can overlay a normal density on an empirical plot to show how closely observed data match a theoretical normal distribution, as Fisher did in 1925 when displaying the distribution of the heights of 1375 women (see Figure 2.3, taken from Classics in the History of Psychology¹). R.A. Fisher is usually regarded as the father of modern statistics, and among his greatest contributions was the publication of Statistical Methods for Research Workers in 1925, in which he discussed such topics as tests of significance, correlation coefficients, and the analysis of variance.
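A minimal sketch of such an overlay in R, assuming the normal curve's mean and standard deviation are simply estimated from the data being plotted. The helper overlay_normal is hypothetical. The histogram is drawn on the density scale (freq = FALSE) so that bar areas sum to 1 and are comparable with the curve, and the y-axis is sized so the curve's peak is not clipped.

```r
# Hypothetical helper: density-scaled histogram with a fitted normal curve
overlay_normal <- function(v, main = "Histogram with Normal Density",
                           xlab = "Value") {
  v <- v[!is.na(v)]   # drop missing values before estimating
  m <- mean(v)        # estimated mean of the observed data
  s <- sd(v)          # estimated standard deviation of the observed data

  # Compute the histogram without drawing it, to size the y-axis first
  h <- hist(v, plot = FALSE)
  grid <- seq(min(h$breaks), max(h$breaks), length.out = 200)
  dens <- dnorm(grid, mean = m, sd = s)

  # freq = FALSE plots densities rather than counts
  plot(h, freq = FALSE, main = main, xlab = xlab,
       ylim = c(0, max(h$density, dens)))
  lines(grid, dens, lwd = 2)

  invisible(list(mean = m, sd = s, hist = h))
}
```

With HistData loaded and Galton attached as above, a call such as overlay_normal(parent, main = "Parent Height with Normal Density", xlab = "Parent height") produces this kind of display for Galton's parent heights.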
We can see that the normal density serves as a close, and very convenient, approximation to the empirical data. Indeed, the normal density has figured prominently in the history of statistics, largely because it serves as a useful model for many phenomena and also because it provides a very convenient starting point for much work in theoretical statistics. Often the assumption of normality is invoked in a derivation because it makes the problem simpler and easier to solve.
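One rough way to put a number on "close" is to compare the share of observations lying within 1, 2, and 3 standard deviations of the mean with the shares a normal distribution predicts (about 0.683, 0.954, and 0.997). The helper coverage_vs_normal below is a hypothetical sketch of that comparison; it is a descriptive check, not a formal test of normality.

```r
# Hypothetical helper: observed vs. normal-theory share of data within k SDs
coverage_vs_normal <- function(v, k = 1:3) {
  v <- v[!is.na(v)]
  z <- (v - mean(v)) / sd(v)   # standardize using the sample mean and SD
  observed <- sapply(k, function(kk) mean(abs(z) <= kk))
  expected <- pnorm(k) - pnorm(-k)   # P(|Z| <= k) for a standard normal Z
  data.frame(k = k, observed = observed, expected = expected)
}
```

Applied to Galton's parent heights (for example coverage_vs_normal(parent) with Galton attached), observed proportions near the expected column suggest the normal model is a reasonable approximation; large gaps point to skewness or heavy tails.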