Probability with R - Jane M. Horgan

3.2 HISTOGRAMS

A histogram is a graphical display of the frequencies of a variable within categories and is the traditional way of examining the “shape” of the data. For example,

hist(prog1, xlab = "Marks (%)", main = "Programming Semester 1")

yields Fig. 3.8.


Figure 3.8 A Histogram with Default Breaks
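
The bins and counts behind a histogram can also be inspected directly. The following is a minimal sketch, assuming a simulated vector named marks (hypothetical, standing in for the book's prog1 data, which is not reproduced here). It uses the plot = FALSE argument of hist to return the histogram object without drawing it.

```r
# Simulated marks for illustration only; not the book's prog1 data.
set.seed(1)
marks <- round(runif(100, min = 0, max = 100))

# plot = FALSE returns the histogram object without drawing it.
h <- hist(marks, plot = FALSE)

h$breaks   # bin boundaries chosen by R
h$counts   # number of observations in each bin
h$mids     # midpoint of each bin

# Cross-check: cut() with the same breaks and the same interval
# conventions as hist()'s defaults (right-closed, lowest value included)
# should give the same counts.
tab <- table(cut(marks, breaks = h$breaks, right = TRUE, include.lowest = TRUE))
all(as.vector(tab) == h$counts)
```

The counts always add up to the number of observations, and there is always one more break-point than there are bins.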

As Fig. 3.8 shows, hist gives the count of the observations that fall within each category, or “bin” as it is sometimes called. R chooses a “suitable” number of categories unless otherwise specified. Alternatively, breaks may be used as an argument in hist to request a number of categories. For example, to ask for five categories of equal width, include breaks = 5 as an argument; R treats a single number as a suggestion and may adjust it to obtain round break‐points.

hist(prog1, xlab = "Marks (%)", main = "Programming Semester 1", breaks = 5)

gives Fig. 3.9.


Figure 3.9 A Histogram with Five Breaks of Equal Width
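
When breaks is a single number, R uses it only as a suggestion, so the number of bins drawn can differ from the number requested. The sketch below, again on a simulated vector marks (hypothetical, not the book's data), compares a suggested number of bins with an explicit vector of break-points, which R uses exactly as given.

```r
# Simulated marks for illustration only; not the book's prog1 data.
set.seed(1)
marks <- round(runif(100, min = 0, max = 100))

# A single number is a suggestion: R chooses "pretty" break-points,
# so the number of bins may differ from the number asked for.
h_suggested <- hist(marks, breaks = 5, plot = FALSE)
length(h_suggested$counts)   # number of bins R actually used

# A vector of break-points is used exactly as given.
# Five equal-width bins over the 0 to 100 mark range:
h_exact <- hist(marks, breaks = seq(0, 100, length.out = 6), plot = FALSE)
h_exact$breaks
length(h_exact$counts)
```

Supplying the break-points explicitly is the safer choice when a specific number of bins is required, provided the vector covers the whole range of the data.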

Recall that par can be used to represent all the subjects in one diagram. Type

par(mfrow = c(2, 2))
hist(arch1, xlab = "Architecture", main = "Semester 1", ylim = c(0, 35))
hist(arch2, xlab = "Architecture", main = "Semester 2", ylim = c(0, 35))
hist(prog1, xlab = "Programming", main = " ", ylim = c(0, 35))
hist(prog2, xlab = "Programming", main = " ", ylim = c(0, 35))

to get Fig. 3.10. The argument ylim = c(0, 35) ensures that the y‐axis has the same scale for all four subjects.


Figure 3.10 Histogram of Each Subject in Each Semester
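
A common upper limit such as 35 can be computed rather than found by trial and error. The sketch below is one possible approach, not the book's: it uses two simulated vectors, sem1 and sem2 (hypothetical names and data), computes each histogram without plotting, and takes the tallest bar as the shared y‐axis limit.

```r
# Simulated marks for illustration only; sem1 and sem2 are hypothetical.
set.seed(1)
sem1 <- round(rnorm(100, mean = 55, sd = 15))
sem2 <- round(rnorm(100, mean = 65, sd = 10))

# Compute the histograms without plotting to find the tallest bar.
h1 <- hist(sem1, plot = FALSE)
h2 <- hist(sem2, plot = FALSE)
top <- max(h1$counts, h2$counts)

# Draw both panels with the same y-axis limits.
old_par <- par(mfrow = c(1, 2))
plot(h1, xlab = "Marks (%)", main = "Semester 1", ylim = c(0, top))
plot(h2, xlab = "Marks (%)", main = "Semester 2", ylim = c(0, top))
par(old_par)   # restore the previous plotting layout
```

Saving the value returned by par and passing it back afterwards restores the single-panel layout for later plots.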

Up to now, we have used the default settings of hist: the bins have equal width, and the frequency in each bin is plotted. These settings may be changed as appropriate. For example, you may want to specify the bin break‐points to represent the failures and the various classes of passes and honors.

bins <- c(0, 40, 60, 80, 100)
hist(prog1, xlab = "Marks (%)", main = "Programming Semester 1", breaks = bins)

yields Fig. 3.11.


Figure 3.11 A Histogram with Breaks of a Specified Width

In Fig. 3.11, observe that the y‐axis now represents the density. When the bins are not of equal width, R returns a normalized histogram: the height of each bar is the relative frequency of the bin divided by its width, so that the total area of the bars is equal to one.
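
The normalization can be checked numerically. The sketch below uses a simulated vector marks (hypothetical, not the book's prog1 data) with the same break-points as above, and confirms that each density is the relative frequency divided by the bin width and that the bar areas sum to one.

```r
# Simulated marks for illustration only; not the book's prog1 data.
set.seed(1)
marks <- round(runif(100, min = 0, max = 100))

bins <- c(0, 40, 60, 80, 100)
h <- hist(marks, breaks = bins, plot = FALSE)

widths   <- diff(h$breaks)             # bin widths: 40, 20, 20, 20
rel_freq <- h$counts / sum(h$counts)   # proportion of observations per bin

# Density is the relative frequency divided by the bin width.
all.equal(h$density, rel_freq / widths)

# Total area of the bars (height times width, summed over bins).
sum(h$density * widths)

# hist() records whether the bins are equally wide; here they are not.
h$equidist
```

Because the first bin is twice as wide as the others, its bar is half as tall as a count-based bar would suggest; area, not height, carries the proportion.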

To get a histogram of percentages, write in R

h <- hist(prog1, plot = FALSE, breaks = 5)   # this postpones the plot display
h$density <- h$counts/sum(h$counts)*100      # this calculates percentages
plot(h, xlab = "Marks (%)", freq = FALSE, ylab = "Percentage", main = "Programming Semester 1")

The output is given in Fig. 3.12. The # symbol introduces a comment: R ignores anything written after # on the same line.


Figure 3.12 Histogram with Percentages
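
The percentage calculation can be checked without drawing anything. The sketch below uses a simulated vector marks (hypothetical, not the book's prog1 data) and also illustrates a side effect of this technique: once the density component has been overwritten with percentages, it no longer describes a density with total area one.

```r
# Simulated marks for illustration only; not the book's prog1 data.
set.seed(1)
marks <- round(runif(100, min = 0, max = 100))

h <- hist(marks, plot = FALSE, breaks = 5)

# Overwrite the density component with percentages, as in the text.
h$density <- h$counts / sum(h$counts) * 100

# The percentages should total 100, up to floating-point rounding.
sum(h$density)

# The bar areas no longer sum to one, because the heights are now
# percentages rather than densities.
sum(h$density * diff(h$breaks))
```

This is why the y‐axis label is set explicitly to "Percentage" when the modified object is plotted with freq = FALSE.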
