Statistics and Probability with Applications for Engineers and Scientists Using MINITAB, R and JMP - Bhisham C. Gupta, Irwin Guttman

Definition 2.4.2

Suppose that we have a set of values obtained by measuring a certain variable, say $n$ times. Then the median of these data, say $M_d$, is the value of the variable that satisfies the following two conditions:

1. at most 50% of the values in the set are less than $M_d$, and

2. at most 50% of the values in the set are greater than $M_d$.
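The two conditions in the definition can be checked directly. The following is a minimal Python sketch, not code from the book; the function name `satisfies_median_conditions` and the small sample are hypothetical and chosen only for illustration.

```python
def satisfies_median_conditions(values, m):
    """Return True if m meets both conditions in the definition of the median."""
    n = len(values)
    below = sum(1 for v in values if v < m)   # values strictly less than m
    above = sum(1 for v in values if v > m)   # values strictly greater than m
    return below <= n / 2 and above <= n / 2


# Hypothetical sample of seven measurements (not from the book).
sample = [3, 1, 4, 1, 5, 9, 2]

print(satisfies_median_conditions(sample, 3))  # True: 3 below, 3 above
print(satisfies_median_conditions(sample, 4))  # False: 4 of 7 values are below 4
```

When the number of values is even, every value between the two middle observations passes this check; the usual convention is to report their midpoint.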

We now turn our attention to the stem‐and‐leaf plot invented by John Tukey. This plot is a graphical tool used to display quantitative data. Each data value is split into two parts: the part with the leading digits is called the stem, and the rest is called the leaf. Thus, for example, the data value 5.15 is divided into two parts, with 5 for a stem and 15 for a leaf.
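The split into stem and leaf amounts to an integer division. Here is a small Python sketch of that idea; the helper `split_stem_leaf` and its parameters `leaf_digits` and `decimals` are assumptions introduced for this illustration, not notation from the book.

```python
def split_stem_leaf(value, leaf_digits, decimals=0):
    """Split a value into (stem, leaf).

    decimals    -- number of decimal places kept in the value
    leaf_digits -- number of trailing digits assigned to the leaf
    """
    # Scale to an integer first so that floating-point noise cannot leak into the leaf.
    scaled = round(value * 10 ** decimals)
    return divmod(scaled, 10 ** leaf_digits)


print(split_stem_leaf(5.15, leaf_digits=2, decimals=2))  # (5, 15)
print(split_stem_leaf(73, leaf_digits=1))                # (7, 3)
```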

A stem‐and‐leaf plot is a powerful tool used to summarize quantitative data. The stem‐and‐leaf plot has numerous advantages over both the frequency distribution table and the frequency histogram. One major advantage of the stem‐and‐leaf plot over the frequency distribution table is that from a frequency distribution table, we cannot retrieve the original data, whereas from a stem‐and‐leaf plot, we can easily retrieve the data in its original form. In other words, if we use the information from a stem‐and‐leaf plot, there is no loss of information, but this is not true of the frequency distribution table. We illustrate the construction of the stem‐and‐leaf plot with the following example.

Example 2.4.7 (Spare parts supply) A manufacturing company has been awarded a huge contract by the Defense Department to supply spare parts. In order to provide these parts on schedule, the company needs to hire a large number of new workers. To estimate how many workers to hire, representatives of the Human Resources Department decided to take a random sample of 80 workers and find the number of parts each worker produces per week. The data collected are given in Table 2.4.5. Prepare a stem‐and‐leaf diagram for these data.

Table 2.4.5 Number of parts produced per week by each worker.

73 70 68 79 84 85 77 75 61 69 74 80 83 82 86 87 78 81 68 71
74 73 69 68 87 85 86 87 89 90 92 71 93 67 66 65 68 73 72 83
76 74 89 86 91 92 65 64 62 67 63 69 73 69 71 76 77 84 83 85
81 87 93 92 81 80 70 63 65 62 69 74 76 83 85 91 89 90 85 82

Solution: The stem‐and‐leaf plot for the data in Table 2.4.5 is as shown in Figure 2.4.13.

The first column in Figure 2.4.13 gives the cumulative frequency, counted down from the top and up from the bottom of the column and stopping at the stems adjacent to the stem that contains the median. The number in parentheses marks the stem that contains the median of the data and gives the frequency of that stem.
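The construction just described can be sketched in Python using the data of Table 2.4.5. This is an illustrative sketch under stated assumptions, not the MINITAB, R, or JMP procedure used in the book: the function names `stem_and_leaf` and `depth_labels` are hypothetical, and the printed layout is only an approximation of the figure.

```python
from collections import defaultdict

# Table 2.4.5: number of parts produced per week by each of the 80 workers.
PARTS = [
    73, 70, 68, 79, 84, 85, 77, 75, 61, 69, 74, 80, 83, 82, 86, 87, 78, 81, 68, 71,
    74, 73, 69, 68, 87, 85, 86, 87, 89, 90, 92, 71, 93, 67, 66, 65, 68, 73, 72, 83,
    76, 74, 89, 86, 91, 92, 65, 64, 62, 67, 63, 69, 73, 69, 71, 76, 77, 84, 83, 85,
    81, 87, 93, 92, 81, 80, 70, 63, 65, 62, 69, 74, 76, 83, 85, 91, 89, 90, 85, 82,
]


def stem_and_leaf(data):
    """Map each stem (tens digit) to its sorted leaves (units digits)."""
    stems = defaultdict(list)
    for value in sorted(data):
        stem, leaf = divmod(value, 10)
        stems[stem].append(leaf)
    # Stems are inserted in ascending order; a stem with no leaves is simply absent.
    return dict(stems)


def depth_labels(counts):
    """First-column labels: cumulative counts from each end, '(k)' on the median stem."""
    n = sum(counts)
    labels, before = [], 0
    for count in counts:
        through = before + count
        if before < n / 2 < through:
            labels.append(f"({count})")       # this stem contains the median
        else:
            # Above the median this is the count from the top; below it, from the bottom.
            labels.append(str(min(through, n - before)))
        before = through
    return labels


plot = stem_and_leaf(PARTS)
labels = depth_labels([len(leaves) for leaves in plot.values()])
for label, (stem, leaves) in zip(labels, plot.items()):
    print(f"{label:>4}  {stem} | {''.join(str(leaf) for leaf in leaves)}")
# First-column labels, top to bottom: 21, (22), 37, 9
```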


Figure 2.4.13 Stem‐and‐leaf plot for the data in Example 2.4.7 with increment 10.


Figure 2.4.14 Stem‐and‐leaf plot for the data in Example 2.4.7 with increment 5.

Carefully examining the stem‐and‐leaf plot in Figure 2.4.13, we note that the data are clustered together; each stem has many leaves. This situation is the same as when we have too few classes in a frequency distribution table. Thus having too many leaves on the stems makes the stem‐and‐leaf diagram less informative. This problem can be resolved by splitting each stem into two, five, or more stems depending on the size of the data. Figure 2.4.14 shows a stem‐and‐leaf plot when we split each stem into two stems.
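Splitting each stem can be sketched as follows, again in Python and again as an illustration rather than the book's software output. The function name `split_stem_and_leaf` and the parameter `parts_per_stem` are hypothetical; the sketch assumes the stem is split into two or five rows, so that the ten possible leaves divide evenly.

```python
# Table 2.4.5: number of parts produced per week by each of the 80 workers.
PARTS = [
    73, 70, 68, 79, 84, 85, 77, 75, 61, 69, 74, 80, 83, 82, 86, 87, 78, 81, 68, 71,
    74, 73, 69, 68, 87, 85, 86, 87, 89, 90, 92, 71, 93, 67, 66, 65, 68, 73, 72, 83,
    76, 74, 89, 86, 91, 92, 65, 64, 62, 67, 63, 69, 73, 69, 71, 76, 77, 84, 83, 85,
    81, 87, 93, 92, 81, 80, 70, 63, 65, 62, 69, 74, 76, 83, 85, 91, 89, 90, 85, 82,
]


def split_stem_and_leaf(data, parts_per_stem=2):
    """Stem-and-leaf rows in which each tens stem is split into equal leaf ranges."""
    width = 10 // parts_per_stem          # 5 leaves per row when splitting in two
    rows = {}
    for value in sorted(data):
        stem, leaf = divmod(value, 10)
        # Key is (stem, which part of the stem): leaves 0-4 and 5-9 for width 5.
        rows.setdefault((stem, leaf // width), []).append(leaf)
    return rows


rows = split_stem_and_leaf(PARTS)
for (stem, _part), leaves in rows.items():
    print(f"{stem} | {''.join(str(leaf) for leaf in leaves)}   ({len(leaves)})")
# Row frequencies, top to bottom: 6, 15, 14, 8, 13, 15, 9
```

With an increment of 5, the row frequencies rise, fall, and rise again, a pattern that the four rows of the unsplit plot (21, 22, 28, 9) do not show.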

The first column in the above stem‐and‐leaf plots holds cumulative counts: counting down from the top, it gives the number of workers who produced up to a certain number of parts, and counting up from the bottom, the number who produced at least a certain number of parts. For example, in Figure 2.4.14, the entry in the third row from the top indicates that 35 workers produced fewer than 75 parts/wk, whereas the entry in the third row from the bottom indicates that 37 workers produced at least 80 parts/wk. The number within parentheses gives the number of observations on that stem and indicates that the middle value, or median, of the data falls on that stem.

Furthermore, the stem‐and‐leaf plot in Figure 2.4.14 is more informative than the one in Figure 2.4.13. For example, Figure 2.4.14 clearly indicates that the data are bimodal, whereas Figure 2.4.13 fails to provide this information. By rotating the stem‐and‐leaf plot counterclockwise through 90°, we see that the plot can serve the same purpose as a histogram, with stems as classes or bins, the number of leaves on each stem as the class frequency, and columns of leaves as rectangles or bars.

Unlike the frequency distribution table and histogram, the stem‐and‐leaf plot can be used to answer questions such as, “What percentage of workers produced between 75 and 83 parts (inclusive)?” Counting the leaves 5 through 9 on stem 7 and 0 through 3 on stem 8, we see that 19 of the 80 workers, or 23.75%, produced between 75 and 83 parts (inclusive). Using the frequency distribution table, however, this question cannot be answered, since the interval 80–85 cannot be broken down to get the number of workers producing between 80 and 83 parts per week. The stem‐and‐leaf plot, by contrast, allows us to retrieve the original data values.
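The point about retrieving the data can be illustrated with a short Python sketch. It rebuilds the values from the stems and leaves alone and then answers counting questions of the kind discussed above. The helper names `values_from_plot` and `share_between` are hypothetical, and the sketch recovers the values only, not the order in which the workers were sampled.

```python
# Table 2.4.5: number of parts produced per week by each of the 80 workers.
PARTS = [
    73, 70, 68, 79, 84, 85, 77, 75, 61, 69, 74, 80, 83, 82, 86, 87, 78, 81, 68, 71,
    74, 73, 69, 68, 87, 85, 86, 87, 89, 90, 92, 71, 93, 67, 66, 65, 68, 73, 72, 83,
    76, 74, 89, 86, 91, 92, 65, 64, 62, 67, 63, 69, 73, 69, 71, 76, 77, 84, 83, 85,
    81, 87, 93, 92, 81, 80, 70, 63, 65, 62, 69, 74, 76, 83, 85, 91, 89, 90, 85, 82,
]

# Stem-and-leaf plot with increment 10: stem (tens digit) -> sorted leaves (units digits).
plot = {}
for value in sorted(PARTS):
    plot.setdefault(value // 10, []).append(value % 10)


def values_from_plot(plot):
    """Rebuild the data values from the stems and leaves alone."""
    return [10 * stem + leaf for stem, leaves in plot.items() for leaf in leaves]


def share_between(plot, low, high):
    """Count and percentage of values v with low <= v <= high, read off the plot."""
    values = values_from_plot(plot)
    hits = sum(1 for v in values if low <= v <= high)
    return hits, 100 * hits / len(values)


recovered = values_from_plot(plot)
print(recovered == sorted(PARTS))        # True: no values are lost
print(sum(v < 75 for v in recovered))    # 35 workers produced fewer than 75 parts
print(sum(v >= 80 for v in recovered))   # 37 workers produced at least 80 parts
print(share_between(plot, 75, 83))       # (count, percentage) for 75 to 83 inclusive
```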
