Probability with R - Jane M. Horgan

Example 1.2 Reading data from a file into a data frame

The examination results for a class of 119 students pursuing a computing degree are given on our companion website (www.wiley.com/go/Horgan/probabilitywithr2e) as a text file, referred to below as results.txt. The complete data set is also given in Appendix A.

gender arch1 prog1 arch2 prog2
m 99 98 83 94
m NA NA 86 77
m 97 97 92 93
m 99 97 95 96
m 89 92 86 94
m 91 97 91 97
m 100 88 96 85
f 86 82 89 87
m 89 88 65 84
m 85 90 83 85
m 50 91 84 93
m 96 71 56 83
f 98 80 81 94
m 96 76 59 84
....

The first row of the file contains the headings, gender and arch1, prog1, arch2, prog2, which are abbreviations for Architecture and Programming from Semester 1 and Semester 2, respectively. The remaining rows are the marks (%) obtained for each student. NA denotes that the marks are not available in this particular case.

The construct for reading this type of data into a data frame is read.table.

results <- read.table ("F:/data/results.txt", header = T)

assuming that your data file is stored in the data folder on the F drive. This command causes the data to be assigned to a data frame called results. Here header = T, or equivalently header = TRUE, specifies that the first line is a header, in this case containing the names of the variables. Notice that the forward slash (/) is used in the filename, not the backslash (\) that would be expected in the Windows environment. The backslash is the escape character in R strings and cannot be used on its own in this context: / or \\ is used instead. Thus, we could have written

results <- read.table ("F:\\data\\results.txt", header = TRUE)

with the same effect.
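As a minimal sketch of the read.table call above, the following writes the header and the first four rows of the listing to a temporary file and reads them back. The temporary file stands in for F:/data/results.txt, and the names tmp and results_small are assumptions made purely for illustration.

```r
# Write the header and the first four rows of the listing to a temporary file
# (a stand-in for F:/data/results.txt, which may not exist on your machine).
tmp <- tempfile(fileext = ".txt")
writeLines(c("gender arch1 prog1 arch2 prog2",
             "m 99 98 83 94",
             "m NA NA 86 77",
             "m 97 97 92 93",
             "m 99 97 95 96"), tmp)

# header = TRUE tells read.table that the first line holds the variable names.
results_small <- read.table(tmp, header = TRUE)

dim(results_small)    # number of rows and columns
names(results_small)  # variable names taken from the header line

# file.path joins its arguments with forward slashes,
# which sidesteps the backslash issue altogether.
file.path("F:", "data", "results.txt")
```

Building the path with file.path is one way to avoid typing separators by hand; writing the path as a single string with forward slashes works equally well.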

The contents of the data frame results may be listed on screen by typing

results

which gives

   gender arch1 prog1 arch2 prog2
1       m    99    98    83    94
2       m    NA    NA    86    77
3       m    97    97    92    93
4       m    99    97    95    96
5       m    89    92    86    94
6       m    91    97    91    97
7       m   100    88    96    85
8       f    86    82    89    87
9       m    89    88    65    84
10      m    85    90    83    85
11      m    50    91    84    93
12      m    96    71    56    83
13      f    98    80    81    94
14      m    96    76    59    84
....

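Notice that the gender variable takes two values, “f” and “m”, while the remaining four variables are numeric. In versions of R before 4.0.0, read.table stores gender as a factor with these two levels by default; from R 4.0.0 onward it is stored as a character vector unless stringsAsFactors = TRUE is supplied. The figures in the first column on the left are the row numbers, which allow us to access individual elements in the data frame.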
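The variable types and the missing marks can be inspected directly. The sketch below supplies a few rows of the listing inline through the text argument of read.table and requests factors explicitly, so that the result does not depend on the R version; the names rows and results_small are illustrative assumptions.

```r
# Rows 1, 2, 3 and 8 of the listing, supplied inline instead of from a file.
rows <- "gender arch1 prog1 arch2 prog2
m 99 98 83 94
m NA NA 86 77
m 97 97 92 93
f 86 82 89 87"

# stringsAsFactors = TRUE makes gender a factor regardless of the R version.
results_small <- read.table(text = rows, header = TRUE, stringsAsFactors = TRUE)

str(results_small)                       # one line per variable, with its type
levels(results_small$gender)             # the levels of the gender factor
is.na(results_small$arch1)               # TRUE where a mark is not available
mean(results_small$arch1, na.rm = TRUE)  # mean of the marks that are available
```

Without na.rm = TRUE, the mean of a column containing NA is itself NA, which is why the missing marks need to be handled explicitly.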

While we could list the entire data frame on the screen, this is inconvenient for all but the smallest data sets. R provides facilities for listing the first few rows and the last few rows.

head(results, n = 4)

gives the first four rows of the data set.

  gender arch1 prog1 arch2 prog2
1      m    99    98    83    94
2      m    NA    NA    86    77
3      m    97    97    92    93
4      m    99    97    95    96

and

tail(results, n = 4)

gives the last four rows of the data set.

    gender arch1 prog1 arch2 prog2
116      m    16    27    25     7
117      m    73    51    48    23
118      m    56    54    49    25
119      m    46    64    13    19

The convention for accessing the column variables is to use the name of the data frame, followed by a dollar sign ($) and the name of the relevant column. For example,

results$arch1[5]

returns

[1] 89

which is the fifth observation in the column labeled arch1.
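The same element can be reached in several equivalent ways. The following sketch uses the first five rows of the listing, supplied inline; rows and results_small are names assumed for illustration.

```r
# First five rows of the listing, supplied inline instead of from a file.
rows <- "gender arch1 prog1 arch2 prog2
m 99 98 83 94
m NA NA 86 77
m 97 97 92 93
m 99 97 95 96
m 89 92 86 94"
results_small <- read.table(text = rows, header = TRUE)

results_small$arch1[5]       # fifth observation of arch1, using $
results_small[5, "arch1"]    # same element, by row number and column name
results_small[["arch1"]][5]  # same element, using double-bracket extraction
results_small[5, ]           # the whole of the fifth row
```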

Usually, when a new data frame is created, the following two commands are issued.

attach(results)
names(results)

which give

[1] "gender" "arch1" "prog1" "arch2" "prog2"

indicating that the column variables can be accessed without the prefix results. For example,

arch1[5]

gives

[1] 89
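An attached data frame stays on the search path until it is detached, and the attached copy does not reflect later changes made to the data frame itself. The sketch below shows attach paired with detach, and with() as one possible alternative that needs no attaching; rows and results_small are illustrative names, and the data are the first five rows of the listing.

```r
# First five rows of the listing, supplied inline instead of from a file.
rows <- "gender arch1 prog1 arch2 prog2
m 99 98 83 94
m NA NA 86 77
m 97 97 92 93
m 99 97 95 96
m 89 92 86 94"
results_small <- read.table(text = rows, header = TRUE)

attach(results_small)   # place the columns on the search path
arch1[5]                # found without the results_small$ prefix
detach(results_small)   # remove them again when finished

# with() evaluates an expression inside the data frame without attaching it.
with(results_small, arch1[5])
```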

The command read.table assumes by default that the data in the text file are separated by white space (spaces or tabs). Other forms include:

read.csv, used when the data points are separated by commas;

read.csv2, used when the data are separated by semicolons and a comma serves as the decimal mark.
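As a small sketch of these variants, the following reads the same two rows (a subset of the listing, trimmed to three columns) once as comma-separated and once as semicolon-separated text. The strings csv_text and csv2_text are made up for illustration and are passed through the text argument instead of a file.

```r
# The same two rows, once comma-separated and once semicolon-separated.
csv_text  <- "gender,arch1,prog1\nm,99,98\nf,86,82"
csv2_text <- "gender;arch1;prog1\nm;99;98\nf;86;82"

# Both functions assume a header line by default.
from_csv  <- read.csv(text = csv_text)
from_csv2 <- read.csv2(text = csv2_text)

# read.table can read the same text if the separator is given explicitly.
from_table <- read.table(text = csv2_text, header = TRUE, sep = ";")

from_csv
from_csv2
from_table
```

All three calls should yield a data frame with two rows and the columns gender, arch1 and prog1.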

It is also possible to enter data into a spreadsheet and store it in a data frame, by writing

newdata <- data.frame()
fix(newdata)

which brings up a blank spreadsheet called newdata, and the user may then enter the variable labels and the variable values.

Right click and close creates a data frame newdata in which the new information is stored.

If you subsequently need to amend or add to this data frame, write

fix(newdata)

which retrieves the spreadsheet with the data. You can then edit the data as required. Right click and close saves the amended data frame.
