The Big R-Book, Philippe J. S. De Brouwer, page 245
Hint
Be aware of the saying “They have to recognize that great responsibility is an inevitable consequence of great power.”10 It is not because you can do something that you must. Indeed, you can use numeric column names in a tibble, and the following is valid code.
library(tibble)  # provides tibble()
tb <- tibble(`1` = 1:3, `2` = sin(`1`), `1`*pi, 1*pi)
tb
## # A tibble: 3 x 4
##     `1`   `2` `\`1\` * pi` `1 * pi`
##   <int> <dbl>        <dbl>    <dbl>
## 1     1 0.841         3.14     3.14
## 2     2 0.909         6.28     3.14
## 3     3 0.141         9.42     3.14
However, is this good practice?
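To make the question concrete, here is a minimal sketch of what numeric column names cost in daily use. It assumes the tibble package is installed; `tb2`, `x`, and `sin_x` are hypothetical names chosen for illustration and do not come from the book.

```r
library(tibble)

# Numeric (non-syntactic) column names, as in the example above.
tb <- tibble(`1` = 1:3, `2` = sin(`1`))

# Every reference by name needs backticks or a quoted string.
tb$`1`
tb[["2"]]

# Without the backticks, `1` would be read as the number one,
# and tb$1 is not even valid R syntax.

# Syntactic names (hypothetical: x, sin_x) avoid the issue altogether.
tb2 <- tibble(x = 1:3, sin_x = sin(x))
tb2$x
tb2$sin_x
```

Both tibbles hold the same data; only the second can be used without quoting every column reference.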
So, why use a tibble instead of a data frame?
1 It does fewer things: it does not change strings into factors, create row names, or change the names of variables. It also does no partial matching, and instead issues a warning when you try to access a column that does not exist.
2 A tibble reports more errors instead of doing something silently (data type conversions, import, etc.), so it is safer to use.
3 The specific print function for the tibble, print.tibble(), will not overrun your screen with thousands of lines: it reports only the first ten rows. If you need to see all columns, then the traditional head(tibble) will still work, or you can tweak the behaviour of the print function via the function options().
4 The name of the class itself is not confusing. The function print.data.frame() can be read as the specific method of the print function for a data.frame, but it can equally be read as the specific method of the print.data function for a frame object. The class name tibble does not contain a dot and hence cannot be misread in this way.
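The first and third points of the list can be checked with a short sketch. It assumes the tibble package is installed; the column name `my var` and the 1000-row object `big` are hypothetical and serve only as illustration.

```r
library(tibble)

# Point 1: data.frame() rewrites non-syntactic names, tibble() keeps them.
df <- data.frame(`my var` = 1:3)  # name is changed to "my.var"
tb <- tibble(`my var` = 1:3)      # name stays "my var"
names(df)
names(tb)

# Point 3: printing a long tibble shows only the first rows.
big <- tibble(id = 1:1000, value = sqrt(id))
print(big)          # header and the first ten rows
print(big, n = 20)  # request 20 rows for this call only
```

The `n` argument applies to a single call. Session-wide defaults are set through `options()`; the exact option names depend on the installed version of tibble, so it is worth checking the package documentation before relying on one.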
To illustrate some of these differences, consider the following code:
# -- data frame --
# Note: the factor output below is what R versions before 4.0.0 print,
# where stringsAsFactors defaulted to TRUE.
df <- data.frame("value" = pi, "name" = "pi")
df$na                     # partial matching of column names
## [1] pi
## Levels: pi

# automatic conversion to factor, plus data frame
# accepts strings:
df[,"name"]
## [1] pi
## Levels: pi

df[,c("name", "value")]
##   name    value
## 1   pi 3.141593

# -- tibble --
df <- tibble("value" = pi, "name" = "pi")
df$name                   # column name
## [1] "pi"

df$nam                    # no partial matching, but a warning
## Warning: Unknown or uninitialised column: ‘nam’.
## NULL

df[,"name"]               # this returns a tibble (no simplification)
## # A tibble: 1 x 1
##   name
##   <chr>
## 1 pi

df[,c("name", "value")]   # no conversion to factor
## # A tibble: 1 x 2
##   name  value
##   <chr> <dbl>
## 1 pi     3.14
This partial matching is one of the nicer features of R, and it certainly was an advantage for interactive use. However, when using R in batch mode, this might be dangerous. Partial matching is especially dangerous in a corporate environment: datasets can have hundreds of columns and many names look alike, e.g. BAL180801, BAL180802, and BAL180803. Up to a certain point it is safe to use partial matching, since it only works when R can identify the variable uniquely. But it is bound to happen that someone adds new columns and suddenly someone else's code stops working, because the abbreviated name no longer identifies a single column.
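The failure mode can be reproduced in a few lines of base R. This is a minimal sketch: the column names are taken from the example above, while the values and the object names (`df1`, `df2`, `before`, `after`, `warned`) are made up for illustration.

```r
# A data frame with a single balance column (values are invented).
df1 <- data.frame(BAL180801 = c(10, 20, 30))

# The abbreviation matches exactly one column, so partial matching works.
before <- df1$BAL
before

# Someone later adds a second column with a similar name.
df2 <- df1
df2$BAL180802 <- c(11, 21, 31)

# The abbreviation is now ambiguous: the result is NULL, without any error.
after <- df2$BAL
after

# Base R can be asked to warn whenever $ relies on partial matching.
old <- options(warnPartialMatchDollar = TRUE)
warned <- tryCatch({ df1$BAL; FALSE }, warning = function(w) TRUE)
options(old)  # restore the previous setting
warned
```

Note that the ambiguous lookup does not fail loudly. It returns `NULL`, so the problem may only surface further down in the script.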