The Big R-Book - Philippe J. S. De Brouwer - Page 240
7.2.1 The Core Tidyverse
The core tidyverse includes some packages that are commonly used in data wrangling and modelling. Each is introduced briefly here; later we will explore some of those packages in more detail.
tidyr provides a set of functions that help you tidy up data and make it easier to adhere to the rules of tidy data. The idea of tidy data is really simple: it is data where every variable has its own column and every observation has its own row. For more information, see Chapter 17.3 “Tidying Up Data with tidyr” on page 277.
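As a minimal sketch of what tidying can look like, the example below reshapes a small made-up table in which the variable "year" is hidden in the column names. The data frame `wide` and its column names are invented for illustration; `pivot_longer()` assumes tidyr 1.0.0 or later.

```r
library(tidyr)

# Hypothetical "wide" data: one column per year, so "year" is not a column.
wide <- data.frame(
  country = c("A", "B"),
  `2019`  = c(10, 20),
  `2020`  = c(11, 22),
  check.names = FALSE
)

# Gather the year columns so that each variable gets its own column.
long <- pivot_longer(wide,
                     cols      = -country,
                     names_to  = "year",
                     values_to = "sales")
print(long)
```

After the call, `long` has one row per country and year, with the columns `country`, `year` and `sales`.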
dplyr provides a grammar of data manipulation: a consistent set of verbs that solve the most common data manipulation challenges. For more information, see Chapter 17 “Data Wrangling in the tidyverse” on page 265.
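To give a feel for these verbs, here is a short sketch that chains a few of them on the built-in `mtcars` data set. The threshold of 100 horsepower and the name `result` are arbitrary choices for the example.

```r
library(dplyr)

result <- mtcars %>%
  filter(hp > 100) %>%                  # keep only the more powerful cars
  group_by(cyl) %>%                     # one group per number of cylinders
  summarise(mean_mpg = mean(mpg),       # average fuel economy per group
            n        = n()) %>%         # number of cars per group
  arrange(desc(mean_mpg))               # best fuel economy first
print(result)
```

Each verb takes a data frame and returns a data frame, which is what makes them easy to combine with the pipe.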
ggplot2 is a system to create graphics with a philosophy: it adheres to a “Grammar of Graphics” and is able to create really stunning results at a reasonable price (it is a notch more abstract to use than the core-R functionality). For both reasons, we will talk more about it in the sections about reporting: see Chapter 31 “A Grammar of Graphics with ggplot2” on page 687.
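The following sketch illustrates the layered style that the grammar implies: data and aesthetic mappings first, then one layer per geometry. It uses the built-in `mtcars` data; the choice of variables and labels is only an illustration.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +                            # layer 1: the observations
  geom_smooth(method = "lm", se = FALSE) +  # layer 2: a linear fit per group
  labs(x = "Weight (1000 lbs)",
       y = "Miles per gallon",
       colour = "Cylinders")
print(p)                                    # draw the plot on the active device
```

The plot is an ordinary R object (`p`), so layers can be added or changed before it is printed.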
readr expands R's standard functionality to read in rectangular data. It is more robust, knows more data types, and is faster than the core-R functionality. For more information, see Chapter 17.1.2 “Importing Flat Files in the Tidyverse” on page 267 and its subsections.
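As a small illustration of the type detection, the sketch below reads a made-up CSV fragment from a string. The variable `csv_text` and its content are invented for the example; passing literal data via `I()` and the `show_col_types` argument assume readr 2.0 or later.

```r
library(readr)

# Hypothetical CSV content, held in a string instead of a file.
csv_text <- "id,date,amount\n1,2020-01-15,10.5\n2,2020-02-20,7.25\n"

# I() tells read_csv() that this is literal data, not a file path.
d <- read_csv(I(csv_text), show_col_types = FALSE)
print(d)
class(d$date)   # readr guesses that this column holds dates
```

The result is a tibble in which the `date` column has been parsed as a date rather than left as text.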
purrr is casually mentioned in the section about the OO model in R (see Chapter 6 on page 87), and extensively used in Chapter 25.1 “Model Quality Measures” on page 476. It is a rather complete and consistent set of tools for working with functions and vectors. Using purrr, it should be possible to replace most loops with calls to purrr functions that will work faster.
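The sketch below shows the same computation written once as a loop and once with a purrr function. The list `x` is invented for the example, and the comparison only shows that both forms give the same result; it does not measure speed.

```r
library(purrr)

x <- list(a = 1:3, b = 4:6, c = 7:9)   # hypothetical input

# Version 1: an explicit loop that fills a pre-allocated vector.
means_loop <- numeric(length(x))
for (i in seq_along(x)) {
  means_loop[i] <- mean(x[[i]])
}

# Version 2: map_dbl() applies mean() to each element and
# returns a (named) numeric vector.
means_map <- map_dbl(x, mean)
print(means_map)
```

The `_dbl` suffix states the expected return type up front, so a result that is not a single number per element raises an error instead of passing silently.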
tibble is a new take on the data frame of core-R. It provides a new base type: tibbles. Tibbles are in essence data frames that do a little less (so there is less clutter on the screen and fewer unexpected things happen), but give more feedback (they show what went wrong instead of assuming that you have read all manuals and remember everything). Tibbles are introduced in the next section.
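Two of these differences can be sketched with a tiny made-up data frame. The column names `value` and `label` are invented for the example.

```r
library(tibble)

df <- data.frame(value = 1:3, label = c("a", "b", "c"))
tb <- as_tibble(df)

# "Doing less": selecting one column with [ keeps a tibble a tibble,
# whereas a data frame silently drops to a vector.
col_df <- df[, "value"]   # an integer vector
col_tb <- tb[, "value"]   # still a tibble with one column

# "More feedback": a data frame partially matches column names,
# a tibble returns NULL and warns about the unknown column.
part_df <- df$val         # partial match on "value"
part_tb <- tb$val         # NULL, with a warning
```

Printing `tb` also shows only the first rows together with the column types, which is the reduced screen clutter mentioned above.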
stringr expands the standard functions to work with strings and provides a nice coherent set of functions that all start with str_. The package is built on top of stringi, which uses the ICU library that is written in C, so it is fast too. For more information, see Chapter 17.5 “String Manipulation in the tidyverse” on page 299.
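A few of the `str_` functions, applied to a made-up character vector, give an impression of the consistent naming. The vector `s` is invented for the example.

```r
library(stringr)

s <- c("tidyverse", "data frame", "R")   # hypothetical input

len   <- str_length(s)                # number of characters per string
upper <- str_to_upper(s)              # convert to upper case
has_a <- str_detect(s, "a")           # does the string contain an "a"?
snake <- str_replace_all(s, " ", "_") # replace every space by an underscore
print(data.frame(s, len, upper, has_a, snake))
```

In each function the string is the first argument, which makes the functions easy to use with the pipe.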
forcats provides tools to address common problems when working with categorical variables.
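One such common problem is the order of factor levels, which by default is alphabetical. The sketch below reorders the levels of a made-up factor in two ways; the factor `f` and its values are invented for the example.

```r
library(forcats)

f <- factor(c("low", "low", "low", "high", "medium", "medium"))
levels(f)                  # default order is alphabetical

# Order the levels by how often they occur (most frequent first).
f_freq <- fct_infreq(f)
levels(f_freq)

# Or impose a meaningful order by hand.
f_manual <- fct_relevel(f, "low", "medium", "high")
levels(f_manual)
```

The values of the factor are unchanged in both cases; only the order of the levels differs, which matters for example for the order of bars in a plot.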