The Big R-Book - Philippe J. S. De Brouwer

7 Tidy R with the Tidyverse

7.1. The Philosophy of the Tidyverse


R is Free and Open Source Software (FOSS), which implies that it is free to use and also that you have access to the source code if desired. Like most FOSS projects, R is also easy to extend. Fortunately, it is also a popular language, and some of its millions of users may already have created a package that extends R's functionality to do just what you need. This allows any R user to stand on the shoulders of giants: you do not have to re-invent the wheel, but can just pick a package and expand your knowledge and that of humanity. That is great, and it is one of the most important reasons to use R. However, there is also a dark side: the popularity of the language and the ease of extending it mean that there are literally thousands of packages available. It is easy to be overwhelmed by their variety and sheer number, and this is also one of the key weaknesses of R.

Most of those packages require one or more other packages to be loaded first. These packages in turn depend on yet other (or the same) packages, and a dependency might require a certain version of the upstream package. This package maintenance problem used to be known as “dependency hell.” The package manager of R does, however, a good job, and it usually works as expected.
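As a minimal sketch of what a version requirement on an upstream package amounts to, the following base R snippet checks whether an installed package meets a minimum version. The helper name `meets_minimum` and the package name `notARealPackage12345` are hypothetical and only serve the illustration; `packageVersion()` and `requireNamespace()` are standard base R functions.

```r
# Hypothetical helper: TRUE if the installed version of `pkg`
# is at least `min_version` (given as a string such as "3.0.0").
meets_minimum <- function(pkg, min_version) {
  utils::packageVersion(pkg) >= min_version
}

# The stats package ships with R, so it is always installed.
stats_ok <- meets_minimum("stats", "3.0.0")

# requireNamespace() returns FALSE (instead of raising an error)
# when a package is not installed; this name is made up on purpose.
missing_ok <- requireNamespace("notARealPackage12345", quietly = TRUE)
```

In practice the package manager performs such checks for you when it reads the version constraints that a package declares.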

Using the same code again after a few years is usually more challenging. In the meantime you might have updated R to a newer version, and most packages will have been updated too. Some packages may have become obsolete and may no longer be maintained, so that no version compatible with the new R is available. This can cause other packages to fail.

Maintaining code is not a big challenge if you just write a project for a course at the university and will never use it again. Code maintenance becomes an issue when you want to use the code later, and it becomes a serious problem if colleagues need to review your work, extend it, and change it later (while you might not be available).

Another issue is that, because of this flexibility, core R is not very consistent (though people will argue that Linux does an even worse job here and still is the best operating system (OS)).

Consistency does matter, and it follows from the choice of a programming philosophy. For example, R is software for working with data, so each function should have a first argument that refers to the data. Many functions follow this rule, but not all. Similar issues exist for the arguments of functions and for the names of objects and classes (e.g. there is vector and Date, etc.).
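The inconsistency described above can be made visible with a few lines of base R. This is only an illustrative sketch: the helper `first_arg` is a hypothetical name introduced here, and the functions inspected are a small hand-picked sample from base R and the stats package.

```r
# Hypothetical helper: name of the first formal argument of a function
first_arg <- function(f) names(formals(f))[1]

# Two members of the same "apply" family disagree on argument order:
first_arg(sapply)  # "X"   : the data come first
first_arg(mapply)  # "FUN" : the function comes first

# Other common functions do not start with the data either:
first_arg(gsub)    # "pattern" : the text to modify is the third argument
first_arg(lm)      # "formula" : the data frame is the second argument

# Class names are not capitalised consistently:
class(c(1.5, 2))   # "numeric" (lower case)
class(Sys.Date())  # "Date"    (capitalised)
```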

Then there is the tidyverse. It is a recent addition to R that is both a collection of often-used functionality and a philosophy.

The developers of the tidyverse promote the following principles:

• Use existing and common data structures. All the packages in the tidyverse share common S3 class types; this means that, in general, functions accept data frames (or tibbles). More low-level functions work with the base R vector types.

• Reuse data structures in your code. The idea here is that there is a better option than always overwriting a variable or creating a new one in every line: pass the output of one line on to the next with a “pipe”: %>%. To be accepted in the tidyverse, the functions in a package need to be able to use this pipe.

• Keep functions concise and clear. For example, do not mix side-effects and transformations, make function names verbs wherever possible (unless they become too generic or meaningless, of course), and keep functions short (they do only one thing, but do it well).

• Embrace R as a functional programming language. This means that reflexes that you might have from, say, C++, C#, Python, PHP, etc., will have to be adjusted. For example, it is best to use immutable objects and copy-on-modify semantics and to avoid the refclass model (see Section 6.4 “The Reference Class, refclass, RC or R5 Model” on page 113). Use the generic functions provided by S3 and S4 where possible. Avoid writing loops (such as repeat and for); use the apply family of functions instead (or refer to the package purrr).

• Keep code clean and readable for humans. For example, prefer meaningful but long variable names over short but meaningless ones, be considerate towards people using auto-complete in RStudio (so put an identifying prefix in the first and not the last letters of a function name), etc.
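Two of the principles above, the pipe and the functional style, can be sketched in a few lines. The data in `x` and the helper `add_one` are hypothetical and exist only for this illustration, and the example assumes that the magrittr package (which provides %>%) is installed.

```r
library(magrittr)  # provides the %>% pipe; assumed to be installed

# Hypothetical example data
x <- c(4, 9, 16, 25, 36, 49, 64, 81, 100, 121)

# Nested calls have to be read inside-out ...
nested <- round(mean(sqrt(x)), 1)

# ... while the pipe passes each result on to the next function,
# so the steps read from left to right.
piped <- x %>% sqrt() %>% mean() %>% round(1)

# Functional style: a loop that fills a pre-allocated vector ...
roots_loop <- numeric(length(x))
for (i in seq_along(x)) {
  roots_loop[i] <- sqrt(x[i])
}

# ... versus a member of the apply family that returns the result.
roots_apply <- sapply(x, sqrt)

# Copy-on-modify: the function changes its own copy of the argument,
# the caller's object stays untouched.
add_one <- function(v) {
  v[1] <- v[1] + 1
  v
}
original <- c(1, 2, 3)
modified <- add_one(original)
```

After running this, `nested` and `piped` hold the same value, `roots_loop` and `roots_apply` are equal, and `original` is still `c(1, 2, 3)` while `modified` starts with 2.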

Like core R itself and many other packages, the tidyverse is in permanent development. For further and up-to-date information we refer to the website of the tidyverse: http://tidyverse.tidyverse.org.
