Computational Statistics in Data Science

2.1 R

R [1] began at the University of Auckland, New Zealand, in the early 1990s. Ross Ihaka and Robert Gentleman needed a statistical environment to use in their teaching lab. At the time, their computer labs featured only Macintosh computers that lacked suitable software. Ihaka and Gentleman decided to implement a language based on an S‐like syntax [2]. R's initial versions were provided to Statlib at Carnegie Mellon University, and the user feedback indicated a positive reception.

R's success encouraged its release as open-source software (https://opensource.org/) under the GNU General Public License. Developers released the first version in June 1995. A software system under the open-source paradigm benefits from having “many pairs of eyes to develop the software.” R developed a huge following, and it soon became difficult for the original developers to maintain. In response, a 10-member core group was formed in 1997. The core team handles any changes to the R source code. The large R community provides support via online mailing lists (https://www.r-project.org/mail.html) and statistical computing forums such as Talk Stats (http://www.talkstats.com/), Cross Validated (https://stats.stackexchange.com/), and Stack Overflow (https://stackoverflow.com/). Users often receive responses within minutes.

Since its humble beginnings, R has developed into a popular, complete, and flexible statistical computing environment that is valued in academia, industry, and government. R's main benefits include support on all major operating systems and comprehensive package archives. Further, through R Markdown (https://rmarkdown.rstudio.com/), R integrates well with document formats such as LaTeX (https://www.latex-project.org/), HTML, and Microsoft Word, as well as other file formats, which supports literate programming and reproducible data analysis.

R provides extensive statistical capacity. Nearly any method is available as an R package; the trick is locating the software. The base package and the packages included by default perform most standard analyses and computations. If the included packages are insufficient, one can use CRAN (the Comprehensive R Archive Network), which houses nearly 13 000 packages (visit https://cran.r-project.org/ for more information). To help navigate CRAN, “CRAN Task Views” organizes packages into convenient topics (https://cran.r-project.org/web/views/). For bioinformatics, over 1500 packages reside on Bioconductor [3]. Developers also distribute their packages via Git repositories, such as GitHub (https://github.com/). For easy retrieval from GitHub, the devtools package allows direct installation.
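
The three installation routes described above (CRAN, Bioconductor, and GitHub) can be sketched as follows. This is a minimal illustration rather than a prescribed workflow: `install_if_missing` is a hypothetical helper introduced here, the mirror URL https://cloud.r-project.org is one possible CRAN mirror, and the package names `limma` and `tidyverse/ggplot2` are merely illustrative choices. The calls that need network access are left commented out.

```r
# Hypothetical helper (not part of base R): install a CRAN package only
# when it is not already available, then report whether it can be loaded.
install_if_missing <- function(pkg, repos = "https://cloud.r-project.org") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = repos)
  }
  requireNamespace(pkg, quietly = TRUE)
}

# Packages shipped with R, such as stats, need no installation.
install_if_missing("stats")

# The calls below require network access, so they are commented out.
# CRAN:
# install_if_missing("devtools")
# Bioconductor, via the BiocManager package from CRAN:
# install_if_missing("BiocManager")
# BiocManager::install("limma")
# GitHub, using the "owner/repository" form:
# devtools::install_github("tidyverse/ggplot2")
```

Checking with `requireNamespace()` before installing avoids re-downloading a package that is already present, which is a design choice of this sketch and not something the text above requires.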
