
Introduction


Welcome to R For Dummies, the book that helps you learn the statistical programming language R quickly and easily.

We can’t guarantee that you’ll be a guru if you read this book, but you should be able to

✔ Perform data analysis by using a variety of powerful tools.

✔ Use the power of R to do statistical analysis and data-processing tasks.

✔ Appreciate the beauty of using vector-based operations (rather than loops) to do speedy calculations.

✔ Appreciate the meaning of the following line of code:

knowledge <- apply(theory, 1, sum)

✔ Know how to find, download, and use code that has been contributed to R by its very active community of developers.

✔ Know where to find extra help and resources to take your R coding skills to the next level.

✔ Create beautiful graphs and visualizations of your data.
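The `apply()` line in the list above can be read as "sum each row of `theory`." Here is a minimal sketch, assuming `theory` is a small numeric matrix invented purely for illustration. It compares that line with an explicit loop and with the vectorized built-in `rowSums()`. It is written as a plain script without the > and + symbols, so you can paste it directly into R.

```r
# Hypothetical 3 x 4 matrix standing in for `theory` (filled column by column)
theory <- matrix(1:12, nrow = 3)

# apply() with MARGIN = 1 calls sum() on each row
knowledge <- apply(theory, 1, sum)

# The same result with an explicit loop over the rows
knowledge_loop <- numeric(nrow(theory))
for (i in seq_len(nrow(theory))) {
  knowledge_loop[i] <- sum(theory[i, ])
}

# rowSums() is a vectorized function that does the whole job in one call
knowledge_vec <- rowSums(theory)

print(knowledge)   # [1] 22 26 30
```

All three approaches give the same row totals; the loop just takes more code to say the same thing.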

About This Book

R For Dummies is an introduction to the statistical programming language known as R. We start by introducing the interface and work our way from the very basic concepts of the language through more sophisticated data manipulation and analysis.

We illustrate every step with easy-to-follow examples. This book contains numerous code snippets, several write-it-yourself functions you can use later on, and complete analysis scripts. All these are for you to try out yourself.

We don’t attempt to give a technical description of how R is programmed internally, but we do focus as much on the why as on the how. R has many features that may seem surprising at first, so we believe it’s important to explain both how you should talk to R, and how the R engine interprets what you say. After reading this book, you should be able to manipulate your data in the form you want and understand how to use functions we didn’t cover in the book (as well as the ones we do cover).

This book is a reference. You don’t have to read it from beginning to end. Instead, you can use the table of contents and index to find the information you need. We cross-reference other chapters where you can find more information.

Changes in the Second Edition

Since the publication of the first edition, R has kept evolving and improving. To keep the book accurate, we updated the code to reflect changes in the latest version of R (version 3.2.0). Thanks to feedback from readers, students, and colleagues, we were able to rework some sections to clarify issues and correct inaccuracies. For example, we modified the code to use double quotes instead of single quotes for text strings. We also refer to the fundamental units of lists as components, rather than elements.

The new rfordummies package contains the code examples from the book. Read all about it in Appendix B.

R and RStudio

R For Dummies can be used with any operating system that R runs on. Whether you use Mac, Linux, or Windows, this book will get you on your way with R.

R is more a programming language than an application. When you download R, you automatically download a console application that’s suitable for your operating system. However, this application has only basic functionality, and it differs to some extent from one operating system to the next.

RStudio is a cross-platform integrated development environment (IDE) with some very neat features to support R. In this book, we don’t assume you use any specific console application. However, because RStudio provides a common user interface across the major operating systems, we use it to demonstrate some of the concepts rather than any operating-system-specific version of R.

Conventions Used in This Book

Code snippets appear like this example, where we simulate 1 million throws of two six-sided dice:

> set.seed(42)
> throws <- 1e6
> dice <- replicate(2,
+                   sample(1:6, throws, replace = TRUE)
+ )
> table(rowSums(dice))

     2     3     4      5      6      7      8
 28007 55443 83382 110359 138801 167130 138808
     9    10    11    12
110920 83389 55816 27945

Each line of R code in this example is preceded by one of two symbols:

>: The prompt symbol, >, is not part of your code, and you should not type this when you try the code yourself.

+: The continuation symbol, +, indicates that this line of code still belongs to the previous line. In fact, you don’t have to break code across lines, but we do this frequently because it improves readability and helps the code fit on the pages of a book.

Lines that start with neither the prompt nor the continuation symbol are output produced by R. In this case, the output shows how many throws produced each total from 2 through 12. For example, out of 1 million throws, the numbers on the two dice added up to 2 on 28,007 occasions.
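To check whether these counts are plausible, you can compute what probability alone predicts. Each of the 36 combinations of two dice is equally likely, so a total of 2 should come up about 1/36 of the time. The sketch below uses `outer()` to do this for every total. Keep in mind that the exact simulated counts depend on R’s random-number generator, which changed for `sample()` in R 3.6.0, so newer versions of R may print somewhat different counts than those shown above.

```r
# All 36 equally likely outcomes of two six-sided dice, summed
totals <- outer(1:6, 1:6, "+")

# Probability of each total from 2 to 12
probs <- table(totals) / 36

# Expected number of occurrences in 1 million throws, rounded
expected <- round(probs * 1e6)
print(expected)
```

For a total of 2, the expected count is about 27,778, close to the 28,007 the simulation produced.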

You can copy these code snippets and run them in R, but you have to type them exactly as shown. There are only three exceptions:

✔ Don’t type the prompt symbol, >.

✔ Don’t type the continuation symbol, +.

✔ Where you put spaces or tabs isn’t critical, as long as it isn’t in the middle of a keyword. Pay attention to new lines, though.
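The short script below illustrates these rules. It is written without the > and + symbols, so you can paste it as is. The caution about keywords applies equally to multi-character operators such as `<-`, and the last two examples show why line breaks deserve attention.

```r
# Spaces around operators and after commas are optional: x and y are identical
x <- c(1, 2, 3)
y<-c(1,2,3)
identical(x, y)      # [1] TRUE

# But a space inside a multi-character operator changes its meaning:
# "x < - 2" asks whether each element of x is less than -2
x < - 2              # [1] FALSE FALSE FALSE

# A line that ends with an operator is incomplete, so R continues on the next line
total <- 1 +
  2                  # total is 3

# Here the first line is already complete, so "+ 2" runs as a separate command
total2 <- 1
+ 2                  # total2 is 1; R simply prints [1] 2
```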

Code that you type into the R console has the > symbol to its left:

> print("Hello world!")

If you type this into a console and press Enter, R responds with:

[1] "Hello world!"

For convenience, we collapse these two events into a single block, like this:

> print("Hello world!")
[1] "Hello world!"

Functions, arguments, and other R keywords appear in monofont. For example, to create a plot, you use the plot() function. Function names in the text are always followed by parentheses, as in plot(). We don’t include a function’s arguments in the text unless they’re really important.

On some occasions we talk about menu commands, such as File⇒Save. This just means that you open the File menu and choose the Save option.

What You’re Not to Read

You can use this book however works best for you, but if you’re pressed for time (or just not interested in the nitty-gritty details), you can safely skip anything marked with a Technical Stuff icon. You also can skip sidebars (text in gray boxes); they contain interesting information, but nothing critical to your understanding of the subject at hand.

Foolish Assumptions

This book makes the following assumptions about you and your computer:

You know your way around a computer. You know how to download and install software. You know how to find information on the Internet and you have Internet access.

You’re not necessarily a programmer. If you are a programmer, and you’re used to coding in other languages, you may want to read the notes marked by the Technical Stuff icon. There, we fill you in on how R is similar to, or different from, other common languages.

You’re not a statistician, but you understand the very basics of statistics. R For Dummies isn’t a statistics book, although we do show you how to do some basic statistics using R. If you want to understand the statistical stuff in more depth, we recommend Statistics For Dummies, 2nd Edition, by Deborah J. Rumsey, PhD (Wiley).

You want to explore new stuff. You like to solve problems and aren’t afraid of trying things out in the R console.

How This Book Is Organized

The book is organized into six parts. Here’s what each part covers.

Part I: Getting Started with R Programming

In this part, you write your first script. You use the powerful concept of vectors to perform calculations on many values at once. You find out how to work with the R workspace (in other words, how to create, modify, and remove variables). You see how to save your work and how to retrieve and modify script files that you wrote in previous sessions. We also introduce some fundamentals of R (for example, how to install packages).

Part II: Getting Down to Work in R

In this part, we fill you in on the three R’s: reading, ’riting, and ’rithmetic. In other words, you work with text and numbers (and dates for good measure). You also get to use the very important data structures of lists and data frames.

Part III: Coding in R

R is a programming language, so you need to know how to write and understand functions. In this part, we show you how to do this. We also show you how to control the logic flow of your scripts, both by making choices with if statements and by looping through your code to perform repetitive actions. We explain how to make sense of and deal with the warnings and errors you may encounter in your code. Finally, we show you some tools for debugging any issues you run into.

Part IV: Making the Data Talk

In this part, we introduce the different data structures that you can use in R, such as lists and data frames. You find out how to get your data in and out of R (for example, by reading data from files or the Clipboard). You also see how to interact with other applications, such as Microsoft Excel.

Then you discover how easy it is to do some advanced data reshaping and manipulation in R. We show you how to select a subset of your data and how to sort and order it. We explain how to merge different datasets based on columns they may have in common. Finally, we show you a very powerful generic strategy of splitting and combining data and applying functions over subsets of your data. When you understand this strategy, you can use it over and over again to do sophisticated data analyses in only a few small steps.
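As a small taste of the split, apply, and combine strategy, here is a sketch that uses the built-in `mtcars` dataset (our choice for illustration). It splits fuel efficiency (`mpg`) by number of cylinders (`cyl`), takes the mean of each piece, and combines the results into one named vector. `tapply()` performs all three steps in a single call.

```r
# Split: one vector of mpg values for each cylinder count (4, 6, 8)
pieces <- split(mtcars$mpg, mtcars$cyl)

# Apply and combine: compute the mean of each piece and collect the results
mean_mpg <- sapply(pieces, mean)
print(mean_mpg)

# tapply() does the same split, apply, and combine in one step
mean_mpg2 <- tapply(mtcars$mpg, mtcars$cyl, mean)
print(mean_mpg2)
```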

After reading this part, you’ll know how to describe and summarize your variables and data using R. You’ll be able to do some classical tests (for example, calculating a t-test). And you’ll know how to use random numbers to simulate some distributions.
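For a first look at these tools, here is a minimal sketch that uses random numbers to simulate two samples from normal distributions, summarizes one of them, and compares their means with a classical t-test via `t.test()`. The seed, sample sizes, means, and standard deviations are arbitrary choices for illustration.

```r
# Make the simulation reproducible
set.seed(123)

# Simulate two samples of 30 values from normal distributions with different means
group_a <- rnorm(30, mean = 10, sd = 2)
group_b <- rnorm(30, mean = 12, sd = 2)

# Describe one variable: minimum, quartiles, mean, and maximum
summary(group_a)

# Two-sample t-test comparing the means (Welch's version is the default)
result <- t.test(group_a, group_b)
result$p.value
```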

Finally, we show you some of the basics of using linear models (for example, linear regression and analysis of variance). We also show you how to use R to predict the values of new data using models that you’ve fitted to your data.
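Here is a minimal sketch of fitting a linear regression with `lm()` and then predicting values for new data with `predict()`. It assumes the built-in `mtcars` dataset, where `mpg` is fuel efficiency and `wt` is weight in thousands of pounds; the two new car weights are hypothetical.

```r
# Fit a simple linear regression of fuel efficiency on car weight
fit <- lm(mpg ~ wt, data = mtcars)
coef(fit)   # intercept and slope

# Predict mpg for two hypothetical new cars weighing 2,500 and 3,500 lbs
new_cars <- data.frame(wt = c(2.5, 3.5))
predicted <- predict(fit, newdata = new_cars)
print(predicted)
```

The negative slope means heavier cars are predicted to be less fuel efficient, so the second prediction is lower than the first.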

Part V: Working with Graphics

They say that a picture is worth a thousand words. This is certainly the case when you want to share your results with other people. In this part, you discover how to create basic and more sophisticated plots to visualize your data. We start with bar charts and line charts, and then show you how to present cuts of your data using facets.

Part VI: The Part of Tens

In this part, we show you how to do ten things in R that you probably use Microsoft Excel for at the moment (for example, how to do the equivalent of pivot tables and lookup tables). We also give you ten tips for working with packages that are not part of base R.
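As a rough preview, the sketch below shows one way to get a pivot-table-style summary with `tapply()` and a VLOOKUP-style lookup with `match()`. The `sales` and `managers` data frames are hypothetical, invented for illustration, and this is only one of several ways to do these tasks in R.

```r
# A small hypothetical sales table
sales <- data.frame(
  region  = c("North", "South", "North", "South", "North"),
  product = c("A", "A", "B", "B", "A"),
  amount  = c(100, 150, 200, 50, 75)
)

# Pivot-table equivalent: total amount by region (rows) and product (columns)
pivot <- tapply(sales$amount, list(sales$region, sales$product), sum)
print(pivot)

# Lookup-table equivalent: find each region's (hypothetical) manager
managers <- data.frame(region = c("North", "South"), manager = c("Ann", "Bob"))
sales$manager <- managers$manager[match(sales$region, managers$region)]
print(sales)
```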

Icons Used in This Book

As you read this book, you’ll find little pictures in the margins. These pictures, or icons, mark certain types of text:

When you see the Tip icon, you can be sure to find a way to do something more easily or quickly.

You don’t have to memorize this book, but the Remember icon points out some useful things that you really should remember. Usually this indicates a design pattern or idiom that you’ll encounter in more than one chapter.

When you see the Warning icon, listen up. It points out something you definitely don’t want to do. Although it’s really unlikely that using R will cause something disastrous to happen, we use the Warning icon to alert you if something is bound to lead to confusion.

The Technical Stuff icon indicates technical information you can merrily skip over. We do our best to make this information as interesting and relevant as possible, but if you’re short on time or you just want the information you absolutely need to know, you can skip it and move on.

Beyond the Book

R For Dummies includes the following goodies online for easy download:

Cheat Sheet: You can find the Cheat Sheet for this book here:

www.dummies.com/cheatsheet/r

Extras: We provide a few extra articles here:

www.dummies.com/extras/r

Example code: We provide the example code for the book here:

www.dummies.com/extras/r

If we have updates to the content of the book, you can find them here:

www.dummies.com/extras/r

Where to Go from Here

There’s only one way to learn R: Use it! In this book, we try to make you familiar with the usage of R, but you’ll have to sit down at your PC and start playing around with it yourself. Crack the book open so the pages don’t flip by themselves, and start hitting the keyboard!
