Sports Analytics in Practice with R - Ted Kwartler
The R Programming Language
R is an open-source, freely available programming language used throughout this book. R is a powerful and longstanding programming language developed more than 20 years ago. It is a derivative of the “S” programming language for statistics, which originated in the mid-1970s at Bell Labs, the research arm of AT&T and later Lucent Technologies. Unlike general-purpose programming languages, R is optimized specifically for statistics including but not limited to simulation, machine learning, visualizations, and traditional statistical modeling (linear regression) as well as tests. Due to the open-source nature of R, many developers, academics, and enthusiasts have contributed to its development for their specific needs. As a result, the language is extensible, meaning it can easily be adapted for various purposes. For example, through R Markdown, simple websites and presentations can be created. In another use case, R can be used for traditional linear modeling or machine learning and can draw upon various data types for analysis including audio files, digital images, text, numeric, and various other data files and types. Thus, R is widely used and unspecialized beyond being an analysis language. This differs from languages such as Ruby, which specializes in web development, or Python, which has extended its functionality to building applications, not just analysis.
In this textbook, the R language is applied specifically to sports contexts. Of course, the code in this book can be used to extend your understanding of sports analytics. It may give you insights to a particular sport or analytical aspect within the sport itself such as what statistics should be focused on to win a basketball game. However, learning the code in this book can also help open up a world of analytical capabilities beyond sports. One of the benefits of learning statistics, programming, and various analysis methods with sports data is that the data is widely available and outcomes are known. This means that your analysis, models, and visualizations can be applied, and you can review the outcomes as you expand upon what is covered in this book. This differs from other programming and statistical examples which may resort to boring, synthetic data to illustrate an analytical result. Using sports data is realistic and can be future oriented, making the learning more challenging yet engaging. Modeling the survivors of the Titanic pales in comparison since you cannot change the historical outcome or save future cruise ship mates. Thus, modeling which team will win a match or which player is a good draft pick is a superior learning experience.
If you are new to programming, don’t be intimidated. R is a forgiving language in that things like spacing and indentation are ignored. Further, the R community is supportive, and a simple online search of any error message usually finds an answer quickly on any number of sites.
To begin your R and sports analytics journey, please download the “base-R” distribution for your operating system. The “Comprehensive R Archive Network,” CRAN, is the home of the official R distribution as well as officially supported packages (more on that in a bit). The site to download base-R is https://cran.r-project.org.
Unfortunately, base-R, having started in the nineties, looks abysmal and lacks some modern-day functionality. Thus, you will next need to download the R-Studio Integrated Development Environment, or IDE. An IDE is software that consolidates many of the aspects needed to code into one place. For example, you will need a place to write code, which could be done in a simple notepad-like program, a place to execute the code written, a place to visualize plots that were output from the code, and so on. These individual components are assembled into the IDE for ease of use and fast development. R and many other languages have IDEs. In fact, R has multiple IDEs optimized for the type of analysis you are performing, such as biostatistics or working with another language like Java. The most popular and easily supported IDE for base-R is the R-Studio software. There are server and desktop versions available. The code executed in this book should work for either cloud or local, but installation of base-R and R-Studio on a server is not covered. Therefore, please download the R-Studio desktop IDE by navigating to https://www.rstudio.com/products/rstudio.
The R-Studio IDE, or Integrated Development Environment, adds functionality and a modern user interface to base-R. The IDE aggregates common functionality used for software development and statistical analysis.
Essentially R-Studio sits on top of base-R. The IDE provides a modern GUI expected of today’s computer users while also adding functionality including the use of version control, terminal access and perhaps most importantly an easy way to create and view visualizations for easy export and saving to disk. Figure 1.1 illustrates the basic relationship for base-R and R-Studio. As you can see without base-R, the IDE will not function because none of the computational functions exist in the IDE itself.
Figure 1.1 The relationship between base-R and R-Studio.
Now that you have both base-R and R-Studio, let’s start to explore the programming environment. Think of an R environment as a relatively generic piece of statistical software. Once downloaded, it can programmatically perform all the tasks found in many of the popular spreadsheet programs, whether online or on a laptop. The advantage of R is its extensibility, mentioned earlier. R can be specialized from a generic set of statistical tools into a more interesting and nuanced piece of software. This is done by downloading specialized packages and then loading the package for the task at hand in the console.
Figure 1.2 shows the IDE itself without a “script” to be executed. For now, focus on the “console” section in Figure 1.2. This is the lower left-hand side containing a “>” symbol. This is the section where code will be executed and results are returned.
Figure 1.2 The R-Studio IDE console.
The next step is to navigate to “File > New File > R Script” in the upper left of the IDE. This will open another pane in the IDE. The script pane will be located in the upper left section of the IDE and will shrink the console on the lower left-hand side. While the console is where code is executed and computation enacted, the scripting section is where you will write code that is then run within the console. Think of an R script as merely a lightweight text file that can be saved and repeated by running in the console. A script is nothing more than a set of instructions that have not been enacted yet. To save an R script, navigate to “File > Save” and then simply follow the IDE dialog. The rest of the book provides R scripts for you to execute along with explanations along the way. Figure 1.3 shows the new script pane with some basic example code.
Figure 1.3 The upper left R script with basic commands and comments.
Of particular note in the script shown in Figure 1.3 are two comments and two code examples. A comment begins with a `#`. This tells R to ignore everything that follows on that line. As you begin your learning journey programming in R, it is a best practice to add comments to remind yourself of the nuances of the code to be executed. Thus, feel free to make a copy of any scripts throughout the book, add comments, and save them for yourself.
The first code to be executed, beginning on a non-commented line, is a simple arithmetic operation shown below.
2 + 2
Since this is in a script, it will not be run until you send it to the console. Further, as you can guess, the operation `2 + 2` has a single result, `4`. An easy way to run the script is to place your cursor on the line you want to execute and click the “run” icon on the upper right-hand side of the script. When this is done, the code is transferred to the console and executed, returning the single answer as expected. Figure 1.4 illustrates the transfer between script and console.
Figure 1.4 Showing the code execution on line 2 of the script being transferred to the console where the result 4 is printed.
Next, let’s execute another command which will illustrate another pane of the IDE. If your cursor is on line 5 of the R script, `plot(x = 1, y = 2)`, and you click the “run” icon, you will now see a simple scatter plot visual appear in the lower right utility pane titled “Plots.” Each tab of the utility pane is described below:
Files—This is a file navigation view, where you can review folders and files to be used in analysis or saved to disk.
Plots—For reviewing any static visualizations the R code creates. This pane can also be used for resizing the image using a graphical user interface (GUI) and saving the plots to disk.
Packages—Since R needs to be specialized for a particular task, this pane lists your local package library with official documentation and accompanying examples, vignettes, and tutorials.
Help—Provides various resources for obtaining help with R and its many tasks.
Viewer—This pane allows you to view the small webpages and dynamic interactive plots which R can create.
Figure 1.5 shows the result of a basic, yet not visually appealing, scatter plot with a single point. Rest assured the plots throughout the book are more compelling than this simplistic example. The x,y coordinate points are defined in code as `x = 1` and `y = 2`.
Figure 1.5 The basic scatter plot is instantiated in the lower right “Plots” pane.
Next, let’s focus on the remaining upper right pane of the IDE. The primary tab of interest is the “Environment” tab. R works by creating objects, which store data. When an object is created, it is held in active memory, your computer’s RAM. Any active objects in your R session will be shown in the “Environment” tab in the upper right. Add the following code in the script (upper left) pane, then click “run” to instantiate an object in your environment. Notice the first line begins with a `#`, so the non-code comment “Create an object” acts as a signpost for you while the next line actually creates the object. Specifically, the object name is `xVal` and it is declared to have a value of `1`. Moreover, the declaration of the object name to value is done with the assignment operator `<-`. In the R language you can also use an equal sign for the object name assignment. However, most R style guides use the `<-` operator and this book follows that direction.
# Create an object
xVal <- 1
When run, the upper right environment tab will now show an object, `xVal`, that is held in memory for use later in the script. Of course, these objects can become much more complex than a single value. Next, add more code to your script utilizing the `xVal` object rather than declaring the value explicitly. The following code can be added to your script and then run to recreate the simple scatter plot from before. The difference is that R has substituted the `x = xVal` input with `x = 1` since that is the object’s actual value. The only difference in the plots is that the second one has a different x-axis title because the value was derived from the object name. Figure 1.6 now shows the additional code chunks, the new object in the environment, and the recreated plot in the utility pane.
Figure 1.6 The renewed plot with an R object in the environment.
# Create a plot with an object value
plot(x = xVal, y = 2)
The basic functionality of R is underpinned by functions and objects. Each package that specializes R comes with a set of functions usually coordinated for a particular task like data manipulation, obtaining sports data or similar. Functions accept inputs, including objects, and manipulate the inputs most often to create new objects or to overwrite and replace existing objects. For example, the following code creates a new object `newObj` using the assignment operator and on the right-hand side employs a base-R function. Base-R functions do not require any libraries to be loaded, so there is no need to specialize the R environment for a particular task. The `newObj` variable is declared as the result of a function `round` with two input parameters. The first parameter accepts the number to be rounded, `1.23`. The second parameter, `digits = 0`, is a tuning parameter which changes the behavior of the `round` function by declaring the number of decimals to round the input to. Thus, when you add the following code to the script and then execute it in the console, the resulting `newObj` variable has a corresponding value of 1. As before, the `newObj` object will be stored actively and shown in the “Environment” tab. Keep in mind the inputs themselves can be objects, not just declared values. As a result of this behavior, scripts manipulate objects and often pass them to another function later in the script.
# Create a new object with a function
newObj <- round(1.23, digits = 0)
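To make the point about objects as inputs concrete, here is a minimal sketch. The object names `battingAvg`, `roundedAvg`, and `avgLabel` and the value itself are hypothetical and chosen only for illustration; `round` and `paste` are base-R functions.

```r
# Hypothetical object holding a made-up batting average
battingAvg <- 0.28749

# Pass the object, rather than a typed-in number, into round()
roundedAvg <- round(battingAvg, digits = 3)
roundedAvg

# The result of one function can be passed on to another function
avgLabel <- paste("Average:", roundedAvg)
avgLabel
```

Changing the value of `battingAvg` and rerunning the script updates every later line that depends on it, which is the practical benefit of working with objects instead of typed-in values.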
This book will illustrate many functions both in base-R and within specialized packages applied in a sports context. R has many tens of thousands of packages with corresponding functions. Often the rest of this book will defer to base-R functions in an effort for standardization, stability, and ease of understanding rather than utilize an esoteric package. This is a deliberate choice to improve conceptual understanding but does leave room for code optimization and improvement.
There are additional intermediate programming operators that are employed in this book. In fact, there are multiple types of logical and arithmetic operators but for the most part the scripts in this book are focused on one use case at a time, with linear thinking, so you can focus on the concepts and applications more so than concise code. However, Table 1.1 describes the three control flow operators used in the book with a code example for you to try in your script and console. Within the FOR loop, a set of code is run repeatedly with a variable that changes each time through. For the latter two, the IF and IFELSE control flows, a logical statement is evaluated and controls the code’s behavior. If the statement is run and returns TRUE, then the code is executed otherwise it is ignored.
Table 1.1 Three simple control flows in R including the FOR loop, IF and IFELSE statement.
Name | Code | Description |
---|---|---|
FOR loops | for (i in 1:4){ print(i + 2) } | The FOR loop has a dynamic variable `i` which will update a number of times. Here, the `i` value will take on 1, 2, 3, and 4 in turn. The code within the curly brackets executes with the updated `i` value. The first time through the loop `i` equals `1` and with `+ 2` the value `3` is printed to the console. The second time through `i` updates to `2` and is once again added with `+ 2` so that the value `4` is printed. This continues in the loop 4 times because of the `1:4` parameter |
IF statement | if(xVal == 1){ print('xVal is equal to one.') } | The IF statement is a control operator. After the `if` code, a statement is created to check its validity. If the statement inside parentheses evaluates to TRUE, then the code within the curly brackets is executed. In this example, the statement checks whether a variable `xVal` is equal to `1`. Since it is, the code in the curly brackets executes and a message is printed to the console stating “xVal is equal to one.” If the statement does not evaluate to TRUE, the code inside the curly brackets is ignored. For example, if `xVal` were equal to `2`, then the code block would not be run |
IF ELSE statement | if(xVal == 1){ print('xVal is equal to one.') } else { print('xVal is not equal to one.') } | The IF-ELSE control flow adds another layer to the previous IF statement. Now a new set of curly brackets is added along with the `else` keyword. This statement will execute one of the two code chunks within the curly brackets based on the TRUE or FALSE result of the logical statement. Here, if `xVal == 1`, then the first message is printed, same as before. However, for any other value of `xVal`, the second bit of code is run. For example, if `xVal` were equal to `2`, then the IF statement evaluates to FALSE and the second message “xVal is not equal to one.” will be printed to the console |
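The control flows in Table 1.1 can also be nested. The following is a small sketch, not code from the book's scripts, that places the IF-ELSE statement inside the FOR loop so the logical check is repeated for each value of `i`.

```r
# Loop over the values 1 through 4
for (i in 1:4) {
  # The logical statement is re-evaluated on every pass through the loop
  if (i == 1) {
    print('i is equal to one.')
  } else {
    print('i is not equal to one.')
  }
}
```

The first message prints once, on the first pass, and the second message prints on the remaining three passes.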
Another aspect of R programming is that it can utilize various data object types referred to as classes. Previously, the `xVal` object was a single numeric value; however, R can analyze and work with other common data types. First, R can understand the difference between an integer, a whole number, and a numeric value. The distinction is that a numeric data type can be a number with a decimal. Although this difference can seem subtle, in some computational work it has an impact. If you’ve been following the simple code examples in this chapter, you should have `xVal`, `newObj`, and an `i` variable from the previous FOR loop. Reviewing the “Environment” pane you will note the `i` variable has a `4L` instead of just 4. This denotes that the variable is a whole number without a decimal. In contrast, the `xVal` object has a `1` without the “L.” This means R understands this value to be a decimal or floating-point number. You can check the class difference using the `class` function applied to any object. Notice how the third `class` function call switches the returned value to “numeric” when a decimal is added. Often this distinction is not impactful, but there are times, as you will learn in this book, when functions expect specific object types.
class(i)
class(xVal)
class(i + 0.01)
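When a function expects one type and you hold the other, base-R can convert between them. The sketch below uses hypothetical object names (`intVal`, `numVal`, `convertedVal`) and the base-R functions `is.integer` and `as.integer`.

```r
intVal <- 4L   # the L suffix requests an integer
numVal <- 4    # no suffix, so R stores a numeric (floating-point) value

class(intVal)
class(numVal)

# is.integer() checks the storage type rather than whether the value is whole
is.integer(numVal)

# as.integer() converts a whole-number numeric into an integer
convertedVal <- as.integer(numVal)
class(convertedVal)
```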
In addition to integers and numeric values, common R data types include “Boolean” values known in R as “logical” object types. Boolean data types are merely TRUE or FALSE. R can interpret these values as occurring or not occurring as shown in the IF statements. Additionally, for some operations, Boolean values can be interpreted as 1 and 0 for TRUE and FALSE, respectively. For example, in R `TRUE + TRUE` will return a value of `2` while `TRUE - FALSE` will return `1`, because R interprets the Boolean values as 1 - 0. Let’s create a Boolean object called `TFobj` in the code below for use later.
TFobj <- TRUE
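Treating TRUE as 1 and FALSE as 0 is handy for counting. As a small sketch with made-up data, the hypothetical `gameWon` object below records five game outcomes; it uses the `c` function, covered later in this chapter, to hold several values at once.

```r
# Hypothetical win/loss record for five games, TRUE meaning a win
gameWon <- c(TRUE, FALSE, TRUE, TRUE, FALSE)

# sum() adds the values, counting each TRUE as 1 and each FALSE as 0
totalWins <- sum(gameWon)
totalWins

# mean() therefore gives the share of games won
winRate <- mean(gameWon)
winRate
```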
Another data type R often utilizes is a “factor.” A factor is a non-unique description of information. For example, a sports team may be assigned to a conference. Another team may also be assigned to that conference, so it is frequently a repeating value within a data set. The factor has a level, meaning the conference name, and in effect the factor level alone represents specific “meta” information such as the other teams in the conference, and even perhaps some of the team’s schedule. This meta-information is inherited as a pattern within the larger data set, not explicitly defined within the object type. While this may be confusing, it will make sense eventually as the object types and classes move to multiple values instead of single values later in this chapter. The code below simply creates a single object, `teamA`, with a factor defined as the Eastern conference. The function to declare a value as a factor is simply `as.factor`.
teamA <- as.factor('Eastern_Conference')
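Factor levels are easier to see once a factor holds more than one value. The sketch below is an illustration with made-up conference assignments; `teamConference` is a hypothetical name, while `levels`, `nlevels`, and `table` are base-R functions.

```r
# Hypothetical conference assignments for four teams
teamConference <- as.factor(c('Eastern_Conference', 'Western_Conference',
                              'Eastern_Conference', 'Eastern_Conference'))

# The distinct values are stored once as levels
levels(teamConference)
nlevels(teamConference)

# table() counts how often each level repeats
table(teamConference)
```

Although the object holds four values, it has only two levels, which is the sense in which a factor describes repeating, non-unique information.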
In addition to factors, the last commonplace variable type is “character.” Character objects represent natural language, for example, from social media or fan forums that need to be analyzed. The field of character and string analysis is referred to as Natural Language Processing (NLP). These methods and technology underpin the popular smart speakers and voice assistants among other everyday common technologies such as e-mail spam filters. This book devotes one chapter to gauging fan engagement on a popular forum. Thus, this data type will be covered extensively. However, one chapter merely covers the basics of NLP and much more can be accomplished with additional methods, code, and academic literature. Below is a fictitious social media post from a fan. Character values can be declared with `as.character` but, as written here, it is not necessary.
fanTweet <- "I love baseball"
Table 1.2 reviews the common data types used in R and within the book. There are additional data types like `NULL` and `NA` but these are more straightforward, requiring less explanation. Once you have run all the code in the table, you can simply call `class` on each object to check that R is interpreting the object type as expected.
Table 1.2 Common R data types including integer, numeric, logical, factor, and character.
Name | Code | Description |
---|---|---|
“integer” | x <- 5L | A whole number without a decimal point |
“numeric” | y <- 5.123 | A floating point number |
“logical” | z <- TRUE; z <- T # capital T or F is acceptable too | A logical “Boolean” value, either TRUE or FALSE. R will interpret TRUE as 1 and FALSE as 0 for some operations |
“factor” | playerPosition <- as.factor("forward") | A factor is a distinct class often representing non-unique information. The factor classes are referred to as “levels.” Here, a player position is defined as a factor with the level “forward” |
“character” | fanComment <- "I love the hot dogs at the stadium" | Character values, known as strings, represent natural language. Unlike factors, they can be repeating or mutually exclusive. A growing subset of analytics work includes Natural Language Processing (NLP) |
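As a quick check of Table 1.2, the sketch below recreates each object from the table and calls `class` on it. The code uses straight quotation marks, since R does not accept typographic (curly) quotes around strings.

```r
# Recreate the objects from Table 1.2
x <- 5L
y <- 5.123
z <- TRUE
playerPosition <- as.factor("forward")
fanComment <- "I love the hot dogs at the stadium"

# Confirm R interprets each object type as expected
class(x)
class(y)
class(z)
class(playerPosition)
class(fanComment)
```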
Previously, the objects created such as `xVal` and `i` represented a single value. R’s coding environment relies on specific data types and corresponding classes that can be more complex than a single value. For instance, R can create and work with “vectors.” Vectors are merely columns of data that you may be familiar with if you’re coming to R from a spreadsheet program. To create a numeric vector, you employ the combine function which is `c`. In the following code, a vector of numbers is created called `xVec`. The `xVec` object utilizes some of the objects previously created along with additional values that are explicitly declared within the `c`, combine function. Each value within the vector is separated by a comma. Once `xVec` is created, calling it in the console will return multiple values where objects such as `xVal` are now substituted with their numeric equivalents.
xVec <- c(xVal, i, newObj, 345, 678)
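The following sketch rebuilds the earlier objects so it can run on its own, then inspects the vector. It assumes the values from the preceding examples in this chapter, namely `xVal` as 1, `i` as `4L` after the FOR loop, and `newObj` as 1.

```r
# Recreate the objects from earlier in the chapter
xVal <- 1
i <- 4L
newObj <- round(1.23, digits = 0)

# Combine the objects and two typed-in values into one vector
xVec <- c(xVal, i, newObj, 345, 678)
xVec

# A vector has a length, the number of values it holds
length(xVec)

# Mixing an integer with numeric values yields a numeric vector
class(xVec)
```

Note that the integer `i` does not keep its class inside the vector, because a vector holds a single data type.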
Scaling up from a single vector, one method for arranging multiple columns into a single object is with `cbind`. The `cbind` function arranges vectors in a column-wise fashion. Similarly, the `rbind` function will stack vectors as rows. The resulting object type is no longer “numeric” or another previously discussed type, but instead a “matrix” type. A matrix arranges data into rows and columns within a single object. This code creates `xMatrix` using `cbind` and simply repeats the previous vector `xVec` to create a second column. Once executed, the `xMatrix` variable is in the environment and when called demonstrates a five-row by two-column arrangement of the data in a single object. Calling `class` on the object will return “matrix” (along with “array” in R version 4.0 and later).
xMatrix <- cbind(xVec, xVec)
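To see the difference between the two binding functions, the sketch below builds both arrangements from the same vector and checks their dimensions with the base-R `dim` function. The names `colMatrix` and `rowMatrix` are hypothetical, and `xVec` is recreated with the values used earlier.

```r
# Recreate the vector from the earlier example
xVec <- c(1, 4, 1, 345, 678)

# cbind() places each vector side by side as a column
colMatrix <- cbind(xVec, xVec)
dim(colMatrix)

# rbind() stacks each vector as a row
rowMatrix <- rbind(xVec, xVec)
dim(rowMatrix)
```

`dim` reports rows first and then columns, so the column-bound matrix is 5 by 2 while the row-bound matrix is 2 by 5.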
R has another method for arranging data as rows and columns called a data frame. The data frame object type is useful when you are working with mixed data types, for example, a player roster with names as characters, teams as factors, statistics as numeric, and so on. All of these vectors can be organized into a single object using `data.frame`. This code is a bit more complex because it nests functions when constructing the data frame. Within the `data.frame` function call, the first column is named “number1.” It is assigned a value of `xVec` which equates to the numeric values previously constructed. The next column, “logical2,” is separated by a comma and employs the `c` function combining logical values. Next, the “factor3” column is declared. This column has multiple functions including `c` to combine a vector of “a,” “b,” “a,” “b,” and “b” but then it is changed from a simple character vector to a factor using `as.factor`. Finally, the fourth column, “string4,” consists of various character strings. Once instantiated in the console, the `xDataFrame` object can be called to illustrate the mixed data types held within the single object. Table 1.3 shows the results of creating the `xDataFrame` object.
Table 1.3 The constructed data frame with mixed data types.
number1 | logical2 | factor3 | string4 |
---|---|---|---|
1 | TRUE | a | string1 |
4 | TRUE | b | s2 |
1 | FALSE | a | s3 |
345 | FALSE | b | s4 |
678 | TRUE | b | s5 |
xDataFrame <- data.frame(number1 = xVec, logical2 = c(T,T,F,F,T), factor3 = as.factor(c('a','b','a','b','b')), string4 = c('string1', 's2', 's3', 's4', 's5'))
R can employ either a matrix or data frame to arrange data in rows and columns. In both object types, the columns and rows must be complete. For example, you should not `cbind` a vector with three values to another with two values. This makes the data “ragged,” and matrices or data frames require you to fill in the missing cell value with NA. However, some functions require one object class over another. The difference is that a matrix must have all values be of the same data type. For example, each value in all of the columns must be numeric or all logical. If this is not the case, the matrix function will coerce the data into characters automatically, which can cause issues. As a result, most often in this text, the `data.frame` object type is used. However, you can coerce either object type to the other using `as.matrix` or `as.data.frame` to switch. Just keep in mind the mixed data coercion mentioned previously.
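The coercion behavior described above can be observed directly. This is a minimal sketch with made-up values; `mixedMatrix`, `mixedFrame`, and `coercedMatrix` are hypothetical names, and `is.character` and `str` are base-R functions.

```r
# Binding a numeric and a character vector yields a character matrix
mixedMatrix <- cbind(c(1, 2, 3), c('a', 'b', 'c'))
is.character(mixedMatrix)

# A data frame keeps each column's own data type
mixedFrame <- data.frame(number1 = c(1, 2, 3), string2 = c('a', 'b', 'c'))
str(mixedFrame)

# Switching the mixed data frame to a matrix coerces every value to character
coercedMatrix <- as.matrix(mixedFrame)
is.character(coercedMatrix)
```

After the coercion, the numbers are stored as text, so arithmetic on them no longer works without converting them back.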
The last data type discussed in this book is a “list” object. There are other object types including time series and arrays but for the most part this book employs mixed data types, with data frames and sparingly lists. If you are familiar with spreadsheets, think of a list as a “workbook” containing multiple “work sheets.” Each tab of the spreadsheet programs can contain different data even single values and different types. A list is similar in that each list “element” can contain a single value, multiple values, matrices, data frames, or even more lists! The following code creates a list object with varying data types and lengths, while Figure 1.7 is a graphical representation of the list.
Figure 1.7 The representation of the list with varying objects.
xList <- list(xDataFrame, fanTweet, teamA, xVec)
In complex R objects, you can get specific sections of the data by name or through indexing. The previous list has four elements, denoted with double square brackets such as `[[2]]`. To access a specific list element, you can call the object `xList` along with its specific element index as shown below to select the fourth element, the vector of numbers.
xList[[4]]
The same can be done with matrices or data frames using single brackets. Indexing row and column data requires two inputs separated by a comma. The selection for rows is first, followed by the selection for columns. For example, let’s first call the `xDataFrame` object in its entirety to establish familiarity. Then select the first row and third column, which represents a single cell value of the data frame. Next, you can select a different row and column combination on your own within the console to confirm that a single value is returned.
xDataFrame
xDataFrame[1, 3]
Indexing also works for entire columns or entire rows. This is done by leaving the rows position blank or the columns position blank on either side of the comma. To call the second column of the data frame simply use single brackets, nothing on the left of the comma and a 2 to the right of the comma as shown.
xDataFrame[, 2]
Similarly, you can switch the index number to the left of the comma to obtain a specific row. Here, the entire fourth row is returned while the column position is left blank.
xDataFrame[4, ]
Besides the ability to have multiple data types, another benefit of the data frame object is the ability to declare a column by its name using the `$` sign. For example, instead of an index position, the column name `$number1` will return the entire first column of the data frame object. The two methods, indexing or by name, are equivalent and can be used interchangeably as long as the column has a declared name.
xDataFrame$number1
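The equivalence of the two approaches can be confirmed with the base-R `identical` function. This sketch rebuilds a two-column version of the data frame so it runs on its own; `byName` and `byIndex` are hypothetical names.

```r
# A smaller copy of the data frame with the same first two columns
xDataFrame <- data.frame(number1 = c(1, 4, 1, 345, 678),
                         logical2 = c(TRUE, TRUE, FALSE, FALSE, TRUE))

# Select the first column by name and by index position
byName <- xDataFrame$number1
byIndex <- xDataFrame[, 1]

# identical() returns TRUE when both selections hold the same values
identical(byName, byIndex)
```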
In fact, indexing can become more complex. You can access a specific list element, then a specific row, column, or single value by utilizing double then single brackets or `$` as shown. First, the fourth element of the list is obtained with `[[4]]`; then the second value is obtained within that vector. Keep in mind there is no need for a comma because a vector does not have rows or columns. Instead, a vector merely has a position. In this case, “4” is returned because it is the second value of `xVec`.
# 4th element, vector 2nd position
xList[[4]][2]
Next, the first list element is accessed, and as a data frame, the single brackets with a comma refer to the second row.
# 1st element, 2nd row
xList[[1]][2, ]
Similarly, the same data frame is indexed to return the first column because the “1” is to the right of the comma within the single square brackets.
# 1st element, 1st column
xList[[1]][, 1]
Of course, you can also use both rows and column positions separated by the comma within the single brackets.
# 1st element, 2nd row, 1st column
xList[[1]][2, 1]
Just to make things a bit more complex, if the list element is a data frame with named vectors, the second part of the code can employ the `$` along with the name. This will return the first list element, a data frame, and only the named column called “logical2.”
# 1st element, named column with $
xList[[1]]$logical2
Lastly, since the column of this list element is being accessed, it too can be indexed. Once again, the single column does not have a row and column pairing, it only has a position. Thus, no comma is needed and only the third position is returned in this example.
# 1st element, named column with $, third position
xList[[1]]$logical2[3]
If all this seems wildly complex, do not fret. Throughout the book extensive explanation is given for functions, inputs, and indexing. Further, with enough practice, this becomes commonplace and more readily understood.
So far, this basic explanation of R functionality has relied on base-R functions and libraries that are part of the standard installation. As mentioned previously, R can be specialized to a particular task by loading libraries. In order to obtain libraries, the `install.packages` function must be run with a package name to download the specialized functions. This is done only once per library so that the library code is installed locally to your R installation. After the download occurs you can merely call the `library` function with the name in order to enable the specialized functionality using the local installation. The code below installs a popular graphics library called “grammar of graphics,” known as `ggplot2`, using the `install.packages` function. After it is downloaded, the next line merely loads it as part of your R environment. This allows your R session to call functions within a “namespace” that includes base-R and now `ggplot2` functions. It serves the purpose of specializing R for improved visualizations.
install.packages('ggplot2')
library(ggplot2)
Throughout this book, multiple libraries are loaded. Novice R programmers can run into errors and frustrations regarding package installations. When executing scripts in this book that begin with `library(…)`, an error of “there is no package called …” means you first need to use `install.packages` to download the functionality to your library. Additionally, errors may occur during the `install.packages` step. This can be due to multiple reasons but most often stems from the fact that a package to be downloaded requires another package first. As a result, carefully read the console messages during the install phase to identify any other package prerequisites. If the `install.packages` function executes correctly, then it is not necessary to repeat that function for each script. Thus, the code in this book only calls `library` for each specific library enabling corresponding functionality needed for the task at hand. This assumes all libraries have been previously and successfully installed.
To specialize R, first install a package with `install.packages` and the corresponding name. If installed without issue, simply call `library` any time your R session needs specialized functionality corresponding to the specific library. You will only need to use `install.packages` once, but `library` will need to be called each time you start R and require the specialized functions of a particular library.
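One way to avoid reinstalling a package that is already present is to check for it first. The helper below is a hypothetical sketch and not part of the book's scripts; the name `loadPackage` is invented here, while `requireNamespace`, `install.packages`, and `library` are base-R functions. It is demonstrated on `stats`, a package that ships with R, so nothing is downloaded when the example runs.

```r
# Hypothetical helper: install a package only if it is missing, then load it
loadPackage <- function(pkg){
  # requireNamespace() returns FALSE when the package is not installed locally
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # character.only = TRUE tells library() that pkg holds the name as a string
  library(pkg, character.only = TRUE)
}

# 'stats' is part of the standard installation, so no download occurs
loadPackage('stats')
```

Calling `loadPackage('ggplot2')` would follow the same steps and would download the package only when it is not yet in your local library.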
In at least one instance in the book, a custom function is needed to make the code more concise. A custom function is like any other function loaded from a library. It is defined for an operation, requires an input, and returns a value or object. The code below creates a simple custom function as an example. The function is declared as `plus3` with the `function` statement. Next, the input parameter is declared as `x`. This means the function will be called `plus3` and requires an input temporarily called `x`. What happens to `x` occurs within the curly brackets. In this case, a simple operation `x + 3` overwrites the internal value of `x` and the new value is returned. The function will be an object in the environment and can accept any numeric or integer value. Here, the function is created and then applied to a value of 2. The output is itself assigned to an object, `exampleThree`.
plus3 <- function(x){
  x <- x + 3
  return(x)
}
exampleThree <- plus3(2)
exampleThree
Of course, functions can be more complex. As an example, the following function is made to be more dynamic by adding a new parameter, called `value`. Now both are required for the function to operate. The `x` value is now divided by the `value` input parameter that is passed into the function. Additionally, before the result is returned from the function, the `round` function is applied, further adjusting the preceding division. In the end, for example, the custom function `divideVal` will accept a number 5, divide it by 2, and then round the result so that it returns the value 2.
divideVal <- function(x, value){
  x <- x / value
  x <- round(x)
  return(x)
}
exampleValue <- divideVal(5, 2)
exampleValue
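A result of 2 may be surprising if you expected 2.5 to round up to 3. Base-R's `round` follows the IEC 60559 convention of rounding a trailing 5 to the nearest even digit. The sketch below repeats the function so it runs on its own and compares two inputs; `roundDown` and `roundUp` are hypothetical names.

```r
# The same custom function as above
divideVal <- function(x, value){
  x <- x / value
  x <- round(x)
  return(x)
}

# 5 / 2 is 2.5, which rounds to the even number 2
roundDown <- divideVal(5, 2)
roundDown

# 7 / 2 is 3.5, which rounds to the even number 4
roundUp <- divideVal(7, 2)
roundUp
```

Both inputs end in .5 after the division, yet one rounds down and the other rounds up, because each moves to the nearest even whole number.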