The Big R-Book
Contents
Philippe J. S. De Brouwer. The Big R-Book
Table of Contents
List of Tables
List of Illustrations
THE BIG R-BOOK. FROM DATA SCIENCE TO LEARNING MACHINES AND BIG DATA
Foreword
About the Companion Site
About the Author
Preface
Acknowledgements
♣1♣ The Big Picture with Kondratiev and Kardashev
Notes
♣2♣ The Scientific Method and Data
The Scientific Method
Note
♣3♣ Conventions
Listing 3.1: This is what you would see if you start R in the command line terminal. Note that the last sign is the R-prompt, inviting you to type commands. This code fragment is typical for how code that is not in the R-language has been typeset in this book
Listing 3.2: Another example of command line instructions: factor, calc, and pi. This example only has CLI code and does not start R
Question #1 Histogram
Definition: This is a definition
Function use for mean()
Example: Mean
Example: Mean
Hint – Using the hint boxes
Note – Layout details
Digression – This is good to know
Warning – Read comments in code
Note – Shadow
Notes
♣4♣ The Basics of R
Hint – Getting more help
4.1 Getting Started with R
Hint – Using R Online
RStudio
Hint – RStudio is free
Basic arithmetic
Hint – White space
Editing variables interactively
Further information – Other ways to import data
Warning – Using CLI tools
Batch mode
4.2. Variables
Assignment
Variable Management
Note – What are invisible variables
4.3 Data Types
4.3.1 The Elementary Types
Warning – Changing data types
Dates
Further information – More about dates
4.3.2 Vectors. 4.3.2.1 Creating Vectors
4.3.3 Accessing Data from a Vector
4.3.3.1 Vector Arithmetic
Warning – Not all operations are element per element
4.3.3.2 Vector Recycling
Warning – Vector recycling
4.3.3.3 Reordering and Sorting
Question #2 Temperature conversion
Hint – Addressing the object nottem
4.3.4 Matrices
4.3.4.1 Creating Matrices
4.3.4.2 Naming Rows and Columns
4.3.4.3 Access Subsets of a Matrix
4.3.4.4 Matrix Arithmetic
Question #3 Dot product
Note – Percentage signs point towards matrix operations
Warning – R consistently works element by element
4.3.5 Arrays
4.3.5.1 Creating and Accessing Arrays
4.3.5.2 Naming Elements of Arrays
4.3.5.3 Manipulating Arrays
4.3.5.4 Applying Functions over Arrays
Function use for apply()
4.3.6 Lists
4.3.6.1 Creating Lists
Definition: List
Further information – Object-oriented programming in R
4.3.6.2 Naming Elements of Lists
4.3.6.3 List Manipulations
Lists of Lists Are Also Lists
Further information – Double square brackets
Add and Delete Elements of a List
Warning – Deleting elements in lists
Convert list to vectors
Warning – Silent failing of unlist()
4.3.7 Factors
4.3.7.1 Creating Factors
Digression – The reduced importance of factors
4.3.7.2 Ordering Factors
Generate Factors with the Function gl() Function use for gl()
Question #4
Question #5
Question #6
4.3.8 Data Frames. 4.3.8.1 Introduction to Data Frames
4.3.8.2 Accessing Information from a Data Frame
Warning – Avoiding conversion to factors
4.3.8.3 Editing Data in a Data Frame
4.3.8.4 Modifying Data Frames. Add Columns to a Data-frame
Adding Rows to a Data-frame
Merging data frames
Short-cuts
Warning – Short-cuts can be dangerous
Naming Rows and Columns
Question #7
4.3.9 Strings or the Character-type
Example: Using strings
Note – Paste
Formatting with
Function use for format()
Formatting examples
Further information – format()
Other string functions
4.4 Operators
4.4.1 Arithmetic Operators
Warning – Element-wise operations in R
4.4.2 Relational Operators
4.4.3 Logical Operators
Note – Numeric equivalent and logical evaluation
4.4.4 Assignment Operators
Hint – Assignment
Digression – For C++ programmers
Warning – Sparingly change variables in other environments
4.4.5 Other Operators
Warning – Redefine existing operators
4.5 Flow Control Statements
4.5.1 Choices. 4.5.1.1 The if-Statement
Function use for if()
Hint – Extending the if-statement
4.5.1.2 The Vectorised If-statement
4.5.1.3 The Switch-statement
4.5.2 Loops
4.5.2.1 The For Loop
Function use for for()
Example: For loop
Note – No counter in the for loop
4.5.2.2 Repeat
Function use for repeat()
Example: Repeat loop
Warning – Break out of the repeat loop
4.5.2.3 While
Function use for while()
Example: While loop
4.5.2.4 Loop Control Statements
Digression – The speed of loops
Further information – Speed
4.6 Functions
4.6.1 Built-in Functions
4.6.2 Help with Functions
Help with functions
Further information on packages
4.6.3 User-defined Functions
Function use for function()
Example: A bespoke function
4.6.4 Changing Functions
Hint
4.6.5 Creating Function with Default Arguments
Example
Example: default value for function
4.7 Packages
4.7.1 Discovering Packages in R
Example: loading the package DiagrammeR
Further information – Packages
Useful functions for packages
Further information – All available packages
Further information – All installed packages
4.7.2 Managing Packages in R
Note – Cold code in this section
4.8 Selected Data Interfaces
4.8.1 CSV Files
Hint – Reading files directly from the Internet
Finding data
Writing to a CSV file
Warning – Silently added rows
4.8.2 Excel Files
4.8.3 Databases
Connecting to the Database
Fetching Data From a Database
Update Queries
Create Tables from R Data-frames
Warning – Closing the database connection
Notes
♣5♣ Lexical Scoping and Environments. 5.1 Environments in R
5.2 Lexical Scoping in R
Note – Dynamic scoping
Warning – Dynamical scoping
Hint – Write readable code
Note
♣6♣ The Implementation of OO
6.1. Base Types
6.2. S3 Objects
Hint – Naming conventions
6.2.1 Creating S3 Objects
6.2.2 Creating Generic Methods
6.2.3 Method Dispatch
Note – Avoid direct calls
Hint – Speed gain
6.2.4 Group Generic Functions
Note – Distinguish groups and functions
Hint – Find what is the next method
6.3. S4 Objects
Hint – Loading the library methods
6.3.1 Creating S4 Objects
Note – Difference between inheritance and methods
6.3.2 Using S4 Objects
Note – Compare addressing slots in S4 and S3
Warning – Partial matching
Hint – Alternative to address slots
Question #8
Question #9
Hint – List all slots
6.3.3 Validation of Input
Warning – Silent setting to default
Warning – Changing class definitions at runtime
Hint – Locking a class definition
Hint – Typesetting conventions
6.3.4 Constructor functions
Hint – Calling the constructor function
6.3.5 The .Data slot
6.3.6 Recognising Objects, Generic Functions, and Methods
Note – Nuances in the OO system
6.3.7 Creating S4 Generics
Warning – Overloading functions
6.3.8 Method Dispatch
6.4. The Reference Class, refclass, RC or R5 Model
Note – Recent developments
6.4.1 Creating RC Objects
Note
Hint
Note – Assigning in the encapsulating environment
Note – Addressing attributes and methods
Note – No dynamic editing of field definitions
6.4.2 Important Methods and Attributes
6.5. Conclusions about the OO Implementation
Digression – R6
Notes
♣7♣ Tidy R with the Tidyverse. 7.1. The Philosophy of the Tidyverse
Tidy Data
Tidy Conventions
Further information – Tidyverse philosophy
7.2. Packages in the Tidyverse
Digression – Calling methods of not loaded packages
7.2.1 The Core Tidyverse
7.2.2 The Non-core Tidyverse
Warning – Work in progress
7.3. Working with the Tidyverse. 7.3.1 Tibbles
Digression – Special characters in column names
Hint
Digression – Changing how a tibble is printed
Hint – Viewing the content of a tibble
7.3.2 Piping with R
Example: – Pipe operator
Hint – Pronouncing the pipe
Note – Equivalence of piping and nesting
7.3.3 Attention Points When Using the Pipe
Further information – Error catching
7.3.4 Advanced Piping. 7.3.4.1 The Dollar Pipe
Note – Using functions without brackets
7.3.4.2 The T-Pipe
7.3.4.3 The Assignment Pipe
Warning – Assignment pipe
7.3.5 Conclusion
Hint – Use pipes sparingly
Notes
♣8♣ Elements of Descriptive Statistics
8.1. Measures of Central Tendency
8.1.1 Mean
8.1.1.1 The Arithmetic Mean
Definition: Arithmetic mean
Hint – Outliers
8.1.1.2 Generalised Means
Definition: f-mean
The Power Mean
Example: Which mean makes most sense?
8.1.2 The Median
8.1.3 The Mode
Hint – Use default values to keep code backwards compatible
8.2. Measures of Variation or Spread
Definition: Variance
8.2.1 Standard Deviation. Definition: Standard deviation
8.2.2 Median absolute deviation. Definition: mad
8.3. Measures of Covariation
8.3.1 The Pearson Correlation
8.3.2 The Spearman Correlation
Question #10
Warning – Correlation is more specific than relation
8.3.3 Chi-square Tests
Chi-Square test in R. Function use for chisq.test()
8.4. Distributions
8.4.1 Normal Distribution
The Normal Distribution in R
Illustrating the Normal Distribution
Case Study: Returns on the Stock Exchange
8.4.2 Binomial Distribution
The Binomial Distribution in R
An Example of the Binomial Distribution
8.5. Creating an Overview of Data Characteristics
Note – A tibble is a special form of data-frame
Notes
♣9♣ Visualisation Methods
9.1 Scatterplots
Function use for plot() – for a scatterplot
Further information – See the code
9.2 Line Graphs
Function use for plot() – for line plots
A line-plot example
9.3 Pie Charts
Function use for pie()
9.3.1 Pie chart example
9.4 Bar Charts
The function barplot() Function use for barplot()
Stacked bar charts
Barplots With Total of 100 Percent
Warning – Scaled boxplots
9.5 Boxplots
Function use for boxplot()
9.6. Violin Plots
Further information – ggplot2
9.7 Histograms
Function use for hist()
9.8 Plotting Functions
9.9 Maps and Contour Plots
9.10 Heat-maps
Function use for heatmap()
9.11 Text Mining
9.11.1 Word Clouds
Example: – The text of this book
Step 1: Importing the Text
Step 2: Cleaning the Text
Hint – Visualize the text-file
Step 3: Build a Term-document Matrix
Step 4: Generate the Word-cloud
Function use for wordcloud()
9.11.2 Word Associations
Word Associations in R
9.12 Colours in R
Hint – Using American English
Hint – Online list of colours for R
Colour sets
Further information – More plots
Notes
♣10♣ Time Series Analysis. 10.1 Time Series in R
10.1.1 The Basics of Time Series in R. 10.1.1.1 The Function ts()
Function use for ts()
10.1.1.2 Multiple Time Series in one Object
10.2 Forecasting
10.2.1 Moving Average. 10.2.1.1 The Moving Average in R
Example: – GDP data
10.2.1.2 Testing the Accuracy of the Forecasts
Testing the Accuracy of Forecasts – Backtesting
10.2.1.3 Basic Exponential Smoothing
10.2.1.4 Holt-Winters Exponential Smoothing
Holt Exponential Smoothing
10.2.2 Seasonal Decomposition
Note – Exponential trends
Exponential Models
Question #11
Note
♣11♣ Further Reading
Hint – What if you are stuck?
Further information – CRAN
♣12♣ A Short History of Modern Database Systems
Notes
♣13♣ RDBMS
Notes
♣14♣ SQL. 14.1 Designing the Database
14.2 Building the Database Structure. 14.2.1 Installing a RDBMS
Listing 14.1: Installing MySQL on a Linux computer is easy and straightforward. Here shown for a Debian based system
Further information – MariaDB
Listing 14.2: Starting MySQL as root user. The first line is the command in the CLI, the last line is the MySQL prompt, indicating that we are now in the MySQL shell
Hint – Hardening the database server
Note – Similarities between the R CLI and the MySQL CLI
14.2.2 Creating the Database
Listing 14.3: Create the database in which all tables will be created
Hint – Comments in SQL
14.2.3 Creating the Tables and Relations
Listing 14.4: Starting MySQL, as user “libroot.” Note that this is done from the Linux CLI
Listing 14.5: Create the table tbl_authors
Digression – SQL is not case sensitive
Listing 14.6: This SQL code block creates the table tbl_books and then defines an index on two of its fields
Listing 14.7: Manage indexes in MySQL
Note – Impossible definitions are possible
Listing 14.8: This SQL code creates the table tbl_genres and then checks if it is really there
Digression – UTF8 collation
Listing 14.9: Checking the structure of the table tbl_books
14.3 Adding Data to the Database
Listing 14.10: Logging in as user “librarian.”
Listing 14.11: Adding the first author to the database
Note – Providing values for automatically incremented fields
Listing 14.12: An alternative way to add the author to the book by specifying the fields provided
Note – Missing values
Listing 14.13: This SQL code adds all books in one statement
Hint – Input errors
Hint – View the data in a table
Listing 14.14: Add the data to the table tbl_genres
Listing 14.15: Add the data to the table tbl_books
14.4 Querying the Database
Digression – Can a FK be NULL?
14.4.1 The Basic Select Query
Listing 14.16: Some examples of SELECT-queries. Note that the output is not shown here, simply because it would be too long
Note – Working with dates in SQL
14.4.2 More Complex Queries
Note – Using NULL
Digression – MySQL specific note
Listing 14.17: This code can be used to show that a manual linking of fields leads to the same records as a left join. Note that the output is not provided
Hint – Automated linking
Note – The difference between right and left joins
Hint – Tidy queries
Further information – MySQL
14.5 Modifying the Database Structure
Digression – More than one solution
Further information
Listing 14.18: This code first creates the table tbl_author_books and then copies into that table the necessary information that was already in the database. Finally, it discards the old information
Digression – Matching unknown strings
Note – Composite PKs
Listing 14.19: Finally, we can add our book that has more than one author to our database
Digression – Removing a variable in SQL
14.6 Selected Features of SQL
14.6.1 Changing Data
Listing 14.20: Capitalize the first letters of all full names of authors
14.6.2 Functions in SQL
Listing 14.21: Creating a function and using it in SQL
Listing 14.22: Just a little taste of some additional features in SQL. We encourage you to learn more about SQL. This piece of code introduces functions, variables, and the UNION-query
Hint – Delimiter in SQL
Hint – Make a backup of the database
Further information – More about SQL
Notes
♣15♣ Connecting R to an SQL Database
Hint – RODBC
Digression – MariaDB
Warning – Batch environment
Hint – Clearing the query cache
Note
PART IV Data Wrangling
Hint – RDBMS and R
Notes
♣16♣ Anonymous Data
Listing 16.1: SQL code for MySQL (or MariaDB) to encrypt using AES256. Note that these relational database systems (RDBMSs) provide many more methods for encryption. It is worth going through the documentation of your particular system for more support
Further information – Cryptology
Notes
♣17♣ Data Wrangling in the tidyverse
17.1 Importing the Data. 17.1.1 Importing from an SQL RDBMS
17.1.2 Importing Flat Files in the Tidyverse
Further information – readr
17.1.2.1 CSV Files
Note – Separator specific functions
Hint – Check data-type before importing
Question #12 Importing difficult files
17.1.2.2 Making Sense of Fixed-width Files
Note – Automated downloading and decompressing
17.2 Tidy Data
17.3 Tidying Up Data with tidyr
Definition: – Tidy data
Hint – The tidyverse philosophy
17.3.1 Splitting Tables
17.3.2 Convert Headers to Data
17.3.3 Spreading One Column Over Many
Note – spread() and gather()
17.3.4 Split One Column into Many
Note – Fixed width separation
17.3.5 Merge Multiple Columns Into One
17.3.6 Wrong Data
17.4 SQL-like Functionality via dplyr
Hint – Mix all data wrangling techniques
17.4.1 Selecting Columns
Note – Using pipes
17.4.2 Filtering Rows
Note – Name-space conflict
Hint – Equivalence between dplyr and SQL
17.4.3 Joining
Hint – Sort order
Note – Remove duplicates
Note – Short-cuts
17.4.4 Mutating Data
Hint – Advanced mutating
Warning – Difference between filter() and joins
17.4.5 Set Operations
Note – Column headings in data-frames
17.5 String Manipulation in the tidyverse
Hint – Naming convention of functions
Warning – str_c() does not return the C-string
17.5.1 Basic String Manipulation
Hint – Replacing sub-strings
Duplicate Strings
Manage White Space
Determining Order and Sorting Strings
17.5.2 Pattern Matching with Regular Expressions
17.5.2.1 The Syntax of Regular Expressions
Digression – Advanced email matching
Note – Single or double escape characters
Lazy and Greedy Quantifiers
Other Regex Aspects
Hint – General methods in R
Further information – Regex
Regex for Humans with rex
17.5.2.2 Functions Using Regex
Detect a Match
Locate
Note – Using locating functions as boolean
Replace
Extract
Split strings Using the Match as Separator
Further information about regex
17.6 Dates with lubridate
17.6.1 ISO 8601 Format
Digression – R's internal date-format
Warning – Dates as numbers can be confusing
Hint – Other date formats
Note – Today's date
17.6.2 Time-zones
Hint – Available time-zones
Hint – Create date-time from split data
17.6.3 Extract Date and Time Components
Note – No side effects
17.6.4 Calculating with Date-times
17.6.4.1 Durations
17.6.4.2 Periods
Note – Period functions starting letter
17.6.4.3 Intervals
17.6.4.4 Rounding
17.7 Factors with Forcats
Warning – Unmatched labels
Hint – Find out which factor levels exist
Other functions from forcats
Hint – Use regex
Further information – Plots
Question #13
Notes
♣18♣ Dealing with Missing Data
18.1 Reasons for Data to be Missing
Example: Unclear questions
18.2 Methods to Handle Missing Data. 18.2.1 Alternative Solutions to Missing Data
Example
Note – The randomness in our example
18.2.2 Predictive Mean Matching (PMM)
18.3 R Packages to Deal with Missing Data. 18.3.1 mice
18.3.2 missForest
Hint – Fine-tuning
18.3.3 Hmisc
Further information – The package mi
Notes
♣19♣ Data Binning. 19.1 What is Binning and Why Use It
Further information – Bias
Hint – Automated binning
19.2 Tuning the Binning Procedure
Further information – ggplot2
A Model Without Binning
A Model with Binning
Note – Reasons for Binning
19.3 More Complex Cases: Matrix Binning
Question #14 Binary dependent variables
Question #15 Think outside the box
Question #16 Think deeper
19.4 Weight of Evidence and Information Value
19.4.1 Weight of Evidence (WOE)
19.4.2 Information Value (IV)
19.4.3 WOE and IV in R
Further information – Other packages used
Digression – The function kable()
Question #17
Notes
♣20♣ Factoring Analysis and Principal Components
20.1 Principal Components Analysis (PCA)
Hint – Fine tuning the function princomp()
Hint – Executing PCA before fitting a model
20.2 Factor Analysis
Hint – Customisation of factanal()
Further information – More tools for factor analysis
Note
♣21♣ Regression Models. 21.1 Linear Regression
Linear Regression
Question #18 – Build a linear model
21.2 Multiple Linear Regression
Question #19 – Build a multiple linear regression
21.2.1 Poisson Regression
Definition: Poisson Regression
Function use for glm()
Note – The function glm()
21.2.2 Non-linear Regression
Function use for nls()
Hint – Shorthand notation
21.3 Performance of Regression Models
21.3.1 Mean Square Error (MSE)
Definition: Mean Square Error (MSE)
21.3.2 R-Squared
Definition: R-squared
Further information – About the summary
Question #20 – Find a better model
21.3.3 Mean Average Deviation (MAD)
Definition: Mean average deviation (MAD)
♣22♣ Classification Models
22.1. Logistic Regression
Definition: – Generalised logistic regression
Logistic Regression. Definition: – Additive logistic regression
Note – Multiple uses for glm()
22.2. Performance of Binary Classification Models
22.2.1 The Confusion Matrix and Related Measures
22.2.2 ROC
22.2.3 The AUC
22.2.4 The Gini Coefficient
22.2.5 Kolmogorov-Smirnov (KS) for Logistic Regression
Further information – code for plots
22.2.6 Finding an Optimal Cut-off
Hint – Backwards compatibility
Notes
♣23♣ Learning Machines
23.1. Decision Tree
23.1.1 Essential Background. 23.1.1.1 The Linear Additive Decision Tree
Note – Not all sub-sets are possible
23.1.1.2 The CART Method
23.1.1.3 Tree Pruning
23.1.1.4 Classification Trees
Digression – Drawing the misclassification functions
23.1.1.5 Binary Classification Trees
23.1.2 Important Considerations
23.1.2.1 Broadening the Scope
23.1.2.2 Selected Issues
23.1.3 Growing Trees with the Package rpart. 23.1.3.1 Getting Started with the Function rpart()
Function use for rpart()
Further information about rpart
23.1.3.2 Example of a Classification Tree with rpart
23.1.3.3 Visualising a Decision Tree with rpart.plot
23.1.3.4 Example of a Regression Tree with rpart
23.1.4 Evaluating the Performance of a Decision Tree. 23.1.4.1 The Performance of the Regression Tree
23.1.4.2 The Performance of the Classification Tree
Hint – Confusion matrix ready from a package
23.2. Random Forest
Note – Variations for the random forest
23.3. Artificial Neural Networks (ANNs) 23.3.1 The Basics of ANNs in R
23.3.2 Neural Networks in R
Function use for neuralnet()
23.3.3 The Work-flow for Fitting a NN
Step 1: Missing Data
Hint – Never delete more data than necessary
Step 2: Split the Data in Test and Training Set
Step 3: Fit a Challenger Model
Step 4: Rescale the Data and Split into Training and Testing Set
Step 5: Train the ANN on the Training Set
Step 6: Test the Model on the Test Data
23.3.4 Cross Validate the NN
Note – Cross validation without modelr
23.4. Support Vector Machine
23.4.1 Fitting a SVM in R
Function use for svm()
Note – Optimisation types
23.4.2 Optimizing the SVM
Further information – Cross validation
23.5. Unsupervised Learning and Clustering
23.5.1 k-Means Clustering
Warning – Hard problems
23.5.1.1 k-Means Clustering in R
Note – Elegant labels
Warning – Distance and units
Question #21 – k-Means
Hint – Adding Voronoi cell borders
23.5.1.2 PCA before Clustering
Note – Autoplot or native ggplot?
23.5.1.3 On the Relation Between PCA and k-Means
23.5.2 Visualizing Clusters in Three Dimensions
23.5.3 Fuzzy Clustering
23.5.4 Hierarchical Clustering
Note – Scaling data
Further information – Ward’s method
Digression – A nicer dendrogram
23.5.5 Other Clustering Methods
Notes
♣24♣ Towards a Tidy Modelling Cycle with modelr
24.1 Adding Predictions
Function use for add_predictions()
24.2 Adding Residuals
Function use for add_residuals()
24.3 Bootstrapping Data
Function use for bootstrap()
24.4 Other Functions of modelr
♣25♣ Model Validation
25.1 Model Quality Measures
25.2 Predictions and Residuals
25.3 Bootstrapping
25.3.1 Bootstrapping in Base R
Function use for sample()
25.3.2 Bootstrapping in the tidyverse with modelr
25.4 Cross-Validation
25.4.1 Elementary Cross Validation
Note – Which data to use for testing?
Further information – To refit or not to refit?
Hint – Split Size
25.4.2 Monte Carlo Cross Validation
Note – Simple cross validation
Digression – Advanced piping
25.4.3 k-Fold Cross Validation
25.4.4 Comparing Cross Validation Methods
Further information – Hybrid cross validation methods
25.5 Validation in a Broader Perspective
Notes
♣26♣ Labs. 26.1 Financial Analysis with quantmod
26.1.1 The Basics of quantmod
26.1.2 Types of Data Available in quantmod
Hint – Other data sources
26.1.3 Plotting with quantmod
Hint – short-cuts for dates
26.1.4 The quantmod Data Structure
26.1.4.1 Sub-setting by Time and Date
26.1.4.2 Switching Time Scales
26.1.4.3 Apply by Period
26.1.5 Support Functions Supplied by quantmod
26.1.6 Financial Modelling in quantmod. 26.1.6.1 Financial Models in quantmod
26.1.6.2 A Simple Model with quantmod
26.1.6.3 Testing the Model Robustness
Notes
♣27♣ Multi Criteria Decision Analysis (MCDA) 27.1 What and Why
Digression – Corporate change time scales
27.2 General Work‐flow
Step 1: Explore the Big Picture
Step 2: Identify the Problem at Hand
Note – MCDA needs a structured approach
Step 3: Get Data, Construct and Normalise the Decision Matrix
Step 4: Leave Out Unacceptable and Inefficient Alternatives
Step 5: Use a Multi Criteria Decision Method to Get a Ranking
Step 6: Recommend a Solution
Definition: MCDA wording
27.3 Identify the Issue at Hand: Steps 1 and 2
Digression – SWOT Analysis
27.4 Step 3: the Decision Matrix. 27.4.1 Construct a Decision Matrix
27.4.2 Normalize the Decision Matrix
27.5 Step 4: Delete Inefficient and Unacceptable Alternatives. 27.5.1 Unacceptable Alternatives
27.5.2 Dominance – Inefficient Alternatives
Warning – Rescale the decision matrix
27.6 Plotting Preference Relationships
Note – Dots
27.7 Step 5: MCDA Methods
Definition: Efficient Solutions
27.7.1 Examples of Non‐compensatory Methods
27.7.1.1 The MaxMin Method
27.7.1.2 The MaxMax Method
27.7.2 The Weighted Sum Method (WSM)
Digression – Colour schemes
Note – MCDA is not exact science
27.7.3 Weighted Product Method (WPM)
27.7.4 ELECTRE
Definition: Preference of one solution over another
Definition: Anti‐preference of one solution over another
Definition: The Weighted Degree of Indifference
Note – Should the weights be the same as in WSM?
27.7.4.1 ELECTRE I
Definition: Index of comparability of Type 1
Definition: Index of comparability of Type 2
Definition: Kernel of an MCDA problem
ELECTRE I in R
Digression – Passing on matrices as a list
Note – Transitiveness of preference
Warning – Handle results with care
Conclusion for ELECTRE I
27.7.4.2 ELECTRE II
Note – Alternative
Note – ELECTRE II
27.7.4.3 Conclusions ELECTRE
27.7.5 PROMethEE
The Idea of PROMethEE
Definition: Distance dk(a, b)
Further information – code for plots
27.7.5.1 PROMethEE I. Note – Avoiding compensation
The Preference Relations
PROMethEE I in R
Question #22 – Avoiding the Side effect in the function
Note – Symmetry in the preference functions
Advantages and Disadvantages of PROMethEE I
27.7.5.2 PROMethEE II
27.7.6 PCA (Gaia)
27.7.7 Outranking Methods
27.7.8 Goal Programming
27.8 Summary MCDA
Further information – MCDA
Digression – Step 6
Notes
PART VI Introduction to Companies
♣28♣ Financial Accounting (FA)
28.1 The Statements of Accounts
28.1.1 Income Statement. Definition: Income Statement
28.1.2 Net Income: The P&L statement
Definition: P & L
Definition: NOPAT
28.1.3 Balance Sheet
Definition: Balance Sheet
28.2 The Value Chain
Value Creation
Observation of Value Creation
28.3 Further Terminology
Definition: Loans
Note – Wording loans and debt
Definition: Equity
Example: Equity
Definition: CapEx
Further information – Capex
Definition: OpEx
Note – Opex
28.4 Selected Financial Ratios
Profit Margin. Definition: Profit Margin (PM)
Gross Margin. Definition: Gross Margin (GM)
Asset Utilisation. Definition: Asset Utilisation (AU)
Liquid Ratio or Current Ratio
Definition: Liquid Ratio (LR)
Definition: Current Ratio (CR)
Note – CR or LR
Quick Ratio. Definition: Quick ratio (QR)
Operating Assets. Definition: Operating Assets (OA)
Operating Liabilities. Definition: Operating Liabilities (OL)
Net Operating Assets. Definition: Net Operating Assets (NOA)
Digression – non-operating assets
Working Capital. Definition: Working Capital (WC)
Digression – WC and LR
Total Capital (Employed)
Definition: Total Capital Employed (TC or TCE)
Weighted Average Cost of Capital (WACC)
Definition: Weighted Average Cost of Capital (WACC)
Reinvestment Rate (RIR) Definition: Reinvestment rate (RIR)
Coverage Ratio. Definition: Coverage Ratio (CoverageR)
Definition: Debt-Service Coverage Ratio (DSCR)
Gearing. Definition: Gearing Ratio (GR)
Debt-to-equity ratio. Definition: Debt-to-equity ratio (DE)
Notes
♣29♣ Management Accounting. 29.1 Introduction
29.1.1 Definition of Management Accounting (MA) Definition: Management Accounting (MA)
Definition: Management Information—MI
29.1.2 Management Information Systems (MIS)
Definition: MIS
29.2 Selected Methods in MA
29.2.1 Cost Accounting
Definition: – Cost accounting
Further information – Cost accounting
29.2.2 Selected Cost Types
Direct Costs. Definition: Direct Cost
Marginal Cost. Definition: Marginal Cost
Indirect Cost. Definition: Indirect Cost
Example: A Computer Assembly Facility
Fixed Cost. Definition: Fixed Cost
Variable Cost. Definition: Variable Cost
Example: A Computer Assembly Facility
Overhead Cost. Definition: Overhead Cost
29.3 Selected Use Cases of MA
29.3.1 Balanced Scorecard. Definition: Balanced scorecard (BS)
Example: Diversity dashboard
Third Generation Balanced Scorecard
Further information – Balanced scorecard
29.3.2 Key Performance Indicators (KPIs)
Definition: KPI
Example: Net Promoter Score (NPS) and customer satisfaction
29.3.2.1 Lagging Indicators
Definition: Lagging Indicator
Example: Lagging Indicator
29.3.2.2 Leading Indicators
Leading KPIs. Definition: Leading Indicator
Example: Leading Indicator
Note – KPIs and corporate organisation
29.3.2.3 Selected Useful KPIs
Customer Value Metric (CVM) Definition: Customer Value Metric (CVM)
Note – The usefulness of past customer profit
Net Promoter Scores (NPS)
Definition: Net Promoter Score (NPS)
Digression – NPS
Definition: Net Satisfaction Score (NSS)
Question #23 – Using a scale from 1 to 5 as a number
Notes
♣30♣ Asset Valuation Basics
30.1 Time Value of Money
30.1.1 Interest Basics
Question #24
30.1.2 Specific Interest Rate Concepts
Definition: APR or AER
Question #25 – Impact of monthly fee on the APR
Nominal vs. Real Interest Rates. Definition: Nominal Interest Rate
Definition: Real Interest Rate
Example: Real Interest Rate
30.1.3 Discounting
Example
Note – Element-wise operations in R
30.2 Cash
Definition: Cash
Example: Cash
30.3 Bonds. Definition: Bond
30.3.1 Features of a Bond
Definition: Principal
Definition: Maturity
Definition: Coupon
Definition: Yield
Definition: Credit quality
Definition: Market price
30.3.2 Valuation of Bonds
Digression – Required interest rate
Note – Bond prices change every day, even every second
Example: – Bond value
Example: – Higher rates increase
Question #26 – Calculate a bond value
Question #27 – Lower interest rates
Question #28 – Higher interest rates
30.3.3 Duration
30.3.3.1 Macaulay Duration
Further information – Yield to maturity
30.3.3.2 Modified Duration
Note – First order estimate of price change
Digression – DV01
30.4 The Capital Asset Pricing Model (CAPM)
30.4.1 The CAPM Framework
Example: Company A
Example: Company B
30.4.2 The CAPM and Risk
30.4.3 Limitations and Shortcomings of the CAPM
Further information – CAPM
30.5 Equities
30.5.1 Definition. Definition: Stock, shares and equity
Digression – Local use of definitions
30.5.2 Short History
30.5.3 Valuation of Equities
30.5.4 Absolute Value Models
30.5.4.1 Dividend Discount Model (DDM)
Constant Growth DDM (CGDDM)
Note – The required rate of return
Example: ABCD with g = 0%
Example: ABCD—extreme growth
Relationship between growth rate and ROE
Definition: earnings
Definition: dividend payout ratio
Definition: plow-back ratio (earnings retention ratio)
Definition: Return on Equity
Note
Conclusions for the DDM Method
30.5.4.2 Free Cash Flow (FCF)
Free Cash Flow (FCF) Definition: Free Cash Flow (FCF)
Digression – FCF or FCFF?
Alternative Ways to Calculate FCF
30.5.4.3 Discounted Cash Flow Model
Discounted Cash Flow. Definition: Discounted Cash Flow
Definition: NPV
Advantages and Disadvantages of the DCF method
30.5.4.4 Discounted Abnormal Operating Earnings Model
30.5.4.5 Net Asset Value Method or Cost Method
Definition: Net Asset Value Method
Investment Funds
Advantages and Disadvantages of the NAV Method
30.5.4.6 Excess Earnings Method
30.5.5 Relative Value Models. 30.5.5.1 The Concept of Relative Value Models
Market Value vs Intrinsic Value
Definition: Price or market value
Definition: Value
Definition: Market capitalization
30.5.5.2 The Price Earnings Ratio (PE)
Definition: – Price earnings ratio (PE ratio)
Digression – How bonds and shares are (dis-)similar
30.5.5.3 Pitfalls when using PE Analysis
30.5.5.4 Other Company Value Ratios
Definition: Price-to-book ratio (PTB)
Definition: Price-to-cash-flow ratio (PTCF)
Definition: Price-to-sales ratio (PTS)
Return on Invested Capital (ROIC)
Return on Capital Employed (ROCE)
Return on Equity (ROE)
Note – Difference ROIC and ROE
Economic Value Added (EVA) Definition: Economic Value Added (EVA)
Market Value Added (MVA) Definition: Market Value Added (MVA)
30.5.6 Selection of Valuation Methods
30.5.7 Pitfalls in Company Valuation
30.5.7.1 Forecasting Performance
30.5.7.2 Results and Sensitivity
Stress Testing
Monte Carlo Simulations
Beyond the Monte Carlo Simulation
Definition: Kernel
Conclusion
30.6 Forwards and Futures
Definition: Future
Definition: Forward
Note – Commodities
30.7 Options. 30.7.1 Definitions
Definition: Call Option
Definition: Put Option
Example: buying an option
Note – long and short
30.7.2 Commercial Aspects
Definition: OTC
Digression – The International Swaps and Derivatives Association (ISDA)
30.7.3 Short History
30.7.4 Valuation of Options at Maturity
30.7.4.1 A Long Call at Maturity
30.7.4.2 A Short Call at Maturity
30.7.4.3 Long and Short Put
30.7.4.4 The Put-Call Parity
30.7.5 The Black and Scholes Model
30.7.5.1 Pricing of Options Before Maturity
30.7.5.2 Apply the Black and Scholes Formula
30.7.5.3 The Limits of the Black and Scholes Model
30.7.6 The Binomial Model
Step One
Example: One step binomial pricing model
The Second Step of the Binomial Model
30.7.6.1 Risk Neutral Method
Warning – Memory use and running time
30.7.6.2 The Equivalent Portfolio Binomial Model
Example: First order binomial model
30.7.6.3 Summary Binomial Model
30.7.7 Dependencies of the Option Price
30.7.7.1 Dependencies in a Long Call Option
30.7.7.2 Dependencies in a Long Put Option
30.7.7.3 Summary of Findings
30.7.8 The Greeks
30.7.9 Delta Hedging
Delta Hedging Example
Warning – Option hedging
30.7.10 Linear Option Strategies
30.7.10.1 Plotting a Portfolio of Options
Digression – Plotting with base R
30.7.10.2 Single Option Strategies
30.7.10.3 Composite Option Strategies
Note – Callspread and putspread
30.7.11 Integrated Option Strategies
30.7.11.1 The Covered Call
30.7.11.2 The Married Put
30.7.11.3 The Collar
30.7.12 Exotic Options
Digression – Investment advice
30.7.13 Capital Protected Structures
Example: capital protected structure
Notes
PART VII Reporting
♣31♣ A Grammar of Graphics with ggplot2
Further information – Extensions
31.1 The Basics of ggplot2
Hint – Themes
Question #29 – Explore data with ggplot
31.2 Over-plotting
Warning – Not plotting data is not including it at all
Note – Selection of the smoothing method
31.3 Case Study for ggplot2
Note – ggplot and the pipe
Question #30 – Predicting days past due
Notes
♣32♣ R Markdown
Hint – RStudio
Further information – More about R Markdown
Warning – Be careful with specific symbols
Hint – Change document format
Digression – Free book on R Markdown
Digression – Notebook format
Digression – R Bookdown
Note
♣33♣ knitr and LaTeX
Note – Short and long code
Further information – LaTeX
Digression – RStudio
Notes
♣34♣ An Automated Development Cycle
♣35♣ Writing and Communication Skills
Example: Customer profitability
Hint – Think from the audience’s point of view
Hint – How R can help
Note – Two very different slide decks
Note
♣36♣ Interactive Apps
36.1 Shiny
Hint – Online gallery for Shiny
Hint – Run apps in your browser
Hint – See all available examples
Listing 36.1: The HTML code to include our Shiny app on a live web-page
Further information – Shiny
36.2 Browser Born Data Visualization
36.2.1 HTML-widgets
Further information – Stunning visualisations
36.2.2 Interactive Maps with leaflet
Further information – leaflet
Digression – References for leaflet
36.2.3 Interactive Data Visualisation with ggvis
Hint – Interactive data for data scientists
36.2.3.1 Getting Started in R with ggvis
Further information – ggvis
36.2.3.2 Combining the Power of ggvis and Shiny
36.2.4 googleVis
36.3 Dashboards
36.3.1 The Business Case: a Diversity Dashboard
Note – Special characters in variable names
36.3.2 A Dashboard with flexdashboard
36.3.2.1 A Static Dashboard
Hint – Find the full code of the dashboard
Further information – Code for this dashboard
Further information – More eye candy
Note – Visualising is not via printing the plot
36.3.2.2 Interactive Dashboards with flexdashboard
Further information – flexdashboard
36.3.3 A Dashboard with shinydashboard
Note – Make your website reactive
Notes
♣37♣ Parallel Computing
37.1 Combine foreach and doParallel
Note – foreach
Hint – Expressions
Further information – Alternatives for multi-core processing
37.2 Distribute Calculations over LAN with Snow
Hint – Random numbers
Note – snow overwrites many functions
Hint – Cost efficiency
Hint – Load balancing with clusterApplyLB()
37.3 Using the GPU
Example: Nvidia
Digression – CUDA or OpenCL?
Hint – Nvidia
37.3.1 Getting Started with gpuR
Warning – Use the memory of the GPU
Note – Mileage may vary
Hint – Cleaning up
Further information – Machine learning libraries
37.3.2 On the Importance of Memory use
Digression – Deviations in execution time
37.3.3 Conclusions for GPU Programming
Further information – Advanced GPU programming
Hint – Multiple GPUs
Further information – Nvidia’s CUDA
Notes
♣38♣ R and Big Data
Note – Memory limits in R
38.1 Use a Powerful Server. 38.1.1 Use R on a Server
Hint – RStudio
38.1.2 Let the Database Server do the Heavy Lifting
Hint – A general package to talk ODBC
Further information – A nice intro from RStudio
38.2 Using more Memory than we have RAM
Notes
♣39♣ Parallelism for Big Data
39.1 Apache Hadoop
Further information – Apache Software Foundation
39.2 Apache Spark
Further information – Spark
39.2.1 Installing Spark
Warning – Installing Spark
Hint – Installing via the Apache website
Digression – Spark installer
39.2.2 Running Spark
Digression – Scala
Note – Stopping Spark
39.2.3 SparkR
Hint – Adding checkpoints
Note – Our convention
Note – SparkDataFrame or DataFrame
Note – Statements that are not repeated further
39.2.3.1 A User Defined Function on Spark
dapply
Hint – Extracting the schema
dapplyCollect
The Group Apply Variant: gapply
gapplyCollect
spark.lapply
39.2.3.2 Some Other Functions of SparkR
Back and Forth Between R and Spark
Reading CSV files
Changing Columns and Adding New Ones
Aggregation
39.2.3.3 Machine learning with SparkR
Note – Familiar MLlib
Model Persistence and Sampling
Further information – SparkR
39.2.4 sparklyr
Note – Short intro to sparklyr
Hint – Use SQL from R on RDD
spark_apply
Machine Learning
Further information – sparklyr
39.2.5 SparkR or sparklyr
Both tools
Reasons to choose SparkR
Reasons to choose sparklyr
Notes
♣40♣ The Need for Speed
40.1 Benchmarking
Digression – Expressions and curly brackets
40.2 Optimize Code
40.2.1 Avoid Repeating the Same
40.2.2 Use Vectorisation where Appropriate
Note – Overhead created by the for-loops
40.2.3 Pre-allocating Memory
40.2.4 Use the Fastest Function
40.2.5 Use the Fastest Package
40.2.6 Be Mindful about Details
Note – Cold code ahead
40.2.7 Compile Functions
40.2.8 Use C or C++ Code in R
Digression – Background operation
Further information – Rcpp
Digression – Efficient R
40.2.9 Using a C++ Source File in R
Hint – R code within the C++ code within R
Warning – C++ function not saved in .RData
Further information – Using C in R
40.2.10 Call Compiled C++ Functions in R
Further information – Calling C functions
Digression – R in C++
40.3 Profiling Code
Note – Cold code ahead
40.3.1 The Package profr
40.3.2 The Package proftools
Hint – Delete file
40.4 Optimize Your Computer
Warning – Be nice
Digression – Batch R code
Notes
♣A♣ Create your own R Package
Further information – More on creating packages
A.1 Creating the Package in the R Console
Digression – Creating packages in RStudio
A.2 Update the Package Description
A.3 Documenting the Functions
A.4 Loading the Package
A.5 Further Steps
Further information – Further reading
Notes
♣B♣ Levels of Measurement. B.1 Nominal Scale
B.2 Ordinal Scale
B.3 Interval Scale
B.4 Ratio Scale
♣C♣ Trademark Notices
C.1 General Trademark Notices
C.2 R-Related Notices. C.2.1 Crediting Developers of R Packages
C.2.2 The R-packages used in this Book
♣D♣ Code Not Shown in the Body of the Book
♣E♣ Answers to Selected Questions
Note
Bibliography
Nomenclature
Index
WILEY END USER LICENSE AGREEMENT
Excerpt from the book
Philippe J.S. De Brouwer
The student who graduated from a science, technology, engineering, mathematics, or similar program will find that this book helps to make a successful step from the academic world into any private or governmental company.
.....
data_test$N
## [1] Piotr Pawel Paula Lisa  Laura
## Levels: Laura Lisa Paula Pawel Piotr
Use “short-cuts” sparingly, and only when working interactively (not in functions or in code that will be saved and re-run later). If another column whose name starts the same way is added later, the short-cut is no longer unique. The behaviour then becomes hard to predict, and the programming error is even harder to spot because it sits in a part of your code that previously worked fine.
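The risk described in this hint can be made concrete with a small sketch. It assumes that the short-cut in question is the partial name matching of the `$` operator; the data frame below and its column names (`Name`, `Score`, `Nationality`) are invented for the illustration and are not taken from the book.

```r
# Hypothetical data: the column names are assumptions made for this sketch.
data_test <- data.frame(
  Name  = c("Piotr", "Pawel", "Paula", "Lisa", "Laura"),
  Score = c(80, 65, 90, 100, 65)
)

# `$` accepts a unique prefix of a column name: "N" resolves to "Name".
before <- data_test$N

# Later, someone adds a second column that also starts with "N" ...
data_test$Nationality <- c("PL", "PL", "PL", "UK", "UK")

# ... and the same short-cut is now ambiguous: it silently yields NULL.
after <- data_test$N

# Exact access via [[ ]] does not depend on which other columns exist.
exact <- data_test[["Name"]]

print(before)
print(is.null(after))
print(identical(exact, before))
```

Note that the ambiguous short-cut fails without raising an error, which is why the bug tends to surface far from the line that caused it. Spelling out the full column name, or using `[[ ]]`, avoids the problem in saved code.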
.....