The Big R-Book

Genre: Mathematics. Publisher: John Wiley & Sons Limited. ISBN: 9781119632771.


Book description

Introduces professionals and scientists to statistics and machine learning using the programming language R. Written by and for practitioners, this book provides an overall introduction to R, focusing on tools and methods commonly used in data science, and placing emphasis on practice and business use. It covers a wide range of topics in a single volume, including big data, databases, statistical machine learning, data wrangling, data visualization, and the reporting of results. The topics covered are all important for someone with a science/math background who is looking to quickly learn several practical technologies to enter or transition to the growing field of data science.

The Big R-Book for Professionals: From Data Science to Learning Machines and Reporting with R includes nine parts, starting with an introduction to the subject and followed by an overview of R and elements of statistics. The third part revolves around data, while the fourth focuses on data wrangling. Part 5 teaches readers about exploring data. Part 6 covers building models, Part 7 introduces the reader to the reality in companies, Part 8 covers reports and interactive applications, and finally Part 9 introduces the reader to big data and performance computing. It also includes some helpful appendices.
Provides a practical guide for non-experts with a focus on business users.
Contains a unique combination of topics including an introduction to R, machine learning, mathematical models, data wrangling, and reporting.
Uses a practical tone and integrates multiple topics in a coherent framework.
Demystifies the hype around machine learning and AI by enabling readers to understand the provided models and program them in R.
Shows readers how to visualize results in static and interactive reports.
Supplementary material includes PDF slides based on the book’s content, as well as all the extracted R-code, and is available to everyone on a Wiley Book Companion Site.

The Big R-Book is an excellent guide for science, technology, engineering, or mathematics students who wish to make a successful transition from the academic world to the professional. It will also appeal to all young data scientists, quantitative analysts, and analytics professionals, as well as those who make mathematical models.

Contents

Philippe J. S. De Brouwer. The Big R-Book

Table of Contents

List of Tables

List of Illustrations

Guide

Pages

THE BIG R-BOOK. FROM DATA SCIENCE TO LEARNING MACHINES AND BIG DATA

Foreword

About the Companion Site

About the Author

Preface

Acknowledgements

♣1♣ The Big Picture with Kondratiev and Kardashev

Notes

♣2♣ The Scientific Method and Data

The Scientific Method

Note

♣3♣ Conventions

Listing 3.1: This is what you would see if you start R in the command line terminal. Note that the last sign is the R-prompt, inviting you to type commands. This code fragment is typical for how code that is not in the R-language has been typeset in this book

Listing 3.2: Another example of command line instructions: factor, calc, and pi. This example only has CLI code and does not start R

Question #1 Histogram

Definition: This is a definition

Function use for mean()

Example: Mean

Example: Mean

Hint – Using the hint boxes

Note – Layout details

Digression – This is good to know

Warning – Read comments in code

Note – Shadow

Notes

♣4♣ The Basics of R

Hint – Getting more help

4.1 Getting Started with R

Hint – Using R Online

RStudio

Hint – RStudio is free

Basic arithmetic

Hint – White space

Editing variables interactively

Further information – Other ways to import data

Warning – Using CLI tools

Batch mode

4.2. Variables

Assignment

Variable Management

Note – What are invisible variables

4.3 Data Types

4.3.1 The Elementary Types

Warning – Changing data types

Dates

Further information – More about dates

4.3.2 Vectors. 4.3.2.1 Creating Vectors

4.3.3 Accessing Data from a Vector

4.3.3.1 Vector Arithmetic

Warning – Not all operations are element per element

4.3.3.2 Vector Recycling

Warning – Vector recycling

4.3.3.3 Reordering and Sorting

Question #2 Temperature conversion

Hint – Addressing the object nottem

4.3.4 Matrices

4.3.4.1 Creating Matrices

4.3.4.2 Naming Rows and Columns

4.3.4.3 Access Subsets of a Matrix

4.3.4.4 Matrix Arithmetic

Question #3 Dot product

Note – Percentage signs point towards matrix operations

Warning – R consistently works element by element

4.3.5 Arrays

4.3.5.1 Creating and Accessing Arrays

4.3.5.2 Naming Elements of Arrays

4.3.5.3 Manipulating Arrays

4.3.5.4 Applying Functions over Arrays

Function use for apply()

4.3.6 Lists

4.3.6.1 Creating Lists

Definition: List

Further information – Object-oriented programming in R

4.3.6.2 Naming Elements of Lists

4.3.6.3 List Manipulations

Lists of Lists Are Also Lists

Further information – Double square brackets

Add and Delete Elements of a List

Warning – Deleting elements in lists

Convert list to vectors

Warning – Silent failing of unlist()

4.3.7 Factors

4.3.7.1 Creating Factors

Digression – The reduced importance of factors

4.3.7.2 Ordering Factors

Generate Factors with the Function gl() Function use for gl()

Question #4

Question #5

Question #6

4.3.8 Data Frames. 4.3.8.1 Introduction to Data Frames

4.3.8.2 Accessing Information from a Data Frame

Warning – Avoiding conversion to factors

4.3.8.3 Editing Data in a Data Frame

4.3.8.4 Modifying Data Frames. Add Columns to a Data-frame

Adding Rows to a Data-frame

Merging data frames

Short-cuts

Warning – Short-cuts can be dangerous

Naming Rows and Columns

Question #7

4.3.9 Strings or the Character-type

Example: Using strings

Note – Paste

Formatting with

Function use for format()

Formatting examples

Further information – format()

Other string functions

4.4 Operators

4.4.1 Arithmetic Operators

Warning – Element-wise operations in R

4.4.2 Relational Operators

4.4.3 Logical Operators

Note – Numeric equivalent and logical evaluation

4.4.4 Assignment Operators

Hint – Assignment

Digression – For C++ programmers

Warning – Sparingly change variables in other environments

4.4.5 Other Operators

Warning – Redefine existing operators

4.5 Flow Control Statements

4.5.1 Choices. 4.5.1.1 The if-Statement

Function use for if()

Hint – Extending the if-statement

4.5.1.2 The Vectorised If-statement

4.5.1.3 The Switch-statement

4.5.2 Loops

4.5.2.1 The For Loop

Function use for for()

Example: For loop

Note – No counter in the for loop

4.5.2.2 Repeat

Function use for repeat()

Example: Repeat loop

Warning – Break out of the repeat loop

4.5.2.3 While

Function use for while()

Example: While loop

4.5.2.4 Loop Control Statements

Digression – The speed of loops

Further information – Speed

4.6 Functions

4.6.1 Built-in Functions

4.6.2 Help with Functions

Help with functions

Further information on packages

4.6.3 User-defined Functions

Function use for function()

Example: A bespoke function

4.6.4 Changing Functions

Hint

4.6.5 Creating Function with Default Arguments

Example

Example: default value for function

4.7 Packages

4.7.1 Discovering Packages in R

Example: loading the package DiagrammeR

Further information – Packages

Useful functions for packages

Further information – All available packages

Further information – All installed packages

4.7.2 Managing Packages in R

Note – Cold code in this section

4.8 Selected Data Interfaces

4.8.1 CSV Files

Hint – Reading files directly from the Internet

Finding data

Writing to a CSV file

Warning – Silently added rows

4.8.2 Excel Files

4.8.3 Databases

Connecting to the Database

Fetching Data From a Database

Update Queries

Create Tables from R Data-frames

Warning – Closing the database connection

Notes

♣5♣ Lexical Scoping and Environments. 5.1 Environments in R

5.2 Lexical Scoping in R

Note – Dynamic scoping

Warning – Dynamical scoping

Hint – Write readable code

Note

♣6♣ The Implementation of OO

6.1. Base Types

6.2. S3 Objects

Hint – Naming conventions

6.2.1 Creating S3 Objects

6.2.2 Creating Generic Methods

6.2.3 Method Dispatch

Note – Avoid direct calls

Hint – Speed gain

6.2.4 Group Generic Functions

Note – Distinguish groups and functions

Hint – Find what is the next method

6.3. S4 Objects

Hint – Loading the library methods

6.3.1 Creating S4 Objects

Note – Difference between inheritance and methods

6.3.2 Using S4 Objects

Note – Compare addressing slots in S4 and S3

Warning – Partial matching

Hint – Alternative to address slots

Question #8

Question #9

Hint – List all slots

6.3.3 Validation of Input

Warning – Silent setting to default

Warning – Changing class definitions at runtime

Hint – Locking a class definition

Hint – Typesetting conventions

6.3.4 Constructor functions

Hint – Calling the constructor function

6.3.5 The .Data slot

6.3.6 Recognising Objects, Generic Functions, and Methods

Note – Nuances in the OO system

6.3.7 Creating S4 Generics

Warning – Overloading functions

6.3.8 Method Dispatch

6.4. The Reference Class, refclass, RC or R5 Model

Note – Recent developments

6.4.1 Creating RC Objects

Note

Hint

Note – Assigning in the encapsulating environment

Note – Addressing attributes and methods

Note – No dynamic editing of field definitions

6.4.2 Important Methods and Attributes

6.5. Conclusions about the OO Implementation

Digression – R6

Notes

♣7♣ Tidy R with the Tidyverse. 7.1. The Philosophy of the Tidyverse

Tidy Data

Tidy Conventions

Further information – Tidyverse philosophy

7.2. Packages in the Tidyverse

Digression – Calling methods of not loaded packages

7.2.1 The Core Tidyverse

7.2.2 The Non-core Tidyverse

Warning – Work in progress

7.3. Working with the Tidyverse. 7.3.1 Tibbles

Digression – Special characters in column names

Hint

Digression – Changing how a tibble is printed

Hint – Viewing the content of a tibble

7.3.2 Piping with R

Example: – Pipe operator

Hint – Pronouncing the pipe

Note – Equivalence of piping and nesting

7.3.3 Attention Points When Using the Pipe

Further information – Error catching

7.3.4 Advanced Piping. 7.3.4.1 The Dollar Pipe

Note – Using functions without brackets

7.3.4.2 The T-Pipe

7.3.4.3 The Assignment Pipe

Warning – Assignment pipe

7.3.5 Conclusion

Hint – Use pipes sparingly

Notes

♣8♣ Elements of Descriptive Statistics

8.1. Measures of Central Tendency

8.1.1 Mean

8.1.1.1 The Arithmetic Mean

Definition: Arithmetic mean

Hint – Outliers

8.1.1.2 Generalised Means

Definition: f-mean

The Power Mean

Example: Which mean makes most sense?

8.1.2 The Median

8.1.3 The Mode

Hint – Use default values to keep code backwards compatible

8.2. Measures of Variation or Spread

Definition: Variance

8.2.1 Standard Deviation. Definition: Standard deviation

8.2.2 Median absolute deviation. Definition: mad

8.3. Measures of Covariation

8.3.1 The Pearson Correlation

8.3.2 The Spearman Correlation

Question #10

Warning – Correlation is more specific than relation

8.3.3 Chi-square Tests

Chi-Square test in R. Function use for chisq.test()

8.4. Distributions

8.4.1 Normal Distribution

The Normal Distribution in R

Illustrating the Normal Distribution

Case Study: Returns on the Stock Exchange

8.4.2 Binomial Distribution

The Binomial Distribution in R

An Example of the Binomial Distribution

8.5. Creating an Overview of Data Characteristics

Note – A tibble is a special form of data-frame

Notes

♣9♣ Visualisation Methods

9.1 Scatterplots

Function use for plot() – for a scatterplot

Further information – See the code

9.2 Line Graphs

Function use for plot() – for line plots

A line-plot example

9.3 Pie Charts

Function use for pie()

9.3.1 Pie chart example

9.4 Bar Charts

The function barplot() Function use for barplot()

Stacked bar charts

Barplots With Total of 100 Percent

Warning – Scaled boxplots

9.5 Boxplots

Function use for boxplot()

9.6. Violin Plots

Further information – ggplot2

9.7 Histograms

Function use for hist()

9.8 Plotting Functions

9.9 Maps and Contour Plots

9.10 Heat-maps

Function use for heatmap()

9.11 Text Mining

9.11.1 Word Clouds

Example: – The text of this book

Step 1: Importing the Text

Step 2: Cleaning the Text

Hint – Visualize the text-file

Step 3: Build a Term-document Matrix

Step 4: Generate the Word-cloud

Function use for wordcloud()

9.11.2 Word Associations

Word Associations in R

9.12 Colours in R

Hint – Using American English

Hint – Online list of colours for R

Colour sets

Further information – More plots

Notes

♣10♣ Time Series Analysis. 10.1 Time Series in R

10.1.1 The Basics of Time Series in R. 10.1.1.1 The Function ts()

Function use for ts()

10.1.1.2 Multiple Time Series in one Object

10.2 Forecasting

10.2.1 Moving Average. 10.2.1.1 The Moving Average in R

Example: – GDP data

10.2.1.2 Testing the Accuracy of the Forecasts

Testing the Accuracy of Forecasts – Backtesting

10.2.1.3 Basic Exponential Smoothing

10.2.1.4 Holt-Winters Exponential Smoothing

Holt Exponential Smoothing

10.2.2 Seasonal Decomposition

Note – Exponential trends

Exponential Models

Question #11

Note

♣11♣ Further Reading

Hint – What if you are stuck?

Further information – CRAN

♣12♣ A Short History of Modern Database Systems

Notes

♣13♣ RDBMS

Notes

♣14♣ SQL. 14.1 Designing the Database

14.2 Building the Database Structure. 14.2.1 Installing a RDBMS

Listing 14.1: Installing MySQL on a Linux computer is easy and straightforward. Here shown for a Debian based system

Further information – MariaDB

Listing 14.2: Starting MySQL as root user. The first line is the command in the CLI, the last line is the MySQL prompt, indicating that we are now in the MySQL shell

Hint – Hardening the database server

Note – Similarities between the R CLI and the MySQL CLI

14.2.2 Creating the Database

Listing 14.3: Create the database in which all tables will be created

Hint – Comments in SQL

14.2.3 Creating the Tables and Relations

Listing 14.4: Starting MySQL, as user “libroot.” Note that this is done from the Linux CLI

Listing 14.5: Create the table tbl_authors

Digression – SQL is not case sensitive

Listing 14.6: This SQL code block creates the table tbl_books and then defines an index on two of its fields

Listing 14.7: Manage indexes in MySQL

Note – Impossible definitions are possible

Listing 14.8: This SQL code creates the table tbl_genres and then checks if it is really there

Digression – UTF8 collation

Listing 14.9: Checking the structure of the table tbl_books

14.3 Adding Data to the Database

Listing 14.10: Logging in as user “librarian.”

Listing 14.11: Adding the first author to the database

Note – Providing values for automatically incremented fields

Listing 14.12: An alternative way to add the author to the book by specifying the fields provided

Note – Missing values

Listing 14.13: This SQL code adds all books in one statement

Hint – Input errors

Hint – View the data in a table

Listing 14.14: Add the data to the table tbl_genres

Listing 14.15: Add the data to the table tbl_books

14.4 Querying the Database

Digression – Can a FK be NULL?

14.4.1 The Basic Select Query

Listing 14.16: Some examples of SELECT-queries. Note that the output is not shown here, simply because it would be too long

Note – Working with dates in SQL

14.4.2 More Complex Queries

Note – Using NULL

Digression – MySQL specific note

Listing 14.17: This code can be used to show that a manual linking of fields leads to the same records as a left join. Note that the output is not provided

Hint – Automated linking

Note – The difference between right and left joins

Hint – Tidy queries

Further information – MySQL

14.5 Modifying the Database Structure

Digression – More than one solution

Further information

Listing 14.18: This code first creates the table tbl_author_books and then copies into it the necessary information that was already in the database. Finally, it discards the old information

Digression – Matching unknown strings

Note – Composite PKs

Listing 14.19: Finally, we can add our book that has more than one author to our database

Digression – Removing a variable in SQL

14.6 Selected Features of SQL

14.6.1 Changing Data

Listing 14.20: Capitalize the first letters of all full names of authors

14.6.2 Functions in SQL

Listing 14.21: Creating a function and using it in SQL

Listing 14.22: Just a little taste of some additional features in SQL. We encourage you to learn more about SQL. This piece of code introduces functions, variables, and the UNION-query

Hint – Delimiter in SQL

Hint – Make a backup of the database

Further information – More about SQL

Notes

♣15♣ Connecting R to an SQL Database

Hint – RODBC

Digression – MariaDB

Warning – Batch environment

Hint – Clearing the query cache

Note

PART IV Data Wrangling

Hint – RDBMS and R

Notes

♣16♣ Anonymous Data

Listing 16.1: SQL code for MySQL (or MariaDB) to encrypt using AES256. Note that those relational database systems (RDBMSs) provide many more methods for encryption. It is worth going through the documentation of your particular system for more support

Further information – Cryptology

Notes

♣17♣ Data Wrangling in the tidyverse

17.1 Importing the Data. 17.1.1 Importing from an SQL RDBMS

17.1.2 Importing Flat Files in the Tidyverse

Further information – readr

17.1.2.1 CSV Files

Note – Separator specific functions

Hint – Check data-type before importing

Question #12 Importing difficult files

17.1.2.2 Making Sense of Fixed-width Files

Note – Automated downloading and decompressing

17.2 Tidy Data

17.3 Tidying Up Data with tidyr

Definition: – Tidy data

Hint – The tidyverse philosophy

17.3.1 Splitting Tables

17.3.2 Convert Headers to Data

17.3.3 Spreading One Column Over Many

Note – spread() and gather()

17.3.4 Split One Column into Many

Note – Fixed width separation

17.3.5 Merge Multiple Columns Into One

17.3.6 Wrong Data

17.4 SQL-like Functionality via dplyr

Hint – Mix all data wrangling techniques

17.4.1 Selecting Columns

Note – Using pipes

17.4.2 Filtering Rows

Note – Name-space conflict

Hint – Equivalence between dplyr and SQL

17.4.3 Joining

Hint – Sort order

Note – Remove duplicates

Note – Short-cuts

17.4.4 Mutating Data

Hint – Advanced mutating

Warning – Difference between filter() and joins

17.4.5 Set Operations

Note – Column headings in data-frames

17.5 String Manipulation in the tidyverse

Hint – Naming convention of functions

Warning – str_c() does not return the C-string

17.5.1 Basic String Manipulation

Hint – Replacing sub-strings

Duplicate Strings

Manage White Space

Determining Order and Sorting Strings

17.5.2 Pattern Matching with Regular Expressions

17.5.2.1 The Syntax of Regular Expressions

Digression – Advanced email matching

Note – Single or double escape characters

Lazy and Greedy Quantifiers

Other Regex Aspects

Hint – General methods in R

Further information – Regex

Regex for Humans with rex

17.5.2.2 Functions Using Regex

Detect a Match

Locate

Note – Using locating functions as boolean

Replace

Extract

Split strings Using the Match as Separator

Further information about regex

17.6 Dates with lubridate

17.6.1 ISO 8601 Format

Digression – R's internal date-format

Warning – Dates as numbers can be confusing

Hint – Other date formats

Note – Today's date

17.6.2 Time-zones

Hint – Available time-zones

Hint – Create date-time from split data

17.6.3 Extract Date and Time Components

Note – No side effects

17.6.4 Calculating with Date-times

17.6.4.1 Durations

17.6.4.2 Periods

Note – Period functions starting letter

17.6.4.3 Intervals

17.6.4.4 Rounding

17.7 Factors with Forcats

Warning – Unmatched labels

Hint – Find out which factor levels exist

Other functions from forcats

Hint – Use regex

Further information – Plots

Question #13

Notes

♣18♣ Dealing with Missing Data

18.1 Reasons for Data to be Missing

Example: Unclear questions

18.2 Methods to Handle Missing Data. 18.2.1 Alternative Solutions to Missing Data

Example

Note – The randomness in our example

18.2.2 Predictive Mean Matching (PMM)

18.3 R Packages to Deal with Missing Data. 18.3.1 mice

18.3.2 missForest

Hint – Fine-tuning

18.3.3 Hmisc

Further information – The package mi

Notes

♣19♣ Data Binning. 19.1 What is Binning and Why Use It

Further information – Bias

Hint – Automated binning

19.2 Tuning the Binning Procedure

Further information – ggplot2

A Model Without Binning

A Model with Binning

Note – Reasons for Binning

19.3 More Complex Cases: Matrix Binning

Question #14 Binary dependent variables

Question #15 Think outside the box

Question #16 Think deeper

19.4 Weight of Evidence and Information Value

19.4.1 Weight of Evidence (WOE)

19.4.2 Information Value (IV)

19.4.3 WOE and IV in R

Further information – Other packages used

Digression – The function kable()

Question #17

Notes

♣20♣ Factoring Analysis and Principal Components

20.1 Principal Components Analysis (PCA)

Hint – Fine tuning the function princomp()

Hint – Executing PCA before fitting a model

20.2 Factor Analysis

Hint – Customisation of factanal()

Further information – More tools for factor analysis

Note

♣21♣ Regression Models. 21.1 Linear Regression

Linear Regression

Question #18 – Build a linear model

21.2 Multiple Linear Regression

Question #19 – Build a multiple linear regression

21.2.1 Poisson Regression

Definition: Poisson Regression

Function use for glm()

Note – The function glm()

21.2.2 Non-linear Regression

Function use for nls()

Hint – Shorthand notation

21.3 Performance of Regression Models

21.3.1 Mean Square Error (MSE)

Definition: Mean Square Error (MSE)

21.3.2 R-Squared

Definition: R-squared

Further information – About the summary

Question #20 – Find a better model

21.3.3 Mean Average Deviation (MAD)

Definition: Mean average deviation (MAD)

♣22♣ Classification Models

22.1. Logistic Regression

Definition: – Generalised logistic regression

Logistic Regression. Definition: – Additive logistic regression

Note – Multiple uses for glm()

22.2. Performance of Binary Classification Models

22.2.1 The Confusion Matrix and Related Measures

22.2.2 ROC

22.2.3 The AUC

22.2.4 The Gini Coefficient

22.2.5 Kolmogorov-Smirnov (KS) for Logistic Regression

Further information – code for plots

22.2.6 Finding an Optimal Cut-off

Hint – Backwards compatibility

Notes

♣23♣ Learning Machines

23.1. Decision Tree

23.1.1 Essential Background. 23.1.1.1 The Linear Additive Decision Tree

Note – Not all sub-sets are possible

23.1.1.2 The CART Method

23.1.1.3 Tree Pruning

23.1.1.4 Classification Trees

Digression – Drawing the misclassification functions

23.1.1.5 Binary Classification Trees

23.1.2 Important Considerations

23.1.2.1 Broadening the Scope

23.1.2.2 Selected Issues

23.1.3 Growing Trees with the Package rpart. 23.1.3.1 Getting Started with the Function rpart()

Function use for rpart()

Further information about rpart

23.1.3.2 Example of a Classification Tree with rpart

23.1.3.3 Visualising a Decision Tree with rpart.plot

23.1.3.4 Example of a Regression Tree with rpart

23.1.4 Evaluating the Performance of a Decision Tree. 23.1.4.1 The Performance of the Regression Tree

23.1.4.2 The Performance of the Classification Tree

Hint – Confusion matrix ready from a package

23.2. Random Forest

Note – Variations for the random forest

23.3. Artificial Neural Networks (ANNs) 23.3.1 The Basics of ANNs in R

23.3.2 Neural Networks in R

Function use for neuralnet()

23.3.3 The Work-flow for Fitting a NN

Step 1: Missing Data

Hint – Never delete more data than necessary

Step 2: Split the Data in Test and Training Set

Step 3: Fit a Challenger Model

Step 4: Rescale the Data and Split into Training and Testing Set

Step 5: Train the ANN on the Training Set

Step 6: Test the Model on the Test Data

23.3.4 Cross Validate the NN

Note – Cross validation without modelr

23.4. Support Vector Machine

23.4.1 Fitting a SVM in R

Function use for svm()

Note – Optimisation types

23.4.2 Optimizing the SVM

Further information cross validation

23.5. Unsupervised Learning and Clustering

23.5.1 k-Means Clustering

Warning – Hard problems

23.5.1.1 k-Means Clustering in R

Note – Elegant labels

Warning – Distance and units

Question #21 – k-Means

Hint – Adding Voronoi cell borders

23.5.1.2 PCA before Clustering

Note – Autoplot or native ggplot?

23.5.1.3 On the Relation Between PCA and k-Means

23.5.2 Visualizing Clusters in Three Dimensions

23.5.3 Fuzzy Clustering

23.5.4 Hierarchical Clustering

Note – Scaling data

Further information – Ward’s method

Digression – A nicer dendrogram

23.5.5 Other Clustering Methods

Notes

♣24♣ Towards a Tidy Modelling Cycle with modelr

24.1 Adding Predictions

Function use for add_predictions()

24.2 Adding Residuals

Function use for add_residuals()

24.3 Bootstrapping Data

Function use for bootstrap()

24.4 Other Functions of modelr

♣25♣ Model Validation

25.1 Model Quality Measures

25.2 Predictions and Residuals

25.3 Bootstrapping

25.3.1 Bootstrapping in Base R

Function use for sample()

25.3.2 Bootstrapping in the tidyverse with modelr

25.4 Cross-Validation

25.4.1 Elementary Cross Validation

Note – Which data to use for testing?

Further information – To refit or not to refit?

Hint – Split Size

25.4.2 Monte Carlo Cross Validation

Note – Simple cross validation

Digression – Advanced piping

25.4.3 k-Fold Cross Validation

25.4.4 Comparing Cross Validation Methods

Further information – Hybrid cross validation methods

25.5 Validation in a Broader Perspective

Notes

♣26♣ Labs. 26.1 Financial Analysis with quantmod

26.1.1 The Basics of quantmod

26.1.2 Types of Data Available in quantmod

Hint – Other data sources

26.1.3 Plotting with quantmod

Hint – short-cuts for dates

26.1.4 The quantmod Data Structure

26.1.4.1 Sub-setting by Time and Date

26.1.4.2 Switching Time Scales

26.1.4.3 Apply by Period

26.1.5 Support Functions Supplied by quantmod

26.1.6 Financial Modelling in quantmod. 26.1.6.1 Financial Models in quantmod

26.1.6.2 A Simple Model with quantmod

26.1.6.3 Testing the Model Robustness

Notes

♣27♣ Multi Criteria Decision Analysis (MCDA) 27.1 What and Why

Digression – Corporate change time scales

27.2 General Work‐flow

Step 1: Explore the Big Picture

Step 2: Identify the Problem at Hand

Note – MCDA needs a structured approach

Step 3: Get Data, Construct and Normalise the Decision Matrix

Step 4: Leave Out Unacceptable and Inefficient Alternatives

Step 5: Use a Multi Criteria Decision Method to Get a Ranking

Step 6: Recommend a Solution

Definition: MCDA wording

27.3 Identify the Issue at Hand: Steps 1 and 2

Digression – SWOT Analysis

27.4 Step 3: the Decision Matrix. 27.4.1 Construct a Decision Matrix

27.4.2 Normalize the Decision Matrix

27.5 Step 4: Delete Inefficient and Unacceptable Alternatives. 27.5.1 Unacceptable Alternatives

27.5.2 Dominance – Inefficient Alternatives

Warning – Rescale the decision matrix

27.6 Plotting Preference Relationships

Note – Dots

27.7 Step 5: MCDA Methods

Definition: Efficient Solutions

27.7.1 Examples of Non‐compensatory Methods

27.7.1.1 The MaxMin Method

27.7.1.2 The MaxMax Method

27.7.2 The Weighted Sum Method (WSM)

Digression – Colour schemes

Note – MCDA is not exact science

27.7.3 Weighted Product Method (WPM)

27.7.4 ELECTRE

Definition: Preference of one solution over another

Definition: Anti‐preference of one solution over another

Definition: The Weighted Degree of Indifference

Note – Should the weights be the same as in WSM?

27.7.4.1 ELECTRE I

Definition: Index of comparability of Type 1

Definition: Index of comparability of Type 2

Definition: Kernel of an MCDA problem

ELECTRE I in R

Digression – Passing on matrices as a list

Note – Transitiveness of preference

Warning – Handle results with care

Conclusion for ELECTRE I

27.7.4.2 ELECTRE II

Note – Alternative

Note – ELECTRE II

27.7.4.3 Conclusions ELECTRE

27.7.5 PROMethEE

The Idea of PROMethEE

Definition: Distance dk(a, b)

Further information – code for plots

27.7.5.1 PROMethEE I. Note – Avoiding compensation

The Preference Relations

PROMethEE I in R

Question #22 – Avoiding the Side effect in the function

Note – Symmetry in the preference functions

Advantages and Disadvantages of PROMethEE I

27.7.5.2 PROMethEE II

27.7.6 PCA (Gaia)

27.7.7 Outranking Methods

27.7.8 Goal Programming

27.8 Summary MCDA

Further information – MCDA

Digression – Step 6

Notes

PART VI Introduction to Companies

♣28♣ Financial Accounting (FA)

28.1 The Statements of Accounts

28.1.1 Income Statement. Definition: Income Statement

28.1.2 Net Income: The P&L statement

Definition: P & L

Definition: NOPAT

28.1.3 Balance Sheet

Definition: Balance Sheet

28.2 The Value Chain

Value Creation

Observation of Value Creation

28.3 Further Terminology

Definition: Loans

Note – Wording loans and debt

Definition: Equity

Example: Equity

Definition: CapEx

Further information – Capex

Definition: OpEx

Note – Opex

28.4 Selected Financial Ratios

Profit Margin. Definition: Profit Margin (PM)

Gross Margin. Definition: Gross Margin (GM)

Asset Utilisation. Definition: Asset Utilisation (AU)

Liquid Ratio or Current Ratio

Definition: Liquid Ratio (LR)

Definition: Current Ratio (CR)

Note – CR or LR

Quick Ratio. Definition: Quick ratio (QR)

Operating Assets. Definition: Operating Assets (OA)

Operating Liabilities. Definition: Operating Liabilities (OL)

Net Operating Assets. Definition: Net Operating Assets (NOA)

Digression – non-operating assets

Working Capital. Definition: Working Capital (WC)

Digression – WC and LR

Total Capital (Employed)

Definition: Total Capital Employed (TC or TCE)

Weighted Average Cost of Capital (WACC)

Definition: Weighted Average Cost of Capital (WACC)

Reinvestment Rate (RIR) Definition: Reinvestment rate (RIR)

Coverage Ratio. Definition: Coverage Ratio (CoverageR)

Definition: Debt-Service Coverage Ratio (DSCR)

Gearing. Definition: Gearing Ratio (GR)

Debt-to-equity ratio. Definition: Debt-to-equity ratio (DE)

Notes

♣29♣ Management Accounting. 29.1 Introduction

29.1.1 Definition of Management Accounting (MA) Definition: Management Accounting (MA)

Definition: Management Information—MI

29.1.2 Management Information Systems (MIS)

Definition: MIS

29.2 Selected Methods in MA

29.2.1 Cost Accounting

Definition: – Cost accounting

Further information – Cost accounting

29.2.2 Selected Cost Types

Direct Costs. Definition: Direct Cost

Marginal Cost. Definition: Marginal Cost

Indirect Cost. Definition: Indirect Cost

Example: A Computer Assembly Facility

Fixed Cost. Definition: Fixed Cost

Variable Cost. Definition: Variable Cost

Example: A Computer Assembly Facility

Overhead Cost. Definition: Overhead Cost

29.3 Selected Use Cases of MA

29.3.1 Balanced Scorecard. Definition: Balanced scorecard (BS)

Example: Diversity dashboard

Third Generation Balanced Scorecard

Further information – Balanced scorecard

29.3.2 Key Performance Indicators (KPIs)

Definition: KPI

Example: Net Promoter Score (NPS) and customer satisfaction

29.3.2.1 Lagging Indicators

Definition: Lagging Indicator

Example: Lagging Indicator

29.3.2.2 Leading Indicators

Leading KPIs. Definition: Leading Indicator

Example: Leading Indicator

Note – KPIs and corporate organisation

29.3.2.3 Selected Useful KPIs

Customer Value Metric (CVM) Definition: Customer Value Metric (CVM)

Note – The usefullness of past customer profit

Net Promoter Scores (NPS)

Definition: Net Promoter Score (NPS)

Digression NPS

Definition: Net Satisfaction Score (NSS)

Question #23 – Using a scale from 1 to 5 as a number

Notes

♣30♣ Asset Valuation Basics

30.1 Time Value of Money

30.1.1 Interest Basics

Question #24

30.1.2 Specific Interest Rate Concepts

Definition: APR or AER

Question #25 – Impact of monthly fee on the APR

Nominal vs. Real Interest Rates. Definition: Nominal Interest Rate

Definition: Real Interest Rate

Example: Real Interest Rate

30.1.3 Discounting

Example

Note – Element-wise operations in R

30.2 Cash

Definition: Cash

Example: Cash

30.3 Bonds. Definition: Bond

30.3.1 Features of a Bond

Definition: Principal

Definition: Maturity

Definition: Coupon

Definition: Yield

Definition: Credit quality

Definition: Market price

30.3.2 Valuation of Bonds

Digression – Required interest rate

Note – Bond prices change every day, even every second

Example: – Bond value

Example: – Higher rates increase

Question #26 – Calculate a bond value

Question #27 – Lower interest rates

Question #28 – Higher interest rates

30.3.3 Duration

30.3.3.1 Macaulay Duration

Further information – Yield to maturity

30.3.3.2 Modified Duration

Note – First order estimate of price change

Digression – DV01

30.4 The Capital Asset Pricing Model (CAPM)

30.4.1 The CAPM Framework

Example: Company A

Example: Company B

30.4.2 The CAPM and Risk

30.4.3 Limitations and Shortcomings of the CAPM

Further information – CAPM

30.5 Equities

30.5.1 Definition. Definition: Stock, shares and equity

Digression – Local use of definitions

30.5.2 Short History

30.5.3 Valuation of Equities

30.5.4 Absolute Value Models

30.5.4.1 Dividend Discount Model (DDM)

Constant Growth DDM (CGDDM)

Note – The required rate of return

Example: ABCD with g = 0%

Example: ABCD—extreme growth

Relationship between growth rate and ROE

Definition: earnings

Definition: dividend payout ratio

Definition: plow-back ratio (earnings retention ratio)

Definition: Return on Equity

Note

Conclusions for the DDM Method

30.5.4.2 Free Cash Flow (FCF)

Free Cash Flow (FCF) Definition: Free Cash Flow (FCF)

Digression – FCF or FCFF?

Alternative Ways to Calculate FCF

30.5.4.3 Discounted Cash Flow Model

Discounted Cash Flow. Definition: Discounted Cash Flow

Definition: NPV

Advantages and Disadvantages of the DCF method

30.5.4.4 Discounted Abnormal Operating Earnings Model

30.5.4.5 Net Asset Value Method or Cost Method

Definition: Net Asset Value Method

Investment Funds

Advantages and Disadvantages of the NAV Method

30.5.4.6 Excess Earnings Method

30.5.5 Relative Value Models. 30.5.5.1 The Concept of Relative Value Models

Market Value vs Intrinsic Value

Definition: Price or market value

Definition: Value

Definition: Market capitalization

30.5.5.2 The Price Earnings Ratio (PE)

Definition: – Price earnings ratio (PE ratio)

Digression – How bonds and shares are (dis-)similar

30.5.5.3 Pitfalls when using PE Analysis

30.5.5.4 Other Company Value Ratios

Definition: Price-to-book ratio (PTB)

Definition: Price-to-cash-flow ratio (PTCF)

Definition: Price-to-sales ratio (PTS)

Return on Invested Capital (ROIC)

Return on Capital Employed (ROCE)

Return on Equity (ROE)

Note – Difference ROIC and ROE

Economic Value Added (EVA) Definition: Economic Value Added (EVA)

Market Value Added (MVA) Definition: Market Value Added (MVA)

30.5.6 Selection of Valuation Methods

30.5.7 Pitfalls in Company Valuation

30.5.7.1 Forecasting Performance

30.5.7.2 Results and Sensitivity

Stress Testing

Monte Carlo Simulations

Beyond the Monte Carlo Simulation

Definition: Kernel

Conclusion

30.6 Forwards and Futures

Definition: Future

Definition: Forward

Note – Commodities

30.7 Options. 30.7.1 Definitions

Definition: Call Option

Definition: Put Option

Example: buying an option

Note – long and short

30.7.2 Commercial Aspects

Definition: OTC

Digression – The International Swaps and Derivatives Association (ISDA)

30.7.3 Short History

30.7.4 Valuation of Options at Maturity

30.7.4.1 A Long Call at Maturity

30.7.4.2 A Short Call at Maturity

30.7.4.3 Long and Short Put

30.7.4.4 The Put-Call Parity

30.7.5 The Black and Scholes Model

30.7.5.1 Pricing of Options Before Maturity

30.7.5.2 Apply the Black and Scholes Formula

30.7.5.3 The Limits of the Black and Scholes Model

30.7.6 The Binomial Model

Step One

Example: One step binomial pricing model

The Second Step of the Binomial Model

30.7.6.1 Risk Neutral Method

Warning – Memory use and running time

30.7.6.2 The Equivalent Portfolio Binomial Model

Example: – First order binomial model

30.7.6.3 Summary Binomial Model

30.7.7 Dependencies of the Option Price

30.7.7.1 Dependencies in a Long Call Option

30.7.7.2 Dependencies in a Long Put Option

30.7.7.3 Summary of Findings

30.7.8 The Greeks

30.7.9 Delta Hedging

Delta Hedging Example

Warning – Option hedging

30.7.10 Linear Option Strategies

30.7.10.1 Plotting a Portfolio of Options

Digression – Plotting with base R

30.7.10.2 Single Option Strategies

30.7.10.3 Composite Option Strategies

Note – Callspread and putspread

30.7.11 Integrated Option Strategies

30.7.11.1 The Covered Call

30.7.11.2 The Married Put

30.7.11.3 The Collar

30.7.12 Exotic Options

Digression – Investment advice

30.7.13 Capital Protected Structures

Example: capital protected structure

Notes

PART VII Reporting

♣31♣ A Grammar of Graphics with ggplot2

Further information – Extensions

31.1 The Basics of ggplot2

Hint – Themes

Question #29 – Explore data with ggplot

31.2 Over-plotting

Warning – Not plotting data is not including it at all

Note – Selection of the smoothing method

31.3 Case Study for ggplot2

Note – ggplot and the pipe

Question #30 – Predicting days past due

Notes

♣32♣ R Markdown

Hint – RStudio

Further information – More about R Markdown

Warning – Be careful with specific symbols

Hint – Change document format

Digression – Free book on R Markdown

Digression – Notebook format

Digression – R Bookdown

Note

♣33♣ knitr and LaTeX

Note – Short and long code

Further information – LaTeX

Digression – RStudio

Notes

♣34♣ An Automated Development Cycle

♣35♣ Writing and Communication Skills

Example: Customer profitability

Hint – Think from the audience’s point of view

Hint – How R can help

Note – Two very different slide decks

Note

♣36♣ Interactive Apps

36.1 Shiny

Hint – Online gallery for Shiny

Hint – Run apps in your browser

Hint – See all available examples

Listing 36.1 The html code to include our Shiny app on a live web-page

Further information – Shiny

36.2 Browser Born Data Visualization

36.2.1 HTML-widgets

Further information – Stunning visualisations

36.2.2 Interactive Maps with leaflet

Further information – leaflet

Digression – References for leaflet

36.2.3 Interactive Data Visualisation with ggvis

Hint – Interactive data for data scientists

36.2.3.1 Getting Started in R with ggvis

Further information – ggvis

36.2.3.2 Combining the Power of ggvis and Shiny

36.2.4 googleVis

36.3 Dashboards

36.3.1 The Business Case: a Diversity Dashboard

Note – Special characters in variable names

36.3.2 A Dashboard with flexdashboard

36.3.2.1 A Static Dashboard

Hint – Find the full code of the dashboard

Further information – Code for this dashboard

Further information – More eye candy

Note – Visualising is not via printing the plot

36.3.2.2 Interactive Dashboards with flexdashboard

Further information – flexdashboard

36.3.3 A Dashboard with shinydashboard

Note – Make your website reactive

Notes

♣37♣ Parallel Computing

37.1 Combine foreach and doParallel

Note – foreach

Hint – Expressions

Further information – Alternatives for multi-core processing

37.2 Distribute Calculations over LAN with Snow

Hint – Random numbers

Note – snow overwrites many functions

Hint – Cost efficiency

Hint – Load balancing with clusterApplyLB()

37.3 Using the GPU

Example: Nvidia

Digression – CUDA or OpenCL?

Hint – Nvidia

37.3.1 Getting Started with gpuR

Warning – Use the memory of the GPU

Note – Mileage may vary

Hint – Cleaning up

Further information – Machine learning libraries

37.3.2 On the Importance of Memory use

Digression – Deviations in execution time

37.3.3 Conclusions for GPU Programming

Further information – Advanced GPU programming

Hint – Multiple GPUs

Further information – Nvidia’s CUDA

Notes

♣38♣ R and Big Data

Note – Memory limits in R

38.1 Use a Powerful Server. 38.1.1 Use R on a Server

Hint – RStudio

38.1.2 Let the Database Server do the Heavy Lifting

Hint – A general package to talk ODBC

Further information – A nice intro from RStudio

38.2 Using more Memory than we have RAM

Notes

♣39♣ Parallelism for Big Data

39.1 Apache Hadoop

Further information – Apache Software Foundation

39.2 Apache Spark

Further information – Spark

39.2.1 Installing Spark

Warning – Installing Spark

Hint – Installing via the Apache website

Digression – Spark installer

39.2.2 Running Spark

Digression – Scala

Note – Stopping Spark

39.2.3 SparkR

Hint – Adding checkpoints

Note – Our convention

Note – SparkDataFrame or DataFrame

Note – Statements that are not repeated further

39.2.3.1 A User Defined Function on Spark

dapply

Hint – Extracting the schema

dapplyCollect

The Group Apply Variant: gapply

gapplyCollect

spark.lapply

39.2.3.2 Some Other Functions of SparkR

Back and Forth Between R and Spark

Reading CSV files

Changing Columns and Adding New Ones

Aggregation

39.2.3.3 Machine learning with SparkR

Note – Familiar MLlib

Model Persistence and Sampling

Further information – SparkR

39.2.4 sparklyr

Note – Short intro to sparklyr

Hint – Use SQL from R on RDD

spark_apply

Machine Learning

Further information – sparklyr

39.2.5 SparkR or sparklyr

Both tools

Reasons to choose SparkR

Reasons to choose sparklyr

Notes

♣40♣ The Need for Speed

40.1 Benchmarking

Digression – Expressions and curly brackets

40.2 Optimize Code

40.2.1 Avoid Repeating the Same

40.2.2 Use Vectorisation where Appropriate

Note – Overhead created by the for-loops

40.2.3 Pre-allocating Memory

40.2.4 Use the Fastest Function

40.2.5 Use the Fastest Package

40.2.6 Be Mindful about Details

Note – Cold code ahead

40.2.7 Compile Functions

40.2.8 Use C or C++ Code in R

Digression – Background operation

Further information – Rcpp

Digression – Efficient R

40.2.9 Using a C++ Source File in R

Hint – R code within the C++ code within R

Warning – C++ function not saved in .RData

Further information – Using C in R

40.2.10 Call Compiled C++ Functions in R

Further information – Calling C functions

Digression – R in C++

40.3 Profiling Code

Note – Cold code ahead

40.3.1 The Package profr

40.3.2 The Package proftools

Hint – Delete file

40.4 Optimize Your Computer

Warning – Be nice

Digression – Batch R code

Notes

♣A♣ Create your own R Package

Further information – More on creating packages

A.1 Creating the Package in the R Console

Digression – Creating packages in RStudio

A.2 Update the Package Description

A.3 Documenting the Functions

A.4 Loading the Package

A.5 Further Steps

Further information – Further reading

Notes

♣B♣ Levels of Measurement. B.1 Nominal Scale

B.2 Ordinal Scale

B.3 Interval Scale

B.4 Ratio Scale

♣C♣ Trademark Notices

C.1 General Trademark Notices

C.2 R-Related Notices. C.2.1 Crediting Developers of R Packages

C.2.2 The R-packages used in this Book

♣D♣ Code Not Shown in the Body of the Book

♣E♣ Answers to Selected Questions

Note

Bibliography

Nomenclature

Index

WILEY END USER LICENSE AGREEMENT

Excerpt from the book

Philippe J.S. De Brouwer

Students who have graduated from a science, technology, engineering, mathematics, or similar program will find that this book helps them make a successful step from the academic world into any private or governmental company.

.....

data_test$N
## [1] Piotr Pawel Paula Lisa  Laura
## Levels: Laura Lisa Paula Pawel Piotr

Use “short-cuts” sparingly and only when working interactively (not in functions or in code that will be saved and re-run later). If another column is added later, the short-cut may no longer be unique: its behaviour then becomes hard to predict, and the resulting error is even harder to spot because it sits in code that previously worked fine.
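The risk can be illustrated with a minimal sketch. The excerpt does not show how `data_test` was built, so the column name `Name` and the later-added column `Nickname` are assumptions made purely for illustration.

```r
# Hypothetical reconstruction: the column names `Name` and `Nickname`
# are assumptions, not taken from the book.
data_test <- data.frame(
  Name = c("Piotr", "Pawel", "Paula", "Lisa", "Laura"),
  stringsAsFactors = TRUE
)

# With only one column starting with "N", the short-cut is unique
# and partial matching resolves it to `Name`.
short_cut <- data_test$N
print(short_cut)

# Later, a second column whose name also starts with "N" is added.
data_test$Nickname <- c("Pio", "Paw", "Pau", "Lis", "Lau")

# The short-cut is now ambiguous: `$` returns NULL and raises no error.
ambiguous <- data_test$N
print(is.null(ambiguous))

# Spelling out the full column name keeps working.
full_name <- data_test$Name
print(levels(full_name))
```

Because the ambiguous short-cut returns `NULL` rather than stopping with an error, the failure tends to surface only further downstream. Writing the full column name avoids it, and so does `data_test[["Name"]]`, which matches names exactly by default.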

.....
