Data Science in Theory and Practice
Contents
Maria Cristina Mariani. Data Science in Theory and Practice
Data Science in Theory and Practice. Techniques for Big Data Analytics and Complex Data Sets
List of Figures
List of Tables
Preface
1 Background of Data Science. 1.1 Introduction
1.2 Origin of Data Science
1.3 Who is a Data Scientist?
1.4 Big Data
1.4.1 Characteristics of Big Data
1.4.2 Big Data Architectures
2 Matrix Algebra and Random Vectors. 2.1 Introduction
2.2 Some Basics of Matrix Algebra. 2.2.1 Vectors
2.2.2 Matrices
Theorem 2.1
2.3 Random Variables and Distribution Functions
2.3.1 The Dirichlet Distribution
2.3.2 Multinomial Distribution
2.3.3 Multivariate Normal Distribution
2.4 Problems
3 Multivariate Analysis. 3.1 Introduction
3.2 Multivariate Analysis: Overview
3.3 Mean Vectors
3.4 Variance–Covariance Matrices
3.5 Correlation Matrices
3.6 Linear Combinations of Variables
3.6.1 Linear Combinations of Sample Means
3.6.2 Linear Combinations of Sample Variance and Covariance
3.6.3 Linear Combinations of Sample Correlation
3.7 Problems
4 Time Series Forecasting. 4.1 Introduction
4.2 Terminologies
4.3 Components of Time Series
4.3.1 Seasonal
4.3.2 Trend
4.3.3 Cyclical
4.3.4 Random
4.4 Transformations to Achieve Stationarity
4.5 Elimination of Seasonality via Differencing
4.6 Additive and Multiplicative Models
4.7 Measuring Accuracy of Different Time Series Techniques
4.7.1 Mean Absolute Deviation
4.7.2 Mean Absolute Percent Error
4.7.3 Mean Square Error
4.7.4 Root Mean Square Error
4.8 Averaging and Exponential Smoothing Forecasting Methods
4.8.1 Averaging Methods
4.8.1.1 Simple Moving Averages
4.8.1.2 Weighted Moving Averages
4.8.2 Exponential Smoothing Methods
4.8.2.1 Simple Exponential Smoothing
4.8.2.2 Adjusted Exponential Smoothing
4.9 Problems
5 Introduction to R. 5.1 Introduction
5.2 Basic Data Types
5.2.1 Numeric Data Type
5.2.2 Integer Data Type
5.2.3 Character
5.2.4 Complex Data Types
5.2.5 Logical Data Types
5.3 Simple Manipulations – Numbers and Vectors. 5.3.1 Vectors and Assignment
5.3.2 Vector Arithmetic
5.3.3 Vector Index
5.3.4 Logical Vectors
5.3.5 Missing Values
5.3.6 Index Vectors
5.3.6.1 Indexing with Logicals
5.3.6.2 A Vector of Positive Integral Quantities
5.3.6.3 A Vector of Negative Integral Quantities
5.3.6.4 Named Indexing
5.3.7 Other Types of Objects
5.3.7.1 Matrices
5.3.7.2 List
5.3.7.3 Factor
5.3.7.4 Data Frames
5.3.8 Data Import
5.3.8.1 Excel File
5.3.8.2 CSV File
5.3.8.3 Table File
5.3.8.4 Minitab File
5.3.8.5 SPSS File
5.4 Problems
6 Introduction to Python. 6.1 Introduction
6.2 Basic Data Types
6.2.1 Number Data Type
6.2.1.1 Integer
6.2.1.2 Floating‐Point Numbers
6.2.1.3 Complex Numbers
6.2.2 Strings
6.2.3 Lists
6.2.4 Tuples
6.2.5 Dictionaries
6.3 Number Type Conversion
6.4 Python Conditions
6.4.1 If Statements
6.4.2 The Else and Elif Clauses
6.4.3 The While Loop
6.4.3.1 The Break Statement
6.4.3.2 The Continue Statement
6.4.4 For Loops
6.4.4.1 Nested Loops
6.5 Python File Handling: Open, Read, and Close
6.6 Python Functions
6.6.1 Calling a Function in Python
6.6.2 Scope and Lifetime of Variables
6.7 Problems
7 Algorithms. 7.1 Introduction
7.2 Algorithm – Definition
7.3 How to Write an Algorithm
7.3.1 Algorithm Analysis
7.3.2 Algorithm Complexity
7.3.3 Space Complexity
7.3.4 Time Complexity
7.4 Asymptotic Analysis of an Algorithm
7.4.1 Asymptotic Notations
7.4.1.1 Big O Notation
7.4.1.2 The Omega Notation, Ω
7.4.1.3 The Θ Notation
7.5 Examples of Algorithms
7.6 Flowchart
7.7 Problems
8 Data Preprocessing and Data Validations. 8.1 Introduction
8.2 Definition – Data Preprocessing
8.3 Data Cleaning
8.3.1 Handling Missing Data
8.3.2 Types of Missing Data
8.3.2.1 Missing Completely at Random
8.3.2.2 Missing at Random
8.3.2.3 Missing Not at Random
8.3.3 Techniques for Handling the Missing Data
8.3.3.1 Listwise Deletion
8.3.3.2 Pairwise Deletion
8.3.3.3 Mean Substitution
8.3.3.4 Regression Imputation
8.3.3.5 Multiple Imputation
8.3.4 Identifying Outliers and Noisy Data
8.3.4.1 Binning
8.3.4.2 Box and Whisker Plot
8.4 Data Transformations
8.4.1 Min–Max Normalization
8.4.2 Z‐score Normalization
8.5 Data Reduction
8.6 Data Validations
8.6.1 Methods for Data Validation
8.6.1.1 Simple Statistical Criterion
8.6.1.2 Fourier Series Modeling and SSC
8.6.1.3 Principal Component Analysis and SSC
8.7 Problems
9 Data Visualizations. 9.1 Introduction
9.2 Definition – Data Visualization
9.2.1 Scientific Visualization
9.2.2 Information Visualization
9.2.3 Visual Analytics
9.3 Data Visualization Techniques
9.3.1 Time Series Data
9.3.2 Statistical Distributions
9.3.2.1 Stem‐and‐Leaf Plots
9.3.2.2 Q–Q Plots
9.4 Data Visualization Tools
9.4.1 Tableau
9.4.2 Infogram
9.4.3 Google Charts
9.5 Problems
10 Binomial and Trinomial Trees. 10.1 Introduction
10.2 The Binomial Tree Method
10.2.1 One Step Binomial Tree
Remarks
10.2.2 Using the Tree to Price a European Option
10.2.3 Using the Tree to Price an American Option
10.2.4 Using the Tree to Price Any Path Dependent Option
10.3 Binomial Discrete Model
10.3.1 One‐Step Method
10.3.2 Multi‐step Method
10.3.2.1 Example: European Call Option
10.4 Trinomial Tree Method
10.4.1 What is the Meaning of Little o and Big O?
10.5 Problems
11 Principal Component Analysis. 11.1 Introduction
11.2 Background of Principal Component Analysis
11.3 Motivation
11.3.1 Correlation and Redundancy
11.3.2 Visualization
11.4 The Mathematics of PCA
11.4.1 The Eigenvalues and Eigenvectors
11.5 How PCA Works
11.5.1 Algorithm
11.6 Application
11.7 Problems
12 Discriminant and Cluster Analysis. 12.1 Introduction
12.2 Distance
12.3 Discriminant Analysis
12.3.1 Kullback–Leibler Divergence
12.3.2 Chernoff Distance
12.3.3 Application – Seismic Time Series
12.3.4 Application – Financial Time Series
12.4 Cluster Analysis
12.4.1 Partitioning Algorithms
12.4.2 k‐Means Algorithm
12.4.3 k‐Medoids Algorithm
12.4.4 Application – Seismic Time Series
12.4.5 Application – Financial Time Series
12.5 Problems
13 Multidimensional Scaling. 13.1 Introduction
13.2 Motivation
13.3 Number of Dimensions and Goodness of Fit
13.4 Proximity Measures
13.5 Metric Multidimensional Scaling
13.5.1 The Classical Solution
13.6 Nonmetric Multidimensional Scaling
13.6.1 Shepard–Kruskal Algorithm
13.7 Problems
14 Classification and Tree‐Based Methods. 14.1 Introduction
14.2 An Overview of Classification
14.2.1 The Classification Problem
14.2.2 Logistic Regression Model
14.2.2.1 Regularization
14.2.2.2 Regularization
14.3 Linear Discriminant Analysis
14.3.1 Optimal Classification and Estimation of Gaussian Distribution
14.4 Tree‐Based Methods
14.4.1 One Single Decision Tree
14.4.2 Random Forest
14.5 Applications
14.6 Problems
15 Association Rules. 15.1 Introduction
15.2 Market Basket Analysis
15.3 Terminologies
15.3.1 Itemset and Support Count
15.3.2 Frequent Itemset
15.3.3 Closed Frequent Itemset
15.3.4 Maximal Frequent Itemset
15.3.5 Association Rule
15.3.6 Rule Evaluation Metrics
15.4 The Apriori Algorithm
15.4.1 An Example of the Apriori Algorithm
15.5 Applications
15.5.1 Confidence
15.5.2 Lift
15.5.3 Conviction
15.6 Problems
16 Support Vector Machines. 16.1 Introduction
16.2 The Maximal Margin Classifier
16.3 Classification Using a Separating Hyperplane
16.4 Kernel Functions
16.5 Applications
16.6 Problems
17 Neural Networks. 17.1 Introduction
17.2 Perceptrons
17.3 Feed Forward Neural Network
17.4 Recurrent Neural Networks
17.5 Long Short‐Term Memory
17.5.1 Residual Connections
17.5.2 Loss Functions
17.5.3 Stochastic Gradient Descent
17.5.4 Regularization – Ensemble Learning
17.6 Application
17.6.1 Emergent and Developed Market
17.6.2 The Lehman Brothers Collapse
17.6.3 Methodology
17.6.4 Analyses of Data
17.6.4.1 Results of the Emergent Market Index
17.6.4.2 Results of the Developed Market Index
17.7 Significance of Study
17.8 Problems
18 Fourier Analysis. 18.1 Introduction
18.2 Definition
18.3 Discrete Fourier Transform
18.4 The Fast Fourier Transform (FFT) Method
18.5 Dynamic Fourier Analysis
18.5.1 Tapering
18.5.2 Daniell Kernel Estimation
18.6 Applications of the Fourier Transform
18.6.1 Modeling Power Spectrum of Financial Returns Using Fourier Transforms
18.6.2 Image Compression
18.7 Problems
19 Wavelets Analysis. 19.1 Introduction
19.1.1 Wavelets Transform
19.2 Discrete Wavelets Transforms
19.2.1 Haar Wavelets
19.2.1.1 Haar Functions
19.2.1.2 Haar Transform Matrix
19.2.2 Daubechies Wavelets
19.3 Applications of the Wavelets Transform
19.3.1 Discriminating Between Mining Explosions and Cluster of Earthquakes
19.3.1.1 Background of Data
19.3.1.2 Results
19.3.2 Finance
19.3.3 Damage Detection in Frame Structures
19.3.4 Image Compression
19.3.5 Seismic Signals
19.4 Problems
20 Stochastic Analysis. 20.1 Introduction
20.2 Necessary Definitions from Probability Theory
20.3 Stochastic Processes
20.3.1 The Index Set
20.3.2 The State Space
20.3.3 Stationary and Independent Components
20.3.4 Stationary and Independent Increments
20.3.5 Filtration and Standard Filtration
20.4 Examples of Stochastic Processes
20.4.1 Markov Chains
20.4.1.1 Examples of Markov Processes. Forecasting the Weather
A Random Walk Model
The Gambling Model
20.4.1.2 The Chapman–Kolmogorov Equation
20.4.1.3 Classification of States
Definition 20.19
Example 20.1
20.4.1.4 Limiting Probabilities
Definition 20.21
20.4.1.5 Branching Processes
20.4.1.6 Time Homogeneous Chains
20.4.2 Martingales
20.4.3 Simple Random Walk
20.4.4 The Brownian Motion (Wiener Process)
20.5 Measurable Functions and Expectations
20.5.1 Radon–Nikodym Theorem and Conditional Expectation
Remarks 20.10
20.6 Problems
21 Fractal Analysis – Lévy, Hurst, DFA, DEA. 21.1 Introduction and Definitions
21.2 Lévy Processes
21.2.1 Examples of Lévy Processes
21.2.1.1 The Poisson Process (Jumps)
21.2.1.2 The Compound Poisson Process
21.2.1.3 Inverse Gaussian (IG) Process
21.2.1.4 The Gamma Process
21.2.2 Exponential Lévy Models
21.2.3 Subordination of Lévy Processes
21.2.4 Stable Distributions
21.3 Lévy Flight Models
21.4 Rescaled Range Analysis (Hurst Analysis)
21.5 Detrended Fluctuation Analysis (DFA)
21.6 Diffusion Entropy Analysis (DEA)
21.6.1 Estimation Procedure
21.6.1.1 The Shannon Entropy
21.6.2 The H–α Relationship for the Truncated Lévy Flight
Remarks 21.2
21.7 Application – Characterization of Volcanic Time Series
21.7.1 Background of Volcanic Data
21.7.2 Results
21.8 Problems
22 Stochastic Differential Equations. 22.1 Introduction
22.2 Stochastic Differential Equations
22.2.1 Solution Methods of SDEs
22.3 Examples
22.3.1 Modeling Asset Prices
22.3.2 Modeling Magnitude of Earthquake Series
22.4 Multidimensional Stochastic Differential Equations
22.4.1 The Multidimensional Ornstein–Uhlenbeck Processes
22.4.2 Solution of the Ornstein–Uhlenbeck Process
22.5 Simulation of Stochastic Differential Equations
22.5.1 Euler–Maruyama Scheme for Approximating Stochastic Differential Equations
22.5.2 Euler–Milstein Scheme for Approximating Stochastic Differential Equations
22.6 Problems
23 Ethics: With Great Power Comes Great Responsibility. 23.1 Introduction
23.2 Data Science Ethical Principles
23.2.1 Enhance Value in Society
23.2.2 Avoiding Harm
23.2.3 Professional Competence
23.2.4 Increasing Trustworthiness
23.2.5 Maintaining Accountability and Oversight
23.3 Data Science Code of Professional Conduct
23.4 Application
23.4.1 Project Planning
23.4.2 Data Preprocessing
23.4.3 Data Management
23.4.4 Analysis and Development
23.5 Problems
Bibliography
Index
WILEY END USER LICENSE AGREEMENT
Excerpt from the book
Maria Cristina Mariani, University of Texas, El Paso, El Paso, United States
Osei Kofi Tweneboah, Ramapo College of New Jersey, Mahwah, United States
.....
Figure 18.2 3D power spectra of the returns (generated per minute) from the four analyzed stock companies. (a) Discover. (b) Microsoft. (c) Walmart. (d) JPM Chase.
Figure 19.1 Time‐frequency image of explosion 1 recorded by ANMO (Table 19.2).
.....