Data Science in Theory and Practice

Format: e-book
Genre: Mathematics
Publisher: John Wiley & Sons Limited
ISBN: 9781119674733


Book Description

DATA SCIENCE IN THEORY AND PRACTICE

EXPLORE THE FOUNDATIONS OF DATA SCIENCE WITH THIS INSIGHTFUL NEW RESOURCE

Data Science in Theory and Practice delivers a comprehensive treatment of the mathematical and statistical models used to analyze data sets from disciplines such as banking, finance, health care, bioinformatics, security, education, and social services. Written in five parts, the book examines some of the most commonly used and fundamental mathematical and statistical concepts that form the basis of data science. The authors go on to analyze data transformation techniques for extracting information from raw data, as well as long memory behavior and predictive modeling. The book covers a wide range of topics relevant to the analysis of complex data sets. Alongside a robust exploration of the theory underpinning data science, it contains numerous applications to specific, practical problems. The book also provides code examples in R and Python, together with pseudo-algorithms for porting the code to other languages.

Ideal for students and practitioners without a strong background in data science, the book also teaches readers about topics such as:

- Analyses of foundational theoretical subjects, including the history of data science, matrix algebra and random vectors, and multivariate analysis
- A comprehensive examination of time series forecasting, including the different components of time series and transformations to achieve stationarity
- Introductions to both the R and Python programming languages, including basic data types and sample manipulations for both languages
- An exploration of algorithms, including how to write one and how to perform an asymptotic analysis
- A comprehensive discussion of several techniques for analyzing and predicting complex data sets

Perfect for advanced undergraduate and graduate students in Data Science, Business Analytics, and Statistics programs, Data Science in Theory and Practice will also earn a place in the libraries of practicing data scientists, data and business analysts, and statisticians in the private sector, government, and academia.

Contents

Maria Cristina Mariani. Data Science in Theory and Practice


Data Science in Theory and Practice: Techniques for Big Data Analytics and Complex Data Sets

List of Figures

List of Tables

Preface

1 Background of Data Science

1.1 Introduction

1.2 Origin of Data Science

1.3 Who is a Data Scientist?

1.4 Big Data

1.4.1 Characteristics of Big Data

1.4.2 Big Data Architectures

2 Matrix Algebra and Random Vectors

2.1 Introduction

2.2 Some Basics of Matrix Algebra

2.2.1 Vectors

2.2.2 Matrices

Theorem 2.1

2.3 Random Variables and Distribution Functions

2.3.1 The Dirichlet Distribution

2.3.2 Multinomial Distribution

2.3.3 Multivariate Normal Distribution

2.4 Problems

3 Multivariate Analysis

3.1 Introduction

3.2 Multivariate Analysis: Overview

3.3 Mean Vectors

3.4 Variance–Covariance Matrices

3.5 Correlation Matrices

3.6 Linear Combinations of Variables

3.6.1 Linear Combinations of Sample Means

3.6.2 Linear Combinations of Sample Variance and Covariance

3.6.3 Linear Combinations of Sample Correlation

3.7 Problems

4 Time Series Forecasting

4.1 Introduction

4.2 Terminologies

4.3 Components of Time Series

4.3.1 Seasonal

4.3.2 Trend

4.3.3 Cyclical

4.3.4 Random

4.4 Transformations to Achieve Stationarity

4.5 Elimination of Seasonality via Differencing

4.6 Additive and Multiplicative Models

4.7 Measuring Accuracy of Different Time Series Techniques

4.7.1 Mean Absolute Deviation

4.7.2 Mean Absolute Percent Error

4.7.3 Mean Square Error

4.7.4 Root Mean Square Error

4.8 Averaging and Exponential Smoothing Forecasting Methods

4.8.1 Averaging Methods

4.8.1.1 Simple Moving Averages

4.8.1.2 Weighted Moving Averages

4.8.2 Exponential Smoothing Methods

4.8.2.1 Simple Exponential Smoothing

4.8.2.2 Adjusted Exponential Smoothing

4.9 Problems

5 Introduction to R

5.1 Introduction

5.2 Basic Data Types

5.2.1 Numeric Data Type

5.2.2 Integer Data Type

5.2.3 Character

5.2.4 Complex Data Types

5.2.5 Logical Data Types

5.3 Simple Manipulations – Numbers and Vectors

5.3.1 Vectors and Assignment

5.3.2 Vector Arithmetic

5.3.3 Vector Index

5.3.4 Logical Vectors

5.3.5 Missing Values

5.3.6 Index Vectors

5.3.6.1 Indexing with Logicals

5.3.6.2 A Vector of Positive Integral Quantities

5.3.6.3 A Vector of Negative Integral Quantities

5.3.6.4 Named Indexing

5.3.7 Other Types of Objects

5.3.7.1 Matrices

5.3.7.2 List

5.3.7.3 Factor

5.3.7.4 Data Frames

5.3.8 Data Import

5.3.8.1 Excel File

5.3.8.2 CSV File

5.3.8.3 Table File

5.3.8.4 Minitab File

5.3.8.5 SPSS File

5.4 Problems

6 Introduction to Python

6.1 Introduction

6.2 Basic Data Types

6.2.1 Number Data Type

6.2.1.1 Integer

6.2.1.2 Floating‐Point Numbers

6.2.1.3 Complex Numbers

6.2.2 Strings

6.2.3 Lists

6.2.4 Tuples

6.2.5 Dictionaries

6.3 Number Type Conversion

6.4 Python Conditions

6.4.1 If Statements

6.4.2 The Else and Elif Clauses

6.4.3 The While Loop

6.4.3.1 The Break Statement

6.4.3.2 The Continue Statement

6.4.4 For Loops

6.4.4.1 Nested Loops

6.5 Python File Handling: Open, Read, and Close

6.6 Python Functions

6.6.1 Calling a Function in Python

6.6.2 Scope and Lifetime of Variables

6.7 Problems

7 Algorithms

7.1 Introduction

7.2 Algorithm – Definition

7.3 How to Write an Algorithm

7.3.1 Algorithm Analysis

7.3.2 Algorithm Complexity

7.3.3 Space Complexity

7.3.4 Time Complexity

7.4 Asymptotic Analysis of an Algorithm

7.4.1 Asymptotic Notations

7.4.1.1 Big O Notation

7.4.1.2 The Omega Notation, Ω

7.4.1.3 The Θ Notation

7.5 Examples of Algorithms

7.6 Flowchart

7.7 Problems

8 Data Preprocessing and Data Validations

8.1 Introduction

8.2 Definition – Data Preprocessing

8.3 Data Cleaning

8.3.1 Handling Missing Data

8.3.2 Types of Missing Data

8.3.2.1 Missing Completely at Random

8.3.2.2 Missing at Random

8.3.2.3 Missing Not at Random

8.3.3 Techniques for Handling the Missing Data

8.3.3.1 Listwise Deletion

8.3.3.2 Pairwise Deletion

8.3.3.3 Mean Substitution

8.3.3.4 Regression Imputation

8.3.3.5 Multiple Imputation

8.3.4 Identifying Outliers and Noisy Data

8.3.4.1 Binning

8.3.4.2 Box and Whisker plot

8.4 Data Transformations

8.4.1 Min–Max Normalization

8.4.2 z‐score Normalization

8.5 Data Reduction

8.6 Data Validations

8.6.1 Methods for Data Validation

8.6.1.1 Simple Statistical Criterion

8.6.1.2 Fourier Series Modeling and SSC

8.6.1.3 Principal Component Analysis and SSC

8.7 Problems

9 Data Visualizations

9.1 Introduction

9.2 Definition – Data Visualization

9.2.1 Scientific Visualization

9.2.2 Information Visualization

9.2.3 Visual Analytics

9.3 Data Visualization Techniques

9.3.1 Time Series Data

9.3.2 Statistical Distributions

9.3.2.1 Stem‐and‐Leaf Plots

9.3.2.2 Q–Q Plots

9.4 Data Visualization Tools

9.4.1 Tableau

9.4.2 Infogram

9.4.3 Google Charts

9.5 Problems

10 Binomial and Trinomial Trees

10.1 Introduction

10.2 The Binomial Tree Method

10.2.1 One Step Binomial Tree

Remarks

10.2.2 Using the Tree to Price a European Option

10.2.3 Using the Tree to Price an American Option

10.2.4 Using the Tree to Price Any Path Dependent Option

10.3 Binomial Discrete Model

10.3.1 One‐Step Method

10.3.2 Multi‐step Method

10.3.2.1 Example: European Call Option

10.4 Trinomial Tree Method

10.4.1 What is the Meaning of Little o and Big O?

10.5 Problems

11 Principal Component Analysis

11.1 Introduction

11.2 Background of Principal Component Analysis

11.3 Motivation

11.3.1 Correlation and Redundancy

11.3.2 Visualization

11.4 The Mathematics of PCA

11.4.1 The Eigenvalues and Eigenvectors

11.5 How PCA Works

11.5.1 Algorithm

11.6 Application

11.7 Problems

12 Discriminant and Cluster Analysis

12.1 Introduction

12.2 Distance

12.3 Discriminant Analysis

12.3.1 Kullback–Leibler Divergence

12.3.2 Chernoff Distance

12.3.3 Application – Seismic Time Series

12.3.4 Application – Financial Time Series

12.4 Cluster Analysis

12.4.1 Partitioning Algorithms

12.4.2 k‐Means Algorithm

12.4.3 k‐Medoids Algorithm

12.4.4 Application – Seismic Time Series

12.4.5 Application – Financial Time Series

12.5 Problems

13 Multidimensional Scaling

13.1 Introduction

13.2 Motivation

13.3 Number of Dimensions and Goodness of Fit

13.4 Proximity Measures

13.5 Metric Multidimensional Scaling

13.5.1 The Classical Solution

13.6 Nonmetric Multidimensional Scaling

13.6.1 Shepard–Kruskal Algorithm

13.7 Problems

14 Classification and Tree‐Based Methods

14.1 Introduction

14.2 An Overview of Classification

14.2.1 The Classification Problem

14.2.2 Logistic Regression Model

14.2.2.1 L1 Regularization

14.2.2.2 L2 Regularization

14.3 Linear Discriminant Analysis

14.3.1 Optimal Classification and Estimation of Gaussian Distribution

14.4 Tree‐Based Methods

14.4.1 One Single Decision Tree

14.4.2 Random Forest

14.5 Applications

14.6 Problems

15 Association Rules

15.1 Introduction

15.2 Market Basket Analysis

15.3 Terminologies

15.3.1 Itemset and Support Count

15.3.2 Frequent Itemset

15.3.3 Closed Frequent Itemset

15.3.4 Maximal Frequent Itemset

15.3.5 Association Rule

15.3.6 Rule Evaluation Metrics

15.4 The Apriori Algorithm

15.4.1 An example of the Apriori Algorithm

15.5 Applications

15.5.1 Confidence

15.5.2 Lift

15.5.3 Conviction

15.6 Problems

16 Support Vector Machines

16.1 Introduction

16.2 The Maximal Margin Classifier

16.3 Classification Using a Separating Hyperplane

16.4 Kernel Functions

16.5 Applications

16.6 Problems

17 Neural Networks

17.1 Introduction

17.2 Perceptrons

17.3 Feed Forward Neural Network

17.4 Recurrent Neural Networks

17.5 Long Short‐Term Memory

17.5.1 Residual Connections

17.5.2 Loss Functions

17.5.3 Stochastic Gradient Descent

17.5.4 Regularization – Ensemble Learning

17.6 Application

17.6.1 Emergent and Developed Market

17.6.2 The Lehman Brothers Collapse

17.6.3 Methodology

17.6.4 Analyses of Data

17.6.4.1 Results of the Emergent Market Index

17.6.4.2 Results of the Developed Market Index

17.7 Significance of Study

17.8 Problems

18 Fourier Analysis

18.1 Introduction

18.2 Definition

18.3 Discrete Fourier Transform

18.4 The Fast Fourier Transform (FFT) Method

18.5 Dynamic Fourier Analysis

18.5.1 Tapering

18.5.2 Daniell Kernel Estimation

18.6 Applications of the Fourier Transform

18.6.1 Modeling Power Spectrum of Financial Returns Using Fourier Transforms

18.6.2 Image Compression

18.7 Problems

19 Wavelets Analysis

19.1 Introduction

19.1.1 Wavelets Transform

19.2 Discrete Wavelets Transforms

19.2.1 Haar Wavelets

19.2.1.1 Haar Functions

19.2.1.2 Haar Transform Matrix

19.2.2 Daubechies Wavelets

19.3 Applications of the Wavelets Transform

19.3.1 Discriminating Between Mining Explosions and Cluster of Earthquakes

19.3.1.1 Background of Data

19.3.1.2 Results

19.3.2 Finance

19.3.3 Damage Detection in Frame Structures

19.3.4 Image Compression

19.3.5 Seismic Signals

19.4 Problems

20 Stochastic Analysis

20.1 Introduction

20.2 Necessary Definitions from Probability Theory

20.3 Stochastic Processes

20.3.1 The Index Set

20.3.2 The State Space

20.3.3 Stationary and Independent Components

20.3.4 Stationary and Independent Increments

20.3.5 Filtration and Standard Filtration

20.4 Examples of Stochastic Processes

20.4.1 Markov Chains

20.4.1.1 Examples of Markov Processes

Forecasting the Weather

A Random Walk Model

The Gambling Model

20.4.1.2 The Chapman–Kolmogorov Equation

20.4.1.3 Classification of States

Definition 20.19

Example 20.1

20.4.1.4 Limiting Probabilities

Definition 20.21

20.4.1.5 Branching Processes

20.4.1.6 Time Homogeneous Chains

20.4.2 Martingales

20.4.3 Simple Random Walk

20.4.4 The Brownian Motion (Wiener Process)

20.5 Measurable Functions and Expectations

20.5.1 Radon–Nikodym Theorem and Conditional Expectation

Remarks 20.10

20.6 Problems

21 Fractal Analysis – Lévy, Hurst, DFA, DEA

21.1 Introduction and Definitions

21.2 Lévy Processes

21.2.1 Examples of Lévy Processes

21.2.1.1 The Poisson Process (Jumps)

21.2.1.2 The Compound Poisson Process

21.2.1.3 Inverse Gaussian (IG) Process

21.2.1.4 The Gamma Process

21.2.2 Exponential Lévy Models

21.2.3 Subordination of Lévy Processes

21.2.4 Stable Distributions

21.3 Lévy Flight Models

21.4 Rescaled Range Analysis (Hurst Analysis)

21.5 Detrended Fluctuation Analysis (DFA)

21.6 Diffusion Entropy Analysis (DEA)

21.6.1 Estimation Procedure

21.6.1.1 The Shannon Entropy

21.6.2 The – Relationship for the Truncated Lévy Flight

Remarks 21.2

21.7 Application – Characterization of Volcanic Time Series

21.7.1 Background of Volcanic Data

21.7.2 Results

21.8 Problems

22 Stochastic Differential Equations

22.1 Introduction

22.2 Stochastic Differential Equations

22.2.1 Solution Methods of SDEs

22.3 Examples

22.3.1 Modeling Asset Prices

22.3.2 Modeling Magnitude of Earthquake Series

22.4 Multidimensional Stochastic Differential Equations

22.4.1 The multidimensional Ornstein–Uhlenbeck Processes

22.4.2 Solution of the Ornstein–Uhlenbeck Process

22.5 Simulation of Stochastic Differential Equations

22.5.1 Euler–Maruyama Scheme for Approximating Stochastic Differential Equations

22.5.2 Euler–Milstein Scheme for Approximating Stochastic Differential Equations

22.6 Problems

23 Ethics: With Great Power Comes Great Responsibility

23.1 Introduction

23.2 Data Science Ethical Principles

23.2.1 Enhance Value in Society

23.2.2 Avoiding Harm

23.2.3 Professional Competence

23.2.4 Increasing Trustworthiness

23.2.5 Maintaining Accountability and Oversight

23.3 Data Science Code of Professional Conduct

23.4 Application

23.4.1 Project Planning

23.4.2 Data Preprocessing

23.4.3 Data Management

23.4.4 Analysis and Development

23.5 Problems

Bibliography

Index

WILEY END USER LICENSE AGREEMENT

Excerpt from the Book

Maria Cristina Mariani
University of Texas, El Paso
El Paso, United States

Osei Kofi Tweneboah
Ramapo College of New Jersey
Mahwah, United States

.....

Figure 18.2 3D power spectra of the returns (generated per minute) from the four analyzed stock companies. (a) Discover. (b) Microsoft. (c) Walmart. (d) JPM Chase.

Figure 19.1 Time‐frequency image of explosion 1 recorded by ANMO (Table 19.2).

.....
