Computational Statistics in Data Science

Genre: Mathematics
Publisher: John Wiley & Sons Limited
ISBN: 9781119561088

Book Description

An essential roadmap to the application of computational statistics in contemporary data science.

In Computational Statistics in Data Science, a team of distinguished mathematicians and statisticians delivers an expert compilation of concepts, theories, techniques, and practices in computational statistics for readers who seek a single, standalone sourcebook on statistics in contemporary data science. The book contains multiple sections devoted to key, specific areas in computational statistics, offering modern and accessible presentations of up-to-date techniques. Computational Statistics in Data Science provides complimentary access to finalized entries in the Wiley StatsRef: Statistics Reference Online compendium.

Readers will also find:

A thorough introduction to computational statistics, relevant and accessible to practitioners and researchers in a variety of data-intensive areas

Comprehensive explorations of active topics in statistics, including big data, data stream processing, quantitative visualization, and deep learning

Perfect for researchers and scholars working in any field requiring intermediate and advanced computational statistics techniques, Computational Statistics in Data Science will also earn a place in the libraries of scholars researching and developing computational data-scientific technologies and statistical graphics.

Table of Contents

Group of Authors. Computational Statistics in Data Science

Table of Contents

List of Tables

List of Illustrations

Guide

Pages

Computational Statistics in Data Science

List of Contributors

Preface

Reference

1 Computational Statistics and Data Science in the Twenty‐First Century

1 Introduction

2 Core Challenges 1–3

2.1 Big N

2.2 Big P

2.3 Big M

3 Model‐Specific Advances

3.1 Bayesian Sparse Regression in the Age of Big N and Big P

3.1.1 Continuous shrinkage: alleviating big M

3.1.2 Conjugate gradient sampler for structured high‐dimensional Gaussians

3.2 Phylogenetic Reconstruction

4 Core Challenges 4 and 5

4.1 Fast, Flexible, and Friendly Statistical Algo‐Ware

4.2 Hardware‐Optimized Inference

5 Rise of Data Science

Acknowledgments

Notes

References

2 Statistical Software

1 User Development Environments

1.1 Extensible Text Editors: Emacs and Vim

1.2 Jupyter Notebooks

1.3 RStudio and Rmarkdown

2 Popular Statistical Software

2.1 R

2.1.1 Why use R over Python or Minitab?

2.1.2 Where can users find R support?

2.1.3 How easy is R to develop?

2.1.4 What is the downside of R?

2.1.5 Summary of R

2.2 Python

2.3 SAS®

2.4 SPSS®

3 Noteworthy Statistical Software and Related Tools

3.1 BUGS/JAGS

3.2 C++

3.3 Microsoft Excel/Spreadsheets

3.4 Git

3.5 Java

3.6 JavaScript, Typescript

3.7 Maple

3.8 MATLAB, GNU Octave

3.9 Minitab®

3.10 Workload Managers: SLURM/LSF

3.11 SQL

3.12 Stata®

3.13 Tableau®

4 Promising and Emerging Statistical Software

4.1 Edward, Pyro, NumPyro, and PyMC3

4.2 Julia

4.3 NIMBLE

4.4 Scala

4.5 Stan

5 The Future of Statistical Computing

6 Concluding Remarks

Acknowledgments

References

Further Reading

3 An Introduction to Deep Learning Methods

1 Introduction

2 Machine Learning: An Overview. 2.1 Introduction

2.2 Supervised Learning

2.3 Gradient Descent

3 Feedforward Neural Networks. 3.1 Introduction

3.2 Model Description

3.3 Training an MLP

4 Convolutional Neural Networks. 4.1 Introduction

4.2 Convolutional Layer

4.3 LeNet‐5

5 Autoencoders. 5.1 Introduction

5.2 Objective Function

5.3 Variational Autoencoder

6 Recurrent Neural Networks. 6.1 Introduction

6.2 Architecture

6.3 Long Short‐Term Memory Networks

7 Conclusion

References

4 Streaming Data and Data Streams

1 Introduction

2 Data Stream Computing

3 Issues in Data Stream Mining

3.1 Scalability

3.2 Integration

3.3 Fault‐Tolerance

3.4 Timeliness

3.5 Consistency

3.6 Heterogeneity and Incompleteness

3.7 Load Balancing

3.8 High Throughput

3.9 Privacy

3.10 Accuracy

4 Streaming Data Tools and Technologies

5 Streaming Data Pre‐Processing: Concept and Implementation

6 Streaming Data Algorithms

6.1 Unsupervised Learning

6.2 Semi‐Supervised Learning

6.3 Supervised Learning

6.4 Ontology‐Based Methods

7 Strategies for Processing Data Streams

8 Best Practices for Managing Data Streams

9 Conclusion and the Way Forward

References

5 Monte Carlo Simulation: Are We There Yet?

1 Introduction

2 Estimation

2.1 Expectations

2.2 Quantiles

2.3 Other Estimators

3 Sampling Distribution

3.1 Means

Theorem 1

3.2 Quantiles

Theorem 2

3.3 Other Estimators

3.4 Confidence Regions for Means

4 Estimating

5 Stopping Rules

5.1 IID Monte Carlo

5.2 MCMC

6 Workflow

7 Examples. 7.1 Action Figure Collector Problem

7.2 Estimating Risk for Empirical Bayes

7.3 Bayesian Nonlinear Regression

Note

References

6 Sequential Monte Carlo: Particle Filters and Beyond

1 Introduction

2 Sequential Importance Sampling and Resampling

2.1 Extended State Spaces and SMC Samplers

2.2 Particle MCMC and Related Methods

3 SMC in Statistical Contexts

3.1 SMC for Hidden Markov Models

3.1.1 Filtering

3.1.2 Smoothing

3.1.3 Parameter estimation

3.2 SMC for Bayesian Inference

3.2.1 SMC for model comparison

3.2.2 SMC for ABC

3.3 SMC for Maximum‐Likelihood Estimation

3.4 SMC for Rare Event Estimation

4 Selected Recent Developments

Acknowledgments

Note

References

7 Markov Chain Monte Carlo Methods, A Survey with Some Frequent Misunderstandings

1 Introduction

2 Monte Carlo Methods

3 Markov Chain Monte Carlo Methods

3.1 Metropolis–Hastings Algorithms

3.2 Gibbs Sampling

3.3 Hamiltonian Monte Carlo

4 Approximate Bayesian Computation

5 Further Reading

Abbreviations and Acronyms

Notes

References

Note

8 Bayesian Inference with Adaptive Markov Chain Monte Carlo

1 Introduction

2 Random‐Walk Metropolis Algorithm

3 Adaptation of Random‐Walk Metropolis

3.1 Adaptive Metropolis (AM)

3.2 Adaptive Scaling Metropolis (ASM)

3.3 Robust Adaptive Metropolis (RAM)

3.4 Rationale behind the Adaptations

3.5 Summary and Discussion on the Methods

4 Multimodal Targets with Parallel Tempering

5 Dynamic Models with Particle Filters

6 Discussion

Acknowledgments

Notes

References

9 Advances in Importance Sampling

1 Introduction and Problem Statement

1.1 Standard Monte Carlo Integration

2 Importance Sampling. 2.1 Origins

2.2 Basics

2.3 Theoretical Analysis

2.4 Diagnostics

2.5 Other IS Schemes

2.5.1 Transformation of the importance weights

2.5.2 Particle filtering (sequential Monte Carlo)

3 Multiple Importance Sampling (MIS)

3.1 Generalized MIS

3.1.1 MIS with different number of samples per proposal

3.2 Rare Event Estimation

3.3 Compressed and Distributed IS

4 Adaptive Importance Sampling (AIS)

Acknowledgments

Notes

References

10 Supervised Learning

1 Introduction

2 Penalized Empirical Risk Minimization

2.1 Bias–Variance Trade‐Off

2.2 First‐Order Optimization Methods

3 Linear Regression

3.1 Linear Regression and Ridge Regression

3.2 LASSO

4 Classification

4.1 Model‐Based Methods

4.2 Support Vector Machine (SVM)

4.3 Convex Surrogate Loss

4.3.1 Surrogate risk minimization

4.3.2 Large‐margin unified machines (LUMs)

4.4 Nonconvex Surrogate Loss

4.5 Multicategory Classification Problem

5 Extensions for Complex Data

5.1 Reproducing Kernel Hilbert Space (RKHS)

5.2 Large‐Scale Optimization

6 Discussion

References

11 Unsupervised and Semisupervised Learning

1 Introduction

2 Unsupervised Learning

2.1 Mixture‐Model‐Based Clustering

2.1.1 Gaussian mixture model

2.1.2 Clustering by mode association

2.1.3 Hidden Markov model on variable blocks

2.1.4 Variable selection

2.2 Clustering of Distributional Data

2.3 Uncertainty Analysis

3 Semisupervised Learning

3.1 Setting

3.2 Self‐Training

3.3 Generative Models

3.4 Graphical Models

3.5 Entropy Minimization

3.6 Consistency Regularization

3.7 Mixup

3.8 MixMatch

4 Conclusions

Acknowledgment

Notes

References

12 Random Forests

1 Introduction

2 Random Forest (RF)

2.1 RF Algorithm

2.2 RF Advantages and Limitations

3 Random Forest Extensions

3.1 Extremely Randomized Trees (ERT)

3.2 Acceptance‐Rejection Trees (ART)

3.3 Conditional Random Forest (CRF)

3.4 Miscellaneous

4 Random Forests of Interaction Trees (RFIT)

4.1 Modified Splitting Statistic

4.2 Standard Errors

4.3 Concomitant Outputs

4.4 Illustration of RFIT

5 Random Forest of Interaction Trees for Observational Studies

5.1 Propensity Score

5.2 Random Forest Adjusting for Propensity Score

5.3 Variable Importance

5.4 Simulation Study

6 Discussion

References

13 Network Analysis

1 Introduction

2 Gaussian Graphical Models for Mixed Partial Compositional Data. 2.1 A Statistical Framework for Mixed Partial Compositional Data

2.2 Estimation of Gaussian Graphical Models of Mixed Partial Compositional Data

3 Theoretical Properties

3.1 Assumptions

3.2 Rates of Convergence

4 Graphical Model Selection

5 Analysis of a Microbiome–Metabolomics Dataset

6 Discussion

References

14 Tensors in Modern Statistical Learning

1 Introduction

2 Background

2.1 Definitions and Notation

2.2 Tensor Operations

2.3 Tensor Decompositions

3 Tensor Supervised Learning

3.1 Tensor Predictor Regression. 3.1.1 Motivating examples

3.1.2 Low‐rank linear and generalized linear model

3.1.3 Large‐scale tensor regression via sketching

3.1.4 Nonparametric tensor regression

3.1.5 Future directions

3.2 Tensor Response Regression. 3.2.1 Motivating examples

3.2.2 Sparse low‐rank tensor response model

3.2.3 Additional tensor response regression models

3.2.4 Future directions

4 Tensor Unsupervised Learning

4.1 Tensor Clustering. 4.1.1 Motivating examples

4.1.2 Convex tensor co‐clustering

4.1.3 Tensor clustering via low‐rank decomposition

4.1.4 Additional tensor clustering approaches

4.1.5 Future directions

4.2 Tensor Graphical Model. 4.2.1 Motivating examples

4.2.2 Gaussian graphical model

4.2.3 Variation in the Kronecker structure

4.2.4 Future directions

5 Tensor Reinforcement Learning

5.1 Stochastic Low‐Rank Tensor Bandit. 5.1.1 Motivating examples

5.1.2 Low‐rank tensor bandit problem formulation

5.1.3 Rank‐1 bandit

5.1.4 General‐rank bandit

5.1.5 Future directions

5.2 Learning Markov Decision Process via Tensor Decomposition. 5.2.1 Motivating examples

5.2.2 Dimension reduction of Markov decision process

5.2.3 Maximum‐likelihood estimation and Tucker decomposition

5.2.4 Future directions

6 Tensor Deep Learning

6.1 Tensor‐Based Deep Neural Network Compression. 6.1.1 Motivating examples

6.1.2 Compression of convolutional layers of CNN

6.1.3 Compression of fully‐connected layers of CNN

6.1.4 Compression of all layers of CNN

6.1.5 Compression of RNN

6.1.6 Future directions

6.2 Deep Learning Theory through Tensor Methods. 6.2.1 Motivating examples

6.2.2 Expressive power, compressibility and generalizability

6.2.3 Additional connections

6.2.4 Future directions

Acknowledgments

References

15 Computational Approaches to Bayesian Additive Regression Trees

1 Introduction

2 Bayesian CART

2.1 A Single‐Tree Model

2.2 Tree Model Likelihood

2.3 Tree Model Prior

2.3.1

2.3.2

3 Tree MCMC

3.1 The BIRTH/DEATH Move

3.2 CHANGE Rule

3.3 SWAP Rule

3.4 Improved Tree Space Moves

3.4.1 Rotate

3.4.2 Perturb

3.4.3 The complex mixtures that are tree proposals

4 The BART Model

4.1 Specification of the BART Regularization Prior

5 BART Example: Boston Housing Values and Air Pollution

6 BART MCMC

7 BART Extensions

7.1 The DART Sparsity Prior

7.1.1 Grouped variables and the DART prior

7.2 XBART

7.2.1 The XBART algorithm and GrowFromRoot

Presorting predictor variables

Adaptive nested cut‐points

7.2.2 Warm‐start XBART

8 Conclusion

References

16 Penalized Regression

1 Introduction

2 Penalization for Smoothness

3 Penalization for Sparsity

4 Tuning Parameter Selection

References

17 Model Selection in High‐Dimensional Regression

1 Model Selection Problem

2 Model Selection in High‐Dimensional Linear Regression

2.1 Shrinkage Methods

2.2 Sure Screening Methods

2.3 Model Selection Theory

2.4 Tuning Parameter Selection

2.5 Numerical Computation

3 Interaction‐Effect Selection for High‐Dimensional Data. 3.1 Problem Setup

3.2 Joint Selection of Main Effects and Interactions

3.3 Two‐Stage Approach

3.4 Regularization Path Algorithm under Marginality Principle (RAMP)

4 Model Selection in High‐Dimensional Nonparametric Models

4.1 Model Selection Problem

Penalty on basis coefficients

Function soft‐thresholding methods

4.2 Penalty on Basis Coefficients

Polynomial basis example

Basis pursuit

Adaptive group LASSO

4.3 Component Selection and Smoothing Operator (COSSO)

4.4 Adaptive COSSO

Nonparametric oracle property

4.5 Sparse Additive Models (SpAM)

Persistent property

4.6 Sparsity‐Smoothness Penalty

4.7 Nonparametric Independence Screening (NIS)

5 Concluding Remarks

References

18 Sampling Local Scale Parameters in High-Dimensional Regression Models

1 Introduction

2 A Blocked Gibbs Sampler for the Horseshoe

2.1 Some Highlights for the Blocked Algorithm

3 Sampling

3.1 Sampling

3.2 Sampling

3.3 Sampling

4 Sampling. 4.1 The Slice Sampling Strategy

4.2 Direct Sampling

4.2.1 Inverse‐cdf sampler

5 Appendix: A. Newton–Raphson Steps for the Inverse‐cdf Sampler for

Acknowledgment

References

Note

19 Factor Modeling for High-Dimensional Time Series

1 Introduction

2 Identifiability

3 Estimation of High‐Dimensional Factor Model

3.1 Least‐Squares or Principal Component Estimation

3.2 Factor Loading Space Estimation

3.2.1 Improved Estimation of Factor Process

3.3 Frequency‐Domain Approach

3.4 Likelihood‐Based Estimation

3.4.1 Exact likelihood via Kalman filtering

3.4.2 Exact likelihood via matrix decomposition

3.4.3 Bai and Li's Quasi‐likelihood estimation

3.4.4 Breitung and Tenhofen's Quasi‐likelihood estimation

3.4.5 Frequency‐domain (Whittle) likelihood

4 Determining the Number of Factors

4.1 Information Criterion

4.2 Eigenvalues Difference/Ratio Estimators

4.3 Testing Approaches

4.4 Estimation of Dynamic Factors

Acknowledgment

References

20 Visual Communication of Data: It Is Not a Programming Problem, It Is Viewer Perception

1 Introduction. 1.1 Observation

1.2 Available Guidance

1.3 Our Message

2 Case Studies Part 1

2.1 Imogene: A Senior Data Analyst Who Becomes Too Interested in the Program

2.2 Regis: An Intern Who Wants to Get the Job Done Quickly

3 Let StAR Be Your Guide

4 Case Studies Part 2: Using StAR Principles to Develop Better Graphics

4.1 StAR Method: Imogene Thinks through and Investigates Changing Scales

4.2 StAR Method: Regis Thinks through and Discovers an Interesting Way to Depict Uncertainty

5 Ask Colleagues Their Opinion

6 Case Studies: Part 3. 6.1 Imogene Gets Advice on Using Dot Plots

6.2 Regis Gets Advice on Visualizing in the Presence of Multiple Tests

7 Iterate

8 Final Thoughts

Notes

References

21 Uncertainty Visualization

1 Introduction

1.1 Uncertainty Visualization Design Space

2 Uncertainty Visualization Theories

2.1 Frequency Framing

2.1.1 Icon arrays

2.1.2 Quantile dotplots

2.2 Attribute Substitution

2.2.1 Hypothetical outcome plots

2.3 Visual Boundaries = Cognitive Categories

2.3.1 Ensemble displays

2.3.2 Error bars

2.4 Visual Semiotics of Uncertainty

3 General Discussion

References

22 Big Data Visualization

1 Introduction

2 Architecture for Big Data Analytics

3 Filtering

3.1 Sampling

4 Aggregating

4.1 1D Continuous Aggregation

4.2 1D Categorical Aggregation

4.3 2D Aggregation

4.3.1 2D binning on the surface of a sphere

4.3.2 2D categorical versus continuous aggregation

4.3.3 2D categorical versus categorical aggregation

4.4 nD Aggregation

4.5 Two‐Way Aggregation

5 Analyzing

6 Big Data Graphics

6.1 Box Plots

6.2 Histograms

6.3 Scatterplot Matrices

6.4 Parallel Coordinates

7 Conclusion

References

23 Visualization‐Assisted Statistical Learning

1 Introduction

2 Better Visualizations with Seriation

3 Visualizing Machine Learning Fits. 3.1 Partial Dependence

3.2 FEV Dataset

3.3 Interactive Conditional Visualization

4 Condvis2 Case Studies. 4.1 Interactive Exploration of FEV Regression Models

4.2 Interactive Exploration of Pima Classification Models

4.3 Interactive Exploration of Models for Wages Repeated Measures Data

5 Discussion

References

24 Functional Data Visualization

1 Introduction

2 Univariate Functional Data Visualization. 2.1 Functional Boxplots

2.2 Surface Boxplots

3 Multivariate Functional Data Visualization

3.1 Magnitude–Shape Plots

3.2 Two‐Stage Functional Boxplots

3.3 Trajectory Functional Boxplots

4 Conclusions

Acknowledgment

References

25 Gradient‐Based Optimizers for Statistics and Machine Learning

1 Introduction

2 Convex Versus Nonconvex Optimization

3 Gradient Descent. 3.1 Basic Formulation

3.2 How to Find the Step Size?

3.3 Examples

4 Proximal Gradient Descent: Handling Nondifferentiable Regularization

5 Stochastic Gradient Descent

5.1 Basic Formulation

5.2 Challenges

References

26 Alternating Minimization Algorithms

1 Introduction

2 Coordinate Descent

3 EM as Alternating Minimization

3.1 Finite Mixture Models

3.2 Variational EM

4 Matrix Approximation Algorithms

4.1 k‐Means Clustering

4.2 Low‐Rank Matrix Factorization

4.3 Reduced Rank Regression

5 Conclusion

References

27 A Gentle Introduction to Alternating Direction Method of Multipliers (ADMM) for Statistical Problems

1 Introduction

2 Two Perfect Examples of ADMM

3 Variable Splitting and Linearized ADMM

4 Multiblock ADMM

5 Nonconvex Problems

6 Stopping Criteria

7 Convergence Results of ADMM

7.1 Convex Problems. 7.1.1 Convex case

7.1.2 Strongly convex case

7.1.3 Linearized ADMM

7.2 Nonconvex Problems

Acknowledgments

References

28 Nonconvex Optimization via MM Algorithms: Convergence Theory

1 Background

2 Convergence Theorems

2.1 Classical Convergence Theorem

2.2 Smooth Objective Functions

Proof

2.3 Nonsmooth Objective Functions

2.3.1 MM convergence for semialgebraic functions

2.4 A Proximal Trick to Prevent Cycling

3 Paracontraction

4 Bregman Majorization

4.1 Convergence Analysis via SUMMA

4.2 Examples. 4.2.1 Proximal gradient method

4.2.2 Mirror descent method

References

29 Massive Parallelization

1 Introduction

2 Gaussian Process Regression and Surrogate Modeling

2.1 GP Basics

2.2 Pushing the Envelope

3 Divide‐and‐Conquer GP Regression

3.1 Local Approximate Gaussian Processes

3.2 Massively Parallelized Global GP Approximation

3.3 Off‐Loading Subroutines to GPUs

4 Empirical Results

4.1 SARCOS

4.2 Supercomputer Cascade

5 Conclusion

Acknowledgments

Notes

References

30 Divide‐and‐Conquer Methods for Big Data Analysis

1 Introduction

2 Linear Regression Model

3 Parametric Models

3.1 Sparse High‐Dimensional Models

3.2 Marginal Proportional Hazards Model

3.3 One‐Step Estimator and Multiround Divide‐and‐Conquer

3.4 Performance in Nonstandard Problems

4 Nonparametric and Semiparametric Models

5 Online Sequential Updating

6 Splitting the Number of Covariates

7 Bayesian Divide‐and‐Conquer and Median‐Based Combining

8 Real‐World Applications

9 Discussion

Acknowledgment

References

31 Bayesian Aggregation

1 From Model Selection to Model Combination

1.1 The Bayesian Decision Framework for Model Assessment

1.2 Remodeling: M‐Closed, M‐Complete, and M‐Open Views

2 From Bayesian Model Averaging to Bayesian Stacking

2.1 M‐Closed: Bayesian Model Averaging

2.2 M‐Open: Stacking

2.2.1 Choice of utility

2.3 M‐Complete: Reference‐Model Stacking

2.4 The Connection between BMA and Stacking

2.5 Hierarchical Stacking

2.6 Other Related Methods and Generalizations

3 Asymptotic Theories of Stacking

3.1 Model Aggregation Is No Worse than Model Selection

3.2 Stacking Viewed as Pointwise Model Selection

3.3 Selection or Averaging?

4 Stacking in Practice

4.1 Practical Implementation Using Pareto Smoothed Importance Sampling

4.2 Stacking for Multilevel Data

4.3 Stacking for Time Series Data

4.4 The Choice of Model List

5 Discussion

References

32 Asynchronous Parallel Computing

1 Introduction

1.1 Synchronous and Asynchronous Parallel Computing

1.2 Not All Algorithms Can Benefit from Parallelization

1.3 Outline

1.4 Notation

2 Asynchronous Parallel Coordinate Update

2.1 Least Absolute Shrinkage and Selection Operator (LASSO)

2.2 Nonnegative Matrix Factorization

2.3 Kernel Support Vector Machine

2.4 Decentralized Algorithms

3 Asynchronous Parallel Stochastic Approaches

3.1 Hogwild!

3.2 Federated Learning

4 Doubly Stochastic Coordinate Optimization with Variance Reduction

5 Concluding Remarks

References

Index. a

b

c

d

e

f

g

h

i

j

k

l

m

n

o

p

q

r

s

t

u

v

w

x

y

z

Abbreviations and Acronyms

WILEY END USER LICENSE AGREEMENT

Excerpt from the Book

Edited by Walter W. Piegorsch, University of Arizona

Richard A. Levine, San Diego State University

.....

University Park, PA

Lexin Li

.....
