Computational Statistics in Data Science
Table of Contents
List of Tables
List of Illustrations
Guide
Pages
Computational Statistics in Data Science
List of Contributors
Preface
Reference
1 Computational Statistics and Data Science in the Twenty‐First Century
1 Introduction
2 Core Challenges 1–3
2.1 Big N
2.2 Big P
2.3 Big M
3 Model‐Specific Advances
3.1 Bayesian Sparse Regression in the Age of Big N and Big P
3.1.1 Continuous shrinkage: alleviating big M
3.1.2 Conjugate gradient sampler for structured high‐dimensional Gaussians
3.2 Phylogenetic Reconstruction
4 Core Challenges 4 and 5
4.1 Fast, Flexible, and Friendly Statistical Algo‐Ware
4.2 Hardware‐Optimized Inference
5 Rise of Data Science
Acknowledgments
Notes
References
2 Statistical Software
1 User Development Environments
1.1 Extensible Text Editors: Emacs and Vim
1.2 Jupyter Notebooks
1.3 RStudio and Rmarkdown
2 Popular Statistical Software
2.1 R
2.1.1 Why use R over Python or Minitab?
2.1.2 Where can users find R support?
2.1.3 How easy is R to develop?
2.1.4 What is the downside of R?
2.1.5 Summary of R
2.2 Python
2.3 SAS®
2.4 SPSS®
3 Noteworthy Statistical Software and Related Tools
3.1 BUGS/JAGS
3.2 C++
3.3 Microsoft Excel/Spreadsheets
3.4 Git
3.5 Java
3.6 JavaScript, TypeScript
3.7 Maple
3.8 MATLAB, GNU Octave
3.9 Minitab®
3.10 Workload Managers: SLURM/LSF
3.11 SQL
3.12 Stata®
3.13 Tableau®
4 Promising and Emerging Statistical Software
4.1 Edward, Pyro, NumPyro, and PyMC3
4.2 Julia
4.3 NIMBLE
4.4 Scala
4.5 Stan
5 The Future of Statistical Computing
6 Concluding Remarks
Acknowledgments
References
Further Reading
3 An Introduction to Deep Learning Methods
1 Introduction
2 Machine Learning: An Overview
2.1 Introduction
2.2 Supervised Learning
2.3 Gradient Descent
3 Feedforward Neural Networks
3.1 Introduction
3.2 Model Description
3.3 Training an MLP
4 Convolutional Neural Networks
4.1 Introduction
4.2 Convolutional Layer
4.3 LeNet‐5
5 Autoencoders
5.1 Introduction
5.2 Objective Function
5.3 Variational Autoencoder
6 Recurrent Neural Networks
6.1 Introduction
6.2 Architecture
6.3 Long Short‐Term Memory Networks
7 Conclusion
References
4 Streaming Data and Data Streams
1 Introduction
2 Data Stream Computing
3 Issues in Data Stream Mining
3.1 Scalability
3.2 Integration
3.3 Fault‐Tolerance
3.4 Timeliness
3.5 Consistency
3.6 Heterogeneity and Incompleteness
3.7 Load Balancing
3.8 High Throughput
3.9 Privacy
3.10 Accuracy
4 Streaming Data Tools and Technologies
5 Streaming Data Pre‐Processing: Concept and Implementation
6 Streaming Data Algorithms
6.1 Unsupervised Learning
6.2 Semi‐Supervised Learning
6.3 Supervised Learning
6.4 Ontology‐Based Methods
7 Strategies for Processing Data Streams
8 Best Practices for Managing Data Streams
9 Conclusion and the Way Forward
References
5 Monte Carlo Simulation: Are We There Yet?
1 Introduction
2 Estimation
2.1 Expectations
2.2 Quantiles
2.3 Other Estimators
3 Sampling Distribution
3.1 Means
Theorem 1
3.2 Quantiles
Theorem 2
3.3 Other Estimators
3.4 Confidence Regions for Means
4 Estimating
5 Stopping Rules
5.1 IID Monte Carlo
5.2 MCMC
6 Workflow
7 Examples
7.1 Action Figure Collector Problem
7.2 Estimating Risk for Empirical Bayes
7.3 Bayesian Nonlinear Regression
Note
References
6 Sequential Monte Carlo: Particle Filters and Beyond
1 Introduction
2 Sequential Importance Sampling and Resampling
2.1 Extended State Spaces and SMC Samplers
2.2 Particle MCMC and Related Methods
3 SMC in Statistical Contexts
3.1 SMC for Hidden Markov Models
3.1.1 Filtering
3.1.2 Smoothing
3.1.3 Parameter estimation
3.2 SMC for Bayesian Inference
3.2.1 SMC for model comparison
3.2.2 SMC for ABC
3.3 SMC for Maximum‐Likelihood Estimation
3.4 SMC for Rare Event Estimation
4 Selected Recent Developments
Acknowledgments
Note
References
7 Markov Chain Monte Carlo Methods, A Survey with Some Frequent Misunderstandings
1 Introduction
2 Monte Carlo Methods
3 Markov Chain Monte Carlo Methods
3.1 Metropolis–Hastings Algorithms
3.2 Gibbs Sampling
3.3 Hamiltonian Monte Carlo
4 Approximate Bayesian Computation
5 Further Reading
Abbreviations and Acronyms
Notes
References
Note
8 Bayesian Inference with Adaptive Markov Chain Monte Carlo
1 Introduction
2 Random‐Walk Metropolis Algorithm
3 Adaptation of Random‐Walk Metropolis
3.1 Adaptive Metropolis (AM)
3.2 Adaptive Scaling Metropolis (ASM)
3.3 Robust Adaptive Metropolis (RAM)
3.4 Rationale behind the Adaptations
3.5 Summary and Discussion on the Methods
4 Multimodal Targets with Parallel Tempering
5 Dynamic Models with Particle Filters
6 Discussion
Acknowledgments
Notes
References
9 Advances in Importance Sampling
1 Introduction and Problem Statement
1.1 Standard Monte Carlo Integration
2 Importance Sampling
2.1 Origins
2.2 Basics
2.3 Theoretical Analysis
2.4 Diagnostics
2.5 Other IS Schemes
2.5.1 Transformation of the importance weights
2.5.2 Particle filtering (sequential Monte Carlo)
3 Multiple Importance Sampling (MIS)
3.1 Generalized MIS
3.1.1 MIS with different number of samples per proposal
3.2 Rare Event Estimation
3.3 Compressed and Distributed IS
4 Adaptive Importance Sampling (AIS)
Acknowledgments
Notes
References
10 Supervised Learning
1 Introduction
2 Penalized Empirical Risk Minimization
2.1 Bias–Variance Trade‐Off
2.2 First‐Order Optimization Methods
3 Linear Regression
3.1 Linear Regression and Ridge Regression
3.2 LASSO
4 Classification
4.1 Model‐Based Methods
4.2 Support Vector Machine (SVM)
4.3 Convex Surrogate Loss
4.3.1 Surrogate risk minimization
4.3.2 Large‐margin unified machines (LUMs)
4.4 Nonconvex Surrogate Loss
4.5 Multicategory Classification Problem
5 Extensions for Complex Data
5.1 Reproducing Kernel Hilbert Space (RKHS)
5.2 Large‐Scale Optimization
6 Discussion
References
11 Unsupervised and Semisupervised Learning
1 Introduction
2 Unsupervised Learning
2.1 Mixture‐Model‐Based Clustering
2.1.1 Gaussian mixture model
2.1.2 Clustering by mode association
2.1.3 Hidden Markov model on variable blocks
2.1.4 Variable selection
2.2 Clustering of Distributional Data
2.3 Uncertainty Analysis
3 Semisupervised Learning
3.1 Setting
3.2 Self‐Training
3.3 Generative Models
3.4 Graphical Models
3.5 Entropy Minimization
3.6 Consistency Regularization
3.7 Mixup
3.8 MixMatch
4 Conclusions
Acknowledgment
Notes
References
12 Random Forests
1 Introduction
2 Random Forest (RF)
2.1 RF Algorithm
2.2 RF Advantages and Limitations
3 Random Forest Extensions
3.1 Extremely Randomized Trees (ERT)
3.2 Acceptance‐Rejection Trees (ART)
3.3 Conditional Random Forest (CRF)
3.4 Miscellaneous
4 Random Forests of Interaction Trees (RFIT)
4.1 Modified Splitting Statistic
4.2 Standard Errors
4.3 Concomitant Outputs
4.4 Illustration of RFIT
5 Random Forest of Interaction Trees for Observational Studies
5.1 Propensity Score
5.2 Random Forest Adjusting for Propensity Score
5.3 Variable Importance
5.4 Simulation Study
6 Discussion
References
13 Network Analysis
1 Introduction
2 Gaussian Graphical Models for Mixed Partial Compositional Data
2.1 A Statistical Framework for Mixed Partial Compositional Data
2.2 Estimation of Gaussian Graphical Models of Mixed Partial Compositional Data
3 Theoretical Properties
3.1 Assumptions
3.2 Rates of Convergence
4 Graphical Model Selection
5 Analysis of a Microbiome–Metabolomics Dataset
6 Discussion
References
14 Tensors in Modern Statistical Learning
1 Introduction
2 Background
2.1 Definitions and Notation
2.2 Tensor Operations
2.3 Tensor Decompositions
3 Tensor Supervised Learning
3.1 Tensor Predictor Regression
3.1.1 Motivating examples
3.1.2 Low‐rank linear and generalized linear model
3.1.3 Large‐scale tensor regression via sketching
3.1.4 Nonparametric tensor regression
3.1.5 Future directions
3.2 Tensor Response Regression
3.2.1 Motivating examples
3.2.2 Sparse low‐rank tensor response model
3.2.3 Additional tensor response regression models
3.2.4 Future directions
4 Tensor Unsupervised Learning
4.1 Tensor Clustering
4.1.1 Motivating examples
4.1.2 Convex tensor co‐clustering
4.1.3 Tensor clustering via low‐rank decomposition
4.1.4 Additional tensor clustering approaches
4.1.5 Future directions
4.2 Tensor Graphical Model
4.2.1 Motivating examples
4.2.2 Gaussian graphical model
4.2.3 Variation in the Kronecker structure
4.2.4 Future directions
5 Tensor Reinforcement Learning
5.1 Stochastic Low‐Rank Tensor Bandit
5.1.1 Motivating examples
5.1.2 Low‐rank tensor bandit problem formulation
5.1.3 Rank‐1 bandit
5.1.4 General‐rank bandit
5.1.5 Future directions
5.2 Learning Markov Decision Process via Tensor Decomposition
5.2.1 Motivating examples
5.2.2 Dimension reduction of Markov decision process
5.2.3 Maximum‐likelihood estimation and Tucker decomposition
5.2.4 Future directions
6 Tensor Deep Learning
6.1 Tensor‐Based Deep Neural Network Compression
6.1.1 Motivating examples
6.1.2 Compression of convolutional layers of CNN
6.1.3 Compression of fully‐connected layers of CNN
6.1.4 Compression of all layers of CNN
6.1.5 Compression of RNN
6.1.6 Future directions
6.2 Deep Learning Theory through Tensor Methods
6.2.1 Motivating examples
6.2.2 Expressive power, compressibility and generalizability
6.2.3 Additional connections
6.2.4 Future directions
Acknowledgments
References
15 Computational Approaches to Bayesian Additive Regression Trees
1 Introduction
2 Bayesian CART
2.1 A Single‐Tree Model
2.2 Tree Model Likelihood
2.3 Tree Model Prior
2.3.1
2.3.2
3 Tree MCMC
3.1 The BIRTH/DEATH Move
3.2 CHANGE Rule
3.3 SWAP Rule
3.4 Improved Tree Space Moves
3.4.1 Rotate
3.4.2 Perturb
3.4.3 The complex mixtures that are tree proposals
4 The BART Model
4.1 Specification of the BART Regularization Prior
5 BART Example: Boston Housing Values and Air Pollution
6 BART MCMC
7 BART Extensions
7.1 The DART Sparsity Prior
7.1.1 Grouped variables and the DART prior
7.2 XBART
7.2.1 The XBART algorithm and GrowFromRoot
Presorting predictor variables
Adaptive nested cut‐points
7.2.2 Warm‐start XBART
8 Conclusion
References
16 Penalized Regression
1 Introduction
2 Penalization for Smoothness
3 Penalization for Sparsity
4 Tuning Parameter Selection
References
17 Model Selection in High‐Dimensional Regression
1 Model Selection Problem
2 Model Selection in High‐Dimensional Linear Regression
2.1 Shrinkage Methods
2.2 Sure Screening Methods
2.3 Model Selection Theory
2.4 Tuning Parameter Selection
2.5 Numerical Computation
3 Interaction‐Effect Selection for High‐Dimensional Data
3.1 Problem Setup
3.2 Joint Selection of Main Effects and Interactions
3.3 Two‐Stage Approach
3.4 Regularization Path Algorithm under Marginality Principle (RAMP)
4 Model Selection in High‐Dimensional Nonparametric Models
4.1 Model Selection Problem
Penalty on basis coefficients
Function soft‐thresholding methods
4.2 Penalty on Basis Coefficients
Polynomial basis example
Basis pursuit
Adaptive group LASSO
4.3 Component Selection and Smoothing Operator (COSSO)
4.4 Adaptive COSSO
Nonparametric oracle property
4.5 Sparse Additive Models (SpAM)
Persistent property
4.6 Sparsity‐Smoothness Penalty
4.7 Nonparametric Independence Screening (NIS)
5 Concluding Remarks
References
18 Sampling Local Scale Parameters in High-Dimensional Regression Models
1 Introduction
2 A Blocked Gibbs Sampler for the Horseshoe
2.1 Some Highlights for the Blocked Algorithm
3 Sampling
3.1 Sampling
3.2 Sampling
3.3 Sampling
4 Sampling
4.1 The Slice Sampling Strategy
4.2 Direct Sampling
4.2.1 Inverse‐cdf sampler
5 Appendix: A. Newton–Raphson Steps for the Inverse‐cdf Sampler for
Acknowledgment
References
Note
19 Factor Modeling for High-Dimensional Time Series
1 Introduction
2 Identifiability
3 Estimation of High‐Dimensional Factor Model
3.1 Least‐Squares or Principal Component Estimation
3.2 Factor Loading Space Estimation
3.2.1 Improved Estimation of Factor Process
3.3 Frequency‐Domain Approach
3.4 Likelihood‐Based Estimation
3.4.1 Exact likelihood via Kalman filtering
3.4.2 Exact likelihood via matrix decomposition
3.4.3 Bai and Li's Quasi‐likelihood estimation
3.4.4 Breitung and Tenhofen's Quasi‐likelihood estimation
3.4.5 Frequency‐domain (Whittle) likelihood
4 Determining the Number of Factors
4.1 Information Criterion
4.2 Eigenvalues Difference/Ratio Estimators
4.3 Testing Approaches
4.4 Estimation of Dynamic Factors
Acknowledgment
References
20 Visual Communication of Data: It Is Not a Programming Problem, It Is Viewer Perception
1 Introduction
1.1 Observation
1.2 Available Guidance
1.3 Our Message
2 Case Studies Part 1
2.1 Imogene: A Senior Data Analyst Who Becomes Too Interested in the Program
2.2 Regis: An Intern Who Wants to Get the Job Done Quickly
3 Let StAR Be Your Guide
4 Case Studies Part 2: Using StAR Principles to Develop Better Graphics
4.1 StAR Method: Imogene Thinks through and Investigates Changing Scales
4.2 StAR Method: Regis Thinks through and Discovers an Interesting Way to Depict Uncertainty
5 Ask Colleagues Their Opinion
6 Case Studies: Part 3
6.1 Imogene Gets Advice on Using Dot Plots
6.2 Regis Gets Advice on Visualizing in the Presence of Multiple Tests
7 Iterate
8 Final Thoughts
Notes
References
21 Uncertainty Visualization
1 Introduction
1.1 Uncertainty Visualization Design Space
2 Uncertainty Visualization Theories
2.1 Frequency Framing
2.1.1 Icon arrays
2.1.2 Quantile dotplots
2.2 Attribute Substitution
2.2.1 Hypothetical outcome plots
2.3 Visual Boundaries = Cognitive Categories
2.3.1 Ensemble displays
2.3.2 Error bars
2.4 Visual Semiotics of Uncertainty
3 General Discussion
References
22 Big Data Visualization
1 Introduction
2 Architecture for Big Data Analytics
3 Filtering
3.1 Sampling
4 Aggregating
4.1 1D Continuous Aggregation
4.2 1D Categorical Aggregation
4.3 2D Aggregation
4.3.1 2D binning on the surface of a sphere
4.3.2 2D categorical versus continuous aggregation
4.3.3 2D categorical versus categorical aggregation
4.4 nD Aggregation
4.5 Two‐Way Aggregation
5 Analyzing
6 Big Data Graphics
6.1 Box Plots
6.2 Histograms
6.3 Scatterplot Matrices
6.4 Parallel Coordinates
7 Conclusion
References
23 Visualization‐Assisted Statistical Learning
1 Introduction
2 Better Visualizations with Seriation
3 Visualizing Machine Learning Fits
3.1 Partial Dependence
3.2 FEV Dataset
3.3 Interactive Conditional Visualization
4 Condvis2 Case Studies
4.1 Interactive Exploration of FEV Regression Models
4.2 Interactive Exploration of Pima Classification Models
4.3 Interactive Exploration of Models for Wages Repeated Measures Data
5 Discussion
References
24 Functional Data Visualization
1 Introduction
2 Univariate Functional Data Visualization
2.1 Functional Boxplots
2.2 Surface Boxplots
3 Multivariate Functional Data Visualization
3.1 Magnitude–Shape Plots
3.2 Two‐Stage Functional Boxplots
3.3 Trajectory Functional Boxplots
4 Conclusions
Acknowledgment
References
25 Gradient‐Based Optimizers for Statistics and Machine Learning
1 Introduction
2 Convex Versus Nonconvex Optimization
3 Gradient Descent
3.1 Basic Formulation
3.2 How to Find the Step Size?
3.3 Examples
4 Proximal Gradient Descent: Handling Nondifferentiable Regularization
5 Stochastic Gradient Descent
5.1 Basic Formulation
5.2 Challenges
References
26 Alternating Minimization Algorithms
1 Introduction
2 Coordinate Descent
3 EM as Alternating Minimization
3.1 Finite Mixture Models
3.2 Variational EM
4 Matrix Approximation Algorithms
4.1 k‐Means Clustering
4.2 Low‐Rank Matrix Factorization
4.3 Reduced Rank Regression
5 Conclusion
References
27 A Gentle Introduction to Alternating Direction Method of Multipliers (ADMM) for Statistical Problems
1 Introduction
2 Two Perfect Examples of ADMM
3 Variable Splitting and Linearized ADMM
4 Multiblock ADMM
5 Nonconvex Problems
6 Stopping Criteria
7 Convergence Results of ADMM
7.1 Convex Problems
7.1.1 Convex case
7.1.2 Strongly convex case
7.1.3 Linearized ADMM
7.2 Nonconvex Problems
Acknowledgments
References
28 Nonconvex Optimization via MM Algorithms: Convergence Theory
1 Background
2 Convergence Theorems
2.1 Classical Convergence Theorem
2.2 Smooth Objective Functions
Proof
2.3 Nonsmooth Objective Functions
2.3.1 MM convergence for semialgebraic functions
2.4 A Proximal Trick to Prevent Cycling
3 Paracontraction
4 Bregman Majorization
4.1 Convergence Analysis via SUMMA
4.2 Examples
4.2.1 Proximal gradient method
4.2.2 Mirror descent method
References
29 Massive Parallelization
1 Introduction
2 Gaussian Process Regression and Surrogate Modeling
2.1 GP Basics
2.2 Pushing the Envelope
3 Divide‐and‐Conquer GP Regression
3.1 Local Approximate Gaussian Processes
3.2 Massively Parallelized Global GP Approximation
3.3 Off‐Loading Subroutines to GPUs
4 Empirical Results
4.1 SARCOS
4.2 Supercomputer Cascade
5 Conclusion
Acknowledgments
Notes
References
30 Divide‐and‐Conquer Methods for Big Data Analysis
1 Introduction
2 Linear Regression Model
3 Parametric Models
3.1 Sparse High‐Dimensional Models
3.2 Marginal Proportional Hazards Model
3.3 One‐Step Estimator and Multiround Divide‐and‐Conquer
3.4 Performance in Nonstandard Problems
4 Nonparametric and Semiparametric Models
5 Online Sequential Updating
6 Splitting the Number of Covariates
7 Bayesian Divide‐and‐Conquer and Median‐Based Combining
8 Real‐World Applications
9 Discussion
Acknowledgment
References
31 Bayesian Aggregation
1 From Model Selection to Model Combination
1.1 The Bayesian Decision Framework for Model Assessment
1.2 Remodeling: ℳ‐Closed, ℳ‐Complete, and ℳ‐Open Views
2 From Bayesian Model Averaging to Bayesian Stacking
2.1 ℳ‐Closed: Bayesian Model Averaging
2.2 ℳ‐Open: Stacking
2.2.1 Choice of utility
2.3 ℳ‐Complete: Reference‐Model Stacking
2.4 The Connection between BMA and Stacking
2.5 Hierarchical Stacking
2.6 Other Related Methods and Generalizations
3 Asymptotic Theories of Stacking
3.1 Model Aggregation Is No Worse than Model Selection
3.2 Stacking Viewed as Pointwise Model Selection
3.3 Selection or Averaging?
4 Stacking in Practice
4.1 Practical Implementation Using Pareto Smoothed Importance Sampling
4.2 Stacking for Multilevel Data
4.3 Stacking for Time Series Data
4.4 The Choice of Model List
5 Discussion
References
32 Asynchronous Parallel Computing
1 Introduction
1.1 Synchronous and Asynchronous Parallel Computing
1.2 Not All Algorithms Can Benefit from Parallelization
1.3 Outline
1.4 Notation
2 Asynchronous Parallel Coordinate Update
2.1 Least Absolute Shrinkage and Selection Operator (LASSO)
2.2 Nonnegative Matrix Factorization
2.3 Kernel Support Vector Machine
2.4 Decentralized Algorithms
3 Asynchronous Parallel Stochastic Approaches
3.1 Hogwild!
3.2 Federated Learning
4 Doubly Stochastic Coordinate Optimization with Variance Reduction
5 Concluding Remarks
References
Index
a
b
c
d
e
f
g
h
i
j
k
l
m
n
o
p
q
r
s
t
u
v
w
x
y
z
Abbreviations and Acronyms
WILEY END USER LICENSE AGREEMENT
Excerpt from the Book
Edited by Walter W. Piegorsch, University of Arizona
Richard A. Levine, San Diego State University