Multiblock Data Fusion in Statistics and Machine Learning - Tormod Næs
Contents
Cover
Foreword
Preface
Part I Introductory Concepts and Theory
1 Introduction
1.1 Scope of the Book
1.2 Potential Audience
1.3 Types of Data and Analyses
1.3.1 Supervised and Unsupervised Analyses
1.3.2 High-, Mid- and Low-level Fusion
1.3.3 Dimension Reduction
1.3.4 Indirect Versus Direct Data
1.3.5 Heterogeneous Fusion
1.4 Examples
1.4.1 Metabolomics
1.4.2 Genomics
1.4.3 Systems Biology
1.4.4 Chemistry
1.4.5 Sensory Science
1.5 Goals of Analyses
1.6 Some History
1.7 Fundamental Choices
1.8 Common and Distinct Components
1.9 Overview and Links
1.10 Notation and Terminology
1.11 Abbreviations
2 Basic Theory and Concepts
2.i General Introduction
2.1 Component Models
2.1.1 General Idea of Component Models
2.1.2 Principal Component Analysis
2.1.3 Sparse PCA
2.1.4 Principal Component Regression
2.1.5 Partial Least Squares
2.1.6 Sparse PLS
2.1.7 Principal Covariates Regression
2.1.8 Redundancy Analysis
2.1.9 Comparing PLS, PCovR and RDA
2.1.10 Generalised Canonical Correlation Analysis
2.1.11 Simultaneous Component Analysis
2.2 Properties of Data
2.2.1 Data Theory
2.2.2 Scale-types
2.3 Estimation Methods
2.3.1 Least-squares Estimation
2.3.2 Maximum-likelihood Estimation
2.3.3 Eigenvalue Decomposition-based Methods
2.3.4 Covariance or Correlation-based Estimation Methods
2.3.5 Sequential Versus Simultaneous Methods
2.3.6 Homogeneous Versus Heterogeneous Fusion
2.4 Within- and Between-block Variation
2.4.1 Definition and Example
2.4.2 MAXBET Solution
2.4.3 MAXNEAR Solution
2.4.4 PLS2 Solution
2.4.5 CCA Solution
2.4.6 Comparing the Solutions
2.4.7 PLS, RDA and CCA Revisited
2.5 Framework for Common and Distinct Components
2.6 Preprocessing
2.7 Validation
2.7.1 Outliers
2.7.1.1 Residuals
2.7.1.2 Leverage
2.7.2 Model Fit
2.7.3 Bias-variance Trade-off
2.7.4 Test Set Validation
2.7.5 Cross-validation
2.7.6 Permutation Testing
2.7.7 Jackknife and Bootstrap
2.7.8 Hyper-parameters and Penalties
2.8 Appendix
3 Structure of Multiblock Data
3.i General Introduction
3.1 Taxonomy
3.2 Skeleton of a Multiblock Data Set
3.2.1 Shared Sample Mode
3.2.2 Shared Variable Mode
3.2.3 Shared Variable or Sample Mode
3.2.4 Shared Variable and Sample Mode
3.3 Topology of a Multiblock Data Set
3.3.1 Unsupervised Analysis
3.3.2 Supervised Analysis
3.4 Linking Structures
3.4.1 Linking Structure for Unsupervised Analysis
3.4.2 Linking Structures for Supervised Analysis
3.5 Summary
4 Matrix Correlations
4.i General Introduction
4.1 Definition
4.2 Most Used Matrix Correlations
4.2.1 Inner Product Correlation
4.2.2 GCD coefficient
4.2.3 RV-coefficient
4.2.4 SMI-coefficient
4.3 Generic Framework of Matrix Correlations
4.4 Generalised Matrix Correlations
4.4.1 Generalised RV-coefficient
4.4.2 Generalised Association Coefficient
4.5 Partial Matrix Correlations
4.6 Conclusions and Recommendations
4.7 Open Issues
Part II Selected Methods for Unsupervised and Supervised Topologies
5 Unsupervised Methods
5.i General Introduction
5.ii Relations to the General Framework
5.1 Shared Variable Mode
5.1.1 Only Common Variation
5.1.1.1 Simultaneous Component Analysis
5.1.1.2 Clustering and SCA
5.1.1.3 Multigroup Data Analysis
5.1.2 Common, Local, and Distinct Variation
5.1.2.1 Distinct and Common Components
5.1.2.2 Multivariate Curve Resolution
5.2 Shared Sample Mode
5.2.1 Only Common Variation
5.2.1.1 SUM-PCA
5.2.1.2 Multiple Factor Analysis and STATIS
5.2.1.3 Generalised Canonical Analysis
5.2.1.4 Regularised Generalised Canonical Correlation Analysis
5.2.1.5 Exponential Family SCA
5.2.1.6 Optimal-scaling
5.2.2 Common, Local, and Distinct Variation
5.2.2.1 Joint and Individual Variation Explained
5.2.2.2 Distinct and Common Components
5.2.2.3 PCA-GCA
5.2.2.4 Advanced Coupled Matrix and Tensor Factorisation
5.2.2.5 Penalised-ESCA
5.2.2.6 Multivariate Curve Resolution
5.3 Generic Framework
5.3.1 Framework for Simultaneous Unsupervised Methods
5.3.1.1 Description of the Framework
5.3.1.2 Framework Applied to Simultaneous Unsupervised Data Analysis Methods
5.3.1.3 Framework of Common/Distinct Applied to Simultaneous Unsupervised Multiblock Data Analysis Methods
5.4 Conclusions and Recommendations
5.5 Open Issues
6 ASCA and Extensions
6.i General Introduction
6.ii Relations to the General Framework
6.1 ANOVA-Simultaneous Component Analysis
6.1.1 The ASCA Method
6.1.2 Validation of ASCA
6.1.2.1 Permutation Testing
6.1.2.2 Back-projection
6.1.2.3 Confidence Ellipsoids
6.1.3 The ASCA+ and LiMM-PCA Methods
6.2 Multilevel-SCA
6.3 Penalised-ASCA
6.4 Conclusions and Recommendations
6.5 Open Issues
7 Supervised Methods
7.i General Introduction
7.ii Relations to the General Framework
7.1 Multiblock Regression: General Perspectives
7.1.1 Model and Assumptions
7.1.2 Different Challenges and Aims
7.2 Multiblock PLS Regression
7.2.1 Standard Multiblock PLS Regression
7.2.2 MB-PLS Used for Classification
7.2.3 Sparse Multiblock PLS Regression (sMB-PLS)
7.3 The Family of SO-PLS Regression Methods (Sequential and Orthogonalised PLS Regression)
7.3.1 The SO-PLS Method
7.3.2 Order of Blocks
7.3.3 Interpretation Tools
7.3.4 Restricted PLS Components and their Application in SO-PLS
7.3.5 Validation and Component Selection
7.3.6 Relations to ANOVA
7.3.7 Extensions of SO-PLS to Handle Interactions Between Blocks
7.3.8 Further Applications of SO-PLS
7.3.9 Relations Between SO-PLS and ASCA
7.4 Parallel and Orthogonalised PLS (PO-PLS) Regression
7.5 Response Oriented Sequential Alternation
7.5.1 The ROSA Method
7.5.2 Validation
7.5.3 Interpretation
7.6 Conclusions and Recommendations
7.7 Open Issues
Part III Methods for Complex Multiblock Structures
8 Complex Block Structures; with Focus on L-Shape Relations
8.i General Introduction
8.ii Relations to the General Framework
8.1 Analysis of L-shape Data: General Perspectives
8.2 Sequential Procedures for L-shape Data Based on PLS/PCR and ANOVA
8.2.1 Interpretation of X1, Quantitative X2-data, Horizontal Axis First
8.2.2 Interpretation of X1, Categorical X2-data, Horizontal Axis First
8.2.3 Analysis of Segments/Clusters of X1 Data
8.3 The L-PLS Method for Joint Estimation of Blocks in L-shape Data
8.3.1 The Original L-PLS Method, Endo-L-PLS
8.3.2 Exo- Versus Endo-L-PLS
8.4 Modifications of the Original L-PLS Idea
8.4.1 Weighting Information from X3 and X1 in L-PLS Using a Parameter α
8.4.2 Three-blocks Bifocal PLS
8.5 Alternative L-shape Data Analysis Methods
8.5.1 Principal Component Analysis with External Information
8.5.2 A Simple PCA Based Procedure for Using Unlabelled Data in Calibration
8.5.3 Multivariate Curve Resolution for Incomplete Data
8.5.4 An Alternative Approach in Consumer Science Based on Correlations Between X3 and X1
8.6 Domino PLS and More Complex Data Structures
8.7 Conclusions and Recommendations
8.8 Open Issues
Part IV Alternative Methods for Unsupervised and Supervised Topologies
9 Alternative Unsupervised Methods
9.i General Introduction
9.ii Relationship to the General Framework
9.1 Shared Variable Mode
9.2 Shared Sample Mode
9.2.1 Only Common Variation
9.2.1.1 DIABLO
9.2.1.2 Generalised Coupled Tensor Factorisation
9.2.1.3 Representation Matrices
9.2.1.4 Extended PCA
9.2.2 Common, Local, and Distinct Variation
9.2.2.1 Generalised SVD
9.2.2.2 Structural Learning and Integrative Decomposition
9.2.2.3 Bayesian Inter-battery Factor Analysis
9.2.2.4 Group Factor Analysis
9.2.2.5 OnPLS
9.2.2.6 Generalised Association Study
9.2.2.7 Multi-Omics Factor Analysis
9.3 Two Shared Modes and Only Common Variation
9.3.1 Generalised Procrustes Analysis
9.3.2 Three-way Methods
9.4 Conclusions and Recommendations
9.4.1 Open Issues
10 Alternative Supervised Methods
10.i General Introduction
10.ii Relations to the General Framework
10.1 Model and Focus
10.2 Extension of PCovR
10.2.1 Sparse Multiblock Principal Covariates Regression, Sparse PCovR
10.2.2 Multiway Multiblock Covariates Regression
10.3 Multiblock Redundancy Analysis
10.3.1 Standard Multiblock Redundancy Analysis
10.3.2 Sparse Multiblock Redundancy Analysis
10.4 Miscellaneous Multiblock Regression Methods
10.4.1 Multiblock Variance Partitioning
10.4.2 Network Induced Supervised Learning
10.4.3 Common Dimensions for Multiblock Regression
10.5 Modifications and Extensions of the SO-PLS Method
10.5.1 Extensions of SO-PLS to Three-Way Data
10.5.2 Variable Selection for SO-PLS
10.5.3 More Complicated Error Structure for SO-PLS
10.5.4 SO-PLS Used for Path Modelling
10.6 Methods for Data Sets Split Along the Sample Mode, Multigroup Methods
10.6.1 Multigroup PLS Regression
10.6.2 Clustering of Observations in Multiblock Regression
10.6.3 Domain-Invariant PLS, DI-PLS
10.7 Conclusions and Recommendations
10.8 Open Issues
Part V Software
11 Algorithms and Software
11.1 Multiblock Software
11.2 R package multiblock
11.3 Installing and Starting the Package
11.4 Data Handling
11.4.1 Read From File
11.4.2 Data Pre-processing
11.4.3 Re-coding Categorical Data
11.4.4 Data Structures for Multiblock Analysis
11.4.4.1 Create List of Blocks
11.4.4.2 Create data.frame of Blocks
11.5 Basic Methods
11.5.1 Prepare Data
11.5.2 Modelling
11.5.3 Common Output Elements Across Methods
11.5.4 Scores and Loadings
11.6 Unsupervised Methods
11.6.1 Formatting Data for Unsupervised Data Analysis
11.6.2 Method Interfaces
11.6.3 Shared Sample Mode Analyses
11.6.4 Shared Variable Mode
11.6.5 Common Output Elements Across Methods
11.6.6 Scores and Loadings
11.6.7 Plot From Imported Package
11.7 ANOVA Simultaneous Component Analysis
11.7.1 Formula Interface
11.7.2 Simulated Data
11.7.3 ASCA Modelling
11.7.4 ASCA Scores
11.7.5 ASCA Loadings
11.8 Supervised Methods
11.8.1 Formatting Data for Supervised Analyses
11.8.2 Multiblock Partial Least Squares
11.8.2.1 MB-PLS Modelling
11.8.2.2 MB-PLS Summaries and Plotting
11.8.3 Sparse Multiblock Partial Least Squares
11.8.3.1 Sparse MB-PLS Modelling
11.8.3.2 Sparse MB-PLS Plotting
11.8.4 Sequential and Orthogonalised Partial Least Squares
11.8.4.1 SO-PLS Modelling
11.8.4.2 Måge Plot
11.8.4.3 SO-PLS Loadings
11.8.4.4 SO-PLS Scores
11.8.4.5 SO-PLS Prediction
11.8.4.6 SO-PLS Validation
11.8.4.7 Principal Components of Predictions
11.8.4.8 CVANOVA
11.8.5 Parallel and Orthogonalised Partial Least Squares
11.8.5.1 PO-PLS Modelling
11.8.5.2 PO-PLS Scores and Loadings
11.8.6 Response Optimal Sequential Alternation
11.8.6.1 ROSA Modelling
11.8.6.2 ROSA Loadings
11.8.6.3 ROSA Scores
11.8.6.4 ROSA Prediction
11.8.6.5 ROSA Validation
11.8.6.6 ROSA Image Plots
11.8.7 Multiblock Redundancy Analysis
11.8.7.1 MB-RDA Modelling
11.8.7.2 MB-RDA Loadings and Scores
11.9 Complex Data Structures
11.9.1 L-PLS
11.9.1.1 Simulated L-shaped Data
11.9.1.2 Exo-L-PLS
11.9.1.3 Endo-L-PLS
11.9.1.4 L-PLS Cross-validation
11.9.2 SO-PLS-PM
11.9.2.1 Single SO-PLS-PM Model
11.9.2.2 Multiple Paths in an SO-PLS-PM Model
11.10 Software Packages
11.10.1 R Packages
11.10.2 MATLAB Toolboxes
11.10.3 Python
11.10.4 Commercial Software
References
Index