Читать книгу Multiblock Data Fusion in Statistics and Machine Learning - Tormod Næs - Страница 14

1 Introduction 1.1 Scope of the Book

Оглавление

In many areas of the natural and life sciences, data sets are collected consisting of multiple blocks of data measured on the same or similar systems. Examples are abundant, e.g., in genomics it is becoming increasingly common to measure gene-expression, protein abundances and metabolite levels on the same biological system (Clish et al., 2004; Heijne et al., 2005; Kleemann et al., 2007; Curtis et al., 2012; Brink-Jensen et al., 2013; Franzosa et al., 2015). In sensory science, the interest is often in relations between the chemical and sensory properties of the samples involved as well as consumer liking of the same samples (Næs et al., 2010). In chemistry, sometimes different types of instruments are utilised to characterise different properties the same set of samples (de Juan and Tauler, 2006). In cohort studies, it is increasingly popular to perform the same type of measurements in different cohorts to confirm results and perform meta-analyses. In (bio-)chemical process industry, plant-wide measurements are available collected by several sensors in the plant (Lopes et al., 2002). Clinical trials are often supported by auxiliary measurements such as gene-expression and cytokines to characterise immune responses (Coccia et al., 2018). Challenge tests to establish the health status of individuals usually contain multiple types of data collected for the same individuals as a function of time (Wopereis et al., 2009; Pellis et al., 2012; Kardinaal et al., 2015). All these examples show that simple data sets are increasingly becoming less common.

Unfortunately, there is no consensus yet about terminology regarding the structure of such data sets and the related research questions. In bioinformatics, the terms data fusion or data integration are often used where the latter distinguishes also N- or P-integration (N means the same samples and P means the same variables), horizontal and vertical integration. In psychometrics, the terms multiset and multigroup data analysis are used; in chemometrics, multiblock data analysis is in use and in the computational sciences and machine learning the term multiview or multitable data analysis is used. We will encounter all these terms in this book but we will use the noun multiblock as much as possible.1

In Elaboration 1.1 we define the terms concerning data sets we will use throughout in this book. Sometimes, we will sidestep this to some extent to make connections between fields. At those places we will clarify exactly what we mean.

ELABORATION 1.1

Multiblock Data Fusion in Statistics and Machine Learning

Подняться наверх