Data Science in Theory and Practice - Maria Cristina Mariani

3.2 Multivariate Analysis: Overview

We begin with the formal definition of multivariate analysis.

Definition 3.1 (Multivariate analysis) Multivariate analysis consists of a collection of techniques that can be used when several measurements are made on each experimental unit.

These measurements (i.e. data) must frequently be arranged and displayed in various ways. We now discuss the concepts underlying the first steps of data organization.

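Multivariate data arise whenever an investigator, practitioner, or researcher seeks to study some physical phenomenon and selects a number of variables to record. We will use the notation $x_{jk}$ to indicate the particular value of the $k$th variable that is observed on the $j$th unit (i.e. subject $j$). Hence, $n$ measurements on $p$ variables can be displayed as a rectangular array called the data matrix $\mathbf{X}$, of $n$ rows and $p$ columns:

\[
\mathbf{X} =
\begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1k} & \cdots & x_{1p} \\
x_{21} & x_{22} & \cdots & x_{2k} & \cdots & x_{2p} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
x_{j1} & x_{j2} & \cdots & x_{jk} & \cdots & x_{jp} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{nk} & \cdots & x_{np}
\end{pmatrix}
\]
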
This rectangular array contains every observation on every variable.
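As a rough illustration of this layout, the sketch below stores a data matrix as a list of rows, one row per unit and one column per variable. The helper names (`shape`, `entry`, `column`) and the numbers are hypothetical and chosen only for illustration; the 1-based indices are an assumption made to mirror the row and column convention described above.

```python
# Sketch: a data matrix stored as a list of rows (one row per unit).
# All values below are placeholders, not data from the text.

def shape(X):
    """Return (n, p): the number of units (rows) and variables (columns)."""
    n = len(X)
    p = len(X[0]) if n else 0
    # A data matrix is rectangular: every unit has exactly p measurements.
    if any(len(row) != p for row in X):
        raise ValueError("every row must have the same number of columns")
    return n, p

def entry(X, j, k):
    """Value of variable k observed on unit j (1-based indices)."""
    return X[j - 1][k - 1]

def column(X, k):
    """All n observations of variable k (1-based index)."""
    return [row[k - 1] for row in X]

X = [
    [1.0, 10.0, 100.0],
    [2.0, 20.0, 200.0],
    [3.0, 30.0, 300.0],
    [4.0, 40.0, 400.0],
]
n, p = shape(X)  # four units, three variables
```

Reading a single row gives all measurements on one unit, while `column(X, k)` collects one variable across all units, which is the view needed for per-variable summaries such as means.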

Example 3.1 (A data array) A selection of three receipts from Best Buy was obtained in order to investigate the nature of movie sales. Each receipt provided, among other things, the number of movies sold and the total amount of each sale. Let the first variable be the total dollar sales and the second variable be the number of movies sold. Then we can take the corresponding numbers on the receipts as three measurements on two variables. From the above description, we obtain the tabular form of the data as follows:


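Then the data matrix has the form

\[
\mathbf{X} =
\begin{pmatrix}
x_{11} & x_{12} \\
x_{21} & x_{22} \\
x_{31} & x_{32}
\end{pmatrix}
\]

with three rows and two columns, where $x_{j1}$ is the total dollar sales and $x_{j2}$ is the number of movies sold on receipt $j$.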
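The step from receipts to a data matrix can be sketched in code as follows. The receipt values here are invented purely for illustration and are not the figures from the example; the field names `dollar_sales` and `movies_sold` and the helper `to_data_matrix` are likewise assumptions.

```python
# Hypothetical receipts: these numbers are made up for illustration and are
# NOT the values from the example in the text.
receipts = [
    {"dollar_sales": 30.0, "movies_sold": 2},
    {"dollar_sales": 45.0, "movies_sold": 3},
    {"dollar_sales": 15.0, "movies_sold": 1},
]

# Column order: first variable is dollar sales, second is movies sold.
variables = ["dollar_sales", "movies_sold"]

def to_data_matrix(records, variable_names):
    """Arrange records as rows and the named variables as columns."""
    return [[record[name] for name in variable_names] for record in records]

X_receipts = to_data_matrix(receipts, variables)
n_rows = len(X_receipts)     # one row per receipt
n_cols = len(X_receipts[0])  # one column per variable
```

Fixing the column order in `variables` keeps the first column as dollar sales and the second as movies sold, matching the ordering of the variables in the example.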

We now present some descriptive statistics. We will begin with the mean vectors.
