Industrial Data Analytics for Diagnosis and Prognosis - Yong Chen

2.2.2 Sample Mean Vector and Sample Covariance Matrix

A multivariate data set consists of $n$ observations collected from $n$ items or units, and each observation contains measurements on $p$ variables, $x_1, x_2, \ldots, x_p$. The measurement vector for the $i$th observation is denoted by

$$\mathbf{x}_i = (x_{i1} \; x_{i2} \; \cdots \; x_{ip})^T, \quad i = 1, \ldots, n.$$

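The sample mean vector is the vector of sample means for the $p$ variables, which is defined as

$$\bar{\mathbf{x}} = (\bar{x}_1 \; \bar{x}_2 \; \cdots \; \bar{x}_p)^T = \frac{1}{n} \sum_{i=1}^{n} \mathbf{x}_i,$$

where $\bar{x}_k = \frac{1}{n} \sum_{i=1}^{n} x_{ik}$ is the sample mean of the $k$th variable, $k = 1, \ldots, p$, and $x_{ik}$ is the measurement of the $k$th variable in the $i$th observation.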
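
As a minimal sketch of the definition above, the sample mean vector can be computed by averaging each variable over the $n$ observations. The function name `sample_mean_vector` and the toy data `X` are assumptions made for illustration only; they are not taken from Table 2.1.

```python
def sample_mean_vector(X):
    """Return the sample mean vector of an n x p data set.

    X is a list of n observations, each a list of p measurements.
    """
    n = len(X)
    p = len(X[0])
    # The k-th entry averages the k-th measurement over all n observations.
    return [sum(row[k] for row in X) / n for k in range(p)]


# Hypothetical toy data: n = 4 observations on p = 3 variables.
X = [
    [2.0, 10.0, 1.0],
    [4.0, 14.0, 3.0],
    [6.0, 12.0, 2.0],
    [8.0, 16.0, 6.0],
]

x_bar = sample_mean_vector(X)
print(x_bar)  # [5.0, 13.0, 3.0]
```

Each row of `X` plays the role of one observation vector, so the result has one entry per variable.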

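The sample covariance matrix $\mathbf{S}$ is the matrix of sample variances and covariances of the $p$ variables:

$$\mathbf{S} = \begin{pmatrix} s_{11} & s_{12} & \cdots & s_{1p} \\ s_{21} & s_{22} & \cdots & s_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ s_{p1} & s_{p2} & \cdots & s_{pp} \end{pmatrix}.$$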

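The off-diagonal elements of $\mathbf{S}$ are the sample covariances of each pair of variables. For $j \neq k$,

$$s_{jk} = \frac{1}{n-1} \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k). \tag{2.5}$$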

The diagonal elements of $\mathbf{S}$, $s_{jj}$, $j = 1, \ldots, p$, are the sample variances of the $p$ variables. When $k = j$, the sample covariance in (2.5) is equal to $s_j^2$, the sample variance of the $j$th variable, so both notations $s_{jj}$ and $s_j^2$ represent the sample variance of $x_j$. It also follows from (2.5) that $s_{jk} = s_{kj}$, so the sample covariance matrix $\mathbf{S}$ is symmetric. The sample covariance matrix $\mathbf{S}$ can also be written in terms of the observation vectors $\mathbf{x}_i$ as

$$\mathbf{S} = \frac{1}{n-1} \sum_{i=1}^{n} (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})^T. \tag{2.6}$$
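
The two ways of writing the sample covariance matrix, entry by entry as in (2.5) and as a sum of outer products of deviation vectors as in (2.6), can be checked against each other numerically. The sketch below assumes the usual $n - 1$ divisor and at least two observations; the function names and the toy data are hypothetical and chosen only for illustration.

```python
def mean_vector(X):
    """Sample mean of each of the p variables."""
    n = len(X)
    return [sum(row[k] for row in X) / n for k in range(len(X[0]))]


def covariance_elementwise(X):
    """Sample covariance matrix built entry by entry, as in (2.5)."""
    n, p = len(X), len(X[0])
    x_bar = mean_vector(X)
    return [
        [
            sum((row[j] - x_bar[j]) * (row[k] - x_bar[k]) for row in X) / (n - 1)
            for k in range(p)
        ]
        for j in range(p)
    ]


def covariance_outer(X):
    """Sample covariance matrix as a sum of outer products, as in (2.6)."""
    n, p = len(X), len(X[0])
    x_bar = mean_vector(X)
    total = [[0.0] * p for _ in range(p)]
    for row in X:
        d = [row[k] - x_bar[k] for k in range(p)]  # deviation vector x_i - x_bar
        for j in range(p):
            for k in range(p):
                total[j][k] += d[j] * d[k]  # (j, k) entry of d d^T
    return [[total[j][k] / (n - 1) for k in range(p)] for j in range(p)]


# Hypothetical toy data: n = 4 observations on p = 3 variables.
X = [
    [2.0, 10.0, 1.0],
    [4.0, 14.0, 3.0],
    [6.0, 12.0, 2.0],
    [8.0, 16.0, 6.0],
]

S1 = covariance_elementwise(X)
S2 = covariance_outer(X)
for row in S1:
    print(row)
```

Both functions return the same symmetric matrix; the diagonal entries are the sample variances of the three variables.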

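Similarly, we define the sample correlation matrix as

$$\mathbf{R} = \begin{pmatrix} r_{11} & r_{12} & \cdots & r_{1p} \\ r_{21} & r_{22} & \cdots & r_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ r_{p1} & r_{p2} & \cdots & r_{pp} \end{pmatrix}.$$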

The $(j, k)$th element of $\mathbf{R}$ is the sample correlation of the $j$th and $k$th variables:

$$r_{jk} = \frac{s_{jk}}{\sqrt{s_{jj}} \sqrt{s_{kk}}} = \frac{s_{jk}}{s_j s_k}.$$

The sample correlation between a variable and itself is equal to 1, so the diagonal elements of a sample correlation matrix are all equal to 1. The sample correlation matrix $\mathbf{R}$ is symmetric since $r_{jk} = r_{kj}$.
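
A short sketch of how a correlation matrix can be obtained from a covariance matrix: each covariance is divided by the product of the two corresponding standard deviations. The function name `correlation_from_covariance` and the matrix `S` below are hypothetical illustrations, not values from the book's data set.

```python
import math


def correlation_from_covariance(S):
    """Convert a p x p sample covariance matrix into a correlation matrix."""
    p = len(S)
    # Standard deviations are the square roots of the diagonal entries.
    s = [math.sqrt(S[j][j]) for j in range(p)]
    return [[S[j][k] / (s[j] * s[k]) for k in range(p)] for j in range(p)]


# Hypothetical covariance matrix for three variables on very different scales.
S = [
    [100.0, 6.0, 3.0],
    [6.0, 4.0, 0.8],
    [3.0, 0.8, 1.0],
]

R = correlation_from_covariance(S)
for row in R:
    print([round(r, 3) for r in row])
```

In this made-up matrix the covariance between the second and third variables (0.8) is smaller than that between the first and third (3.0), yet the corresponding correlation is larger, because the first variable has a much larger standard deviation.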

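Example 2.4 Consider the data set in Table 2.1. In Example 2.2, we found that $\bar{x}_1 = 2479.5$ and $\bar{x}_2 = 170.35$. Similarly, we can obtain $\bar{x}_3 = 65.41$. So the mean vector of $\mathbf{x} = (x_1 \; x_2 \; x_3)^T$ is given by

$$\bar{\mathbf{x}} = (2479.5 \; 170.35 \; 65.41)^T.$$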

In Example 2.2, we calculated the sample variances, sample covariance, and sample correlation of $x_1$ and $x_2$. Similarly, we can obtain the sample variance of $x_3$ and its sample covariance and correlation with the other two variables as


Note that while $s_{23}$ is much smaller than $s_{13}$, $r_{23}$ is greater than $r_{13}$, which indicates that the linear association between $x_2$ and $x_3$ is stronger than that between $x_1$ and $x_3$. This shows that the magnitude of the covariance alone does not characterize how strong the relationship between two variables is, because it depends on the scales of the variables. Combining all the sample variance, covariance, and correlation information, the sample covariance matrix and sample correlation matrix of $\mathbf{x} = (x_1 \; x_2 \; x_3)^T$ can be written as

