Читать книгу Industrial Data Analytics for Diagnosis and Prognosis - Yong Chen - Страница 32

2.2.2 Sample Mean Vector and Sample Covariance Matrix

A multivariate data set consists of n observations collected from n items or units and each observation contains measurements on p variables, x₁, x₂,…, xp. The measurement vector for the ith observation is denoted by

The sample mean vector is the vector of sample means for the p variables, which is defined as

where x̄k is the sample mean of

The sample covariance matrix S is the matrix of sample variances and covariances of the p variables:

The off-diagonal elements of S is the sample covariances of each pair of variables. For j ≠ k,

(2.5)

The diagonal elements of S, sjj, j = 1,…,p are the sample variance of the jth variable. It is easy to see that when k = j, the sample covariance in (2.5) is equal to sj², the sample variance of the jth variable. So both notations sjj and sj² represent the sample variance of xj. It is also obvious from (2.5) that skj. So the sample covariance matrix S is a symmetric matrix. The sample covariance matrix S can also be written by the observation vector xi as

(2.6)

Similarly, we define the sample correlation matrix as

The (j, k)th element of R is the sample correlation of the jth and kth variables:

The sample correlation between a variable and itself is equal to 1. So the diagonal elements of a sample correlation matrix are all equal to 1. The sample correlation matrix R is obviously symmetric since rjk = rkj.

Example 2.4 Consider the data set in Table 2.1. In Example 2.2, we found that x̄₁ = 2479.5 and x̄₂ = 170.35. Similarly, we can obtain x̄₃ = 65.41. So the mean vector of x = (x₁ x₂ x₃)T is given by

In Example 2.2, we calculated the sample variances, sample covariance, and sample correlation of x₁ and x₂. Similarly, we can obtain the sample variance of x₃ and its sample covariance and correlation with the other two variables as

Note that while s₂₃ is much smaller than s₁₃, r₂₃ is greater than r₁₃, which indicates that the linear association between x₂ and x₃ is stronger than that of x₁ and x₃. This clearly shows that the magnitude of the covariance itself is not meaningful in characterizing how strong the relationship of two variables is. Combining all the sample variance, covariance, and correlation information, the sample covariance matrix and sample correlation matrix of x = (x₁ x₂ x₃)T can be written as

Industrial Data Analytics for Diagnosis and Prognosis

Подняться наверх