Industrial Data Analytics for Diagnosis and Prognosis - Yong Chen - Page 33

2.2.3 Linear Combination of Variables

We are often interested in linear combinations of the variables x1, x2,…, xp. For example, in the auto_spec data set, two of the variables are city.mpg and highway.mpg. If you expect 60% of a car's mileage to be on highways and 40% on local roads, then the average MPG of the car can be estimated as 0.6 × highway.mpg + 0.4 × city.mpg, which is a linear combination of city.mpg and highway.mpg. In general, let c1, c2,…, cp be constants and consider the linear combination of the variables x1, x2,…, xp given by

$$z = c_1 x_1 + c_2 x_2 + \cdots + c_p x_p.$$

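For each observation of the data set, the corresponding value of the variable z can be found by

$$z_i = c_1 x_{i1} + c_2 x_{i2} + \cdots + c_p x_{ip} = \mathbf{c}^T \mathbf{x}_i, \quad i = 1, 2, \ldots, n,$$

where $\mathbf{c}^T = (c_1 \; c_2 \; \cdots \; c_p)$, $\mathbf{x}_i = (x_{i1} \; x_{i2} \; \cdots \; x_{ip})^T$ is the ith observation, and n is the number of observations. It can be seen that the sample mean of z is

$$\bar{z} = \mathbf{c}^T \bar{\mathbf{x}}. \tag{2.7}$$

The sample variance of z can be found as

$$s_z^2 = \mathbf{c}^T \mathbf{S} \mathbf{c}. \tag{2.8}$$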
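The two results follow in a few lines. The sketch below assumes the usual definitions of the sample mean vector $\bar{\mathbf{x}}$ and of the sample covariance matrix $\mathbf{S}$ with the n − 1 divisor (the convention used by R's cov()); the same argument goes through with divisor n.

$$
\begin{aligned}
\bar{z} &= \frac{1}{n}\sum_{i=1}^{n} z_i
        = \frac{1}{n}\sum_{i=1}^{n} \mathbf{c}^T \mathbf{x}_i
        = \mathbf{c}^T \Big( \frac{1}{n}\sum_{i=1}^{n} \mathbf{x}_i \Big)
        = \mathbf{c}^T \bar{\mathbf{x}}, \\
s_z^2 &= \frac{1}{n-1}\sum_{i=1}^{n} (z_i - \bar{z})^2
       = \frac{1}{n-1}\sum_{i=1}^{n} \big( \mathbf{c}^T (\mathbf{x}_i - \bar{\mathbf{x}}) \big)^2 \\
      &= \mathbf{c}^T \Big( \frac{1}{n-1}\sum_{i=1}^{n} (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})^T \Big) \mathbf{c}
       = \mathbf{c}^T \mathbf{S} \mathbf{c}.
\end{aligned}
$$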

Because a sample variance is always non-negative, (2.8) gives $\mathbf{c}^T \mathbf{S} \mathbf{c} \ge 0$ for any $\mathbf{c} \in \mathbb{R}^p$. Therefore, the sample covariance matrix S is always a positive semidefinite matrix.
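As a quick numerical check of (2.7), (2.8), and the positive semidefiniteness of S, the following R sketch may help. The data matrix X and the constants in c_vec are made-up stand-ins chosen only for illustration; they are not taken from the auto_spec data.

```r
# Hypothetical stand-in data: 50 observations of p = 3 variables.
set.seed(1)
X <- matrix(rnorm(50 * 3), nrow = 50, ncol = 3)

c_vec <- c(0.5, -1, 2)         # arbitrary constants c1, c2, c3 (assumed)
z <- as.vector(X %*% c_vec)    # z_i = c^T x_i for each observation

x_bar <- colMeans(X)           # sample mean vector
S <- cov(X)                    # sample covariance matrix (n - 1 divisor)

mean_direct  <- mean(z)                                  # mean computed from z itself
mean_formula <- sum(c_vec * x_bar)                       # c^T x_bar, as in (2.7)
var_direct   <- var(z)                                   # variance computed from z itself
var_formula  <- as.numeric(t(c_vec) %*% S %*% c_vec)     # c^T S c, as in (2.8)

# Positive semidefiniteness: all eigenvalues of S are non-negative
# (up to floating-point rounding).
eig <- eigen(S, symmetric = TRUE)$values

print(all.equal(mean_direct, mean_formula))
print(all.equal(var_direct, var_formula))
print(all(eig >= -1e-10))
```

Each of the three printed checks should come out TRUE; the small tolerance on the eigenvalues only guards against rounding error.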

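In general, if we have q linear combinations of x1, x2,…, xp defined by

$$
\begin{aligned}
z_1 &= c_{11} x_1 + c_{12} x_2 + \cdots + c_{1p} x_p \\
z_2 &= c_{21} x_1 + c_{22} x_2 + \cdots + c_{2p} x_p \\
    &\;\;\vdots \\
z_q &= c_{q1} x_1 + c_{q2} x_2 + \cdots + c_{qp} x_p
\end{aligned}
$$

or in matrix notation,

$$
\mathbf{z} = \mathbf{C}\mathbf{x}, \qquad
\mathbf{C} = \begin{pmatrix}
c_{11} & c_{12} & \cdots & c_{1p} \\
c_{21} & c_{22} & \cdots & c_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
c_{q1} & c_{q2} & \cdots & c_{qp}
\end{pmatrix},
$$

then the sample mean vector and sample covariance matrix of $\mathbf{z} = (z_1 \; z_2 \; \cdots \; z_q)^T$ are given by

$$\bar{\mathbf{z}} = \mathbf{C} \bar{\mathbf{x}}, \tag{2.9}$$

$$\mathbf{S}_z = \mathbf{C} \mathbf{S} \mathbf{C}^T. \tag{2.10}$$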

Obviously, (2.9) and (2.10) are generalizations of (2.7) and (2.8), respectively.
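The matrix versions (2.9) and (2.10) can be checked the same way. In this sketch the data matrix X and the 2 × 3 coefficient matrix C are hypothetical values chosen for illustration only.

```r
# Hypothetical stand-in data: 40 observations of p = 3 variables.
set.seed(2)
X <- matrix(rnorm(40 * 3), nrow = 40, ncol = 3)

# q = 2 linear combinations of the p = 3 variables (coefficients assumed).
C <- rbind(c(1,   1,  0),
           c(0.5, 0, -2))

# Row i of Z holds (C x_i)^T, i.e. the q combinations for observation i.
Z <- X %*% t(C)

z_bar_direct  <- colMeans(Z)                     # mean vector computed from Z
z_bar_formula <- as.vector(C %*% colMeans(X))    # C x_bar, as in (2.9)

S_z_direct  <- cov(Z)                            # covariance matrix computed from Z
S_z_formula <- C %*% cov(X) %*% t(C)             # C S C^T, as in (2.10)

print(all.equal(z_bar_direct, z_bar_formula))
print(all.equal(S_z_direct, S_z_formula, check.attributes = FALSE))
```

Setting q = 1 reduces C to a single row c^T, which recovers (2.7) and (2.8).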

Example 2.5 For the auto_spec data set, the mean() function of R gives the sample means of the variables city.mpg and highway.mpg as 25.22 and 30.75, respectively. Suppose we are interested in the overall MPG of a car, denoted by z, defined as the following weighted average of x1 = city.mpg and x2 = highway.mpg:

$$z = 0.4 x_1 + 0.6 x_2 = \mathbf{c}^T \mathbf{x},$$

where $\mathbf{c} = (0.4 \; 0.6)^T$. Then by (2.7) the sample mean of the overall MPG in the data set is

$$\bar{z} = \mathbf{c}^T \bar{\mathbf{x}} = 0.4 \times 25.22 + 0.6 \times 30.75 \approx 28.54.$$

To find the sample variance of z, first we obtain the sample covariance matrix for city.mpg and highway.mpg using the cov() function of R:

cov(auto.spec.df[, c("city.mpg", "highway.mpg")])
cor(auto.spec.df[, c("city.mpg", "highway.mpg")])

The function cor() calculates the sample correlation matrix. Based on the output of the above R code, we have


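By (2.8), the sample variance of z is

$$s_z^2 = \mathbf{c}^T \mathbf{S} \mathbf{c} = 0.4^2 s_{11} + 2(0.4)(0.6)\, s_{12} + 0.6^2 s_{22} = 0.16\, s_{11} + 0.48\, s_{12} + 0.36\, s_{22},$$

where $s_{11}$ and $s_{22}$ are the sample variances of city.mpg and highway.mpg and $s_{12}$ is their sample covariance, all read from the output of cov().
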
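The whole example can be wrapped into a few lines of R. This is only a sketch: the helper weighted_mpg_summary() is a hypothetical name introduced here, and toy_df is a small made-up data frame standing in for auto.spec.df, so its numbers are not the values from the auto_spec data.

```r
# Hypothetical helper: mean and variance of a weighted combination of
# columns, computed through (2.7) and (2.8).
weighted_mpg_summary <- function(df,
                                 weights = c(city.mpg = 0.4, highway.mpg = 0.6)) {
  vars  <- names(weights)
  x_bar <- colMeans(df[, vars])     # sample means of the selected columns
  S     <- cov(df[, vars])          # their sample covariance matrix
  c_vec <- unname(weights)
  list(mean     = sum(c_vec * x_bar),                        # c^T x_bar
       variance = as.numeric(t(c_vec) %*% S %*% c_vec))      # c^T S c
}

# Made-up rows for illustration; replace with auto.spec.df when available.
toy_df <- data.frame(city.mpg    = c(21, 24, 19, 31, 27),
                     highway.mpg = c(27, 30, 26, 37, 32))

res <- weighted_mpg_summary(toy_df)

# Cross-check against computing z for every row directly.
z <- 0.4 * toy_df$city.mpg + 0.6 * toy_df$highway.mpg
print(all.equal(res$mean, mean(z)))
print(all.equal(res$variance, var(z)))
```

With the real data, calling weighted_mpg_summary(auto.spec.df) would be expected to give the sample mean and sample variance of the overall MPG discussed in the example, provided the data frame contains columns named city.mpg and highway.mpg.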