Industrial Data Analytics for Diagnosis and Prognosis - Yong Chen

2.2.3 Linear Combination of Variables

We are often interested in linear combinations of the variables x₁, x₂, …, xₚ. For example, two of the variables in the auto.spec data set are city.mpg and highway.mpg. If you expect 60% of a car's mileage to be on highways and 40% on local roads, then the average MPG of the car can be estimated as 0.6 × highway.mpg + 0.4 × city.mpg, which is a linear combination of city.mpg and highway.mpg. In general, let c₁, c₂, …, cₚ be constants and consider the linear combination of the variables x₁, x₂, …, xₚ given by

$$z = c_1 x_1 + c_2 x_2 + \cdots + c_p x_p .$$

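For each observation of the data set, the corresponding value of the variable z can be found by

$$z_i = c_1 x_{i1} + c_2 x_{i2} + \cdots + c_p x_{ip} = \mathbf{c}^T \mathbf{x}_i, \quad i = 1, 2, \ldots, n,$$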

where cᵀ = (c₁ c₂ … cₚ). It can be seen that the sample mean of z is

$$\bar{z} = \mathbf{c}^T \bar{\mathbf{x}}. \tag{2.7}$$

The sample variance of z can be found as

$$s_z^2 = \mathbf{c}^T \mathbf{S} \mathbf{c}. \tag{2.8}$$
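Both (2.7) and (2.8) follow directly from the definitions. The short derivation below is a sketch that assumes the usual definitions of the sample mean vector, x̄ = (1/n) Σ xᵢ, and of the sample covariance matrix with the n − 1 divisor, S = (1/(n − 1)) Σ (xᵢ − x̄)(xᵢ − x̄)ᵀ. The divisor is an assumption here, chosen to match what R's cov() computes.

```latex
\begin{align*}
\bar{z} &= \frac{1}{n}\sum_{i=1}^{n} z_i
         = \frac{1}{n}\sum_{i=1}^{n} \mathbf{c}^T\mathbf{x}_i
         = \mathbf{c}^T\Big(\frac{1}{n}\sum_{i=1}^{n}\mathbf{x}_i\Big)
         = \mathbf{c}^T\bar{\mathbf{x}}, \\
s_z^2 &= \frac{1}{n-1}\sum_{i=1}^{n}(z_i-\bar{z})^2
       = \frac{1}{n-1}\sum_{i=1}^{n}\big(\mathbf{c}^T(\mathbf{x}_i-\bar{\mathbf{x}})\big)^2 \\
      &= \mathbf{c}^T\Big(\frac{1}{n-1}\sum_{i=1}^{n}
         (\mathbf{x}_i-\bar{\mathbf{x}})(\mathbf{x}_i-\bar{\mathbf{x}})^T\Big)\mathbf{c}
       = \mathbf{c}^T\mathbf{S}\mathbf{c}.
\end{align*}
```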

Because a sample variance is always non-negative, for any c ∈ ℝᵖ we have cᵀSc ≥ 0 from (2.8). Therefore, the sample covariance matrix S is always a positive semidefinite matrix.
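The identities (2.7) and (2.8), and the positive semidefiniteness of S, can be checked numerically. The R sketch below uses synthetic data: the matrix X, the constants in cvec, and all variable names are illustrative assumptions, not part of the auto.spec data set.

```r
# Synthetic data standing in for a data set with p = 3 variables
# (hypothetical values, generated only for this check).
set.seed(1)
n <- 50
X <- cbind(x1 = rnorm(n, mean = 25, sd = 6),
           x2 = rnorm(n, mean = 31, sd = 7),
           x3 = rnorm(n, mean = 100, sd = 20))

cvec <- c(0.4, 0.6, -0.1)      # arbitrary constants c1, c2, c3

z <- as.vector(X %*% cvec)     # z_i = c^T x_i for every observation

xbar <- colMeans(X)            # sample mean vector
S <- cov(X)                    # sample covariance matrix (n - 1 divisor)

mean_direct  <- mean(z)
mean_formula <- sum(cvec * xbar)                     # right-hand side of (2.7)
var_direct   <- var(z)
var_formula  <- as.numeric(t(cvec) %*% S %*% cvec)   # right-hand side of (2.8)

# A positive semidefinite matrix has non-negative eigenvalues (up to rounding).
eig <- eigen(S, symmetric = TRUE, only.values = TRUE)$values

print(all.equal(mean_direct, mean_formula))
print(all.equal(var_direct, var_formula))
print(min(eig) >= -1e-10)
```

Each of the three checks should print TRUE. The small negative tolerance in the eigenvalue check allows for floating-point rounding.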

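In general, suppose we have q linear combinations of x₁, x₂, …, xₚ defined by

$$\begin{aligned}
z_1 &= c_{11} x_1 + c_{12} x_2 + \cdots + c_{1p} x_p \\
z_2 &= c_{21} x_1 + c_{22} x_2 + \cdots + c_{2p} x_p \\
    &\;\;\vdots \\
z_q &= c_{q1} x_1 + c_{q2} x_2 + \cdots + c_{qp} x_p
\end{aligned}$$

or in matrix notation,

$$\mathbf{z} = \mathbf{C}\mathbf{x},$$

where C is the q × p matrix with entries c_{jk}. The sample mean vector and sample covariance matrix of z = (z₁ z₂ … z_q)ᵀ are given by

$$\bar{\mathbf{z}} = \mathbf{C}\bar{\mathbf{x}}, \tag{2.9}$$

$$\mathbf{S}_z = \mathbf{C}\mathbf{S}\mathbf{C}^T. \tag{2.10}$$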

Obviously, (2.9) and (2.10) are generalizations of (2.7) and (2.8), respectively.
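The matrix versions (2.9) and (2.10) can be checked the same way. This is a minimal R sketch on synthetic data: the matrix C, the data X, and the variable names are assumptions made for illustration only.

```r
# Synthetic data with p = 3 variables (hypothetical values).
set.seed(2)
n <- 40
X <- cbind(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))

# q = 2 linear combinations; row j of C holds the constants of z_j.
C <- rbind(c(0.4,  0.6, 0.0),
           c(1.0, -1.0, 2.0))

Z <- X %*% t(C)                  # n x q matrix; row i is (C x_i)^T

zbar_direct  <- colMeans(Z)
zbar_formula <- as.vector(C %*% colMeans(X))   # right-hand side of (2.9)
Sz_direct    <- cov(Z)
Sz_formula   <- C %*% cov(X) %*% t(C)          # right-hand side of (2.10)

print(all.equal(unname(zbar_direct), zbar_formula))
print(all.equal(unname(Sz_direct), unname(Sz_formula)))
```

Setting q = 1, so that C has a single row cᵀ, reduces the two checks to (2.7) and (2.8).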

Example 2.5 For the auto.spec data set, the mean() function of R gives the sample means of the variables city.mpg and highway.mpg as 25.22 and 30.75, respectively. Suppose we are interested in the overall MPG of a car, denoted by z, defined as the following weighted average of x₁ = city.mpg and x₂ = highway.mpg:

$$z = 0.4\,x_1 + 0.6\,x_2 = \mathbf{c}^T \mathbf{x},$$

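where c = (0.4 0.6)ᵀ. Then by (2.7) the sample mean of the overall MPG in the data set is

$$\bar{z} = \mathbf{c}^T \bar{\mathbf{x}} = 0.4 \times 25.22 + 0.6 \times 30.75 = 28.538 \approx 28.54 .$$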

To find the sample variance of z, first we obtain the sample covariance matrix for city.mpg and highway.mpg using the cov() function of R:

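    # Sample covariance matrix of city.mpg and highway.mpg
    cov(auto.spec.df[, c("city.mpg", "highway.mpg")])
    # Sample correlation matrix of the same two variables
    cor(auto.spec.df[, c("city.mpg", "highway.mpg")])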

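The function cor() calculates the sample correlation matrix. The output of cov() from the above R code gives the 2 × 2 sample covariance matrix of city.mpg and highway.mpg,

$$\mathbf{S} = \begin{pmatrix} s_{11} & s_{12} \\ s_{12} & s_{22} \end{pmatrix},$$

where s₁₁ and s₂₂ are the sample variances of city.mpg and highway.mpg and s₁₂ is their sample covariance.

By (2.8), the sample variance of z is

$$s_z^2 = \mathbf{c}^T \mathbf{S} \mathbf{c} = 0.4^2\, s_{11} + 2(0.4)(0.6)\, s_{12} + 0.6^2\, s_{22} = 0.16\, s_{11} + 0.48\, s_{12} + 0.36\, s_{22} .$$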
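The calculation in this example can be wrapped in a small helper. The sketch below is illustrative only: lin_comb_summary() is a hypothetical function name, and toy.df contains made-up values used solely to show the call. It is not the auto.spec data.

```r
# Hypothetical helper: sample mean and variance of z = c^T x
# for the chosen columns of a data frame, using (2.7) and (2.8).
lin_comb_summary <- function(df, cols, weights) {
  X <- as.matrix(df[, cols])
  xbar <- colMeans(X)            # sample mean vector
  S <- cov(X)                    # sample covariance matrix
  list(mean = sum(weights * xbar),
       var  = as.numeric(t(weights) %*% S %*% weights))
}

# Sample mean of the overall MPG from the sample means quoted in the example.
zbar <- sum(c(0.4, 0.6) * c(25.22, 30.75))
print(zbar)   # 28.538

# Toy data frame with made-up values, only to show the call; NOT the auto.spec data.
toy.df <- data.frame(city.mpg    = c(21, 24, 19, 30, 27),
                     highway.mpg = c(27, 30, 25, 37, 32))
res <- lin_comb_summary(toy.df, c("city.mpg", "highway.mpg"), c(0.4, 0.6))

# Cross-check against the variance computed directly from the z values.
z_toy <- 0.4 * toy.df$city.mpg + 0.6 * toy.df$highway.mpg
print(all.equal(res$var, var(z_toy)))
```

Assuming auto.spec.df is loaded and the two columns have no missing values, the same call with auto.spec.df in place of toy.df would give the sample mean and variance of the overall MPG for the actual data.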
