Industrial Data Analytics for Diagnosis and Prognosis - Yong Chen
2.2.3 Linear Combination of Variables
We are often interested in linear combinations of the variables $x_1, x_2, \ldots, x_p$. For example, in the auto_spec data set, two of the variables are city.mpg and highway.mpg. If you expect 60% of a car's mileage to be on highways and 40% on local roads, then the average MPG of the car can be estimated as 0.6 × highway.mpg + 0.4 × city.mpg, which is a linear combination of city.mpg and highway.mpg. In general, let $c_1, c_2, \ldots, c_p$ be constants and consider the linear combination of the variables $x_1, x_2, \ldots, x_p$ given by
\[ z = c_1 x_1 + c_2 x_2 + \cdots + c_p x_p. \]
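For each observation $\mathbf{x}_i$ of the data set, the corresponding value of the variable $z$ can be found by
\[ z_i = \mathbf{c}^T \mathbf{x}_i, \]
where $\mathbf{c}^T = (c_1 \; c_2 \; \cdots \; c_p)$. It can be seen that the sample mean of $z$ is
\[ \bar{z} = \mathbf{c}^T \bar{\mathbf{x}}. \tag{2.7} \]
The sample variance of $z$ can be found as
\[ s_z^2 = \mathbf{c}^T \mathbf{S} \mathbf{c}. \tag{2.8} \]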
Because a sample variance is always nonnegative, for any $\mathbf{c} \in \mathbb{R}^p$ we have $\mathbf{c}^T \mathbf{S} \mathbf{c} \geq 0$ from (2.8). Therefore, the sample covariance matrix $\mathbf{S}$ is always a positive semidefinite matrix.
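The identities (2.7) and (2.8) and the positive semidefiniteness of S can be checked numerically. The following is a minimal sketch, not taken from the text: it assumes R's built-in mtcars data as a stand-in data set, and the constants in cvec are arbitrary illustrative choices.

```r
# Illustrative check of (2.7) and (2.8) on R's built-in mtcars data.
# ASSUMPTION: mtcars is only a stand-in; the text works with auto.spec.
X <- as.matrix(mtcars[, c("mpg", "hp", "wt")])  # n x p data matrix (p = 3)
cvec <- c(0.5, -0.2, 1.0)                       # arbitrary constants c1, c2, c3

z <- as.vector(X %*% cvec)   # z_i = c^T x_i for every observation

xbar <- colMeans(X)          # sample mean vector
S <- cov(X)                  # sample covariance matrix (divisor n - 1)

mean_direct  <- mean(z)
mean_formula <- sum(cvec * xbar)                    # c^T xbar, as in (2.7)
var_direct   <- var(z)
var_formula  <- as.numeric(t(cvec) %*% S %*% cvec)  # c^T S c, as in (2.8)

print(all.equal(mean_direct, mean_formula))
print(all.equal(var_direct, var_formula))

# A positive semidefinite S has no negative eigenvalues (up to rounding).
min_eig <- min(eigen(S, symmetric = TRUE, only.values = TRUE)$values)
print(min_eig)
```

Both comparisons use all.equal() rather than == because the two computation routes differ by floating-point rounding.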
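In general, if we have $q$ linear combinations of $x_1, x_2, \ldots, x_p$ defined by
\[
\begin{aligned}
z_1 &= c_{11} x_1 + c_{12} x_2 + \cdots + c_{1p} x_p, \\
z_2 &= c_{21} x_1 + c_{22} x_2 + \cdots + c_{2p} x_p, \\
&\;\;\vdots \\
z_q &= c_{q1} x_1 + c_{q2} x_2 + \cdots + c_{qp} x_p,
\end{aligned}
\]
or in matrix notation,
\[ \mathbf{z} = \mathbf{C} \mathbf{x}, \]
where $\mathbf{C}$ is the $q \times p$ matrix with entries $c_{jk}$. The sample mean vector and sample covariance matrix of $\mathbf{z}$ are given by
\[ \bar{\mathbf{z}} = \mathbf{C} \bar{\mathbf{x}}, \tag{2.9} \]
\[ \mathbf{S}_z = \mathbf{C} \mathbf{S} \mathbf{C}^T. \tag{2.10} \]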
Obviously, (2.9) and (2.10) are generalizations of (2.7) and (2.8), respectively.
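The matrix versions (2.9) and (2.10) can be verified the same way. This is a sketch under assumptions: mtcars again stands in for the data set, and the 2 × 3 coefficient matrix C is a made-up example.

```r
# Illustrative check of (2.9) and (2.10).
# ASSUMPTION: mtcars is a stand-in data set; C is an arbitrary 2 x 3 matrix.
X <- as.matrix(mtcars[, c("mpg", "hp", "wt")])  # n x p data matrix
C <- rbind(c(1.0,  0.0, 2.0),
           c(0.5, -0.2, 1.0))                   # q x p coefficients (q = 2)

# Row i of Z holds z_i = C x_i, so Z is the n x q matrix X C^T.
Z <- X %*% t(C)

zbar_direct  <- as.vector(colMeans(Z))
zbar_formula <- as.vector(C %*% colMeans(X))    # C xbar, as in (2.9)

Sz_direct  <- unname(cov(Z))
Sz_formula <- unname(C %*% cov(X) %*% t(C))     # C S C^T, as in (2.10)

print(all.equal(zbar_direct, zbar_formula))
print(all.equal(Sz_direct, Sz_formula))
```

Setting q = 1, so that C has a single row, reduces this check to the scalar case of (2.7) and (2.8).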
Example 2.5 For the auto.spec data set, using the mean() function of R, the sample means of the variables city.mpg and highway.mpg can be found as 25.22 and 30.75, respectively. Suppose we are interested in the overall MPG of a car, denoted by $z$ and defined as the following weighted average of $x_1$ = city.mpg and $x_2$ = highway.mpg:
\[ z = 0.4 x_1 + 0.6 x_2 = \mathbf{c}^T \mathbf{x}, \]
where $\mathbf{c} = (0.4 \; 0.6)^T$. Then by (2.7) the sample mean of the overall MPG in the data set is
\[ \bar{z} = \mathbf{c}^T \bar{\mathbf{x}} = 0.4 \times 25.22 + 0.6 \times 30.75 \approx 28.54. \]
To find the sample variance of $z$, first we obtain the sample covariance matrix for city.mpg and highway.mpg using the cov() function of R:
cov(auto.spec.df[, c("city.mpg", "highway.mpg")])
cor(auto.spec.df[, c("city.mpg", "highway.mpg")])
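The function cor() calculates the sample correlation matrix. The output of the above R code gives the sample covariance matrix $\mathbf{S}$ of city.mpg and highway.mpg, with entries $s_{11}$, $s_{12} = s_{21}$, and $s_{22}$. By (2.8), the sample variance of $z$ is
\[ s_z^2 = \mathbf{c}^T \mathbf{S} \mathbf{c} = 0.16\, s_{11} + 0.48\, s_{12} + 0.36\, s_{22}. \]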
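The calculation in this example can be sketched in R as follows. The data frame toy.df is a hypothetical stand-in with made-up values, not the auto.spec data; only the column names and the weights 0.4 and 0.6 come from the text, so the printed numbers will not match the book's results.

```r
# ASSUMPTION: toy.df holds made-up values purely for illustration;
# replace it with auto.spec.df to reproduce the example in the text.
toy.df <- data.frame(
  city.mpg    = c(21, 19, 24, 31, 27, 17),
  highway.mpg = c(27, 26, 30, 37, 33, 22)
)

cvec <- c(0.4, 0.6)   # weights for city.mpg and highway.mpg
X <- as.matrix(toy.df[, c("city.mpg", "highway.mpg")])
S <- cov(X)           # 2 x 2 sample covariance matrix

z_mean <- sum(cvec * colMeans(X))              # c^T xbar, as in (2.7)
z_var  <- as.numeric(t(cvec) %*% S %*% cvec)   # c^T S c, as in (2.8)

# The quadratic form written out term by term for p = 2.
z_var_expanded <- 0.16 * S[1, 1] + 0.48 * S[1, 2] + 0.36 * S[2, 2]

# Direct computation from the observed z values, for comparison.
z <- 0.4 * toy.df$city.mpg + 0.6 * toy.df$highway.mpg
print(c(z_mean, mean(z)))
print(c(z_var, var(z), z_var_expanded))

# The sample mean of z from the means reported in the text.
zbar_reported <- 0.4 * 25.22 + 0.6 * 30.75
print(zbar_reported)
```

Computing mean(z) and var(z) directly is simpler when the raw data are at hand; the formulas are useful when only the mean vector and covariance matrix are available.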