Design Example 2.5
Assume that we are analyzing scientific articles related to a specific domain. Each article is represented by a vector $x$ of word frequencies; that is, we choose a set of $M$ words representative of our scientific area, and we count how many times each word appears in each article. Each vector $x$ is then orthogonally projected onto the new subspace defined by the vectors $w_i$. Each vector $w_i$ has dimension $M$ and can be understood as a "topic" (i.e., a topic is characterized by the relative frequencies of the $M$ different words; two different topics differ in the relative frequencies of the $M$ words). The projection of $x$ onto each $w_i$ indicates how important topic $w_i$ is for representing the article: important topics have large projection values and, therefore, large values in the corresponding component of $\chi$.
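As a quick illustration of this representation, the following sketch (NumPy; the vocabulary, the article word counts, and the topic vector `w_i` are purely illustrative, not taken from the book) builds the word-frequency vectors and computes the projection of each article onto one candidate topic direction:

```python
# Minimal sketch: each article becomes a length-M vector of word counts over a
# fixed vocabulary, and the "importance" of a topic w_i for an article is the
# projection of x onto w_i (a simple dot product).
import numpy as np

vocabulary = ["channel", "qubit", "antenna", "entanglement", "fading"]  # M = 5 words
M = len(vocabulary)

# Word counts per article (rows = articles, columns = vocabulary words).
X = np.array([
    [12,  0,  9,  1,  7],   # article on wireless propagation
    [ 1, 15,  0, 11,  0],   # article on quantum computing
    [ 8,  2,  6,  0,  5],   # another wireless article
], dtype=float)

# A candidate "topic" direction: relative frequencies of the M words (unit norm).
w_i = np.array([0.6, 0.0, 0.5, 0.0, 0.6])
w_i /= np.linalg.norm(w_i)

# Projection of each article onto the topic: large values = topic is important.
projections = X @ w_i
print(projections)
```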
It can be shown [43, 47], as already indicated in Section 2.1, that when the input vectors $x$ are zero-mean (if they are not, we can transform the input data simply by subtracting the sample average vector), the solution of the minimization of $J_{PCA}$ is given by the $m$ eigenvectors associated with the $m$ largest eigenvalues of the covariance matrix of $x$, $C_x = E\{xx^T\}$ (note that $C_x$ is an $M \times M$ matrix with $M$ eigenvalues). If the eigenvalue decomposition of the input covariance matrix is $C_x = W_M \Lambda_M W_M^T$ (since $C_x$ is a real-symmetric matrix), then the feature vectors are constructed as $\chi = \Lambda_m^{-1/2} W_m^T x$, where $\Lambda_m$ is a diagonal matrix with the $m$ largest eigenvalues of the matrix $\Lambda_M$ and $W_m$ contains the corresponding $m$ columns of the eigenvector matrix $W_M$. We could have constructed all the feature vectors at the same time by projecting the whole matrix $X$, $\mathbf{X}_\chi = \Lambda_m^{-1/2} W_m^T X$. Note that the $i$-th feature is the projection of the input vector $x$ onto the $i$-th eigenvector, $\chi_i = \lambda_i^{-1/2} w_i^T x$. The computed feature vectors have an identity covariance matrix, $C_\chi = I$, meaning that the different features are decorrelated.
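The construction above can be summarized in a short numerical sketch. It assumes synthetic zero-mean data, uses the sample covariance with a $1/N$ normalization, and relies on `numpy.linalg.eigh` for the eigendecomposition of the real-symmetric $C_x$; variable names such as `X`, `W_m`, and `Chi` are illustrative, not from the book:

```python
# Sketch of the PCA feature construction described above (NumPy only).
# Rows of X are the observations x^T.
import numpy as np

rng = np.random.default_rng(0)
N, M, m = 500, 8, 3
X = rng.standard_normal((N, M)) @ rng.standard_normal((M, M))  # synthetic data

X = X - X.mean(axis=0)                 # make the inputs zero-mean
C_x = (X.T @ X) / N                    # sample covariance matrix (M x M)

# Eigendecomposition of the real-symmetric covariance: C_x = W_M Lambda_M W_M^T.
eigvals, W_M = np.linalg.eigh(C_x)     # eigh returns eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]      # reorder so the largest come first
eigvals, W_M = eigvals[order], W_M[:, order]

W_m = W_M[:, :m]                       # eigenvectors of the m largest eigenvalues

# Feature vectors chi = Lambda_m^{-1/2} W_m^T x, computed for all inputs at once.
Chi = X @ W_m @ np.diag(eigvals[:m] ** -0.5)

# The features are decorrelated with unit variance: C_chi is (numerically) I.
C_chi = (Chi.T @ Chi) / N
print(np.round(C_chi, 3))
```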
Univariate variance is a second-order statistical measure of the departure of the input observations from the sample mean. A generalization of the univariate variance to multivariate variables is the trace of the input covariance matrix. By choosing the $m$ largest eigenvalues of the covariance matrix $C_x$, we guarantee that the representation in the feature space explains as much variance of the input space as possible with only $m$ variables. As already indicated in Section 2.1, $w_1$ is the direction in which the data exhibit the largest variability, $w_2$ is the direction with the largest variability once the variability along $w_1$ has been removed, $w_3$ is the direction with the largest variability once the variability along $w_1$ and $w_2$ has been removed, and so on. Thanks to the orthogonality of the $w_i$ vectors, and the subsequent decorrelation of the feature vectors, the total variance explained by the PCA decomposition can be conveniently measured as the sum of the variances of each feature, $\sum_{i=1}^{m} \lambda_i$, which can be compared with the total input variance $\operatorname{tr}(C_x) = \sum_{i=1}^{M} \lambda_i$.
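The explained-variance bookkeeping from this paragraph can be checked with a small self-contained sketch (synthetic data; the choice `m = 3` is arbitrary and only for illustration):

```python
# Short sketch of the explained-variance computation: the variance captured by
# the m leading eigenvectors is the sum of the m largest eigenvalues, and the
# total input variance is trace(C_x) = sum of all M eigenvalues.
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 8)) @ rng.standard_normal((8, 8))
X = X - X.mean(axis=0)                             # zero-mean inputs
C_x = (X.T @ X) / X.shape[0]                       # sample covariance

eigvals = np.sort(np.linalg.eigvalsh(C_x))[::-1]   # eigenvalues, descending
m = 3
print("explained variance fraction:", eigvals[:m].sum() / np.trace(C_x))
```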