1.2 Reduced‐Rank Modelling: Bias Versus Variance Tradeoff
An important problem in statistical processing of waveforms is that of feature selection, which refers to a transformation whereby a data space is transformed into a feature space that, in theory, has exactly the same dimension as that of the original space [2]. However, in practical problems, it may be desirable and often necessary to design a transformation in such a way that the data vector can be represented by a reduced number of “effective” features and yet retain most of the intrinsic information content of the input data. In other words, the data vector undergoes a dimensionality reduction [1, 2]. Here, the same principle is applied by attempting to fit an infinite‐dimensional space given by (1.3) to a finite‐dimensional space of dimension p.
Estimating the proper rank of the model is therefore an important problem. First, if the rank is underestimated, a unique solution is not possible. If, on the other hand, the estimated rank is too large, the system of equations involved in the parameter estimation problem can become very ill‐conditioned, leading to inaccurate or completely erroneous results if a straightforward LU decomposition is used to solve for the parameters. Since the proper rank is rarely a “crisp” number that evolves from the solution procedure, determining it requires some analysis of the data and its effective noise level. An approach that uses eigenvalue analysis and the singular value decomposition for estimating the effective rank of given data is outlined here.
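As an illustration of this idea (the synthetic data matrix, noise level, and threshold rule below are assumptions made only for the example, not a prescription from the book), the singular values of a noisy data matrix separate into a group associated with the signal and a cluster near the noise floor; counting the singular values above a noise‐dependent cutoff gives an estimate of the effective rank.

```python
# A minimal sketch (not the book's procedure) of estimating the effective rank
# of a noisy data matrix from its singular values. The matrix sizes, the true
# rank, the noise level, and the threshold rule are all illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data matrix of true rank 3, contaminated by white noise.
M, N, true_rank = 20, 50, 3
A = rng.standard_normal((M, true_rank)) @ rng.standard_normal((true_rank, N))
noise_level = 1e-3
A_noisy = A + noise_level * rng.standard_normal((M, N))

# The singular values drop sharply at the effective rank; count those that
# stand above a cutoff tied to the noise level (a heuristic choice here).
s = np.linalg.svd(A_noisy, compute_uv=False)
threshold = 10 * noise_level * np.sqrt(max(M, N))
effective_rank = int(np.sum(s > threshold))

print("leading singular values:", np.round(s[:6], 4))
print("estimated effective rank:", effective_rank)   # expected: 3
```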
As an example, consider an M‐dimensional data vector u(n) representing a particular realization of a wide‐sense stationary process. Stationarity refers to time invariance of some, or all, of the statistics of a random process, such as the mean, the autocorrelation, or the nth‐order distribution. A random process X(t) [or X(n)] is said to be strict‐sense stationary (SSS) if all its finite‐order distributions are time invariant, i.e., the joint cumulative distribution functions (cdfs), or probability density functions (pdfs), of X(t_1), X(t_2), …, X(t_k) and X(t_1 + τ), X(t_2 + τ), …, X(t_k + τ) are the same for all k, all t_1, t_2, …, t_k, and all time shifts τ. So for an SSS process, the first‐order distribution is independent of t, and the second‐order distribution, i.e., the distribution of any two samples X(t_1) and X(t_2), depends only on τ = t_2 − t_1. To see this, note that from the definition of stationarity, for any t, the joint distribution of X(t_1) and X(t_2) is the same as the joint distribution of X{t_1 + (t − t_1)} = X(t) and X{t_2 + (t − t_1)} = X{t + (t_2 − t_1)}. An independent and identically distributed (IID) random process is SSS. A random walk and a Poisson process are not SSS. The Gauss‐Markov process is not SSS; however, if X_1 is set to the steady‐state distribution of X_n, it becomes SSS.

A random process X(t) is said to be wide‐sense stationary (WSS) if its mean ε[X(t)] = μ is independent of t, and its autocorrelation function R_X(t_1, t_2) is a function only of the time difference t_2 − t_1. In addition, ε[X(t)^2] < ∞ is required (a technical condition), where ε represents the expected value in a statistical sense. Since R_X(t_1, t_2) = R_X(t_2, t_1), for any wide‐sense stationary process X(t), R_X(t_1, t_2) is a function only of |t_2 − t_1|. Clearly, SSS implies WSS; the converse is not necessarily true. The necessary and sufficient condition for a function to be the autocorrelation function of a WSS process is that it be real, even, and nonnegative definite. By nonnegative definite we mean that for any n, any t_1, t_2, …, t_n, and any real vector a = (a_1, …, a_n), Σ_{i=1}^{n} Σ_{j=1}^{n} a_i a_j R(t_i − t_j) ≥ 0. The power spectral density (psd) S_X(f) of a WSS random process X(t) is the Fourier transform of R_X(τ), i.e., S_X(f) = ℑ{R_X(τ)} = ∫_{−∞}^{∞} R_X(τ) e^{−j2πfτ} dτ. For a discrete‐time process X_n, the power spectral density is the discrete‐time Fourier transform (DTFT) of the sequence R_X(n): S_X(f) = Σ_{n=−∞}^{∞} R_X(n) e^{−j2πfn}. Therefore R_X(τ) [or R_X(n)] can be recovered from S_X(f) by taking the inverse Fourier transform or inverse DTFT.
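As a quick numerical illustration of these definitions (the AR(1) process, its coefficient a = 0.8, and the record length are assumptions made only for this example), the sketch below verifies that the sample autocorrelation of a WSS process depends solely on the lag and that the power spectral density is the DTFT of the autocorrelation sequence.

```python
# Numerical check of the WSS properties just described, using an assumed AR(1)
# process: R_X(k) depends only on the lag k, and S_X(f) is the DTFT of R_X(n).
import numpy as np

rng = np.random.default_rng(1)

# One long realization of the WSS process x[n] = a*x[n-1] + w[n], w[n] white.
a, N = 0.8, 200_000
w = rng.standard_normal(N)
x = np.zeros(N)
for n in range(1, N):
    x[n] = a * x[n - 1] + w[n]

# Sample autocorrelation R_X(k) for a few lags, versus the theoretical a^k/(1-a^2).
lags = np.arange(6)
R = np.array([np.mean(x[: N - k] * x[k:]) for k in lags])
print("R_X(k) estimated:", np.round(R, 3))
print("R_X(k) theory:   ", np.round(a**lags / (1 - a**2), 3))

# PSD at one frequency as the (truncated) DTFT of R_X(n), versus the AR(1) theory.
f, K = 0.1, 200                     # frequency in cycles/sample, truncation length
Rk = np.array([np.mean(x[: N - k] * x[k:]) for k in range(K)])
S_f = Rk[0] + 2 * np.sum(Rk[1:] * np.cos(2 * np.pi * f * np.arange(1, K)))
S_theory = 1.0 / np.abs(1 - a * np.exp(-2j * np.pi * f)) ** 2
print("S_X(0.1) estimated:", round(float(S_f), 3), " theory:", round(float(S_theory), 3))
```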
In summary, WSS is a less restrictive, somewhat weaker type of stationarity: it requires only that the mean be constant in time and that the covariance sequence depend only on the separation in time between the two samples. The final goal in model order reduction of a WSS process is to transform the M‐dimensional vector into a p‐dimensional vector, where p < M. This transformation is carried out using the Karhunen‐Loeve expansion [2]. The data vector is expanded in terms of q_i, the eigenvectors of the correlation matrix [R], defined by
(1.4)   [R] = ε[u(n) u^H(n)]
where the superscript H represents the conjugate transpose of u(n). Therefore, one obtains
(1.5)   [R] q_i = λ_i q_i ,   i = 1, 2, …, M
so that
(1.6)   u(n) = Σ_{i=1}^{M} c_i(n) q_i
where {λ_i} are the eigenvalues of the correlation matrix, {q_i} are the eigenvectors of the matrix [R], and {c_i(n)} are the coefficients defined by
(1.7)   c_i(n) = q_i^H u(n) ,   i = 1, 2, …, M
To obtain a reduced rank approximation of u(n), one needs to write
(1.8)   û(n) = Σ_{i=1}^{p} c_i(n) q_i
where p < M. The reconstruction error Ξ is then defined as
(1.9)   Ξ = ε[ ‖u(n) − û(n)‖^2 ] = Σ_{i=p+1}^{M} λ_i
Hence the approximation will be good if the remaining eigenvalues λ_{p+1}, …, λ_M are all very small.
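This expansion can be checked numerically. The following sketch (synthetic data; the dimensions M and N and the retained rank p are arbitrary choices for illustration) builds the sample correlation matrix, expands the data in its eigenvectors, truncates the expansion to p terms, and confirms that the mean squared reconstruction error equals the sum of the discarded eigenvalues.

```python
# A minimal sketch of the Karhunen-Loeve reduced-rank approximation described
# above. The synthetic data and the choices of M, N, and p are illustrative
# assumptions; the check is that the mean squared reconstruction error equals
# the sum of the discarded eigenvalues.
import numpy as np

rng = np.random.default_rng(2)

M, N, p = 8, 100_000, 3
# Zero-mean data whose correlation matrix has p dominant eigenvalues.
U = rng.standard_normal((M, p)) @ rng.standard_normal((p, N))
U += 0.05 * rng.standard_normal((M, N))

R = (U @ U.conj().T) / N              # sample correlation matrix
lam, Q = np.linalg.eigh(R)            # R q_i = lambda_i q_i
lam, Q = lam[::-1], Q[:, ::-1]        # sort eigenvalues in descending order

C = Q.conj().T @ U                    # coefficients c_i(n) = q_i^H u(n)
U_hat = Q[:, :p] @ C[:p, :]           # keep only the p dominant terms

mse = np.mean(np.sum(np.abs(U - U_hat) ** 2, axis=0))
print("reconstruction MSE:          ", round(float(mse), 4))
print("sum of discarded eigenvalues:", round(float(np.sum(lam[p:])), 4))
```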
Now to illustrate the implications of a low‐rank model [2], consider that the data vector u(n) is corrupted by the noise v(n). Then the data y(n) is represented by
(1.10)   y(n) = u(n) + v(n)
Since the data and the noise are uncorrelated,
(1.11)   ε[u(n) v^H(n)] = [0] ,   ε[v(n) v^H(n)] = σ^2 [I]
where [0] and [I] are the null and identity matrices, respectively, and the variance of the noise at each element is σ^2. The mean squared error in this noisy environment, without any rank reduction, is
(1.12)   Ξ_0 = ε[ ‖y(n) − u(n)‖^2 ] = ε[ ‖v(n)‖^2 ] = M σ^2
Now to make a low‐rank approximation in a noisy environment, define the approximated data vector by
(1.13)   ŷ_p(n) = Σ_{i=1}^{p} [q_i^H y(n)] q_i
In this case, the reconstruction error for the reduced‐rank model is given by
(1.14)   Ξ_rr = ε[ ‖ŷ_p(n) − u(n)‖^2 ] = Σ_{i=p+1}^{M} λ_i + p σ^2
This equation implies that the mean squared error Ξ_rr of the low‐rank approximation is smaller than the mean squared error Ξ_0 of the original data vector without any approximation, provided the first term in the summation is small. Low‐rank modelling therefore provides an advantage when
(1.15)   Σ_{i=p+1}^{M} λ_i < (M − p) σ^2
which illustrates the result of a bias‐variance tradeoff. In particular, it shows that using a low‐rank model for representing the data vector u(n) incurs a bias by retaining only p terms of the basis‐vector expansion. Interestingly enough, this bias is introduced knowingly in return for a reduction in variance, namely the part of the mean squared error due to the additive noise vector v(n). This is the motivation for using a simpler model that may not exactly match the underlying physics responsible for generating the data vector u(n), hence the bias, but that is less susceptible to noise, hence the reduction in variance [1, 2].
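A short numerical experiment makes the tradeoff visible. In the sketch below (again with synthetic data; the dimensions, noise level σ, and retained rank p are illustrative assumptions, and the eigenvectors are taken from the correlation matrix of the clean data simply to keep the demonstration transparent), projecting the noisy data onto the p dominant eigenvectors yields a smaller mean squared error than retaining all M components.

```python
# A short numerical check of the bias-variance tradeoff discussed above. The
# synthetic data, dimensions, noise level sigma, and retained rank p are
# illustrative assumptions made for this example only.
import numpy as np

rng = np.random.default_rng(3)

M, N, p, sigma = 8, 200_000, 3, 0.1
U = rng.standard_normal((M, p)) @ rng.standard_normal((p, N))   # clean data, rank p
V = sigma * rng.standard_normal((M, N))                         # additive white noise
Y = U + V                                                       # observed data

R = (U @ U.T) / N                     # correlation matrix of the clean data
lam, Q = np.linalg.eigh(R)
lam, Q = lam[::-1], Q[:, ::-1]        # descending eigenvalues and eigenvectors

# Keeping all M components of y(n) leaves the full noise power M*sigma^2.
err_full = np.mean(np.sum((Y - U) ** 2, axis=0))

# Projecting y(n) onto the p dominant eigenvectors trades a (small) bias for a
# reduced noise contribution of p*sigma^2.
Y_hat = Q[:, :p] @ (Q[:, :p].T @ Y)
err_rr = np.mean(np.sum((Y_hat - U) ** 2, axis=0))

print("full-rank MSE:   ", round(float(err_full), 4),
      "  (M*sigma^2 =", round(M * sigma**2, 4), ")")
print("reduced-rank MSE:", round(float(err_rr), 4),
      "  (discarded eigenvalues + p*sigma^2 =",
      round(float(np.sum(lam[p:]) + p * sigma**2), 4), ")")
```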
We now use this principle in the interpolation/extrapolation of various system responses. Since the data come from a linear time‐invariant (LTI) system that has a bounded input and a bounded output and satisfies a second‐order partial differential equation, the associated time‐domain eigenvectors are sums of complex exponentials, which in the transformed frequency domain become ratios of two polynomials. As discussed, these eigenvectors form the optimal basis for representing the given data and hence can also be used for interpolation/extrapolation of a given data set. Consequently, we will use either of these two models to fit the data, as seems appropriate. To this effect, we present the Matrix Pencil Method (MP), which approximates the data by a sum of complex exponentials, and, in the transformed domain, the Cauchy Method (CM), which fits the data by a ratio of two polynomials. In applying these two techniques it is necessary to be familiar with two other topics, the singular value decomposition and the total least squares, which are discussed next.