Industrial Data Analytics for Diagnosis and Prognosis - Yong Chen - Page 40
3.3 Maximum Likelihood Estimation for Multivariate Normal Distributions
If the population distribution is assumed to be multivariate normal with mean vector μ and covariance matrix Σ, the parameters μ and Σ can be estimated from a random sample of n observations x1, x2,…, xn. A commonly used method for parameter estimation is maximum likelihood estimation (MLE), and the estimated parameter values are called the maximum likelihood estimates. The idea of maximum likelihood estimation is to find the μ and Σ that maximize the joint density of the x's, which is called the likelihood function. For the multivariate normal distribution, the likelihood function is
L(\mu, \Sigma) = \prod_{i=1}^{n} \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \exp\left\{-\frac{1}{2}(\mathbf{x}_i - \mu)^T \Sigma^{-1} (\mathbf{x}_i - \mu)\right\}   (3.13)
It is often easier to find the MLE by minimizing the negative log likelihood function, which is given by
-\ln L(\mu, \Sigma) = \frac{np}{2}\ln(2\pi) + \frac{n}{2}\ln|\Sigma| + \frac{1}{2}\sum_{i=1}^{n}(\mathbf{x}_i - \mu)^T \Sigma^{-1} (\mathbf{x}_i - \mu)   (3.14)
Taking the derivative of (3.14) with respect to μ, we have
\frac{\partial}{\partial \mu}\left[-\ln L(\mu, \Sigma)\right] = -\Sigma^{-1}\sum_{i=1}^{n}(\mathbf{x}_i - \mu)   (3.15)
Setting the partial derivative in (3.15) to zero, the MLE of μ is obtained as
\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} \mathbf{x}_i = \bar{\mathbf{x}}   (3.16)
which is the sample mean vector of the data set x1, x2,…, xn. The derivation of the MLE of Σ is more involved and beyond the scope of this book. The result is given by
\hat{\Sigma} = \frac{1}{n}\sum_{i=1}^{n}(\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})^T = \frac{n-1}{n}\mathbf{S}   (3.17)
where S is the sample covariance matrix as given in (2.6). Since the MLE uses n instead of n − 1 in the denominator, it is a biased estimator of Σ. For this reason, the sample covariance matrix S is more commonly used to estimate Σ, especially when n is small.
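The estimators in (3.16) and (3.17) can be computed directly. The following minimal NumPy sketch uses simulated bivariate normal data (the true parameter values here are illustrative assumptions, not from the text) and verifies that the MLE of Σ differs from the sample covariance matrix S only by the factor (n − 1)/n:

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative (assumed) population parameters for a bivariate normal.
mu_true = np.array([1.0, -2.0])
Sigma_true = np.array([[2.0, 0.6],
                       [0.6, 1.0]])
X = rng.multivariate_normal(mu_true, Sigma_true, size=200)
n = X.shape[0]

mu_hat = X.mean(axis=0)                 # MLE of mu, eq. (3.16): the sample mean
centered = X - mu_hat
Sigma_hat = centered.T @ centered / n   # MLE of Sigma, eq. (3.17): divisor n
S = centered.T @ centered / (n - 1)     # sample covariance S: divisor n - 1

# The two estimators differ only by the factor (n - 1)/n.
print(np.allclose(Sigma_hat, (n - 1) / n * S))
```

Note that `np.cov(X.T, bias=True)` computes the same divisor-n estimator, while the default `bias=False` gives S.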
One useful property of the MLE is the invariance property. In general, let θ̂ denote the MLE of the parameter vector θ. Then the MLE of a function of θ, denoted by h(θ), is given by h(θ̂). This result makes it very convenient to find the MLE of any function of a parameter, given the MLE of the parameter. For example, based on (3.17), it is easy to see that the MLE of σjj, the variance of Xj, the jth element of X, is given by the jth diagonal element of \hat{\Sigma}:

\hat{\sigma}_{jj} = \frac{1}{n}\sum_{i=1}^{n}(x_{ij} - \bar{x}_j)^2

Then, based on the invariance property, the MLE of the standard deviation √σjj is √σ̂jj.
The MLE has good asymptotic properties and usually performs well for large sample sizes. For example, under mild regularity conditions, the MLE satisfies the property of consistency, which guarantees that the estimator converges to the true value of the parameter as the sample size becomes infinite. In addition, under certain regularity conditions, the MLE is asymptotically normal and efficient. That is, as the sample size becomes infinite, the distribution of the MLE converges to a normal distribution with variance equal to the optimal asymptotic variance. The details of the regularity conditions are beyond the scope of this book, but these conditions are quite general and are often satisfied in common circumstances.
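Consistency can be illustrated with a small simulation (an assumed setup, not from the text): as the sample size n grows, the distance between the MLE of μ and the true mean vector shrinks toward zero:

```python
import numpy as np

rng = np.random.default_rng(2)
mu_true = np.array([0.5, -1.0])   # assumed true mean for the simulation

# Record the estimation error of the MLE of mu at increasing sample sizes.
errors = {}
for n in (100, 10_000, 1_000_000):
    X = rng.normal(loc=mu_true, scale=1.0, size=(n, 2))
    errors[n] = float(np.linalg.norm(X.mean(axis=0) - mu_true))

for n, err in errors.items():
    print(f"n = {n:>9}: ||mu_hat - mu|| = {err:.5f}")
```

On any given run the error is random, but it is of order 1/√n, so increasing n by a factor of 100 shrinks the typical error by roughly a factor of 10.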