Data Science in Theory and Practice - Maria Cristina Mariani

3.4 Variance–Covariance Matrices


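The variance–covariance matrix (or simply the covariance matrix) of a random vector $\mathbf{y}$ is given by

$$\operatorname{cov}(\mathbf{y}) = E\left[(\mathbf{y} - \boldsymbol{\mu})(\mathbf{y} - \boldsymbol{\mu})^{\top}\right],$$

where $\boldsymbol{\mu} = E(\mathbf{y})$ is the mean vector.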
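
Written out element by element, and assuming $\mathbf{y} = (y_1, \ldots, y_p)^{\top}$ with mean vector $\boldsymbol{\mu} = (\mu_1, \ldots, \mu_p)^{\top}$ (notation assumed here for illustration), the definition of the covariance matrix gives:

$$\left[\operatorname{cov}(\mathbf{y})\right]_{jk} = E\left[(y_j - \mu_j)(y_k - \mu_k)\right], \qquad \operatorname{cov}(\mathbf{y}) = E\left(\mathbf{y}\mathbf{y}^{\top}\right) - \boldsymbol{\mu}\boldsymbol{\mu}^{\top}.$$

The second identity follows by expanding the product inside the expectation and using $E(\mathbf{y}) = \boldsymbol{\mu}$.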

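The sample covariance matrix $\mathbf{S} = (s_{jk})$ is the matrix of sample variances and covariances of the $p$ variables:

$$\mathbf{S} = (s_{jk}) = \begin{pmatrix} s_{11} & s_{12} & \cdots & s_{1p} \\ s_{21} & s_{22} & \cdots & s_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ s_{p1} & s_{p2} & \cdots & s_{pp} \end{pmatrix}. \tag{3.1}$$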

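In (3.1), the covariance matrix consists of the variances of the $p$ variables along the main diagonal and the covariances between each pair of variables in the other matrix positions. The sample covariance of the $j$th and $k$th variables, $s_{jk}$, is calculated using the $j$th and $k$th columns of the data matrix $\mathbf{Y}$:

$$s_{jk} = \frac{1}{n-1}\sum_{i=1}^{n}(y_{ij} - \bar{y}_j)(y_{ik} - \bar{y}_k), \tag{3.2}$$

where $n$ is the number of measurements and $\bar{y}_j = \frac{1}{n}\sum_{i=1}^{n} y_{ij}$ is the sample mean of the $j$th variable.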

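For example, if $j = 1$ and $k = 2$:

$$s_{12} = \frac{1}{n-1}\sum_{i=1}^{n}(y_{i1} - \bar{y}_1)(y_{i2} - \bar{y}_2), \tag{3.3}$$

and if $j = k$:

$$s_{jj} = s_j^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_{ij} - \bar{y}_j)^2, \tag{3.4}$$

we have the sample variance.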
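
The element-wise formula (3.2) can be sketched in Python as follows. This is a minimal illustration, assuming NumPy is available; the function name `sample_covariance_matrix` and the small data matrix `Y_demo` are hypothetical and are not taken from the book.

```python
import numpy as np


def sample_covariance_matrix(Y):
    """Sample covariance matrix of an n x p data matrix Y, following (3.2).

    Rows of Y are observations and columns are variables.
    """
    Y = np.asarray(Y, dtype=float)
    n, p = Y.shape
    if n < 2:
        raise ValueError("at least two observations are needed")
    y_bar = Y.mean(axis=0)  # sample mean of each variable
    S = np.empty((p, p))
    for j in range(p):
        for k in range(p):
            # s_jk = 1/(n-1) * sum_i (y_ij - ybar_j) * (y_ik - ybar_k)
            S[j, k] = np.sum((Y[:, j] - y_bar[j]) * (Y[:, k] - y_bar[k])) / (n - 1)
    return S


# Hypothetical data: 4 observations on 2 variables (illustration only).
Y_demo = np.array([[2.0, 1.0],
                   [4.0, 3.0],
                   [6.0, 2.0],
                   [8.0, 6.0]])
S_demo = sample_covariance_matrix(Y_demo)
print(S_demo)
```

The diagonal entries of the result are the sample variances (the case of equal indices), and the off-diagonal entries are the sample covariances. NumPy's `np.cov(Y, rowvar=False)` uses the same $n - 1$ divisor by default, so it can serve as a cross-check.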

The sample covariance $s_{jk}$ measures the association between the $j$th and $k$th variables. The sample covariance reduces to the sample variance when $j = k$, as observed in (3.4). We note that the sample covariance matrix (3.1) is symmetric, i.e. $s_{jk} = s_{kj}$ for all $j$ and $k$, because of its definition. Other names used for the covariance matrix are variance matrix, variance–covariance matrix, and dispersion matrix. In finance, the concept of covariance is applied in portfolio theory: diversification reduces risk by choosing assets that do not have a high positive covariance with each other.
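
To illustrate the diversification remark, the sketch below relies on the standard identity that a weighted sum of returns with weight vector $\mathbf{w}$ and covariance matrix $\mathbf{S}$ has variance $\mathbf{w}^{\top}\mathbf{S}\mathbf{w}$. The two covariance matrices, the equal weights, and the name `portfolio_variance` are hypothetical choices made for illustration; they do not come from the book.

```python
import numpy as np


def portfolio_variance(weights, cov):
    """Variance of a weighted sum of asset returns: w^T S w."""
    w = np.asarray(weights, dtype=float)
    S = np.asarray(cov, dtype=float)
    if not np.allclose(S, S.T):
        raise ValueError("a covariance matrix must be symmetric")
    return float(w @ S @ w)


# Hypothetical two-asset covariance matrices with equal variances (0.04)
# and different covariances between the two assets.
S_high = np.array([[0.04, 0.036],
                   [0.036, 0.04]])   # high positive covariance
S_low = np.array([[0.04, 0.004],
                  [0.004, 0.04]])    # low positive covariance

w = np.array([0.5, 0.5])             # equal weights (assumption)
var_high = portfolio_variance(w, S_high)
var_low = portfolio_variance(w, S_low)
print(var_high, var_low)
```

With identical individual variances, the pair of assets with the smaller covariance yields the smaller portfolio variance, which is the effect diversification aims for.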

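If $\mathbf{y}$ is a random vector taking on any possible value in a multivariate population, the population covariance matrix is defined as

$$\boldsymbol{\Sigma} = \operatorname{cov}(\mathbf{y}) = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp} \end{pmatrix}. \tag{3.5}$$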

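Just as in the sample covariance matrix defined in (3.1), the diagonal elements $\sigma_{jj} = \sigma_j^2$ are the population variances of the $y_j$'s, and the off‐diagonal elements $\sigma_{jk}$ are the population covariances of all possible pairs of $y_j$'s, i.e. $\sigma_{jk} = \operatorname{cov}(y_j, y_k)$ for $j \neq k$.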

The notation $\boldsymbol{\Sigma}$ for the covariance matrix is widely used and seems natural because $\Sigma$ is the uppercase version of $\sigma$.
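
The relationship between a population covariance matrix and its sample counterpart can be checked numerically. The sketch below assumes a bivariate normal population with a hypothetical covariance matrix `Sigma` chosen only for illustration, draws a large sample, and compares the resulting sample covariance matrix with `Sigma`. For a large sample the two are expected to be close, though not identical.

```python
import numpy as np

# Hypothetical population parameters (illustration only).
mu = np.array([0.0, 0.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

rng = np.random.default_rng(seed=0)
# Each row is one observation, so `sample` is an n x p data matrix.
sample = rng.multivariate_normal(mu, Sigma, size=100_000)

S = np.cov(sample, rowvar=False)           # sample covariance, divisor n - 1
max_abs_error = np.max(np.abs(S - Sigma))  # largest entry-wise discrepancy
print(S)
print(max_abs_error)
```

The diagonal of `S` estimates the population variances and the off-diagonal entry estimates the population covariance of the two components.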

Example 3.3 Consider the following data matrix introduced in Example 3.1:


Each receipt yields a pair of measurements: total dollar sales and number of movies sold. Since there are three receipts, we have a total of three observations on each variable. We find the sample variances and covariance as follows:


Therefore,

