CHAPTER 3
Statistics and Methods
Part III Basic Statistics
In this section we will learn how to describe a collection of data in precise statistical terms. Many of the concepts will be familiar, but the notation and terminology might be new. This notation and terminology will be used throughout the rest of the book.
Averages
Everybody knows what an average is. We come across averages every day, whether they are earned-run averages in baseball or grade point averages in school. In statistics there are actually three different types of averages: means, modes, and medians. By far the most commonly used average in risk management is the mean.
POPULATION AND SAMPLE DATA
If you wanted to know the mean age of people working in your firm, you would simply ask every person in the firm his or her age, add the ages together, and divide by the number of people in the firm. Assuming there are n employees and a_i is the age of the ith employee, then the mean, μ, is simply:

\mu = \frac{1}{n}\sum_{i=1}^{n} a_i \quad (3.21)
It is important at this stage to differentiate between population statistics and sample statistics. In this example, μ is the population mean. Assuming nobody lied about his or her age, and forgetting about rounding errors and other trivial details, we know the mean age of people in your firm exactly. We have a complete data set of everybody in your firm; we've surveyed the entire population.
This state of absolute certainty is, unfortunately, quite rare in finance. More often, we are faced with a situation such as this: estimate the mean return of stock ABC, given the most recent year of daily returns. In a situation like this, we assume there is some underlying data-generating process, whose statistical properties are constant over time. The underlying process still has a true mean, but we cannot observe it directly. We can only estimate that mean based on our limited data sample. In our example, assuming n returns, we estimate the mean using the same formula as before:

\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} r_i \quad (3.22)

where \hat{\mu} (pronounced "mu hat") is our estimate of the true mean based on our sample of n returns. We call this the sample mean.
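The distinction is easy to see in code. Below is a minimal Python sketch, with hypothetical ages and returns, computing a population mean and a sample mean with the same formula:

```python
# A minimal sketch; the ages and returns below are hypothetical.
ages = [34, 29, 41, 55, 38]                       # the full population
mu = sum(ages) / len(ages)                        # population mean, known exactly

returns = [0.012, -0.004, 0.007, 0.001, -0.009]   # a limited sample of returns
mu_hat = sum(returns) / len(returns)              # sample mean, only an estimate
print(mu, mu_hat)
```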
The median and mode are also types of averages. They are used less frequently in finance, but both can be useful. The median represents the center of a group of data; within the group, half the data points will be less than the median, and half will be greater. The mode is the value that occurs most frequently.
Sample Problem
Question:
Calculate the mean, median, and mode of the following data set:
Answer:
If there is an even number of data points, the median is found by averaging the two center-most points. In the following series:
the median is 15 percent. The median can be useful for summarizing data that is asymmetrical or contains significant outliers.
A data set can also have more than one mode. If the maximum frequency is shared by two or more values, all of those values are considered modes. In the following example, the modes are 10 percent and 20 percent:
In calculating the mean in Equation 3.21 and Equation 3.22, each data point was counted exactly once. In certain situations, we might want to give more or less weight to certain data points. In calculating the average return of stocks in an equity index, we might want to give more weight to larger firms, perhaps weighting their returns in proportion to their market capitalization. Given n data points, x_i = x_1, x_2, … , x_n, with corresponding weights, w_i, we can define the weighted mean, μ_w, as:

\mu_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i} \quad (3.23)
The standard mean from Equation 3.21 can be viewed as a special case of the weighted mean, where all the values have equal weight.
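As an illustration, here is a small Python sketch of a capitalization-weighted mean; the returns and market capitalizations are hypothetical:

```python
# Hypothetical returns (x_i) and market capitalizations (w_i).
returns = [0.05, 0.02, -0.01]
mkt_caps = [500e9, 100e9, 10e9]

weighted_mean = sum(w * x for w, x in zip(mkt_caps, returns)) / sum(mkt_caps)
print(weighted_mean)   # with equal weights this reduces to the standard mean
```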
DISCRETE RANDOM VARIABLES
For a discrete random variable, we can also calculate the mean, median, and mode. For a random variable, X, with possible values, x_i, and corresponding probabilities, p_i, we define the mean, μ, as:

\mu = \sum_{i=1}^{n} p_i x_i \quad (3.24)
The equation for the mean of a discrete random variable is a special case of the weighted mean, where the outcomes are weighted by their probabilities, and the sum of the weights is equal to one.
The median of a discrete random variable is the value such that the probability that a value is less than or equal to the median is equal to 50 percent. Working from the other end of the distribution, we can also define the median such that 50 percent of the values are greater than or equal to the median. For a random variable, X, if we denote the median as m, we have:
P[X \le m] = P[X \ge m] = 0.50 \quad (3.25)
For a discrete random variable, the mode is the value associated with the highest probability. As with population and sample data sets, the mode of a discrete random variable need not be unique.
Sample Problem
Question:
At the start of the year, a bond portfolio consists of two bonds, each worth $100. At the end of the year, if a bond defaults, it will be worth $20. If it does not default, the bond will be worth $100. The probability that both bonds default is 20 percent. The probability that neither bond defaults is 45 percent. What are the mean, median, and mode of the year-end portfolio value?
Answer:
We are given the probability for two outcomes:

P[V = \$40] = 20\%
P[V = \$200] = 45\%

At year-end, the value of the portfolio, V, can only have one of three values, and the probabilities of all three outcomes must sum to 100 percent. This allows us to calculate the final probability:

P[V = \$120] = 100\% - 20\% - 45\% = 35\%

The mean of V is then $140:

\mu = 0.20 \cdot \$40 + 0.35 \cdot \$120 + 0.45 \cdot \$200 = \$140
The mode of the distribution is $200; this is the most likely single outcome. The median of the distribution is $120; half of the outcomes are less than or equal to $120.
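A short Python check of this result, using the values and probabilities given in the problem:

```python
# Year-end portfolio values and their probabilities, from the problem.
values = [40, 120, 200]       # both default, one defaults, neither defaults
probs = [0.20, 0.35, 0.45]

mean = sum(p * v for p, v in zip(probs, values))   # 140.0

# Median: smallest value whose cumulative probability reaches 50 percent.
cum = 0.0
for v, p in zip(values, probs):
    cum += p
    if cum >= 0.5:
        median = v
        break

mode = values[probs.index(max(probs))]             # most likely outcome: 200
print(mean, median, mode)
```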
CONTINUOUS RANDOM VARIABLES
We can also define the mean, median, and mode for a continuous random variable. To find the mean of a continuous random variable, we simply integrate the product of the variable and its probability density function (PDF). In the limit, this is equivalent to our approach to calculating the mean of a discrete random variable. For a continuous random variable, X, with a PDF, f(x), the mean, μ, is then:
\mu = \int_{-\infty}^{\infty} x f(x)\, dx \quad (3.26)
The median of a continuous random variable is defined exactly as it is for a discrete random variable, such that there is a 50 percent probability that values are less than or equal to, or greater than or equal to, the median. If we define the median as m, then:
\int_{-\infty}^{m} f(x)\, dx = \int_{m}^{\infty} f(x)\, dx = 0.50 \quad (3.27)
Alternatively, we can define the median in terms of the cumulative distribution function. Given the cumulative distribution function, F(x), and the median, m, we have:
F(m) = 0.50 \quad (3.28)
The mode of a continuous random variable corresponds to the maximum of the density function. As before, the mode need not be unique.
Sample Problem
Question:
Using the now-familiar probability density function discussed previously:

f(x) = \frac{x}{50}, \quad 0 \le x \le 10
What are the mean, median, and mode of x?
Answer:
As we saw in a previous example, this probability density function is a triangle, between x = 0 and x = 10, and zero everywhere else.
Probability Density Function
For a continuous distribution, the mode corresponds to the maximum of the PDF. By inspection of the graph, we can see that the mode of f(x) is equal to 10.
To calculate the median, we need to find m, such that the integral of f(x) from the lower bound of f(x), zero, to m is equal to 0.50. That is, we need to find:

\int_{0}^{m} \frac{x}{50}\, dx = 0.50

First we solve the left-hand side of the equation:

\int_{0}^{m} \frac{x}{50}\, dx = \frac{1}{50}\left[\frac{x^2}{2}\right]_0^m = \frac{m^2}{100}

Setting this result equal to 0.50 and solving for m, we obtain our final answer:

\frac{m^2}{100} = 0.50 \;\Rightarrow\; m^2 = 50 \;\Rightarrow\; m = \sqrt{50} \approx 7.07
In the last step we can ignore the negative root. If we hadn't calculated the median, looking at the graph it might be tempting to guess that the median is 5, the midpoint of the range of the distribution. This is a common mistake. Because lower values have less weight, the median ends up being greater than 5.
The mean is approximately 6.67:

\mu = \int_{0}^{10} x \cdot \frac{x}{50}\, dx = \frac{1}{50}\left[\frac{x^3}{3}\right]_0^{10} = \frac{20}{3} \approx 6.67
As with the median, it is a common mistake, based on inspection of the PDF, to guess that the mean is 5. However, what the PDF is telling us is that outcomes between 5 and 10 are much more likely than values between 0 and 5 (the PDF is higher between 5 and 10 than between 0 and 5). This is why the mean is greater than 5.
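For readers who want to verify these results numerically, the following Python sketch integrates the assumed PDF f(x) = x/50 with SciPy and solves for the median:

```python
from scipy import integrate, optimize

# Assumes f(x) = x/50 on [0, 10], consistent with the answer above.
f = lambda x: x / 50.0

mean, _ = integrate.quad(lambda x: x * f(x), 0, 10)        # ~6.67
median = optimize.brentq(
    lambda m: integrate.quad(f, 0, m)[0] - 0.5, 0, 10)     # ~7.07 = sqrt(50)
print(mean, median)   # the mode, 10, is read off the peak of the PDF
```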
Expectations
On January 15, 2005, the Huygens space probe landed on the surface of Titan, the largest moon of Saturn. This was the culmination of a seven-year-long mission. During its descent and for over an hour after touching down on the surface, Huygens sent back detailed images, scientific readings, and even sounds from a strange world. There are liquid oceans on Titan, the landing site was littered with “rocks” composed of water ice, and weather on the moon includes methane rain. The Huygens probe was named after Christiaan Huygens, a Dutch polymath who first discovered Titan in 1655. In addition to astronomy and physics, Huygens had more prosaic interests, including probability theory. Originally published in Latin in 1657, De Ratiociniis in Ludo Aleae, or The Value of All Chances in Games of Fortune, was one of the first texts to formally explore one of the most important concepts in probability theory, namely expectations.
Like many of his contemporaries, Huygens was interested in games of chance. As he described it, if a game has a 50 percent probability of paying $3 and a 50 percent probability of paying $7, then this is, in a way, equivalent to having $5 with certainty. This is because we expect, on average, to win $5 in this game:
50\% \cdot \$3 + 50\% \cdot \$7 = \$5 \quad (3.29)
As one can already see, the concepts of expectations and averages are very closely linked. In the current example, if we play the game only once, there is no chance of winning exactly $5; we can win only $3 or $7. Still, even if we play the game only once, we say that the expected value of the game is $5. That we are talking about the mean of all the potential payouts is understood.
We can express the concept of expectation more formally using the expectations operator. We could state that the random variable, X, has an expected value of $5 as follows:
E[X] = 0.50 \cdot \$3 + 0.50 \cdot \$7 = \$5 \quad (3.30)
where E[ ·] is the expectation operator.1
In this example, the mean and the expected value have the same numeric value, $5. The same is true for discrete and continuous random variables. The expected value of a random variable is equal to the mean of the random variable.
While the mean and the expected value may have the same numeric value, the two concepts are not exactly the same. In finance and risk management the terms can often be used interchangeably, but the difference is subtle.
As the name suggests, expectations are often thought of as being forward-looking. Pretend we have a financial asset for which the mean annual return is equal to 15 percent. This is not an estimate; in this case, we know that the mean is 15 percent. We say that the expected value of the return next year is 15 percent. We expect the return to be 15 percent, because the probability-weighted mean of all the possible outcomes is 15 percent.
Now pretend that we don't actually know what the mean return of the asset is, but we have 10 years' worth of historical data, for which the sample mean is 15 percent. In this case the expected value may or may not be 15 percent. In most cases if we say that the expected value is equal to 15 percent, we are making two assumptions: first, we are assuming that the returns in our sample were generated by the same random process over the entire sample period; second, we are assuming that the returns will continue to be generated by this same process in the future. These are very strong assumptions. In finance and risk management, we often assume that the data we are interested in are being generated by a consistent, unchanging process. Testing the validity of this assumption can be an important part of risk management in practice.
The concept of expectations is also a much more general concept than the concept of the mean. Using the expectations operator, we can derive the expected value of functions of random variables. As we will see in subsequent sections, the concept of expectations underpins the definitions of other population statistics (variance, skew, kurtosis), and is important in understanding regression analysis and time series analysis. In these cases, even when we could use the mean to describe a calculation, in practice we tend to talk exclusively in terms of expectations.
Sample Problem
Question:
At the start of the year, you are asked to price a newly issued zero-coupon bond. The bond has a notional value of $100. You believe there is a 20 percent chance that the bond will default, in which case it will be worth $40 at the end of the year. There is also a 30 percent chance that the bond will be downgraded, in which case it will be worth $90 in a year's time. If the bond does not default and is not downgraded, it will be worth $100. Use a continuous interest rate of 5 percent to determine the current price of the bond.
Answer:
We first need to determine the expected future value of the bond, that is, the expected value of the bond in one year's time. We are given the following:

P[V_{t+1} = \$40] = 20\%
P[V_{t+1} = \$90] = 30\%

Because there are only three possible outcomes, the probability of no downgrades and no default must be 50 percent:

P[V_{t+1} = \$100] = 100\% - 20\% - 30\% = 50\%

The expected value of the bond in one year is then:

E[V_{t+1}] = 0.20 \cdot \$40 + 0.30 \cdot \$90 + 0.50 \cdot \$100 = \$85

To get the current price of the bond we then discount this expected future value:

E[V_t] = e^{-0.05}\, E[V_{t+1}] = e^{-0.05} \cdot \$85 = \$80.85
The current price of the bond, in this case $80.85, is often referred to as the present value or fair value of the bond. The price is considered fair because the discounted expected value of the bond is the rational price to pay for the bond, given our knowledge of the world.
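The same calculation in a few lines of Python, using the probabilities and payoffs from the problem:

```python
import math

# Probabilities and one-year payoffs from the problem statement.
payoffs = [40, 90, 100]
probs = [0.20, 0.30, 0.50]

expected_value = sum(p * v for p, v in zip(probs, payoffs))   # 85.0
price = expected_value * math.exp(-0.05)                      # continuous discounting
print(round(price, 2))                                        # 80.85
```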
The expectations operator is linear. That is, for two random variables, X and Y, and a constant, c, the following two equations are true:
E[X + Y] = E[X] + E[Y]
E[cX] = cE[X] \quad (3.31)
If the expected value of one option, A, is $10, and the expected value of option B is $20, then the expected value of a portfolio containing A and B is $30, and the expected value of a portfolio containing five contracts of option A is $50.
Be very careful, though; the expectations operator is not multiplicative. The expected value of the product of two random variables is not necessarily the same as the product of their expected values:
E[XY] \neq E[X]E[Y] \quad (3.32)
Imagine we have two binary options. Each pays either $100 or nothing, depending on the value of some underlying asset at expiration. The probability of receiving $100 is 50 percent for both options. Further, assume that it is always the case that if the first option pays $100, the second pays $0, and vice versa. The expected value of each option separately is clearly $50. If we denote the payout of the first option as X and the payout of the second as Y, we have:
E[X] = 0.50 \cdot \$100 + 0.50 \cdot \$0 = \$50
E[Y] = 0.50 \cdot \$0 + 0.50 \cdot \$100 = \$50 \quad (3.33)
It follows that E[X]E[Y] = $50 × $50 = $2,500. In each scenario, though, one option is valued at zero, so the product of the payouts is always zero: $100 · $0 = $0 · $100 = $0. The expected value of the product of the two option payouts is:
E[XY] = 0.50 \cdot (\$100 \cdot \$0) + 0.50 \cdot (\$0 \cdot \$100) = \$0 \quad (3.34)
In this case, the product of the expected values and the expected value of the products are clearly not equal. In the special case where E[XY] = E[X]E[Y], we say that X and Y are uncorrelated. Independent variables are always uncorrelated, but as we will see, the converse does not necessarily hold.
If the expected value of the product of two variables does not necessarily equal the product of the expectations of those variables, it follows that the expected value of the product of a variable with itself does not necessarily equal the product of the expectations of that variable with itself; that is:
E[X \cdot X] = E[X^2] \neq E[X]^2 = E[X] \cdot E[X] \quad (3.35)
Imagine we have a fair coin. Assign heads a value of +1 and tails a value of –1. We can write the probabilities of the outcomes as follows:
P[X = +1] = 0.50
P[X = -1] = 0.50 \quad (3.36)
The expected value of any coin flip is zero, but the expected value of X² is +1, not zero:
E[X] = 0.50 \cdot (+1) + 0.50 \cdot (-1) = 0
E[X^2] = 0.50 \cdot (+1)^2 + 0.50 \cdot (-1)^2 = 1 \quad (3.37)
As simple as this example is, this distinction is very important. As we will see, the difference between E[X²] and E[X]² is central to our definition of variance and standard deviation.
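The coin-flip example can be checked directly in Python:

```python
# Heads = +1, tails = -1, each with probability one-half.
outcomes = [1, -1]
probs = [0.5, 0.5]

e_x = sum(p * x for p, x in zip(probs, outcomes))       # E[X]   = 0.0
e_x2 = sum(p * x**2 for p, x in zip(probs, outcomes))   # E[X^2] = 1.0
print(e_x**2, e_x2)    # 0.0 versus 1.0: E[X]^2 is not E[X^2]
```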
Sample Problem
Question:
Given the following equation:
What is the expected value of y? Assume the following:
Answer:
Note that E[x²] and E[x³] cannot be derived from knowledge of E[x]. In this problem, E[x²] ≠ E[x]². As forewarned, the expectations operator is not necessarily multiplicative. To find the expected value of y, then, we first expand the term (x + 5)³ within the expectations operator:

(x + 5)^3 = x^3 + 15x^2 + 75x + 125
Because the expectations operator is linear, we can separate the terms in the summation and move the constants outside the expectations operator. We do this in two steps:
At this point, we can substitute in the values for E[x], E[x²], and E[x³], which we were given at the start of the exercise:
This gives us the final answer, 741.
Variance and Standard Deviation
The variance of a random variable measures how noisy or unpredictable that random variable is. Variance is defined as the expected value of the squared difference between the variable and its mean:

\sigma^2 = E[(X - \mu)^2] \quad (3.38)

where σ² is the variance of the random variable X with mean μ.
The square root of variance, typically denoted by σ, is called standard deviation. In finance we often refer to standard deviation as volatility. This is analogous to referring to the mean as the average. Standard deviation is a mathematically precise term, whereas volatility is a more general concept.
Sample Problem
Question:
A derivative has a 50/50 chance of being worth either +10 or –10 at expiry. What is the standard deviation of the derivative's value?
Answer:

The mean of the derivative's value is zero, so the variance is 100 and the standard deviation is 10:

\mu = 0.50 \cdot (+10) + 0.50 \cdot (-10) = 0
\sigma^2 = E[(X - \mu)^2] = 0.50 \cdot (10 - 0)^2 + 0.50 \cdot (-10 - 0)^2 = 100
\sigma = 10

In the previous example, we were calculating the population variance and standard deviation. All of the possible outcomes for the derivative were known.
To calculate the sample variance of a random variable X based on n observations, x1, x2, … , xn, we can use the following formula:
\hat{\sigma}^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \hat{\mu})^2 \quad (3.39)

where \hat{\mu} is the sample mean from Equation 3.22. Given that we have n data points, it might seem odd that we are dividing the sum by (n – 1) and not n. The reason has to do with the fact that \hat{\mu} itself is an estimate of the true mean, which also contains a fraction of each x_i. We leave the proof for a problem at the end of the chapter, but it turns out that dividing by (n – 1), not n, produces an unbiased estimate of σ². If the mean is known or we are calculating the population variance, then we divide by n. If instead the mean is also being estimated, then we divide by (n – 1).
Equation 3.38 can easily be rearranged as follows (we leave the proof of this for an exercise, too):

\sigma^2 = E[X^2] - \mu^2 = E[X^2] - E[X]^2 \quad (3.40)

Note that variance can be nonzero only if E[X²] ≠ E[X]².
When writing computer programs, this last version of the variance formula is often useful, since it allows you to calculate the mean and the variance in the same loop. Also, in finance it is often convenient to assume that the mean of a random variable is close to zero. For example, based on theory, we might expect the spread between two equity indexes to have a mean of zero in the long run. In this case, the variance is simply the mean of the squared returns.
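Here is a minimal Python sketch of that single-loop approach, with hypothetical returns:

```python
def mean_and_variance(xs):
    """One pass over the data: accumulate sum(x) and sum(x^2)."""
    n, s, s2 = 0, 0.0, 0.0
    for x in xs:
        n += 1
        s += x
        s2 += x * x
    mean = s / n
    variance = s2 / n - mean * mean   # population form of E[X^2] - E[X]^2
    return mean, variance

print(mean_and_variance([0.01, -0.02, 0.015, 0.005]))   # hypothetical returns
```

Note that the E[X²] − E[X]² form can suffer from cancellation error when the mean is large relative to the standard deviation, so in production code a numerically stabler one-pass method (such as Welford's algorithm) is often preferred.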
Sample Problem
Question:
Assume that the mean of daily Standard & Poor's (S&P) 500 returns is zero. You observe the following returns over the course of 10 days:
Estimate the standard deviation of daily S&P 500 returns.
Answer:
The sample mean is not exactly zero, but we are told to assume that the population mean is zero; therefore:

\hat{\sigma}^2 = \frac{1}{10}\sum_{i=1}^{10} x_i^2
Note, because we were told to assume the mean was known, we divide by n = 10, not (n – 1) = 9.
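The calculation looks like this in Python; the ten returns below are hypothetical stand-ins, since the original observations are not reproduced here:

```python
import math

# Hypothetical stand-in returns; the mean is assumed known (zero),
# so we divide by n rather than n - 1.
returns = [0.005, -0.012, 0.003, 0.008, -0.002,
           0.001, -0.007, 0.010, -0.004, 0.006]

variance = sum(r ** 2 for r in returns) / len(returns)
print(math.sqrt(variance))   # the volatility estimate
```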
As with the mean, for a continuous random variable we can calculate the variance by integrating with the probability density function. For a continuous random variable, X, with a probability density function, f(x), the variance can be calculated as:
\sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\, dx \quad (3.41)
It is not difficult to prove that, for either a discrete or a continuous random variable, multiplying by a constant will increase the standard deviation by the same factor:

\sigma^2_{cX} = c^2\sigma^2_X, \quad \sigma_{cX} = |c|\,\sigma_X \quad (3.42)
In other words, if you own $10 of an equity with a standard deviation of $2, then $100 of the same equity will have a standard deviation of $20.
Adding a constant to a random variable, however, does not alter the standard deviation or the variance:
\sigma^2_{X+c} = \sigma^2_X, \quad \sigma_{X+c} = \sigma_X \quad (3.43)
This is because the impact on the mean is the same as the impact on any draw of the random variable, leaving the deviation from the mean unchanged. If you own a portfolio with a standard deviation of $20, and then you add $1,000 of cash to that portfolio, the standard deviation of the portfolio will still be $20.
Standardized Variables
It is often convenient to work with variables where the mean is zero and the standard deviation is one. From the preceding section it is not difficult to prove that, given a random variable X with mean μ and standard deviation σ, we can define a second random variable Y:
Y = \frac{X - \mu}{\sigma} \quad (3.44)
such that Y will have a mean of zero and a standard deviation of one. We say that X has been standardized, or that Y is a standard random variable. In practice, if we have a data set and we want to standardize it, we first compute the sample mean and the standard deviation. Then, for each data point, we subtract the mean and divide by the standard deviation.
The inverse transformation can also be very useful when it comes to creating computer simulations. Simulations often begin with standardized variables, which need to be transformed into variables with a specific mean and standard deviation. In this case, we simply take the output from the standardized variable, multiply by the desired standard deviation, and then add the desired mean. The order is important. Adding a constant to a random variable will not change the standard deviation, but multiplying a non-mean-zero variable by a constant will change the mean.
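A small Python sketch of both directions of the transformation, using a hypothetical data set and hypothetical target parameters:

```python
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]    # hypothetical data
mu_hat = statistics.mean(data)
sigma_hat = statistics.stdev(data)                  # sample standard deviation

# Standardize: subtract the mean, then divide by the standard deviation.
standardized = [(x - mu_hat) / sigma_hat for x in data]

# Inverse transform: multiply by the target sigma first, then add the mean.
target_mu, target_sigma = 0.10, 0.20                # hypothetical targets
rescaled = [target_mu + target_sigma * z for z in standardized]
print(statistics.mean(rescaled), statistics.stdev(rescaled))   # ~0.10, 0.20
```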
Covariance
Up until now we have mostly been looking at statistics that summarize one variable. In risk management, we often want to describe the relationship between two random variables. For example, is there a relationship between the returns of an equity and the returns of a market index?
Covariance is analogous to variance, but instead of looking at the deviation from the mean of one variable, we are going to look at the relationship between the deviations of two variables:

\sigma_{XY} = E[(X - \mu_X)(Y - \mu_Y)] \quad (3.45)
where σXY is the covariance between two random variables, X and Y, with means μX and μY, respectively. As you can see from the definition, variance is just a special case of covariance. Variance is the covariance of a variable with itself.
If X tends to be above μX when Y is above μY (both deviations are positive), and X tends to be below μX when Y is below μY (both deviations are negative), then the covariance will be positive (a positive number multiplied by a positive number is positive; likewise, for two negative numbers). If the opposite is true and the deviations tend to be of opposite sign, then the covariance will be negative. If the deviations have no discernible relationship, then the covariance will be zero.
Earlier in this chapter, we cautioned that the expectations operator is not generally multiplicative. This fact turns out to be closely related to the concept of covariance. Just as we rewrote our variance equation earlier, we can rewrite Equation 3.45 as follows:
\sigma_{XY} = E[XY] - \mu_X\mu_Y = E[XY] - E[X]E[Y] \quad (3.46)
In the special case where the covariance between X and Y is zero, the expected value of XY is equal to the expected value of X multiplied by the expected value of Y:
E[XY] = E[X]E[Y] \quad (3.47)
If the covariance is anything other than zero, then the two sides of this equation cannot be equal. Unless we know that the covariance between two variables is zero, we cannot assume that the expectations operator is multiplicative.
In order to calculate the covariance between two random variables, X and Y, assuming the means of both variables are known, we can use the following formula:

\hat{\sigma}_{XY} = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu_X)(y_i - \mu_Y)
If the means are unknown and must also be estimated, we replace n with (n – 1):

\hat{\sigma}_{XY} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \hat{\mu}_X)(y_i - \hat{\mu}_Y)
If we replaced yi in these formulas with xi, calculating the covariance of X with itself, the resulting equations would be the same as the equations for calculating variance from the previous section.
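For concreteness, here is the sample covariance estimator in Python, with estimated means and hypothetical paired observations:

```python
# Hypothetical paired observations.
xs = [0.010, -0.020, 0.015, 0.005, -0.010]
ys = [0.008, -0.015, 0.010, 0.002, -0.012]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Divide by n - 1 because both means are estimated from the sample.
cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
print(cov)
```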
Correlation
Closely related to the concept of covariance is correlation. To get the correlation of two variables, we simply divide their covariance by the product of their respective standard deviations:

\rho_{XY} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y} \quad (3.48)
Correlation has the nice property that it varies between –1 and +1. If two variables have a correlation of +1, then we say they are perfectly correlated. If the ratio of one variable to another is always the same and positive, then the two variables will be perfectly correlated.
If two variables are highly correlated, it is often the case that one variable causes the other variable, or that both variables share a common underlying driver. We will see later, though, that it is very easy for two random variables with no causal link to be highly correlated. Correlation does not prove causation. Similarly, if two variables are uncorrelated, it does not necessarily follow that they are unrelated. For example, a random variable that is symmetrical around zero and the square of that variable will have zero correlation.
Sample Problem
Question:
X is a random variable. X has an equal probability of being –1, 0, or +1. What is the correlation between X and Y if Y = X²?
Answer:
We have:

Y = X^2, \quad P[X = -1] = P[X = 0] = P[X = +1] = \frac{1}{3}

First we calculate the mean of both variables:

E[X] = \frac{1}{3}(-1) + \frac{1}{3}(0) + \frac{1}{3}(+1) = 0
E[Y] = E[X^2] = \frac{1}{3}(-1)^2 + \frac{1}{3}(0)^2 + \frac{1}{3}(+1)^2 = \frac{2}{3}

The covariance can be found as:

\sigma_{XY} = E[XY] - E[X]E[Y] = E[X^3] - E[X]E[X^2] = 0 - 0 \cdot \frac{2}{3} = 0
Because the covariance is zero, the correlation is also zero. There is no need to calculate the variances or standard deviations.
As forewarned, even though X and Y are clearly related, the correlation is zero.
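The sample problem can be replicated in a few lines of Python:

```python
from fractions import Fraction

# X takes -1, 0, +1 with equal probability; Y = X^2.
xs = [-1, 0, 1]
p = Fraction(1, 3)

e_x = sum(p * x for x in xs)           # E[X]  = 0
e_y = sum(p * x**2 for x in xs)        # E[Y]  = 2/3
e_xy = sum(p * x * x**2 for x in xs)   # E[XY] = E[X^3] = 0

cov = e_xy - e_x * e_y
print(cov)   # 0: zero covariance, hence zero correlation, despite Y = X^2
```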
Application: Portfolio Variance and Hedging
If we have a portfolio of securities and we wish to determine the variance of that portfolio, all we need to know is the variance of the underlying securities and their respective correlations.
For example, if we have two securities with random returns X_A and X_B, with means μ_A and μ_B and standard deviations σ_A and σ_B, respectively, we can calculate the variance of X_A plus X_B as follows:

\sigma^2_{A+B} = \sigma^2_A + \sigma^2_B + 2\rho_{AB}\sigma_A\sigma_B \quad (3.49)
where ρAB is the correlation between XA and XB. The proof is left as an exercise. Notice that the last term can either increase or decrease the total variance. Both standard deviations must be positive; therefore, if the correlation is positive, the overall variance will be higher compared to the case where the correlation is negative.
If the variance of both securities is equal, σ²_A = σ²_B = σ², then Equation 3.49 simplifies to:

\sigma^2_{A+B} = 2\sigma^2(1 + \rho_{AB}) \quad (3.50)
Now we know that the correlation can vary between –1 and +1, so, substituting into our new equation, the portfolio variance must be bound by 0 and 4σ2. If we take the square root of both sides of the equation, we see that the standard deviation is bound by 0 and 2σ. Intuitively this should make sense. If, on the one hand, we own one share of an equity with a standard deviation of $10 and then purchase another share of the same equity, then the standard deviation of our two-share portfolio must be $20 (trivially, the correlation of a random variable with itself must be one). On the other hand, if we own one share of this equity and then purchase another security that always generates the exact opposite return, the portfolio is perfectly balanced. The returns are always zero, which implies a standard deviation of zero.
In the special case where the correlation between the two securities is zero, we can further simplify our equation. For the standard deviation:
\sigma_{A+B} = \sqrt{2}\,\sigma \quad (3.51)
We can extend Equation 3.49 to any number of variables:
\sigma^2\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n}\sigma^2_i + \sum_{i=1}^{n}\sum_{j \neq i}\rho_{ij}\sigma_i\sigma_j \quad (3.52)
In the case where all of the X_i's are uncorrelated and all the variances are equal to σ², Equation 3.52 simplifies to:

\sigma^2_{\Sigma} = n\sigma^2 \;\Rightarrow\; \sigma_{\Sigma} = \sqrt{n}\,\sigma \quad (3.53)
This is the famous square root rule for the addition of uncorrelated variables. There are many situations in statistics in which we come across collections of random variables that are independent and have the same statistical properties. We term these variables independent and identically distributed (i.i.d.). In risk management we might have a large portfolio of securities, which can be approximated as a collection of i.i.d. variables. As we will see, this i.i.d. assumption also plays an important role in estimating the uncertainty inherent in statistics derived from sampling, and in the analysis of time series. In each of these situations, we will come back to this square root rule.
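A quick Python illustration of the square root rule, with a hypothetical per-position standard deviation:

```python
import math

sigma = 10.0   # hypothetical standard deviation of each i.i.d. position
for n in [1, 4, 16, 100]:
    # Volatility of the sum grows like sqrt(n), not linearly in n.
    print(n, math.sqrt(n) * sigma)
```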
By combining Equation 3.49 with Equation 3.42, we arrive at an equation for calculating the variance of a linear combination of variables. If Y is a linear combination of XA and XB, such that:
Y = aX_A + bX_B \quad (3.54)
then, using our standard notation, we have:

\sigma^2_Y = a^2\sigma^2_A + b^2\sigma^2_B + 2ab\,\rho_{AB}\sigma_A\sigma_B \quad (3.55)
Correlation is central to the problem of hedging. Using the same notation as before, imagine we have $1 of Security A, and we wish to hedge it with $h of Security B (if h is positive, we are buying the security; if h is negative, we are shorting the security). In other words, h is the hedge ratio. We introduce the random variable P for our hedged portfolio. We can easily compute the variance of the hedge portfolio using Equation 3.55:
P = X_A + hX_B
\sigma^2_P = \sigma^2_A + h^2\sigma^2_B + 2h\,\rho_{AB}\sigma_A\sigma_B \quad (3.56)
As a risk manager, we might be interested to know what hedge ratio would achieve the portfolio with the least variance. To find this minimum variance hedge ratio, we simply take the derivative of our equation for the portfolio variance with respect to h, and set it equal to zero:
\frac{d\sigma^2_P}{dh} = 2h\sigma^2_B + 2\rho_{AB}\sigma_A\sigma_B = 0 \;\Rightarrow\; h^* = -\rho_{AB}\frac{\sigma_A}{\sigma_B} \quad (3.57)
You can check that this is indeed a minimum by calculating the second derivative. Substituting h* back into our original equation, we see that the smallest variance we can achieve is:
\sigma^2_P = \sigma^2_A\left(1 - \rho^2_{AB}\right) \quad (3.58)
At the extremes, where ρ_AB equals –1 or +1, we can reduce the portfolio volatility to zero by buying or selling the hedge asset in proportion to the ratio of the two assets' standard deviations. In between these two extremes we will always be left with some positive portfolio variance. This risk that we cannot hedge is referred to as idiosyncratic risk.
If the two securities in the portfolio are positively correlated, then h* is negative, and selling $|h*| of Security B will reduce the portfolio's volatility to the minimum possible level. Sell any less and the portfolio will be underhedged. Sell any more and the portfolio will be overhedged. In risk management it is possible to have too much of a good thing. A common mistake made by portfolio managers is to overhedge with a low-correlation instrument.
Notice that when ρAB equals zero (i.e., when the two securities are uncorrelated), the optimal hedge ratio is zero. You cannot hedge one security with another security if they are uncorrelated. Adding an uncorrelated security to a portfolio will always increase its volatility.
This last statement is not an argument against diversification. If your entire portfolio consists of $100 invested in Security A and you add any amount of an uncorrelated Security B to the portfolio, the dollar standard deviation of the portfolio will increase. Alternatively, if Security A and Security B are uncorrelated and have the same standard deviation, then replacing some of Security A with Security B will decrease the dollar standard deviation of the portfolio. For example, $80 of Security A plus $20 of Security B will have a lower standard deviation than $100 of Security A, but $100 of Security A plus $20 of Security B will have a higher standard deviation – again, assuming Security A and Security B are uncorrelated and have the same standard deviation.
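Pulling the hedging results together, the following Python sketch computes the minimum variance hedge ratio and the residual volatility from Equations 3.57 and 3.58; the volatilities and correlation are hypothetical:

```python
import math

# Hypothetical inputs.
sigma_a, sigma_b, rho = 0.20, 0.25, 0.60

h_star = -rho * sigma_a / sigma_b          # Equation 3.57: -0.48
min_var = sigma_a ** 2 * (1 - rho ** 2)    # Equation 3.58
print(h_star, math.sqrt(min_var))          # residual (idiosyncratic) volatility
```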
Moments
Previously, we defined the mean of a variable X as:

\mu = E[X]
It turns out that we can generalize this concept as follows:
m_k = E[X^k] \quad (3.59)

We refer to m_k as the kth moment of X. The mean of X is also the first moment of X.
Similarly, we can generalize the concept of variance as follows:
\mu_k = E[(X - \mu)^k] \quad (3.60)

We refer to μ_k as the kth central moment of X. We say that the moment is central because it is centered on the mean. Variance is simply the second central moment.
While we can easily calculate any central moment, in risk management it is very rare that we are interested in anything beyond the fourth central moment.
Skewness
The second central moment, variance, tells us how spread-out a random variable is around the mean. The third central moment tells us how symmetrical the distribution is around the mean. Rather than working with the third central moment directly, by convention we first standardize the statistic. This standardized third central moment is known as skewness:
s = \frac{E[(X - \mu)^3]}{\sigma^3} \quad (3.61)
where σ is the standard deviation of X.
By standardizing the central moment, it is much easier to compare two random variables. Multiplying a random variable by a positive constant will not change the skewness.
A random variable that is symmetrical about its mean will have zero skewness. If the skewness of the random variable is positive, we say that the random variable exhibits positive skew. Figures 3.3 and 3.4 show examples of positive and negative skewness.
FIGURE 3.3 Positive Skew
FIGURE 3.4 Negative Skew
Skewness is a very important concept in risk management. If the distributions of returns of two investments are the same in all respects, with the same mean and standard deviation but different skews, then the investment with more negative skew is generally considered to be more risky. Historical data suggest that many financial assets exhibit negative skew.
As with variance, the equation for skewness differs depending on whether we are calculating the population skewness or the sample skewness. For the population statistic, the skewness of a random variable X, based on n observations, x1, x2, … , xn, can be calculated as:
\hat{s} = \frac{1}{n}\sum_{i=1}^{n}\frac{(x_i - \mu)^3}{\sigma^3} \quad (3.62)
where μ is the population mean and σ is the population standard deviation. Similar to our calculation of sample variance, if we are calculating the sample skewness, there is going to be an overlap with the calculation of the sample mean and sample standard deviation. We need to correct for that. The sample skewness can be calculated as:
\tilde{s} = \frac{n}{(n-1)(n-2)}\sum_{i=1}^{n}\frac{(x_i - \hat{\mu})^3}{\hat{\sigma}^3} \quad (3.63)
Based on Equation 3.40 for variance, it is tempting to guess that the formula for the third central moment can be written simply in terms of E[X³] and μ. Be careful, as the two sides of this equation are not equal:

E[(X - \mu)^3] \neq E[X^3] - \mu^3 \quad (3.64)

The correct equation is:

E[(X - \mu)^3] = E[X^3] - 3\mu\sigma^2 - \mu^3 \quad (3.65)
Sample Problem
Question:
Prove that the left-hand side of Equation 3.65 is indeed equal to the right-hand side of the equation.
Answer:
We start by multiplying out the terms inside the expectation. This is not too difficult to do, but, as a shortcut, we could use the binomial theorem as mentioned previously:

E[(X - \mu)^3] = E[X^3 - 3\mu X^2 + 3\mu^2 X - \mu^3]

Next we separate the terms inside the expectations operator and move any constants, namely μ, outside the operator:

E[(X - \mu)^3] = E[X^3] - 3\mu E[X^2] + 3\mu^2 E[X] - \mu^3

E[X] is simply the mean, μ. For E[X²], we reorganize our equation for variance, Equation 3.40, as follows:

E[X^2] = \sigma^2 + \mu^2

Substituting these results into our equation and collecting terms, we arrive at the final equation:

E[(X - \mu)^3] = E[X^3] - 3\mu(\sigma^2 + \mu^2) + 3\mu^2\mu - \mu^3 = E[X^3] - 3\mu\sigma^2 - \mu^3
For many symmetrical continuous distributions, the mean, median, and mode all have the same value. Many continuous distributions with negative skew have a mean that is less than the median, which is less than the mode. For example, it might be that a certain derivative is just as likely to produce positive returns as it is to produce negative returns (the median is zero), but there are more big negative returns than big positive returns (the distribution is skewed), so the mean is less than zero. As a risk manager, understanding the impact of skew on the mean relative to the median and mode can be useful. Be careful, though, as this rule of thumb does not always work. Many practitioners mistakenly believe that this rule of thumb is in fact always true. It is not, and it is very easy to produce a distribution that violates the rule.
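The sample skewness formula from Equation 3.63 is straightforward to implement; the data set below is hypothetical:

```python
import math

xs = [0.02, -0.05, 0.01, 0.03, -0.08, 0.04, 0.01, -0.01]   # hypothetical
n = len(xs)
mu_hat = sum(xs) / n
sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in xs) / (n - 1))

# Equation 3.63: bias-corrected sample skewness.
skew = (n / ((n - 1) * (n - 2))) * \
       sum(((x - mu_hat) / sigma_hat) ** 3 for x in xs)
print(skew)   # negative here: the large losses outweigh the large gains
```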
Kurtosis
The fourth central moment is similar to the second central moment, in that it tells us how spread-out a random variable is, but it puts more weight on extreme points. As with skewness, rather than working with the central moment directly, we typically work with a standardized statistic. This standardized fourth central moment is known as the kurtosis. For a random variable X, we can define the kurtosis as K, where:
K = \frac{E[(X - \mu)^4]}{\sigma^4} \quad (3.66)
where σ is the standard deviation of X, and μ is its mean.
By standardizing the central moment, it is much easier to compare two random variables. As with skewness, multiplying a random variable by a constant will not change the kurtosis.
The following two populations have the same mean, variance, and skewness. The second population has a higher kurtosis.
Notice, to balance out the variance, when we moved the outer two points out six units, we had to move the inner two points in 10 units. Because the random variable with higher kurtosis has points further from the mean, we often refer to distributions with high kurtosis as fat-tailed. Figures 3.5 and 3.6 show examples of continuous distributions with high and low kurtosis.
FIGURE 3.5 High Kurtosis
FIGURE 3.6 Low Kurtosis
Like skewness, kurtosis is an important concept in risk management. Many financial assets exhibit high levels of kurtosis. If the distributions of returns of two assets have the same mean, variance, and skewness, but different kurtosis, then the distribution with the higher kurtosis will tend to have more extreme points and be considered more risky.
As with variance and skewness, the equation for kurtosis differs depending on whether we are calculating the population kurtosis or the sample kurtosis. For the population statistic, the kurtosis of a random variable X can be calculated as:
K = \frac{1}{n}\sum_{i=1}^{n}\frac{(x_i - \mu)^4}{\sigma^4} \quad (3.67)
where μ is the population mean and σ is the population standard deviation. Similar to our calculation of sample variance, if we are calculating the sample kurtosis, there is going to be an overlap with the calculation of the sample mean and sample standard deviation. We need to correct for that. The sample kurtosis can be calculated as:

\hat{K} = \frac{n(n+1)}{(n-1)(n-2)(n-3)}\sum_{i=1}^{n}\frac{(x_i - \hat{\mu})^4}{\hat{\sigma}^4} \quad (3.68)
Later we will study the normal distribution, which has a kurtosis of 3. Because normal distributions are so common, many people refer to “excess kurtosis,” which is simply the kurtosis minus 3.
K_{\text{excess}} = K - 3 \quad (3.69)
In this way, the normal distribution has an excess kurtosis of 0. Distributions with positive excess kurtosis are termed leptokurtotic. Distributions with negative excess kurtosis are termed platykurtotic. Be careful; by default, many applications calculate excess kurtosis.
When we are also estimating the mean and variance, calculating the sample excess kurtosis is somewhat more complicated than just subtracting 3. The correct formula is:
\hat{K}_{\text{excess}} = \hat{K} - 3\frac{(n-1)^2}{(n-2)(n-3)} \quad (3.70)

where \hat{K} is the sample kurtosis from Equation 3.68. As n increases, the last term on the right-hand side converges to 3.
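A Python sketch of the sample kurtosis and sample excess kurtosis calculations; the data are hypothetical, and note (as warned above) that libraries such as SciPy report excess kurtosis by default:

```python
import math

xs = [0.01, -0.02, 0.05, -0.06, 0.00, 0.02,
      -0.01, 0.03, -0.04, 0.01]                  # hypothetical data
n = len(xs)
mu_hat = sum(xs) / n
sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in xs) / (n - 1))

# Equation 3.68: sample kurtosis.
k_hat = (n * (n + 1)) / ((n - 1) * (n - 2) * (n - 3)) * \
        sum(((x - mu_hat) / sigma_hat) ** 4 for x in xs)

# Equation 3.70: sample excess kurtosis (not simply k_hat - 3).
k_excess = k_hat - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
print(k_hat, k_excess)
```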
1
Those of you with a background in physics might be more familiar with the term expectation value and the notation ⟨X⟩ rather than E[X]. This is a matter of convention. Throughout this book we use the term expected value and E[ · ], which is currently more popular in finance and econometrics. Risk managers should be familiar with both conventions.