Читать книгу Seismic Reservoir Modeling - Dario Grana - Страница 18

Example 1.2

Оглавление

In this example, we illustrate the calculation of the probability of the random variable X to belong to the interval (2, 3], assuming that X is distributed according to the triangular PDF shown in Figure 1.2. The random variable in Figure 1.2 could represent, for example, the P‐wave velocity of a porous rock, expressed in km/s.

The PDF fX(x) can be written as follows:


The triangular function fX(x) is a non‐negative function with integral equal to 1 (i.e. the area of the triangle with base of length 4 and height 0.5); hence, it satisfies the conditions in Eqs. (1.12)(1.14), and it is a valid PDF. We now compute the probability P(2 < X ≤ 3) using the definition in Eq. (1.15):


The probability P(2 < X ≤ 3) represents the area of the region highlighted in Figure 1.2 and can also be computed as the area of a right trapezoid, rotated by 90°.

In many applications, it is useful to describe the distribution of the continuous random variable using the concept of cumulative distribution function (CDF), FX : → [0, 1]. We assume that a random variable X has a PDF fX(x); then, the CDF FX(x) is defined as the integral of fX(x) in the interval (−∞, x]:

(1.16)


Figure 1.2 Graphical interpretation of the probability P(2 < X ≤ 3) of a continuous random variable X as the definite integral of the PDF in the interval (2, 3].

Using the definition in Eq. (1.15), we conclude that FX(x) is the probability of the random variable X being less than or equal to the outcome x, i.e. FX(x) = P(Xx). As a consequence, we can also define the PDF fX(x) as the derivative of the CDF, i.e. .

The CDF takes values in the interval [0, 1] and it is non‐decreasing, i.e. if a < b, then FX(a) ≤ FX(b). When FX(x) is strictly increasing, i.e. if a < b, then FX(a) < FX(b), we can define its inverse function, namely the quantile (or percentile) function. Given a CDF FX(x) = p, the quantile function is and it associates p ∈ [0, 1] to the corresponding value x of the random variable X. The definition of percentile is similar to the definition of quantiles, but using percentages rather than fractions in [0, 1]. For example, the 10th percentile (or P10) corresponds to the quantile 0.10. In many practical applications, we often use quantiles and percentiles. The 25th percentile is the value x of the random variable X such that P(Xx) = 0.25, which means that the probability of taking a value lower than or equal to x is 0.25. The 25th percentile is also called the lower quartile. Similarly, the 75th percentile (or P75) is the upper quartile. The 50th percentile (or P50) is called the median of the distribution, since it represents the value x of the random variable X such that there is an equal probability of observing values lower than or equal to x and observing values greater than x, i.e. P(Xx) = 0.50 = P(X > x).

A continuous random variable is completely defined by its PDF (or by its CDF); however, in some special cases, the probability distribution of the continuous random variable can be described by a finite number of parameters. In this case, the PDF is said to be parametric, because it is defined by its parameters. Examples of parameters are the mean and the variance.

The mean is the most common measure used to describe the expected value that a random variable can take. The mean μX of a continuous random variable is defined as:

(1.17)

The mean μX is also called the expected value and it is often indicated as E[X]. However, the mean itself is not sufficient to represent the full probability distribution, because it does not contain any measure of the uncertainty of the outcomes of the random variable. A common measure for the uncertainty is the variance :

(1.18)

The variance describes the spread of the distribution around the mean. The standard deviation σX is the square root of the variance, . Other parameters are the skewness coefficient and the kurtosis coefficient (Papoulis and Pillai 2002). The skewness coefficient is a measure of the asymmetry of the probability distribution with respect to the mean, whereas the kurtosis coefficient is a measure of the weight of the tails of the probability distribution.

Other useful statistical estimators are the median and the mode. The median is the 50th percentile (or P50) defined using the CDF and it is the value of the random variable that separates the lower half of the distribution from the upper half. The mode of a random variable is the value of the random variable that is associated with the maximum value of the PDF. For a unimodal symmetric distribution, the mean, the median, and the mode are equal, but, in the general case, they do not necessarily coincide. For example, in a skewed distribution the larger the skewness, the larger the difference between the mode and the median. For a multimodal distribution, the mode can be a more representative statistical estimator, because the mean might fall in a low probability region. A comparison of these three statistical estimators is given in Figure 1.3, for symmetric unimodal, skewed unimodal, and multimodal distributions.

The parameters of a probability distribution can be estimated from a set of n observations {xi}i=1,…,n. The sample mean is an estimate of the mean and it is computed as the average of the data:

(1.19)


Figure 1.3 Comparison of statistical estimators: mean (square), median (circle), and mode (diamond) for different distributions (symmetric unimodal, skewed unimodal, and multimodal distributions).

whereas the sample variance is estimated as:

(1.20)

where the constant 1/(n − 1) makes the estimator unbiased (Papoulis and Pillai 2002).

Seismic Reservoir Modeling

Подняться наверх