1.2. Complexity of magnitude distribution
The most often used model of magnitude distribution, with PDF fM(M) and CDF FM(M), results from the empirical Gutenberg–Richter relation, which predicts a linear dependence of the logarithm of the number of earthquakes with magnitudes greater than or equal to M on M. This yields a piecewise distribution of magnitude:

fM(M) = β exp[−β(M − MC)], FM(M) = 1 − exp[−β(M − MC)] for M ≥ MC, and fM(M) = FM(M) = 0 for M < MC, [1.19]
where β is the distribution parameter and MC is the magnitude value above which all earthquakes are statistically recorded and included in the earthquake catalog. MC is called the catalog completeness level. Earthquake magnitude represents the physical size of an earthquake; hence, in an environment of finite dimensions, as seismogenic zones are, it cannot be unlimited. For this reason, among others, the model [1.19] is often amended with an endpoint. The upper-bounded model of magnitude PDF with a hard endpoint is:

fM(M) = β exp[−β(M − MC)] / {1 − exp[−β(Mmax − MC)]} for MC ≤ M ≤ Mmax, and fM(M) = 0 otherwise, [1.20]
where Mmax is the upper limit of magnitude distribution (e.g. Cosentino et al. 1977).
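For orientation, the two models are straightforward to evaluate numerically. The sketch below codes the PDFs exactly as written above; all function and variable names are illustrative and not taken from the source.

```python
import numpy as np

def gr_pdf(m, beta, mc):
    """PDF of the unbounded exponential (Gutenberg-Richter) model [1.19]."""
    m = np.asarray(m, dtype=float)
    return np.where(m >= mc, beta * np.exp(-beta * (m - mc)), 0.0)

def gr_pdf_bounded(m, beta, mc, mmax):
    """PDF of the upper-bounded exponential model with a hard endpoint [1.20]."""
    m = np.asarray(m, dtype=float)
    norm = 1.0 - np.exp(-beta * (mmax - mc))          # normalizing constant
    inside = (m >= mc) & (m <= mmax)
    return np.where(inside, beta * np.exp(-beta * (m - mc)) / norm, 0.0)
```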
The great popularity of the model [1.19] stems from two facts. First, it is a one-parameter model with a very simple form, and its parameter can be readily estimated, even from poorly populated samples. Second, at first glance, the linear Gutenberg–Richter relation seems to fit almost all seismicity cases. As a result, the Gutenberg–Richter relation has often been considered a universal law, although some works indicated breaks in scaling of the frequency-magnitude relation (early examples: Schwartz and Coppersmith 1984; Davison and Scholz 1985; Pacheco et al. 1992). The fit of the Gutenberg–Richter relation, and thus of the models [1.19] and [1.20], to actual earthquake samples was rarely verified by rigorous statistical testing, and visible deviations from the model were interpreted as statistical scatter.
For obvious reasons, in seismic hazard problems attention is devoted to stronger magnitudes, whose exceedance probability, 1−FM(M), is low. This exceedance probability of magnitude enters the equations characterizing hazard either in the negative exponent (equations [1.16] and [1.17]) or in the denominator (equation [1.18]). If a magnitude distribution model is inappropriate in the range of larger magnitudes, this results in significant systematic errors of the hazard parameters.
There are parametric alternatives to the exponential models [1.19] and [1.20] (see, e.g., Lasocki 1993; Utsu 1999 and the references therein; Jackson and Kagan 1999; Kagan 1999; Pisarenko and Sornette 2003). The drawback of all of them is that their PDFs either decrease monotonically or have at most one mode, that is, one local maximum, located at the catalog completeness level MC. Such distributions cannot correctly model modes at larger magnitudes, or faster-than-linear drops of the logarithmic number of observations in the range of larger magnitudes, that is, the features observed in data and expected on the basis of physical models of seismicity. The same drawback holds for the exponential distribution model.
Two approaches can be proposed for investigating how well the exponential models [1.19] and [1.20] fit the data. Within the first approach, the null hypothesis H0 (the magnitude distribution is exponential) is tested against the alternative that it is not exponential. This hypothesis can be readily assessed by means of the Kolmogorov-Smirnov one-sample test or, with better precision, by the Anderson-Darling test. These tests can also verify some of the mentioned alternative models. When the model parameters are estimated from the data sample and this model is then tested, the Kolmogorov-Smirnov test statistic should be modified. Both tests are well known and will not be described here. Examples where the above null hypothesis was disproved can be found in Urban et al. (2016) and Leptokaropoulos (2020). It is worth noting that both tests are suitable for continuous random variables, whose samples do not contain repeated values. Magnitude is a continuous random variable, but in seismic catalogs magnitudes are given with a limited number of digits, usually one digit after the decimal point. Therefore, magnitude data samples contain many repetitions. To remedy this problem, Lasocki and Papadimitriou (2006) recommended randomizing the data before testing through the following equation:

M* = F-1{F(M − 0.5δM) + u[F(M + 0.5δM) − F(M − 0.5δM)]}, [1.21]
where M is the magnitude value taken from the catalog, δM is the length of the magnitude round-off interval, u is a random value drawn from the uniform distribution on the [0,1] interval, F(•) is the CDF of the exponential model fitted to the data, F-1(•) denotes its inverse function, and M* is the randomized value of magnitude. This randomization procedure returns the original catalog magnitude when M* is rounded to δM, and the randomized values in every interval [M−0.5δM, M+0.5δM] reproduce the exponential distribution of the whole dataset. One can object that the best-fitted exponential distribution is first used to randomize the data, and then this fit is tested. Indeed, this randomization slightly amplifies the probability that H0 will not be rejected. However, other randomizations, e.g., the one assuming a normal distribution in [M−0.5δM, M+0.5δM], are logically incorrect and act strongly against H0.
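A minimal sketch of this first approach is given below, assuming the exponential model [1.19] and the randomization [1.21] written above. All names are illustrative and not from the source, and the plain Kolmogorov-Smirnov p-value is only approximate here, because β is estimated from the same data (the modification of the test statistic mentioned above is not applied).

```python
import numpy as np
from scipy import stats

def randomize_magnitudes(m, beta, mc, dm=0.1, rng=None):
    """Randomize magnitudes rounded to dm within [M - 0.5*dm, M + 0.5*dm],
    following the exponential CDF, as in equation [1.21]."""
    rng = np.random.default_rng() if rng is None else rng
    m = np.asarray(m, dtype=float)
    cdf = lambda x: 1.0 - np.exp(-beta * (np.maximum(x, mc) - mc))   # F of model [1.19]
    inv = lambda p: mc - np.log1p(-p) / beta                          # F^-1
    u = rng.uniform(size=m.shape)
    lo, hi = cdf(m - 0.5 * dm), cdf(m + 0.5 * dm)
    return inv(lo + u * (hi - lo))

# Hypothetical usage, with m_cat an array of catalog magnitudes not smaller than mc:
# beta_hat = 1.0 / (m_cat.mean() - mc)        # maximum likelihood estimate, see [1.22] below
# m_rand = randomize_magnitudes(m_cat, beta_hat, mc)
# stat, p = stats.kstest(m_rand, lambda x: 1.0 - np.exp(-beta_hat * (x - mc)))
```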
Within the second approach, two independent null hypotheses are checked: H01 (the magnitude distribution is at most unimodal) and H02 (the number of bumps in the magnitude density ≤ 1). A bump is an interval [a,b] such that the probability density is concave over [a,b] but not over any larger interval. PDFs with two modes and two bumps, respectively, are illustrated in Figure 1.3.
Figure 1.3. Left – PDF with two modes. Right – PDF with two bumps. Reprinted from Lasocki and Papadimitriou (2006, Figure 1)
This approach is suitable for testing the applicability of the exponential model, as well as the above-mentioned alternatives, because all of these models have at most one mode.
These two null hypotheses, H01 and H02, are studied using the smooth bootstrap test for multimodality presented by Silverman (1986) and Efron and Tibshirani (1993), with some adaptations to the magnitude distribution problem provided by Lasocki and Papadimitriou (2006). Given the magnitude data sample {Mi}, i = 1, …, N, the procedure consists of the following steps (illustrative code sketches of the main steps are given after the procedure description):
– Step 1. Estimating the catalog (sample) completeness level, MC. We can do this either visually, selecting MC as the smallest magnitude of the linear part of the semi-logarithmic magnitude-frequency graph, or using more sophisticated methods presented, e.g., in Mignan and Woessner (2012), Leptokaropoulos et al. (2013) and the references therein.
– Step 2. Reducing the sample to its complete part {Mi | Mi ≥ MC}, i = 1, …, n.
– Step 3. Estimating the exponential model of magnitude distribution [1.19]. The maximum likelihood estimator of β is (Aki 1965; Bender 1983):
β̂ = 1/(〈M〉 − MC), [1.22]
where 〈M〉 stands for the mean value of the reduced (complete) sample, and the other symbols are the same as previously.
– Step 4. Randomizing {Mi}, i = 1, …, n according to equation [1.21], in which F(•) is the CDF of the exponential distribution with parameter β̂. The result is the randomized sample {M*i}, i = 1, …, n.
The next steps begin from the kernel density estimate of magnitude, equation [1.3]. As shown in Figure 1.2, the number of modes and bumps of a kernel estimate depends on the bandwidth value, h: the greater h is, the fewer modes/bumps occur. Thus, there exists a critical value of bandwidth, hcrit, such that the estimate has only one mode for h ≥ hcrit and more than one mode for h < hcrit; the same holds for bumps. The critical bandwidths for modes and bumps are denoted by hcrit,m and hcrit,b, respectively.
– Step 5. Estimating hcrit,m and hcrit,b. The estimator [1.3] is differentiable; its first and second derivatives with respect to M are given by equations [1.23] and [1.24], respectively.
For h ≥ hcrit,m, the first derivative [1.23] has only one zero; for h ≥ hcrit,b, the second derivative [1.24] indicates only one interval of concavity (one bump). The search for the critical bandwidths begins from small values of h, for which the derivatives [1.23] and [1.24] indicate more than one mode and more than one bump, respectively. One increases h gradually until the number of modes (bumps) drops to one. Acceptable critical bandwidth estimates are obtained when the step change of h equals 0.001; the estimates are then attained with a precision of 10-3. However, more complex numerical methods may also be applied to search for the critical bandwidths.
– Step 6. Drawing R n-element samples from the kernel estimate [1.3] with bandwidth hcrit,m when testing H01 (modes), and with bandwidth hcrit,b when testing H02 (bumps). The sampling is done through the smoothed bootstrap technique. For testing H01, the drawn samples are

M**i = M*I(i) + hcrit,m εi, i = 1, …, n, [1.25]

where M*I(i) comes from n-times uniform selection with replacement from the randomized data {M*i} (standard bootstrap), and εi are standard normal random numbers.
Silverman (1986) and Efron and Tibshirani (1993) indicated that the samples [1.25] have greater variance than the original data, and that this leads to an artificial increase of the significance of the null hypothesis. To remedy this problem, Efron and Tibshirani (1993) recommended using rescaled samples, [1.26], in which the samples [1.25] are shifted and shrunk so that their variance becomes approximately the same as that of {M*i}. However, when testing magnitudes with the use of samples [1.26], we occasionally obtained distinctly erroneous results; therefore, we suggest carrying out the tests independently twice, using samples [1.25] and [1.26], respectively. The test results should be comparable.
When testing H02, we use hcrit,b in place of hcrit,m in [1.25] and [1.26].
– Step 7. Calculating the test p-values. To do this, we count how many of the R samples drawn in step 6 result in kernel estimates of the PDF [1.3] that have only one mode (only one bump when testing H02). This can be achieved by studying the first derivative of the estimate, obtained from [1.23] (the second derivative, obtained from [1.24], when testing H02), in which the sample [1.25] or [1.26] replaces {M*i}. These counts determine the test p-values, pm [1.27] and pb [1.28], for testing H01 and H02, respectively.
– Step 8. Calibrating the p-values. Lasocki and Papadimitriou (2006) checked the agreement between the p-values obtained from [1.27] and [1.28] and the expected p-values following from the p-value definition. To do this, they applied the testing procedure to a large number of Monte Carlo samples drawn from the exponential distribution [1.19]. Their tests showed that the p-value estimates [1.27] and [1.28] could significantly deviate from their actual values, probably because of the skewness of the exponential distribution. Because this can impact the final test result, the cited authors suggested calibrating the p-value estimates [1.27] and [1.28]. For this purpose, we draw a large number of samples from the exponential distribution [1.19] with the parameter β obtained in step 3. The calibrated p-value estimate, pm_cal, is the proportion of these samples whose p-values, obtained after performing steps 5–7, are less than or equal to pm. The calibration of pb is done in the same way.
The comparison of the calibrated p-value with the assumed test significance level α finalizes the test(s).
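As an illustration of steps 2, 3 and 5, the following sketch estimates β and searches for the critical bandwidths by counting sign changes of the first and second derivatives of a Gaussian-kernel density estimate on a dense magnitude grid. A Gaussian kernel in [1.3] is assumed, step 4 applies randomize_magnitudes from the sketch given with equation [1.21], and all names are illustrative rather than taken from the source.

```python
import numpy as np

def reduce_and_estimate(m_cat, mc):
    """Steps 2-3: keep the complete part (M >= MC) and estimate beta by
    maximum likelihood, equation [1.22]."""
    m = np.asarray(m_cat, dtype=float)
    m = m[m >= mc]
    return m, 1.0 / (m.mean() - mc)

def count_modes_bumps(m_star, h, n_grid=2000):
    """Count modes and bumps of the Gaussian-kernel estimate with bandwidth h
    from sign changes of its first and second derivatives ([1.23], [1.24])."""
    m_star = np.asarray(m_star, dtype=float)
    x = np.linspace(m_star.min() - 3.0 * h, m_star.max() + 3.0 * h, n_grid)
    z = (x[:, None] - m_star[None, :]) / h
    phi = np.exp(-0.5 * z ** 2)                           # Gaussian kernel, unnormalized
    d1 = (-z * phi).sum(axis=1)                           # proportional to the 1st derivative
    d2 = ((z ** 2 - 1.0) * phi).sum(axis=1)               # proportional to the 2nd derivative
    modes = int(np.sum((d1[:-1] > 0) & (d1[1:] <= 0)))    # local maxima of the estimate
    bumps = int(np.sum((d2[:-1] >= 0) & (d2[1:] < 0)))    # onsets of concave intervals
    return modes, bumps

def critical_bandwidth(m_star, feature="modes", h0=0.01, step=0.001):
    """Step 5: smallest bandwidth (to within 'step') giving one mode or one bump."""
    h = h0
    while True:
        modes, bumps = count_modes_bumps(m_star, h)
        if (modes if feature == "modes" else bumps) <= 1:
            return h
        h += step
```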
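Steps 6 and 7 can then be sketched as follows. The rescaled variant applies a variance correction in the spirit of the recommendation of Efron and Tibshirani (1993), using the sample variance of {M*i} in the correction factor, and the p-value is taken as the fraction of smoothed bootstrap samples whose estimates show more than one mode (or bump) at the critical bandwidth, which is the counting convention of Silverman's test; the exact expressions [1.25]–[1.28] in the source may be written differently. The helper count_modes_bumps comes from the previous sketch.

```python
import numpy as np

def smoothed_bootstrap(m_star, h_crit, rescale=True, rng=None):
    """Step 6: one n-element smoothed bootstrap sample, [1.25]; with
    rescale=True, a variance-corrected variant in the spirit of [1.26]."""
    rng = np.random.default_rng() if rng is None else rng
    m_star = np.asarray(m_star, dtype=float)
    boot = rng.choice(m_star, size=m_star.size, replace=True)   # standard bootstrap selection
    eps = rng.standard_normal(m_star.size)                      # standard normal numbers
    if not rescale:
        return boot + h_crit * eps
    var = m_star.var(ddof=1)                                    # sample variance of {M*_i}
    return boot.mean() + (boot - boot.mean() + h_crit * eps) / np.sqrt(1.0 + h_crit ** 2 / var)

def multimodality_p_value(m_star, h_crit, R=1000, feature="modes", rescale=True, rng=None):
    """Step 7: fraction of R samples whose kernel estimate at h_crit shows
    more than one mode (feature='modes') or more than one bump ('bumps')."""
    rng = np.random.default_rng() if rng is None else rng
    exceed = 0
    for _ in range(R):
        sample = smoothed_bootstrap(m_star, h_crit, rescale=rescale, rng=rng)
        modes, bumps = count_modes_bumps(sample, h_crit)
        if (modes if feature == "modes" else bumps) > 1:
            exceed += 1
    return exceed / R
```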
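Finally, step 8 can be sketched as a Monte Carlo loop over synthetic exponential catalogs drawn from model [1.19] with the β of step 3; the calibrated p-value is the proportion of synthetic samples whose steps 5–7 p-values do not exceed the observed one. Functions from the previous sketches are reused and the names are illustrative.

```python
import numpy as np

def calibrate_p_value(p_obs, beta_hat, mc, n, n_synth=500, R=1000,
                      feature="modes", rng=None):
    """Step 8: proportion of synthetic exponential samples (model [1.19],
    parameter beta_hat, size n) whose steps 5-7 p-values are <= p_obs."""
    rng = np.random.default_rng() if rng is None else rng
    count = 0
    for _ in range(n_synth):
        m_synth = mc + rng.exponential(scale=1.0 / beta_hat, size=n)   # model [1.19]
        h_crit = critical_bandwidth(m_synth, feature=feature)           # step 5
        p_synth = multimodality_p_value(m_synth, h_crit, R=R,
                                        feature=feature, rng=rng)       # steps 6-7
        if p_synth <= p_obs:
            count += 1
    return count / n_synth
```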
In a considerable proportion of the studied cases, the above-presented tests questioned the exponential model for magnitude distribution. Such results were obtained for both worldwide and regional catalogs of natural earthquakes (Lasocki and Papadimitriou 2006; Lasocki 2007), as well as for anthropogenic seismicity data (e.g. Lasocki 2001; Lasocki and Orlecka-Sikora 2008; Urban et al. 2016; Leptokaropoulos 2020). Furthermore, as shown above, additional modes or bumps located in the range of larger magnitudes can yield significant errors in seismic hazard estimates. In this connection, new approaches for estimating the magnitude distribution are needed.
The smooth bootstrap test for multimodality of magnitude distribution has been implemented on the IS-EPOS Platform (tcs.ah-epos.eu). The platform integrates and provides free access to the research infrastructure of anthropogenic seismicity and the related hazards (Orlecka-Sikora et al. 2020).