Probability and Statistical Inference - Robert Bartoszynski

2.6 Necessity of the Axioms*


Looking at Axiom 3, one may wonder why we need it for countable (and not just finite) sequences of events. Indeed, the necessity of all three axioms, with only finite additivity in Axiom 3, can be easily justified simply by using probability to represent the limiting relative frequency of occurrences of events. Recall the symbol $N(A)$ from Section 2.1 for the number of occurrences of the event $A$ in the first $N$ experiments. The nonnegativity axiom is simply a reflection of the fact that the count $N(A)$ cannot be negative. The norming axiom reflects the fact that the event $\mathcal{S}$ is certain and must occur in every experiment, so that $N(\mathcal{S}) = N$, and hence $N(\mathcal{S})/N = 1$. Finally (taking the case of two disjoint events $A$ and $B$), we have $N(A \cup B) = N(A) + N(B)$, since whenever $A$ occurs, $B$ does not, and conversely. Thus, if probability is to reflect the limiting relative frequency, then $P(A \cup B)$ should be equal to $P(A) + P(B)$, since the frequencies satisfy the analogous condition $N(A \cup B)/N = N(A)/N + N(B)/N$.
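The frequency argument above can be checked numerically. The following is a minimal sketch, assuming a simulated fair die as the experiment; the events `A` and `B`, the seed, and the helper `count_occurrences` are illustrative choices, not part of the text.

```python
import random

def count_occurrences(outcomes, event):
    """Return N(event): how many recorded outcomes belong to `event`."""
    return sum(1 for x in outcomes if x in event)

rng = random.Random(0)            # fixed seed so the run is reproducible
n_experiments = 10_000            # N, the number of repetitions
outcomes = [rng.randint(1, 6) for _ in range(n_experiments)]

S = {1, 2, 3, 4, 5, 6}            # the certain event (whole sample space)
A = {1, 2}                        # hypothetical event: the roll is 1 or 2
B = {6}                           # hypothetical event: the roll is 6 (disjoint from A)

n_S = count_occurrences(outcomes, S)
n_A = count_occurrences(outcomes, A)
n_B = count_occurrences(outcomes, B)
n_AB = count_occurrences(outcomes, A | B)

# Relative frequencies: the third equals the sum of the first two.
print(n_A / n_experiments, n_B / n_experiments, n_AB / n_experiments)
```

The counts satisfy the three properties exactly, for every sample, because they are statements about counting and not about chance; only the closeness of the frequencies to 1/3 and 1/6 depends on the number of repetitions.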

The need for countable additivity, however, cannot be explained so simply. This need is related to the fact that to build a sufficiently powerful theory, one needs to take limits. If $A_1, A_2, \ldots$ is a monotone sequence of events (increasing or decreasing, i.e., $A_1 \subset A_2 \subset \cdots$ or $A_1 \supset A_2 \supset \cdots$), then $\lim P(A_n) = P(\lim A_n)$, where the event $\lim A_n$ has been defined in Section 1.4. Upon a little reflection, one can see that such continuity is a very natural requirement. In fact, the same requirement has been taken for granted for over 2,000 years in a somewhat different context: in computing the area of a circle, one uses a sequence of polygons with an increasing number of sides, all inscribed in the circle. This leads to an increasing sequence of sets “converging” to the circle, and therefore the area of the circle is taken to be the limit of the areas of approximating polygons. The validity of this idea (i.e., the assumption of the continuity of the function $f(A)$ = area of $A$) was not really questioned until the beginning of the twentieth century. Research on the subject culminated with the results of Lebesgue.

To quote the relevant theorem, let us say that a function $P$, defined on a class of sets (events), is continuous from below at the set $A$ if the conditions $A_1 \subset A_2 \subset \cdots$ and $A = \bigcup_n A_n$ imply that $\lim P(A_n) = P(A)$. Similarly, $P$ is continuous from above at the set $A$ if the conditions $A_1 \supset A_2 \supset \cdots$ and $A = \bigcap_n A_n$ imply $\lim P(A_n) = P(A)$. A function that is continuous at every set from above or from below is simply called continuous (above or below). Continuity from below and from above is simply referred to as continuity.
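As a concrete illustration of these two definitions, here is a small worked example. It rests on an assumption that is not taken from the text: $P$ is taken to be length on the interval $[0, 1)$, that is, the uniform distribution. For an increasing sequence of intervals,

$$
A_n = \left[0,\ 1 - \tfrac{1}{n}\right), \qquad A_1 \subset A_2 \subset \cdots, \qquad \bigcup_{n} A_n = [0, 1), \qquad \lim_{n\to\infty} P(A_n) = \lim_{n\to\infty} \left(1 - \tfrac{1}{n}\right) = 1 = P([0, 1)),
$$

which is continuity from below at $[0, 1)$. For a decreasing sequence,

$$
B_n = \left[0,\ \tfrac{1}{n}\right), \qquad B_1 \supset B_2 \supset \cdots, \qquad \bigcap_{n} B_n = \{0\}, \qquad \lim_{n\to\infty} P(B_n) = \lim_{n\to\infty} \tfrac{1}{n} = 0 = P(\{0\}),
$$

which is continuity from above at the one-point set $\{0\}$.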

We may characterize countable additivity as follows:

Theorem 2.6.1 If the probability $P$ satisfies Axiom 3 of countable additivity, then $P$ is continuous from above and from below. Conversely, if a function $P$ satisfies Axioms 1 and 2, is finitely additive, and is either continuous from below or continuous from above at the empty set $\emptyset$, then $P$ is countably additive.

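Proof: Assume that $P$ satisfies Axiom 3, and let $A_1 \subset A_2 \subset \cdots$ be a monotone increasing sequence. We have

$$
\bigcup_{n=1}^{\infty} A_n = A_1 \cup (A_2 \cap A_1^c) \cup (A_3 \cap A_2^c) \cup \cdots, \tag{2.8}
$$

the events on the right-hand side being disjoint. Since $\bigcup_{n=1}^{\infty} A_n = \lim A_n$ (see Section 1.5), using (2.8) and the assumption of countable additivity, we obtain

$$
\begin{aligned}
P\left(\lim_{n\to\infty} A_n\right) &= P\left(\bigcup_{n=1}^{\infty} A_n\right) = P(A_1) + \sum_{i=2}^{\infty} P(A_i \cap A_{i-1}^c) \\
&= \lim_{n\to\infty} \left\{ P(A_1) + \sum_{i=2}^{n} \left[ P(A_i) - P(A_{i-1}) \right] \right\} = \lim_{n\to\infty} P(A_n)
\end{aligned}
$$

(passing from the first to the second line, we used the fact that the infinite series is defined as the limit of its partial sums, together with $P(A_i \cap A_{i-1}^c) = P(A_i) - P(A_{i-1})$, which holds because $A_{i-1} \subset A_i$). This proves continuity of $P$ from below. To prove continuity from above, we pass to the complements, and proceed as above.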

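Let us now assume that $P$ is finitely additive and continuous from below, and let $A_1, A_2, \ldots$ be a sequence of mutually disjoint events. Put $B_n = A_1 \cup \cdots \cup A_n$, so that $B_1 \subset B_2 \subset \cdots$ is a monotone increasing sequence with $\bigcup_n B_n = \bigcup_n A_n$. We have then, using continuity from below and finite additivity,

$$
P\left(\bigcup_{n=1}^{\infty} A_n\right) = P\left(\bigcup_{n=1}^{\infty} B_n\right) = \lim_{n\to\infty} P(B_n) = \lim_{n\to\infty} \sum_{i=1}^{n} P(A_i) = \sum_{i=1}^{\infty} P(A_i),
$$

again by definition of a numerical series being the limit of its partial sums. This shows that $P$ is countably additive.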

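Finally, let us assume that $P$ is finitely additive and continuous from above at the empty set $\emptyset$ (impossible event). Taking again a sequence of disjoint events $A_1, A_2, \ldots$, let $C_n = A_{n+1} \cup A_{n+2} \cup \cdots$. We have $C_1 \supset C_2 \supset \cdots$ and $\bigcap_n C_n = \emptyset$. By finite additivity, we obtain

$$
P\left(\bigcup_{i=1}^{\infty} A_i\right) = P(A_1 \cup \cdots \cup A_n \cup C_n) = \sum_{i=1}^{n} P(A_i) + P(C_n). \tag{2.9}
$$

Since (2.9) holds for every $n$, we can write

$$
P\left(\bigcup_{i=1}^{\infty} A_i\right) = \lim_{n\to\infty} \left[ \sum_{i=1}^{n} P(A_i) + P(C_n) \right] = \sum_{i=1}^{\infty} P(A_i) + \lim_{n\to\infty} P(C_n).
$$

Again, by the definition of series and the assumption that $\lim P(C_n) = P(\emptyset) = 0$, $P$ is countably additive, and the proof is complete.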

As an illustration, we now prove the following theorem:

Theorem 2.6.2 (First Borel–Cantelli Lemma) If $A_1, A_2, \ldots$ is a sequence of events such that

$$
\sum_{n=1}^{\infty} P(A_n) < \infty, \tag{2.10}
$$

then

$$
P(\limsup A_n) = 0.
$$

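Proof: Recall (1.7) from Chapter 1, where $\limsup A_n$ = “infinitely many events $A_j$ occur” = $\bigcap_{k=1}^{\infty} \bigcup_{i=k}^{\infty} A_i = \lim_{k\to\infty} \bigcup_{i=k}^{\infty} A_i$ (because the unions $\bigcup_{i=k}^{\infty} A_i$, $k = 1, 2, \ldots$, form a decreasing sequence). Consequently, using the continuity of $P$, subadditivity property (2.2), and assumption (2.10), we have

$$
P(\limsup A_n) = P\left(\lim_{k\to\infty} \bigcup_{i=k}^{\infty} A_i\right) = \lim_{k\to\infty} P\left(\bigcup_{i=k}^{\infty} A_i\right) \le \lim_{k\to\infty} \sum_{i=k}^{\infty} P(A_i) = 0.
$$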

Paraphrasing the assertion of the lemma, if probabilities of events $A_j$ decrease to zero fast enough to make the series converge, then with probability 1 only finitely many among events $A_j$ will occur. We will prove the converse (under an additional assumption), known as the second Borel–Cantelli lemma, in Chapter 4.
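The lemma can be illustrated numerically. The sketch below rests on assumptions that are not in the text: the probabilities are chosen as $P(A_n) = 1/n^2$ (a convergent series), the events are simulated as independent purely for convenience (the lemma itself does not require independence), and the sequence is truncated at a finite `n_max`. The function names are hypothetical.

```python
import random

def tail_sum(k, n_max):
    """Truncated tail sum of P(A_i) = 1/i**2 for i = k, ..., n_max.

    By subadditivity, the full tail sum bounds the probability that
    at least one of A_k, A_{k+1}, ... occurs."""
    return sum(1.0 / i**2 for i in range(k, n_max + 1))

def simulate_occurrences(n_max, rng):
    """Indices n <= n_max at which the simulated event A_n occurs."""
    return [n for n in range(1, n_max + 1) if rng.random() < 1.0 / n**2]

rng = random.Random(1)            # fixed seed so the run is reproducible
n_max = 100_000                   # truncation point of the infinite sequence
occurred = simulate_occurrences(n_max, rng)

print("indices of events that occurred:", occurred)
for k in (1, 10, 100, 1000):
    print(f"tail sum from k={k}: {tail_sum(k, n_max):.6f}")
```

The printed tail sums shrink toward zero as `k` grows, which is exactly the quantity driven to zero in the last step of the proof; in a typical run only a handful of small indices appear in `occurred`.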

In the remainder of this section, we will discuss some theoretical issues related to defining probability in practical situations. Let us start with the observation that the analysis above should leave some more perceptive readers disturbed. Clearly, one should not write a formula without being certain that it is well defined. In particular, when writing $P(\cdots)$, two things ought to be certain: (1) that what appears in the parentheses is a legitimate object of probability, that is, an event, and (2) that the function $P$ is defined unambiguously at this event.

With regard to the first point, the situation is rather simple. All reasonable questions concern events such as $\lim A_n$ and $\limsup A_n$, and hence events obtained by taking countable unions, countable intersections, and complementations of the events $A_1, A_2, \ldots$. Thus, the events whose probabilities are discussed belong to the smallest $\sigma$-field containing all the events $A_1, A_2, \ldots$ (see Definition 1.4.2 and Theorem 1.4.3). Consequently, to make the formulas at least apparently legitimate, it is sufficient to assume that the class of all the events under consideration is a $\sigma$-field, and that probability is a function satisfying the probability axioms defined on this $\sigma$-field.

This assumption alone, however, is not enough to safeguard us from possible trouble. Let us consider the following hypothetical situation: Suppose that we do not know how to calculate the area of a circle. We could start from the beginning and define the areas of simple figures: first rectangles, then pass to right triangles, and then to arbitrary triangles, which could be reduced to sums and differences of right triangles. From there, the concept of area could be extended to figures that could be triangulated. It is a simple matter to show that the area of such a figure does not depend on how it is triangulated.

From here, we may pass to areas of more complicated figures, the first of these being the circle. The area of the latter could be calculated by inscribing a square in it, and then taking areas of regular polygons with $8, 16, 32, \ldots$ sides and passing to the limit. The result is $\pi r^2$. The same result is obtained if we start by inscribing an equilateral triangle, and then take limits of the areas of regular polygons with $6, 12, 24, \ldots$ sides. The same procedure could be tried with an approximation from above, that is, starting with a square or equilateral triangle circumscribed on the circle. Still the limit is $\pi r^2$. We could then be tempted to conclude that the area of the circle is $\pi r^2$. The result is, of course, true, but how do we know that we will obtain the limit always equal to $\pi r^2$, regardless of the way of approximating the circle? What if we start, say, from an irregular seven-sided polygon, and then triple the number of sides in each step?
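The approximation schemes described above can be tried numerically. This is a minimal sketch under stated assumptions: it uses the standard area formulas for regular polygons inscribed in and circumscribed about a circle of radius `r`, and it only handles regular polygons, so the seven-sided scheme below is a regular heptagon with tripling and does not address the irregular case raised in the text. All function names are hypothetical.

```python
import math

def inscribed_area(n_sides, r=1.0):
    """Area of a regular n-gon inscribed in a circle of radius r."""
    return 0.5 * n_sides * r**2 * math.sin(2.0 * math.pi / n_sides)

def circumscribed_area(n_sides, r=1.0):
    """Area of a regular n-gon circumscribed about a circle of radius r."""
    return n_sides * r**2 * math.tan(math.pi / n_sides)

def side_counts(start, factor, steps):
    """Numbers of sides: start, start*factor, start*factor**2, ..."""
    return [start * factor**k for k in range(steps)]

schemes = {
    "square, doubling": side_counts(4, 2, 12),
    "triangle, doubling": side_counts(3, 2, 12),
    "regular heptagon, tripling": side_counts(7, 3, 12),
}

for name, counts in schemes.items():
    inner = inscribed_area(counts[-1])       # approximation from below
    outer = circumscribed_area(counts[-1])   # approximation from above
    print(f"{name}: inscribed {inner:.8f}, circumscribed {outer:.8f}, "
          f"pi*r^2 {math.pi:.8f}")
```

All three schemes agree with $\pi r^2$ to many digits, from below and from above. Numerical agreement for a few schemes is of course not a proof that every approximating sequence gives the same limit, which is precisely the concern raised in the text.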

A similar situation occurs very often in probability: Typically, we can define probabilities on “simple” events, corresponding to rectangles in geometry, and we can extend this definition without ambiguity to finite unions of the simple events (“rectangles”). The existence and uniqueness of a probability for all the events from the minimal $\sigma$-field containing the “rectangles” is ensured by the following theorem, which we state here without proof.

Theorem 2.6.3 If $P$ is a function defined on a field of events $\mathcal{A}$ satisfying the probability axioms (including countable additivity), then $P$ can be extended in a unique way to a function satisfying the probability axioms, defined on the minimal $\sigma$-field containing $\mathcal{A}$.

This means that if the function $P$ is defined on a field $\mathcal{A}$ of events and satisfies all the axioms of probability, and if $\sigma(\mathcal{A})$ is the smallest $\sigma$-field containing all sets in $\mathcal{A}$, then there exists exactly one function $P^*$ defined on $\sigma(\mathcal{A})$ that satisfies the probability axioms, and $P^*(A) = P(A)$ if $A \in \mathcal{A}$.

A comment that is necessary here concerns the question: What does it mean that a function $P$ defined on a field $\mathcal{A}$ satisfies the axioms of probability? Specifically, the problem concerns the axiom of countable additivity, which asserts that if events $A_1, A_2, \ldots$ are disjoint, then

$$
P\left(\bigcup_{n=1}^{\infty} A_n\right) = \sum_{n=1}^{\infty} P(A_n). \tag{2.11}
$$

However, if $P$ is defined on a field, then there is no guarantee that the left-hand side of formula (2.11) makes sense, since $\bigcup_{n=1}^{\infty} A_n$ need not belong to the field of events on which $P$ is defined. The meaning of the assumption of Theorem 2.6.3 is that formula (2.11) is true whenever the union $\bigcup_{n=1}^{\infty} A_n$ belongs to the field on which $P$ is defined.
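To make the idea of a probability defined on a field of “simple” events concrete, here is a minimal sketch. It is built on an assumption that is not in the text: the simple events are half-open intervals $[a, b)$ inside $[0, 1)$, the field consists of their finite unions, and probability is total length. The helpers `normalize`, `prob`, and `union` are hypothetical names. Exact fractions are used so that equalities can be checked without rounding.

```python
from fractions import Fraction as F

def normalize(intervals):
    """Merge overlapping or touching half-open intervals [a, b) into a
    sorted list of disjoint ones, so equal sets share one representation."""
    merged = []
    for a, b in sorted(iv for iv in intervals if iv[0] < iv[1]):
        if merged and a <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], b))
        else:
            merged.append((a, b))
    return merged

def prob(intervals):
    """Total length of a finite union of intervals inside [0, 1)."""
    return sum((b - a for a, b in normalize(intervals)), F(0))

def union(x, y):
    """Union of two finite unions of intervals (again a finite union)."""
    return normalize(list(x) + list(y))

A = [(F(0), F(1, 4))]                               # [0, 1/4)
B = [(F(1, 2), F(3, 4))]                            # [1/2, 3/4), disjoint from A
A_split = [(F(0), F(1, 8)), (F(1, 8), F(1, 4))]     # same set as A, cut in two

print(prob(A), prob(A_split))                  # same value for both representations
print(prob(union(A, B)), prob(A) + prob(B))    # finite additivity
```

The sketch covers only finite unions, so it checks finite additivity and the absence of ambiguity on the field. Whether a countable union of such sets still lies in the field is exactly the issue discussed above, and the code does not attempt to settle it.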

The way to find the probability of some complicated event $A$ is to represent $A$ as a limit of some sequence of events whose probabilities can be computed, and then pass to the limit. Theorem 2.6.3 asserts that this procedure will give the same result, regardless of the choice of the sequence of events approximating the event $A$.
