The mathematical model that has dominated the study of probability was formalized by the Russian mathematician A. N. Kolmogorov in a monograph published in 1933. The central concept in this model is a probability space, which is assumed to have three components:
S A sample space, a universe of “possible” outcomes for the experiment in question.
C A designated collection of “observable” subsets (called events) of the sample space.
P A probability measure, a function that assigns real numbers (called probabilities) to events.
1 I. Hacking, The Emergence of Probability, Cambridge University Press, 1975, Chapter 2: Duality.
We describe each of these components in turn.
The Sample Space The sample space is a set. Depending on the nature of the experiment in question, it may or may not be easy to decide upon an appropriate sample space.
Example 3.1 A coin is tossed once.
A plausible sample space for this experiment will comprise two outcomes, Heads and Tails. Denoting these outcomes by H and T, we have
S = {H, T}.
Remark: We have discounted the possibility that the coin will come to rest on edge. This is the first example of a theme that will recur throughout this text, that mathematical models are rarely—if ever—completely faithful representations of nature. As described by Mark Kac,
“Models are, for the most part, caricatures of reality, but if they are good, then, like good caricatures, they portray, though perhaps in distorted manner, some of the features of the real world.
The main role of models is not so much to explain and predict—
though ultimately these are the main functions of science—as to polarize thinking and to pose sharp questions.”2
In Example 3.1, and in most of the other elementary examples that we will use to illustrate the fundamental concepts of axiomatic probability, the fidelity of our mathematical descriptions to the physical phenomena described should be apparent. Practical applications of inferential statistics, however, often require imposing mathematical assumptions that may be suspect. Data analysts must constantly make judgments about the plausibility of their assumptions, not so much with a view to whether or not the assumptions are completely correct (they almost never are), but with a view to whether or not the assumptions are sufficient for the analysis to be meaningful.
2 Mark Kac, “Some mathematical models in science,” Science, 1969, 166:695–699.
Example 3.2 A coin is tossed twice.
A plausible sample space for this experiment will comprise four outcomes, two outcomes per toss. Here,
S = {HH, HT, TH, TT}.
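The sample spaces of Examples 3.1 and 3.2 can also be enumerated mechanically. The following is a minimal Python sketch; the helper name `sample_space` and the encoding of outcomes as strings of the letters H and T are illustrative assumptions, not notation from the text.

```python
from itertools import product

def sample_space(n_tosses):
    """Return the set of all H/T sequences for n_tosses tosses of a coin."""
    return {"".join(seq) for seq in product("HT", repeat=n_tosses)}

S1 = sample_space(1)  # Example 3.1: one toss
S2 = sample_space(2)  # Example 3.2: two tosses

print(sorted(S1))  # ['H', 'T']
print(sorted(S2))  # ['HH', 'HT', 'TH', 'TT']
```

Each additional toss doubles the number of outcomes, so n tosses give a sample space with 2^n elements.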
Example 3.3 An individual’s height is measured.
In this example, it is less clear what outcomes are possible. All human heights fall within certain bounds, but precisely what bounds should be specified? And what of the fact that heights are not measured exactly?
Only rarely would one address these issues when choosing a sample space.
For this experiment, most statisticians would choose as the sample space the set of all real numbers, then worry about which real numbers were actually observed. Thus, the phrase “possible outcomes” refers to conceptual rather than practical possibility. The sample space is usually chosen to be mathematically convenient and all-encompassing.
The Collection of Events Events are subsets of the sample space, but how do we decide which subsets of S should be designated as events? If the outcome s ∈ S was observed and E ⊂ S is an event, then we say that E occurred if and only if s ∈ E. A subset of S is observable if it is always possible for the experimenter to determine whether or not it occurred. Our intent is that the collection of events should be the collection of observable subsets. This intent is often tempered by our desire for mathematical convenience and by our need for the collection to possess certain mathematical properties. In practice, the issue of observability is rarely considered and certain conventional choices are automatically adopted. For example, when S is a finite set, one usually designates all subsets of S to be events.
Whether or not we decide to grapple with the issue of observability, the collection of events must satisfy the following properties:
1. The sample space is an event.
2. If E is an event, then E^c is an event.
3. The union of any countable collection of events is an event.
A collection of subsets with these properties is sometimes called a sigma-field.
Taken together, the first two properties imply that both S and ∅ must be events. If S and ∅ are the only events, then the third property holds;
hence, the collection {S, ∅} is a sigma-field. It is not, however, a very useful collection of events, as it describes a situation in which the experimental outcomes cannot be distinguished!
Example 3.1 (continued) To distinguish Heads from Tails, we must assume that each of these individual outcomes is an event. Thus, the only plausible collection of events for this experiment is the collection of all subsets of S, i.e.,
C = {S, {H}, {T}, ∅} .
Example 3.2 (continued) If we designate all subsets of S as events, then we obtain the following collection:
C = { S,
{HH, HT, TH}, {HH, HT, TT}, {HH, TH, TT}, {HT, TH, TT},
{HH, HT}, {HH, TH}, {HH, TT}, {HT, TH}, {HT, TT}, {TH, TT},
{HH}, {HT}, {TH}, {TT},
∅ }.
This is perhaps the most plausible collection of events for this experiment, but others are also possible. For example, suppose that we were unable to distinguish the order of the tosses, so that we could not distinguish between the outcomes HT and TH. Then the collection of events should not include any subsets that contain one of these outcomes but not the other, e.g., {HH, TH, TT}. Thus, the following collection of events might be deemed appropriate:
C = { S,
{HH, HT, TH}, {HT, TH, TT}, {HH, TT}, {HT, TH},
{HH}, {TT},
∅ }.
The interested reader should verify that this collection is indeed a sigma-field.
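One way to carry out this verification is by brute force. The sketch below, in Python, checks the three sigma-field properties for both collections of Example 3.2. The function names `power_set` and `is_sigma_field` are illustrative assumptions. Because S is finite, closure under countable unions reduces to closure under pairwise unions, which is what the code tests.

```python
from itertools import chain, combinations

S = frozenset({"HH", "HT", "TH", "TT"})

def power_set(s):
    """All subsets of s, each represented as a frozenset."""
    items = sorted(s)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))
    return {frozenset(sub) for sub in subsets}

def is_sigma_field(collection, s):
    """Check the three defining properties on a finite sample space s."""
    events = set(collection)
    if s not in events:                               # property 1
        return False
    if any((s - e) not in events for e in events):    # property 2: complements
        return False
    # Property 3: on a finite space, pairwise unions suffice.
    return all((a | b) in events for a in events for b in events)

all_subsets = power_set(S)

# The smaller collection from Example 3.2 (order of tosses not observable).
unordered = {
    S,
    frozenset({"HH", "HT", "TH"}), frozenset({"HT", "TH", "TT"}),
    frozenset({"HH", "TT"}), frozenset({"HT", "TH"}),
    frozenset({"HH"}), frozenset({"TT"}),
    frozenset(),
}

print(len(all_subsets))                # 16
print(is_sigma_field(all_subsets, S))  # True
print(is_sigma_field(unordered, S))    # True
# Adding {HH, TH, TT} breaks the structure: its complement {HT} is missing.
print(is_sigma_field(unordered | {frozenset({"HH", "TH", "TT"})}, S))  # False
```

The last line illustrates why {HH, TH, TT} had to be excluded: admitting it would force {HT} to be an event as well, which contradicts the assumption that HT and TH cannot be distinguished.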
The Probability Measure Once the collection of events has been designated, each event E ∈ C can be assigned a probability P(E). This must be done according to specific rules; in particular, the probability measure P must satisfy the following properties:
1. If E is an event, then 0 ≤ P(E) ≤ 1.
2. P(S) = 1.
3. If {E1, E2, E3, . . .} is a countable collection of pairwise disjoint events, then
$$P\left(\bigcup_{i=1}^{\infty} E_i\right) = \sum_{i=1}^{\infty} P(E_i).$$
We discuss each of these properties in turn.
The first property states that probabilities are nonnegative and finite.
Thus, neither the statement that “the probability that it will rain today is −.5” nor the statement that “the probability that it will rain today is infinity” is meaningful. These restrictions have certain mathematical consequences. The further restriction that probabilities are no greater than unity is actually a consequence of the second and third properties.
The second property states that the probability that an outcome occurs, that something happens, is unity. Thus, the statement that “the probability that it will rain today is 2” is not meaningful. This is a convention that simplifies formulae and facilitates interpretation.
The third property, called countable additivity, is the most interesting.
Consider Example 3.2, supposing that {HT} and {TH} are events and that we want to compute the probability that exactly one Head is observed, i.e., the probability of
{HT} ∪ {TH} = {HT, TH}.
Because {HT} and {TH} are events, their union is an event and therefore has a probability. Because they are mutually exclusive, we would like that probability to be
P({HT, TH}) = P({HT}) + P({TH}).
We ensure this by requiring that the probability of the union of any two disjoint events is the sum of their respective probabilities.
Having assumed that
A ∩ B = ∅ ⇒ P(A ∪ B) = P(A) + P(B),    (3.1)
it is easy to compute the probability of any finite union of pairwise disjoint events. For example, if A, B, C, and D are pairwise disjoint events, then
P(A ∪ B ∪ C ∪ D) = P(A ∪ (B ∪ C ∪ D))
= P(A) + P(B ∪ C ∪ D)
= P(A) + P(B ∪ (C ∪ D))
= P(A) + P(B) + P(C ∪ D)
= P(A) + P(B) + P(C) + P(D).
Thus, from (3.1) can be deduced the following implication:
If E1, . . . , En are pairwise disjoint events, then
$$P\left(\bigcup_{i=1}^{n} E_i\right) = \sum_{i=1}^{n} P(E_i).$$
This implication is known as finite additivity. Notice that the union of E1, . . . , En must be an event (and hence have a probability) because each Ei is an event.
An extension of finite additivity, countable additivity is the following implication:
If E1, E2, E3, . . . are pairwise disjoint events, then
$$P\left(\bigcup_{i=1}^{\infty} E_i\right) = \sum_{i=1}^{\infty} P(E_i).$$
The reason for insisting upon this extension has less to do with applications than with theory. Although some axiomatic theories of probability assume only finite additivity, it is generally felt that the stronger assumption of countable additivity results in a richer theory. Again, notice that the union of E1, E2, . . . must be an event (and hence have a probability) because each Ei is an event.
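To see what countable additivity adds to finite additivity, consider a hypothetical experiment that does not appear in the text: a fair coin is tossed until the first Head occurs, and E_i denotes the event that the first Head occurs on toss i. Assume, for illustration only, that P(E_i) = (1/2)^i. The Python sketch below computes the partial sums that finite additivity assigns to E_1 ∪ . . . ∪ E_n; the function names are illustrative.

```python
from fractions import Fraction

def p_first_head_on(i):
    """Assumed probability that the first Head occurs on toss i (fair coin)."""
    return Fraction(1, 2) ** i

def partial_sum(n):
    """P(E_1) + ... + P(E_n), the probability of the union of E_1, ..., E_n."""
    return sum((p_first_head_on(i) for i in range(1, n + 1)), Fraction(0))

for n in (1, 2, 5, 10):
    print(n, partial_sum(n))
# 1 1/2
# 2 3/4
# 5 31/32
# 10 1023/1024
```

Every partial sum falls short of 1 by exactly (1/2)^n. Finite additivity alone says nothing about the union of all of the E_i; countable additivity assigns it the limit of these partial sums, which is 1.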
Finally, we emphasize that probabilities are assigned to events. It may or may not be that the individual experimental outcomes are events. If they are, then they will have probabilities. In some such cases (see Chapter 4), the probability of any event can be deduced from the probabilities of the individual outcomes; in other such cases (see Chapter 5), this is not possible.
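For a finite sample space in which every outcome is an event, the idea of deducing event probabilities from outcome probabilities can be sketched in a few lines of Python. The outcome probabilities below (1/4 each, as for a fair coin tossed twice) are an assumption made for illustration, and the names `outcome_prob`, `P`, and `all_events` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Assumption for illustration: each of the four outcomes has probability 1/4.
outcome_prob = {"HH": Fraction(1, 4), "HT": Fraction(1, 4),
                "TH": Fraction(1, 4), "TT": Fraction(1, 4)}
S = frozenset(outcome_prob)

def P(event):
    """Probability of an event (a subset of S): the sum over its outcomes."""
    return sum((outcome_prob[s] for s in event), Fraction(0))

def all_events(s):
    """Every subset of s, taking all subsets to be events."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

events = all_events(S)

# Property 1: every probability lies between 0 and 1.
assert all(0 <= P(E) <= 1 for E in events)
# Property 2: P(S) = 1.
assert P(S) == 1
# Property 3 (finite case): additivity for every pair of disjoint events.
assert all(P(A | B) == P(A) + P(B)
           for A in events for B in events if not (A & B))

print(P(frozenset({"HT", "TH"})))  # 1/2
```

Exact rational arithmetic (`Fraction`) is used so that the equalities are checked without floating-point rounding.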
All of the facts about probability that we will use in studying statistical inference are consequences of the assumptions of the Kolmogorov probability model. It is not the purpose of this book to present derivations of these facts;
however, three elementary (and useful) propositions suggest how one might proceed along such lines. In each case, a Venn diagram helps to illustrate the proof.
Theorem 3.1 If E is an event, then
P(E^c) = 1 − P(E).
[Venn diagram with regions labeled S and E.]
Figure 3.1: A Venn diagram for the probability of E^c.
Proof Refer to Figure 3.1. E^c is an event because E is an event. By definition, E and E^c are disjoint events whose union is S. Hence,
1 = P(S) = P(E ∪ E^c) = P(E) + P(E^c)
and the theorem follows upon subtracting P(E) from both sides. □
Theorem 3.2 If A and B are events and A ⊂ B, then P(A) ≤ P(B).
[Venn diagram with regions labeled S, A, and B.]
Figure 3.2: A Venn diagram for the probability of A ⊂ B.
Proof Refer to Figure 3.2. A^c is an event because A is an event. Hence, B ∩ A^c is an event and
B = A ∪ (B ∩ A^c).
Because A and B ∩ A^c are disjoint events,
P(B) = P(A) + P(B ∩ A^c) ≥ P(A),
where the inequality holds because P(B ∩ A^c) ≥ 0, as claimed. □
Theorem 3.3 If A and B are events, then
P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
[Venn diagram with regions labeled S, A, and B.]
Figure 3.3: A Venn diagram for the probability of A ∪ B.
Proof Refer to Figure 3.3. Both A ∪ B and A ∩ B = (A^c ∪ B^c)^c are events because A and B are events. Similarly, A ∩ B^c and B ∩ A^c are also events.
Notice that A ∩ B^c, B ∩ A^c, and A ∩ B are pairwise disjoint events. Hence,
P(A) + P(B) − P(A ∩ B)
= P((A ∩ B^c) ∪ (A ∩ B)) + P((B ∩ A^c) ∪ (A ∩ B)) − P(A ∩ B)
= P(A ∩ B^c) + P(A ∩ B) + P(B ∩ A^c) + P(A ∩ B) − P(A ∩ B)
= P(A ∩ B^c) + P(A ∩ B) + P(B ∩ A^c)
= P((A ∩ B^c) ∪ (A ∩ B) ∪ (B ∩ A^c))
= P(A ∪ B),
as claimed. □
Theorem 3.3 provides a general formula for computing the probability of the union of two sets. Notice that, if A and B are in fact disjoint, then
P(A ∩ B) = P(∅) = P(S^c) = 1 − P(S) = 1 − 1 = 0
and we recover our original formula for that case.
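The three propositions of this section can be checked exhaustively on a small example. The Python sketch below uses the two-toss sample space with hypothetical, deliberately unequal outcome probabilities (1/9, 2/9, 2/9, 4/9) to emphasize that the results do not depend on outcomes being equally likely. The names `outcome_prob` and `P` are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations

# Assumption for illustration: unequal outcome probabilities that sum to 1.
outcome_prob = {"HH": Fraction(1, 9), "HT": Fraction(2, 9),
                "TH": Fraction(2, 9), "TT": Fraction(4, 9)}
S = frozenset(outcome_prob)

def P(event):
    """Probability of an event (a subset of S): the sum over its outcomes."""
    return sum((outcome_prob[s] for s in event), Fraction(0))

# Take every subset of S to be an event.
events = [frozenset(c) for r in range(len(S) + 1)
          for c in combinations(sorted(S), r)]

for A in events:
    # Theorem 3.1: P(A^c) = 1 - P(A).
    assert P(S - A) == 1 - P(A)
    for B in events:
        # Theorem 3.2: if A is a subset of B, then P(A) <= P(B).
        if A <= B:
            assert P(A) <= P(B)
        # Theorem 3.3: P(A ∪ B) = P(A) + P(B) - P(A ∩ B).
        assert P(A | B) == P(A) + P(B) - P(A & B)

A = frozenset({"HH", "HT"})  # Head on the first toss
B = frozenset({"HH", "TH"})  # Head on the second toss
print(P(A), P(B), P(A & B), P(A | B))  # 1/3 1/3 1/9 5/9
```

In the printed example, simply adding P(A) and P(B) would give 2/3, which counts the outcome HH twice; subtracting P(A ∩ B) = 1/9 corrects this and yields 5/9.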