of a finite or countably infinite set $\Omega$ called the sample space and a probability measure $\Pr : \Omega \to \mathbb{R}^+$ with
\[ \sum_{\omega \in \Omega} \Pr[\omega] = 1.² \]
The elements of the sample space $\Omega$ are called simple events, indecomposable events, or (as used in this book) elementary events.
¹ In some literature, a discrete probability space is called a discrete random experiment.
² Alternative notations for the probability measure $\Pr[\cdot]$ are $P(\cdot)$, $P[\cdot]$, or $\mathrm{Prob}[\cdot]$.
If we run a (discrete) random experiment in a probability space, then every elementary event of the sample space represents a possible outcome of the experiment. The probability measure or probability distribution $\Pr[\cdot]$ assigns a nonnegative real value to every elementary event $\omega \in \Omega$, such that all (probability) values sum up to one. There is no general and universally valid requirement on how to assign probability values. In fact, it is often the case that many elementary events of $\Omega$ occur with probability zero. If $\Omega$ is finite and all $|\Omega|$ possible outcomes occur with the same probability (i.e., $\Pr[\omega] = 1/|\Omega|$ for all $\omega \in \Omega$), then the probability distribution is called uniform. Uniform probability distributions are frequently used in probability theory and applications thereof.
As mentioned in Definition 4.1, sample spaces are assumed to be finite or countably infinite for the purpose of this book (things get more involved if this assumption is not made). The term discrete probability theory is sometimes used to refer to the restriction of probability theory to finite or countably infinite sample spaces. In this book, however, we only focus on discrete probability theory, and hence the terms probability theory and discrete probability theory are used synonymously and interchangeably. Furthermore, we say a “finite” sample space when we actually mean a “finite or countably infinite” sample space.
For example, flipping a coin can be understood as a random experiment taking place in a discrete probability space. The sample space is $\{\text{head}, \text{tail}\}$ (or $\{0,1\}$ if 0 and 1 are used to encode head and tail, respectively) and the probability measure assigns $1/2$ to each of head and tail (i.e., $\Pr[\text{head}] = \Pr[\text{tail}] = 1/2$). The resulting probability distribution is uniform. If the coin is flipped five times, then the sample space is $\{\text{head}, \text{tail}\}^5$ (or $\{0,1\}^5$, respectively) and the probability measure assigns $1/2^5 = 1/32$ to every possible outcome of the experiment. Similarly, rolling a die can be understood as a random experiment taking place in a discrete probability space. In this case, the sample space is $\{1, \ldots, 6\}$ and the probability measure assigns $1/6$ to every possible outcome of the experiment (i.e., $\Pr[1] = \ldots = \Pr[6] = 1/6$). If the die is rolled $n$ times (or $n$ dice are rolled simultaneously), then the sample space is $\{1, \ldots, 6\}^n$ and the probability measure assigns $1/6^n$ to every possible outcome of the experiment. In either case, the probability distribution is uniform if the coins are unbiased and if the dice are fair.
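The coin and dice examples can be made concrete with a short Python sketch. It is only an illustration under stated assumptions: the helper name `uniform_space` is hypothetical (it does not come from the text), outcomes are encoded as tuples, and exact fractions are used so that the sums can be checked without rounding.

```python
from fractions import Fraction
from itertools import product


def uniform_space(outcomes, repetitions=1):
    """Map each elementary event to its probability under a uniform distribution.

    The sample space consists of all tuples of length `repetitions` over
    `outcomes`; every tuple receives the same probability 1/|Omega|.
    (Hypothetical helper, introduced only for this illustration.)
    """
    omega = list(product(outcomes, repeat=repetitions))
    return {w: Fraction(1, len(omega)) for w in omega}


# Five flips of an unbiased coin: sample space {head, tail}^5.
five_flips = uniform_space(("head", "tail"), repetitions=5)

# Two rolls of a fair die: sample space {1, ..., 6}^2.
two_rolls = uniform_space(range(1, 7), repetitions=2)

# Size of each sample space and the probability of one elementary event.
print(len(five_flips), five_flips[("head",) * 5])
print(len(two_rolls), two_rolls[(1, 1)])

# In both spaces, the probabilities of all elementary events sum up to one.
print(sum(five_flips.values()), sum(two_rolls.values()))
```

Storing the distribution as a dictionary from elementary events to probabilities mirrors the definition of $\Pr$ as a function on $\Omega$; it only works for finite sample spaces.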
Instead of looking at elementary events of a sample space, one may also look at sets of elements. In fact, an event refers to a subset $A \subseteq \Omega$ of the sample space, and its probability equals the sum of the probabilities of the elementary events of which it consists. This is formally expressed as follows:
\[ \Pr[A] = \sum_{\omega \in A} \Pr[\omega] \]
$\Pr[\Omega]$ is conventionally set to one, and $\Pr[\emptyset]$ is set to zero. Furthermore, one frequently needs the complement of an event $A$. It consists of all elements of $\Omega$ that are not elements of $A$. The complement of $A$ is denoted as $\overline{A}$, and its probability can be computed as follows:
\[ \Pr[\overline{A}] = \sum_{\omega \in \Omega \setminus A} \Pr[\omega] \]
If we know $\Pr[A]$, then we can easily compute
\[ \Pr[\overline{A}] = 1 - \Pr[A] \]
because $\Pr[A]$ and $\Pr[\overline{A}]$ must sum up to one.
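As a minimal sketch of these two formulas, the following Python fragment computes the probability of an event and of its complement for one roll of a fair die. The choice of event ("the roll is even") and the names `prob`, `A`, and `complement_A` are assumptions made for illustration only.

```python
from fractions import Fraction

# Assumed example space: one roll of a fair die.
omega = set(range(1, 7))
pr = {w: Fraction(1, 6) for w in omega}


def prob(event):
    """Pr[A]: the sum of Pr[w] over all elementary events w in A."""
    return sum((pr[w] for w in event), Fraction(0))


A = {w for w in omega if w % 2 == 0}   # event "the roll is even"
complement_A = omega - A               # all elements of omega that are not in A

# Pr[A] and the probability of its complement, computed by direct summation.
print(prob(A), prob(complement_A))

# The shortcut 1 - Pr[A] gives the same value as the direct summation.
print(1 - prob(A) == prob(complement_A))
```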
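Figure 4.1 A discrete probability space. (The figure shows the sample space $\Omega$ with an elementary event $\omega$ and an event $A$, together with their values $\Pr[\omega]$ and $\Pr[A]$ on a scale from 0 to 1.)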
A discrete probability space is illustrated in Figure 4.1. There is a sample space $\Omega$ and a probability measure $\Pr[\cdot]$ that assigns a value between 0 and 1 to every elementary event $\omega \in \Omega$ or event $A \subseteq \Omega$.
If, for example, we want to compute the probability of the event that, when flipping five coins, we get exactly three heads, then the sample space is $\Omega = \{0,1\}^5$ and the probability distribution is uniform. This means that every element $\omega \in \Omega$ occurs with the same probability $\Pr[\omega] = 1/2^5 = 1/32$. Let $A$ be the subset of $\Omega = \{0,1\}^5$ containing strings with exactly three ones and let us ask for the probability $\Pr[A]$. It can easily be shown that $A$ consists of the following 10 elements:

00111 10110 01011 10101 01101
11001 01110 11010 10011 11100
Consequently, $\Pr[A] = 10/32 = 5/16$. The example can be generalized to $n$ flips with a biased coin. If the coin flips are independent and each flip turns out heads with probability $p$, where $0 \le p \le 1$, then the sample space is $\{0,1\}^n$ and the probability of a specific elementary event $\omega$ in this space is
\[ \Pr[\omega] = p^k (1-p)^{n-k} \]
where $k$ is the number of ones in $\omega$. In the example given earlier, we had $p = 1 - p = 1/2$, and the corresponding distribution over $\{0,1\}^n$ was uniform. If $p = 1$ ($p = 0$), then the all-ones string $1^n$ (the all-zeros string $0^n$) has probability one and all other elements have probability zero.
Consequently, the interesting cases occur when $p$ is greater than zero but smaller than one (i.e., $p \in (0,1)$). This brings us to the notion of a binomial distribution. If we have such a distribution with parameter $p$ and ask for the probability of the event $A_k$ that we get a string with exactly $k$ ones, then the probability $\Pr[A_k]$ can be computed as follows:
\[ \Pr[A_k] = \binom{n}{k} p^k (1-p)^{n-k} \]
In this formula, $\binom{n}{k}$ is read “$n$ choose $k$” and can be computed as follows:
\[ \binom{n}{k} = \frac{n!}{k!\,(n-k)!} \]
In this notation, $n!$ refers to the factorial of integer $n$. It is recursively defined with $0! = 1$ and $n! = (n-1)! \cdot n$ for $n \ge 1$.
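The binomial formula can be cross-checked against a brute-force enumeration of the sample space. The sketch below is illustrative only: the function names are hypothetical, the bias $p = 1/3$ is an arbitrary choice, and the enumeration is feasible only for small $n$ because the sample space has $2^n$ elements.

```python
from fractions import Fraction
from itertools import product
from math import comb


def pr_string(w, p):
    """Pr[w] = p^k (1-p)^(n-k), where k is the number of ones in the string w."""
    k = sum(w)
    return p ** k * (1 - p) ** (len(w) - k)


def pr_k_ones_enumerated(n, k, p):
    """Pr[A_k], obtained by summing Pr[w] over all strings with exactly k ones."""
    return sum(
        (pr_string(w, p) for w in product((0, 1), repeat=n) if sum(w) == k),
        Fraction(0),
    )


def pr_k_ones_formula(n, k, p):
    """Pr[A_k], obtained from the closed form C(n, k) p^k (1-p)^(n-k)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


fair = Fraction(1, 2)
biased = Fraction(1, 3)   # arbitrary illustrative bias, not taken from the text

# Five fair flips with exactly three ones: both approaches give 5/16.
print(pr_k_ones_enumerated(5, 3, fair), pr_k_ones_formula(5, 3, fair))

# The two approaches also agree for the biased coin.
print(pr_k_ones_enumerated(5, 3, biased), pr_k_ones_formula(5, 3, biased))
```

Since the events $A_0, \ldots, A_n$ partition the sample space, the values `pr_k_ones_formula(n, k, p)` for $k = 0, \ldots, n$ sum up to one, which is a convenient sanity check.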
More generally, if we have two events $A, B \subseteq \Omega$, then the probability of the union event $A \cup B$ is computed as follows:
\[ \Pr[A \cup B] = \Pr[A] + \Pr[B] - \Pr[A \cap B] \]
Consequently, $\Pr[A \cup B] \le \Pr[A] + \Pr[B]$, with equality if and only if $\Pr[A \cap B] = 0$ (in particular, whenever $A \cap B = \emptyset$). This inequality is known as the union bound. Similarly, we may be interested in the joint event $A \cap B$. Its probability is computed as follows:
\[ \Pr[A \cap B] = \Pr[A] + \Pr[B] - \Pr[A \cup B] \]
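The two formulas and the union bound can be verified on a small example. The following sketch assumes one roll of a fair die and two illustrative events chosen for this purpose ("the roll is even" and "the roll is greater than 3"); none of the names come from the text.

```python
from fractions import Fraction

# Assumed example space: one roll of a fair die.
omega = set(range(1, 7))
pr = {w: Fraction(1, 6) for w in omega}


def prob(event):
    """Pr[A]: the sum of Pr[w] over all elementary events w in A."""
    return sum((pr[w] for w in event), Fraction(0))


A = {2, 4, 6}   # event "the roll is even"
B = {4, 5, 6}   # event "the roll is greater than 3"

union_direct = prob(A | B)
union_formula = prob(A) + prob(B) - prob(A & B)
joint_formula = prob(A) + prob(B) - prob(A | B)

# Direct summation and the formula for the union agree.
print(union_direct, union_formula)

# The formula for the joint event recovers Pr[A ∩ B].
print(prob(A & B), joint_formula)

# Union bound: Pr[A ∪ B] <= Pr[A] + Pr[B]; it is strict here because A and B overlap.
print(union_direct <= prob(A) + prob(B))
```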
Figure 4.2 A Venn diagram with two events. (The figure shows two overlapping circles labeled $A$ and $B$.)
Venn diagrams can be used to illustrate the relationship of specific events. A Venn diagram is made up of two or more overlapping circles (each circle represents an event). For example, Figure 4.2 shows a Venn diagram with two events $A$ and $B$. The intersection of the two circles represents $A \cap B$, whereas the union represents $A \cup B$.
The two events $A$ and $B$ are independent if $\Pr[A \cap B] = \Pr[A] \cdot \Pr[B]$, meaning that the probability of one event does not influence the probability of the other.
The notion of independence can be generalized to more than two events. In this case, it must be distinguished whether the events are pairwise or mutually independent. Let $A_1, \ldots, A_n \subseteq \Omega$ be $n$ events in a given sample space $\Omega$.
• $A_1, \ldots, A_n$ are pairwise independent if for every $i, j \in \{1, \ldots, n\}$ with $i \ne j$ it holds that $\Pr[A_i \cap A_j] = \Pr[A_i] \cdot \Pr[A_j]$.
• $A_1, \ldots, A_n$ are mutually independent if for every subset of indices $I \subseteq \{1, 2, \ldots, n\}$ with $I \ne \emptyset$ it holds that
\[ \Pr\Bigl[\bigcap_{i \in I} A_i\Bigr] = \prod_{i \in I} \Pr[A_i]. \]
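The difference between the two notions can be seen in a standard example with two fair coin flips, sketched below. The three events (first flip is 1, second flip is 1, the two flips differ) and the helper `independent_on` are assumptions chosen for illustration; they are not taken from the text.

```python
from fractions import Fraction
from itertools import combinations, product

# Assumed example space: two fair coin flips, encoded as pairs of bits.
omega = list(product((0, 1), repeat=2))
pr = {w: Fraction(1, len(omega)) for w in omega}


def prob(event):
    """Pr[A]: the sum of Pr[w] over all elementary events w in A."""
    return sum((pr[w] for w in event), Fraction(0))


def independent_on(events, index_sets):
    """Check Pr[intersection of A_i] == product of Pr[A_i] for every index set."""
    for indices in index_sets:
        joint = set(omega)
        expected = Fraction(1)
        for i in indices:
            joint &= events[i]
            expected *= prob(events[i])
        if prob(joint) != expected:
            return False
    return True


events = [
    {w for w in omega if w[0] == 1},      # first flip is 1
    {w for w in omega if w[1] == 1},      # second flip is 1
    {w for w in omega if w[0] != w[1]},   # the two flips differ
]
n = len(events)
pairs = list(combinations(range(n), 2))
all_subsets = [c for r in range(1, n + 1) for c in combinations(range(n), r)]

pairwise = independent_on(events, pairs)
mutual = independent_on(events, all_subsets)
print(pairwise, mutual)
```

In this example every pair of events satisfies the product rule, but all three events cannot hold at the same time, so the condition fails for the index set containing all three events. Mutual independence is therefore strictly stronger than pairwise independence.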
Sometimes it is necessary to compute the probability of an elementary event $\omega$ given that an event $A$ with $\Pr[A] > 0$ holds. The resulting conditional probability is denoted as $\Pr[\omega|A]$ and can be computed as follows:
\[ \Pr[\omega|A] = \begin{cases} \dfrac{\Pr[\omega]}{\Pr[A]} & \text{if } \omega \in A \\ 0 & \text{otherwise} \end{cases} \]
If $\omega \in A$, then $\Pr[\omega|A]$ must have a value that is proportional to $\Pr[\omega]$, and the factor of proportionality must be $1/\Pr[A]$ (so that all probabilities sum up to one). Otherwise (i.e., if $\omega \notin A$), it is impossible that $\omega$ holds, and hence $\Pr[\omega|A]$ must be equal to zero (independently of the probability of $A$).
The definition of $\Pr[\omega|A]$ can be generalized to arbitrary events. In fact, if $A$ and $B$ are two events, then the probability of event $B$ given that event $A$ holds is the sum of the probabilities of all elementary events $\omega \in B$ given that $A$ holds. This is formally expressed as follows:
\[ \Pr[B|A] = \sum_{\omega \in B} \Pr[\omega|A] \]
In the literature, $\Pr[B|A]$ is sometimes also defined as follows:
\[ \Pr[B|A] = \frac{\Pr[A \cap B]}{\Pr[A]} \]
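That the two definitions of $\Pr[B|A]$ agree can be checked on a small example. The sketch below again assumes one roll of a fair die; the events and the function names `cond_elementary`, `cond_by_sum`, and `cond_by_ratio` are hypothetical and serve only as an illustration.

```python
from fractions import Fraction

# Assumed example space: one roll of a fair die.
omega = set(range(1, 7))
pr = {w: Fraction(1, 6) for w in omega}


def prob(event):
    """Pr[A]: the sum of Pr[w] over all elementary events w in A."""
    return sum((pr[w] for w in event), Fraction(0))


def cond_elementary(w, A):
    """Pr[w|A]: Pr[w]/Pr[A] if w is in A, and 0 otherwise (requires Pr[A] > 0)."""
    if prob(A) == 0:
        raise ValueError("conditioning event must have positive probability")
    return pr[w] / prob(A) if w in A else Fraction(0)


def cond_by_sum(B, A):
    """Pr[B|A] as the sum of Pr[w|A] over all elementary events w in B."""
    return sum((cond_elementary(w, A) for w in B), Fraction(0))


def cond_by_ratio(B, A):
    """Pr[B|A] as Pr[A ∩ B] / Pr[A] (requires Pr[A] > 0)."""
    return prob(A & B) / prob(A)


A = {2, 4, 6}   # event "the roll is even"
B = {4, 5, 6}   # event "the roll is greater than 3"

print(cond_by_sum(B, A), cond_by_ratio(B, A))

# Conditioned on A, the probabilities of all elementary events sum up to one.
print(sum(cond_elementary(w, A) for w in omega))
```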
Consequently, if two events $A$ and $B$ are independent and $\Pr[A] > 0$ ($\Pr[B] > 0$), then $\Pr[B|A] = \Pr[A \cap B]/\Pr[A] = \Pr[A] \cdot \Pr[B]/\Pr[A] = \Pr[B]$ ($\Pr[A|B] = \Pr[B \cap A]/\Pr[B] = \Pr[B] \cdot \Pr[A]/\Pr[B] = \Pr[A]$). Put in other words: if $A$ and $B$ are independent, then the probability that $A$ holds is not influenced by the knowledge of whether $B$ holds, and vice versa. Consequently, one can also write
\[ \Pr[A|B] = \frac{\Pr[B \cap A]}{\Pr[B]} \]
and, using $\Pr[B \cap A] = \Pr[A] \cdot \Pr[B|A]$, put $\Pr[B|A]$ and $\Pr[A|B]$ into perspective. In this case, the formula
\[ \Pr[A|B] = \frac{\Pr[A] \Pr[B|A]}{\Pr[B]} \]
is known as Bayes' theorem and is frequently used in probability theory. Furthermore, one can also formally express a law of total probability as suggested in Theorem 4.1.
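Bayes' theorem as stated above can be checked numerically. The following sketch assumes one roll of a fair die and two events chosen only for illustration ("the roll is prime" and "the roll is at least 5"); the function names are hypothetical.

```python
from fractions import Fraction

# Assumed example space: one roll of a fair die.
omega = set(range(1, 7))
pr = {w: Fraction(1, 6) for w in omega}


def prob(event):
    """Pr[A]: the sum of Pr[w] over all elementary events w in A."""
    return sum((pr[w] for w in event), Fraction(0))


def cond(X, Y):
    """Pr[X|Y] = Pr[X ∩ Y] / Pr[Y] (requires Pr[Y] > 0)."""
    return prob(X & Y) / prob(Y)


A = {2, 3, 5}   # event "the roll is prime"
B = {5, 6}      # event "the roll is at least 5"

direct = cond(A, B)                         # Pr[A|B] from the definition
bayes = prob(A) * cond(B, A) / prob(B)      # Pr[A] Pr[B|A] / Pr[B]

print(direct, bayes)
```

The events are chosen so that $\Pr[A] \ne \Pr[B]$; otherwise $\Pr[A|B]$ and $\Pr[B|A]$ would coincide and the check would be less informative.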