FOUNDATI<lNS
OF THE
THEORY OF PROBABILITY
BY
A. N. KOLMOGOROV
Second Encli•h Edition
TRANSLATION EDITED BY NATHAN MORRISON
WITH AN ADDED JUBLIOGRAPHY BY A. T. BHARUCHA-REID
'UNIYERSI'IY OF OR .. �GON
CHELSEA PUBLISHING COMPANY
NEW YORK
•COPYRIGHT 1950 BY CHELSEA PUBLISHING COMPANY COPYRlGHT C )956, CHELSEA PUBUSHING COMPANY
LIBRARY OF CO.NGRESS CATAWGUE CARD NUMBER 56-11512
EDITOR'S NOTE.
In th
e preparation
of
this
English
translationof Professor
Kolmogorov's fundamental work, the original German monograph
Grundbegriffe der WahrsckeinLichkeitrecknung
which appeared
i
n
the Ergebnisse Der Mathematik in1983,
and also aRussian
translationby G.
M.Bavli published in 1936 have been used.
It
is
a pleas
ure toacknowledge the invaluable assistance of
two
friends and
fo
rmer colleagues, Mrs
.Ida Rhodes and
Mr.
D.
V. Varley, and also
of my
niece,
Gizella Gross.
Thanks are also
due to Mr.Roy
Kueblerwho
made
available
for comparison purposes h
i
sindependent English translation
of
the
o
ri
gina
l German monograph.
PREFACE
The purpo
s
�of this
mon
ograph is
togive
anaxiomatic
foundation
for
the theory ofprobability. The
author sethimself
the task of
putting
in their natural place, among thegen
e
ralnotions
ofmodern mathematics,
thebasic
concepts of probabilityt
h
eo
ry-concepts
whichuntil
recentlywere
con
si
dered
to be quite peculiar.This task
would have been
a rather hopeless one beforethe
introduction of Lebesg
u
e�stheories
of mea
sure and integration.
However,
after Lebesgue's
publication ofhis
investigations,the
analogies
bet
we
en measureof
a
setand probability
ofan
event,and
between integral of
af
uncti
o
n and mathematical expectation
of
a random vari
able,became apparent.
These analogiesallowed
of further
extensions
;thus, for
example, various properties ofinde
p
endent random
variableswere seen
to be in co
mp
leteanalogy
with the corresponding properties of
o
r
th
ogonal
functions.But
if
·
prob
ab
il
itytheory
was
to be based on the above analogies, itstill
was
nec
es
s
a
ry to makethe
theo
ries
of
m
ea
s
ure and integra tion independen
t of thegeometric elements
whichwere
in
the
foreground
withLebesgue. This
has
been done by
Frechet...
While a
con
cep
tion of probabilitytheory
based on
the
above
gener
a
lviewpoints has
been curr
en
tfor some
time
am
ong
certainmathematicians,
t
h
erewas
lacking
acomplete exposition of the
whole
system, free
ofe
x
tran
eous complications.( Cf.,
h
ow
e
v
e
r,the
bookby Frechet, [2] in
thebibliography.)
I
wish to
callattention
to tho
se
points
oft
hepresent exposition
whichare
outside the above-mentioned
rang
eof ideas
familiarto
t
h
e
specialist.
Theyare the
following:Probability
distributions ininfinite-dimensional spaces (Chapter III, § 4)
; differentiation·and
integration of mathematical
expectations withrespect
to aparameter (Chapter IV, § 5); and especi
al
lythe th
eory ofcondi-tional
probabilitiesand
c
on
dit
ional expectations (Chapter V).
It should
be
emphasized that
th
e
se new proble
msarose, of neces
sity,
fromsome perfectly
con
cretephysical
problems.1• Cf., e.g., the paper by M. Leontovich quoted in footnote
6
on p. 46; also thejoint paper by the author and M. Leontovich, Zur Sta.tistik dsr kcmtinuier
lichen Sy8teme und des zeitlichen. Ve-rlaufea der ph1Jsikalischen Vorgiinge.
Phys. Jour. of the USSR, Vol. 3, 1933, pp. 36�63.
vi Preface
The sixth chapter contains a
survey,
without proofs, of
some results of A.Khinehine
and the author of the limitationson
the applicability of the ordinary and of the strong law of large num bers. The bibliography CQntainssome
recentworks
which shouldbe
of interest from the point of view of the foundationsof
the subject.I
wisq
to expressmy warm
thanks to Mr. Khinchine, who hasread
carefullythe
whole manuscript and propos'ed several improvements.Kljasma near
Moscow,
Easter1933.
CONTENTS
Pa,ge
EDITOR·s N 'OTE • .. . • • . • • • .
·
• . . . • • • • • I •�
• • • • a • • • • • • • • • • • .. •iii
PREFACE . . .. . . .. . . .. . . .
I. ELEMENTARY THEORY OF PRoBABILITY
v
§
1" Axioms.
. . . ...
..
..
. . ..
. ..
..
. ..
. . ...
..
.
.. a • • • • • • • • 2 §2.
The relation toexperimental
data . . ....
.
. . ....:. . . 3
§ 3.Notes
on terminology. . • • .. . . 5
§ 4.
Immed
iate corollaries of the axioms; conditional proba-bilities ; Theorem of Bayes . . . . . . . . . . . . . • . . . . . . . . . . 6 § 5.Independence
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8§
6.
C
onditionalprobabilities
as random va
riab
les
;Markov
chains .
. .. ..
. . . . . . .. . . . . . 4 • • • • ... • • • • • • • • • • • • • • • • • • • 12D. INFINITE PRoBABU.ITY FIELDS
§ 1. Axiom
of
C
ontinu
it
y.. . . . . . . . . . . . . . . . . . .. . . . . .. . . 14
§ 2.
Borel fields of probability ..
. .· ..
. . . . . . . . . . . . . . . . . . 16'·.• § 3. Exampl
e
s ofinfi
ni
t
e fields of probability... . . • . . . 18
�
---m. RANDOM v AJUA.BLEs
§
1.
Probability fun
ctions . . . . . . . . . . . . . . . . . . . . . . . 21§
2.
Definit
ion of random vari
ablesand
of distribution func.;. tiona . . ..
. . . I • • I • • • • • • , • • • • • • • • • • • • 22 § 8. Multi-dimensional distribution functions.. . . . . . . . . . . . . 24
§ 4.
Probabilities in infinite-dimensionalspaces.
.. . . . . . . . 27
§5.
Equivalent random
variables
; various kinds ofconverg-ence . . . . . . . . . . . . . 83
IV. MATHEMATICAL EXPECTATIONS
§ 1.
A
bstractLebesgue
integrals . . . . . . . . . . . . . . . 37§ 2.
· Absoluteand conditional mathematical expectations.
.
. . 39
§ 3. The Tchebycheff inequ
ality . . . . . . • .. . . 42
§ 4.Some
criteria
for con
v
ergence . . . .·. . . . . . . . . . . . . . . . 43 §5.
Differentiation and integration of mathematicalexpecta-tions with respect to a p
a
rameter.. . . . . . . . . . . . 44
V. CONDITIONAL PftOBABU..ITIES AND MATHEMATICAL EXPECTATIONS
§
1. Conditional probabilities . . . . . . . . . . . . . . . •47
§
2. Explanation of a Borel paradox.. . . . . . . . . 50
§
3.
Conditional probabilities with respect to a random vari-able . . . . . . . . . . . 51§
4.
Conditional mathematical expectations. . . . . . . . . . 52../
§1.
§
2.
§
s.§ 4.
§ 5.
VI. INDEPENDENCE; THE LAw OF LARGE NuMBERS Independence . . . lit • , • .. • • •57
Independent ran
d
om variables. .. . . 58
The
Law of Large N urnhers
.. . . . . . . . . . . . . . . . . . . . . 61
Notes on the concept of mathematical expectation . .
. . .
.64
The Strong Law of Large Numbers; convergence of a
series . . . .. . . .. . . . . .. . . . .. . . .. . . . 66
APPENDIX-Zero-or-one Jaw in
the
theory of pr
o
bability
.. . . . 69
BIB.LI.OGRA.PHY . • .. . . . . .. . .. . . . . . . . . • . • .
. . . 73
NOTES TO SUPFLEMENTARY BIBLIOGRAPHY • . • . . . • • . . • , . •
77
Chapter I
ELEMENTARY
THEORY OF PROBABILITY
We define
as elementary theory of proba
bility
that
part
of
t
he
theory
in
whic
hwe
ha
ve
to
deal
with
probabilities
of
only
a
finite number of e
v
en
ts. The theoremswh
ic
h
we derive here can
be applied also
tothe problems connected with an
infinite
num
b
er
of
random events. However, whenthe
la
tte
rare
studied, essen
tially
new
principles
ar
e
used. Therefore
the
on
ly
axiom
of
the
mathematical theory of
probability which
d
eals particularly
wi
t
h
the
case
ofan infinite nu
mbe
rof
randome
ven
ts
is
not
introduced
u
nt
i
l the
beginning
of Chapter
II
(Axiom
VI).
The theory of probability, as
a
mathematicaldi
sc
ip
lin
e
, canand should be
developed
f
r
om
axioms in
e
xactly
the same way asG
eometr
yand Algebra.
This
meansthat
after we have defined
the
e
le
me
n
ts tobe
studied
and
t
h
e
ir basic relations,
and
have
stated
the
axiomsby
whichth
e
se
relations are
tobe
governed,
all
further
e
xpo
si
t
ion
must
be based
exclusively
on
th
ese
axioms,i
n
depe
n
dent
ofth
e
usual concrete meaning of these
el
ement
s and
their
relations.In
accordance with
th
eabove, in §
1t
h
e concept of a
field of
probabilitie8
is defined
as a
systemof sets which satisfies certain
c
o
ndi
ti
o
n
s
.What the elements
of this
set
rep
rese
n
tis
of no
im po
rtan
ce
inthe purely
ma
th
ema
ti
ca
l
deve
l
opme
nt
of
the
th
eory
of p
roba
bi
l
ity
( cf. the
introduction of bas
i
c
geometric concepts
in
the Foundatio'TUJ of Geometry
by Hilbert,
or the
definitions of
groups,
rings
and
fields in
abstractalgebra).
Every
a
xio
matic
(abstract)
theory
adm
its
,as is
well known,
of a
n
unlimited
number of concrete interpretations besides those
from
which
itwas
derived. Thus
we find applications in
fi
eld
sof
s
cie
n
ce
which
ha
ve
no
relation to
the concepts
of random event
and
of
probability inthe precise
meaning
of
these
words.
The
postulational
basis
of
th
e
theory of probability
can
beestablished
by
different
meth
od
sin r
es
p
e
c
t
tothe s
e
le
ction of axiomsas
wel
l as inthe
s
ele
cti
o
nof basic concepts and
relations.
However,
ifour
a
i
m
is
toa
chiev
e
the
utmosts
impl
icityboth
in
2 I. Elementary Theory of Probablllty
the
systemof axioms and
in the further development of the theory,t
he
n thepostulational concepts of
a random eventand
its probability seem the most
suitable. There are other postula tional systems of thet
heory
of probability, particularlyt
hose inwhich the co
n
cept of probability isnot
treated aso
ne of
the basic concepts, butis itself expressed
b
y
means of other coneepts.1However,
in that case, the aim is different,namely, to
tie up as closely as possible the mathematicaltheory w
it
h the empi
rica
ldevelopment of the
theory of
probability.
§
1.
Axiome1Let E be a c
o
llect
i
onof elements
f,
..,, C, . . ., which we
shall eallelementa,ry
ev
e
nt
s
, and it a setof subsets of
E
; theelements of
the set i'f will be called random
events.
I. if
i8a fielda
ofsets.
II.
tJ
contains
the setE.
III. To
each set A in i}
is assigned a non-negativereal number
P(A). This number
P(
A
)
i8ca
l
led the probability
of
the
eventA.
IV. P(E)
equals
1.V.
If
A
andB
have no element in common, thenP(A+B) =
P(A)
+P(B)
A syste
m
ofsets, i},
to
ge
th
er with a definite as
s
ignmen
t
of
numbers P(A), satisfying
Axioms I-V, iscalled a
field of prob
ability.
Our
s
ys
temof Axioms
I-V is conBistent. This is proved bythe
following ex
ampl
e.
LetE
consist of the single element�
and let tf consist of
E
and the null
set 0. P(E) is then set equalto
1 and P(O) equals0.
1 For example, R. von Misea[l]and
[2]
and S. Bernstein [1].1 The reader who wishes from the outset to give a concrete meaning to the
following axioms, is referred to § 2.
• Cf. HAUSDORFF, Mengenlehre, 1927, p. 78. A system of sets is called a field
if the sum, product, and difference of two sets of the system also belong to the
same system. Every non�empty field contains the null set 0. Using Hausdorff's
notation, we designate the product of A and
B
by AB; .the sum br_ A.+ Bin the case where AB = 0; and in the general ease by A+ 8; the difference ofA and B by A-B. The set E-A, which is the complement of A, will be denoted
by A". We shall assume that the reader is familiar with the fundamental rules of operations of sets and their sums, products, and differences. All subsets
§ 2.
The Relation to Experimental Data 3 Our system of axioms is not, however, complete, for invarious
problems
in thet
heo
ryof
probability different fiel
ds of proba
bility
have tobe examined.
The Construction of Fields of
Probability. Thesimplest fi
e
l
ds of probability are constructed as follows. We take
an
arbitrary finiteset
E
={e1• E1,
• . • ,E1J
and an ar
bi
trary
set
{P1• Pa• . • ..•P.t}
of
non-negative numbers
witht
he sum Pt+
P2 + . . . + Pt =1.
li is
taken as the
s
e
t of alls
ubs
ets
in E,
an
dwe put
P{E,., �'-'.
· .,�i.a}
=
p,l +Pit+··· + /Ji,.
In such cases,
PH P2r
• • • ,p,.
arecalled the
probabilities
oft
heelementary events
elr
tz,.
..
' e" or simply eleme
ntar
y probabilities. In this way are derived
all
possiblefinite
fields of
p
r
obab
ility
in which0: consists
of
the setof
all sub
s
ets
of
E. (The
fieldo
fprobability is called
finite
if thes
e
tE is finite.) For
furtherexamples see
C
hap.II,
§
3.
§ 2.
The Relation to Experimental Data�We apply
the
theory of probabilityto the
a
ctu
al
world
ofexper
i
ments
inthe
following manner:1)
Ther
eis
assumed a complex
ofconditions,
e, whichallows
ofa
ny numberof repetitions.
2) We
s
tudy a de
fi
nit
e
set
of events
whichcould
take
placeas
a
result oft
heestablishment
ofthe
con
diti
ons
e. In individualcases
where the conditionsare
realized,
theevents occur, gener
ally, in
different ways. LetE be
theset
of allp
o
ssi
ble variants �1. ��� • • •of
the outcome
of the given events. Someof
th
es
evari
ants mi
g
ht in general not occur. Wei
nc
l
ude
ins
etE all the vari
ant
s
which weregard
a
priori as possible.
3) If
the
variant of thee
v
e
nts
which hasa
ctuall
y occu
r
re
d4 The reader who is interested in the purely mathematical development of
the theory only, need not read this section, sinee the work following it is based only upoJ\ the axioms in
§
1 and makes no use of the present discussion. Here we limit ourselves to a simple explanation of how the axioms of the theory ofprobability arose and disregard the deep philosophical dissertation• on the
concept of probability in the experimental world. In establishing the premises
neceasary for the applicability of tbe theory of probability to the world of actual events, the author has used, in large measure, the work of R. v. Mise&,
I. Elementary Theory of Probability
upon
realization ofconditions
Sbelongs to the set
A(defined
in anyway) , then we
say
that
theevent
A
has taken place.
Example:
Letthe complex
5of conditions be the tossing
ofa
coin two times.
Theset
ofevents mentioned in Paragraph
Z)
con
sists
ofthe fact that at
eacht
oss eithera head or
tailmay come up.
From this
itfollows that
onlyfour different variants (elementary
events) are possible, namely:
HH, HT, TH, TT. If the"event A"
conn
otes the occurrence of a repetition,t
henit will consist of a
happaning
ofeither
ofthe
first orfourth
ofthe
fourelementary
events. In this manner, every event may
be
regarded as
a set of elementaryevents.
4) Under
cert
ai
n conditions,which we shall
notdiscuss
here,we
may assume
that toan
event
A
which may ormay not occur
under conditions
S, isassigned a real
number P(A)
w
hich has
thefollowing
characteristics :(a}
One
can be pr
ac
tic
al
ly certain
that
ifthe
comple
x ofcon
ditions Sis repeated a large number
oftimes,
n, thenif
m
be the
number
of occurrencesof
eventA,
the ratiornjn
will differ
very slightly fromP(A).
(b)
IfP (A) is
very
smaUt onecan
be practica
llycertain
thatwhen conditions
eare realized
onlyonce,
the
event
A
would not
occur at all.
The Empirical Deduction of the
Axioms.
In general, onemay
assumethat
the system a of the
observed eventsA,
B, C, . . . towhich
areassigned definite probabilities, form
a
field containing
as an
e
le
ment
theset
E(Axioms
I,II,
and
thefirst part
ofIII,
postulating the
existence ofprobabilities).
Itis clear that
O<m/nSl
so that
the second part
ofAxiom
Illis
quitenatural.
For theevent E,
m isalways
equalto
n,so
thatit
isnatural
topostulate
P(E)
=1
(Axiom IV).
If,
finally,
A
and
Bare
nonintersecting
(incompatible),then
m ==m1
+m2
where
m, mh m2
ar
erespectively
the number ofexperiments
inwhich the events
A +
B, A,
and
B
occur. From
this itfollows that
It therefore
seems
appropriateto postulate
thatP
(A
+B)
§ 3. Notes on Terminology 5
Rema,rk 1.
If
two separate statements are
each
practically
re
li
ab
le, thenwe
may
say
thatsimultaneously
th
ey
are
b
o
t
h reli ..able, although
t
h
edegree of reliability is somewhat lowered
in
theprocess.
If,
howe
ver
,the number of such
sta
tements is very large,then from
the p
ractic
a
l reliability
of
each,
one cannotdeduce
anything about the
simultaneous correctness ofall of
them. Therefo
refrom
the principle stated
in(a) it
does
not
follow
thatin
avery
large
numb
er of ser
ies
of
n
tests each, ineack
the ratiomjn
will differ onlyslightly
from P(A) .
Remark
2.To
animpossible
event (anempty
set) co
rre..
sponds,
in ac
c
o
rda
nce witho
ur
axioms, thep
rob
abil
i
t
yP(O)
=05,
burt
th
econverse is
not true:P(A)
=0
d
o
es
not imply the
im
possibility of
A.
When P(A)
=0, from principle
(b)
al
lwe can
assert
is
that when
theconditions
5 ar
e realized buto
nce,event
A
is practically impossible. I
t does notat all
ass
er
t
. however. thatin a sufficiently
l
on
gseries of
tes
ts
theevent A
wi
ll not occur. On
the
o
th
erhand, one
can
deduce
from
the prineiple(a)m
e
re
l
y that
when
P(AJ
= 0 and n is very large, theratio
m/
n
will be
verysmall
(it
might, fo
r
example, b
eequal to
1/n).
§ 3.
Notes on Terminolo8YWe
havedefined
th
eobjects of our f
uture study, random
events, as
sets.However,
in thetheory of probability
many
settheoretic
conceptsa
re
designated byo
th
e
rterms. We
shall giveh
er
e a brief listof
suc
hconcepts.
Theory of Sets
1.
A
and
B do
not
intersect, i.e.,AB
=0.
2.
AB
. . . N = 0.3.
AB
. . . N==X.
4.
A
+
B
-f-
. . .+
N =X.
• ��- S 4. Fonnula (8).
Random
Events
1.
Events A and B are
incom
pat
ib
le.2.
Even
tsA, B,
. . . , N areincompatible.
3.
Event
X isdefined
as the si
m
u
l
ta
neous
occurrence of
e
vents A,B�
. . . , N.4.
Event
X isdefined as
theoc
cur
ren
ceof
at least o
neof
6
A.
I. Elementary Theory of Probability
Tkecwy of Sets
5.
The complementary set6.
A=
0.
7.
A =E.
Random
Even.ts5.
The
opposit
e event Aconsisting of
the
non .. oceur ence of eventA.
6.
Event A is impossible.7.
Event
A must occur.8.
The system
91
of the setsAh A2, • • • ,
A,.
formsa
decomposition
of the setE
ifAt
+ Az + ..
. +A.
=E.
8.
E1:Periment
Zconsists
ofdetermining which of
the
events
A1, Ah
. . ., A,.
occurs.We therefore call Ah Az, . . .
,
A,.
thepossible results
of ex·periment
!I.
(This
assumes th
a
t
the
sets A, do not intersect,in
pairs.)
9. B
is
a subset ofA:
Bt=
A.
9.
From the occurrence ofevent
B
follows the inevitable occurrence ofA.
§ 4. Immediate Corollarie8 of the Axioms; Conditional Probabilities; Theorem of Bayes
From A +
A
=E
andthe
AxiomsIV
andV
it follows that
P(A) +
P(A)
= 1P(A)
=1-
P(A)
•Since
E
=0,
then, in particular,P(O) =
0 ..
(1)
(2)
(3)
If
A,
B,
. ..
, N are incompatible, then from AxiomV
follows
t
h
eformula
(the Addition Theorem)P(A
+ B +
. . . +N)
=P{A)
+P(B)
+.I.+P(N).
(4)
If
P(A) >0, then
the quotientp
{B)=
P(AB)A
P(A)
( 5 )
is defined
to be the
conditional probability
of the eventB
underthe
condition
A.§ 4. Immediate Corollaries of the Axioms 7
P(AB)
=P(A) P.t(B).
(6)
And by induction
we
obtain
the ge
n
eral formula
(the Multi-plication Theorem)P
(At
AI·.·
An)
=P (AI) P
A1(A,)
P
A1A1(A
a)
• • •P
A1 At • • • A11-1{A
..), (7)
The following th
e
ore
msfollow easily :
P.c(B)
>0,
P.c(E)
=1,
p
..t(B
+
C)=
p A(B)
+P.a{C).
(8)
(9)
(10)
Comparing
formulae
(8)-(10)
withaxioms 111-V, we find t
hat
the system
�
of
sets
togetherwith the
setfunction
PA(B) (pro
vided
A
isa fixed set),
form afield
of
probability andtherefore,
all the
above
general
theorems
concerningP
(B)
hold true
for
the
conditional probability P
.. (B) (providedthe event
A
isfixed).
It
is
also
easy
to
seethat
P
..4(A)
= 1.From
(6)
and
thea
n
alogou
s formu
l
a P(A
B)
=P (B)
P8
(A}
we
obtain the
important formula:(11)
p
(A)= P(A)_P..c(B)
(12)
B
P(B) ,
which contains,
in essence, theTheorem
of
Bayes.THE THEOREM ON
TOTAL PROBABILITY:
Let
At+
A2 + . . . +A,.=
E (this
assumes that the events
Ah A2,
. . . , A,. are mutuallyexclusive)
and
letX be
arbitraty.Then
P(X)
=I='(A1)P..t.(X)
+P(A1)PA,(X)
+ · · · +P(An) P.A"(X)
. .(13)
Proof:
X =
A1X
+A%X +
..
. +A,.X;
using (4)
wehave
P(X)
=P(A1
X)
-tP(At
X) + ... +
P(A��
X)
and according to
(6) we h
a
ve
atthe
same time P(AX) =P
(
A,)
P
Ar(X)
•THE THEOREM
OFBAYES: Let
At+ A:z
+ . . .+A,.= E
andX be arbitrary,
then
P(Ai)
pAI(X)
}
P..r (A,)=
P(Al)
p ...
(X)+ P(A.) p
At(X)
+ ...
:
P(A,.)
pA.(X)'
<14>
8 I. Elementary Theory of Probability
A1, A2,
• . . ,An are
oftencalled "hypotheses"
and formula (14)is
considered as the probabilityPx
(A,)
of the hypothesisA, after the occurrence of event X.
[P(A,)
then denotes thea
priori
probability ofA,.]
Proof:
From (12) we havep
(A·)
=P(Ai)
p
At(X)
x ' p
(X)
.
To obtain the formula
( 14)
it only remains to substitute for the prob
ab
ilityP (X)
its value derived from ( 13) by applying thetheorem on
total
probability.§ 5. Independence
The concept
of
mutualindependence
of two or more experi·m
e
nts holds, ina
certain sense, a central position
in
the theory of probability. Indeed, as we have already seen, the theory ofprobability can be regarded from the mathematical poi
n
t of view as a special application of the general theory of add
itive set func·tions. One naturally asks, how did it happen that the theory of probability developed into
a
large individual science possessingits
own methods?In order to answer this question, we must point out the spe cialization undergone by general problems in the
theory of
addi tive set functions when they are proposed in the theory ofprobability.
The fact that our additive
s
etfunction P (A)
is non-negative and satisfies thecondition P (E)
=1,
does not in itself cause new difficulties. Random variables (see Chap.III)
from a mathematical point of view represent merely functions measurable with respect to P(A), while their mathematical expectations are abstract Lebesgue integrals. (This analogy was explained fully for the first time in the work of Frechet6.) The mere introduction of the above concepts, therefol'e, would not be sufficient to pro duce a basis for the development of a large new
theory.
Historically,
the independence of experiments and random variables represents the very mathematical concept that has giventhe theory of probability its peculiar stamp.
The
classical work or LaPlace,Poisson,
Tchebychev, Markov, Liapounov, Mises, and§ 5. Independenee 9
Bernstein is actually
dedicated to
the
fu
nd
a
m
e
nt
a
l investigation ofse
ri
es of independent random variables. Though thelatest
dissertations (Markov,
B
e
rn
ste
in
andothers)
fr
e
q
ue
ntly
fail
toas
su
m
e
complete
independence,t
h
ey
nevertheless reveal the necess
ityof introducing
analogous,weaker,
conditions,in
o
rd
e
r
to obtain
sufficiently significant
results(see
in this chapter§ 6,
M
ark
o
vchains ).
We thus see,in
the concept
of
independence, at least the germof the pecul
i
ar type o
fp
r
o
blem
in
pro
bab
ility
theory.
In this book, however, we shall not stressthat
fact,fo
r
here we are
int
er
es
t
ed mainly inthe
logical foundation for the specializedinvestigations of
the th
eo
ry
of probability.In consequence, on
e
of
the most important problems in theph
ilo
so
phy
of the
natural sci
en
ces
is-in ad
diti
onto t
h
e wellknown
one
regardingthe
essence oft
h
e concept of probability
itself-to make
pre
ci
sethe
premises
whichwould
make
it possible
t
oregard any
gi
ve
n
real events as independent. This question,however, is beyond the scope of
this book.
Let
usturn to the
defi
nit
i
o
n ofindependence.
Given n experim
e
nts
11<1>, W<2> � • • • ,91<">, that is,
n
decompositions
of the
basic set
E.
It is the
n possible toa
s
si
gn
r
=r1r2
• • •r" probabilities (in
the general case)Pq,9 • . . . 9,. =
P(A�!)
A�!>
. . .A��>)�
owhich are entirely a
r
bi
t
r
ary
exceptf
o
r the single condition 1
that
(1)
DEFINITION
I.
n
ex
perime
n
ts
�< 1>, U<2>, • • • , !(<•>a
re
called
mutually independent,
if for any
qh qz, . • • , q,. the followingequation
holds
true :P(AmA<2>
,, ,. • . .A<">)
q,. =P(Au>)
P (A<2>)
P(A<"1)
(2)
• q. IJt • • • f• •
' One may construct a field of/robability with arbitrary probabilities sub
ject only to the above-mentione conditions, as follows: E is composed of.,. elements �IU ft • . . 9,. • Let the corresponding elementary probabilities be
Pfh fa • • • ,,. , an<l finally let
A�
be the set of all �q, , ••• • 9• for which10 I. Elementary T�eory of Probability
Among the
r
equations in (2), there are only
r-r.-rz-. . .-r,.+
n-1independent equations8•
THEOREM I. If
n experiments
�:c•), m;<z>, • • •,
I:C•>are mutu
ally independent, then any
m of
them
(m<
n).
�<,�J.�t<'->,
. • .,
2((i,..�
are
also independent11•
In the case of independence
we then
have the equations:
P(A�:�>A��> ... A��>)= P{A�:l,)
P(A�:·>)
. . .P(A�::'>)
(8)
(all
i10 must
be
different.)
DEFINITION
II. n
events
A1,
A2,
• • • ,A,..
are
mutually ifUlepen-
dent,
ifthe
deconpositions
(trials)
E
=
At + A�c(k
= 1,2,
... , n)
are independent.
In
this
case r1 = rz
= . . . = r,.. =2,
r =2"'; therefore,
of the
2"
equations
in(
2) only 21'l
-n
-1
areindependent. The
necessary
and sufficient conditions
for
the independence of the
events
Att Az,
. . ., A,.
are
the
following 2•
-n -
1
equations10 :
(4)
m
= i, 2 • . . .• n,AU
of these equations
are mutually independent.
In
the
c
as
e n.
=2 we
obtain
from (4) only
one condition (22
-2-•
Actually, in the case of independenee, one may choose arbitrarily only r. + r, + . . . + 1,. probabilities p•h == p (Au') so as to comply with the nconditions ' '�
EP:l
= 1.9
Therefore, in the general case1 we have r-1 degrees of freedom, but in the case of .independence only .,., -r ,., + . .. + ,. "-n.
' To prove this it is sufficient to show that from the mutual independence
of n decompositions follows the mutual independence of the first n-1. Let us assume that the equations (2) hold. Then
P{Au'Ar.!l
!h !Ia • • •A1·-••) �P(A111A111
'I•- 1A''"')
= � q, ""' • • • Y ..
'"
-P(A'11)P(A121)
1/1 Y2 " ' 'P(A'"-11) �P(A'"')
f11- =P(.A11')P(A121)
P(A••-I))
I """"" f11 91 'It ' " •
fw-I It
f• Q.E.D.
• SeeS. N. Bernstein [1] p:p. 47-57. However, the reader can easily prove
§ 5. Independence 11
1
=1)
for the
independence oftwo
eventsA
1a
n
dA
:a :P (A1A1)
=P (At ) P (A2) ,
(5 )
The
system
of
equations (2)
re
d
u
ce
s itself, in this case, to three eq
u
at
i
ons
,besides (5) :
P (AtAt)
=·P (At)
P (A2)
P (AtA,)
=P (At)
P (A2)P (AtA.)
=P (At) P (Az)
which obviously
follow from
(
5)
.11It need h
ar
dl
ybe
remarked that from
the
independenceof
theevents A
1,A2,
. . • ,A,.
in.
pairs,
i.e.
from therelations
it
does not at all followthat
whenn
> 2 theseevents
are inde pendent12. (Forthat
we
needthe
e
x
i
ste
nce ofall
eq
u
a
tio
ns
(4) . )
In
int
roduci
ng
the concept of independence, nouse
was made ofconditional
probability.Our aim
has beento
exp
l
ai
n as clearly as possibletin ap
u
r
el
y mathematical
manner, the meaning of thisconcept. Its
applications, however,
generally dependupon
the properties of certain conditionalprobabilities.
If
we ass
u
m
e thatall probabilities
P(Av{i>)
are
p
o
si
ti
v
e, then from the equations (3) itfollows13 that
p ,.tfl) �"·· • • • ..t<'--l)
(A�::->)
=p (A�=)) .
•• " ,. - l
( 6 )
From
the
fact that formulas ( 6)
hold,and
from
theMultiplica
tionTheorem (Formula (7) , § 4), follow the
formulas ( 2) .
We obtain, therefore,THEOREM
II :
A necessarya,nd sufficient condition /or inde
pendence of experiments !J(t), 9£C•>,
• • • , we•>in the case of
poli-u P(At At) -
P(A 1)
- P(A1 A1) =-P(A1) - P(A 1} P(A1) = P(A]){t
-P(.A,)}
-
P(A.J P{A.)
, etc.sa This can be shown by the following simple example (S. N. Bernstein) :
Let set E be composed of four elements l1 , � • • f's , E. ; the corresponding elemen tary probabilities fh, p�, p,, P• are each assumed to be "' and
A
= {Et. ll}.
B - {�t . l3},
C = fet. �c} •
It is easy to compute thatP (A ) = P(B) = P (C) = %,
P (AB) c::::P (BC) = P (AC) = "' = ( �) ',
P(ABC)
= "' + ('h )'.11 To prove it, one must keep in mind the definition of conditional proba
bility ( Formula (5 ) , § 4) and substitute for the probabilities of products the
12 I. Elementary Theory of Probability
tive p1"0babilities
P (A �'l)
isthat the conditional
probability
of
the
results
AqO>
of
experiments
mw un
der the hypothesis
that
several
other
tests
vr<M, ¥J(i.>• � • • • w<i.t>have had definite reBUlts
A
<t:,) A <it> A
<it>A
<f.t)
i
oequal to
t
h
e absolute probability
'• ' till ' r, ' • • • , D r.oP(AaCi>).
On the basis of formulas ( 4 ) we
can p
ro
ve
inan analogous
manner
t
he
following theorem :
THEOREM III.
If all
probabilities
P (A�:) are positive, then a
necessary and sufficient c
o
nd
i
t
ionfor mutual
independence
of
the events
A11 A:�u
• • • ,A
.. isthe satisfaction of the equations
p ..c.,, ..t�s . . . A��
(A,)
=p (.A.)
(7)
for
a,n,ypairwise
d
i
ff
e
r
en
ti1, i2, . . . , i�c, i.
In
th
e
case n =
2
the conditions (7) reduce
to two equations :
p A.
(A.)
=P(A,) ,
I
(8)
pAs (Al)
=p (Al)
•It
is easy to see that
the firstequation in (8) alone is a necessary
a
n
d sufficient condition for the independence
of
A1 and Az pro
vided
P(Al)
>0.
§
6. Conditional Probahilltiea as Random Variables, MarkovChains
Let
I bea decomposition of
the
fundamental
s
et
E :
E
=A 1
+A2
+ . . . +Ar,
and
xa real
function of
the
elementary e
v
en
t
f,
which for every
set AtZ
is
equal to a corresponding c
on
s
ta
n
t
a.q.x
is
then
called a
random variable,
and t
he
sum
E ( x)
= �ct0P(A0)
fis
called
the
mat
h
e
'lfULti
calexpectation
of the variable x.The
theory
of
ra
n
dom variables willbe
developed in Chaps.
III
and
IV.We
shall not limit ourselves there merely to
thoser
and
o
mvari
ables which
can assumeonly
a finite number of different va
lu
es.!'-
randomv
a
ria
bl
e which forevery set
A11
assumes the value
P
...(B) ,
we
shall call
the conditional
pro
ba
b
i
lit
y of the
event B
after the given
experi
ment
!land sh!Ul desig'11Ll.f£
itby P• (B) . T
wo
§ 6. Conditional Probabilities as Random Variables, Markov Chains 13
q = 1, 2 , .. . , ,.t .
Given
any decompositions (experiments)
�c1>, m<z>, • • •,
91<">, wewe shall represent by
�(1)�(2)
• • •
i[(ff)the decomposition of set E into the products
A.,l<t) A�!2> . • •
A o-�!'>
Experiments �(1>, ·2(<2>, • • • , i{Cn)
a
r
e mutually independent whenand
on
ly whenPwu
•(I) • • • 1•• - u(A�1)
= P(A�1)
,k
and
qbeing
arbitrary14•DEFINITION :
The sequence
�[(lJ, �<2>,
• • .,
W(is), • • • formsa Markov chain
if for arbitraryn
and
qP91m
9fltl • • . 9{1• -n(A
�1)
=Pwr• -
11 (A;'1).
Thus, Markov
chains form a natural generalization
of sequences
of mutuallyi
ndepend
e
nt experiments.If
we setPt
...,.(m, n)
= PAf•t(A�111)
m <
n ,"• .
then
the basic formula of the
theory ofM
arkov chains will assumethe
form :
If
we denotethe
matrix
JI:Pq
..g,.(m, n)Jj
writtenas15 :
p(k,n)
=p ( k,m) p (m,n)
k
< m <
n .
·( 1 )
by p
(
m,
n) ,
(
1 )
canbe
k
<
m
<
n.
(2)
u The necessity of these conditions follows from Theorem II, § 5 ; that they are also sufficient follows immediately from the Multiplication Theorem
( Formula (7) of
§
4 ) .u For further development of the theory of Markov chains, see R. v. Mises
pl,
§ 16, and B. HosTINSKY, Methodes generales du oolcu.l des proba.bilites,Chapter II
INFINITE PROBABILITY FIELDS
§ I. Axiom of Continuity
We denote
by � Am,
as
is
customary, the product of
the sets
"'
A.,. (whether finit
e or
infinite in number)a
n
dtheir sum
by<S
Am
•..
Only in the case-of disjoint
sets A,.
is theform
I A,.
used insteadof
6A... Consequently,"'
�A"' = A1 + A1 -f. · · ·,
IA
..= Al + A. + · · ·�
fll
"'
In
all fu
t
ure investigations, we
shall assumethat besides Axioms
I -V,
stilla
nother
holds true :VI.
For a decreasing Bequence of e-ventBof tj, for
which
XIA,.
= o , "tke
following equa,tion holds �lim P (A,.) = 0 .
( 1)
(2)
(3)
In thefuture
we
shall designateby
probability fieldo
n
l
y
afi
e
ldof
probabilityas outlined
in the firstchapter, which also
satisfies
AxiomVI.
The
fie
lds
of probability as
defined in th
efirst
chapter without Axiom VImight
be called generalized fields of probability.If
the
system tr ofsets
is finite, AxiomVI
follows
f
r
om Axioms I - V.For
actually,
in thatcase
there
existonly
a finite numberof different
sets
in th
esequence ( 1) . Let A" be the smallest
among
t
h
e
m,then
all sets-Alofp coincide with A�c and we obtain then§ 1. Axiom of Continuity
Ai
=At+p
= i:lA,.
::::: o ,•
limP (A,.)
=P(o)
= 0 .15
All examples of
finite
fields of probability, in the first chapter,
satisfy,
therefore, Axiom
VI.
The
sy
s
t
em
ofAxioms
I - VI
then
proves to be
comi8tent
and
incomplete.
For infinite fields,
onthe
other hand,.
the Axiom
ofContinuity,
VI, p
r
o
v
ed to be independent of Axioms
I
-V. Since the new axiom
is essential for
i
n
fin
i
tefields of probability only,
itis almost
im
possible
toelucidate its empirical meaning, as has
been
done, for
example, in the case
ofAxioms
I - Vin §
2
of the first
chapter.
For,
in describing
any observable random
process we
canobtain
only fi
nit
e fields
of probability.
In
fi
n
i
tefields
of probability occur
only
as
idealized models of
real
random processes.
We limit our
selves, arbitrarily,
toonly tkos.e models which
s
a.ti
s
f
y
Axiom VI.
Thi
s
limitation has been found expedient in researches of the
most diverse so.rt.
GENERALIZED ADDITION THEOREM :
If A1r A2,
• . .,
A,"
. . .and
A
belong to �, then from
A = l: A.
(4)
fl
foUows the equa.ticm
P(A)
=� P (A,.) .
(5)
tl
Proof :
Let
Then, obviously
and, therefore,
according to
Axiom
VI
lim
P (R.)
=0
" -+ 00 •(6)
On
the
otherhand,
by
the additiontheorem
P (A)
=P (At)
+P (Az)
+ . . . +P (A,.)
+P(R.) .
(7)
From ( 6) and
( 7 ) we immediately
obtain
(5) .
We
have shown, then,
tha,t
theprobability
P
(A)
is a, completely
additive setfunction on iJ.
Conversely,
Ax
i
o
msV
and
VI
16 II. Infinite Probability Fields
any field
\}.
* We canp therefore, define the concept of
afield of
probability in the following
way :
Let
E
b
e
an
ar
bit
ra
ryset,
it a
field of subsets of E, containing E,
a,nd
P ( A ) a non--nega.tive com
..pletely additive
set
function
defined
on
if; the field 3' together
with the set function P (A ) forms a field of
probability.
A COVERING
THEOREM
:If A, Au A2,
•,
• ,A", . . . belong to
3 andA
c 6 .A" ,
(8)
"
then
P (.A)
;:a �
P (A,.) .
(9)
"
Proof :
A =
A
6 (An) = A A1
+
A (A1 - A2 A1)
+A (A3 - A3A1 - A3A 1)
+ · · · , "P (A)
=P (AA1)
+P
{
A (
A�
-A2A1)}
+ · · · �P (A1)
+P(A3)
+ · · · .§
2. Borel Fields of Probability>-\
The
field ij
is called a
Borel field,
if all countable sums
X
A
•of
the
sets
A.
from ij belong to
s:.
Borel fields are also called c'bm
pletely
additive systems
of sets.From
the formula6
A,.
=A1
+(A1 - A1 A1)
+
(A3
-
A3A1 - A3A1)
+ · · ·( 1 )
•
we can deduce that
a
Borel field contains also all the sums
@) A"
II
composed
of
acountable number
of setsA"
belonging
toit. From
the formula
� A.
=E -
(5A,.
(2 )
. "
the same
can
besaid for the product
ofsets.
A field
of probability
is
a Borel field o/ probability
if
the
corresponding
field
if is
aBorel
field. Only
in
the case of Borel
fields
of probability
do
we
obtain full freedom of
action,without
danger of
the
occurrence of events having no probability. We
shall now
prove that
wemay limit ourselves
to the investigationof Borel fields of
probability.
Thiswill follow from
theso-called
extension theorem,
to
which we shallnow
turn.
Given a field of
probability
( ij,
P) .
A
s
is
known1, thereexists
asmallest Borel field
Bit
containing
'fl. And we
have the• See, for example, 0. NtxooYM, Su.,. utte generalisation des intig.,.ales de
M. J. Radon, Fund. Math. v. 15, 1930, p. 136.
§ 2. Borel Fields of Probability 17
EXTENSION THEOREM : It is al·ways possible to extend a non
n
egativ
e completelyadditive
set funution P( A ) ,
defined in if,to all sets
of Bi;
without losing either ofits
properties (non negativeness and complete additivity) and this can be done inonly one way.
The
extended field B� forms with the extended set
function P (A )
afield
of probability( BS,
P) .
This field
of prob
abil
it
y( B\}, P)
weshall call
the Borelextemion
of thefield OJ,
P )
The proof of this theorem, which belongs to
thetheory of
additive set
functions and
�hichsometimes appears in other
forms, can be
given
as follows :
Let
A
beany
subsetof
E ; weshall denote
byP*
(A )
thelower
limit of the sums
for all coverings
A
c <S A,.ft
of the set A by a finite or countable number
ofsets An of
ij.
It is
easy to prove that
P*( A ) is then an outer measure in the
Caratheodory
sense2• In accordance with the Covering Theorem
("§ 1 ) , P* ( A ) coincides with
P (A )for all sets of
ij.
Itcan be
further shown that all sets of rf are measurable
inthe
Caratheodorysense. Since all measurable sets form a Borel field, all sets of BW
are consequently measurable. The set function
P* (A ) is,
therefore, completely additive on
BSo,
and on
Bitwe may
set
P (A )
=p•
(A ) .
We have
thusshown
theexistence of the extension. The unique
ness of
thisextension follows immediately from
theminimal
property
ofthe
field BW.
Remark
: Even ifthe sets (events) A
of it can beinterpreted
as actualand (perhaps only
approximately)observable events,
it
does
not, of course,follow
fromthis
thatthe
setsof the extended
field
B'ij
reasonably admit of
suchan interpretation.
Thus
thereis the possibility that
while afield
ofprobability
(tf, P) may be regarded as the image (idealized, however )
of' CAKATHifiODORY, Vorle8ungm ii.be-r reelle Funktionen, pp. 237-258. ( New