Foundations of the Theory of Probability – Kolmogorov

(1)

FOUNDATI<lNS

OF THE

THEORY OF PROBABILITY

BY

A. N. KOLMOGOROV

Second Encli•h Edition

TRANSLATION EDITED BY NATHAN MORRISON

WITH AN ADDED JUBLIOGRAPHY BY A. T. BHARUCHA-REID

'UNIYERSI'IY OF OR .. �GON

CHELSEA PUBLISHING COMPANY

NEW YORK

_•

(2)

LIBRARY OF CO.NGRESS CATAWGUE CARD NUMBER 56-11512

(3)

EDITOR'S NOTE.

In th

e preparation

of

this

English

translation

of Professor

Kolmogorov's fundamental work, the original German monograph

Grundbegriffe der WahrsckeinLichkeitrecknung

which appeared

i

n

the Ergebnisse Der Mathematik _in

1983,

and als_oa

Russian

translation

by G.

M.

Bavli published in 1936 have been used.

It

is

a plea

s

_ure _to

acknowledge the invaluable assistance of

two

friends and

f

o

rmer colleagues, M_r

s

.

Ida Rhodes and

Mr.

D.

V. Varley, and also

of my

niece,

Gizella Gross.

Thanks are also

_{due to}Mr.

Roy

Kuebler

who

made

available

for comparison purposes h

i

s

independent English translation

of

the

o

_r

i

gin

a

l German monograph.

(4)

(5)

PREFACE

The pur_po

s

_�

of this

mo

_n

ograp

h is

to

give

an

axiomatic

foundation

for

the theory of

probability. The

author set

himself

the task of

putti

ng

in their _{natural place, among the}

gen

_e

_ra_l

notions

_of

_{modern mathematics,}

_the

basic

_{concepts of probability}

t

h

e

o

ry-concept

_s

which

_until

recently

_were

co

n

s

i

dere

_d

to be quite peculiar.

This _task

_{would have been}

_{a rather}_hopeless_one_before

_the

introduction of Lebesg

u

e�s

theories

of me

a

sur

e and integration.

However,

after Lebesgue's

publication of

his

investigations,

the

analogies

be

_t

we

en measure

_of

a

set

and probability

of

an

event,

and

between integral of

a

f

unct

i

_o

n and mathematical expectation

of

a random var

i

able,

became apparent.

These analogies

allowed

of further

_extensions

;

thus, for

example, various properties of

inde

p

endent random

variables

were seen

to be in c

o

m

p

lete

analogy

with the corresponding properties of

_o

r

th

ogo

nal

functions.

But

if

·

pr

ob

a

b

i

l

ity

theory

was

to be based on the above analogies, it

still

_was

ne

c

e

_s

s

a

ry _{to make}

the

_o

rie

s

_of

m

e

a

_s

ure and integra tion independe

n

t of _the

geometric elements

_which

were

in

the

foreground

with

Lebesgue. This

_has

been done by

_Frechet.

..

While a

c

on

ce

p

tion of probability

theory

based on

the

above

gener

_a

l

viewpoints has

been cur

r

e

n

t

for some

_time

a

m

on

g

certain

mathematicians,

_t

h

ere

was

lackin

g

a

complete exposition of the

whole

system, free

of

e

_x

tra

n

eous complications.

( Cf.,

h

o

_w

e

v

e

r,

the

book

by Frechet, [2] in

the

bibliography.)

I

wish to

call

attention

_tot

ho

s

_e

points

of

_t

_he

present exposition

which

are

outsid

e the above-mentioned

ra

ng

e

of ideas

familiar

to

t

h

_e

specialist.

_They

are the

following:

Probability

distributions in

infinite-dimensional spaces (Chapter III, § 4)

; differentiation

·and

integration of mathematical

expectations with

respect

to a

parameter (Chapter IV, § 5); and especi

_al

ly

the th

eory of

condi-tional

probabilities

_and

c

o

n

d

it

ional expectations (Chapter V)

.

It should

be

emphasized that

t

h

_e

se new pr_ob

le

ms

arose, of neces

sity,

from

_{some perfectly}

c

on

crete

physical

problems.1

• Cf., e.g., the paper by M. Leontovich quoted in footnote

6

on p. 46; also the

joint paper by the author and M. Leontovich, Zur Sta.tistik dsr kcmtinuier

lichen Sy8teme und des zeitlichen. Ve-rlaufea der ph1Jsikalischen Vorgiinge.

Phys. Jour. of the USSR, _Vol.3, 1933, pp. 36�63.

(6)

vi Preface

The sixth chapter contains a

survey,

without proofs, of

some results of A.

Khinehine

and the author of the limitations

on

the applicability of the ordinary and of the strong law of large num bers. The bibliography CQntains

some

recent

works

_whichshould

be

of interest from the point of view of the foundations

of

the subject.

I

wisq

to express

my warm

thanks to Mr. Khinchine, who has

read

carefully

the

whole manuscript and propos'ed several improvements.

Kljasma near

Moscow,

Easter

1933.

(7)

Pa,ge

EDITOR·s N 'OTE • .. . • • . • • • .

·

• . . . • • • • • I •

�

• • • • a • • • • • • • • • • • .. •

iii

PREFACE . . .. . . .. . . .. . . .

I. ELEMENTARY THEORY OF _PRoBABILITY

v

§

_{1" Axioms}

.

. . . ..

.

. . .

.

. .

.

. .

.

. . ..

.

.. a • • • • • • • • 2 §

2.

The relation _to

experimental

data . . ...

.

. . ....

:. . . 3

§ 3.

Notes

on terminology. _.• • .

. . . 5

§ 4.

Imm

ed

iate corollaries of the axioms; conditional proba-bilities ; Theorem of Bayes . . . . . . . . . . . . . • . . . . . . . . . . 6 § 5.

Independence

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

§

6. C

onditional

probabilities

as random v

a

ria

b

le

s

;

Markov

chains .

. .. .

.

. . . . . . .. . . . . . 4 • • • • ... • • • • • • • • • • • • • • • • • • • 12

D. INFINITE PRoBABU.ITY FIELDS

§ 1. Axiom

of

C

on_tin

u

i

t

y.

. . . . . . . . . . . . . . . . . . .. . . . . .. . . 14

§ 2.

Borel fields of probability .

.

. .· .

.

. . . . . . . . . . . . . . . . . . 16

'·.• § 3. Exampl

e

s of

infi

n

i

t

e fields of probability..

. . . • . . . 18

�

---m. RANDOM v AJUA.BLEs

§

1.

Probability f

un

ctions . . . . . . . . . . . . . . . . . . . . . . . 21

§

2.

Defin

it

ion of random var

i

ables

and

of distribution func.;. tiona . . .

.

. . . I • • I • • • • • • , • • • • • • • • • • • • 22 § 8. Multi-dimensional distribution functions.

. . . . . . . . . . . . . 24

§ 4.

Probabilities _ininfinite-dimensional

spaces.

.

. . . . . . . . 27

§

5. Equivalent random

variable

s

_;various kinds of

converg-ence . . . . . . . . . . . . . 83

IV. MATHEMATICAL EXPECTATIONS

§ 1.

A

bstract

Lebesgue

integrals . . . . . . . . . . . . . . . 37

§ 2.

· Absolute

and conditional mathematical expectations.

.

. . 39

§ 3. The Tchebycheff ineq

u

alit_y. . . . . . • .

. . . 42

§ 4.

Some

criteri

a

_forco

n

v

ergence . . . .·. . . . . . . . . . . . . . . . 43 §

5.

Differentiation and integration of mathematical

expecta-tions with respect to a p

a

rameter.

. . . . . . . . . . . . 44

(8)

V. CONDITIONAL _{PftOBABU..ITIES}AND _MATHEMATICAL EXPECTATIONS

§

1. Conditional probabilities . . . . . . . . . . . . . . . •

47 §

_2. Explanation of a Borel paradox.

. . . . . . . . . 50

§

3.

Conditional probabilities with respect to a random vari-able . . . . . . . . . . . 51

§

4.

Conditional mathematical expectations. . . . . . . . . . 52

../

§1.

§

2. §

_s.

§ 4.

§ 5.

VI. INDEPENDENCE; _{THE LAw}_OF _LARGE_NuMBERS Independence . . . lit • , • .. • • •

57

Independent ran

d

om variables. .

. . . 58

The

_Lawof _{Large N}_urn

hers

.

. . . . . . . . . . . . . . . . . . . . . 61

Notes on the concept of mathematical expectation . .

. . .

.

64

The Strong Law of _Large Numbers; convergence of a

series . . . .. . . .. . . . . .. . . . .. . . .. . . . 66

APPENDIX-Zero-or-one Jaw in

the

theory of p

r

o

babilit

y

.

. . . . 69

BIB.LI.OGRA.PHY . • .. . . . . .. . .. . . . . . . . . • . • .

. . . 73

NOTES TO SUPFLEMENTARY BIBLIOGRAPHY • . • . . . • • . . • , . •

77

(9)

Chapter I

ELEMENTARY

THEORY OF PROBABILITY

We define

as elementary theory of proba

bil

_ity

_that

_part

_of

t

he

theory

in

wh

ic

h

we

h

a

v

e

to

deal

with

probabilities

of

only

a

finite number of e

v

e

_n

_ts. The theorems

wh

i

c

h

we derive here can

be applied also

to

_{the problems connected with an}

_infinite

n

_um

_b

e

_r

of

random events. However, when

the

la

tt

e

r

are

studied, essen

tially

new

principles

ar

e

used. Therefore

the

on

l

_y

_axiom

_of

_the

mathematical theory of

probability which

d

e_al

_{s particularly}

_w

_i

_t

_h

the

ca_s

_e

of

_{an infinite nu}

m

_be

_r

_of

_random

_e

_ve

_n

_ts

_is

_not

_introduced

u

nt

i

l the

beginning

of Chapter

II

(Axiom

VI).

The theory of probability, as

a

mathematical

di

sc

ip

li

_n

_e

, can

and should be

developed

f

r

om

axioms in

e

xact

ly

the same way as

G

eome

tr

y

and Algebra.

This

means

that

after we have defined

the

e

l

_e

m

_e

_n

_ts _to

_be

_studied

_and

_t

_h

_e

i

_{r basic relations,}

_a

_nd

_have

stated

the

axioms

by

which

th

e

se

relations are

to

be

governed,

all

further

e

x

po

s

i

t

i

on

must

be based

exclusively

on

t

_h

es

e

axioms,

i

n

depe

n

d_e_n

_t

_of

_th

_e

_{usual concrete meaning of these}

_el

e_m

_ent

_{s and}

their

relations.

In

accordance with

t

_h

e

_{above, in §}

₁

_t

_h

_{e concept of a}

_{field of}

probabilitie8

is defined

as a

system

of sets which satisfies certain

c

_o

_ndi

t

_i

_o

_n

_s

_.

_{What the elements}

_{of this}

_set

_rep

_res

_e

_n

_t

_is

_{of no}

im p

o

rt

an

ce

in

the purely

m

_a

_th

_ema

_ti

_ca

_l

_deve

_l

_opme

_n

_t

_of

_the

_th

eo_r

_y

of p

ro

ba

b

i

l

i_t

_y

_{( cf. the}

_{introduction of bas}

_i

_c

_{geometric concepts}

in

the Foundatio'TUJ of Geometry

by Hilbert,

or the

definitions of

groups,

rings

and

fields in

abstract

algebra).

Every

a

xi

o

ma_ti

_c

(abstract)

theory

adm

i

ts

,

as is

well known,

of a

n

unlimited

number of concrete interpretations besides those

from

which

it

_was

_{derived. Thus}

_{we find applications in}

_fi

el

_d

s

_of

s

c

ie

n

c

e

which

ha

v

e

no

relation to

the concepts

of random event

and

of

probability in

the precise

meaning

of

these

words.

The

postulational

basis

of

th

e

theory of probability

can

be

established

by

different

m

eth

o

_d

_s

_{in r}

e

_s

p

e

c

t

to

the s

e

l

_e

cti_o_n _of axioms

as

w

el

l _as_in

_the

_s

e

_le

c

ti

o

n

of basic concepts and

relations.

However,

if

our

a

i

m

is

to

a

c_hie

v

e

the

utmost

_s

im_p

l

ici_ty

both

in

(10)

2 I. Elementary Theory of Probablllty

the

system

of axioms and

in _the further development of the theory,

t

h

e

n the

postulational concepts of

a _random event

and

its probability seem the most

suitable. There are other postula tional systems of the

t

heor

y

of probability, particularly

t

h_ose in

which _theco

n

cept _ofprobability is

not

treated as

o

n

e of

the basic concepts, but

is itself expressed

b

y

means of other coneepts.1

However,

in that case, the aim is different,

namely, to

tie up as closely as possible the mathematical

theory w

i

t

h the emp

i

ric

a

l

development of the

theory of

probability.

§

1.

Axiome1

Let E be a c

o

llec

t

i

on

of elements

f,

..,, C, . . .

, which we

shall eall

elementa,ry

ev

e

n

t

s

, and it a set

of subsets of

E

; the

elements of

the set i'f will be called random

events.

I. if

i8

a fielda

of

sets.

II.

tJ

contains

the set

E.

III. To

each set A in i}

_isassigned a non-negative

real number

P(A). This number

P

(

A

)

_i8

ca

l

_le

d the probability

of

the

_event

A. IV. P(E)

equals

1.

V.

If

A

_and

B

_haveno element in common, then

P(A+B) =

P(A)

+

P(B)

A s_yste

m

of

sets, i},

_t

o

g

e

t

h

er with a definite a

s

ignme

n

t

of

numbers P(A), satisfying

Axioms I-V, is

called a

field of prob

ability.

Our

s

y

s

tem

of Axioms

I-V is conBistent. This is proved by

the

following ex

amp

l

e.

Let

E

consist of the single element

�

and let tf consist o

f

E

and the null

set 0. P(E) is then set equal

to

1 and P(O) equals

0.

1 For example, R. von Misea[l]and

[2]

and S. Bernstein [1].

1 The reader who wishes from the outset to give a concrete meaning to the

following axioms, is referred to § 2.

• Cf. HAUSDORFF, Mengenlehre, 1927, p. 78. A system of sets is called a field

if the sum, product, and difference of two sets of the system also belong to the

same system. Every non�empty field contains the null set 0. Using Hausdorff's

notation, _wedesignate _{the product of A}and

B

by AB; .the sum _{br_}A.+ Bin the case where AB = 0; and in the general ease by A+ 8; the difference of

A and B by A-B. The set E-A, which is the complement of A, will be denoted

by A". We _sha_{ll assume that t}he reader is familiar with the fundamental rules of operations of _setsand their sums, products, and differences. All subsets

(11)

§ 2.

The Relation to Experimental Data 3 Our system of axioms is not, however, _complete,for in

various

problems

in the

t

he

o

ry

of

probability different fie

_l

d

s of proba

bility

have to

be examined.

The Construction of Fields of

Probability. The

_{simplest fi}

_e

_l

_ds of probability are _constructedas follows. We ta

ke

an

arbitrary finite

set

E

=

{e1• E1,

• . • ,

E1J

and an a

r

b

i

trar

y

set

{P1• Pa• . • ..•

P.t}

of

non-negative numbers

with

t

he _sum _Pt

+

_{P2 +}. . . + Pt =

1. li is

taken as the

s

e

t _{of all}

s

ub

s

e

ts

in E,

a

n

d

we put

P{E,., �'-'.

· .,

�i.a}

=

p,l +Pit+··· + /Ji,.

In such cases,

PH P2r

• • • ,

p,.

are

called the

probabilities

of

t

he

elementary events

elr

_tz,

.

' e" or simply elem

e

nta

r

y probabili

ties. In this way are derived

all

possible

finite

field

s of

p

r

o

bab

i

lity

in which

0: consists

of

t_he _set

_of

_alls

_ub

s

_e_t

_s

_of

_{E. (The}

_field

_o

_f

probability is called

finite

if the

s

e

t

E is finite.) For

further

examples see

C

hap.

II,

§

3. § 2.

The Relation to Experimental Data�

We apply

the

theory of probability

to the

a

ct

u

al

world

of

exper

i

ment

s

in

the

following manner:

1)

The

r

e

_is

_{assumed a complex}

_of

_conditions,

_e,_which

_allows

of

a

ny number

of repetitions.

2) We

s

tudy a _d

e

fi

ni

_t

_e

_set

_{of events}

_which

_could

_take

_place

_as

a

result of

t

he

_{establishment}

of

_the

_c_o

_n

_di

_ti

_o_n

_s

_e. _Inindividual

cases

where the conditions

are

realized,

the

events occur, gener

ally, in

different ways. Let

E be

the

set

of all

p

o

ss

i

ble variants �1. �� • • •

of

the outcome

of the given events. Some

of

th

es

e

vari

ants mi

g

ht in general not occur. We

i

n

c

l

ud

e

in

s

et

E all the vari

ant

s

which we

regard

a

priori as possible.

3) If

the

variant of the

_e

_v

_e

nt

s

which has

a

ctual

l

y occ

u

r

e

d

4 The reader who is interested in the purely mathematical development of

the theory only, ne_ed not read this section, sinee the work following it is based only upoJ\ the axioms in

§

1 and makes no use of the present discussion. Here we limit ourselves to a simple explanation of how the axioms of the t_h_eo_ryof

probability arose and disregard the deep philosophical dissertation• on the

concept of probability in the experimental world. In establishing the premises

neceasary for the applicability of tbe theory of probability to the world of actual events, the author has used, in large measure, the work of R. v. Mise&,

(12)

I. Elementary Theory of Probability

upon

realization of

conditions

S

belongs to the set

A

(defined

in any

way) , then we

say

that

the

event

A

has taken place.

Example:

Let

the complex

5

of conditions be the tossing

of

a

coin two times.

The

set

of

events mentioned in Paragraph

Z)

con

sists

of

the fact that at

each

t

oss _either

a head or

tail

may come up.

From this

it

follows that

only

four different variants (elementary

events) are possible, namely:

HH, HT, TH, TT. If the

"event A"

co

nn

otes the occurrence of a repetition,

t

hen

_{it will consist of a}

happaning

of

either

of

the

first or

fourth

of

the

four

elementary

events. In this manner, every event may

be

regarded as

a set of elementary

events.

4) Under

ce

rt

a

i

n conditions,

which we shall

not

discuss

here,

we

may assume

that to

an

event

A

which may or

may not occur

under conditions

S, is

assigned a real

number P

(A)

w

hic

h has

the

following

characteristics :

(a}

One

can be p

r

a

c

ti

c

a

_l

l

y certain

tha

t

if

the

compl

e

x of

con

ditions Sis repeated a large number

of

times,

n, then

if

m

be the

number

of occurrences

of

event

A,

the _ratio

rnjn

will differ

very slightly from

P(A).

(b)

If

P (A) is

very

smaUt one

can

be practic

a

lly

certain

that

when conditions

e

are realized

only

once,

the

event

A

would not

occur at all.

The Empirical Deduction of the

Axioms.

In general, one

may

assume

that

the system a of the

observed events

_A,

_{B, C,}. . . to

which

are

assigned definite probabilities, form

a

field containing

as an

e

l

e

_men

t

the

set

E

(Axioms

I,

II,

and

the

first part

of

III,

postulating the

existence of

probabilities).

It

is clear that

O<m/nSl

so that

the second part

of

Axiom

Ill

is

quite

natural.

For the

event E,

m is

always

equal

to

n,

so

that

it

is

natural

to

postulate

P(E)

=

1 (Axiom IV).

If,

finally,

A

and

B

are

non

intersecting

(incompatible),

then

m ==

m1

+

m2

where

m, mh m2

ar

e

respectively

the number of

experiments

in

which the events

A +

B, A,

and

B

occur. From

this _it

follows that

It therefore

seems

appropriate

to postulate

that

P

(A

+

B)

(13)

§ 3. Notes on Terminology 5

Rema,rk 1.

If

two separate statements are

each

practically

re

l

i

a

b

le, then

we

may

sa

y

that

simultaneously

t

h

e

y

are

b

o

t

h reli ..

able, although

t

h

e

degree of reliability is somewhat lowered

in

the

process.

If,

how

e

v

er

,

the number of such

s

ta

tements is very large,

then from

the p

racti

c

a

l reliability

of

each,

one cannot

deduce

any

thing about the

simultaneous correctness of

all of

them. Theref

o

re

from

the principle stated

in

(a) it

does

not

follow

that

in

a

very

large

num

b

er of s

er

ie

s

of

n

tests each, in

eack

the ratio

mjn

will differ only

slightly

from P

(A) .

Remark

2.

To

an

impossible

event (an

empty

set) c

o

rre

..

sponds,

_ina

c

o

r

da

nce with

o

u

r

axioms, the

p

r

ob

abi

l

i

t

y

P(O)

=

05,

burt

t

h

e

converse is

not true:

P(A)

=

0 d

o

e

s

not imply the

im

possibility of

A. When P(A)

=

0, from principle

(b)

a

l

we can

assert

is

that when

the

conditions

₅a

r

e realized but

o

nce,

event

A

is practically impossible. I

t does not

at all

as

s

e

r

t

. however. that

in a sufficiently

l

o

n

g

series of

te

s

t

s

the

event A

w

i

ll not occur. On

the

o

t

h

er

hand, one

can

deduce

from

the prineiple(a)

m

e

r

e

l

y that

when

P(AJ

= 0 and n is very large, the

ratio

m

/

n

will be

very

small

(it

might, f

o

r

example, b

e

equal to

1/n).

§ 3.

Notes on Terminolo8Y

We

have

defined

t

h

e

objects of our f

utur

e study, random

events, as

sets.

However,

in the

theory of probability

many

set

theoretic

concepts

a

re

designated by

o

th

e

r

terms. We

shall give

h

e

r

e a brief list

of

s

uc

h

concepts.

Theory of Sets

1.

A

and

B do

not

intersect, i.e.,

AB

=

0.

2. AB

. . . N = 0.

3.

AB

. . . N

==X.

4. A

+

B

-f-

. . .

+

N =

X.

• ��- S 4. Fonnula (8).

Random

Events

1. Events A and B are

in

com

pa

t

i

b

le.

2.

Eve

n

ts

A, B,

. . . , N are

incompatible.

3.

Event

X is

defined

as the s

i

m

u

l

ta

neou

s

occurrence of

e

vents A,

B�

. . . , N.

4. Event

X is

defined as

the

oc

cu

r

en

ce

of

at least o

ne

of

(14)

6

A.

I. Elementary Theory of Probability

Tkecwy of Sets

5.

The complementary set

6. A=

0.

7.

A =

E. Random

_Even.ts

5. The

o_ppos_i

t

e event A

consisting of

the

non .. oceur ence of event

A.

6.

Event A is impossible.

7.

Event

A must occur.

8. The system

91

of the sets

Ah A2, • • • ,

A,.

forms

a

de

composition

of the set

E

if

At

+ Az ₊ .

.

. +

A.

=

E.

8. E1:Periment

Z

consists

of

determining which of

the

events

A1, Ah

. . .

, A,.

occurs.

We therefore call Ah Az, . . .

,

A,.

the

possible results

of ex·

periment

!I.

(This

assumes t

h

a

t

_the

sets A, do not intersect,in

pairs.)

9. B

is

a _{subset of}

_A:

_Bt=

_A.

9.

From the occurrence of

event

B

follows the _inevitable occurrence of

A.

§ 4. Immediate Corollarie8 of the Axioms; Conditional Probabilities; Theorem of Bayes

From A +

A

=

E

and

the

Axioms

IV

and

V

_i

t follows that

P(A) +

P(A)

= 1

P(A)

=

1-

P(A)

•

Since

E

=

0,

then, in particular,

P(O) =

0 ..

(1)

(2)

(3)

If

A,

B,

. .

.

, N are incompatible, then from Axiom

V

follows

t

h

e

formula

(the Addition Theorem)

P(A

+ B +

. . . +

N)

=

P{A)

+

P(B)

+.I.+

P(N).

(4)

If

P(A) >0, then

_thequotient

p

{B)=

P(AB)

A

_P(A)

( 5 )

is defined

to be the

conditional probability

of the event

B

under

the

condition

A.

(15)

§ 4. Immediate Corollaries of the Axioms 7

P(AB)

=P(A) P.t(B).

(6)

And by induction

we

obtain

the g

e

n

era

l formula

(the Multi-plication Theorem)

P

(At

AI·.·

An)

=

P (AI) P

A1

(A,)

P

A1A1

(A

a

)

• • •

P

A1 At • • • A11-1

{A

..

), (7)

The following th

e

o_r

e

_ms

follow easily :

P.c(B)

>

0,

P.c(E)

=

1,

p

..t(B

+

C)=

_pA

(B)

+

P.a{C).

(8)

(9)

(10)

Comparing

formulae

(8)-(10)

_with

axioms 111-V, we find t

_ha

t

the system

�

of

sets

together

with the

set

function

_P

A(B) (pro

vided

_A

_is

a fixed set),

_forma

field

of

probability and

therefore,

all the

above

general

theorems

concerning

P

(B)

hold true

for

the

conditional probability P

.. (B) (provided

the event

A

is

fixed).

It

is

also

easy

to

see

that

P

..4(A)

= 1.

From

(6)

and

_the

a

n

a_logo

u

_{s fo}r

mu

l

a P

(A

B)

=

P (B)

P8

(A}

we

obtain the

important formula:

(11)

p

(A)= P(A)_P..c(B)

₍₁₂₎

B

_{P(B) ,}

which contains,

in essence, the

Theorem

of

Bayes.

THE THEOREM ON

TOTAL PROBABILITY:

Let

At+

A2 ₊. . . +

A,.=

E (this

assumes that the events

Ah A2,

. . . , A,. are mutually

exclusive)

and

_let

X be

arbitraty.

Then

P(X)

=

I='(A1)P..t.(X)

+

P(A1)PA,(X)

+ · · · +

P(An) P.A"(X)

. .

(13)

Proof:

X =

A1X

+

A%X +

..

. +

A,.X;

using (4)

we

have

P(X)

=

P(A1

X)

-tP(At

X) + ... +

P(A��

X)

and according to

(6) we h

a

v

e

at

the

same time P(AX) =

P

(

A,

)

P

Ar

(X)

•

THE THEOREM

_OF

BAYES: Let

At+ A:z

+ . . .

+A,.= E

and

X be arbitrary,

then

P(Ai)

_p

AI(X)

}

P..r (A,)=

P(Al)

p ..

.

(X)+ P(A.) p

At(X)

+ ...

:

P(A,.)

p

A.(X)'

<14>

(16)

8 I. Elementary Theory of Probability

A1, A2,

• . . ,

An are

often

called "hypotheses"

and formula (14)

is

considered as the probability

Px

(A,)

of the hypothesis

A, after the occurrence of event X.

[P(A,)

then denotes the

a

priori

probability of

A,.]

Proof:

From (12) we have

p

(A·)

=

P(Ai)

p

At(X)

x ' p

(X)

.

To obtain the formula

( 14)

it only remains to substitute for the pr

ob

a

b

ility

P (X)

its value derived from ( 13) by applying the

theorem on

total

probability.

§ 5. Independence

The co_ncept

of

mutual

independence

of two or more experi·

m

e

nts holds, in

a

certain sense, a central posi

tion

in

the theory of probability. Indeed, as we have already seen, the theory of

probability can be regarded from the mathematical poi

n

t of view as a special application of the general theory of a

dd

_itive set func·

tions. One naturally asks, how did it happen that the theory of probability developed into

a

large individual science possessing

its

own methods?

In order to answer this question, we must point out the spe cialization undergone by general problems in the

theory of

addi tive set functions when they are proposed in the theory of

probability.

The fact that our additive

s

_et

function P (A)

is non-negative and satisfies the

condition P (E)

=

1,

does not in itself cause new difficulties. Random variables (see _Chap.

III)

from a mathe

matical point of view represent merely _funct_ions measurable with respect to P(A), _{while their} mathematical expectations are abstract Lebesgue integrals. (This analogy was explained fully for the first time in the work of Frechet6.) _Themere introduction of the above concepts, therefol'e, would not _besufficient to pro duce a basis for the development of a large new

theory.

Historically,

_the independence of experiments and random variables represents the very mathematical concept that has given

the theory of probability its peculiar stamp.

The

classical work or LaPlace,

Poisson,

Tchebychev, Markov, Liapounov, Mises, and

(17)

§ 5. Independenee 9

Bernstein is actually

dedicated to

the

fu

n

d

a

m

e

n

t

a

l investigation of

se

r

i

es of independent random variables. Though the

latest

dissertations (Markov,

B

e

rn

s

te

i

n

and

others)

fr

e

q

u

e

n

tly

fail

to

as

su

m

e

complete

independence,

t

h

e

y

nevertheless _reveal the neces

s

ity

of introducing

analogous,

weaker,

conditions,

in

o

r

d

e

r

to obtain

sufficiently significant

results

(see

in this chapter

§ 6,

M

a

rk

o

v

chains ).

We thus see,in

the concept

of

independence, at least the germ

of the pecul

i

ar typ

e o

f

p

r

o

bl

em

in

p

ro

ba

b

ilit

y

theory.

In this book, however, we shall not stress

that

fact,

fo

r

here we are

in

t

e

r

e

s

t

ed mainly in

the

logical foundation for the specialized

investigations of

the t

h

e

o

ry

_{of probability.}

In consequence, on

e

of

the most important problems in the

ph

i

lo

so

ph

y

of the

natural s_c

i

e

n

ce

s

is-in a

d

iti

_on

to t

h

e well

known

one

regarding

the

essence of

t

h

e concept of proba_bilit

y

itself-to make

pr

e

c

i

se

the

premises

which

would

make

it possible

t

o

regard any

gi

v

e

n

real events as independent. This question,

however, is beyond the scope of

this book.

Let

us

turn to the

de

fi

ni

t

i

o

n of

independence.

Given n experi

m

e

n

ts

11<1>, W<2> � • • • ,

91<">, that is,

n

decompositions

of the

basic set

E.

It is t

he

n possible to

a

s

i

g

n

r

=

r1r2

• • •r" proba

bilities (in

the general case)

Pq,9 • . . . 9,. =

P(A�!)

A�!>

. . .

A��>)�

o

which are entirely a

r

_b

i

t

r

_a

ry

except

f

o

r the single cond

ition 1

that

(1)

DEFINITION

I. n

ex

p

erime

n

ts

�< 1>, U<2>, • • • , !(<•>

a

re

called

mutually independent,

if for an

y

qh qz, . • • , q,. the following

equation

holds

true :

P(AmA<2>

_,, _{,. • . .}

A<">)

_q,. =

P(Au>)

P (A<2>)

P(A<"1)

(2)

• q. IJt • • • f• •

' One may construct a field of/robability with arbitrary probabilities sub

ject only _{to the above-mentione} conditions, as follows: E is composed of.,. elements _{�IU ft}• . . 9,. • Let the corresponding elementary probabilities be

Pfh fa • • • ,,. , an<l finally let

A�

be the set of all �q, , ••• • 9• for which

(18)

10 I. Elementary T�eory of Probability

Among the

r

equations in (2), there are only

_r-r.-rz-. . .

-r,.+

n-1

independent equations8•

THEOREM I. If

n experiments

�:c•), m;<z>, • • •

,

I:C•>

are mutu

ally independent, then any

m of

them

(m<

n).

�<,�J.

�t<'->,

. • .

,

2((i,..�

are

also independent11•

In the case of independence

we then

have the equations:

P(A�:�>A��> ... A��>)= P{A�:l,)

P(A�:·>)

. _{. .}

P(A�::'>)

(8)

(all

i10 must

be

different.)

DEFINITION

II. n

events

A1,

A2,

• • • ,

A,..

are

mutually ifUlepen-

dent,

if

the

deconpositions

(trials)

E

=

_{At +}A�c

(k

_{= 1,}

2,

... , n)

are independent.

In

this

case r1 = rz

= . . . = r,.. =

2,

r =

2"'; therefore,

of the

2"

equations

in

(

2) only 21'l

-

n

-

1

are

independent. The

necessary

and sufficient conditions

for

the independence of the

events

Att Az,

. . .

, A,.

are

the

following 2•

-

n -

1 equations10 :

(4)

m

= i, 2 • . . .• n,

AU

of these equations

are mutually independent.

In

the

c

a

_s

_{e n.}

₌

_{2 we}

_obtain

_{from (4) only}

_{one condition (22}

-2-•

Actually, in the case of independenee, one may choose arbitrarily only r. + r, + . . . + 1,. probabilities p•h == p (Au') so as to comply with the n

conditions ' '�

EP:l

= 1.

9

Therefore, in the general case1 we have r-1 degrees of freedom, but in the case of .independence only .,., -r ,., + . .. + ,. "-n.

' To prove this it is sufficient to show that from the mutual independence

of n decompositions _followsthe mutual independence of the first n-1. Let us assume that the equations (2) _{hold. Then}

P{Au'Ar.!l

_!h _{!Ia • • •}

A1·-••) �P(A111A111

_{'I•- 1}

_A''"')

= � q, ""' • • • Y ..

'"

-P(A'11)P(A121)

_1/1 _{Y2 " ' '}

P(A'"-11) �P(A'"')

_f11- =

P(.A11')P(A121)

P(A••-I))

I """"" f11 91 'It ' " •

fw-I It

f• _Q.E.D.

• SeeS. N. Bernstein [1] p:p. 47-57. However, the reader can easily prove

(19)

§ _5. _Independence 11

1

=

1)

for the

independence of

two

events

A

1

a

n

d

A

:a :

P (A1A1)

₌

P (At ) P (A2) ,

(5 )

The

system

of

equations (2)

r

e

d

u

c

e

s itself, in this case, to three e

q

u

_a

t

i

ons

_,

besides (5) :

P (AtAt)

=

·P (At)

P (A2)

P (AtA,)

₌

P (At)

P (A2)

P (AtA.)

₌

P (At) P (Az)

which obviously

follow from

(

5)

.11

It need h

_ar

d

l

y

be

remarked that from

_the

independence

of

the

events A

1,

A2,

. . • ,

A,.

in.

pairs,

i.e.

from the

relations

it

does not at all follow

that

when

n

> 2 these

events

are inde pendent12. (For

that

we

need

the

_e

x

i

st

e

nce of

all

e

q

u

a

tio

_n

s

(4) . )

In

int

roduc

i

ng

the concept of independence, no

use

was made of

conditional

probability.

Our aim

has been

to

ex

p

l

a

i

n as clearly as possibletin a

p

u

r

e

_l

y mathematical

manner, the meaning of this

concept. Its

applications, however,

generally depend

upon

the properties of certain conditional

_{probabilities.}

If

we a

ss

u

m

e that

all probabilities

P(Av{i>)

are

p

o

s

i

t

i

v

e, then from the equations (3) it

follows13 that

p ,.tfl) �"·· • • • ..t<'--l)

(A�::->)

=

p (A�=)) .

•• " _{,. - l}

( 6 )

From

the

fact that formulas ( 6)

hold,

and

_from

the

Multiplica

tion

_{Theorem (Formula (7) , § 4), follow the}

formulas ( 2) .

We obtain, therefore,

THEOREM

II :

A _necessary

a,nd sufficient condition /or inde

pendence of experiments !J(t), 9£C•>,

• • • , we•>

in the case of

poli-u P(At At) -

P(A 1)

- P(A1 A1) =-P(A1) - P(A 1} P(A1) = P(A])

{t

-

P(.A,)}

-

P(A.J P{A.)

, etc.

sa This can be shown by the _followingsimple example (S. N. Bernstein) :

Let set E be composed o_{f four elements l1 ,}� • • f's , E. ; the corresponding elemen tary probabilities fh, p�, p,, P• are each assumed _{to be}"' and

A

= {Et. ll}.

B - {�t . l3},

C = fet. �c} •

It is easy _to_{compute that}

P (A ) = P(B) = P (C) = %,

P (AB) _c::::P(BC) _{= P}(AC) = "' = ( �) ',

P(ABC)

= "' + ('h )'.

11 To prove it, one must keep in mind the definition of conditional _proba

bility ( Formula (5 ) , § 4) an_{d substitute}for the probabilities of _productsthe

(20)

12 I. Elementary Theory of Probability

tive p1"0babilities

P (A �'l)

is

that the conditional

probability

of

the

results

AqO>

of

experiments

_mw u

n

de

r the hypothesis

that

several

other

tests

vr<M, ¥J(i.>• � • • • w<i.t>

have had definite reBUlts

A

<t:,) A <it> A

<it>

A

<f.t)

i

o

equal to

t

h

e absolute probability

'• ' till ' _r, ' • • • , _D r.o

P(AaCi>).

On the basis of formulas ( 4 ) we

can p

r

o

v

e

in

an analogous

manner

t

h

e

following theorem :

THEOREM III.

If all

probabilities

P (A�:) are positive, then a

necessary and sufficient c

o

nd

i

t

ion

for mutual

independence

of

the events

A11 A:�u

• • • ,

A

.. is

the satisfaction of the equations

p ..c.,, ..t�s . . . A��

(A,)

=

p (.A.)

(7)

for

a,n,y

pairwise

d

i

ff

e

r

en

t

i1, i2, . . . , i�c, i.

In

th

e

case n =

₂

_{the conditions (7) reduce}

_{to t}

_{wo equations :}

p A.

(A.)

=

P(A,) ,

I

(8)

pAs (Al)

=

p (Al)

•

It

is easy to see that

_{the first}

equation in (8) alone is a necessary

a

n

d sufficient condition for the independence

of

A1 and Az pro

vided

P(Al)

>

0. §

_6. Conditional Probahilltiea as Random Variables, Markov

_Chains

Let

I be

a decomposition of

the

fundamental

s

e

t

E :

E

=

A 1

+

A2

+ . . . +

Ar,

and

x

a real

function of

the

elementary e

v

e

n

t

f,

which for every

set AtZ

is

equal to a corresponding c

on

s

t

a

n

t

a.q.

x

is

then

called a

random variable,

and t

h

e

sum

E ( x)

= �ct0P(A0)

f

is

called

the

ma

t

h

e

'lfULt

i

cal

expectation

of the variable x.

The

theory

of

r

a

n

d_{om variables will}

be

developed in Chaps.

III

_a

nd

IV.

We

shall not limit ourselves there merely to

_those

r

an

d

o

m

vari

ables which

can assume

only

a finite number of different v

a

l

u

es.

!'-

random

v

a

r

ia

_b

l

_{e which for}

every set

A11

assumes the value

P

...

(B) ,

we

shall call

the conditional

pr

o

b

a

b

i

li

t

y of the

event B

after the given

experi

men

t

!l

and sh!Ul desig'11Ll.f£

it

by P• (B) . T

w

o

(21)

§ 6. Con_{ditional Probabilities}as Random Variables, Markov Chains ₁₃

q = 1, 2 , .. . , ,.t .

Given

any decompositions (experiments)

�c1>, m<z>, • • •

,

91<">, we

we shall represent by

�(1)�(2)

• • •

i[(ff)

the decomposition of set E into the products

A.,l<t) A�!2> . • •

A o-�!'>

Experiments �(1>, ·2(<2>, • • • , i{Cn)

a

r

e mutually independent when

and

_o

_n

ly _when

Pwu

•(I) • • • 1•• - u

(A�1)

= P

(A�1)

,

k

and

q

being

arbitrary14•

DEFINITION :

The sequence

�[(lJ, �<2>,

• • .

,

W(is), • • • forms

a Markov chain

if for arbitrary

n

and

q

P91m

9fltl • • . 9{1• -n

(A

�1)

=

Pwr• -

11 (A;'1).

Thus, Markov

chains form a natural generalization

of se

quences

of mutually

i

ndepe

nd

e

n_texperiments.

If

we set

Pt

...

,.(m, n)

= PAf•t

(A�111)

m <

_n _,

"• .

then

the basic formula of th

_e

theory of

M

arkov chains will assume

the

form :

If

we denote

the

matrix

JI:Pq

_..

g,.(m, n)Jj

written

as15 :

p(k,n)

=

p ( k,m) p (m,n)

k

< m <

n .

·

( 1 )

by p

(

m,

n) ,

(

1 )

can

be

k

<

m

<

n.

(2)

u The necessity of these conditions follows from Theorem II, § 5 ; that they are also sufficient follows immediately from the Multiplication Theorem

( Formula (7) _of

§

4 ) .

u For further development of the theory of Markov chains, see R. v. Mises

pl,

§ 16, and B. HosTINSKY, Methodes generales du oolcu.l _{des proba.bilites,}

(22)

Chapter II

INFINITE PROBABILITY FIELDS

§ I. Axiom of Continuity

We denote

_{by � Am,}

_as

_is

_{customary, the product of}

_{the sets}

"'

A.,. (whether finit

e or

infinite in number)

a

n

d

_{their sum}

_b_y

<S

Am

•

..

Only in the case-of disjoint

sets A,.

is the

form

I A,.

used _instead

of

6A... Consequently,

"'

�A"' = A1 + A1 -f. · · ·,

IA

..

= Al + A. + · · ·�

fll

"'

In

all _f

_u

_t

_ur

_{e investigations, we}

shall _assume

_{that besides Axioms}

I -

V,

still

a

nothe

r

holds true :

VI.

For a decreasing Bequence of e-ventB

of tj, for

which

XIA,.

= o , "

tke

following equa,tion holds �

lim P (A,.) = 0 .

( 1)

(2)

(3)

In the

future

we

shall designate

by

probability field

o

n

l

y

a

fi

e

ld

_of

_probability

_{as outlined}

_{in the first}

_{chapter, which also}

satisfies

Axiom

VI. The

fi

e

l_d

_s

_{of probability as}

_defined_in_t

_h

_e

_first

chapter without Axiom VI

might

be called generalized fields of probability.

If

the

system tr of

sets

is finite, Axiom

VI

follows

f

r

om Axioms I - V.

For

actually,

in that

case

there

exist

only

a finite number

of different

sets

in t

_h

_e

_{sequence ( 1) . Let A" be the smallest}

among

t

h

e

m,

then

all sets-Alofp coincide with A�c and we obtain then

(23)

§ 1. Axiom of Continuity

Ai

=

At+p

= i:l

A,.

::::: o ,

•

limP (A,.)

=

P(o)

= 0 .

15

All examples of

finite

fields of probability, in the first chapter,

satisfy,

therefore, Axiom

VI. The

sy

s

t

em

of

Axioms

I - VI

then

proves to be

comi8tent

and

incomplete.

For infinite fields,

on

the

other hand,.

the Axiom

of

Continuity,

VI, p

r

o

v

ed to be independent of Axioms

I

-

V. Since the new axiom

is essential for

i

n

fi

n

i

te

fields of probability only,

it

is almost

im

possible

to

elucidate its empirical meaning, as has

been

done, for

example, in the case

of

Axioms

I - V

in §

2 of the first

chapter.

For,

in describing

any observable random

process we

can

obtain

only fi

ni

t

e fields

of probability.

I

n

fi

n

i

te

fields

of probability occur

only

as

idealized models of

real

random processes.

We limit our

selves, arbitrarily,

to

only tkos.e models which

s

a.

ti

s

f

y

Axiom VI.

Thi

s

limitation has been found expedient in researches of the

most diverse so.rt.

GENERALIZED ADDITION THEOREM :

If A1r A2,

• . .

,

A,"

. . .

and

A

belong to �, then from

A = l: A.

₍₄₎

fl

foUows the equa.ticm

P(A)

=

� P (A,.) .

(5)

tl

Proof :

Let

Then, obviously

and, therefore,

according to

Axiom

VI

lim

P (R.)

=

0

" -+ 00 •

(6)

On

the

other

hand,

by

the addition

theorem

P (A)

=

P (At)

+

P (Az)

+ . . . +

P (A,.)

+

P(R.) .

(7)

From ( 6) and

( 7 ) we immediately

obtain

(5) .

We

have shown, then,

tha,t

the

probability

P

(A)

is a, com

pletely

additive set

function on iJ.

Conversely,

Ax

i

o

ms

V

and

VI

(24)

16 II. Infinite Probability Fields

any field

\}.

_{* We canp therefore, define the concept of}

a

_{field of}

probability in the following

way :

Let

E

b

e

an

a

r

bit

r

a

ry

set,

it a

field of subsets of E, containing E,

a,nd

P ( A ) a non--nega.tive com

..

pletely additive

set

function

defined

on

if; the field 3' together

with the set function P (A ) forms a field of

probability.

A _COVERING

THEOREM

:

If A, Au A2,

•

,

• ,

A", . . . belong to

3 and

A

c 6 .A" ,

(8)

"

then

P (.A)

;:a �

P (A,.) .

₍₉₎

"

Proof :

A =

A

6 (An) = A A1

+

_{A (A1 - A2 A1)}

₊

_{A (A3 - A3A1 - A3A 1)}

₊· · · , "

P (A)

=

P (AA1)

+

P

{

A (

A

�

-

A2A1)}

+ · · · �

P (A1)

+

P(A3)

+ · · · .

§

2. Borel Fields of Probability

>-\

The

field ij

_{is called a}

Borel field,

_{if all countable sums}

X

_A

•

of

the

_sets

A. _{from ij belong to}

s:.

_{Borel fields are also called c'bm}

pletely

_{additive systems}

of sets.

From

the formula

6

A,.

=

A1

+

(A1 - A1 A1)

+

(A3

-

A3A1 - A3A1)

+ · · ·

( 1 )

•

we can deduce that

a

_{Borel field contains also all the sums}

@) A"

II

composed

_of

a

_{countable number}

_{of sets}

A"

_belonging

_to

_{it. From}

the formula

� A.

=

E -

(5

A,.

(2 )

. "

the same

_can

be

_{said for the product}

of

_sets.

A field

of probability

is

a Borel field o/ probability

if

the

corresponding

field

if is

a

Borel

field. Only

in

_{the case of Borel}

fields

of probability

_do

we

_{obtain full freedom of}

action,

without

danger of

the

_{occurrence of events having no probability. We}

shall now

prove that

we

_{may limit ourselves}

to the investigation

of Borel fields of

probability.

This

_{will follow from}

_the

_so-called

extension theorem,

to

which we shall

_now

turn.

Given a field of

probability

( ij,

P) .

A

s

is

known1, there

exists

a

_{smallest Borel field}

Bit

_containing

'fl. And we

have the

• See, for example, 0. NtxooYM, Su.,. utte generalisation des intig.,.ales de

M. J. Radon, Fund. Math. v. 15, 1930, p. 136.

(25)

§ 2. Borel Fields of Probability 17

EXTENSION THEOREM _:It is al·ways possible to extend a non

n

egati

v

e completely

additive

set funution P

( A ) ,

defined in if,

to all sets

of Bi;

without losing either of

its

properties (non negativeness and complete additivity) and this can be done in

only one way.

The

extended field B� forms with the extended set

_func

tion P (A )

a

field

of probability

( BS,

P) .

This field

_{of p}

rob

a_bi

l

_i

t

_y

( B\}, P)

we

shall call

the Borel

extemion

of the

field OJ,

P )

The proof of this theorem, which belongs to

the

theory of

additive set

functions and

�hich

sometimes appears in other

forms, can be

given

as follows :

Let

A

_be

any

subset

of

_{E ;}we

shall denote

by

P*

(A )

the

lower

limit of the sums

for all coverings

A

c <S A,.

ft

of the set A by a finite or countable number

of

sets An of

ij.

It is

easy to prove that

P*

( A ) is then an outer measure in the

Caratheodory

sense2• In accordance with the Covering Theorem

("§ 1 ) , P* ( A ) coincides with

P (A )

for all sets of

ij.

It

can be

fur

ther shown that all sets of rf are measurable

in

the

Caratheodory

sense. Since all measurable sets form a Borel field, all sets of BW

are consequently measurable. The set function

P* (A ) i

s,

there

fore, completely additive on

BSo,

and on

Bit

we may

set

P (A )

=

p•

(A ) .

We have

thus

shown

the

existence of the extension. The unique

ness of

this

extension follows immediately from

the

minimal

property

of

the

field BW.

Remark

: Even if

the sets (events) A

of it can be

interpreted

as actual

and (perhaps only

approximately)

observable events,

it

does

not, of _course,

follow

from

this

that

the

_sets

of the extended

field

B'ij

reasonably admit of

such

an interpretation.

Thus

there

is the possibility that

while a

field

of

probability

(tf, P) may be regarded as the image (idealized, however )

of

' CAKATHifiODORY, Vorle8ungm _ii.be-r reelle Funktionen, pp. 237-258. ( New