Applied Mathematics Letters
PERGAMON Applied Mathematics Letters 12 (1999) 31-36
T e s t i n g S t a t i o n a r y D i s t r i b u t i o n s of M a r k o v C h a i n s B a s e d on R a o ' s D i v e r g e n c e
M. C.
PARDOUniversity School of Statistics
Cornplutense University of Madrid, 28040-Madrid, Spain (Received and accepted January 1998)
Communicated by R. Ahlswede
Abstract--Statistical inference problems such as the estimation of parameters and testing com- posite hypothesis about stationary distributions in the set of states of Markov chains are solved. Both, the estimator and the statistic proposed are based on Rao's divergence. T h e asymptotic properties of the estimator and the critical values of asymptotically -/-level tests are obtained. ~) 1998 Elsevier Science Ltd. All rights reserved.
K e y w o r d s m M a r k o v chain, Rao's divergence, Minimum R6-divergence estimate, Goodness-of-fit tests, Re-statistic.
1. I N T R O D U C T I O N
We consider a stationary irreducible aperiodic Markov chain X = ( X 0 , X l , . . . ) with a state space { 1 , . . . , m}. B y P = (Pij)i,"~=l, we denote the m a t r i x of transition probabilities of this chain and by p = ( P x , . . . , Pro) t the stationary distribution, i.e., solution of the equation p = p t p . We assume t h a t P is from the class P of irreducible aperiodic stochastic matrices with one ergodic class. The irreducibility means t h a t there are no transient states, i.e., no 1 < i < m with pi = 0. Therefore, p belongs to the set A m = {(PI,.. ,Pro) t I • )-~i=lPi m = 1,pi ~> 0, i = 1 , . . . , m } .
To solve the problem of testing the hypothesis H0 : p = lr0 E T where T C Am is simply an one point set, Tavard and Altham [1] found the asymptotic distribution of the Pearson's chi-square statistic for dependent data. This statistic measures the discrepancy between the observed pro- portions and the hypothesized proportions. If the discrepancy is "too large", the null hypothesis is rejected. T h e key is the choice of a good test statistic to measure the discrepancy between the observed proportions and the hypothesized proportions. Every divergence measures this discrep- ancy. In fact, Mendndez et al. [2] proposed a family of statistics based on Csisz~r divergence [3]
to solve this problem and Pardo [4] a family based on B u r b e a and Rao's divergence [5]. This last family is defined as
Re ~ n , r0) = i ~ x'= ¢ . i +2 r0i - 2 i=l ¢(Pni) + i=l ¢(71"0/ '
(1)
This work was supported by Grants DGICYT PB96-0635 and PR156/97-7159.
0893-9659/98/S--see front matter © 1998 Elsevier Science Ltd. All rights reserved.
PII: S0893-9659(98)00122°0 Typeset by ~A4S-I~_~X
32 M . C . PARDO
where ¢ : (0, cx)) --, R is a continuous concave function, ¢(0) = lime(t) e ( - c o , oo] and
1 " I(~)(xk), I(~)(xk)
is the observed cell frequencies in the d a t a Xn = ( X 1 , . . . ,
Xn)
being I the indicator function.This family has also been used by Pardo [6,7] for testing goodness-of-fit for independent ob- servations under classical (fixed-cells) and sparseness assumptions, respectively. Some properties of this family of divergences can be seen in Burbea and Rao [5] and Pardo and Vajda [8].
Otherwise, the null hypothesis is called composite and specifies lr to be a function of some fewer number of unknown parameters (i.e., ~r lies in the subset T of Am) which need to be estimated from the experimental d a t a Xn. Mendndez et al. [9] study a family based on Csisz~r's divergence under composite hypothesis.
In Section 3, we consider the composite hypothesis Ho : p = 7to, where r0 = q(8) = ( q 1 ( 8 ) , . . . , qra(8)) t E T C Am being 8 = ( 8 1 , . . . , 8 s ) t E 0 C_ R 8 the unspecified parameters vector. For every parameter 8 E O, we denote by P s the sets of all matrices P E P such t h a t their stationary distribution p coincides with q(8). This goodness-of-fit test requires us to estimate the unspecified parameters, i.e., to choose one value q(0) E T t h a t is "most consistent" with the d a t a Xn = ( X x , . . . , X n ) about the states. This last problem is solved in Section 2.
2. T H E M I N I M U M R e - D I V E R G E N C E E S T I M A T O R
In this section, we study the estimation problem. Throughout, we assume t h a t the true chain parameter is 8 ° E O. This means t h a t the true chain distribution is specified by the initial distribution q(8 °) and by a transition matrix P(8 °) E P0o. The most well-known method to choose q(~) consists of estimating 8 by maximum likelihood, but another sensible way to estimate 7r0 is to choose the q(0) E T t h a t it is closed to Pn with respect to the measure R¢(~n, q(8)). This leads to the minimum R~-divergence estimate defined as a ~'¢ E e t h a t verifies
Re (Pn,q
(0"¢)) = inf R~ (/~n,q(8)) • BEOLet us introduce the following regularity conditions before studying the asymptotic properties of this estimator:
(A1) q : O --* A m is continuously differentiable in a neighborhood of 8 ° and q(8) - q (8 °) = Jo (8 - 8 °) ÷ o (II e - 8 ° II), for 8 -~ 8 %
where Jo =
J(8 °)
= (Jjr(8°)) is the Jacobian matrix being Jjr(o) = aq#(o)aSr ;
(A2) A~Ao is positive definite for
Ao -- diag (x/-¢" (ql (8°)), • • •, ~/-¢"
(qm
(8°))) Jo.Hereafter, we consider the matrix
Bo: diag (q (e0)))O0di (q
Testing Stationary Distributions 33 where
fro = DoCo + CtDo - Do - q(O°)q(O°) t,
being Do = diag(q(0°)) and Co = (diag(1) -p(o o) +
lq(0O)t)-1, with 1 the column vector of m units, is the asymptotic covariance matrix of the asymptotically normal zero mean random vector(P'nl - ql (/90), ...
, P n m - qm
(/~o)) (c.f. [10] or [1, equation (2.2)]). Put for brevityA0 = A0 (A~Ao) -x , E0 = AoA~.
The following theorem summarizes the properties of minimum Re-divergence estimators of parameters of stationary distributions of Markov chains. Other similar results for maximum likelihood and other estimators with independent observations can be seen in [11-15].
THEOREM 1. Let ¢ : (0, ~ ) --* R be a
twice continuously differentiable concave function. Under
theabove regularity conditions
and assumingthat the function q : {9 ~ A m has continuous second partial derivatives in a neighborhood of 8 °, we have
that~V = 8 ° + A~ diag ( x / - ¢ " (q (80))) (ff~ - q (/9°)) +o ( 1 1 ~ - q (8°) II), where 0¢ is
unique
in aneighborhood
of 8 °.PROOF. From the proof of Theorem 1 of [15], there exits a m-dimensional neighborhood Uo of q(8O) in
R m
and a unique, continuously differentiable functionO: Uo ~ R s
such thatO R , ( P , O ( P ) )
= 0 ,j = l , . . . , s
and
O(P) = 8 ° + ( A ( O ° ) t A ( O ° ) ) -1 A ( O ° ) t d i a g ( v / - ¢ " ( q ( O O ) ) ) (P-q(O°))+ o ( l l P - q ( 0 ° ) l t ) ,
for all P E U0. Now then by the strong law of large numbers holding for the chains under consideration (cf. [10]) ff c.% q(/9o) ' so Pn E U0, and consequently, ~(Pn) is the minimum Re-divergence estimator, ~V, that satisfies the following:o~ ( ~ ) = 8 0 + (A (80) ` A (8 ° ) ) - I A (80) `
d i a g ( v / - ¢ " ( q ( / ? ° ) ) )
(~-q(O°))+o(ll~-q(O°)H). !
THEOREM 2. Under
Theorem 1 conditions, we have that
C a) v~(~¢ - 0 °) ~ N(0, A~BoAo);(b) v~(q(~¢) -
q( O°) ) ,,~ Y ( O, diag( v / - ¢ " ( q( O° ) ) ) EtoBoEodiag( v / - ¢ " ( q( O° ) ) ) ).
PROOF.
(a) From above, we know that
v ~ ( ~ - q (0°)) ' c.s.Y (0, ~o),
and consequently,vrnA~diag ( ~ / - ¢ ' ' ( q ( 0 ° ) ) )
( P n - q ( O ° ) ) n--,ooL
N(0, E),34 M . C . PARDO
where
---- A~diag ( % / - ¢ "
(q(O0)))~0
diag ( X / - ¢ "(q(O°))) AO.
So the result follows from Theorem 1.
(b) By (A1)
o°_oo )
Therefore,
R E M A R K i. The matrix no, and consequently the matrices Bo, Ao, and Zo figuring in Theorems 1 and 2, are known only if
P(O °)
6 P0o is specified. If this is not the case and the values of these matrices are needed to obtain confidence intervals or critical regions of statistical tests, then we can estimate the matrices Bo, Ao, and ~,o consistently by replacing the unknown elementspij(O °)
of
P(8 °)
in flo by the relative frequenciesI(id)
(Xk-1, Xk)
P n i j : k=2 n
E *(~) (xk_ 1)
k=2
as consistent estimates of elements
po(O °)
of the matrixP(O °)
(c.f. [10]).3. C O M P O S I T E N U L L H Y P O T H E S I S
In this section, we consider statistical tests of composite hypothesis introduced in Section 1 using the Re-divergence statistics (1). Assumptions (A1) and (A2) of Section 2 are supposed to be fulfilled.
First, it is necessary to obtain the asymptotic distribution of
R¢(~n, q(O))
under H0, where ion =(Pnl,... ,Prim) t
is the relative frequencies observed in the data Sn andq(O) = (ql(O),..., qm(O)) t
being 0 the maximum likelihood or minimum Re-divergence estimator.
THEOREM 3. Let ~ : (0, co) --* R be a twice continuously differentiable concave function. Let Pn be the relative frequencies vector, q : O --* Am a function with continuous second partial deriva- tives in a neighborhood of 0 ° and ~¢. = q(0¢.), then we have that
m
n - - - ~ O O
i=1
where the X 2 are independents and the Pi are the eigenvalues of the matrix
Lo = diag (-~" (q (o°))) r~
being
PROOF. By Lemma 1 in [8], on being Pn and ~'¢. V~-consistent estimates, we have that 8 ~ R , ( ~ . , ~ , . ) ~ ~ ( ~ . - q , . ) dlag ^ ~ " ( - ¢ " (q (0°))) ( ~ . ~ , . ) .
From Theorem 2
vfn (~',.-q(O°)) = x/'nJoA~ diag (X/-¢*" (q (0°))) (Pn - q(O°)),
Testing Stationary Distributions 35 8o
v < - ) = v < - q (O°) ) + (q (O °) - )
vCn(I - J o A t o d i a g ( x / - ¢ " ( q ( O ° ) ) ) ) ( ~ n - q ( O ° ) ) .
Consistently,v ~ ( P n - q'¢') L N ( 0 , E1)
so
8nR¢(~n,
~'¢*) is a s y m p t o t i c a l l y d i s t r i b u t e d as )-~m__ iP~X~
w h e r e t h e X 2 are i n d e p e n d e n t s a n d t h e p~ are t h e eigenvalues of t h e m a t r i x L0.REMARK 2. Small values of T =
8nR~(~n,~¢.)
s u p p o r t H0 b u t large values are not. Hence for large n, H o s h o u l d be rejected a t a level 7 if T > t~ where t~ is t h e u p p e r ~-quantile of t h e d i s t r i b u t i o n of ~-~im 1PIXY.
T h i s quantile can be a p p r o x i m a t e d b y t h e c o r r e s p o n d i n g quantile of t h e d i s t r i b u t i o n of AX2m w h e r e A is d e t e r m i n e d so t h a t ~-~m=lP~X~
a n d AX 2 have t h e s a m e e x p e c t e d values, t h a t isA = (~im=l p j m ) .
See, for instance, S o t zet al.
[16] a n d R a o a n d Scott [17].In this case, t~ is a p p r o x i m a t e d b y A t i m e s t h e u p p e r "y-quantile of t h e chi-square d i s t r i b u t i o n w i t h m degrees of freedom, t h a t is t~ =
AXm,~.
2 However, t h e variance of AX2m is smaller t h a nm 2
or equal t o t h e v a r i a n c e of ~-~i=l
P~X1
w i t h equality if a n d only if all eigenvalues p~ are equal.Following S a t t e r t h w a i t e [18] or Scheffd [19], we can a p p r o x i m a t e t h e d i s t r i b u t i o n of ~ - ~ 1
P~X~
by t h e d i s t r i b u t i o n A(1 +
a2)x2v
w h e r e a a n d v are d e t e r m i n e d so t h a t t h e two d i s t r i b u t i o n s have the s a m e e x p e c t e d value a n d t h e s a m e variance. T h i s leads tov - a n d a 2 = i----1
i=l or e q u i v a l e n t l y
(tr (L°))2 A(1 + a 2) t r (L2) u = t r (L 2) a n d = t r (Lo---ff"
I n this case, we consider t~ = A(1 +
a2)x~,.r
REMARK 3. T h e eigenvalues P x , . . . ,
Pm
d e p e n d n o t o n l y on t h e u n k n o w n chain t r a n s i t i o n m a - t r i xP(O°),
b u t also on t h e u n k n o w n s t a t i o n a r y d i s t r i b u t i o np(O°).
R e p l a c i n g t h e m a t r i x b y t h e consistent e s t i m a t ePno
defined in R e m a r k 1 a n dp(O °)
b y t h e consistent e s t i m a t e ff~, we o b t a i n an e s t i m a t e L n o f t h e m a t r i x Lo. Similarly as in R e m a r k 1, we c a n a r g u e t h a t t h e eigenvaluesP a l , . . . , Prim
of L n are consistent e s t i m a t e s of t h e eigenvalues P l , . • . , pro. T h u s , t h e critical values are o b t a i n e d b y replacing t h e u n k n o w n eigenvalues Pl . . . pm b y their e s t i m a t e s p ~ l , . . . , p~m.R E F E R E N C E S
1. S. Tavar6 and P.M.E. Altham, Serial dependence of observations leading to contingency tables, and corrections to chi-squared statistics,
Biometrika "tO,
139-144, (1983).2. M.L. Mendndez, D. Morales, L. Pardo and I. Vajda, Testing in stationary models based on f-divergences of observed and theoretical frequencies,
Kybernetika
33 (5), 465-475, (1997).3. I. Csisz~ir, Eine Inforrnationtheoretische Ungleichung und ihre Anwendung auf den Bewis der Ergodizit~it von Maxkhoffschen Ketten,
Publ. Math. Inst. Hungar. Acad. Sci.
Ser. A, 8, 85-108, (1963).4. M.C. Paxdo, Goodness-of-fit tests for stationary distributions of Markov chains based on Rao's divergence,
Information Sciences,
(1998).5. J. Burbea and C.R. Rao, On the convexity of some divergence measures based on entropy functions,
IEEE Transactions on Information Theory
28, 489--495, (1982).6. M.C. Pardo, On Burbea-Rao divergences baaed goodne~-of-fit tests for multinomial models (to appear).
7. M.C. Pardo, Goodness-of-fit tests based on Rao's divergence under sparseness assumptions (to appear).
8. M.C. Paxdo and I. Vajda, About distances of discrete distributions satisfying the data proccesing theorem of information theory,
Trans. IEEE on Inform. Theory
43 (4), 1288-1293, (1997).36 M . C . P A R D O
9. M.L. Mendndes, D. Morales, L. Pardo and I. Vajda, Inference about stationary distributions of Markov chains based on divergences with observed frequencies, Probability and Mathematical Statistics (to appear).
10. P. Billingsley, Statistical methods in Markov chains, Ann. Math. Statist. 32, 12-40, (1961).
11. M.W. Birch, A new proof of the Pearson-Fisher theorem, Annals of Mathematical Statistics 35, 817-824, (1964).
12. Y.M.M. Bishop, S.E. Fienberg and P.W. Holland, Discrete Multivariate Analysis Theory and Practice, The MIT Press, Cambridge, MA, (1975).
13. T.R.C. Read and N. Cressie, Goodness of Fit Statistics for Discrete Multivariate Data, Springer, New York, (1988).
14. D. Morales, L. Pardo and I. Vajda, Asymptotic divergence of estimates of discrete distributions, Journal of Statistical Planning and Inference 48, 347-369, (1995).
15. M.C. Paxdo, Asymptotic behaviour of an estimator based on Rao's divergence, Kybernetika 33 (5), 489-504, (1997).
16. S. Kotz, N.M. Johnson and D.W. Boid, Series representation of quadratic forms in normal variables, I. Central case AMS, 823-837, (1967).
17. J.N.K. Rao and A.J. Scott, The analysis of categorical data from complex sample surveys: Chi-squared tests for goodness of fit and independence in two-way tables, J. Amer. Stat. Assoc. 7tl, 221-230, (1981).
18. F.E. Satterthwalte, An aproximate distribution of estimates of variance components, Biometrics 2, 110--114, (194e).
19. H. Scheffd, The Analysis of Variance, Wiley, (1959).