Modeling data by a stochastic process and estimating the parameters of that process are common problems. If the likelihood function of the process is known, the ML method is generally used. The LS method (including the weighted least-squares method, the conditional least-squares method and so on) is another widely used method, especially for prediction problems. In a sense, each of these two methods comes from a basic idea: the likelihood principle, or the best-prediction idea of minimizing the residual sum of squares. But, from the viewpoint of estimating function theory, both are just methods for choosing an estimating function from some estimating function space under specified rules. For the ML method, the rule is the maximum likelihood principle; for the LS method, there are two rules: one is the minimum residual sum of squares and the other is the form of the estimating function. We can also say that the ML method chooses an estimating function from the whole estimating function space under the likelihood principle, while the LS method chooses, from a subset of the estimating function space, the estimating function with minimum residual sum of squares.
Why can the ML method choose an estimating function from the whole estimating function space, while the LS method can only choose one from a subset of it? Because, to use the ML method, we must know the likelihood function of the process, which means that we have all the information about the process. If we do not know the likelihood function, we can only choose an estimating function from some subset of the estimating function space. That subset depends on the information we obtain from the process and from experiments. For example, if we consider the LS method for fitting a model $y_t = f(x_s, s < t) + e_t$ and we know nothing about the moments and mixed moments of $\{x_s\}$ of order $k > 2$, then only the set of linear estimating functions can be considered.
Briefly, using an estimation method is equivalent to finding an estimating function in some subset of an estimating function space under some rules. So the estimation method is determined by the subset of the estimating function space and by the rules we consider.
Also, we can say that the likelihood score function and the least-squares estimating function are each "optimal" estimating functions with reference to an appropriate subset of estimating functions. Following these ideas, we here give a general definition of the quasi-score estimating function.
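Suppose that $\Psi_T$ is a subset of an estimating function space, in which $T$ is the sample size, $\Theta$ is a parameter space which is a subset of $R^p$, and $A$ is a "matrix functional" on $\Psi_T \otimes \Theta$ such that $A : \Psi_T \otimes \Theta \ni G_T \otimes \theta \mapsto A(G_T, \theta) \in R^{p \times p}$ and $A(G_T, \theta)' A(G_T, \theta)$ is nonsingular for $G_T \in \Psi_T$, $\theta \in \Theta$. We call $(\Psi_T, \Theta, A)$ an estimating function space.
Definition: We say that $G_T^*$ is a quasi-score estimating function in $(\Psi_T, \Theta, A)$ if, for all $G_T \in \Psi_T$ and $\theta \in \Theta$,
$$A'(G_T^*, \theta) A(G_T^*, \theta) \ge A'(G_T, \theta) A(G_T, \theta),$$
where the inequality is understood in the sense that the difference of the two matrices is nonnegative definite.
We denote by $\theta_T^*$ a quasi-likelihood estimator, which satisfies
$$G_T^*(\theta_T^*) = 0.$$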
Example 1: Suppose that $X = \{X_t\}$ is a process and, for every fixed $T$, $\{X_t\}_{t \le T}$ belongs to an exponential family. Let $\Psi_T = \{G_T = G_T(X_t, 0 \le t \le T, \theta)\}$, in which $E G_T(\theta) = 0$ for each $P_\theta \in \mathcal{P}$ with index $\theta$, $\{G_t, \mathcal{F}_t\}$ are martingales for each $P_\theta \in \mathcal{P}$, $G_T$ are almost surely differentiable with respect to the components of $\theta$, and $E \dot{G}_T(\theta) = (E(\partial G_{T,i}(\theta) / \partial \theta_j))$ and $E(G_T(\theta) G_T'(\theta))$ are nonsingular. Let $A$ be a matrix functional defined by
$$A : \Psi_T \otimes \Theta \ni G_T \otimes \theta \mapsto (E_\theta G_T G_T')^{-1/2} E_\theta \dot{G}_T.$$
Then there is a quasi-score estimating function $G_T^*$ in $(\Psi_T, \Theta, A)$. In fact, $G_T^*$ is the likelihood score function.
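The role of this choice of $A$ can be made explicit by expanding $A'A$. The identity below is a short worked sketch, not part of the original argument: it takes $A(G_T, \theta) = (E_\theta G_T G_T')^{-1/2} E_\theta \dot{G}_T$ with a symmetric square root, writes $U_T$ for the likelihood score (a symbol introduced here only for illustration), and assumes the usual regularity conditions under which $E_\theta \dot{U}_T = -E_\theta U_T U_T'$.

```latex
A'(G_T,\theta)\,A(G_T,\theta)
  = (E_\theta \dot{G}_T)'\,(E_\theta G_T G_T')^{-1}\,(E_\theta \dot{G}_T),
\qquad
A'(U_T,\theta)\,A(U_T,\theta)
  = (E_\theta U_T U_T')\,(E_\theta U_T U_T')^{-1}\,(E_\theta U_T U_T')
  = E_\theta U_T U_T'.
```

Under these assumptions, $A'A$ evaluated at the score reduces to the Fisher information matrix.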
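Example 2: Let us consider an AR model. Assume that $\sum_{t \le T} e_t$, with $e_t = y_t - (\beta_1 x_{t-1} + \cdots + \beta_k x_{t-k})$, is a martingale in which $E(e_t \mid \mathcal{F}_{t-1}) = 0$ and $E(e_t^2 \mid \mathcal{F}_{t-1}) = c$, $c$ being a constant and $\mathcal{F}_t = \sigma(x_1, \ldots, x_t)$. Let $\Psi_T = \{G_T = \sum_{t \le T} f_t(\theta) e_t \mid \{f_t(\theta)\}$ is a predictable process with respect to $\{\mathcal{F}_t\}\}$ and let $A$ be a matrix functional defined by
$$A : \Psi_T \otimes \Theta \ni G_T \otimes \theta \mapsto (E_\theta G_T G_T')^{-1/2} E_\theta \dot{G}_T.$$
Then there is a quasi-score estimating function $G_T^*$ in $(\Psi_T, \Theta, A)$ and
$$G_T^* = \frac{1}{c} \sum_{t \le T} \begin{pmatrix} x_{t-1} \\ \vdots \\ x_{t-k} \end{pmatrix} \bigl(y_t - (\beta_1 x_{t-1} + \cdots + \beta_k x_{t-k})\bigr).$$
The only difference between $G_T^*$ and the least-squares estimating function is the constant coefficient.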
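Because a constant coefficient does not move the root of an estimating function, the quasi-score and least-squares estimates coincide in this example. The following is a minimal numerical sketch of that point, not the author's code: it assumes an AR(2) series with $y_t = x_t$, Gaussian errors with $c = 1$, and hypothetical helper names (`lagged_design`, `quasi_score`, `solve_quasi_score`) and parameter values chosen only for illustration.

```python
import numpy as np

def lagged_design(series, k):
    """Rows are (x_{t-1}, ..., x_{t-k}) for t = k, ..., len(series) - 1."""
    n = len(series)
    return np.column_stack([series[k - j : n - j] for j in range(1, k + 1)])

def quasi_score(beta, y, X, c=1.0):
    """(1/c) * sum_t x_t * (y_t - x_t' beta), with x_t the vector of lagged values."""
    return X.T @ (y - X @ beta) / c

def solve_quasi_score(y, X):
    """The quasi-score is linear in beta, so its root solves X'X beta = X'y."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Simulate a hypothetical AR(2) series (coefficients are illustrative assumptions).
rng = np.random.default_rng(0)
true_beta = np.array([0.5, -0.3])
k, n = len(true_beta), 500
x = np.zeros(n)
e = rng.normal(scale=1.0, size=n)
for t in range(k, n):
    # x[t-k:t][::-1] is (x_{t-1}, ..., x_{t-k})
    x[t] = true_beta @ x[t - k : t][::-1] + e[t]

X = lagged_design(x, k)
y = x[k:]

beta_qs = solve_quasi_score(y, X)                  # root of the quasi-score
beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)    # ordinary least squares
```

Changing `c` in `quasi_score` rescales the function but leaves its root unchanged, which is the sense in which only the constant coefficient differs.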
Example 3: (Godambe and Heyde (1987)) Let $\mathcal{G}_T$ be a subspace of $\Psi_T$ in which every element $G_T$ is almost surely differentiable with respect to the components of $\theta$ and $E \dot{G}_T(\theta)$ and $E(G_T(\theta) G_T'(\theta))$ are nonsingular. Let $\mathcal{M}_1$ denote the subset of $\mathcal{G}_T$ consisting of square integrable martingales with respect to $\{\mathcal{F}_t\}$. Let $A$ be a matrix functional defined by
$$A : \mathcal{M}_1 \otimes \Theta \ni G_T \otimes \theta \mapsto \langle G(\theta) \rangle_T^{-1/2} \bar{G}_T(\theta),$$
where $\bar{G}_T(\theta) = \int_0^T E(d\dot{G}_s(\theta) \mid \mathcal{F}_{s-})$. Then the quasi-score estimating function in $(\mathcal{M}_1, \Theta, A)$ is the same as the quasi-score estimating function under Criterion 4 in Godambe and Heyde (1987).
Therefore, following the extended definition of the quasi-score estimating function, the quasi-likelihood method is a method for choosing an estimating function from $\Psi_T$ under the rules:
(1) $A$ is a given matrix functional,
(2) $A'(G_T^*, \theta) A(G_T^*, \theta) \ge A'(G_T, \theta) A(G_T, \theta)$ for all $G_T \in \Psi_T$, $\theta \in \Theta$.
We can also say that $G_T^*$ is an optimal estimating function in $(\Psi_T, \Theta, A)$. But "optimal" has its full meaning only when the following facts are true:
(1) $\theta_T^* \to \theta_0$, where $\theta_0$ is the true parameter,
(2) $\|A(G_T^*, \theta_0)(\theta_T^* - \theta_0)\|^2 \xrightarrow{d} \mathcal{L}(X)$, for some proper r.v. $X$.
Following the discussion in Chapter 5, a $(1 - \alpha) \times 100\%$ asymptotic confidence zone for $\theta_0$ will be determined by
$$P\bigl((\theta_T^* - \theta_0)(\theta_T^* - \theta_0)' \le e_\alpha (A'(G_T^*, \theta_0) A(G_T^*, \theta_0))^{-1}\bigr),$$
where $P(X < e_\alpha) = 1 - \alpha$. If there is another estimating function $G_T \in \Psi_T$ such that
$$\theta_T \to \theta_0,$$
where $\theta_T$ satisfies $G_T(\theta_T) = 0$, and
$$\|A(G_T, \theta_0)(\theta_T - \theta_0)\|^2 \xrightarrow{d} \mathcal{L}(X),$$
then $A(G_T^*, \theta_0)$ will provide smaller asymptotic confidence zones for $\theta_0$ than those provided by $A(G_T, \theta_0)$. In this sense $A'(G_T, \theta) A(G_T, \theta)$ plays the role of Fisher information.
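The comparison of confidence zones can be illustrated numerically in the scalar case. The sketch below is an illustration under stated assumptions rather than a result from the text: it takes a hypothetical model $y_t = \theta x_t + e_t$ with fixed regressors and error variances $\sigma_t^2 = x_t^2$, considers estimating functions of the form $G = \sum_t w_t x_t (y_t - \theta x_t)$, and uses $A'A = (E\dot{G})^2 / E G^2$. The function name `information` and all numerical values are introduced here for illustration only.

```python
import numpy as np

def information(w, x, sigma2):
    """Scalar A'A = (E dG/dtheta)^2 / E G^2 for G = sum_t w_t x_t (y_t - theta x_t).

    With x treated as fixed and Var(e_t) = sigma2_t (uncorrelated errors):
      E dG/dtheta = -sum_t w_t x_t^2
      E G^2       =  sum_t w_t^2 x_t^2 sigma2_t
    """
    return np.sum(w * x**2) ** 2 / np.sum(w**2 * x**2 * sigma2)

rng = np.random.default_rng(1)
x = rng.uniform(0.5, 2.0, size=200)   # hypothetical fixed design
sigma2 = x**2                         # hypothetical variance function

# Unweighted (least-squares type) choice: w_t = 1.
info_ls = information(np.ones_like(x), x, sigma2)

# Variance-weighted choice: w_t = 1 / sigma2_t, which maximizes A'A in this class
# by the Cauchy-Schwarz inequality.
info_qs = information(1.0 / sigma2, x, sigma2)

# The zone (theta_T - theta_0)^2 <= e_alpha / (A'A) has half-width proportional
# to 1 / sqrt(A'A), so this is the weighted half-width relative to the unweighted one.
half_width_ratio = np.sqrt(info_ls / info_qs)
```

A `half_width_ratio` below one means the estimating function with the larger $A'A$ yields the narrower asymptotic interval, which is the sense in which $A'A$ behaves like an information quantity.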
Here we have discussed three kinds of optimal estimating functions. There are also others. However, it is difficult to find a universal optimal estimating function other than the likelihood score function (in practice the likelihood score function may work as a universal optimal estimating function if it can be written down). In general, the optimality of an estimating function is only a local property, or one relative to $(\Psi, \Theta, \text{(rules)})$. The space $\Psi$ is determined by the kind of model we want to fit to the data and the kind of information we can obtain from the history. The rule is determined by the kinds of criteria we choose for optimality and the kinds of tools we use.