

For fully parametric models such as Poisson and negative binomial maximum likelihood, a crude diagnostic is to compare fitted probabilities with actual frequencies, where the fitted frequency distribution is computed as the average over observations of the predicted probabilities fitted for each count.

Suppose the count $y_i$ takes values $0, 1, \ldots, m$, where $m = \max_i(y_i)$. Let the observed frequencies (i.e., the fraction of the sample with $y = j$) be denoted by $\bar{p}_j$ and the corresponding fitted frequencies be denoted $\hat{p}_j$, $j = 0, \ldots, m$. For the Poisson, for example, $\hat{p}_j = n^{-1}\sum_{i=1}^{n} \exp(-\hat{\mu}_i)\,\hat{\mu}_i^{\,j}/j!$. Comparison of $\hat{p}_j$ with $\bar{p}_j$ can be useful in displaying poor performance of a model, in highlighting ranges of the counts for which the model has a tendency to underpredict or overpredict, and in allowing a simple comparison of the predictive performance of competing models. Without a formal test, however, it is not clear when $\hat{p}_j$ is "close" enough to $\bar{p}_j$ for one to conclude that the model is a good one.
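The comparison of $\hat{p}_j$ with $\bar{p}_j$ for the Poisson can be sketched as follows. This is a minimal illustration, assuming the fitted means $\hat{\mu}_i$ (for instance $\exp(x_i'\hat\beta)$) have already been computed elsewhere; the helper names are hypothetical.

```python
import math


def poisson_pmf(j, mu):
    """Poisson probability Pr[y = j] for mean mu."""
    return math.exp(-mu) * mu**j / math.factorial(j)


def frequency_comparison(y, mu_hat):
    """Observed and fitted frequencies for counts j = 0, ..., max(y).

    y      : observed counts y_i
    mu_hat : fitted Poisson means mu_hat_i (assumed computed elsewhere)
    Returns (p_bar, p_hat): p_bar[j] is the sample fraction with y_i == j and
    p_hat[j] is the average over i of the fitted probability Pr[y_i = j].
    """
    n = len(y)
    m = max(y)
    p_bar = [sum(1 for yi in y if yi == j) / n for j in range(m + 1)]
    p_hat = [sum(poisson_pmf(j, mu) for mu in mu_hat) / n for j in range(m + 1)]
    return p_bar, p_hat
```

Unlike $\bar{p}_j$, the fitted $\hat{p}_j$ over $j = 0, \ldots, m$ need not sum to one, because the fitted probability of counts above $m$ is left out.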

Formal comparison of $\hat{p}_j$ and $\bar{p}_j$ can be done using a CM test. We consider a slightly more general framework than the above, in which the range of $y$ is broken into $J$ mutually exclusive cells that together span all possible values of $y$, and each cell may include more than one value of $y$. For example, in data where only low values are observed, the cells may be $\{0\}$, $\{1\}$, $\{2, 3\}$ and $\{4, 5, \ldots\}$. Let $d_{ij}(y_i)$ be an indicator variable with $d_{ij} = 1$ if $y_i$ falls in the $j$th set and $d_{ij} = 0$ otherwise. Let $p_{ij}(x_i, \theta)$ denote the predicted probability that observation $i$ falls in the $j$th set, where to begin with we assume the parameter vector $\theta$ is known.
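The cell indicators $d_{ij}(y_i)$ and cell probabilities $p_{ij}(x_i, \theta)$ can be built as in the sketch below. It assumes a Poisson model with means $\mu_i$ and describes the cells by their lower cut points, so `[0, 1, 2, 4]` encodes $\{0\}, \{1\}, \{2, 3\}, \{4, 5, \ldots\}$; the cut-point representation and the helper names are illustrative choices, not taken from the text.

```python
import math


def poisson_cdf(k, mu):
    """Pr[y <= k] for a Poisson(mu); equals 0 when k < 0."""
    return sum(math.exp(-mu) * mu**j / math.factorial(j) for j in range(k + 1))


def cell_index(y, lower):
    """Cell j holds {lower[j], ..., lower[j+1]-1}; the last cell is open-ended."""
    j = 0
    while j + 1 < len(lower) and y >= lower[j + 1]:
        j += 1
    return j


def cell_indicators(y, lower):
    """n x J list of d_ij(y_i): 1 if y_i falls in cell j, else 0."""
    J = len(lower)
    rows = []
    for yi in y:
        row = [0] * J
        row[cell_index(yi, lower)] = 1
        rows.append(row)
    return rows


def cell_probabilities(mu, lower):
    """n x J list of p_ij = Pr[y_i in cell j] under Poisson(mu_i)."""
    J = len(lower)
    rows = []
    for mu_i in mu:
        row = []
        for j in range(J):
            below = poisson_cdf(lower[j] - 1, mu_i)  # Pr[y < lower[j]]
            # Pr[y < lower[j+1]], or 1 for the open-ended last cell
            upto = poisson_cdf(lower[j + 1] - 1, mu_i) if j + 1 < J else 1.0
            row.append(upto - below)
        rows.append(row)
    return rows
```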

Consider testing whether $d_{ij}(y_i)$ is centered around $p_{ij}(x_i, \theta)$,

$$E[d_{ij}(y_i) - p_{ij}(x_i, \theta)] = 0, \quad j = 1, \ldots, J, \tag{5.30}$$

or, stacking all $J$ moments in obvious vector notation,

$$E[d_i(y_i) - p_i(x_i, \theta)] = 0. \tag{5.31}$$

This hypothesis can be tested by testing the closeness to zero of the corresponding sample moment

$$m(\hat\theta) = \sum_{i=1}^{n} \bigl(d_i(y_i) - p_i(x_i, \hat\theta)\bigr). \tag{5.32}$$
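Given the $n \times J$ arrays of indicators and fitted probabilities, the sample moment (5.32) is a column sum of their difference. A minimal sketch, with a hypothetical helper name, taking plain lists of lists as input:

```python
def sample_moment(d, p):
    """m(theta_hat) = sum_i (d_i - p_i), a length-J vector, as in (5.32).

    d : n x J list of 0/1 cell indicators d_ij(y_i)
    p : n x J list of fitted cell probabilities p_ij(x_i, theta_hat)
    """
    J = len(d[0])
    m = [0.0] * J
    for d_i, p_i in zip(d, p):
        for j in range(J):
            m[j] += d_i[j] - p_i[j]
    return m
```

Because every row of `d` and of `p` sums to one across the cells, the $J$ components of $m(\hat\theta)$ always sum to zero; this is the source of the rank deficiency of $V_m$ discussed below.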

This is clearly a CM test, presented in section 2.6.3. The CM test statistic is

$$T_{\chi^2} = m(\hat\theta)'\,\hat{V}_m^{-}\,m(\hat\theta), \tag{5.33}$$

where $\hat{V}_m$ is a consistent estimate of $V_m$, the asymptotic variance matrix of $m(\hat\theta)$, and $\hat{V}_m^{-}$ is the Moore-Penrose generalized inverse of $\hat{V}_m$. The generalized inverse is used because the $J \times J$ matrix $V_m$ may not be of full rank. Under the null hypothesis that the density is correctly specified, that is, that $p_{ij}(x_i, \theta)$ gives the correct probabilities, the test statistic is asymptotically chi-square distributed with $\operatorname{rank}[\hat{V}_m]$ degrees of freedom.
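Once $\hat{V}_m$ is available, (5.33) is a quadratic form in the Moore-Penrose inverse. The sketch below assumes $\hat{V}_m$ has already been computed (the section 2.6.3 formulas are not reproduced here) and uses NumPy's `pinv` and `matrix_rank`; both apply a numerical tolerance to small singular values, so the degrees of freedom returned are a numerical rank.

```python
import numpy as np


def cm_statistic(m, V):
    """T = m' V^- m with V^- the Moore-Penrose inverse, as in (5.33).

    m : length-J sample moment m(theta_hat)
    V : J x J estimate of the asymptotic variance of m (assumed supplied)
    Returns (T, df), with df = rank(V) the asymptotic chi-square degrees of freedom.
    """
    m = np.asarray(m, dtype=float)
    V = np.asarray(V, dtype=float)
    V_pinv = np.linalg.pinv(V)          # generalized inverse handles singular V
    T = float(m @ V_pinv @ m)
    df = int(np.linalg.matrix_rank(V))  # numerical rank of V
    return T, df
```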

The results in section 2.6.3 can be used to obtain $V_m$, which here usually has $\operatorname{rank}[V_m] = J - 1$ rather than $J$ as a consequence of the probabilities over all $J$ cells summing to one. This entails considerable algebra, and it is easiest to instead use the asymptotically equivalent OPG form of the test, which is appropriate because fully parametric models are being considered here, so that $\hat\theta$ will be the MLE. The test is implemented by calculating $n$ times the uncentered $R^2$ from the artificial regression of 1 on the scores $s_i(y_i, x_i, \hat\theta)$ and $d_{ij}(y_i) - p_{ij}(x_i, \hat\theta)$, $j = 1, \ldots, J - 1$, where one cell has been dropped because $\operatorname{rank}[V_m] = J - 1$. In some cases $\operatorname{rank}[V_m] < J - 1$. This occurs if the estimator $\hat\theta$ is the solution to first-order conditions that set a linear transformation of $m(\hat\theta)$ equal to zero, or is asymptotically equivalent to such an estimator. An example is the multinomial model, with an extreme case being the binary logit model with an intercept, whose first-order conditions imply $\sum_{i=1}^{n}\bigl(d_{ij}(y_i) - p_{ij}(x_i, \hat\theta)\bigr) = 0$, $j = 0, 1$.
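The OPG version can be sketched as a least-squares regression of a vector of ones. This assumes $\hat\theta$ is the MLE and, for illustration, a Poisson model, whose per-observation score is $s_i = (y_i - \hat\mu_i)x_i$ with $\hat\mu_i = \exp(x_i'\hat\beta)$; the helper names are hypothetical, and dropping the last cell is an arbitrary choice. Since the regressand is a vector of ones, $nR_u^2 = n\,(\hat{y}'\hat{y}/n) = \hat{y}'\hat{y}$.

```python
import numpy as np


def poisson_scores(y, X, beta_hat):
    """Per-observation Poisson scores s_i = (y_i - mu_i) x_i, mu_i = exp(x_i' beta_hat)."""
    X = np.asarray(X, dtype=float)
    mu = np.exp(X @ np.asarray(beta_hat, dtype=float))
    return (np.asarray(y, dtype=float) - mu)[:, None] * X


def opg_chi2_gof(scores, d, p):
    """n times the uncentered R^2 from regressing 1 on [scores, d - p minus one cell].

    scores : n x k scores evaluated at the MLE theta_hat
    d, p   : n x J cell indicators and fitted cell probabilities
    """
    scores = np.asarray(scores, dtype=float)
    resid = np.asarray(d, dtype=float) - np.asarray(p, dtype=float)
    Z = np.hstack([scores, resid[:, :-1]])  # drop last cell since rank[V_m] = J - 1
    ones = np.ones(Z.shape[0])
    coef, *_ = np.linalg.lstsq(Z, ones, rcond=None)
    fitted = Z @ coef
    return float(fitted @ fitted)           # equals n * uncentered R^2
```

If $\operatorname{rank}[V_m] < J - 1$, the regressors in `Z` become collinear; `lstsq` still returns well-defined fitted values, but the degrees of freedom must then be reduced accordingly.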

The test statistic (5.33) is called the chi-square goodness-of-fit test, as it is a generalization of Pearson’s chi-square test,

$$\sum_{j=1}^{J} \frac{(n\bar{p}_j - n\hat{p}_j)^2}{n\hat{p}_j}. \tag{5.34}$$

In an exercise it is shown that (5.34) can be rewritten as (5.33) in the special case in which $V_m$ is a diagonal matrix with $j$th diagonal entry $\sum_{i=1}^{n} p_{ij}(x_i, \theta)$.
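A short worked sketch of that special case, taking $\hat{p}_j = n^{-1}\sum_{i=1}^{n} p_{ij}(x_i, \hat\theta)$ and $\bar{p}_j = n^{-1}\sum_{i=1}^{n} d_{ij}(y_i)$, so that the $j$th component of (5.32) is $m_j(\hat\theta) = n\bar{p}_j - n\hat{p}_j$ and the diagonal matrix evaluated at $\hat\theta$ is $\operatorname{diag}(n\hat{p}_1, \ldots, n\hat{p}_J)$. For a nonsingular diagonal matrix the Moore-Penrose inverse is the ordinary inverse, so

$$
\begin{aligned}
m(\hat\theta)'\,\bigl[\operatorname{diag}(n\hat{p}_1, \ldots, n\hat{p}_J)\bigr]^{-1} m(\hat\theta)
  &= \sum_{j=1}^{J} \frac{m_j(\hat\theta)^2}{n\hat{p}_j} \\
  &= \sum_{j=1}^{J} \frac{(n\bar{p}_j - n\hat{p}_j)^2}{n\hat{p}_j},
\end{aligned}
$$

which is Pearson's statistic (5.34).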

Although this is the case in the application originally considered by Pearson (where $y_i$ is iid, takes only $J$ discrete values, and a multinomial MLE is used), in most regression applications the more general form (5.33) must be used. The generalization of Pearson's original chi-square test by Heckman (1984), Tauchen (1985), Andrews (1988a, 1988b), and others is reviewed in Andrews (1988b, pp. 140–141). For simplicity we have considered a partition of the range of $y$ into $J$ cells. More generally, the partition may be over the range of $(y, x)$.
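Pearson's statistic (5.34) and its per-cell contributions $n(\bar{p}_j - \hat{p}_j)^2/\hat{p}_j$ are simple to compute from the observed and fitted relative frequencies. A minimal sketch with hypothetical helper names; note that referring this quantity to $\chi^2(J-1)$ ignores the estimation error in $\hat{p}_j$, which is why the general form (5.33) is needed in regression settings.

```python
def pearson_contributions(p_bar, p_hat, n):
    """Per-cell contributions (n*p_bar_j - n*p_hat_j)**2 / (n*p_hat_j) to (5.34)."""
    return [n * (pb - ph) ** 2 / ph for pb, ph in zip(p_bar, p_hat)]


def pearson_statistic(p_bar, p_hat, n):
    """Pearson's chi-square statistic (5.34): the sum of the cell contributions."""
    return sum(pearson_contributions(p_bar, p_hat, n))


# Hypothetical two-cell example with n = 100 observations.
contrib = pearson_contributions([0.5, 0.5], [0.4, 0.6], 100)
total = pearson_statistic([0.5, 0.5], [0.4, 0.6], 100)
```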

Example: Takeover Bids (Continued)

We consider goodness-of-fit measures for the Poisson estimates given in Table 5.3. The Pearson statistic (5.17) is 72.52, well below its expected value of approximately $n - k = 116$ under the Poisson model, indicating underdispersion. The deviance statistic (5.21) is 75.87. The Poisson deviance $R^2$ given in (5.26) equals .25, while the Pearson $R^2$ given in (5.29) equals .35. Note that these two $R^2$ measures are still valid if the conditional variance equals $\alpha\mu_i$ rather than $\mu_i$, and they can be easily computed using knowledge of the deviance and Pearson statistics plus the frequency distribution given in Table 5.1. Although experience with these $R^2$ measures is limited, it seems reasonable to conclude that the fit is quite good for cross-section data. If one instead runs an OLS regression, the $R^2$ equals .24.
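The Pearson statistic, deviance, and deviance $R^2$ can be sketched as below. This assumes the standard Poisson forms $P = \sum_i (y_i - \hat\mu_i)^2/\hat\mu_i$ and $D(\mu) = 2\sum_i [y_i\ln(y_i/\mu_i) - (y_i - \mu_i)]$ (with $y\ln y = 0$ at $y = 0$), and a deviance $R^2$ of the form $1 - D(\hat\mu)/D(\bar{y})$; equations (5.17), (5.21) and (5.26) are not reproduced in this section, so treat these as assumed forms. The Pearson $R^2$ (5.29) is not sketched.

```python
import math


def poisson_fit_measures(y, mu_hat, k):
    """Assumed standard forms: Pearson P, deviance D, P/(n-k), and 1 - D(mu_hat)/D(y_bar).

    y      : observed counts
    mu_hat : fitted Poisson means (assumed > 0)
    k      : number of estimated regression parameters
    """
    n = len(y)
    y_bar = sum(y) / n

    def deviance(mus):
        total = 0.0
        for yi, mi in zip(y, mus):
            term = yi * math.log(yi / mi) if yi > 0 else 0.0  # y ln(y/mu) = 0 at y = 0
            total += 2.0 * (term - (yi - mi))
        return total

    pearson = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu_hat))
    dev = deviance(mu_hat)
    return {
        "pearson": pearson,
        "deviance": dev,
        "dispersion": pearson / (n - k),  # well below 1 suggests underdispersion
        "r2_dev": 1.0 - dev / deviance([y_bar] * n),
    }
```

For the takeover bids data, the dispersion ratio implied by the reported figures is $72.52/116 \approx 0.63$, consistent with the underdispersion noted above.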

Before performing a formal chi-square goodness-of-fit test, it is insightful to compare predicted relative frequencies $\hat{p}_j$ with actual relative frequencies $\bar{p}_j$. These are given in Table 5.6, where counts of five or more are grouped into one cell to prevent cell sizes from getting too small. Clearly the Poisson greatly overpredicts the number of zeros and underpredicts the number of ones.

Table 5.6. Takeover bids: Poisson MLE predicted and actual probabilities

Counts    Actual    Predicted    |Diff|    Pearson
0         .0714     .2132        .1418     11.81
1         .5000     .2977        .2020     17.32
2         .2460     .2327        .0133       .10
3         .0952     .1367        .0415      1.58
4         .0476     .0680        .0204       .77
≥5        .0397     .0517        .0120       .00

Note: Actual, actual relative frequency; Predicted, predicted relative frequency; |Diff|, absolute difference between predicted and actual probabilities; Pearson, contribution to Pearson's chi-square test.

The last column of Table 5.6 gives $n(\bar{p}_j - \hat{p}_j)^2/\hat{p}_j$, which is the contribution of count $j$ to Pearson's chi-square test statistic (5.34). Although this test statistic, whose value is 31.58, is inappropriate because it fails to control for estimation error in $\hat{p}_j$, it does suggest that the major contributors to the formal test will be the predictions for zeros and ones. The formal chi-square test statistic (5.33) yields a value of 48.66, compared with the $\chi^2(5)$ critical value of 11.07 at the 5% level. The Poisson model is strongly rejected.

We conclude that the Poisson is an inadequate fully parametric model, due to its inability to model the relatively few zeros in the sample. Analysis of the data by Cameron and Johansson (1997) using alternative parametric models (Katz, hurdle, double-Poisson, and a flexible parametric model) is briefly discussed in section 12.3.3. Interestingly, none of the earlier diagnostics, such as residual analysis, detected this weakness in the Poisson estimates.