As shown in Section 2.3.1, AIC is a special case of TIC, and GIC is a still more general criterion from which TIC can be derived. It might seem to follow that we should always use GIC for model evaluation. From a practical standpoint, however, this is not always the best choice, nor is it always necessary.
We assume that the large-sample condition is satisfied, so that the need to use AICc in place of AIC, TIC, and GIC does not arise in the following discussion.
Let us first consider the choice between AIC and TIC for model evaluation and selection. As shown in previous examples, the trace term EMD $= \mathrm{tr}\{K(\hat{\theta})J(\hat{\theta})^{-1}\}$ is approximately equal to $k$ for “good” models. What we mean by “good” models is explained by Akaike (1974): if the true distribution that generated the data lies near the specified parametric model, the bias associated with the sample log-likelihood of the model, estimated through the empirical distribution of the truth, can be approximated by the number of parameters. When a fitted model is misspecified, Burnham and Anderson (2002) found that the behaviour of EMD is unpredictable, i.e. EMD can be equal to, greater than, or less than $k$. The change in AIC scores is independent of the pattern of EMD. Figures 2.2 and 2.3, which are adapted from Example 11 (p. 64) and Figure 5.3 (p. 125) in Konishi and Kitagawa (2008), provide two examples to illustrate this. For Figure 2.2, the data samples are generated from the true model
$$g(x) = (1-\varepsilon)\,\phi(x \mid \mu_1, \sigma_1) + \varepsilon\,\phi(x \mid \mu_2, \sigma_2) \qquad (0 \le \varepsilon \le 1),$$
where $\varepsilon$ denotes the mixing ratio, $\phi(x \mid \mu, \sigma)$ is the pdf of a normal distribution with mean $\mu$ and standard deviation $\sigma$, and we have set $\mu_1 = \mu_2 = 0$, $\sigma_1 = 4$, $\sigma_2 = 1$. At each of the 20 different $\varepsilon$ values, 200 simulated samples ($n = 1000$) are generated. A normal distribution model is then fitted to each of the 200 simulated samples, and the AIC scores and EMDs are calculated. For calculating TIC, EMD $= \mathrm{tr}\{K(\hat{\theta})J(\hat{\theta})^{-1}\}$ is defined in Equation (2.6). The EMD values are plotted in the upper panel: the solid line represents the mean EMD and the two dashed lines form a 95% quantile confidence band. The AIC scores are plotted in the lower panel, with a solid line for the means and dashed lines for the 95% quantile confidence band. Figure 2.3 is produced in the same way as Figure 2.2, but the data samples are generated from a different true model:
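$$g(x) = (1-\varepsilon)\,\{0.3\,\lambda_1 \exp(-\lambda_1 x) + 0.7\,\lambda_2 \exp(-\lambda_2 x)\} + \varepsilon\,\mathrm{Weibull}(x \mid \mathrm{shape}, \mathrm{scale}) \qquad (0 \le \varepsilon \le 1),$$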
where $\varepsilon$ denotes the mixing ratio and we have set $\lambda_1 = 0.5$, $\lambda_2 = 2$. The expression $\mathrm{Weibull}(x \mid \mathrm{shape}, \mathrm{scale})$ represents the pdf of a Weibull distribution with shape = 3 and scale = 1.5. A mixed-exponential distribution model $p\,\lambda_1 \exp(-\lambda_1 x) + (1-p)\,\lambda_2 \exp(-\lambda_2 x)$ $(0 < p < 1)$ is then fitted to the simulated data, and the AIC scores and EMDs are calculated.
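The normal-model simulation described above for Figure 2.2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used to produce the figure: the function names are hypothetical, and the EMD is computed from the closed form $\mathrm{tr}\{K J^{-1}\} = 1/2 + \hat{\mu}_4 / (2\hat{\sigma}^4)$, which follows from the score and Hessian of the normal log-likelihood with sample central moments plugged in (for normal data $\mu_4 = 3\sigma^4$, giving 2).

```python
import math
import random


def simulate_mixture(eps, n, rng, sigma1=4.0, sigma2=1.0):
    """Draw n values from (1 - eps) * N(0, sigma1^2) + eps * N(0, sigma2^2)."""
    return [rng.gauss(0.0, sigma2 if rng.random() < eps else sigma1)
            for _ in range(n)]


def normal_fit_emd_aic(x):
    """Fit a normal model by maximum likelihood and return (EMD, AIC)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n   # MLE of the variance
    m4 = sum((v - mean) ** 4 for v in x) / n    # fourth central sample moment
    # Closed-form trace term for the normal model (assumption stated above).
    emd = 0.5 + m4 / (2.0 * var ** 2)
    # Maximized log-likelihood of the normal model.
    loglik = -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)
    aic = -2.0 * loglik + 2 * 2                 # k = 2 parameters
    return emd, aic


def mean_emd_aic(eps, n=1000, reps=200, seed=0):
    """Average EMD and AIC over `reps` simulated samples of size n."""
    rng = random.Random(seed)
    results = [normal_fit_emd_aic(simulate_mixture(eps, n, rng))
               for _ in range(reps)]
    mean_emd = sum(r[0] for r in results) / reps
    mean_aic = sum(r[1] for r in results) / reps
    return mean_emd, mean_aic
```

Calling `mean_emd_aic(eps)` over a grid of mixing ratios gives the mean curves; the quantile bands would additionally require keeping the per-sample results rather than only their averages.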
From the upper panel of Figure 2.2, we notice that EMD = k = 2 when the fitted model is correctly specified, i.e. when $\varepsilon = 0$ or $\varepsilon = 1$. The AIC score pattern in the lower panel shows a monotonic increase over the whole $\varepsilon$ range. From the upper panel of Figure 2.3, we again observe that EMD = k = 3 when the fitted model is correctly specified, i.e. when $\varepsilon = 0$. However, both the upper and the lower panel in Figure 2.3 show completely different patterns from those in Figure 2.2. In both figures, the absolute level of the EMDs is very small compared with the absolute level of the AIC scores. Also, the sampling variation of the EMDs (less than 10) is much smaller than that of the AIC scores (a few hundred).
Figure 2.2: AIC versus TIC: A normal distribution model case (upper panel: EMD against mixing ratio; lower panel: AIC against mixing ratio)
If large samples are available, TIC might offer an improvement over AIC. However, it has been found that estimation error in the two $k \times k$ matrices $K(\hat{\theta})$ and $J(\hat{\theta})$ can make the results of model selection unstable (see the critical remarks on TIC in Section 2.3.3). When $k$ gets large, say $k > 10$, the computation of EMD can also become a problem. Burnham and Anderson (2002, p. 435) reported that, for all the cases they examined in comparing AIC with TIC, if the model was less general than the truth (the real-world case), they predominantly found that $\mathrm{tr}\{K(\hat{\theta})J(\hat{\theta})^{-1}\} < k$. Thus, the use of AIC should often lead to slightly more parsimonious models than the use of TIC. Our experience in using AIC and TIC agrees with what they reported. From a practical point of view, Konishi and Kitagawa (2008) argued that AIC does not require any analytical derivation of the bias correction terms for individual problems and does not depend on the unknown probability distribution $G$, which removes fluctuations due to the estimation of the bias.
Figure 2.3: AIC versus TIC: A mixed-exponential distribution model case (upper panel: EMD against mixing ratio; lower panel: AIC against mixing ratio)
Theoretically, GIC is a general model evaluation criterion which extends the application domain to include the M-estimator framework. In practice, however, the calculation of the GIC trace term can be non-trivial. Because of the M-estimator version AIC (Equation 2.8) derived by Konishi and Kitagawa (2008), the research findings from comparing AIC against TIC are likely applicable for the comparison between AIC and GIC.
In principle, the use of AIC for model evaluation should be considerably limited by the strong true model inclusion assumption. However, from a practical point of view, we consider AIC to be a good proxy for TIC or GIC in model evaluation for the following reasons. By definition, the difference between AIC and TIC (as well as between the M-estimator version AIC and GIC) lies in their trace terms. When the specified models are close to the true distribution, $k \approx \mathrm{tr}\{K(\hat{\theta})J(\hat{\theta})^{-1}\}$ or $k \approx \mathrm{tr}\{R(\psi, \hat{G})^{-1} Q(\psi, \hat{G})\}$, so that AIC ≈ TIC or AIC ≈ GIC. When a model is badly misspecified (the fitted model is far from the true model), the sample log-likelihood term (evaluated at the MLE or at the M-estimate) dominates the trace term, so that a bad model would not stand a chance of being selected. AIC and its generalized variants will select the same best model as long as there is at least one ‘good’ model, i.e. not all candidate models have almost the same ‘distance’ (within two AIC units, as a rule of thumb) from the true model.
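The comparison of trace terms can be written out explicitly. The following is a sketch assuming the standard forms of the two criteria; we take Equation (2.6) to have the TIC form shown here, and $\ell(\hat{\theta})$, the maximized sample log-likelihood, is notation introduced only for this illustration.

```latex
\begin{align*}
\mathrm{AIC} &= -2\,\ell(\hat{\theta}) + 2k, \\
\mathrm{TIC} &= -2\,\ell(\hat{\theta})
               + 2\,\mathrm{tr}\{K(\hat{\theta})J(\hat{\theta})^{-1}\}, \\
\mathrm{TIC} - \mathrm{AIC} &= 2\,(\mathrm{EMD} - k),
\qquad \mathrm{EMD} = \mathrm{tr}\{K(\hat{\theta})J(\hat{\theta})^{-1}\}.
\end{align*}
```

In the Figure 2.2 setting, with k = 2 and EMD below 10, the gap between the two criteria is therefore below 16, whereas the AIC scores themselves vary by a few hundred across samples. Under these assumptions, differences in the log-likelihood term, rather than in the penalty, decide the ranking between a good and a badly misspecified model.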
To sum up, we reach the following conclusion. AIC is a practical and parsimonious implementation of TIC and GIC. The theoretical justification of the M-estimator version AIC has virtually removed the need to calculate TIC or GIC in model evaluation practice. Given the required regularity conditions, it is always valid to apply AIC (MLE version or M-estimator version) for model evaluation in data analysis practice without worrying about the true model inclusion assumption. More empirical studies on real data are needed to test this conclusion.