After estimating the chosen model, the next step is to validate it by checking whether the residuals of the estimated ARIMA model satisfy the three assumptions of a white noise process. Recall the three assumptions about the error term:
(i) $E(\varepsilon_t) = 0$ (zero mean)
(ii) $Var(\varepsilon_t) = \sigma^{2}$ (constant variance)
(iii) $Cov(\varepsilon_t, \varepsilon_{t+k}) = 0$ for all $t$ and arbitrarily chosen $k = 1, 2, \ldots, n$
Several methods can be applied for diagnostic checking of the estimated models. One method is to inspect the ACF and PACF of the residuals and check whether they are significantly different from zero. A lag is significant if it lies outside the 95% confidence limits, that is, outside the two-standard-error band $[-2/\sqrt{T}, 2/\sqrt{T}]$, where $T$ is the sample size. However, instead of testing the statistical significance of each individual autocorrelation coefficient, we can use the Ljung-Box test and its Q-statistic to test the joint hypothesis that all autocorrelation coefficients up to a certain lag are simultaneously equal to zero. This test is provided in the software used for ARIMA modelling in this study (Quantitative Micro Software, 2007). The test is represented as
$$Q_{LB} = T(T+2)\sum_{k=1}^{m}\frac{\hat{\rho}_k^{2}}{T-k} \sim \chi^{2}(m) \qquad [4.21]$$
where $T$ is the sample size and $m$ is the number of autocorrelations included in the test.
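As an illustration, the Q-statistic in [4.21] can be computed directly from a residual series. The following is a minimal Python sketch, not the routine of the software used in this study. The function name `ljung_box_q`, the simulated residuals and the choice m = 10 are assumptions made for the example; 18.307 is the tabulated 5% critical value of the chi-square distribution with 10 degrees of freedom.

```python
import random


def ljung_box_q(residuals, m):
    """Ljung-Box Q = T(T+2) * sum_{k=1..m} rho_k^2 / (T - k)."""
    T = len(residuals)
    mean = sum(residuals) / T
    dev = [e - mean for e in residuals]
    denom = sum(d * d for d in dev)  # T times the sample variance
    total = 0.0
    for k in range(1, m + 1):
        # Sample autocorrelation of the residuals at lag k
        rho_k = sum(dev[t] * dev[t + k] for t in range(T - k)) / denom
        total += rho_k ** 2 / (T - k)
    return T * (T + 2) * total


# Hypothetical data: simulated Gaussian noise standing in for ARIMA residuals
rng = random.Random(42)
residuals = [rng.gauss(0.0, 1.0) for _ in range(120)]

m = 10                        # number of lags (an assumption for this example)
CRITICAL_5PCT_DF10 = 18.307   # chi-square 5% critical value, 10 degrees of freedom
q = ljung_box_q(residuals, m)
print(f"Q({m}) = {q:.3f}; exceeds 5% critical value: {q > CRITICAL_5PCT_DF10}")
```

A Q value above the critical value would point to autocorrelation remaining in the residuals up to lag m.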
According to Box and Jenkins (1971), m can be chosen within the range [20, T-1], while Tsay (2005) suggested that m ≈ ln(T) can also be used. Moreover, Albright et al. (2009) recommended that m should not exceed 25% of the number of observations. For this test the hypotheses are stated as:
H0: there is no autocorrelation up to lag m, $\rho_1 = \rho_2 = \ldots = \rho_m = 0$
H1: there is autocorrelation, $\rho_k \neq 0$ for at least one $k = 1, \ldots, m$
There are two ways to examine the hypothesis of the test. One way is to look at the Ljung-Box Q-statistic: if it exceeds the critical value from the chi-square distribution at the chosen level of significance (5% in this study), the null hypothesis is rejected. The other way is to check the p-values of the Ljung-Box Q-statistics up to the chosen lag m. For the null hypothesis not to be rejected, the p-values should all be equal to or greater than 0.05. However, as stated by Makridakis et al. (1998, p. 326), it is acceptable to have around 5% of spikes exceeding the limits.
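The lag-by-lag check against the two-standard-error band, together with the share of spikes falling outside it, can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function names `sample_acf` and `spikes_outside_band`, the maximum lag of 20 and the simulated residual series are all hypothetical and are not taken from the study.

```python
import math
import random


def sample_acf(residuals, max_lag):
    """Sample autocorrelations rho_1, ..., rho_max_lag of a residual series."""
    T = len(residuals)
    mean = sum(residuals) / T
    dev = [e - mean for e in residuals]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(T - k)) / denom
            for k in range(1, max_lag + 1)]


def spikes_outside_band(residuals, max_lag):
    """Lags whose autocorrelation lies outside [-2/sqrt(T), 2/sqrt(T)]."""
    bound = 2.0 / math.sqrt(len(residuals))
    acf = sample_acf(residuals, max_lag)
    return [k for k, rho in enumerate(acf, start=1) if abs(rho) > bound]


# Hypothetical data: simulated Gaussian noise standing in for ARIMA residuals
rng = random.Random(7)
residuals = [rng.gauss(0.0, 1.0) for _ in range(200)]

max_lag = 20  # an assumption for this example
outside = spikes_outside_band(residuals, max_lag)
share = len(outside) / max_lag
print(f"lags outside the band: {outside} (share = {share:.0%})")
```

A share of roughly 5% or less is in line with the tolerance mentioned by Makridakis et al. (1998); a clearly larger share would suggest that the residuals are not white noise.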
An alternative to the Q-statistic for testing serial correlation is the Breusch-Godfrey or Lagrange Multiplier (LM) test. This test is also applied to the residuals in this study to ensure that the chosen model is adequate. The idea of the Breusch-Godfrey test is to run a further regression (the auxiliary regression) in which the current value of the residual is regressed on all the independent variables of the estimated model and a number of lagged residuals. The R² of this auxiliary regression is then used to compute the following test statistic:
$$LM = (T - p)R^{2} \qquad [4.22]$$
where T is the number of observations and p is the number of lagged error terms to be included. However, the literature on this test does not offer a single rule for deciding how many lags of the residuals to include. For example, Brooks (2008, p. 166) stated that the difficulty with this test lies in determining the right number of residual lags. The author recommended, for example, including 12 lags of the residuals for monthly series and four lags for quarterly series. In contrast, Baltagi (2008, p. 115) suggested using a higher order of the autoregressive or moving average component of the model as the number of residual lags to be tested. This was confirmed by Breusch (1978) and Franses (1998, p. 57). Breusch (1978) also stated that “LM statistics for testing against autocorrelation of MA type is the same as that for testing against AR of the same order”. Therefore, based on the literature, the highest order of either the AR or the MA component will be taken as the number of residual lags to be included in the test. However, in an attempt to achieve as accurate a result as possible, if the highest order of the model is found to be less than, for example, 12 in the monthly series, then the method suggested by Brooks (2008) will be applied. The test involves the specification of the following hypotheses:
H0: the errors are not serially correlated
H1: the errors are serially correlated
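The auxiliary regression and the statistic in [4.22] can be sketched in Python as follows. This is a simplified illustration for a linear regression with a constant, not the implementation of the software used in this study; for an ARIMA model the regressors would be the terms of the estimated model. The function names `ols` and `breusch_godfrey_lm`, the simulated data and the choice p = 1 are assumptions. The auxiliary regression drops the first p observations, which gives the factor (T − p), and 3.841 is the tabulated 5% critical value of the chi-square distribution with one degree of freedom.

```python
import random


def ols(y, X):
    """OLS of y on the columns of X (a list of rows); returns (residuals, R^2)."""
    n, k = len(X), len(X[0])
    # Augmented normal equations [X'X | X'y]
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)]
         + [sum(X[i][a] * y[i] for i in range(n))] for a in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [v_r - f * v_c for v_r, v_c in zip(A[r], A[c])]
    beta = [A[c][k] / A[c][c] for c in range(k)]
    resid = [y[i] - sum(b * x for b, x in zip(beta, X[i])) for i in range(n)]
    y_bar = sum(y) / n
    tss = sum((v - y_bar) ** 2 for v in y)
    rss = sum(e * e for e in resid)
    return resid, 1.0 - rss / tss


def breusch_godfrey_lm(y, X, p):
    """LM = (T - p) * R^2 from regressing e_t on X_t and e_{t-1}, ..., e_{t-p}."""
    T = len(y)
    resid, _ = ols(y, X)
    aux_y = resid[p:]  # the first p observations are lost to the lags
    aux_X = [list(X[t]) + [resid[t - j] for j in range(1, p + 1)]
             for t in range(p, T)]
    _, r2 = ols(aux_y, aux_X)
    return (T - p) * r2


# Hypothetical data: a regression whose errors follow an AR(1) process
rng = random.Random(1)
T = 150
x = [rng.gauss(0.0, 1.0) for _ in range(T)]
u = [rng.gauss(0.0, 1.0)]
for _ in range(T - 1):
    u.append(0.6 * u[-1] + rng.gauss(0.0, 1.0))
y = [1.0 + 0.5 * x[t] + u[t] for t in range(T)]
X = [[1.0, x[t]] for t in range(T)]  # constant plus one regressor

p = 1                        # number of residual lags (an assumption)
CRITICAL_5PCT_DF1 = 3.841    # chi-square 5% critical value, 1 degree of freedom
lm = breusch_godfrey_lm(y, X, p)
print(f"LM = {lm:.3f}; exceeds 5% critical value: {lm > CRITICAL_5PCT_DF1}")
```

Comparing LM with the 5% critical value of the chi-square distribution with p degrees of freedom is equivalent to checking whether the p-value is below 0.05; a large LM value is evidence of serial correlation in the errors.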
The statistic is distributed as chi-squared with p degrees of freedom, and the decision is to reject the null hypothesis if the p-value of the Breusch-Godfrey (LM) statistic is less than 5%. By testing the correlation of the errors we have checked all the assumptions suggested by Box and Jenkins. The normality of the residuals is also often tested in the literature, but it is worth noting that normality of the residuals is not one of the assumptions suggested by Box and Jenkins (Brooks, 2008). Thus, in this study, normality will be reported using the Jarque-Bera test only as an additional test, to give an overall picture of the distribution of the errors. The Jarque-Bera (JB) statistic is distributed as chi-squared with two degrees of freedom. The Jarque-Bera test is represented by the following equation:
$$JB = \frac{T - \beta}{6}\left[S^{2} + \frac{(K - 3)^{2}}{4}\right] \qquad [4.23]$$
where T is the number of observations, β is the number of estimated regression coefficients, S is the skewness and K is the kurtosis.
The test involves the specification of the following hypotheses:
H0: the residuals are normally distributed
H1: the residuals are not normally distributed
For this test, the null hypothesis is rejected if the p-value of the Jarque-Bera test is less than 0.05. If all assumptions about the error term of the estimated ARIMA model are satisfied, we proceed to the last step, forecasting. If, however, the model is found to be unsatisfactory (the assumptions are not met), the specification, parameter estimation and diagnostic checking phases need to be repeated until the assumptions about the residuals are satisfied and the errors are not correlated (Chatfield, 2004). If only the normality assumption fails, the model can still be used for forecasting, but the t and F statistics cannot be used to draw inferences.
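The Jarque-Bera statistic in [4.23] and its p-value can be sketched in Python as follows. This is a minimal illustration, not the routine of the software used in this study; the function name `jarque_bera`, the parameter name `n_coef` (standing for β) and the simulated residuals are assumptions. The sketch takes skewness and kurtosis as moment ratios with divisor T, and uses the fact that the upper tail of a chi-square distribution with two degrees of freedom is exactly exp(−x/2).

```python
import math
import random


def jarque_bera(residuals, n_coef=0):
    """Return (JB, p-value) with JB = (T - n_coef)/6 * (S^2 + (K - 3)^2 / 4)."""
    T = len(residuals)
    mean = sum(residuals) / T
    dev = [e - mean for e in residuals]
    m2 = sum(d ** 2 for d in dev) / T  # second central moment
    m3 = sum(d ** 3 for d in dev) / T  # third central moment
    m4 = sum(d ** 4 for d in dev) / T  # fourth central moment
    S = m3 / m2 ** 1.5                 # skewness
    K = m4 / m2 ** 2                   # kurtosis
    jb = (T - n_coef) / 6.0 * (S ** 2 + (K - 3.0) ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)      # chi-square(2) upper-tail probability
    return jb, p_value


# Hypothetical data: simulated Gaussian noise standing in for ARIMA residuals
rng = random.Random(3)
residuals = [rng.gauss(0.0, 1.0) for _ in range(150)]

jb, p_value = jarque_bera(residuals, n_coef=2)  # n_coef = 2 is an assumption
print(f"JB = {jb:.3f}, p-value = {p_value:.3f}, reject normality: {p_value < 0.05}")
```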