The specification search is similar to that described in the previous section, although models with a double log functional form are also investigated here. The coefficient of the CHILD variable (average number of children) is consistently negative and significant, which is opposite to expectation. The coefficient of the WORKER variable (average number of persons in work) is positive but not significant in all linear and semi-log models; in all double log models it is negative and significant. However, according to the RESET test, the double log models with CHILD, WORKER and HHSIZE as household structure variables could be mis-specified. By contrast, using the split of household types as explanatory variables produces more satisfactory results. Regarding the household location variables, the more detailed breakdown of location type does not add explanatory power to the model: only the AREA5 variable (proportion living in the least populated rural area) is significant. Consequently, the compressed location variables are chosen in the preferred model.
For the models with the preferred form of household type and location variables, the differences between the functional forms (linear, semi-log and double log) are more subtle. All models have a high adjusted R-squared, and none of the RESET tests rejects the hypothesis of no misspecification. The Durbin-Watson statistic is very close to 2 for all models, which does not indicate the presence of autocorrelation. However, the White test rejects the hypothesis of homoskedasticity for all models, an issue that is addressed later.
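As a reminder of what the Durbin-Watson diagnostic measures, the following is a minimal sketch of its computation from a single series of residuals. The function name is illustrative and not taken from the source, and the sketch assumes the residuals form one time-ordered series; for cohort data ordered by year within each cohort, differences across cohort boundaries would need to be handled separately.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for one time-ordered residual series.

    DW = sum_t (e_t - e_{t-1})^2 / sum_t e_t^2.
    Values near 2 indicate no first-order autocorrelation, values near 0
    positive autocorrelation, and values near 4 negative autocorrelation.
    """
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    # Sum of squared first differences of the residuals.
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    # Sum of squared residuals.
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator
```

A statistic close to 2, as reported for all models here, corresponds to residuals whose successive differences are about as variable as uncorrelated residuals would produce.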
As with the static model, the sign and magnitude of the regression coefficients are used as additional criteria to determine the model of best fit. The household structure and location variables are the same for all three models, and their coefficients are very similar for the linear and semi-log models (and comparable for the double log model). None of the purchase price coefficients is significant at the 10% level, so no reliable conclusion can be drawn about the purchase price elasticity. The income and running cost elasticities implied by the different functional forms differ. Tables 5-2 and 5-3 compare the short run and long run elasticities implied by the three models.
Table 5-2 Short Run and Long Run Income Elasticity
                         Short Run                       Long Run
Income / Car    Linear  Semi-Log  Dbl-Log      Linear  Semi-Log  Dbl-Log
Low              0.173     0.394    0.225       0.203     0.460    0.316
Middle           0.141     0.181    0.225       0.165     0.211    0.316
High             0.145     0.133    0.225       0.171     0.155    0.316
Table 5-3 Short Run and Long Run Running Cost Elasticity
                         Short Run                       Long Run
Car             Linear  Semi-Log  Dbl-Log      Linear  Semi-Log  Dbl-Log
Low             -0.371    -0.309   -0.067      -0.436    -0.360   -0.095
Middle          -0.170    -0.141   -0.067      -0.200    -0.165   -0.095
High            -0.125    -0.104   -0.067      -0.147    -0.121   -0.095
Notes: 1. Low, middle and high real disposable income are £172, £306 and £430 per week respectively; low, middle and high car ownership levels are 0.42, 0.92 and 1.25 cars per household respectively.
2. The running cost elasticity is based on a constant cost index of 100 (1995 level).
3. The coefficient of the running cost variable is not significant in the double log model, so the corresponding elasticity is shown in italics.
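The relationship between the short run and long run columns can be illustrated with a short sketch. It assumes a partial adjustment structure, in which the long run elasticity equals the short run elasticity divided by one minus the coefficient on the lagged dependent variable; this structure and the symbol `lam` are assumptions made for illustration and are not stated in this section. Under that assumption, the lagged coefficient can be backed out from any pair of reported elasticities.

```python
def long_run_elasticity(short_run, lam):
    """Long run elasticity under an assumed partial adjustment model.

    lam is the (hypothetical) coefficient on the lagged dependent variable.
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must lie in [0, 1) for a stable adjustment")
    return short_run / (1.0 - lam)


def implied_lagged_coefficient(short_run, long_run):
    """Back out lam from a reported short run / long run pair."""
    return 1.0 - short_run / long_run


# Pairs taken from the tables above (low income / low car ownership row).
semi_log_income = implied_lagged_coefficient(0.394, 0.460)
semi_log_cost = implied_lagged_coefficient(-0.309, -0.360)
double_log_income = implied_lagged_coefficient(0.225, 0.316)
```

Applied to the tabulated values, the linear and semi-log pairs imply a lagged coefficient of roughly 0.14 to 0.15, and the double log pairs imply one of roughly 0.29, which is consistent with a single adjustment coefficient per model up to rounding.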
In the double log model, the income and running cost elasticities are constant regardless of household income or car ownership level. This is a characteristic of the double log form and could be problematic for car ownership models, where the income elasticity is known to decline with income and “saturation” is observed in mature car markets such as the UK. Consequently, the double log model is regarded as an inappropriate form. The income elasticities implied by the linear model are low and similar across households with different income and car ownership levels; meanwhile, the running cost elasticity is much higher in magnitude than the income elasticity for low car ownership households and declines rapidly with car ownership level. In the semi-log model, the income elasticity declines rapidly with car ownership level, and the running cost elasticities are always smaller in magnitude than the income elasticities by a consistent proportion. Overall, the income and running cost elasticities implied by the semi-log model seem more sensible, so the semi-log model is selected as the preferred model.
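The behaviour described above follows from the standard elasticity formulas for each functional form. The sketch below states those textbook formulas and backs out the income coefficient implied by the short run elasticities in Table 5-2; the symbols `b`, `x` and `y` and all function names are illustrative, and the sketch ignores the other regressors, which do not affect the formulas.

```python
def linear_elasticity(b, x, y):
    """y = a + b*x  ->  elasticity = b * x / y."""
    return b * x / y


def semi_log_elasticity(b, y):
    """y = a + b*ln(x)  ->  elasticity = b / y (falls as y rises)."""
    return b / y


def double_log_elasticity(b):
    """ln(y) = a + b*ln(x)  ->  elasticity = b (constant by construction)."""
    return b


# (weekly income in pounds, cars per household) from the table notes.
LEVELS = {"low": (172, 0.42), "middle": (306, 0.92), "high": (430, 1.25)}

# Short run income elasticities reported in Table 5-2.
SR_INCOME = {
    "linear": {"low": 0.173, "middle": 0.141, "high": 0.145},
    "semi_log": {"low": 0.394, "middle": 0.181, "high": 0.133},
}


def implied_income_coefficients():
    """Invert the formulas to recover the coefficient behind each elasticity."""
    out = {"linear": {}, "semi_log": {}}
    for level, (income, cars) in LEVELS.items():
        out["linear"][level] = SR_INCOME["linear"][level] * cars / income
        out["semi_log"][level] = SR_INCOME["semi_log"][level] * cars
    return out
```

Inverting the formulas gives a nearly identical coefficient at all three levels within each form, which suggests the tabulated elasticities were evaluated in this way. It also shows why the semi-log elasticity falls as car ownership rises, while the double log elasticity cannot vary.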
As mentioned above, the White test detects heteroskedasticity in all models, regardless of functional form. Further examination of the residuals identifies one observation with a particularly high prediction error. Figure 5-1 shows the residual plot of the semi-log model with the preferred household structure and location variables.
Figure 5-1 Residual Plot of Semi-Log Model with outlier
[Figure: residuals (y-axis, -3 to 2) plotted against observation number (x-axis, 0 to 240); bars mark the mean residual and +/- 2 s(e).]
Note: X-axis is ordered by year for each cohort
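The +/- 2 s(e) bars in the residual plot correspond to a simple screening rule, sketched below. The function name and the degrees-of-freedom correction are assumptions for illustration; the source does not state how s(e) was computed.

```python
import math


def flag_large_residuals(residuals, n_params, k=2.0):
    """Return indices of residuals more than k standard errors from the mean.

    s(e) is taken here as sqrt(SSR / (n - n_params)), the usual standard
    error of the regression (an assumption; the source does not define it).
    """
    n = len(residuals)
    if n <= n_params:
        raise ValueError("need more observations than parameters")
    mean = sum(residuals) / n
    s_e = math.sqrt(sum(e * e for e in residuals) / (n - n_params))
    return [i for i, e in enumerate(residuals) if abs(e - mean) > k * s_e]
```

A rule of this kind only screens candidates; whether a flagged observation is treated as an outlier still depends on inspecting the underlying data, as is done for cohort 9.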
Figure 5-2 Identifying outlier for cohort 9 in year 1999
[Figure: "Identifying Outlier for Cohort 9". No. of cars (axis 0.80 to 1.60) and weekly income (axis 300 to 600) plotted by year, 1990 to 2000. Annotation: unexpected rise of income and drop of car ownership level.]
The outlier is the observation for cohort 9 (household heads born between 1941 and 1945) in 1999. Real weekly disposable income rose from £415 in 1998 to £433 in 1999 and then dropped suddenly to £367 in 2000; by contrast, the car ownership level fell from 1.36 cars per household in 1998 to 1.19 cars in 1999 and then rose to 1.23 cars in 2000. Figure 5-2 illustrates the unexpected change in income and car ownership level for cohort 9 in 1999.
In the dynamic model, the outlier cannot simply be excluded, since that would upset the dynamic relationship. Instead, a new dummy variable “OUT”, which takes the value of 1 for the outlier and 0 for all other observations, is added to the model. The coefficient of the “OUT” variable measures the prediction error for this observation. The likelihood ratio test produces a chi-square statistic of 21.7, which strongly indicates that the additional variable improves the fit. When the model is re-estimated, the coefficient of the purchase price variable is wrongly signed and not significant. As a result, the purchase price variable is dropped from the model.
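To put the likelihood ratio statistic in context, the sketch below computes the p-value of a chi-square statistic with one degree of freedom, which is the assumption appropriate when a single variable such as “OUT” is added. The function names are illustrative, and the log-likelihood values themselves are not reported in this section.

```python
import math


def lr_statistic(loglik_restricted, loglik_unrestricted):
    """Likelihood ratio statistic: 2 * (lnL_unrestricted - lnL_restricted)."""
    return 2.0 * (loglik_unrestricted - loglik_restricted)


def chi2_pvalue_1df(x):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom.

    For 1 df, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    if x < 0:
        raise ValueError("statistic must be non-negative")
    return math.erfc(math.sqrt(x / 2.0))


# Statistic reported in the text; one degree of freedom is assumed.
p_value = chi2_pvalue_1df(21.7)
```

The resulting p-value is far below conventional significance levels (the 1% critical value for one degree of freedom is about 6.63), which matches the conclusion that the dummy variable significantly improves the fit.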