
In this section, an adequate ARIMA model of orders¹⁰⁶ 𝑝, 𝑑 and 𝑞 is determined for the time series consisting of the annual gross electricity demand of Turkey in the period 1970-2014. The model is related to the time series through a three-stage iterative procedure of identification, estimation, and diagnostic checking (i.e. the Box-Jenkins model building process described in Subsection 10.2.8). In a fourth stage, the annual gross electricity demand of Turkey in the period 2015-2025 is forecasted using the model found to be adequate.

The analysis is carried out with the license-free statistical software R (version 3.1.2) and the RStudio IDE (Integrated Development Environment, version 0.98.1102), a license-free user interface for R.

The 1st Stage - Model Identification

During this stage, the subclasses of parsimonious models are specified according to how the considered data were generated. In particular, the objective is to obtain tentative values for the orders 𝑝, 𝑑 and 𝑞 needed in the general linear ARIMA model building process.

First of all, the annual gross electricity demand of Turkey in the period 1970-2014 (i.e. 45 observations) is plotted against time and visually examined for stationarity (see Figure 48). The non-linear upward trend visible in the figure is evidence of non-stationarity in the time series.

106 Note that the orders 𝑝, 𝑑 and 𝑞 denote the order of the autoregressive component, the order of differencing, and the order of the moving average component of a general ARIMA model, respectively.

Figure 48- The annual gross electricity demand in the period 1980-2014 (continuous line) and its mean (dashed line) (own illustration according to TEIAS (2015))

The sample autocorrelation function (ACF) of the time series also indicates non-stationarity by showing large autocorrelations that diminish very slowly as lags increase (see Figure 49).

Therefore, it is necessary to transform the data into a stationary time series by differencing.
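The slow decay of the sample ACF for a trending series can be reproduced with a minimal sketch. The function name `sample_acf` and the synthetic series below are assumptions made for illustration only; the series is not the TEIAS data.

```python
def sample_acf(series, max_lag):
    """Return the sample autocorrelations r_0 .. r_max_lag of a sequence."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)  # lag-0 sum of squared deviations
    acf = []
    for k in range(max_lag + 1):
        ck = sum(dev[t] * dev[t + k] for t in range(n - k))
        acf.append(ck / c0)
    return acf


# Hypothetical trending series (NOT the TEIAS data): 45 points growing 7% per step.
trend = [10.0 * 1.07 ** t for t in range(45)]
acf_trend = sample_acf(trend, 5)

for lag, r in enumerate(acf_trend):
    print(f"lag {lag}: r = {r:.3f}")
```

For a series dominated by a trend, the autocorrelations at the first lags stay large and shrink only gradually, which is the pattern described for Figure 49.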

Figure 49- The sample ACF of the annual gross electricity demand series (own calculation & illustration)

In Figure 50, the first order difference of the time series is displayed. The general upward trend is alleviated, but the dispersion of the series increases over time. In the figure, the cone-shaped pattern (depicted by the green dot-dashed lines) indicates variance instability (i.e. heteroscedasticity). For the stationarity condition to hold, the variance of the data should also be stable (homoscedastic). Consequently, the second order difference should be taken in order to make the observations vary around their mean.

Figure 50- The scatter plot of the first order difference of the annual gross electricity demand series (circle) and its mean (dashed line) (own calculation & illustration)

The sample ACF of the once differenced series also suggests that further differencing is required: it indicates non-stationarity by displaying a recurrent pattern which decays slowly (see Figure 51).

Figure 51- The sample ACF of first order difference of gross annual electricity demand series (own calculation & illustration)

In Figure 52, the second order difference of the time series is displayed. The general upward trend has disappeared and the observations vary around their mean; however, the heteroscedasticity persists (depicted by the green dot-dashed lines).

Subsequently, the data should be transformed according to the Box-Cox power transformation technique before fitting any tentative models. The Box-Cox technique of data transformation is utilized for determining a power transformation of the data to stabilize the variance of the series (see Subsection 10.2.7.1 for information).

Figure 52- The scatter plot of the second order difference of the annual gross electricity demand series (circle) and its mean (dashed line) (own calculation & illustration)

The “BoxCox” function (in package “FitAR” version 1.94) is applied on the original series and the transformation parameter (𝜆) is estimated by maximizing the relative likelihood (𝑅(𝜆)) of the series. The estimated transformation parameter (𝜆̂) is computed as -0.049 (see Figure 53). Subsequently, the original series is transformed by using the computed 𝜆̂ and then twice differenced. The sample ACF and the sample PACF of the resulting series are displayed in Figure 54.
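The transform-then-difference step can be sketched as follows. This assumes the standard textbook form of the Box-Cox transformation, which may differ in detail from what the FitAR function computes internally. The helper names and the short `demand` list are hypothetical; only the value of 𝜆̂ is taken from the text.

```python
import math

LAMBDA_HAT = -0.049  # transformation parameter reported in the text


def box_cox(x, lam):
    """Textbook Box-Cox transform of a positive value (assumed form)."""
    if x <= 0:
        raise ValueError("Box-Cox requires positive data")
    if abs(lam) < 1e-12:
        return math.log(x)  # limiting case lambda -> 0
    return (x ** lam - 1.0) / lam


def difference(series, order=1):
    """Apply the difference operator (1 - B) 'order' times."""
    out = list(series)
    for _ in range(order):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


# Hypothetical demand values (NOT the TEIAS data), in arbitrary units.
demand = [8.6, 9.8, 11.2, 12.4, 13.5, 15.7, 18.3, 20.6]

transformed = [box_cox(x, LAMBDA_HAT) for x in demand]
twice_differenced = difference(transformed, order=2)
print(len(demand), len(twice_differenced))
```

Each differencing pass shortens the series by one observation, so a twice differenced series has two fewer values than the original. Because 𝜆̂ is close to zero, the transformation behaves much like a logarithm.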


Figure 53- The plot of Box-Cox transformation analysis of the original series (own calculation & illustration)

In Figure 54, the sample ACF of the transformed and twice differenced series indicates that stationarity is achieved with the Box-Cox transformation of the series. The sample ACF shows no significant spikes (i.e. randomness), whereas the sample PACF (partial autocorrelation function) shows only one significant spike, at lag 2. Neither pattern points to a specific tentative model according to the tabulated theoretical patterns in Table 24 on p. 133; the patterns indicate an ARIMA model which is especially difficult to identify. In order to analyze all possible tentative ARIMA(p,d,q) models, the auto.arima function (in package “forecast” version 6.1) is applied on the corresponding series. Note that the number of times the data have been differenced to become stationary (i.e. d) equals 2. During the next stage, the parameter estimation process is carried out for the possible tentative models.

Figure 54- The sample ACF and the sample PACF of the Box-Cox transformed and the twice differenced annual gross electricity demand series (own calculation & illustration)

The 2nd Stage - Parameter Estimation

During the second stage of the time series modelling, the parameters of the possible tentative models are estimated and the corresponding models are compared based on their corrected Akaike information criterion¹⁰⁷ (AICc). The AICc is used for comparing the tentative models since it is usually superior in smaller samples where the relative number of parameters is large (Shumway & Stoffer, p. 53).
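The small-sample correction can be illustrated with a minimal sketch of the commonly used AICc formula. How the parameter count `k` and the effective sample size `n` are defined is software-specific, so both are treated here as plain inputs. The function names and the numbers in the example call are hypothetical and are not taken from the fitted models.

```python
def aic(log_likelihood, k):
    """Akaike information criterion for k estimated parameters."""
    return -2.0 * log_likelihood + 2.0 * k


def aicc(log_likelihood, k, n):
    """AIC plus the small-sample correction term 2k(k+1)/(n-k-1)."""
    if n - k - 1 <= 0:
        raise ValueError("sample too small for the AICc correction")
    return aic(log_likelihood, k) + 2.0 * k * (k + 1) / (n - k - 1)


# Hypothetical inputs: log-likelihood 10, 3 parameters, 43 effective observations.
print(aic(10.0, 3), aicc(10.0, 3, 43))
```

The correction term grows as `k` approaches `n`, so models with many parameters are penalized more heavily when few observations are available. A positive log-likelihood makes both criteria negative, as in the example call.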

The “auto.arima” function in R automatically fits a number of tentative models by varying the orders 𝑝 and 𝑞, beginning from their assigned starting values. In contrast, the order of differencing, once given, is not varied. The function varies the orders in a stepwise manner and finally returns the tentative model with the lowest value of the selected information criterion. During the analysis, the starting values of 𝑝 and 𝑞 are set to zero, while the order of differencing is set to 2. Note that 𝑝 and 𝑞 can each be raised automatically to a maximum of 5 during the computations. The maximum likelihood method is preferred for fitting the models to the time series data.

The result of the model fitting analysis is tabulated below in Table 30. In total, seven different ARIMA models are analyzed (the first model is calculated twice). Note that, because their log-likelihoods are positive, the analyzed tentative models have negative AICc values (see APPENDIX B for the summary statistics of the selected model). The ARIMA(1,2,1) model is selected as the best model; however, at this point it is only a tentative model and cannot yet be regarded as an adequate representation of the process generating the considered time series.

Table 30- The output from the auto.arima function for the analyzed tentative models (own calculation & illustration)

Time Series Model    AICc Value
ARIMA(0,2,0)         -175.17
ARIMA(0,2,0)         -175.17
ARIMA(1,2,0)         -175.27
ARIMA(0,2,1)         -179.58
ARIMA(1,2,1)         -180.76
ARIMA(1,2,2)         -179.16
ARIMA(2,2,2)         -176.64
ARIMA(2,2,1)         -179.21
Best model: ARIMA(1,2,1)

107 See Subsection 10.2.9.4 on p. 138 for more information.

The best model returned by the computation is the tentative model with the lowest AICc value among the candidates. The selected tentative model can be accepted as an adequate mathematical representation of the linear stochastic process under study only after its adequacy has been validated through diagnostic tests on its residuals. If the selected tentative model is found to be inadequate, the model with the next-lowest AICc value is considered for diagnostic checking.
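The selection rule described above can be sketched in a few lines. The AICc values are those reported in Table 30. The function names and the `is_adequate` callback, which stands in for the diagnostic checks of the third stage, are hypothetical.

```python
# AICc values as reported in Table 30, keyed by the orders (p, d, q).
AICC_BY_ORDER = {
    (0, 2, 0): -175.17,
    (1, 2, 0): -175.27,
    (0, 2, 1): -179.58,
    (1, 2, 1): -180.76,
    (1, 2, 2): -179.16,
    (2, 2, 2): -176.64,
    (2, 2, 1): -179.21,
}


def rank_by_aicc(aicc_by_order):
    """Return the model orders sorted from lowest to highest AICc."""
    return sorted(aicc_by_order, key=aicc_by_order.get)


def select_model(aicc_by_order, is_adequate):
    """Return the lowest-AICc model that passes the adequacy check, or None."""
    for order in rank_by_aicc(aicc_by_order):
        if is_adequate(order):
            return order
    return None


print(rank_by_aicc(AICC_BY_ORDER)[:3])
```

Since all values are negative, the lowest AICc is the most negative one, so ARIMA(1,2,1) ranks first and ARIMA(0,2,1) would be the next model to check if the first were rejected.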

The coefficients of the ARIMA(1,2,1) model and their standard errors are tabulated in Table 31. The error measures of the fitted model are observed to be at an acceptable level; especially the mean absolute percentage error (MAPE) of 2.52% (see APPENDIX B for the detailed summary statistics).

Table 31- The brief summary statistics of the ARIMA(1,2,1) model (own calculation & illustration)

                  AR(1)     MA(1)
Coefficients      0.3558    -0.8482
Standard Error    0.1797    0.0969

The ARIMA(1,2,1) model can be expressed in general form as follows:

$$\nabla^2 (1 - \phi_1 B)\, z_t = (1 - \theta_1 B)\, a_t \tag{11.3.1}$$

$$(1 - \phi_1 B - 2B + 2\phi_1 B^2 + B^2 - \phi_1 B^3)\, z_t = (1 - \theta_1 B)\, a_t \tag{11.3.2}$$

$$z_t = (2 + \phi_1)\, z_{t-1} - (1 + 2\phi_1)\, z_{t-2} + \phi_1 z_{t-3} + a_t - \theta_1 a_{t-1} \tag{11.3.3}$$

In the general form, the symbol "∇" represents the difference operator, and ∇² = (1 − 𝐵)² indicates that the series is differenced twice. The symbol "𝐵" is the back shift operator. The symbol "𝑧_𝑡" indicates the gross electricity demand at time 𝑡. The symbol "𝜙" represents the autoregressive parameter, whereas the symbol "𝜃" represents the moving average parameter. The random shock at time 𝑡 is represented as "𝑎_𝑡". The model can be expressed in equation form with the computed parameters as shown below:

$$(1 - B)^2 (1 - 0.3558 B)\, z_t = (1 + 0.8482 B)\, a_t \tag{11.3.4}$$

$$z_t = (2 + 0.3558)\, z_{t-1} - (1 + 2 \cdot 0.3558)\, z_{t-2} + 0.3558\, z_{t-3} + a_t + 0.8482\, a_{t-1} \tag{11.3.5}$$

$$z_t = 2.3558\, z_{t-1} - 1.7116\, z_{t-2} + 0.3558\, z_{t-3} + a_t + 0.8482\, a_{t-1} \tag{11.3.6}$$
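The expansion leading from (11.3.1) to (11.3.6) can be checked numerically by multiplying the operator polynomials. The sketch below represents each polynomial in 𝐵 as a list of coefficients in ascending powers. The helper `poly_mul` is a hypothetical name; the value of 𝜙₁ is the AR(1) estimate from Table 31.

```python
def poly_mul(a, b):
    """Multiply two polynomials in B given as ascending coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


PHI1 = 0.3558  # AR(1) estimate from Table 31

# (1 - B)^2 = 1 - 2B + B^2
diff_twice = poly_mul([1.0, -1.0], [1.0, -1.0])

# (1 - B)^2 (1 - phi_1 B): left-hand side operator of Eq. (11.3.1)
ar_side = poly_mul(diff_twice, [1.0, -PHI1])

# Moving the lagged terms to the right-hand side flips their signs,
# giving the weights on z_{t-1}, z_{t-2}, z_{t-3}.
lag_weights = [-c for c in ar_side[1:]]
print([round(w, 4) for w in lag_weights])
```

The printed weights correspond to the coefficients on the three lagged demand terms in (11.3.6).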

The goodness of fit of the ARIMA(1,2,1) model can also be visually examined by plotting the predicted values against the observed values, as displayed in Figure 55. The plot indicates a successful fit to the data: the green and red circles, representing the observed and the predicted values respectively, overlap substantially.

Figure 55- The scatter plot of the observed values (green circles) and the predicted values from ARIMA(1,2,1) model (red circles) (own calculation & illustration)

The 3rd Stage - Diagnostic Checking of Residuals

During the third stage of the time series modelling, the residuals of the ARIMA(1,2,1) model are tested for normality and randomness by both graphical and analytical methods. See Subsection 10.1.1.5.1 on p. 99 for more information about the analysis of residuals.

The Quantile-Quantile (Q-Q) plot of the residuals from the ARIMA(1,2,1) model is presented in Figure 56 to examine whether the residuals are normally distributed. The “qqnorm” function (in package “stats” version 3.1.2) is used for plotting the residuals. Further, the function “qqline” adds a line (the so-called Q-Q line) to a normal Q-Q plot which passes through the first and third quartiles. Although there exist deviations from the Q-Q line at the tails, most of the values are close to it. Hence, the residuals are considered to be normally distributed.
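The construction of a Q-Q line through the first and third quartiles can be sketched with the Python standard library. This illustrates the idea rather than reproducing the R functions. The plotting positions (i − 0.5)/n are one common choice and are an assumption here; the function names and the example values are hypothetical.

```python
from statistics import NormalDist, quantiles

STD_NORMAL = NormalDist()


def qq_points(values):
    """Pair sorted sample values with standard normal quantiles."""
    n = len(values)
    probs = [(i - 0.5) / n for i in range(1, n + 1)]  # assumed plotting positions
    theoretical = [STD_NORMAL.inv_cdf(p) for p in probs]
    return list(zip(theoretical, sorted(values)))


def qq_line(values):
    """Slope and intercept of the line through the first and third quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    z1 = STD_NORMAL.inv_cdf(0.25)
    z3 = STD_NORMAL.inv_cdf(0.75)
    slope = (q3 - q1) / (z3 - z1)
    intercept = q1 - slope * z1
    return slope, intercept


# Hypothetical residual-like values, for illustration only.
slope, intercept = qq_line([1.0, 2.0, 3.0, 4.0, 5.0])
print(round(slope, 4), round(intercept, 4))
```

Points lying close to this line across the central part of the plot support the normality assumption, while systematic departures at the ends indicate heavier or lighter tails than the normal distribution.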


Figure 56- The Q-Q plot of the residuals from ARIMA(1,2,1) model (own calculation & illustration)

In addition to the examination of the Q-Q plot, the function “shapiro.test” (in package “stats”) is applied on the residuals to examine normality analytically according to the Shapiro-Wilk test of normality. The test statistic W and the p-value are computed to be 0.98 and 0.50 respectively. Accordingly, there is not enough evidence to reject the null hypothesis, which states that the residuals are from a normally distributed population, at an alpha level of 5% (i.e. the p-value is higher than 5%).

The standardized residuals of the model, displayed in Figure 57, resemble an independent and identically distributed white noise series varying around the zero horizontal level (i.e. the mean of the series). There are three unusual residuals with magnitudes higher than 2; however, these residuals are caused by the abrupt changes in the demand growth rate originating from the financial crises of 1978 and 2001 in Turkey and the global financial crisis of 2009, respectively.


Figure 57- The standardized residuals from ARIMA(1,2,1) model (own calculation & illustration)

The sample ACF and the sample PACF of the residuals from ARIMA(1,2,1) model can be examined for correlation at each individual lag as displayed in Figure 58. In both of the plots, there is not any statistically significant spike indicating correlation.

Figure 58- The sample ACF and the PACF plot of the residuals from ARIMA(1,2,1) model (own calculation & illustration)

In addition to examining residual correlations at individual lags, it is also useful to carry out the Ljung-Box test, which takes into account the magnitudes of the autocorrelations as a group. Note that the number of lags to be tested must be specified by the practitioner. According to Hyndman and Athanasopoulos (2014, p. 56), the number of lags to be jointly tested can in practice be taken as the value of the function “𝑓” for a given sample size “𝑛”.

$$f(n) = \min(10,\; n/5) \tag{11.3.7}$$

Thus, the number of lags for the analysis is set to 9, since 𝑛 = 45. By applying the Ljung-Box test function (in package “stats” version 3.1.2), the test statistic Q and the p-value are computed to be 7.3 and 0.4 respectively (for 7 degrees of freedom). The examination of the residuals indicates that there is not enough evidence to reject the null hypothesis of an independently distributed series at an alpha level of 5% (i.e. the p-value is higher than 5%).
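The lag rule (11.3.7) and the Ljung-Box statistic can be sketched as follows. The function names are hypothetical and the example residuals are synthetic. The p-value is omitted because it requires the chi-square distribution, which is not available in the Python standard library. Reading the 7 degrees of freedom as the 9 tested lags less the two estimated ARMA parameters is an assumption, although it is consistent with the reported numbers.

```python
def ljung_box_lags(n):
    """Number of lags from the rule f(n) = min(10, n/5), rounded down."""
    return int(min(10, n / 5))


def ljung_box_q(residuals, lags):
    """Ljung-Box statistic Q = n(n+2) * sum_{k=1..lags} r_k^2 / (n-k)."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [x - mean for x in residuals]
    c0 = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, lags + 1):
        r_k = sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
        total += r_k ** 2 / (n - k)
    return n * (n + 2) * total


lags = ljung_box_lags(45)
dof = lags - 2  # assumption: lags minus the AR(1) and MA(1) parameters

# Synthetic, strongly autocorrelated example (NOT the model residuals).
alternating = [1.0, -1.0] * 10
print(lags, dof, round(ljung_box_q(alternating, 1), 4))
```

A large Q relative to the chi-square reference distribution signals joint autocorrelation in the residuals, whereas a small Q, as reported for the fitted model, does not.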

To sum up, the ARIMA(1,2,1) model passes all diagnostic tests and is determined to be an adequate model representing the stochastic process generating the time series data. Hence, the model can be utilized for forecasting the annual gross electricity demand of Turkey.

The 4th Stage - Forecasting Annual Gross Electricity Demand of Turkey

After validating the adequacy of the ARIMA(1,2,1) model, the annual gross electricity demand of Turkey is forecasted for the period 2015-2025 by utilizing the function “forecast” (in package “forecast”). The confidence level for the forecast intervals is set to 95%. The computed forecasts and their corresponding forecast intervals are displayed in Figure 59.

According to the forecasting results, the annual gross electricity demand increases from 270 TWh_el to 456 TWh_el in the mentioned period. The corresponding annual mean growth rate is determined to be 5.3% per annum (see Table 32). Note that forecasts from time series models are computed recursively. Accordingly, the uncertainty in the forecasts grows with the forecast horizon, as can be deduced from the displayed forecast intervals.
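The recursive nature of the point forecasts can be sketched from the difference equation (11.3.6). Future random shocks are replaced by their expected value of zero, so only the last observed shock enters the first step. The function name and the input values are hypothetical. If the model is fitted to Box-Cox transformed data, the recursion would apply on the transformed scale, which this sketch does not handle.

```python
PHI1 = 0.3558     # AR(1) estimate from Table 31
MA_TERM = 0.8482  # coefficient on a_{t-1} as written in Eq. (11.3.6)


def recursive_forecast(history, last_shock, horizon):
    """Iterate Eq. (11.3.6) forward, feeding each forecast back in."""
    if len(history) < 3:
        raise ValueError("need at least three past observations")
    path = list(history[-3:])
    forecasts = []
    for step in range(horizon):
        # Only the first step sees an observed shock; later shocks are set to 0.
        shock_term = MA_TERM * last_shock if step == 0 else 0.0
        z_next = ((2.0 + PHI1) * path[-1]
                  - (1.0 + 2.0 * PHI1) * path[-2]
                  + PHI1 * path[-3]
                  + shock_term)
        forecasts.append(z_next)
        path.append(z_next)
    return forecasts


# Hypothetical inputs: a perfectly linear history and no last shock.
print(recursive_forecast([10.0, 20.0, 30.0], 0.0, 3))
```

Each forecast depends on earlier forecasts rather than on observed values once the horizon exceeds one step, which is why the forecast error accumulates and the intervals widen. With a linear history and no shock, the recursion simply extends the line, as expected for a twice differenced model.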

Figure 59- The historical gross electricity demand (solid black line), corresponding forecasts (dashed blue line) and forecast limits (dotted dashed red lines) (own calculation & illustration)

In Table 32, the forecasting results, including the change in annual gross electricity demand with respect to the previous year, are tabulated.

Table 32- The forecasted annual gross electricity demand by using ARIMA(1,2,1) model (own calculation & illustration)

In order to retrospectively check the feasibility of forecasting, the forecasted period is compared to the historical development in the period 2000-2014 as displayed in Figure 60. In the figure, the historical data are represented as solid lines; whereas the forecasts are represented as dashed lines. The forecasted annual gross electricity demand exhibits an increasing trend roughly similar to the one in the period 2000-2014, if the sudden falls in demand are ignored. In particular, the annual mean growth rate for the forecasted period is about the same as the corresponding growth rate in the period 2000-2014 (i.e. 5.2%/a).

Figure 60- The development of the annual gross electricity demand in the period 2000-2014 and in the forecasted period 2015-2025 (own calculation & illustration)

Table 32 column headings: Gross Electricity Demand | Change w.r.t. Previous Year | Annual Mean Growth Rate

In Table 33, the forecasts made in previous studies are presented and compared with the forecasts of this study. Note that, for each study, the forecast shown is the one for the latest forecasted year. See Subsection 11.3.1 for more information about the previous studies.

Table 33- The comparison of the forecasts from the previous studies and this study (own illustration)

The forecasting results of this study differ from the results presented in previous studies. Namely, the forecast of this study for 2023 lies between the low case and reference case scenarios of TEIAS. Further, the forecast for 2020 is well below the low case scenario of the study by Özturk and Ceylan. Furthermore, the forecast for 2025 is well above both forecasts (using ANN and SVM) by Kücükdeniz. Moreover, the forecast for 2020 lies between the reference case and the high case scenarios of the study by Dilaver and Hunt.

Nevertheless, the forecasts of this study are close to those of two previous studies, by Akay and Atak and by Yavuzdemir and Gökgöz (this holds only for the forecasts obtained by using TSA). Note that Yavuzdemir and Gökgöz did not provide any information about the adequacy of their model; they only reported a MAPE of 2.75% (compared with 2.52% for this study). Although Polater also used TSA, his forecast for the year 2023 is higher than the forecast in this study. However, the ARIMA(2,2,0) model utilized by Polater cannot be an adequate model for forecasting the annual gross electricity demand (see p. 156).

In conclusion, the statistical evidence and the examination of the historical data indicate that the time series forecasted using the ARIMA(1,2,1) model adequately reflects the development of the annual gross electricity demand of Turkey for the period 2015-2025. In addition, the conducted analysis is statistically more reliable than the mentioned previous studies utilizing TSA for forecasting.

Akay & Atak (2007) GPRM - 2015 266 270

ANN 294
