Considering the KIT dataset, Figure 5.7 depicts the box plots of the pinball-loss (QPL, cf. Equation (2.25)) and average modified reliability deviation (QMRD, cf. Equation (2.27)) values obtained by both the NNQF-based quantile regressions and the “true” Poly3 benchmark.
Figure 5.7: Results obtained by quantile regressions on the KIT dataset;
QPL: average pinball-loss; QMRD: average modified reliability deviation
The results in Figure 5.7 show that all data mining techniques obtain QPL values with low variance and with median values that never surpass 2%. Likewise, it can be observed that the ANNs
5.2 Application on Real Data
deliver slightly better results than the polynomials. Moreover, the median QPL of the benchmark is slightly better than that of the NNQF-based polynomials, an expected result considering that the “true” Poly3 models are actually trained to minimize the sum of pinball-losses. Still, it must be considered that training all “true” Poly3 regressions takes around 107 [s], while their NNQF counterparts are trained in only 50 [s]. In other words, training the former takes approximately 114% more time. This offers further evidence of the trade-off between computation time and pinball-loss accuracy described in Section 5.2.4. Furthermore, the ANNs generally obtain the best QPL values, thus supporting the claim that the NNQF allows quantile regressions to be trained with more complex techniques using their traditional cost functions and training algorithms. In terms of QMRD, the variance of all results appears to be much larger than that of the corresponding pinball-losses, with that of the NNQF-based polynomials being the largest.
Moreover, both the ANN10 and the “true” Poly3 benchmark obtain similar QMRD box plots.
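To make the QPL comparison above more concrete, the following minimal sketch computes the standard pinball loss of a quantile forecast and averages it over several quantile levels. It uses the textbook definition; the exact normalization in Equation (2.25) may differ, and the function names `pinball_loss` and `average_pinball_loss` are hypothetical, chosen only for this illustration.

```python
def pinball_loss(y_true, y_pred, q):
    """Mean pinball loss of forecasts y_pred for the quantile level q.

    Textbook definition (assumption: Equation (2.25) may normalize differently).
    """
    losses = []
    for y, y_hat in zip(y_true, y_pred):
        diff = y - y_hat
        # Under-forecasts (diff > 0) are weighted by q,
        # over-forecasts (diff < 0) by (1 - q).
        losses.append(max(q * diff, (q - 1.0) * diff))
    return sum(losses) / len(losses)


def average_pinball_loss(y_true, quantile_preds):
    """Average the pinball loss over all quantile levels.

    quantile_preds maps each level q to its list of forecasts
    (a hypothetical data layout used only in this sketch).
    """
    per_level = [pinball_loss(y_true, preds, q)
                 for q, preds in quantile_preds.items()]
    return sum(per_level) / len(per_level)


if __name__ == "__main__":
    y = [1.0, 2.0]
    preds = {0.1: [0.0, 0.0], 0.9: [0.0, 0.0]}
    print(average_pinball_loss(y, preds))
```

Because the loss penalizes under- and over-forecasts asymmetrically, a model trained directly on it (such as the “true” Poly3 benchmark) can be expected to score slightly better on this metric than a model trained with a different cost function.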
After training the quantile regressions, 49 intervals can be obtained (cf. Section 2.2.2); these are all centered on the regression describing the median and have probabilities ranging from 0.02 to 0.98. Figure 5.8 depicts box plots of the intervals’ average interval width (QIW, cf. Equation (2.31)), average interval score (QIS, cf. Equation (2.32)), and average modified interval reliability deviation (QMIRD, cf. Equation (2.33)).
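The count of 49 intervals can be illustrated with a short sketch. It assumes that the quantile regressions are trained for the 99 levels 0.01, 0.02, ..., 0.99 (an assumption that is consistent with the 49 intervals and the probability range stated above, but is not confirmed in this section) and that each interval pairs two quantiles that are symmetric around the median. The function name `central_intervals` is hypothetical.

```python
def central_intervals():
    """Pair symmetric quantile levels around the median.

    Assumption: 99 quantile levels 0.01, ..., 0.99 are available.
    Returns a list of (lower_level, upper_level, probability) tuples.
    """
    intervals = []
    # Work in integer percent to avoid floating-point drift.
    for lower in range(1, 50):      # 1, 2, ..., 49
        upper = 100 - lower         # 99, 98, ..., 51
        probability = upper - lower  # 98, 96, ..., 2
        intervals.append((lower / 100, upper / 100, probability / 100))
    return intervals


if __name__ == "__main__":
    result = central_intervals()
    print(len(result))   # number of intervals
    print(result[0])     # widest interval
    print(result[-1])    # narrowest interval
```

Under this assumption the widest interval spans the 0.01 and 0.99 quantiles (probability 0.98) and the narrowest spans the 0.49 and 0.51 quantiles (probability 0.02).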
As the results show, all intervals created with quantile regressions based on the NNQF have a similar average width and are narrower than those of the benchmark. Moreover, the benchmark appears to perform better in terms of QIS and QMIRD than the NNQF-based polynomials, especially considering that the variance of the NNQF-based polynomials’ QMIRD values seems to be relatively large. Nevertheless, the best results in terms of QIS and QMIRD are generally obtained by the intervals stemming from the ANN quantile regressions. This once again supports the possibility of
5 Application
Figure 5.8: Results obtained by interval forecasts on the KIT dataset.
The intervals are formed using the trained quantile regressions and Equation (2.5); QIW: average interval width;
QIS: average interval score; QMIRD: average modified interval reliability deviation
training accurate probabilistic forecasts using the combination of a complex data mining technique and the NNQF.
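The interplay between interval width and interval score discussed above can be sketched with the commonly used interval score for a central prediction interval: the width of the interval plus a penalty whenever the observation falls outside it. This is the standard textbook form; Equation (2.32) may average or normalize the score differently, and the function names below are hypothetical.

```python
def interval_score(y, lower, upper, prob):
    """Interval score of one central interval with nominal probability prob.

    Standard form (assumption: Equation (2.32) may differ in normalization).
    """
    alpha = 1.0 - prob
    score = upper - lower  # narrow intervals are rewarded
    if y < lower:
        # Observation below the interval: penalty grows with the miss distance.
        score += (2.0 / alpha) * (lower - y)
    elif y > upper:
        # Observation above the interval.
        score += (2.0 / alpha) * (y - upper)
    return score


def average_interval_score(y_true, lowers, uppers, prob):
    """Mean interval score over a series of observations."""
    scores = [interval_score(y, lo, up, prob)
              for y, lo, up in zip(y_true, lowers, uppers)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # One covered and one missed observation for a 0.9 interval.
    print(average_interval_score([0.5, 0.7], [0.4, 0.4], [0.6, 0.6], 0.9))
```

The sketch shows why narrower intervals do not automatically yield a better score: a narrow interval that misses observations is penalized in proportion to the miss distance, which is consistent with the benchmark scoring better than the narrower NNQF-based polynomial intervals.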
Now, using the method described in Section 2.2.4 and the trained NNQF-based quantile regressions, parametric distribution (CDF) forecasts can be calculated. Figure 5.9 depicts the results obtained by the parametric CDF forecasts and their non-parametric counterparts.
Figure 5.9: Results obtained by parametric and non-parametric distribution forecasts on the KIT dataset. The forecasts are obtained with the trained quantile regressions and with the methods described in Sections 2.2.3 and 2.2.4; Black: Non-parametric; Red: Parametric; QPL: average pinball-loss; QMRD: average modified reliability deviation
As Figure 5.9 shows, both types of forecasts obtain almost indistinguishable QPL and QMRD values. Therefore, the results support the claim that the approach of the present thesis is able to obtain parametric distribution forecasts that retain the accuracy of the non-parametric forecasts they are based on.
Finally, Figure 5.10 depicts the results of scenario forecasts¹¹ created with the method of the present thesis (cf. Section 2.2.5), as well as those stemming from the NNQF-based quantile regressions (i.e. the non-parametric probabilistic forecasts) presented previously. Just as in Chapter 2, the scenario forecasts are set to estimate the next 24 hours every day.
Figure 5.10: Results obtained by (Poly3) scenario forecasts and quantile regressions used for comparison on the KIT dataset.
The scenario forecasts are obtained using Poly3 quantile regressions and the method described in Section 2.2.5;
QPL: average pinball-loss; QMRD: average modified reliability deviation
As expected, the scenario forecasts deliver the worst results. For instance, their resulting QPL values surpass the 2% mark and have a greater variance than those of the NNQF-based quantile regressions, while their QMRD values show not only a large variance, but also a median value greater than 15%. For these reasons, methods for improving the quality of the scenario forecasts should be investigated in future related works.
¹¹ Notice that the scenario forecasts are formed by 50 scenarios based on Poly3 NNQF-based quantile regressions created with the input vector described in Section 5.2.3 and a forecast horizon of H = 1. Additionally, the values ySmax and ySmin (cf. Equation (2.20)) are set equal to one and zero, respectively.
Moreover, ways of speeding up the creation of the scenario forecasts also need to be researched further. This is relevant considering that obtaining the scenario forecasts for the present experiment takes around 4.8 [h], while simply applying the regressions takes less than a minute.
Interested readers are referred to Appendix C.5.2 for more information about the results presented in the current section. Additionally, the features selected as most relevant for each technique are also shown in Appendix C.5.2. The selected features show that the time series containing maximal values in Equation (5.1) is the additional time series preferred by all techniques.