


There are several variations of multi-step ahead forecasts, depending on exactly how many steps ahead are forecast. To limit the time needed to run all the evaluations, two-step ahead forecasts were produced for all scenarios, and three- to seven-step ahead forecasts only for the DC (daily, country-level) scenario. The two-step ahead results are discussed in detail per scenario; the results for the scenarios with more steps ahead are briefly discussed at the end of this section.
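As an illustration of what "k steps ahead" means when forecasts are evaluated, the sketch below pairs each forecast origin t with the target k periods later and scores a last-value forecast on a toy series. It assumes that NAIVE simply repeats the last observed value, which this section does not define; the function name `naive_k_step_rmse`, its `start` parameter and the sales figures are hypothetical.

```python
import math


def naive_k_step_rmse(series: list[float], k: int, start: int) -> float:
    """RMSE of k-step ahead last-value forecasts over a test window.

    Assumption: NAIVE forecasts the last observed value, so a forecast
    made at origin t (data up to and including t) for target t + k is
    simply series[t].
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    errors = []
    # Each origin needs a target k steps later that still lies in the series.
    for t in range(start, len(series) - k):
        forecast = series[t]
        actual = series[t + k]
        errors.append(actual - forecast)
    if not errors:
        raise ValueError("test window too short for this horizon")
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical daily sales for one item; origins start at index 3.
sales = [10, 12, 11, 15, 14, 18, 17, 20]
two_step = naive_k_step_rmse(sales, k=2, start=3)
three_step = naive_k_step_rmse(sales, k=3, start=3)
print(round(two_step, 3), round(three_step, 3))
```

With `k=2` this mirrors the two-step ahead setting used for all scenarios, and raising `k` towards 7 mirrors the M3DC to M7DC runs; the evaluation windows and NAIVE definition actually used in this study may differ.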

M2WC Scenario results

Table 4.8 shows the results for the M2WC scenario. ARIMA is by far the best-performing technique, both in terms of the percentage of items for which it is the best technique (68.5%) and in terms of performance consistency: it achieves an average RelRMSE improvement of 15.6% compared to always using NAIVE. The next-best techniques in terms of performance consistency were ES and LINREG, with improvements of 5.9% and 4.6%, respectively, compared to NAIVE. Always automatically selecting the best technique for each individual item results in a 17.7% improvement compared to always using NAIVE, an additional 2.1 percentage points compared to always using ARIMA.
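The NAIVE row in every results table has an average of exactly 1.000 with zero spread, which is consistent with RelRMSE being each technique's RMSE divided by the NAIVE RMSE for the same item. Assuming that definition, an average RelRMSE of r corresponds to an improvement of (1 - r) × 100% over always using NAIVE, and the gain of one technique over another is the difference between those figures, in percentage points. The sketch below reproduces the M2WC numbers from the averages in Table 4.8; the helper name `improvement_vs_naive` is hypothetical.

```python
# Average RelRMSE per technique in the M2WC scenario (Table 4.8).
# Assumption: RelRMSE = RMSE(technique) / RMSE(NAIVE) for each item.
m2wc_avg = {
    "NAIVE": 1.000, "MA": 1.198, "ES": 0.941, "LINREG": 0.954,
    "ADA": 0.976, "ARIMA": 0.844, "SVR": 1.238, "MLP": 0.971,
    "LSTM": 1.159, "BEST": 0.823,
}


def improvement_vs_naive(avg_relrmse: float) -> float:
    """Percentage improvement over always using NAIVE (avg RelRMSE 1.0)."""
    return round((1.0 - avg_relrmse) * 100, 1)


for name in ("ARIMA", "ES", "LINREG", "BEST"):
    print(name, improvement_vs_naive(m2wc_avg[name]))

# Extra gain of per-item selection over always using ARIMA,
# in percentage points on the NAIVE-normalized scale.
gain_over_arima = round(
    improvement_vs_naive(m2wc_avg["BEST"])
    - improvement_vs_naive(m2wc_avg["ARIMA"]),
    1,
)
print(gain_over_arima)
```

The computed values match the 15.6%, 5.9%, 4.6% and 17.7% improvements and the 2.1-point gain over ARIMA reported for this scenario.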

CHAPTER 4. PERFORMANCE COMPARISON RESULTS 38

M2DC Scenario results

Table 4.9 shows the results for the M2DC scenario. ARIMA is again the best technique for the largest share of items (86.2%). However, apart from the NAIVE baseline, it is the second-worst technique in terms of performance consistency, achieving only a 5.2% improvement in average RelRMSE compared to NAIVE. The techniques with the best performance consistency were ES, MA and LSTM, with average RelRMSE improvements of 22.3%, 22.2% and 14.9%, respectively, compared to always using NAIVE. Automatically selecting the best technique for each item resulted in a 32.4% RelRMSE improvement compared to always using NAIVE and an additional 10.1 percentage points compared to always using ES.
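ARIMA's weak average in M2DC comes with a very large spread (standard deviation 5.479, maximum 133.141 in Table 4.9). A rough back-of-the-envelope calculation shows how much a single extreme item can move a mean taken over 986 items. It assumes that the maximum belongs to exactly one item and uses the rounded table values, so the result is only approximate; the helper `mean_without_one` is hypothetical.

```python
def mean_without_one(mean: float, n: int, removed: float) -> float:
    """Mean of the remaining n - 1 values after removing one value."""
    return (mean * n - removed) / (n - 1)


n_items = 986        # items in the M2DC scenario (Table 4.9)
arima_avg = 0.948    # ARIMA average RelRMSE (rounded in the table)
arima_max = 133.141  # ARIMA worst single-item RelRMSE

# Share of ARIMA's average contributed by the single worst item.
print(round(arima_max / n_items, 3))
# Approximate ARIMA average over the other 985 items.
print(round(mean_without_one(arima_avg, n_items, arima_max), 3))
```

Under these assumptions the single worst item adds about 0.135 to ARIMA's average, and the remaining 985 items average roughly 0.814, which is still above ES (0.777) and MA (0.778).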

M2WS Scenario results

Table 4.10 shows the results for the M2WS scenario. ARIMA again performs best for the most items (49.5%), followed by ES (15.8%) and SVR (12.9%). ARIMA also has the best performance consistency, with a 16.0% average RelRMSE improvement compared to the NAIVE forecasts. The next-best techniques in terms of performance consistency were ES and LINREG, with improvements of 12.6% and 5.7%, respectively, compared to NAIVE. Automatic best-technique selection results in a 20.4% improvement compared to NAIVE, an additional 4.4 percentage points compared to ARIMA.
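The 4.4 figure for M2WS is a difference between two improvements that are both measured against NAIVE, so it is a gap in percentage points on the NAIVE-normalized scale. Measured relative to ARIMA's own average RelRMSE, the same gap is somewhat larger. The sketch below computes both from the averages in Table 4.10.

```python
arima_avg = 0.840  # ARIMA average RelRMSE, M2WS (Table 4.10)
best_avg = 0.796   # per-item automatic selection (BEST), M2WS

# Gain on the NAIVE-normalized scale, in percentage points.
points = (arima_avg - best_avg) * 100
# Same gain expressed relative to ARIMA's own average RelRMSE, in percent.
relative = (arima_avg - best_avg) / arima_avg * 100

print(round(points, 1), round(relative, 1))
```

Both readings describe the same data; the points-based figure is the one used throughout this section.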

M2DS Scenario results

Table 4.11 shows the results for the M2DS scenario. ARIMA performs best for the most items (52.2%), followed by LSTM (16.4%) and ES (13.2%). ES has the best performance consistency, with a 24.6% improvement compared to always using NAIVE, closely followed by ARIMA and LSTM with improvements of 23.9% and 22.9%, respectively. Automatically selecting the best DF technique for each item results in a 27.9% improvement compared to always using NAIVE, which is 3.3 percentage points more than always using ES.
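The three quantities reported in every results table (average RelRMSE, %best and the BEST row) can all be derived from a per-item table of RelRMSE scores. The BEST row behaves like a per-item minimum over all techniques including NAIVE: its min column equals the lowest technique minimum and its max is exactly 1.000. The sketch below shows one way to compute all three; the item names and scores are made-up toy values, not data from this evaluation, and ties are broken in favour of the first technique listed, which may differ from the procedure used in this study.

```python
from statistics import mean

# Hypothetical per-item RelRMSE values (RMSE of technique / RMSE of NAIVE).
relrmse = {
    "item_a": {"NAIVE": 1.0, "ES": 0.70, "ARIMA": 0.65, "LSTM": 0.80},
    "item_b": {"NAIVE": 1.0, "ES": 0.75, "ARIMA": 0.90, "LSTM": 0.72},
    "item_c": {"NAIVE": 1.0, "ES": 0.80, "ARIMA": 0.60, "LSTM": 0.85},
}
techniques = ["NAIVE", "ES", "ARIMA", "LSTM"]

# Average RelRMSE per technique (the "avg" column).
avg = {t: mean(scores[t] for scores in relrmse.values()) for t in techniques}

# Technique with the lowest RelRMSE for each item; first listed wins ties.
winners = [min(scores, key=scores.get) for scores in relrmse.values()]
# Share of items won by each technique (the "%best" column).
pct_best = {t: 100 * winners.count(t) / len(winners) for t in techniques}

# BEST row: take the lowest RelRMSE for every item, then average.
best_avg = mean(min(scores.values()) for scores in relrmse.values())

print(avg)
print(pct_best)
print(round(best_avg, 3))
```

In this toy example ARIMA wins two of three items, yet per-item selection still beats every single technique on average, mirroring the pattern in Tables 4.8 to 4.11.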

Discussion of results across two-step ahead scenarios

Tables 4.12 and 4.13 give an overview of the results for the two-step ahead scenarios: for each scenario, Table 4.12 lists the top 3 techniques by average RelRMSE (performance consistency) and Table 4.13 the top 3 by the percentage of items for which the technique was best.
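Both overview tables are rankings of the per-scenario results: ascending by average RelRMSE for Table 4.12 and descending by %best for Table 4.13. The sketch below derives both top-3 lists for M2DS from Table 4.11; the BEST row is left out because it is not a single technique, and the function names are hypothetical.

```python
# Per-technique summary for the M2DS scenario (Table 4.11): (avg, %best).
m2ds = {
    "NAIVE": (1.000, 0.2), "MA": (0.792, 2.2), "ES": (0.754, 13.2),
    "LINREG": (0.834, 2.8), "ADA": (0.897, 3.8), "ARIMA": (0.761, 52.2),
    "SVR": (0.826, 7.0), "MLP": (0.822, 2.3), "LSTM": (0.771, 16.4),
}


def top3_by_avg(summary: dict[str, tuple[float, float]]) -> list[str]:
    """Lowest average RelRMSE first (Table 4.12 ordering)."""
    return sorted(summary, key=lambda t: summary[t][0])[:3]


def top3_by_best(summary: dict[str, tuple[float, float]]) -> list[str]:
    """Highest share of items won first (Table 4.13 ordering)."""
    return sorted(summary, key=lambda t: summary[t][1], reverse=True)[:3]


print(top3_by_avg(m2ds))
print(top3_by_best(m2ds))
```

The same two functions applied to Tables 4.8 to 4.10 give the remaining columns of the overview tables.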

Across scenarios, ARIMA performs well overall. It is the best technique for the largest share of items in all four scenarios. ARIMA is also in the average RelRMSE top 3 for three of the four scenarios (first place for M2WC and M2WS and second place for M2DS) and drops out of the top 3 only in M2DC. This suggests that ARIMA's performance consistency is lowest in scenarios with a daily time detail level.

The technique with the best performance consistency overall is ES: it is in the average RelRMSE top 3 for all four scenarios, with first places in M2DC and M2DS and second places in M2WC and M2WS.

Table 4.8: M2WC results (RelRMSE) for 909 items

Technique  avg    std    min    max      %best
NAIVE      1.000  0.000  1.000  1.000    2.1
MA         1.198  0.702  0.439  10.363   1.3
ES         0.941  0.080  0.688  1.407    6.5
LINREG     0.954  0.147  0.711  2.647    4.1
ADA        0.976  0.199  0.713  3.168    5.1
ARIMA      0.844  0.120  0.411  1.867    68.5
SVR        1.238  0.573  0.454  5.658    8.6
MLP        0.971  0.188  0.711  3.809    1.3
LSTM       1.159  0.337  0.510  3.197    2.5
BEST       0.823  0.096  0.411  1.000    -

Table 4.9: M2DC results (RelRMSE) for 986 items

Technique  avg    std    min    max      %best
NAIVE      1.000  0.000  1.000  1.000    0.6
MA         0.778  0.102  0.604  1.308    6.7
ES         0.777  0.074  0.606  1.154    3.4
LINREG     0.881  0.157  0.602  2.616    0.3
ADA        0.894  0.196  0.600  2.894    0.2
ARIMA      0.948  5.479  0.377  133.141  86.2
SVR        0.962  0.367  0.609  5.847    0.8
MLP        0.877  0.169  0.573  2.616    0.3
LSTM       0.851  0.190  0.606  3.480    1.4
BEST       0.676  0.090  0.377  1.000    -

Table 4.10: M2WS results (RelRMSE) for 4278 store-items

Technique  avg    std    min    max      %best
NAIVE      1.000  0.000  1.000  1.000    0.9
MA         1.142  0.548  0.198  9.774    2.0
ES         0.874  0.090  0.649  1.374    15.8
LINREG     0.943  0.487  0.537  28.762   4.0
ADA        0.980  0.730  0.475  30.993   5.7
ARIMA      0.840  0.473  0.183  30.823   49.5
SVR        1.046  0.648  0.119  23.418   12.9
MLP        0.961  0.585  0.539  28.646   3.8
LSTM       1.016  0.442  0.152  16.408   5.5
BEST       0.796  0.077  0.119  1.000    -

Table 4.11: M2DS results (RelRMSE) for 4827 store-items

Technique  avg    std    min    max      %best
NAIVE      1.000  0.000  1.000  1.000    0.2
MA         0.792  0.141  0.387  6.556    2.2
ES         0.754  0.048  0.645  1.201    13.2
LINREG     0.834  0.539  0.610  32.468   2.8
ADA        0.897  1.165  0.602  70.730   3.8
ARIMA      0.761  0.764  0.391  39.024   52.2
SVR        0.826  0.150  0.613  5.535    7.0
MLP        0.822  0.310  0.610  15.107   2.3
LSTM       0.771  0.120  0.596  4.280    16.4
BEST       0.721  0.053  0.387  1.000    -

Table 4.12: Average RelRMSE top 3 per two-step ahead scenario

Rank  M2WC             M2DC          M2WS             M2DS
#1    ARIMA (0.844)    ES (0.777)    ARIMA (0.840)    ES (0.754)
#2    ES (0.941)       MA (0.778)    ES (0.874)       ARIMA (0.761)
#3    LINREG (0.954)   LSTM (0.851)  LINREG (0.943)   LSTM (0.771)

Table 4.13: %best top 3 per two-step ahead scenario

Rank  M2WC             M2DC            M2WS             M2DS
#1    ARIMA (68.5%)    ARIMA (86.2%)   ARIMA (49.5%)    ARIMA (52.2%)
#2    SVR (8.6%)       MA (6.7%)       ES (15.8%)       LSTM (16.4%)
#3    ES (6.5%)        ES (3.4%)       SVR (12.9%)      ES (13.2%)

Although LSTM does not perform well in the M2WC and M2WS scenarios, it does perform well in the M2DC scenario, where it is in the average RelRMSE top 3. LSTM particularly impresses in the M2DS scenario, where it is in the top 3 for both %best and average RelRMSE. This suggests that LSTM performs well in scenarios with a daily time detail level, and best when such a scenario also has a store location detail level.

By far the best performance is achieved when the best DF technique is chosen automatically for each item. This per-item selection yields an improvement ranging from 17.7% (M2WC) to 32.4% (M2DC) compared to NAIVE, and from 2.1 (M2WC) to 10.1 (M2DC) percentage points compared to the best individual forecasting technique. Measured against NAIVE, the added value of automatic DF technique selection is greatest for the scenarios with a daily time detail level.
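The ranges quoted for automatic selection follow directly from the averages in Tables 4.8 to 4.11. The sketch below recomputes them per scenario. The second dictionary also shows that, against the best single technique, the gain in M2DS (3.3 points) is smaller than in M2WS (4.4 points), so the daily-versus-weekly pattern is clearest in the comparison with NAIVE.

```python
# (best single-technique avg RelRMSE, BEST avg RelRMSE) per scenario,
# taken from Tables 4.8 to 4.11.
scenarios = {
    "M2WC": (0.844, 0.823),  # best single technique: ARIMA
    "M2DC": (0.777, 0.676),  # best single technique: ES
    "M2WS": (0.840, 0.796),  # best single technique: ARIMA
    "M2DS": (0.754, 0.721),  # best single technique: ES
}

# Improvement of per-item selection over always using NAIVE, in percent.
vs_naive = {s: round((1 - best) * 100, 1) for s, (_, best) in scenarios.items()}
# Extra gain over the best single technique, in percentage points.
vs_single = {
    s: round((single - best) * 100, 1) for s, (single, best) in scenarios.items()
}

print(vs_naive)
print(vs_single)
```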

Discussion of results for M3DC to M7DC scenarios

The results for the M3DC to M7DC scenarios are given in Tables 4.14 to 4.18, and several trends stand out. As the number of steps ahead increases, ARIMA is less often the best DF technique: it was best for 86.2% of items in the M2DC scenario, a share that gradually declines to 40.8% in the M7DC scenario. In the three- to six-step ahead scenarios, MA is the best technique for a growing share of items, from 9.2% in M3DC to 25.5% in M6DC. MA does not perform as well in the M7DC scenario, where the more advanced forecasting techniques perform relatively well: LINREG, ADA and MLP improve considerably compared to the scenarios with fewer steps ahead, possibly because these models can extract more complex patterns in weekly sales differences.

The top 3 DF techniques in terms of lowest average RelRMSE remain the same for M3DC to M6DC (#1 MA, #2 ES, #3 LSTM), but change for M7DC (#1 LINREG, #2 MLP, #3 ES). These trends imply that which DF techniques perform best depends on how many steps ahead are forecast, so the retailer should select the techniques to implement accordingly. Again, in all forecasting scenarios, by far the best performance is achieved by automatically selecting the best forecasting technique for each individual item. Compared to always using NAIVE, this yields a RelRMSE improvement ranging from 12.5% (M7DC) to 35.9% (M5DC); compared to always using the best-performing individual DF technique, the improvement ranges from 6.6 (M7DC) to 9.9 (M3DC) percentage points.
