In this section, we compare the performance of the proposed TSDIGAN model with benchmark traffic data imputation models to assess its effectiveness. The benchmark models are support vector regression (SVR), history average (HA), denoising stacked autoencoder (DSAE), and the GAN-based parallel data model (Chen et al., 2019). The benchmark dataset used in our study was also used by Duan et al. (2016) for the DSAE model and by Chen et al. (2019) for the GAN-based parallel data model, which enabled us to compare our model directly with these benchmarks. The detailed default training settings and evaluation results for the baseline models can be found in Chen et al. (2019). Figure 4.18 shows the average imputation performance across all VDS in terms of MAE, RMSE, and MRE for all models compared. The proposed model outperformed all benchmark models in terms of MAE and RMSE. Overall improvements of 13.7% and 16.3% were observed for the TSDIGAN model over the next best performing model, the parallel data model. However, in terms of MRE, the proposed TSDIGAN model performed worse than the other benchmark models, except SVR. This can be attributed to a few sensors for which MRE was significantly higher, as shown in Figure 4.17 and described in Section 4.3.4. Further, while the benchmark models were trained separately for each sensor, we trained our models for each cluster (group) of sensors, which may be one reason why our model performs poorly for sensors with significant noise or zero-volume reports. While the mean MRE for TSDIGAN across all sensors at a 30% missing rate (MR) was 35.5%, the median MRE was only 20.7%. Also, the mean MRE over 95% of the sensors varied between 24.0% and 26.1%, compared with 35.5% to 39.4% when all sensors are considered. This implies that the mean value shown in Figure 4.18 was significantly affected by the performance on a few outlying sensors, which led to the poor MRE of our model relative to the other benchmark models. In the future, more efficient clustering techniques can be used either to remove such sensors from the performance analysis or to train separate models for them, depending on user-specific requirements. Overall, the proposed TSDIGAN model outperformed all benchmark models in terms of MAE and RMSE, while performing reasonably well in terms of MRE for the majority of sensors.
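To make the mean-versus-median argument above concrete, the following Python sketch computes the three error measures and contrasts the mean and median of per-sensor MRE. The metric definitions are the commonly used ones and are an assumption here, since this section does not restate them; in particular, treating MRE as the average of |error| / |actual| and skipping zero-volume observations is an illustrative choice. All numbers are hypothetical and are not taken from the PeMS results.

```python
import math
from statistics import mean, median


def mae(actual, imputed):
    """Mean absolute error between observed and imputed values."""
    return mean([abs(a - p) for a, p in zip(actual, imputed)])


def rmse(actual, imputed):
    """Root mean squared error between observed and imputed values."""
    return math.sqrt(mean([(a - p) ** 2 for a, p in zip(actual, imputed)]))


def mre(actual, imputed):
    """Mean relative error (assumed definition: mean of |error| / |actual|).

    Observations with zero actual volume are skipped, because the ratio
    is undefined there. This handling is an assumption of the sketch.
    """
    ratios = [abs(a - p) / abs(a) for a, p in zip(actual, imputed) if a != 0]
    return mean(ratios)


# Hypothetical traffic volumes for one sensor (not real data).
actual = [100.0, 200.0, 50.0, 400.0]
imputed = [110.0, 190.0, 55.0, 380.0]
sensor_mae = mae(actual, imputed)
sensor_rmse = rmse(actual, imputed)
sensor_mre = mre(actual, imputed)

# Hypothetical per-sensor MRE values: four typical sensors, one outlier.
per_sensor_mre = [0.18, 0.20, 0.21, 0.22, 0.95]
mean_mre = mean(per_sensor_mre)      # pulled upward by the single outlier
median_mre = median(per_sensor_mre)  # unaffected by the outlier's size
```

In this toy example a single outlying sensor raises the mean MRE well above the median, which is the same pattern reported above for the full sensor set.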

4.4 Conclusion

Figure 4.18: Comparison of imputation performance accuracies in terms of (a) MAE, (b) RMSE, and (c) MRE with respect to other benchmark imputation models

In this study, we propose a traffic sensor data imputation framework based on generative adversarial networks (TSDIGAN) that treats the missing data problem as a data generation problem. Our study demonstrates that the generative-model-based method can impute missing traffic data accurately and robustly under widely varying missing rates. The proposed model first embeds traffic time-series data into GASF matrix images that preserve the temporal correlations. This enables training of a deep convolutional generative adversarial network that can generate realistic-looking synthetic data for missing data imputation. We have also shown the training process of the proposed model step by step, demonstrating how it learns to generate high-quality synthetic data. We evaluated the performance of the proposed model using benchmark data from PeMS (PeMS, 2014) and further investigated its capability for large-scale applications. We compared the performance of the proposed model with other benchmark models, including support vector regression (SVR), history average (HA), denoising stacked autoencoder (DSAE), and the GAN-based parallel data model. Our results show that the proposed model outperforms the benchmark models in terms of MAE and RMSE, while achieving comparable accuracy in terms of MRE for the majority of sensors. Further, the proposed framework groups sensors into clusters based on the similarity of their daily traffic patterns and learns a generative model that can be applied to the entire cluster. This makes it possible to train a small number of cluster-specific models instead of maintaining a separate model for each sensor, so that the entire training, testing, and real-world application procedure is handled more efficiently.
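The GASF (Gramian Angular Summation Field) embedding mentioned in the conclusion above can be sketched as follows. The sketch assumes min-max rescaling to [0, 1] before the polar encoding, so that the main diagonal of the matrix can be inverted to recover the series; the function names `gasf` and `recover_series` are hypothetical, the input values are made up, and the preprocessing used in the actual framework may differ.

```python
import math


def gasf(series):
    """Return (matrix, lo, hi) where matrix[i][j] = cos(phi_i + phi_j).

    phi_i = arccos(x_i) after rescaling the series to [0, 1]
    (the rescaling range is an assumption of this sketch).
    """
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("series must contain at least two distinct values")
    scaled = [(x - lo) / (hi - lo) for x in series]
    # Clamp to guard against rounding just outside the domain of acos.
    phi = [math.acos(max(0.0, min(1.0, s))) for s in scaled]
    matrix = [[math.cos(a + b) for b in phi] for a in phi]
    return matrix, lo, hi


def recover_series(matrix, lo, hi):
    """Invert the diagonal: cos(2 * phi) = 2 * x**2 - 1 with x in [0, 1]."""
    scaled = [math.sqrt(max(0.0, (matrix[i][i] + 1.0) / 2.0))
              for i in range(len(matrix))]
    return [s * (hi - lo) + lo for s in scaled]


# Hypothetical traffic volumes at four consecutive time steps.
volumes = [10.0, 20.0, 30.0, 40.0]
image, v_min, v_max = gasf(volumes)
restored = recover_series(image, v_min, v_max)
```

Each off-diagonal entry combines two time steps, which is how the matrix retains temporal correlation, while the diagonal alone is enough to map a generated image back to a time series under the [0, 1] scaling assumed here.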

Our proposed framework can generate a variety of realistic synthetic traffic data easily and cheaply, which makes it a good choice when it is inconvenient or impossible to obtain sufficient real traffic data. In addition, the characteristics of the framework open up extended ITS applications such as data analysis enhancement and anomaly detection. In the future, the framework can be integrated with external features such as weather, special events, and other factors that affect traffic flow patterns, so that the imputation adapts to and accurately reflects different conditions. Further, in this study we used k-means clustering to group the sensors based on their daily traffic patterns and developed a model for each cluster. Future work can evaluate other clustering techniques, such as hierarchical clustering and density-based clustering, and can also determine the optimal variation of temporal and spatial traffic data characteristics that can be grouped and handled as a single cluster. Finally, this study can be extended to evaluate the suitability and effectiveness of such generative-model-based deep learning frameworks for traffic speed generation, prediction, and similar ITS applications.
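The grouping of sensors by daily traffic pattern can be illustrated with a small k-means sketch. This is a plain-Python Lloyd iteration written for illustration only; the four-bin daily profiles, the choice of k = 2, and the initialization from the first k profiles are assumptions, not the configuration used in this study.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(profiles, k, iterations=20):
    """Cluster daily profiles with Lloyd's algorithm.

    Centroids are initialized from the first k profiles, a simplification
    chosen so that the sketch is deterministic.
    """
    centroids = [list(p) for p in profiles[:k]]
    labels = [0] * len(profiles)
    for _ in range(iterations):
        # Assignment step: each profile joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in profiles
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical daily profiles (mean volume in four time-of-day bins).
daily_profiles = [
    [10.0, 80.0, 30.0, 70.0],  # sensor A: pronounced peaks
    [40.0, 42.0, 41.0, 39.0],  # sensor B: nearly flat
    [12.0, 85.0, 28.0, 75.0],  # sensor C: pronounced peaks
    [38.0, 40.0, 43.0, 41.0],  # sensor D: nearly flat
]
cluster_labels, cluster_centroids = kmeans(daily_profiles, k=2)
```

One imputation model would then be trained per cluster label rather than per sensor. Sensors whose profiles sit far from every centroid are natural candidates for the separate treatment of outlying sensors discussed in the performance comparison.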

CHAPTER 5. TECHNICAL AND ECONOMIC FEASIBILITY