5 MARCO CONCEPTUAL
5.3 ENFOQUE CONSTRUCTIVISTA DE LA ENSEÑANZA DE LAS MATEMÁTICAS Y LA
Accuracy of model prediction
The best model was selected using the nearby sites to minimize external impacts for a contrasting analysis. Huesca et al. (2014) pointed out that remote sensing time series data were usually associated with highly significant seasonal changes of ground features, thus the multiplicative seasonal autoregressive integrated moving average (SARIMA) model should be a better choice than the ARIMA model. A detailed description of the SARIMA model can be found in Box and Jenkins (1970) and the model’s equation is provided in the Supplementary material. In this study, performances of ARIMA models and SARIMA models are evaluated for the two nearby sites. Both fitting accuracy and predicting accuracy are considered for selecting the appropriate model. The evaluation results of the first eight models with high performance using AERONET and MODIS data are provided in Figure S2-5 and Figure S2-6.
In general, an identical optimal non-seasonal ARIMA model is applied for both MODIS and AERONET at GSFC site after comparing various models through trial and error. The suitable non-seasonal ARIMA models for MDSC site are also the same for both MODIS and AERONET despite that the fitting and predicting accuracies at MDSC site are relatively lower than those at GSFC site. Interestingly, for both GSFC and MDSC sites, the best-fitted SARIMA model performs
35
relatively better for MODIS than the non-seasonal ARIMA model (discussed in Section 2.4.4), while the predicting accuracy of the best fitted SARIMA is not as good as that of the non-seasonal ARIMA model for AERONET. Unlike GSFC and MDSC sites, the pair of Beijing and Xianghe sites share the same AOD grid in the remotely sensed image. Thus, their optimal ARIMA models for MODIS are the same. Compared with non-seasonal ARIMA, the use of SARIMA model for MODIS data has significantly improved the fitting accuracy by 10.3% but reduced the predicting accuracy by 10.2% in terms of RMSE. On the other hand, the accuracy of the non-seasonal ARIMA model is higher for Xianghe AERONET data than that for Beijing AERONET data. Despite the fact that SARIMA has reduced BIC when applied to these two sites, little improvements could be achieved generally.
Overall, more accurate models can be obtained from the time series data that have more distinct seasonal patterns (Jiang et al. 2010). The desirable model may be the one with a combined influence on data quality, reliability, and low-AOD values.
Uncertainties in the model performance
As aforementioned, the quality of the AOD data plays a crucial role in the performance of the ARIMA model. The differences in data quality presumably result from the AOD retrieval algorithms, the geographical features, the climatological characteristics, the land surface conditions, and the heterogeneity with regard to aerosol types (Ichoku, Remer, and Eck 2005, Sayer et al. 2014). On the other hand, possible errors could also exist for ground-measured data, and they may be caused by instrument defects or operational failures. On top of that, ground observations measured at high elevations are not representative of regional aerosol variations (Ichoku, Remer, and Eck 2005). The high quality of MODIS AOD data at GSFC and MDSC sites is largely attributed to the good retrieval algorithm. These two sites are located in eastern U.S.,
36
which is characterized by vegetation, suburban, or urban surfaces over a fairly flat, low-lying area. Consequently, the DT algorithm applied in this region tends to exhibit small bias. It is expected that the ARIMA model can be applicable in the regions where MODIS observations are well- correlated with AERONET measurements. However, the retrieved aerosol product is less accurate in western U.S. (e.g., Boulder) than in eastern U.S., due to the complex aerosol formation mechanisms complicated by emissions transported from Asia, widespread bright surfaces, and smoke from wildfires (Yu et al. 2008). Therefore, it is more rational to predict future AOD values using AERONET AOD. In contrast to the Boulder site, the sampling limitation of AERONET AOD at CKU site renders the forecast to rely on MODIS AOD. The heterogeneous urban features and mixed aerosol types at Beijing and Xianghe sites make it hard to predict accurately AOD values in a long term, as indicated by the growing deviations with time between the predicted and actual AOD values (see Figure 2-7). But it is feasible to use the ARIMA model for predictions during short-term periods (e.g., 12 months).
Since the ARIMA model is sensitive to extreme AOD values, possible contributors to the abnormal values of AOD are analyzed. For example, Hsu et al. (2012) pointed out that the periodic, abrupt, fluctuations in AOD time series might result from natural phenomena such as large-scale climatic conditions associated with El Niño Southern Oscillation (ENSO) events. In order to examine how inter-annual variability of AOD at the six sites might be potentially influenced by ENSO, the Multivariate ENSO Index (MEI; Wolter and Timlin (1998), accessible at
https://www.esrl.noaa.gov/psd/enso/mei/) was adopted from Jan. 2003 to Dec. 2015. The time series of deseasonalized monthly AOD anomalies and MEI during the study period are illustrated in Figure S2-7. The peaks of AOD abnormal values are consistent with the high ranked El Niño events but with certain time lags, implying that the ENSO year could result in abnormal high/low
37
values in AOD. A further quantitative study on the correlation between monthly AOD anomaly and MEI, and correlation between monthly AE anomaly and MEI is conducted. Considering the whole time span, there is indeed no significant correlation between monthly AOD anomaly and MEI. This result is consistent with the finding of Li, Carlson, and Lacis (2011). The correlation coefficients between monthly AE anomaly and MEI at a significant level of 95% are -0.202, - 0.196, and 0.227 for GSFC, MDSC, and Boulder sites, respectively. No significant correlation was found between monthly AE anomaly and MEI for the other three sites. Furthermore, a time lag analysis between the correlation of MEI and AOD/AE anomaly is performed and provided in Figure S2-8. It is observed that the correlation between MEI and aerosol properties (e.g, AOD and AE) is not so representative, probably because a 13-year period of aerosol records is still relatively short for understanding large-scale climatic influence of phenomenon like ENSO (Kärner 2002). A longer historical timescale would be better for the prediction of AOD and the analysis of its relationships with large scale climatic elements.
Sensitivity of the model performance
The missing aerosol data can be one of the major concerns for the use of the AOD data in application studies, such as epidemiology or climate change. Therefore, the impact of the missing AOD data on the performance of ARIMA model was examined. The GSFC site was selected for simulation due to its data completeness during the study period. 2% to 50% of AOD data were randomly removed from both MODIS and AERONET time series, and the analysis results are shown in Figure 2-8. For AERONET AOD data, the fitting accuracy, as represented by R2, and
Adjusted R2, RMSE_fit, MAE_fit, and MAPE_fit indices (in solid lines) first remains stable and
then slightly increases, for around 26% of missing data. However, the predicting accuracy, represented by RMSE_pred, MAE_pred, and MAPE_pred indices (in dashed lines) decreases at
38
this point. A similar behavior is found for MODIS AOD, with around 22% of missing data, indicating a trade-off between fitting accuracy and predicting accuracy. The BIC, however, shows a steadily positive trend as the amount of missing data increases. The precision of the model is not significant when the proportion of missing values exceeded 50% for AERONET AOD and 40% for MODIS AOD. Interestingly, a consistent pattern of metrics is observed for MODIS AOD, suggesting some intrinsic noises in the AOD time series. Nevertheless, this phenomenon is not evident for AERONET AOD. In addition, the fluctuation of MODIS time series is relatively greater than that of AERONET, suggesting that MODIS AOD is more sensitive to the missing data. In general, the performance of the ARIMA model is less vulnerable to the missing AOD values (with a certain percentage range, e.g., 20%) due to the small amplitude caused by growing volume of missing values. However, the model may be subject to a data convergence problem when the number of missing observations increases.
39
Figure 2-8. Analysis of the relationships between the performance of the ARIMA model and the
percentage of the missing AERONET AOD data (a), and the missing MODIS AOD data (b). The fitting accuracy of the model is represented by solid lines while the predicting accuracy is represented by dashed lines.