3.5. Sampling methods
3.5.5. Determination of metals and their ionic compounds
Time series analysis accounts for sequential data points that may have internal structure, such as autocorrelation and seasonal variation (Chatfield, 2004). Time series are observed in many fields, including finance, meteorology, chemical and physical processes, geophysics and environmental sciences. A basic assumption in time series analysis is that successive observations are dependent, so that future values may be influenced by past observations; models therefore need to account for the time order of the observations (Chatfield, 2004).
There are various standard ways of viewing a time series model. A simple approach is a component model, of which there are four types: horizontal (data values fluctuate around a constant value, as in a stable, i.e. stationary, process), trend (the data increase or decrease over time), seasonality (the data are affected by seasonal factors) and cycles (the data show wave-like rises and falls without a fixed period).
Moving average and exponential smoothing methods are a cruder form of modelling, in that they are less structured and make fewer distributional assumptions about the error. Box and Jenkins (1976) developed an approach to model selection, parameter estimation and model checking in time series analysis for stationary time series data. Based on the Box-Jenkins approach, a class of popular models has been developed for analysing and forecasting stationary univariate time series, including the autoregressive moving average (ARMA) model, the autoregressive integrated moving average (ARIMA) model and the seasonal autoregressive integrated moving average (SARIMA) model.
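The two smoothing methods mentioned above can be illustrated with a minimal sketch. The function names, the trailing-window definition of the moving average and the choice of initialising the smoothed series at the first observation are assumptions made for this illustration; they are not taken from the cited sources.

```python
def moving_average(y, window):
    """Trailing moving average; returns len(y) - window + 1 values."""
    if window < 1 or window > len(y):
        raise ValueError("window must be between 1 and len(y)")
    return [sum(y[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(y))]


def exponential_smoothing(y, alpha):
    """Simple exponential smoothing: s_t = alpha*y_t + (1 - alpha)*s_{t-1}.

    The first smoothed value is set to y[0] (an assumption of this sketch).
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    smoothed = [y[0]]
    for value in y[1:]:
        smoothed.append(alpha * value + (1.0 - alpha) * smoothed[-1])
    return smoothed


series = [3.0, 5.0, 4.0, 6.0, 8.0, 7.0]
print(moving_average(series, 3))          # [4.0, 5.0, 6.0, 7.0]
print(exponential_smoothing(series, 0.5))  # [3.0, 4.0, 4.0, 5.0, 6.5, 6.75]
```

Neither function specifies an error distribution, which is the sense in which these methods are less structured than the Box-Jenkins models.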
In general, an autoregressive process is represented by

$$Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \dots + \phi_p Y_{t-p} + e_t$$

where $\{Y_t\}$ is a time series. The autoregressive parameter $\phi_i$ is the portion of the value at time $t-i$ that is carried over to the value at time $t$, and $Y_{t-i}$ ($i = 1, 2, \dots, p$) is an endogenous variable, namely a lag of the series itself (Yaffee and McGee, 2000). $e_t$ is an error term with zero mean and variance $\sigma^2$. The formula expresses the current value as a function of its $p$ previous values and is known as an autoregressive relationship of order $p$, denoted AR(p).
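As a small worked illustration of the AR(p) relationship, the sketch below computes the one-step conditional mean (the AR part without the error term) and recovers the implied error terms from an observed series. The function names and the coefficient values are hypothetical, and the coefficients are assumed known rather than estimated.

```python
def ar_one_step(history, phi):
    """Conditional mean of Y_t given the last p values: sum of phi_i * Y_{t-i}."""
    p = len(phi)
    if len(history) < p:
        raise ValueError("need at least p past observations")
    # history[-1 - i] is Y_{t-1-i}, so phi[i] plays the role of phi_{i+1}
    return sum(phi[i] * history[-1 - i] for i in range(p))


def ar_residuals(y, phi):
    """Implied errors e_t = Y_t - sum(phi_i * Y_{t-i}) for t = p, ..., n-1."""
    p = len(phi)
    return [y[t] - ar_one_step(y[:t], phi) for t in range(p, len(y))]


phi = [0.5, 0.25]            # hypothetical AR(2) coefficients
y = [1.0, 2.0, 4.0, 2.5]
print(ar_one_step(y[:3], phi))  # 0.5*4.0 + 0.25*2.0 = 2.5
print(ar_residuals(y, phi))     # [2.75, 0.0]
```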
Using moving average (MA) processes, Yaffee and McGee (2000) described the evolution of a time series whose mean-centred value follows a random shock $e_t$ at time $t$ plus the previous random shocks $e_{t-1}, \dots, e_{t-q}$, where $q$ is the number of time lags (called the order of the MA model); the model is denoted MA(q). The moving average model of order $q$ may be written as

$$Y_t = e_t + \theta_1 e_{t-1} + \theta_2 e_{t-2} + \dots + \theta_q e_{t-q}$$

where $\theta_j$ ($j = 1, 2, \dots, q$) represents the moving average coefficient, and $e_t$ is a white noise process with mean zero and variance $\sigma^2$.
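The ARMA model combines an autoregressive process and a moving average process. Each value in the time series is expressed as a linear function of the preceding values, as in the autoregressive process, and of the current and previous random shocks, as in the moving average process (Yaffee and McGee, 2000). A general ARMA(p, q) model is then given by

$$Y_t = \phi_1 Y_{t-1} + \dots + \phi_p Y_{t-p} + e_t + \theta_1 e_{t-1} + \dots + \theta_q e_{t-q}$$

where all symbols are as defined for the AR(p) and MA(q) models.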
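The following sketch simulates an ARMA(p, q) series, which may help to see how the two parts interact. It assumes Gaussian white noise, MA terms entering with positive signs, zero starting values and a discarded burn-in period; the function name and all parameter values are hypothetical.

```python
import random


def simulate_arma(phi, theta, n, sigma=1.0, seed=0, burn_in=200):
    """Simulate n values of an ARMA(p, q) series with Gaussian shocks.

    phi holds the AR coefficients and theta the MA coefficients;
    an empty phi gives a pure MA(q), an empty theta a pure AR(p).
    """
    rng = random.Random(seed)
    p, q = len(phi), len(theta)
    y = [0.0] * p   # zero start values, dropped together with the burn-in
    e = [0.0] * q
    for _ in range(n + burn_in):
        e_t = rng.gauss(0.0, sigma)
        ar_part = sum(phi[i] * y[-1 - i] for i in range(p))
        ma_part = sum(theta[j] * e[-1 - j] for j in range(q))
        y.append(ar_part + e_t + ma_part)
        e.append(e_t)
    return y[p + burn_in:]


series = simulate_arma(phi=[0.6], theta=[0.3], n=500, seed=42)
print(len(series))  # 500
```

The burn-in is a design choice of the sketch: it reduces the influence of the arbitrary zero starting values on the returned series.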
A nonstationary time series can often be transformed to stationarity. ARIMA(p,d,q) models are the most common models for forecasting a time series that can be made stationary by transformations such as differencing and taking logarithms. Here, I(d) represents the integrated process, where d is the order of nonseasonal differencing (Yaffee and McGee, 2000). The three processes, namely the autoregressive, integrated and moving average processes, are linearly associated with previous data points (Gershenfeld, 1999).
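Differencing of order d can be sketched as repeated subtraction of the previous value. The example series with a quadratic trend is made up for illustration, and the function name is hypothetical.

```python
def difference(y, d=1):
    """Apply nonseasonal differencing d times: y_t - y_{t-1} at each pass."""
    result = list(y)
    for _ in range(d):
        result = [result[t] - result[t - 1] for t in range(1, len(result))]
    return result


quadratic_trend = [t ** 2 for t in range(6)]   # 0, 1, 4, 9, 16, 25
print(difference(quadratic_trend, d=1))        # [1, 3, 5, 7, 9]
print(difference(quadratic_trend, d=2))        # [2, 2, 2, 2]
```

Each pass shortens the series by one observation; here two passes are needed before the values fluctuate around a constant level.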
It is necessary to control for seasonality when the series exhibits seasonal patterns (Yaffee and McGee, 2000). A seasonal ARIMA (SARIMA) model has been developed to adjust for seasonality. This model consists of both the nonseasonal components ARIMA(p,d,q) and the seasonal components ARIMA(P,D,Q). The full formulation of a multiplicative seasonal ARIMA model is denoted ARIMA(p,d,q)(P,D,Q)s (Yaffee and McGee, 2000), where s is the order of seasonality, that is, the length of the seasonal period. For example, for monthly data with annual seasonality s = 12, which means that seasonal differencing is performed at a lag of 12 months.
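Seasonal differencing at lag s can be sketched in the same way as ordinary differencing. The monthly series below (a repeating 12-month pattern plus a linear trend) is synthetic and the function name is hypothetical.

```python
def seasonal_difference(y, s, D=1):
    """Apply seasonal differencing D times: y_t - y_{t-s} at each pass."""
    result = list(y)
    for _ in range(D):
        if len(result) <= s:
            raise ValueError("series is too short for this seasonal lag")
        result = [result[t] - result[t - s] for t in range(s, len(result))]
    return result


# Synthetic monthly data: a fixed 12-month pattern plus a trend of 0.5 per month
pattern = [2, 1, 0, 1, 3, 6, 9, 10, 8, 5, 3, 2]
monthly = [pattern[t % 12] + 0.5 * t for t in range(36)]

print(seasonal_difference(monthly, s=12))  # 24 values, all equal to 6.0
```

The seasonal pattern cancels exactly and only the trend increment over 12 months (0.5 × 12) remains, so a further nonseasonal difference would reduce this series to zeros.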
The autoregressive conditional heteroskedasticity (ARCH) model (Engle, 1982) has been successfully used to analyse financial time series. In this model the conditional variance of the error term $e_t$, denoted $\sigma_t^2$, is not constant but depends on the squared errors of previous periods (Engle, 1982). The ARCH model was generalised to the GARCH model by Bollerslev (1986), in which the conditional variance also depends on its own past values, to accommodate time series that exhibit time-varying volatility.
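An ARCH(q) process is:

$$e_t = \sigma_t z_t, \qquad \sigma_t^2 = \alpha_0 + \sum_{i=1}^{q} \alpha_i e_{t-i}^2$$

A GARCH(p,q) process is:

$$e_t = \sigma_t z_t, \qquad \sigma_t^2 = \alpha_0 + \sum_{i=1}^{q} \alpha_i e_{t-i}^2 + \sum_{j=1}^{p} \beta_j \sigma_{t-j}^2$$

where $\sigma_t^2$ is the conditional variance of $e_t$, $z_t$ is white noise with mean zero and unit variance, $\alpha_0 > 0$, $\alpha_i \ge 0$ and $\beta_j \ge 0$.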
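A minimal simulation sketch of these processes is given below, using the standard formulations of Engle (1982) and Bollerslev (1986). It assumes Gaussian $z_t$, requires the sum of the coefficients to be below one so that a finite unconditional variance exists, and starts the recursion at that unconditional variance; the function name and parameter values are hypothetical.

```python
import math
import random


def simulate_garch(alpha0, alpha, beta, n, seed=0, burn_in=500):
    """Simulate errors e_t and conditional variances sigma2_t of a GARCH(p, q).

    alpha holds the q ARCH coefficients, beta the p GARCH coefficients;
    an empty beta gives a pure ARCH(q) process.
    """
    if alpha0 <= 0 or min(list(alpha) + list(beta) + [0.0]) < 0:
        raise ValueError("alpha0 must be positive, other coefficients non-negative")
    persistence = sum(alpha) + sum(beta)
    if persistence >= 1.0:
        raise ValueError("sum(alpha) + sum(beta) must be below 1")
    rng = random.Random(seed)
    q, p = len(alpha), len(beta)
    m = max(p, q, 1)
    uncond = alpha0 / (1.0 - persistence)   # unconditional variance
    e = [math.sqrt(uncond)] * m             # start values, dropped with burn-in
    sigma2 = [uncond] * m
    for _ in range(n + burn_in):
        s2 = (alpha0
              + sum(alpha[i] * e[-1 - i] ** 2 for i in range(q))
              + sum(beta[j] * sigma2[-1 - j] for j in range(p)))
        sigma2.append(s2)
        e.append(math.sqrt(s2) * rng.gauss(0.0, 1.0))
    start = m + burn_in
    return e[start:], sigma2[start:]


errors, variances = simulate_garch(0.1, alpha=[0.1], beta=[0.8], n=1000, seed=7)
print(len(errors), min(variances) >= 0.1)  # 1000 True
```

Plotting `variances` against time would show the clustering of high-variance and low-variance periods that the text refers to as time-varying volatility.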
Bayesian formulations of these time series models and more general nonstationary models are now relatively common in financial and biostatistical studies (Koop and Potter, 1999; Pole et al., 1994). However, these models have not been as widely applied in biometrics.
Three main approaches to adjusting for seasonality have been widely applied to air pollution and health time series data (Bell et al., 2008; Galán et al., 2003; Katsouyanni et al., 1996; Schwartz et al., 1996). Firstly, seasonality may be described by a linear combination of sine and cosine functions of different frequencies (Wei, 1990). For example, $\sum_k \alpha_k \sin(2\pi k t / 12) + \sum_k \beta_k \cos(2\pi k t / 12)$ is used for the monthly model and $\sum_k \alpha_k \sin(2\pi k t / 52) + \sum_k \beta_k \cos(2\pi k t / 52)$ for the weekly model, where $k$ indexes the harmonics. Secondly, seasonality may alternatively be described by a categorical or dummy variable (e.g. spring, summer, autumn, winter, weekend or public holidays). Thirdly, temporal trend and seasonality can be controlled through a smooth function of time in a generalized additive model (GAM), using moving averages, smoothing splines or kernel smoothers.
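The first two approaches amount to adding columns to a regression design matrix, which can be sketched as follows. The function names, the number of harmonics and the choice of the first level as the reference category are assumptions made for this illustration.

```python
import math


def fourier_terms(t, period, n_harmonics):
    """Sine-cosine regressors [sin(2*pi*k*t/period), cos(...)] for k = 1..n_harmonics."""
    row = []
    for k in range(1, n_harmonics + 1):
        angle = 2.0 * math.pi * k * t / period
        row.extend([math.sin(angle), math.cos(angle)])
    return row


def dummy_code(value, levels):
    """Indicator variables for a categorical season; levels[0] is the reference."""
    if value not in levels:
        raise ValueError("unknown level: %r" % (value,))
    return [1 if value == level else 0 for level in levels[1:]]


seasons = ["spring", "summer", "autumn", "winter"]

# One design-matrix row for month t = 5 of a monthly model (period 12)
print(fourier_terms(5, period=12, n_harmonics=2))
print(dummy_code("summer", seasons))  # [1, 0, 0]
```

For a weekly model the same function would be called with `period=52`. The regression coefficients of the sine and cosine columns correspond to the weights of the harmonic terms in the formula above.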
As discussed above, the choice of the orders p and q and of the seasonality is a crucial step in developing these time series models. In general, the autocorrelation function (ACF) and partial autocorrelation function (PACF) can be used as exploratory techniques to identify any seasonal patterns in the data and to diagnose correlation between observations at different lags (Yaffee and McGee, 2000). The ACF at lag $k$ is expressed as

$$\mathrm{ACF}(k) = \frac{\sum_{t=k+1}^{n} (Y_t - \hat{\mu})(Y_{t-k} - \hat{\mu})}{\sum_{t=1}^{n} (Y_t - \hat{\mu})^2}$$

where $Y_{t-k}$ is the same series $Y_t$ lagged by $k$ time points, $\hat{\mu}$ is the sample mean and $n$ is the number of observations. In an adequate model, the residual autocorrelations should fall within the upper and lower 95% confidence bands around zero (in the plots of the ACF and PACF). Figure 2.1 provides an illustration: the ACF plot exhibits a slight seasonal pattern (the cosine pattern) and some of the autocorrelations exceed the 95% confidence bands. The Box-Ljung Q statistic can also be used to test the significance of the autocorrelations (Yaffee and McGee, 2000).
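The sample ACF, an approximate 95% band and the Box-Ljung Q statistic can be computed as in the sketch below. The band $\pm 1.96/\sqrt{n}$ is the usual large-sample approximation under white noise, and Q is computed as $n(n+2)\sum_{k=1}^{h} r_k^2/(n-k)$; the function names and the sine-wave example are assumptions of this illustration, not the data behind Figure 2.1.

```python
import math


def acf(y, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag."""
    n = len(y)
    mean = sum(y) / n
    denom = sum((v - mean) ** 2 for v in y)
    return [sum((y[t] - mean) * (y[t - k] - mean) for t in range(k, n)) / denom
            for k in range(max_lag + 1)]


def confidence_band(n, z=1.96):
    """Half-width of the approximate 95% band around zero under white noise."""
    return z / math.sqrt(n)


def ljung_box_q(y, max_lag):
    """Box-Ljung Q statistic over lags 1..max_lag."""
    n = len(y)
    r = acf(y, max_lag)
    return n * (n + 2) * sum(r[k] ** 2 / (n - k) for k in range(1, max_lag + 1))


# A series with a 12-step seasonal pattern: the ACF peaks again at lag 12
seasonal = [math.sin(2.0 * math.pi * t / 12) for t in range(120)]
r = acf(seasonal, 12)
print(r[12] > confidence_band(len(seasonal)))  # True
print(ljung_box_q(seasonal, 12))
```

When Q is applied to residuals, it is compared with a chi-square distribution whose degrees of freedom equal the number of lags minus the number of fitted ARMA parameters.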
Figure 2.1. The ACF plot of the residuals from the weekly linear regression model
The Durbin-Watson statistic can also be used to detect first-order autocorrelation and seasonality. A value near 2 for this statistic indicates that autocorrelation or seasonality has been satisfactorily removed.
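The Durbin-Watson statistic is the sum of squared successive differences of the residuals divided by their sum of squares. It lies between 0 and 4 and is approximately $2(1 - r_1)$, where $r_1$ is the lag-1 residual autocorrelation. A minimal sketch, with made-up residual vectors and a hypothetical function name:

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


print(durbin_watson([1.0, 1.0, 1.0, 1.0]))    # 0.0, strong positive autocorrelation
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0, strong negative autocorrelation
print(durbin_watson([1.0, -1.0, -1.0, 1.0]))  # 2.0, no first-order autocorrelation
```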