As mentioned in chapter 2, the Monte Carlo technique has often been applied in conjunction with singular spectrum analysis (SSA). This section discusses in more detail how other researchers have implemented Monte Carlo SSA (MC-SSA).
The first work to mention the use of Monte Carlo simulations to generate confidence limits for a discriminating statistic was that of Barnard (1963), with further developments by Hope (1968), Besag and Diggle (1977), Hall and Titterington (1989), Noreen (1989), Fisher and Hall (1990) and Tsay (1992).
Theiler et al. (1992) did not initially use Monte Carlo analysis as such, but provided a very useful overview of the method of surrogate data as a test for nonlinearity. They presented a comprehensive discussion of the various levels of hypothesis testing, the different test statistics that can be used and the available methods for generating the surrogate data sets.
This work was then extended by Theiler and Prichard (1996), who investigated the use of the Monte Carlo method for hypothesis testing. They pointed out that the questions typically asked about data sets in relation to hypothesis testing are:
• Is the distribution of the data non-Gaussian?
• Is the mean of the data significantly nonzero?
• Are there any temporal correlations?
• Is there any nonlinear structure in the temporal correlations?
• Is the time series chaotic?
The null hypothesis is formulated so that the answer to the question is ‘no’, which is also the default answer in the absence of evidence to the contrary. A discriminating test statistic is then evaluated to determine whether it falls within the bounds that would be expected if the null hypothesis were true. As explained earlier, Monte Carlo analysis is based on calculating the discriminating test statistic for a large number of realizations of the null hypothesis. This collection of estimates is then used to determine the bounds for the test statistic.
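As a minimal sketch of this procedure, the Python fragment below tests the question of whether the mean of a series is significantly nonzero. The function names, the choice of the sample mean as the discriminating statistic, the unit-variance Gaussian white-noise null model and the artificial data are all illustrative assumptions and are not taken from the works cited.

```python
import random
import statistics


def monte_carlo_test(data, statistic, simulate_null, n_surrogates=999, seed=0):
    """Compare an observed statistic with its Monte Carlo null distribution.

    Returns the observed value, the central 95% bounds of the surrogate
    statistics and a two-sided rank-based p-value.
    """
    rng = random.Random(seed)
    observed = statistic(data)
    # Evaluate the same statistic on many realizations of the null hypothesis.
    null_values = sorted(
        statistic(simulate_null(len(data), rng)) for _ in range(n_surrogates)
    )
    k = int(0.025 * n_surrogates)
    bounds = (null_values[k], null_values[n_surrogates - 1 - k])
    # Rank-based p-value; the +1 terms count the observed value itself.
    n_high = sum(v >= observed for v in null_values)
    n_low = sum(v <= observed for v in null_values)
    p_value = min(1.0, 2.0 * (1 + min(n_high, n_low)) / (n_surrogates + 1))
    return observed, bounds, p_value


def white_noise(n, rng):
    """Hypothetical null model: zero-mean, unit-variance Gaussian white noise."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


# Illustrative data: white noise with its mean shifted to 1.0.
data = [x + 1.0 for x in white_noise(200, random.Random(1))]
observed, (lower, upper), p_value = monte_carlo_test(data, statistics.fmean, white_noise)
print(f"mean = {observed:.3f}, null bounds = [{lower:.3f}, {upper:.3f}], p = {p_value:.3f}")
```

In this sketch the null hypothesis is rejected when the observed statistic falls outside the bounds obtained from the surrogate realizations. Any other statistic or null model can be substituted through the `statistic` and `simulate_null` arguments.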
When addressing the questions of whether a time series is non-Gaussian and whether there are any temporal correlations, Theiler and Prichard (1996) suggest two techniques by which the Monte Carlo realizations can be generated. They named these approaches typical and constrained realizations. They also suggested suitable hypotheses for testing different properties of the data. For instance, when trying to determine whether the time series is nonlinear, the null hypothesis should be that the data arose from a linear stochastic process. For this hypothesis, two different approaches can be used to generate the surrogate data. The first is to fit a linear model to the original data series and then to use different realizations of Gaussian white noise for the residual terms, thereby reconstructing the surrogate data from the linear model. The linear model could be an autoregressive moving average (ARMA), a purely autoregressive (AR) or a purely moving average (MA) model of the linear stochastic process.
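The linear-model approach can be sketched as follows for the simplest case, a first-order autoregressive model. The restriction to AR(1), the moment-based fitting procedure, the function names and the artificial test series are assumptions made here for illustration; the cited authors do not prescribe this particular implementation.

```python
import math
import random


def fit_ar1(x):
    """Estimate the mean, lag-1 coefficient and innovation std of an AR(1) model."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    c0 = sum(v * v for v in d) / n                         # lag-0 autocovariance
    c1 = sum(d[t] * d[t - 1] for t in range(1, n)) / n     # lag-1 autocovariance
    a = c1 / c0
    sigma = math.sqrt(c0 * (1.0 - a * a))
    return mean, a, sigma


def ar1_surrogate(mean, a, sigma, n, rng):
    """Draw one realization of the fitted AR(1) process with fresh white noise."""
    # Start from the stationary distribution so that no burn-in is needed.
    x = rng.gauss(0.0, sigma / math.sqrt(1.0 - a * a))
    out = []
    for _ in range(n):
        out.append(mean + x)
        x = a * x + rng.gauss(0.0, sigma)
    return out


rng = random.Random(42)
# Illustrative "measured" series: an AR(1) process with coefficient 0.7.
data = ar1_surrogate(0.0, 0.7, 1.0, 2000, rng)
mean_hat, a_hat, sigma_hat = fit_ar1(data)
surrogate = ar1_surrogate(mean_hat, a_hat, sigma_hat, len(data), rng)
print(f"fitted a = {a_hat:.3f}, refitted on surrogate a = {fit_ar1(surrogate)[1]:.3f}")
```

Each surrogate shares the fitted model with the data but not the sample statistics themselves, so the coefficient refitted on a surrogate differs slightly from the one fitted to the data.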
The second approach to obtaining the surrogate data is to take the Fourier transform of the data, randomise the phases and then invert the transform. Both techniques have advantages and disadvantages, and one should consider the practical trade-offs when deciding which to use. In terms of the two approaches mentioned earlier, the ARMA method produces typical realizations and the Fourier transform method produces constrained realizations. It should be noted that the sample Fourier spectrum obtained from the Fourier transform of the original data is a poor estimator of the underlying frequency spectrum. However, as long as the spectrum is not the main focus of the calculation, this does not necessarily represent a flaw in the Fourier transform based method of calculating the surrogate data. The main strength of the ARMA method lies in fitting the model, whereas the Fourier transform method is more useful for fitting the data. Theiler and Prichard (1996) mentioned that if one wanted to calculate error bars or confidence limits, rather than test the null hypothesis for a certain test statistic, the Fourier transform method would be very undesirable for generating the surrogate sets and the ARMA method should be used instead. It is important to distinguish between the two different problems of estimating confidence intervals and testing the null hypothesis (Theiler and Prichard, 1996). For the estimation of confidence intervals, a statistic of some intrinsic value, such as the mean or the fractal dimension, is calculated for the data and ‘error bars’ are specified for the calculated value. These confidence limits enclose, with a specified probability, the true value of that statistic for the underlying distribution. When the null hypothesis is tested, however, this is done for a carefully specified hypothesis and the aim is to determine whether the data are actually consistent with this hypothesis.
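The Fourier transform approach described above can be sketched as follows. The naive discrete Fourier transform is used only to keep the example free of external libraries (a library FFT would normally be used), and the function names and the artificial test series are assumptions made here for illustration.

```python
import cmath
import math
import random


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform, adequate for short series."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out


def phase_randomised_surrogate(x, rng):
    """Keep the Fourier amplitudes of x but replace the phases by random ones."""
    n = len(x)
    spectrum = dft(x)
    # The zero-frequency term (and the Nyquist term for even n) is left unchanged.
    for k in range(1, (n - 1) // 2 + 1):
        phase = rng.uniform(0.0, 2.0 * math.pi)
        spectrum[k] = abs(spectrum[k]) * cmath.exp(1j * phase)
        spectrum[n - k] = spectrum[k].conjugate()  # keeps the inverse real-valued
    return [v.real for v in dft(spectrum, inverse=True)]


rng = random.Random(0)
# Illustrative series: a sinusoid plus Gaussian noise.
data = [math.sin(0.4 * t) + 0.3 * rng.gauss(0.0, 1.0) for t in range(64)]
surrogate = phase_randomised_surrogate(data, rng)
amp_data = [abs(c) for c in dft(data)]
amp_surr = [abs(c) for c in dft(surrogate)]
print(max(abs(a - b) for a, b in zip(amp_data, amp_surr)))  # rounding error only
```

The surrogate reproduces the sample amplitude spectrum of the data exactly, up to rounding error, which is what makes it a constrained rather than a typical realization.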
Allen and Smith (1996) used Monte Carlo SSA to detect irregular oscillations in the presence of coloured noise. They identified the need for a statistical tool that could discriminate between possible oscillatory signals and other signals present in the time series. The null hypothesis used was that the data were coloured noise, and the basic formalism of SSA provided a natural test for modulated oscillations against this hypothesis. Although coloured noise is discussed in more detail in section 4.7, it is appropriate to explain the term briefly at this stage. The name ‘coloured’ or ‘red’ noise is a popular term, with no mathematical significance in itself, that has been assigned to noise series with certain characteristics. It refers to noise series or phenomena whose power spectral densities are proportional to 1/f^β (Addison, 1997; Aldrich, 2002).
Monte Carlo SSA was tested on three different types of artificial data. In the first situation, the power spectral characteristics of the noise were known prior to the analysis. In the second, it was tested whether the data consisted of only white or coloured noise. In the third, a composite noise null hypothesis was tested: it was assumed that some deterministic components were present in the data, and the aim was to determine whether the remaining components were noise.
According to Allen and Smith (1996), there can be two different approaches when Monte Carlo hypothesis testing is applied to the analysis of nonlinear systems. One approach is that the null hypothesis should be well understood and the other is that the null hypothesis should be physically interesting. For example, if it is known beforehand, due to the physical situation, that a system could not appear to be white noise, one does not gain any new information from rejecting the white noise null hypothesis. However, this situation is not so simple for first order autoregressive processes or so-called red or coloured noise. The output from many systems, both in the engineering industry and in many other research areas, is often indistinguishable from purely red-noise systems. The complexity of the test procedure therefore depends on how much prior knowledge about the properties of the noise is available.
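For reference, a first-order autoregressive (red noise) process can be written in the following generic form. The symbols are notation introduced here for illustration and are not quoted from Allen and Smith (1996): x_t is the series, \bar{x} its mean, a the lag-1 coefficient, \sigma the standard deviation of the driving noise and \varepsilon_t Gaussian white noise with unit variance.

```latex
x_t - \bar{x} = a\,(x_{t-1} - \bar{x}) + \sigma\,\varepsilon_t,
\qquad \varepsilon_t \sim \mathcal{N}(0, 1), \quad 0 \le a < 1
```

Setting a = 0 recovers white noise, while for 0 < a < 1 the power is concentrated at low frequencies, which is why such a process is described as red noise.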
The application of Monte Carlo SSA was further extended by Palus and Novotna (1998) in that it was also used to evaluate and test the regularity of dynamics, in addition to the normal test performed by inspecting the eigenvalues. This was done by evaluating the SSA modes against the coloured noise null hypothesis. This approach resulted in enhanced test sensitivity and reliability in detecting the relevant modes.