The use of smoothing approximations can lead us to reconsider the definition of outliers in the light of data regularization. In fact, we can now assess the consistency of the data by comparing the raw data with their smoothed version and consider as outliers the samples for which the difference between raw and smoothed values exceeds a certain threshold. So we are again considering the definition of outliers in terms of a derived quantity, as in Figure 3.9, which in this case is represented by the residuals defined as the difference between the original and the smoothed data. Figure 3.16 shows a new definition of outlier, represented by data whose difference with the smoothed data is outside the 10th–90th percentile bracket. Of course, this definition is heavily p-dependent as the number of outliers will tend to increase with the amount of smoothing. In this example, p = .99 was selected, so that the smoothing approximation (thick dashed line) is still reasonably close to the data. If a lower p is selected, then the number of outliers will increase considerably.
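The residual-based outlier definition can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the book's MATLAB script: a centred moving average stands in for the smoothing spline (so there is no p parameter here, only a hypothetical `half_width`), and the flow record is synthetic. The helper names `moving_average` and `percentile_outliers` are invented for this example.

```python
import math
import random
import statistics


def moving_average(values, half_width):
    """Centred moving average; the window shrinks near the two ends."""
    n = len(values)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half_width)
        hi = min(n, i + half_width + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def percentile_outliers(raw, smoothed, lower_pct=10, upper_pct=90):
    """Return the residuals, the bracket limits and the indices outside the bracket."""
    residuals = [r - s for r, s in zip(raw, smoothed)]
    # quantiles(n=100) returns the 99 cut points of the 1st..99th percentiles
    cuts = statistics.quantiles(residuals, n=100, method="inclusive")
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    outliers = [i for i, e in enumerate(residuals) if e < low or e > high]
    return residuals, (low, high), outliers


# Synthetic flow record sampled every 15 min (illustrative values only)
random.seed(1)
hours = [0.25 * k for k in range(96)]
flow = [500 + 200 * math.sin(2 * math.pi * (t - 8) / 24) + random.gauss(0, 40)
        for t in hours]

smoothed = moving_average(flow, half_width=3)
residuals, (low, high), outliers = percentile_outliers(flow, smoothed)
print(f"bracket: [{low:.1f}, {high:.1f}] m3/h, samples flagged: {len(outliers)}")
```

Changing `half_width` plays the role that changing p plays for the spline: heavier smoothing produces larger residuals and therefore different bracket limits.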
Because we are dealing with residuals, their characteristics may be regarded as a further check on the choice of p. In fact, we have seen in Chapter 2 the importance of checking the residual autocorrelation and normality, and we can now use these characteristics as further guidance in the selection of the smoothing parameters. Figure 3.17 shows the autocorrelation of the residuals of the DO data of Figure 3.14 smoothed with p = .7. A lower p would have caused a proliferation of outliers, whereas the non-zero 1-lag autocorrelation obtained with p = .7 is acceptable in view of the moderate smoothing.
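A minimal sketch of the autocorrelation check on the residuals follows. It assumes the usual large-sample white-noise band of ±1.96/√N as the significance limit, which is a standard approximation and not necessarily the criterion used for Figure 3.17. The residual series are synthetic, and the function names are hypothetical. The normality (Lilliefors) test is not covered here.

```python
import math
import random


def autocorrelation(x, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]


def significant_lags(x, max_lag):
    """Lags k >= 1 whose |r_k| exceeds the approximate 95% white-noise band."""
    band = 1.96 / math.sqrt(len(x))
    r = autocorrelation(x, max_lag)
    return [k for k in range(1, max_lag + 1) if abs(r[k]) > band], band


# Two synthetic residual series: white noise and a first-order correlated one
random.seed(3)
n = 350
white = [random.gauss(0, 0.2) for _ in range(n)]
correlated = [white[0]]
for t in range(1, n):
    correlated.append(0.6 * correlated[-1] + white[t])

lags_white, band = significant_lags(white, 19)
lags_corr, _ = significant_lags(correlated, 19)
print(f"band = +/-{band:.3f}")
print("significant lags, white noise:", lags_white)
print("significant lags, correlated residuals:", lags_corr)
```

If many lags fall outside the band, the smoothing has left structure in the residuals and the smoothing parameter deserves a second look.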
3.3.4 Time-series synthesis
Separating the components of a time series may assist in the synthesis of more series, to enhance the input library of model simulation. As previously noted, the two main constituents of a time series are the deterministic pattern and the stochastic disturbance. We are now in a position to shape or alter each element and produce more synthetic time series by recombining them as shown in Figure 3.18. In this procedure, the deterministic part may be viewed as a kind of ‘average’ behaviour determined by the primary environmental agents, which can be predicted on the basis of a sufficiently comprehensive historical data set describing the typical behaviours to be expected
[Figure 3.15 graphic: five panels showing NH4+, BOD5, NO3−, PO43− and TSS (all in mg L−1) from January to the following January; legend: data, spline smoothing.]
FIGURE 3.15 Data enhancement via smoothing splines to construct sufficient data for feeding a wetland water quality model. (Reprinted with permission from E. Giusti et al. 2011b, Water Science & Technology, 63, 2061–2070, IWA Publishing.)
[Figure 3.16 graphic: (a) flow (m3/h) versus time (h), original and smoothed data; (b) residuals (m3/h) versus time (h), with outliers marked; (c) cumulative distribution of the residuals over the data range.]
FIGURE 3.16 Outlier detection using a smoothing spline with p = .99. (a) compares the original and smoothed data, (b) plots their differences (residuals), and (c) shows their cumulative distribution, defining as outliers the data whose residuals lie outside the 10th–90th percentile bracket. These graphs were obtained with the MATLAB script Ex_Spline_Smooth_Outliers.m in the ESA_Matlab\Exercises\Chapter_3\Time_Series folder.
[Figure 3.17 graphic: (a) DO residuals (mg/L) versus time (h), with the outlier thresholds; (b) autocorrelation versus lag, annotated "Non-zero autocorrelation for 1 lag".]
FIGURE 3.17 Residual autocorrelation analysis of the DO data of Figure 3.14 for p = .7. (a) shows the time plot of the residuals with the outlier limits set by the cumulative distribution, while (b) shows their autocorrelation. The non-zero 1-lag autocorrelation is acceptable vis-à-vis the limited number of resulting outliers. A smaller p would have produced a much larger number of outliers. The Lilliefors test suggests rejecting the normality hypothesis for the residuals distribution.
in that context. Conversely, the stochastic part is the random component, influenced by secondary environmental agents, and by its very nature it is not predictable from the past observations. Yet, this part too is generally context dependent in terms of autocorrelation and probability density function.
In defining the characteristics of the deterministic module of the time series, several aspects must be considered. The time series normally contains periodicities due to the cyclic variability of the environmental variables on a daily, weekly, monthly or yearly basis. Examples of this are the daily periodicity of wastewater flow and organic load, both linked to the daily cycle of human activities, or the daily periodicity of DO in rivers and lakes due to the solar radiation, which stimulates photosynthesis. Other ‘typical’ periodic behaviours in domestic wastewater may be due to differing weekday and weekend habits, whereas in natural waters, the DO daily patterns differ strongly between summer and winter. So it is important to define a criterion to decide under which circumstances a behaviour is considered representative, and to identify its period and the typical conditions under which it occurs. By observing a large number of similar behaviours, the typical features of this pattern can be isolated and will form the basis for the deterministic component of the synthetic time series.
As an example, Figure 3.19 (Marsili-Libelli, 2004) shows two sets of observed daily data from which representative patterns have been extracted. In (a), 30 daily patterns of DO in the Orbetello lagoon were recorded during April 2001. The strong increase around noon is due to the intense algal activity, as will be better explained in Chapter 8, and the typical hourly behaviour (thick line) is obtained as the mean of the observations. In (b), five typical loading patterns were observed at the
[Figure 3.18 graphic: block diagram in which the deterministic part and the stochastic part are combined into the synthesized time series.]
FIGURE 3.18 Time-series synthesis by combining the deterministic and stochastic components.
[Figure 3.19 graphic: (a) DO (mg/L) versus time of day (0–24 h); (b) flow (m3/h) versus time of day (0–24 h), with five profiles numbered 1 to 5.]
FIGURE 3.19 Isolation of typical patterns in daily recurring behaviours. (a) shows the circadian evolution of dissolved oxygen in the Orbetello lagoon, whereas (b) shows five typical daily loading profiles of a wastewater treatment plant. (Reproduced with permission from Marsili-Libelli, S., Ecol. Model., 174, 67, 2004.)
input to a small wastewater treatment plant treating domestic reject water. The five observed typical loading patterns have the following explanations:
1. Low load (summer)
2. Medium load with wide daily variability
3. High load with large daily variability
4. Low load with moderate daily variability
5. Medium load with average daily variability
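Extracting a typical daily pattern as the mean of the observed days, as done for the DO data of Figure 3.19a, can be sketched as follows. The three short "days" below are made-up numbers with only four samples each, and the function names are hypothetical. Real records would have one value per sampling instant of the day.

```python
def mean_daily_pattern(days):
    """Average several equally sampled daily records, sample by sample."""
    n_samples = len(days[0])
    if any(len(day) != n_samples for day in days):
        raise ValueError("all daily records must have the same length")
    return [sum(day[h] for day in days) / len(days) for h in range(n_samples)]


def residuals_from_pattern(days, pattern):
    """Observation minus pattern, kept day by day."""
    return [[obs - p for obs, p in zip(day, pattern)] for day in days]


# Three illustrative daily DO records (mg/L), four samples per day
days = [
    [6.0, 8.0, 12.0, 7.0],
    [5.0, 9.0, 14.0, 8.0],
    [7.0, 10.0, 13.0, 6.0],
]

pattern = mean_daily_pattern(days)
residuals = residuals_from_pattern(days, pattern)
print("typical pattern:", pattern)
print("residuals of day 1:", residuals[0])
```

The residuals returned here are the raw material for the stochastic component: by construction they average to zero at each sampling instant.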
While the extraction of the daily patterns is fairly simple, the identification of the stochastic component is more complex and hinges on the ability to identify the noise-generating stochastic process in terms of autocorrelation and probability density function. Figure 3.20 shows the four steps in the synthesis of environmental time series. First, the deterministic pattern is isolated. There may be a collection of possible patterns, such as the ones in Figure 3.20b for the daily wastewater loading profiles. The residuals (observation minus pattern) are then analysed for autocorrelation (Figure 3.20c) and distribution (Figure 3.20d). Then, more random data series can be generated with these characteristics and summed back to the pattern to obtain a synthetic time series. The resulting time series may look like the one shown in Figure 3.21.
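The synthesis steps can be illustrated with a deliberately simple noise model. The sketch below assumes that the residuals are adequately described by a Gaussian first-order autoregressive, AR(1), process, which is a simplification chosen for this example and not a model prescribed by the text. Both the daily pattern and the "observed" residuals are synthetic, and `fit_ar1` and `synthesize` are hypothetical helper names.

```python
import math
import random


def fit_ar1(residuals):
    """Estimate the lag-1 coefficient and the innovation std of an AR(1) model."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    c0 = sum(d * d for d in dev) / n                          # variance
    c1 = sum(dev[t] * dev[t + 1] for t in range(n - 1)) / n   # lag-1 covariance
    phi = c1 / c0
    sigma = math.sqrt(c0 * (1.0 - phi * phi))
    return phi, sigma


def synthesize(pattern, n_days, phi, sigma, rng):
    """Repeat the daily pattern and add freshly generated AR(1) noise."""
    series, noise = [], 0.0   # noise starts at zero, so there is a short transient
    for _ in range(n_days):
        for value in pattern:
            noise = phi * noise + rng.gauss(0.0, sigma)
            series.append(value + noise)
    return series


rng = random.Random(7)

# Step 1: deterministic hourly pattern (illustrative flow profile, m3/h)
pattern = [300 + 80 * math.sin(2 * math.pi * (h - 9) / 24) for h in range(24)]

# Steps 2-3: residuals to be analysed (simulated here with a known AR(1) process)
observed_residuals, e = [], 0.0
for _ in range(24 * 60):
    e = 0.5 * e + rng.gauss(0.0, 15.0)
    observed_residuals.append(e)
phi, sigma = fit_ar1(observed_residuals)

# Step 4: generate new noise with the same characteristics and add the pattern
synthetic = synthesize(pattern, n_days=7, phi=phi, sigma=sigma, rng=rng)
print(f"phi = {phi:.2f}, sigma = {sigma:.1f}, samples = {len(synthetic)}")
```

If the residual distribution is clearly non-Gaussian, the `rng.gauss` call is the place to substitute a generator that matches the identified probability density function.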