Variations and errors arising from non-stationarity are another common source of uncertainty. For instance, when observations are collected at a constant sample rate (say, the daily closing prices of a stock), the volume of trades in any given day can vary drastically and, thus, valuable daily trend information will not be reflected in the collected data. In fact, given a collection of samples taken at equally spaced intervals, there are infinitely many continuous functions of time that would output exactly those same values (see Fig. 2.1). This means that all measures of return and risk in the stock market are taken over incomplete information, since one cannot know whether what happened between any two consecutive sampling instants carries relevant information. As a consequence, an estimate of risk taken from historical data will inevitably lose reliability after some time and, thus, such estimates must be periodically recalculated upon the arrival of new data points.
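The claim that equally spaced samples cannot pin down the underlying function can be illustrated with a minimal Python sketch. The two price paths below are hypothetical (`smooth_path`, `wiggly_path`, and all numeric values are assumptions made only for illustration): they coincide at every daily sampling instant and yet differ in between.

```python
import math

def smooth_path(t):
    """Hypothetical price path: a slow linear drift (t measured in days)."""
    return 100.0 + 0.5 * t

def wiggly_path(t, amplitude=5.0):
    """Same path plus an oscillation that vanishes at every integer t."""
    return smooth_path(t) + amplitude * math.sin(math.pi * t)

days = range(6)  # daily sampling instants t = 0, 1, ..., 5

# Largest disagreement between the two paths at the sampling instants.
daily_gap = max(abs(smooth_path(d) - wiggly_path(d)) for d in days)

# Disagreement halfway between two sampling instants.
midday_gap = abs(smooth_path(2.5) - wiggly_path(2.5))

print(f"gap at daily samples: {daily_gap:.2e}")
print(f"gap at t = 2.5:       {midday_gap:.2f}")
```

Since `amplitude` is arbitrary, every choice of it yields a different continuous function that matches the same daily samples up to floating-point rounding.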
2 Apparently, Robert [172] was referring to an equivalence between (a) least-squares estimators of the slope of linear regression models under only a simple assumption about the expected value of the independent variables; and (b) maximum likelihood estimators of the slope under a normality assumption.

(a) A GP model with 4 samples. (b) A GP model with 6 samples.
Figure 2.1: Gaussian Process models estimated over (a) four and (b) six samples. The horizontal axes depict time. The blue lines are the mean processes, whereas the red shaded areas depict 95% confidence intervals computed from the variance processes.

The field of Stochastic Processes (SP), or random processes, has evolved in statistics to deal with such scenarios. Formally, an SP is a collection of random variables $X(T) = \{X_t, t \in T\}$, where $T$ is a set of indices that can be either discrete or continuous. If $T$ is continuous, then sampling from a given SP model is equivalent to sampling from a subspace of continuous functions. Such a sample is known as a realization of the SP. In other words, each realization $\{x_t, t \in T\}$ sampled from the SP may represent one hypothesis for fitting a given finite set of observations. For instance, a Gaussian Process (GP) [170] is an SP for which each $X_t \sim N(m_t, \sigma_t^2)$. Figure 2.1 illustrates how the uncertainty is modeled in GP models: (a) with a realization of only four samples, the total variance of the GP is higher; whereas (b) with two additional samples, the uncertainty about the behavior of the temporal measurement process is reduced.
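The reduction in uncertainty with additional samples can be reproduced with a small, dependency-free Python sketch of the posterior variance of a zero-mean GP. The squared-exponential kernel, its unit length scale, the jitter term, and the sampling times are assumptions chosen for illustration; they are not the settings used to produce Figure 2.1.

```python
import math

def rbf(a, b, length_scale=1.0):
    """Squared-exponential covariance between two time points (assumed kernel)."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def posterior_variance(train_t, query_t, jitter=1e-6):
    """GP posterior variance at query_t: k(t*, t*) - k*^T K^{-1} k*."""
    K = [[rbf(a, b) + (jitter if i == j else 0.0)
          for j, b in enumerate(train_t)]
         for i, a in enumerate(train_t)]
    k_star = [rbf(a, query_t) for a in train_t]
    alpha = solve(K, k_star)
    return rbf(query_t, query_t) - sum(k * a for k, a in zip(k_star, alpha))

grid = [0.1 * i for i in range(101)]   # time axis from 0 to 10
four_samples = [1.0, 3.0, 6.0, 8.0]    # hypothetical sampling times
six_samples = four_samples + [4.5, 9.5]

mean_var_4 = sum(posterior_variance(four_samples, t) for t in grid) / len(grid)
mean_var_6 = sum(posterior_variance(six_samples, t) for t in grid) / len(grid)
print(f"mean posterior variance, 4 samples: {mean_var_4:.3f}")
print(f"mean posterior variance, 6 samples: {mean_var_6:.3f}")
```

For fixed kernel hyperparameters, the posterior variance depends only on where the samples were taken and not on the observed values, which is why no measurements appear in the sketch. In practice a linear-algebra library would replace the hand-written `solve`.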
Roughly speaking, assuming $T$ to be a set of positive integers, a strictly stationary process is one for which the joint probability distribution of any subset $X(t_1 : t_l)$ of $X(T)$ is the same as that of $X(t_1 + h : t_l + h)$, for all $l$ and $h$. A weaker notion of stationarity relaxes the joint distribution assumption and, instead, states that the mean and variance of the process are what remain constant upon a shift in time (with the covariance between two time points depending only on their lag). Conversely, (weakly) non-stationary processes are those for which the joint distribution (or its parameters) changes with a shift in time. Time-varying quantities such as asset prices can be modeled as non-stationary processes. The simplest model is the so-called random walk:
$$P_t = P_{t-1} + \epsilon_t, \tag{2.2}$$

which describes how the distribution of prices, $P_t$, depends on the previous price plus a time-varying error distribution, $\epsilon_t \sim N(m_t, \sigma_t^2)$, modeling volatility. No matter what model one takes to represent a non-stationary process, the data analysis involved requires clever techniques to keep track of the varying statistics associated with the collected realizations. In Section 2.3, two such techniques will be described: (1) the Kalman Filter [100], for normally distributed data following linear temporal dynamics (such as in the random walk model); and (2) an approach for estimating time-varying probabilities modeled as Dirichlet distributions. Both of them are integrated in the problem-solving tools devised in this thesis.

Modeling Changes with the Sliding Window Approach
Intelligent systems for real-time monitoring and diagnosis can be of great practical interest in complex dynamic environments. A system capable of continuously tracking time-varying uncertain statistics can proactively take exploratory actions to investigate potential problems, which can be achieved by using active learning techniques [148]. In addition, such systems can recommend promising trade-off actions or decisions for mitigating bottlenecks and problems that might be detected.
In many practical scenarios, tasks such as the monitoring, analysis, and prediction of time-varying phenomena rely on the detection of changes in order to suggest adjustments so that the predictive models remain consistent with the evolving dynamics of the environment. In the following, the problem of change detection in realizations of stochastic processes is roughly described (we follow the notation of Dasu [62]). Let $\{x_1, x_2, \cdots\}$ be a data stream describing some non-stationary process. For instance, the $x_t$ can be taken as vectors in $\mathbb{R}^n$ representing the observed consumption rates of a product inventory. The underlying probability distributions can be estimated using sliding windows, denoted as $W_{t_1:t_l} = (x_{t_1}, \ldots, x_{t_l})$. Then, for two empirical distributions, say, $F_{t_1:t_l}$ and $F_{t_1+h:t_l+h}$, estimated from two adjacent windows (realizations) $W_{t_1:t_l}$ and $W_{t_1+h:t_l+h}$, the distance $d_t(F_{t_1:t_l}, F_{t_1+h:t_l+h})$ can be computed and used to test whether a statistically significant change has occurred. The null hypothesis in this case is [62]

$$H_0 : F_{t_1+h:t_l+h} = F_{t_1:t_l}. \tag{2.3}$$

Note that, depending on the choice of $h$ (particularly, if $h < l$), some samples in $W_{t_1:t_l}$ will also appear in $W_{t_1+h:t_l+h}$. The distance function $d_t$ can be taken as any statistic that represents the discrepancy between two empirical distributions, such as the Kullback-Leibler divergence [62].

Besides change detection, some classes of change patterns can be diagnosed by analyzing data streams. For instance, coagulation and dissolution regions in the state space can be detected from the analysis of temporal and spatial profiles of the velocities with which the empirical distributions modeling the phenomena are changing. This can be done by estimating the gradient vector of the adjacent realizations for each spatial coordinate [3]. Specifically, non-parametric Gaussian kernel-based estimators are useful for this task because they are able to represent multimodality.
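The idea of tracking where an empirical density is growing or shrinking can be sketched in one dimension as follows. This is a deliberately simplified illustration and not necessarily the estimator of [3]: each window is smoothed with a Gaussian kernel density estimate, and the "velocity" at a location is taken to be the difference between the two densities divided by the window shift. The sign convention (positive for coagulation, negative for dissolution), the bandwidth, the synthetic windows, and all names are assumptions.

```python
import math
import random

def gaussian_kde(sample, bandwidth):
    """Return a function evaluating a Gaussian kernel density estimate."""
    norm = 1.0 / (len(sample) * bandwidth * math.sqrt(2.0 * math.pi))
    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
                          for s in sample)
    return density

def velocity_profile(old_window, new_window, grid, shift, bandwidth=0.5):
    """Finite-difference rate of change of the estimated density on a grid."""
    f_old = gaussian_kde(old_window, bandwidth)
    f_new = gaussian_kde(new_window, bandwidth)
    return [(f_new(x) - f_old(x)) / shift for x in grid]

# Synthetic adjacent windows: the mass moves from around 0 to around 3.
rng = random.Random(1)
old_window = [rng.gauss(0.0, 1.0) for _ in range(200)]
new_window = [rng.gauss(3.0, 1.0) for _ in range(200)]

grid = [-4.0 + 0.1 * i for i in range(111)]  # spatial axis from -4 to 7
shift = 200                                  # time steps between the windows
velocity = velocity_profile(old_window, new_window, grid, shift)

# Assumed convention: density gained marks coagulation, density lost dissolution.
print("velocity near x = 0:", velocity[40])
print("velocity near x = 3:", velocity[70])

# Crude global change rate: trapezoidal integral of |velocity| over the grid.
global_change_rate = sum(
    0.5 * (abs(velocity[i]) + abs(velocity[i + 1])) * (grid[i + 1] - grid[i])
    for i in range(len(grid) - 1)
)
print("global change rate:", global_change_rate)
```

Because a kernel estimate is a sum of bumps centred on the data, it can represent several modes at once, which a single parametric Gaussian fit to each window could not.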
By integrating the density of spatial velocity, one can estimate a global change rate that can be useful for measuring the degree of evolution of a stochastic process. With this approach, it is also possible to find a set of minimal evolving projections that can be valuable for multivariate trend analysis [3].
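Returning to the sliding-window test of Eq. (2.3), the procedure can be sketched in Python as follows. This is a simplified illustration and not the procedure of [62]: the distributions are one-dimensional histograms, the distance is the Kullback-Leibler divergence between them, and the decision uses a fixed, hypothetical threshold where a proper test would calibrate the critical value (for instance, by resampling). The synthetic stream, the function names, and all parameter values are assumptions.

```python
import math
import random

def histogram(window, lo, hi, n_bins=20, smoothing=0.5):
    """Smoothed empirical distribution of a window over fixed, shared bins."""
    width = (hi - lo) / n_bins
    counts = [smoothing] * n_bins  # additive smoothing keeps every bin positive
    for x in window:
        idx = math.floor((x - lo) / width)
        counts[min(max(idx, 0), n_bins - 1)] += 1  # clamp outliers to edge bins
    total = sum(counts)
    return [c / total for c in counts]

def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) of two discrete distributions."""
    return sum(p_i * math.log(p_i / q_i) for p_i, q_i in zip(p, q))

def detect_changes(stream, window_len=100, shift=100, threshold=0.5,
                   lo=-5.0, hi=8.0):
    """Compare each window with the one `shift` steps later.

    `window_len` plays the role of l and `shift` the role of h in the text.
    Returns the start index of every later window whose distance from its
    reference window exceeds `threshold`.
    """
    flagged = []
    for t1 in range(0, len(stream) - window_len - shift + 1, shift):
        reference = histogram(stream[t1:t1 + window_len], lo, hi)
        current = histogram(stream[t1 + shift:t1 + shift + window_len], lo, hi)
        if kl_divergence(reference, current) > threshold:
            flagged.append(t1 + shift)
    return flagged

# Synthetic stream whose mean jumps from 0 to 3 at index 300.
rng = random.Random(0)
stream = ([rng.gauss(0.0, 1.0) for _ in range(300)]
          + [rng.gauss(3.0, 1.0) for _ in range(300)])

changes = detect_changes(stream)
print("windows flagged as changed start at:", changes)
```

With `shift` equal to `window_len` the two windows are disjoint; choosing a smaller `shift` makes them share samples, as noted for the case $h < l$, at the cost of correlated test statistics.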