The common basic structure of surrogate model algorithms has already been described in Section 1.2.1. Recall that the algorithms are iterative. In the first step, an initial experimental design is generated, and the costly objective function is evaluated at these points. In the second step, the chosen response surface is fit to the given data, and in the third step the next sample site is chosen according to some strategy that is based on objective function
CHAPTER 3. SO-M-C AND SO-M-S 61
value predictions by the response surface. The costly objective function is evaluated at the chosen point, and in step four, the response surface parameters are updated with the new data. The algorithm iterates through steps three and four until a predefined stopping criterion has been reached. Surrogate model algorithms differ in general with respect to the method of generating the initial experimental design, the surrogate model, and the strategy for choosing the next sample site.
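The four steps can be illustrated with a minimal Python sketch. It is not one of the algorithms reviewed below: the inverse-distance-weighted surrogate, the uniform random initial design, the purely exploitative candidate search, and the fixed iteration budget used as stopping criterion are all simplifying assumptions, and the names `idw_surrogate` and `surrogate_optimize` are hypothetical.

```python
import math
import random


def idw_surrogate(points, values, power=2.0):
    """Return an inverse-distance-weighted interpolant (a stand-in surrogate)."""
    def predict(x):
        num, den = 0.0, 0.0
        for p, v in zip(points, values):
            d = math.dist(x, p)
            if d == 0.0:
                return v  # reproduce the data exactly at sampled sites
            w = 1.0 / d ** power
            num += w * v
            den += w
        return num / den
    return predict


def surrogate_optimize(f, bounds, n_init, n_iter, n_candidates=200, seed=0):
    """Minimize f over a box with a generic surrogate model loop."""
    rng = random.Random(seed)

    def sample():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    # Step 1: initial experimental design (assumption: uniform random points)
    # and costly evaluations at these points.
    points = [sample() for _ in range(n_init)]
    values = [f(p) for p in points]
    for _ in range(n_iter):  # assumption: fixed budget as stopping criterion
        # Steps 2 and 4: (re)fit the response surface to all data so far.
        predict = idw_surrogate(points, values)
        # Step 3: choose the candidate with the lowest predicted value ...
        candidates = [sample() for _ in range(n_candidates)]
        x_new = min(candidates, key=predict)
        # ... and evaluate the costly objective function there.
        points.append(x_new)
        values.append(f(x_new))
    best = min(range(len(values)), key=values.__getitem__)
    return points[best], values[best]


# Usage on a cheap stand-in for a costly objective function.
x_best, f_best = surrogate_optimize(
    lambda x: sum(v * v for v in x), [(-2.0, 2.0)] * 2, n_init=6, n_iter=15
)
```

The three ingredients in which surrogate model algorithms differ correspond to the three replaceable parts of this loop: the initial design, the surrogate passed to the fit step, and the rule that selects `x_new`.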
Two well-known and widely used surrogate model algorithms, namely EGO [74] and Gutmann’s radial basis function algorithm [57], are briefly reviewed in the following sections. Also, a recap of the mixture surrogate model algorithm SO-M [98] that uses Dempster-Shafer theory for computing the weights of the models in the mixture is provided.
3.2.1 Efficient Global Optimization (EGO)
The initial experimental design in the EGO algorithm [74] is created by generating a Latin hypercube design in $k$ dimensions, so that all one- and two-dimensional projections are nearly uniformly covered. The initial design contains, with slight deviations, $n_0 = 10k$ points.
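A basic Latin hypercube design can be sketched as follows. This is a plain random Latin hypercube on the unit cube and only guarantees that each one-dimensional projection has exactly one point per stratum; the additional space-filling property of the two-dimensional projections used in EGO is not implemented. The function name `latin_hypercube` is hypothetical.

```python
import random


def latin_hypercube(n_points, k, seed=0):
    """Return n_points samples in [0, 1)^k, one per stratum in each dimension."""
    rng = random.Random(seed)
    columns = []
    for _ in range(k):
        # One random location inside each of the n_points equal-width intervals ...
        column = [(i + rng.random()) / n_points for i in range(n_points)]
        # ... shuffled so that strata are paired at random across dimensions.
        rng.shuffle(column)
        columns.append(column)
    # Transpose the k columns into a list of k-dimensional points.
    return [list(row) for row in zip(*columns)]


k = 3
design = latin_hypercube(10 * k, k)  # n_0 = 10k initial points
```

The points are generated on the unit cube and would have to be rescaled to the variable bounds of the actual problem.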
A kriging model (see Section 1.2.5) is used as response surface [32, 92]. The advantage of the kriging model is that an uncertainty estimate is computed when making predictions. These uncertainty estimates are then used by the EGO algorithm for determining the next sample site. A disadvantage of the kriging model is the computation of the model parameters. The number of parameters depends on the problem dimension, and kriging is known to be plagued by the curse of dimensionality, not just in computer memory and time but also in ill-conditioning.
EGO determines the next sample point by maximizing the expected improvement
\[
E[I(x)] = E[\max(f_{\min} - Y, 0)], \tag{3.1}
\]
where $f_{\min}$ is the best function value found so far, and $Y$ is a random variable that models the uncertain function value at $x$. The closed-form solution can be shown to be
\[
E[I(x)] = (f_{\min} - s_k(x))\,\Phi\!\left(\frac{f_{\min} - s_k(x)}{\zeta}\right) + \zeta\,\phi\!\left(\frac{f_{\min} - s_k(x)}{\zeta}\right), \tag{3.2}
\]
for $\zeta > 0$, where $\phi(\cdot)$ and $\Phi(\cdot)$ are the standard normal density and distribution function, respectively, and $\zeta$ is the root mean squared error of
the prediction at the point $x \in \Omega$ obtained from the kriging model. The expected improvement function is, however, not unimodal, so a global optimization routine must be used to find its global maximum. The advantage is that the computationally expensive objective function is not involved when maximizing the expected improvement. Finding the global optimum of this auxiliary optimization problem nevertheless remains a challenging task. Jones et al. [74] solved the subproblem of maximizing the expected improvement with a branch and bound algorithm. The chosen stopping criterion for the EGO algorithm was a maximal expected improvement of less than 1%. EGO has been tested in [74] on problems of at most six dimensions, whereas an 11-dimensional problem has been solved with EGO in [64].
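The closed form of the expected improvement is cheap to evaluate once the kriging prediction and its error estimate are available. The following Python sketch assumes that these two quantities are supplied by the caller as `mean` and `zeta`; the function name and the handling of the case `zeta <= 0` (the limiting value of the formula as the uncertainty vanishes) are assumptions and not part of EGO as described in [74].

```python
import math


def expected_improvement(f_min, mean, zeta):
    """Closed-form expected improvement for a normal prediction.

    f_min: best function value found so far
    mean:  surrogate prediction at x (assumed given)
    zeta:  root mean squared error of the prediction at x (assumed given)
    """
    if zeta <= 0.0:
        # The closed form requires zeta > 0; this is its limit as zeta -> 0.
        return max(f_min - mean, 0.0)
    z = (f_min - mean) / zeta
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal distribution
    return (f_min - mean) * cdf + zeta * pdf


# A point predicted to be worse than f_min can still have positive
# expected improvement if its prediction is uncertain.
ei_uncertain = expected_improvement(f_min=0.0, mean=1.0, zeta=1.0)
ei_certain = expected_improvement(f_min=0.0, mean=1.0, zeta=0.0)
```

The first term rewards points whose prediction lies below the current best value, and the second term rewards points with large prediction uncertainty, which is why maximizing the expected improvement balances local and global search.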
3.2.2 Gutmann’s Radial Basis Function Algorithm
Gutmann’s algorithm [57] uses an RBF interpolant (see equation (1.9)). The next sample site is determined by minimizing a “bumpiness” measure
\[
g_n(x) = (-1)^{m_0+1}\,\mu_n(x)\,\left[f_n^* - s_b(x)\right]^2, \tag{3.3}
\]
where $m_0$ depends on the chosen radial basis function and is either $-1$, $0$, or $1$, $f_n^*$ is a target value for the objective function in iteration $n$, $s_b(\cdot)$ is the RBF interpolant as defined in equation (1.9), and $\mu_n(x)$ is the coefficient corresponding to $x$ of the Lagrangian function $L$ that satisfies $L(x_\iota) = 0$, $\iota = 1, \ldots, n$, and $L(x) = 1$ in iteration $n$. Gutmann’s algorithm has been applied to test problems of at most ten dimensions in [57], where the author remarked that optimizing the auxiliary function becomes computationally more complex as the number of iterations increases. Another disadvantage is that the algorithm may have difficulties finding very steep minima (as in the case of the Shekel test function [39], for example), and it sometimes converges slowly to the global minimum because it does not search locally [120].
3.2.3 SO-M: Mixture Surrogate Model Algorithm Based on Dempster-Shafer Theory
Recall that the mixture surrogate model algorithm SO-M [98] described in Section 2.2 uses $n_0 = k + 2$ initial sample points generated by a Latin hypercube design. Model characteristics such as correlation coefficients and various error measures are computed by leave-one-out cross-validation, and
represent how well each individual model fits the data. Dempster-Shafer theory (DST) uses this information to compute the weights $w_r$ of the models in the mixture in equation (2.2). The next sample site is chosen by solving an auxiliary optimization problem that uses a target value strategy similar to that proposed by Holmström et al. [64]. The auxiliary problem is optimized by multistart accelerated random search with some restrictions on the search space that force the algorithm to switch between local and global search phases. The mixture model is updated in every iteration, and thus the models contributing to the mixture may change. The algorithm has been used to solve problems of up to six dimensions.
The advantage of using a mixture model is that if it is a priori unknown which surrogate model performs best for a given problem, the influence of “bad” models can be restricted and the influence of “good” models can be emphasized. The disadvantage of the mixture model approach is that finding the next sample point by optimizing an auxiliary function becomes more complex as the number of variables increases, and finding very steep minima (as in the case of the Shekel function) may be difficult. Furthermore, the leave-one-out cross-validation becomes computationally expensive as the number of sample sites increases, and if a kriging model is involved, the same drawbacks as in the EGO algorithm are encountered.
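The cost of leave-one-out cross-validation can be seen from a minimal Python sketch: every model has to be refit once per sample site. The nearest-neighbor surrogate used here is only a hypothetical stand-in for the models in the mixture, and the two measures shown (root mean squared error and correlation coefficient) are examples of model characteristics; how SO-M combines such measures into weights by Dempster-Shafer theory is described in Section 2.2 and is not reproduced here.

```python
import math


def nearest_neighbor_fit(points, values):
    """Hypothetical stand-in surrogate: predict the value of the closest sample site."""
    def predict(x):
        i = min(range(len(points)), key=lambda j: math.dist(x, points[j]))
        return values[i]
    return predict


def loocv_predictions(fit, points, values):
    """Refit the model n times, each time predicting the single left-out site."""
    preds = []
    for i in range(len(points)):
        rest_points = points[:i] + points[i + 1:]
        rest_values = values[:i] + values[i + 1:]
        preds.append(fit(rest_points, rest_values)(points[i]))
    return preds


def rmse(y, y_hat):
    """Root mean squared error between true values and predictions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def correlation(y, y_hat):
    """Pearson correlation coefficient; 0.0 if either sequence is constant."""
    my, mh = sum(y) / len(y), sum(y_hat) / len(y_hat)
    cov = sum((a - my) * (b - mh) for a, b in zip(y, y_hat))
    var_y = sum((a - my) ** 2 for a in y)
    var_h = sum((b - mh) ** 2 for b in y_hat)
    if var_y == 0.0 or var_h == 0.0:
        return 0.0
    return cov / math.sqrt(var_y * var_h)


points = [[0.0], [1.0], [2.0], [3.0]]
values = [0.0, 1.0, 2.0, 3.0]
preds = loocv_predictions(nearest_neighbor_fit, points, values)
model_rmse = rmse(values, preds)
model_corr = correlation(values, preds)
```

With $n$ sample sites and several models in the mixture, each iteration requires $n$ refits per model, which explains why the cross-validation cost grows as the number of sample sites increases.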