
In this section, we consider unconstrained nonlinear programs of the form

$$\min f(x)$$

where $x = (x_1, \ldots, x_n)$ and $f$ is a nondifferentiable convex function. Optimality conditions based on the gradient are not available since the gradient is not defined in this case. However, the notion of gradient can be generalized as follows. A subgradient of $f$ at point $x^*$ is a vector $s^* = (s^*_1, \ldots, s^*_n)$ such that

$$(s^*)^T (x - x^*) \le f(x) - f(x^*) \quad \text{for every } x.$$

When the function f is differentiable, the subgradient is identical to the gradient. When f is not differentiable at point x, there are typically many subgradients at x. For example, consider the convex function of one variable

f (x) = max{1 − x, x − 1} = |x − 1|.

As is evident from Figure 5.6, this function is nondifferentiable at the point x = 1, and it is easy to verify that any number s such that −1 ≤ s ≤ 1 is a subgradient of f at point x = 1. Some of these subgradients and the linear approximations defined by them are shown in Figure 5.6. Note that each subgradient of the function at a point defines a linear “tangent” to the function that always stays on or below the plot of the function; this is the defining property of subgradients.
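The claim about the subgradients at x = 1 can be checked numerically. The following is a minimal sketch, assuming a finite grid of test points stands in for "every x"; the helper name `is_subgradient` and the grid are illustrative choices, not part of the text.

```python
def f(x):
    """The example function f(x) = |x - 1|."""
    return abs(x - 1.0)


def is_subgradient(s, x_star, test_points, tol=1e-12):
    """Check s * (x - x_star) <= f(x) - f(x_star) on a finite set of points.

    A finite grid can only refute the inequality, not prove it for every x;
    it is used here purely as an illustration.
    """
    return all(s * (x - x_star) <= f(x) - f(x_star) + tol for x in test_points)


# Hypothetical test grid covering the interval [-2, 4].
grid = [-2.0 + 0.01 * k for k in range(601)]

inside = is_subgradient(0.5, 1.0, grid)     # slope inside [-1, 1]
boundary = is_subgradient(-1.0, 1.0, grid)  # slope on the boundary of [-1, 1]
outside = is_subgradient(1.5, 1.0, grid)    # slope outside [-1, 1]
print(inside, boundary, outside)
```

Slopes inside the interval pass the check, while a slope such as 1.5 fails it at points to the right of x = 1, where the line rises faster than the function.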

Consider a nondifferentiable convex function f. The point x∗ is a minimum of f if and only if f has a zero subgradient at x∗. In the above example, 0 is a subgradient of f at point x∗ = 1 and therefore this is where the minimum of f is achieved.

[Figure 5.6: Subgradients provide under-estimating approximations to functions. Plot titled “A simple nonsmooth function and subgradients”: f(x) = |x − 1| against x, with the linear approximations labeled s = −2/3, s = 0, and s = 1/2.]

The method of steepest descent can be extended to nondifferentiable convex functions by computing any subgradient direction and using the opposite direction to make the next step. Although subgradient directions are not always directions of ascent, one can nevertheless guarantee convergence to the optimum point by choosing the step size appropriately.

A generic subgradient method can be stated as follows.

1. Initialization: Start from any point $x_0$. Set $i = 0$.

2. Iteration $i$: Compute a subgradient $s_i$ of $f$ at point $x_i$. If $s_i$ is 0 or close to 0, stop. Otherwise, let $x_{i+1} = x_i - \alpha_i s_i$, where $\alpha_i > 0$ denotes a step size, and perform the next iteration.

Several choices of the step size $\alpha_i$ have been proposed in the literature. To guarantee convergence to the optimum, the step size $\alpha_i$ needs to be decreased very slowly (for example, $\alpha_i \to 0$ such that $\sum_i \alpha_i = +\infty$ will do). But the slow decrease in $\alpha_i$ results in slow convergence of $x_i$ to the optimum. In practice, in order to get fast convergence, the following choice is popular: start from $\alpha_0 = 2$ and then halve the step size if no improvement in the objective value $f(x_i)$ is observed for $k$ consecutive iterations ($k = 7$ or 8 is often used). This choice is well suited when one wants to get close to the optimum quickly and when finding the exact optimum is not important (this is the case in integer programming applications where subgradient optimization is used to obtain quick bounds in branch-and-bound algorithms). With this in mind, a stopping criterion that is frequently used in practice is a maximum number of iterations (say 200) instead of “$s_i$ is 0 or close to 0”.
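The generic method with the step-halving rule can be sketched as follows for the one-variable example f(x) = |x − 1|. This is only an illustration: the starting point 4.3, the choice of subgradient at the kink, and the names `subgradient_method`, `patience`, and `max_iter` are assumptions, not prescriptions from the text.

```python
def f(x):
    """The example function f(x) = |x - 1|."""
    return abs(x - 1.0)


def subgradient(x):
    """Return one subgradient of f at x (0 is chosen at the kink x = 1)."""
    if x > 1.0:
        return 1.0
    if x < 1.0:
        return -1.0
    return 0.0


def subgradient_method(x0, alpha0=2.0, patience=7, max_iter=200, tol=1e-12):
    """Subgradient method that halves the step after `patience` stalled iterations."""
    x, alpha = x0, alpha0
    best_x, best_f = x0, f(x0)
    stall = 0  # consecutive iterations without improvement of the best value
    for _ in range(max_iter):
        s = subgradient(x)
        if abs(s) <= tol:
            break  # zero subgradient: x is a minimizer
        x = x - alpha * s
        fx = f(x)
        if fx < best_f:
            best_x, best_f = x, fx
            stall = 0
        else:
            stall += 1
            if stall >= patience:
                alpha *= 0.5
                stall = 0
    return best_x, best_f


best_x, best_f = subgradient_method(4.3)
print(best_x, best_f)
```

Because a step along the negative subgradient need not decrease f, the sketch keeps track of the best point seen so far rather than returning the last iterate.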

We will see in Chapter 12 how subgradient optimization is used in a model to construct an index fund.

NLP Models: Volatility Estimation

Volatility is a term used to describe how much security prices, market indices, interest rates, etc. move up and down around their mean. It is measured by the standard deviation of the random variable that represents the financial quantity we are interested in. Most investors prefer low volatility to high volatility and therefore expect to be rewarded with higher long-term returns for holding higher volatility securities.

Many financial computations require volatility estimates. Mean-variance optimization trades off the expected return and volatility of a portfolio of securities. Celebrated option valuation formulas of Black, Scholes, and Merton (BSM) involve the volatility of the underlying security. Risk management revolves around the volatility of the current positions. Therefore, accurate estimation of the volatilities of security returns, interest rates, exchange rates and other financial quantities is crucial to many quantitative techniques in financial analysis and management.

Most volatility estimation techniques can be classified as either a historical or an implied method. One either uses historical time series to infer patterns and estimates the volatility using a statistical technique, or considers the known prices of related securities such as options that may reveal the market sentiment on the volatility of the security in question. GARCH models exemplify the first approach while the implied volatilities calculated from the BSM formulas are the best known examples of the second approach. Both types of techniques can benefit from the use of optimization formulations to obtain more accurate volatility estimates with desirable characteristics such as smoothness. We discuss two examples in the remainder of this chapter.

6.1 Volatility Estimation with GARCH Models

Empirical studies analyzing time series data for returns of securities, interest rates, and exchange rates often reveal a clustering behavior for the volatility of the process under consideration. Namely, these time series exhibit high volatility periods alternating with low volatility periods. These observations suggest that future volatility can be estimated with some degree of confidence by relying on historical data.

Currently, describing the evolution of such processes by imposing a stationary model on the conditional distribution of returns is one of the most popular approaches in the econometric modeling of financial time series. This approach expresses the conventional wisdom that models for financial returns should adequately represent the nonlinear dynamics that are demonstrated by the sample autocorrelation and cross-correlation functions of these time series. ARCH (autoregressive conditional heteroscedasticity) and GARCH (generalized ARCH) models of Engle [25] and Bollerslev [15] have been popular and successful tools for future volatility estimation. For the multivariate case, rich classes of stationary models that generalize the univariate GARCH models have also been developed; see, for example, the comprehensive survey by Bollerslev et al. [16].

The main mathematical problem to be solved in fitting ARCH and GARCH models to observed data is the determination of the model parameters that maximize a likelihood function, i.e., an optimization problem. Typically, these models are presented as unconstrained optimization problems with recursive terms. In a recent study, Altay-Salih et al. [1] argue that because of the recursion equations and the stationarity constraints, these models actually fall into the domain of nonconvex, nonlinearly constrained nonlinear programming. This study shows that, by using a sophisticated nonlinear optimization package (the sequential quadratic programming based FILTER method of Fletcher and Leyffer [28] in their case), they are able to significantly improve the log-likelihood functions for multivariate volatility (and correlation) estimation. While this study does not provide a comparison of the forecasting effectiveness of the standard approaches to that of the constrained optimization approach, the numerical results suggest that the constrained optimization approach provides a better prediction of the extremal behavior of the time series data; see [1]. Here, we briefly review this constrained optimization approach for expository purposes.

We consider a stochastic process $Y$ indexed by natural numbers. $Y_t$, its value at time $t$, is an $n$-dimensional vector of random variables. Autoregressive behavior of these random variables is modeled as:

$$Y_t = \sum_{i=1}^{m} \phi_i Y_{t-i} + \varepsilon_t \qquad (6.1)$$

where $m$ is a positive integer representing the number of periods we look back in our model and $\varepsilon_t$ satisfies

$$E[\varepsilon_t \mid \varepsilon_1, \ldots, \varepsilon_{t-1}] = 0.$$

While these models are of limited value, if at all, in the estimation of the actual time series $(Y_t)$, they have been shown to provide useful information for volatility estimation. For this purpose, GARCH models define

$$h_t := E[\varepsilon_t^2 \mid \varepsilon_1, \ldots, \varepsilon_{t-1}]$$

in the univariate case and

$$H_t := E[\varepsilon_t \varepsilon_t^T \mid \varepsilon_1, \ldots, \varepsilon_{t-1}]$$

in the multivariate case. Then one models the conditional time dependence of these squared residuals in the univariate case as follows:

$$h_t = c + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2 + \sum_{j=1}^{p} \beta_j h_{t-j}. \qquad (6.2)$$

This model is called GARCH(p, q). Note that ARCH models correspond to choosing p = 0.
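To make the recursion (6.2) concrete, the sketch below computes the conditional variances for given residuals and parameters. The text does not say how to treat terms that reach back before the first observation; the single placeholder `h_init` used for those terms is an assumption, as are the function name `garch_variances` and the sample numbers.

```python
def garch_variances(eps, c, alpha, beta, h_init):
    """Conditional variances h_1, ..., h_T from the GARCH(p, q) recursion (6.2).

    eps    : residuals eps_1, ..., eps_T
    alpha  : [alpha_1, ..., alpha_q]
    beta   : [beta_1, ..., beta_p] (an empty list gives an ARCH model)
    h_init : value substituted for eps_t^2 and h_t before the sample starts
             (an assumption; the text does not fix the initialization)
    """
    q, p = len(alpha), len(beta)
    h = []
    for t in range(len(eps)):
        value = c
        for i in range(1, q + 1):
            squared = eps[t - i] ** 2 if t - i >= 0 else h_init
            value += alpha[i - 1] * squared
        for j in range(1, p + 1):
            previous = h[t - j] if t - j >= 0 else h_init
            value += beta[j - 1] * previous
        h.append(value)
    return h


# Hypothetical GARCH(1, 1) example with three residuals.
h = garch_variances([0.1, -0.2, 0.05], c=0.01, alpha=[0.1], beta=[0.8], h_init=0.04)
print(h)
```

Passing `beta=[]` skips the second sum entirely, which reproduces the ARCH special case p = 0.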

The generalization of the model (6.2) to the multivariate case can be done in a number of ways. One approach is to use the operator vech to turn the matrices $H_t$ and $\varepsilon_t \varepsilon_t^T$ into vectors. The operator vech takes an $n \times n$ symmetric matrix as an input and produces an $\frac{n(n+1)}{2}$-dimensional vector as output by stacking the elements of the matrix on and below the diagonal on top of each other. Using this operator, one can write a multivariate generalization of (6.2) as follows:

$$\mathrm{vech}(H_t) = \mathrm{vech}(C) + \sum_{i=1}^{q} A_i \, \mathrm{vech}(\varepsilon_{t-i} \varepsilon_{t-i}^T) + \sum_{j=1}^{p} B_j \, \mathrm{vech}(H_{t-j}). \qquad (6.3)$$

In (6.3), the $A_i$'s and $B_j$'s are square matrices of dimension $\frac{n(n+1)}{2}$ and $C$ is an $n \times n$ symmetric matrix.
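The vech operator is easy to state in code. The following sketch assumes the common convention of stacking the lower-triangular part column by column; the text only says the entries are stacked "on top of each other", so the exact ordering is an assumption.

```python
def vech(matrix):
    """Stack the entries on and below the diagonal, column by column.

    `matrix` is a symmetric n x n matrix given as a list of rows; the result
    has n * (n + 1) / 2 entries.
    """
    n = len(matrix)
    return [matrix[i][j] for j in range(n) for i in range(j, n)]


# Hypothetical 2 x 2 conditional covariance matrix.
H = [[4.0, 1.0],
     [1.0, 9.0]]
print(vech(H))
```

For n = 2 the output has three entries, so each $A_i$ and $B_j$ in (6.3) is a 3 × 3 matrix.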

After choosing a superstructure for the GARCH model, that is, after choosing $p$ and $q$, the objective is to determine the optimal parameters $\phi_i$, $\alpha_i$, and $\beta_j$. Most often, this is achieved via maximum likelihood estimation. If one assumes a normal distribution for $Y_t$ conditional on the historical observations, the log-likelihood function can be written as follows [1]:

$$-\frac{T}{2} \log 2\pi - \frac{1}{2} \sum_{t=1}^{T} \log h_t - \frac{1}{2} \sum_{t=1}^{T} \frac{\varepsilon_t^2}{h_t} \qquad (6.4)$$

in the univariate case and

$$-\frac{T}{2} \log 2\pi - \frac{1}{2} \sum_{t=1}^{T} \log \det H_t - \frac{1}{2} \sum_{t=1}^{T} \varepsilon_t^T H_t^{-1} \varepsilon_t \qquad (6.5)$$

in the multivariate case.
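As an illustration, the univariate log-likelihood (6.4) can be evaluated directly once the residuals and conditional variances are available. The function name `garch_loglik` and the sample numbers below are hypothetical.

```python
import math


def garch_loglik(eps, h):
    """Evaluate the univariate Gaussian log-likelihood (6.4).

    eps : residuals eps_1, ..., eps_T
    h   : conditional variances h_1, ..., h_T (all strictly positive)
    """
    T = len(eps)
    return (-0.5 * T * math.log(2.0 * math.pi)
            - 0.5 * sum(math.log(ht) for ht in h)
            - 0.5 * sum(e * e / ht for e, ht in zip(eps, h)))


# Hypothetical residuals and conditional variances.
value = garch_loglik([0.1, -0.2, 0.05], [0.046, 0.0478, 0.05224])
print(value)
```

The expression is the sum over t of the log-density of a normal distribution with mean 0 and variance $h_t$, evaluated at $\varepsilon_t$.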

Exercise 6.1 Show that the function in (6.4) is a difference of convex functions by showing that $\log h_t$ is concave and $\frac{\varepsilon_t^2}{h_t}$ is convex in $\varepsilon_t$ and $h_t$. Does the same conclusion hold for the function in (6.5)?

Now, the optimization problem to solve in the univariate case is to maximize the log-likelihood function (6.4) subject to the model constraints (6.1) and (6.2) as well as the condition that $h_t$ is nonnegative for all $t$ since $h_t = E[\varepsilon_t^2 \mid \varepsilon_1, \ldots, \varepsilon_{t-1}]$. In the multivariate case we maximize (6.5) subject to the model constraints (6.1) and (6.3) as well as the condition that $H_t$ is a positive semidefinite matrix for all $t$ since $H_t$, defined as $E[\varepsilon_t \varepsilon_t^T \mid \varepsilon_1, \ldots, \varepsilon_{t-1}]$, must necessarily satisfy this condition. The positive semidefiniteness of the matrices $H_t$ can be enforced either using the techniques discussed in Chapter 9 or using a reparametrization of the variables via a Cholesky-type $LDL^T$ decomposition as discussed in [1].

An important issue in GARCH parameter estimation is the stationarity properties of the resulting model. There is a continuing debate about whether it is reasonable to assume that the model parameters for financial time series are stationary over time. It is, however, clear that estimation and forecasting are easier with stationary models. A sufficient condition for the stationarity of the univariate GARCH model above is that the $\alpha_i$'s and $\beta_j$'s as well as the scalar $c$ are strictly positive and that

$$\sum_{i=1}^{q} \alpha_i + \sum_{j=1}^{p} \beta_j < 1, \qquad (6.6)$$

see, for example, [33]. The sufficient condition for the multivariate case is more involved and we refer the reader to [1] for these details.
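The following toy sketch shows how the univariate estimation problem fits together for GARCH(1, 1): maximize (6.4) over (c, α₁, β₁) subject to strict positivity and condition (6.6). It uses a coarse grid search in place of a real NLP solver, treats synthetic numbers as the residuals $\varepsilon_t$ (so the autoregressive part (6.1) is skipped), and initializes $h_1$ with the sample variance. All of these are simplifying assumptions for illustration and not the method of [1].

```python
import math
import random


def loglik(eps, c, a, b):
    """Log-likelihood (6.4) for GARCH(1, 1) with h_1 set to the sample variance."""
    h = sum(e * e for e in eps) / len(eps)  # assumed starting value for h_1
    total = -0.5 * len(eps) * math.log(2.0 * math.pi)
    for t, e in enumerate(eps):
        if t > 0:
            h = c + a * eps[t - 1] ** 2 + b * h  # recursion (6.2) with p = q = 1
        total -= 0.5 * (math.log(h) + e * e / h)
    return total


def is_feasible(c, a, b):
    """Strict positivity of the parameters plus the stationarity condition (6.6)."""
    return c > 0 and a > 0 and b > 0 and a + b < 1


def grid_search(eps, c_grid, ab_grid):
    """Return the feasible grid point with the largest log-likelihood."""
    best_val, best_par = -math.inf, None
    for c in c_grid:
        for a in ab_grid:
            for b in ab_grid:
                if not is_feasible(c, a, b):
                    continue
                val = loglik(eps, c, a, b)
                if val > best_val:
                    best_val, best_par = val, (c, a, b)
    return best_par, best_val


rng = random.Random(0)
eps = [rng.gauss(0.0, 1.0) for _ in range(300)]  # synthetic residuals, not market data
c_grid = [0.1 * k for k in range(1, 11)]
ab_grid = [0.1 * k for k in range(1, 10)]
params, value = grid_search(eps, c_grid, ab_grid)
print(params, value)
```

With strictly positive parameters, every $h_t$ produced by the recursion is positive, so the nonnegativity condition on $h_t$ holds automatically in this univariate sketch. A grid search scales poorly with the number of parameters, which is one reason the multivariate problem calls for a proper constrained solver.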

Especially in the multivariate case, the problem of maximizing the log-likelihood function with respect to the model constraints is a difficult nonlinear, nonconvex optimization problem. To find a quick solution, more tractable versions of the model (6.3) have been developed where the model is simplified by imposing additional structure on the matrices $A_i$ and $B_j$ such as diagonality. While the resulting problems are easier to solve, the loss of generality from their simplifying assumptions can be costly. As Altay-Salih et al. demonstrate, using the full power of state-of-the-art constrained optimization software, one can solve the more general model in reasonable computational time (at least for bivariate and trivariate estimation problems) with much improved log-likelihood values. While the forecasting efficiency of this approach is still to be tested, it is clear that sophisticated nonlinear optimization is emerging as a valuable tool in volatility estimation problems that use historical data.

Exercise 6.2 Consider the model in (6.3) for the bivariate case when $q = 1$ and $p = 0$ (i.e., an ARCH(1) model). Explicitly construct the nonlinear programming problem to be solved in this case. The comparable simplification of the BEKK representation [3] gives

$$H_t = C^T C + A^T \varepsilon_{t-1} \varepsilon_{t-1}^T A.$$

Compare these two models and comment on the additional degrees of freedom in the NLP model. Note that the BEKK representation ensures the positive semidefiniteness of $H_t$ by construction at the expense of lost degrees of freedom.
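As a numerical aid for the BEKK formula above (not a solution to the exercise), the sketch below evaluates $H_t$ for made-up 2 × 2 matrices and checks that the result is symmetric with nonnegative diagonal and determinant, as positive semidefiniteness requires in the 2 × 2 case. The function name `bekk_arch1` and all numbers are hypothetical.

```python
def bekk_arch1(C, A, eps_prev):
    """Evaluate H_t = C^T C + A^T eps eps^T A for square C, A and a vector eps."""
    n = len(eps_prev)
    # v = A^T eps, so that A^T eps eps^T A = v v^T.
    v = [sum(A[k][i] * eps_prev[k] for k in range(n)) for i in range(n)]
    return [[sum(C[k][i] * C[k][j] for k in range(n)) + v[i] * v[j]
             for j in range(n)]
            for i in range(n)]


# Hypothetical parameter matrices and previous residual vector.
C = [[1.0, 0.5],
     [0.0, 2.0]]
A = [[0.3, 0.1],
     [-0.2, 0.4]]
eps_prev = [1.0, -2.0]

H = bekk_arch1(C, A, eps_prev)
det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
print(H, det)
```

Both terms of the formula have the form $M^T M$, which is why the sum is positive semidefinite for any choice of $C$ and $A$.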

Exercise 6.3 Test the NLP model against the model resulting from the BEKK representation in the previous exercise using daily return data for two market indices, e.g., S&P 500 and FTSE 100, and an NLP solver. Compare the optimal log-likelihood values achieved by both models and comment.