underlying dynamics of the volatility. In order to obtain the best estimate of the realized volatility, we must estimate the parameter that characterizes these dynamics. Two possible approaches to obtain the optimal value of these estimators are:
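• solving the least-squares problem, which consists in minimizing the following objective function:

$$\sum_{i=1}^{n} \left( R^2_{t_i+T} - T\,\hat\sigma^2_{t_i} \right)^2$$

• or solving the maximum-likelihood problem, which consists in maximizing the log-likelihood objective function:

$$-\frac{n}{2}\ln 2\pi - \sum_{i=0}^{n} \frac{1}{2}\ln\left(T\,\hat\sigma^2_{t_i}\right) - \sum_{i=0}^{n} \frac{R^2_{t_i+T}}{2T\,\hat\sigma^2_{t_i}}$$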
We remark here that the moving-average estimator depends only on the averaging window, whereas the IGARCH estimator depends only on the parameter β. In general, there is no way to compare these two estimators unless we specify a dynamics for the volatility. With such a specification, the optimal values of both parameters are determined by the optimal value of ξ, which offers a direct comparison of the quality of the two estimators.
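The two objective functions above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the argument layout (an array of squared returns aligned with an array of variance estimates) and the default horizon T = 1 are hypothetical choices, not part of the original study.

```python
import numpy as np

def least_squares_objective(sq_returns, var_estimates, horizon=1.0):
    """Sum over i of (R^2_{t_i+T} - T * sigma_hat^2_{t_i})^2 (to be minimized)."""
    sq_returns = np.asarray(sq_returns, dtype=float)
    var_estimates = np.asarray(var_estimates, dtype=float)
    residuals = sq_returns - horizon * var_estimates
    return float(np.sum(residuals ** 2))

def log_likelihood(sq_returns, var_estimates, horizon=1.0):
    """Gaussian log-likelihood of the returns given the variance estimates (to be maximized)."""
    sq_returns = np.asarray(sq_returns, dtype=float)
    scaled_var = horizon * np.asarray(var_estimates, dtype=float)
    n = sq_returns.size
    return float(
        -0.5 * n * np.log(2.0 * np.pi)
        - 0.5 * np.sum(np.log(scaled_var))
        - 0.5 * np.sum(sq_returns / scaled_var)
    )
```

Both functions take the same inputs, so the same series of variance estimates can be scored under either criterion.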
Example of realized volatility
We illustrate here how the realized volatility is computed by the two methods discussed above. In order to illustrate how the optimal value of the averaging window n_T or β⋆ is calibrated, we plot the likelihood functions of these two estimators for one value of the volatility at a given date. In Figure 2.20, we present the logarithm of the likelihood function for different values of ξ. The maximum of the function l(ξ) gives the optimal value ξ⋆, which is used to evaluate the volatility for the two methods. We remark that the IGARCH estimator is better suited to locating the global maximum because its log-likelihood is a concave function. For the moving-average method, the log-likelihood function is not smooth and has a complicated structure with local maxima, which makes the optimization procedure less efficient.
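The calibration by likelihood maximization can be sketched as a simple grid scan. The sketch below makes several assumptions that are not taken from the text: the IGARCH estimator is written as the usual exponentially weighted recursion on squared daily returns, the horizon is one day, the recursion is initialized with the sample mean of squared returns, and the data are simulated. The function names are hypothetical.

```python
import numpy as np

def ewma_variance(returns, beta, init_var=None):
    """One-step-ahead variance: v[t] = beta * v[t-1] + (1 - beta) * r[t-1]^2."""
    returns = np.asarray(returns, dtype=float)
    var = np.empty_like(returns)
    # Assumption: start the recursion from the sample mean of squared returns.
    var[0] = np.mean(returns ** 2) if init_var is None else init_var
    for t in range(1, returns.size):
        var[t] = beta * var[t - 1] + (1.0 - beta) * returns[t - 1] ** 2
    return var

def log_likelihood(returns, var):
    """Gaussian log-likelihood of the returns given predicted variances."""
    n = returns.size
    return (-0.5 * n * np.log(2.0 * np.pi)
            - 0.5 * np.sum(np.log(var))
            - 0.5 * np.sum(returns ** 2 / var))

def best_beta(returns, betas):
    """Scan a grid of beta values and return the maximizer with all scores."""
    returns = np.asarray(returns, dtype=float)
    scores = np.array([log_likelihood(returns, ewma_variance(returns, b))
                       for b in betas])
    return float(betas[int(np.argmax(scores))]), scores

# Simulated daily returns (illustrative only, not market data).
rng = np.random.default_rng(0)
returns = 0.01 * rng.standard_normal(500)
betas = np.linspace(0.70, 0.99, 30)
beta_star, scores = best_beta(returns, betas)
```

A grid scan is used here rather than a gradient-based optimizer because it also works for objective functions that are not smooth, such as the moving-average likelihood described above.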
Figure 2.20: Comparison between IGARCH estimator and CC estimator
[Plot: log-likelihood l(ξ) versus ξ; series: CC optimal, IGARCH]
We now test the implementation of IGARCH estimators for various high-low estimators. Since we have shown that the IGARCH estimator is equivalent to an exponential moving average, the implementation for high-low estimators can be set up in the same way as for the close-to-close estimator. In order to determine the optimal parameter β⋆, we perform an optimization of the log-likelihood function. In Figure 2.21, we compare the log-likelihood functions of the different estimators as a function of the parameter β. The optimal parameter β⋆ of each estimator corresponds to the maximum of its log-likelihood function. In order to have a clear idea of the size of the moving-average window corresponding to the optimal parameter β⋆, we use formula (2.7) to carry out the conversion. The result is reported in Figure 2.22 below.
Figure 2.21: Likelihood function of high-low estimators versus filtered parameter β
[Plot: log-likelihood l(β) versus β; series: CC, OC, P, GK, RS, YZ]
Backtest on the voltarget strategy
We take historical data of the S&P 500 index over the period from 01/2001 to 12/2011, and the averaging window of the close-to-close estimator is chosen as n = 25. In Figure 2.23, we show the different estimates of the realized volatility.
In order to test the efficiency of these realized estimators (moving-average and IGARCH), we first evaluate the likelihood function for the close-to-close estimator and the realized estimators, then apply these estimators to the voltarget strategy as in the last section. In Figure 2.25, we present the value of the likelihood function over the period from 01/2001 to 12/2010 for three estimators: CC, CC optimal (moving-average) and IGARCH. The estimator with the highest value of the likelihood function is the one that gives the best prediction of the volatility.
Figure 2.22: Likelihood function of high-low estimators versus effective moving window
[Plot: log-likelihood l(n) versus n; series: CC, OC, P, GK, RS, YZ]
Figure 2.23: IGARCH estimator versus moving-average estimator for close-to-close prices
[Plot: σ (%) versus date, 01/2001 to 01/2011; series: CC, CC optimal, IGARCH]
Figure 2.24: Comparison between different IGARCH estimators for high-low prices
[Plot: σ (%) versus date, 01/2001 to 01/2011; series: CC, CO, P, GK, RS, YZ]
Figure 2.25: Daily estimation of the likelihood function for various close-to-close estimators
[Plot: l(σ̂) versus date, 01/2001 to 01/2011; series: CC, CC optimal, CC IGARCH]
Figure 2.26: Daily estimation of the likelihood function for various high-low estimators
[Plot: l(σ̂) versus date, 01/2001 to 01/2011; series: CC, OC, P, GK, RS, YZ]
In Figure 2.27, we report the backtest of the voltarget strategy for the three estimators considered. The estimators with a dynamically chosen averaging parameter always give better results than a simple close-to-close estimator with a fixed averaging window n = 25. We next backtest the IGARCH estimator applied to high-low price data; the comparison with IGARCH applied to close-to-close data is shown in Figure 2.28. We observe that the IGARCH estimator for close-to-close prices is one of the estimators that produce the best backtest.
2.4 High-frequency volatility estimators
We have discussed in the previous sections how to measure the daily volatility based on the range of the observed prices. If more information is available in the trading data, such as all real-time quotes, can one estimate the volatility more accurately? As the trading frequency increases, we expect the precision of the estimator to improve as well. However, when the trading frequency reaches a certain limit, a new phenomenon arising from the non-equilibrium of the market emerges and spoils the precision. This limit defines the optimal frequency for the classical estimator. In the literature, it is more or less agreed to be a frequency of one trade every 5 minutes. This phenomenon is called microstructure noise and is characterized by the bid-ask spread or the transaction effect. In this section, we summarize and test some recent proposals that attempt to eliminate the microstructure noise.
Figure 2.27: Backtest for close-to-close estimator and realized estimators
[Plot: strategy value versus date, 01/2001 to 01/2011; series: S&P 500, CC, CC optimal, CC IGARCH]
Figure 2.28: Backtest for IGARCH high-low estimators compared to IGARCH close-to-close estimator
[Plot: strategy value versus date, 01/2001 to 01/2011; series: S&P 500, CC, OC, P, GK, RS, YZ]
2.4.1 Microstructure effect
It has been demonstrated in the financial literature that the realized return estimator is not robust when the sampling frequency is too high. There are two possible explanations of this effect. From the probabilistic point of view, this phenomenon comes from the fact that the cumulated return (or the logarithm of the price) is not a semimartingale, as we assumed in the last section. However, it emerges only at short time scales, when the trading frequency is high enough. From the financial point of view, this effect is explained by the existence of so-called market microstructure noise, which comes from the bid-ask spread. We now discuss the simplest model, which includes the microstructure noise as a noise independent of the underlying Brownian motion. We assume that the true cumulated return is an unobservable process and follows a Brownian motion:
$$dX_t = \left(\mu_t - \frac{\sigma_t^2}{2}\right)dt + \sigma_t\,dB_t$$
The observed signal $Y_t$ is the cumulated return perturbed by the microstructure noise $\varepsilon_t$:

$$Y_t = X_t + \varepsilon_t$$

For the sake of simplicity, we use the following assumptions:
(i) $\varepsilon_{t_i}$ is iid with $E[\varepsilon_{t_i}] = 0$ and $E[\varepsilon_{t_i}^2] = E[\varepsilon^2]$
(ii) $\varepsilon_t \perp\!\!\!\perp B_t$

From these assumptions, we see immediately that the volatility estimator based on the historical data $Y_{t_i}$ is biased:

$$\mathrm{var}(Y) = \mathrm{var}(X) + E[\varepsilon^2]$$

Since the first term $\mathrm{var}(X)$ scales as $t$ (the estimation horizon) while $E[\varepsilon^2]$ is constant, this estimator can be considered unbiased if the time horizon is large enough ($t > E[\varepsilon^2]/\sigma^2$). At high frequency, the second term is not negligible, and a better estimator must be able to eliminate it.
2.4.2 Two time-scale volatility estimator
The use of different time scales to extract the true volatility of the hidden price process (without noise) was proposed independently by Zhang et al. (2005) and Bandi et al. (2004). In this paragraph, we follow the approach of the first reference to define the intraday volatility estimator. We prefer to discuss the main idea of this method and its practical implementation rather than all the details of the stochastic calculus concerning the expectation and the variance of the realized return.
Definitions and notations
In order to fix the notation, let us consider a time period [0, T] which is divided into M − 1 intervals (M can be understood as the frequency). The quadratic variation of the Brownian motion over this period is denoted:
$$\langle X, X\rangle_T = \int_0^T \sigma_t^2\,dt$$
For the discretized version of the quadratic variation, we employ the [., .] notation:
$$[X, X]_T = \sum_{t_i,\,t_{i+1}\in[0,T]} \left(X_{t_{i+1}} - X_{t_i}\right)^2$$
Then the usual estimator of the realized return over the interval [0, T] is given by:

$$[Y, Y]_T = \sum_{t_i,\,t_{i+1}\in[0,T]} \left(Y_{t_{i+1}} - Y_{t_i}\right)^2$$
We remark that the number of points in the interval [0, T] can be changed. In fact, the expectation of the quadratic variation should not depend on the distribution of points in this interval. Let us define the set of points in one period as a grid G:

$$G = \{t_0, \dots, t_M\}$$

Then a subgrid H is defined as:

$$H = \{t_{k_1}, \dots, t_{k_m}\}$$

where $(t_{k_j})$ with $j = 1, \dots, m$ is a subsequence of $(t_i)$ with $i = 1, \dots, M$. The number of increments is denoted as:

$$|H| = \mathrm{card}(H) - 1$$

With this notation, the quadratic variation over a subgrid H reads:

$$[Y, Y]^{H}_T = \sum_{t_{k_i},\,t_{k_{i+1}}\in H} \left(Y_{t_{k_{i+1}}} - Y_{t_{k_i}}\right)^2$$
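The discretized quadratic variation over a grid or a subgrid can be computed as follows. This is a minimal sketch: the function name and the convention of passing a subgrid as a list of integer indices into the full grid are assumptions made for illustration.

```python
import numpy as np

def realized_variance(log_prices, grid_indices=None):
    """Sum of squared increments of the observed process over a (sub)grid.

    log_prices   : values Y_{t_0}, ..., Y_{t_M} on the full grid
    grid_indices : optional increasing indices selecting a subgrid H
    """
    y = np.asarray(log_prices, dtype=float)
    if grid_indices is not None:
        y = y[np.asarray(grid_indices, dtype=int)]
    return float(np.sum(np.diff(y) ** 2))

y = np.array([0.0, 1.0, 3.0, 2.0, 5.0])
full = realized_variance(y)            # 1 + 4 + 1 + 9 = 15
sub = realized_variance(y, [0, 2, 4])  # (3 - 0)^2 + (5 - 3)^2 = 13
```

The two values differ because the subgrid skips intermediate moves; with noisy data, this is precisely what makes the choice of grid matter.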
The realized volatility estimator over the full grid
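If we compute the quadratic variation over the full grid G, that is, at the highest frequency, it is not surprising, as discussed above, that it suffers most from the effect of the microstructure noise:

$$[Y, Y]^{G}_T = [X, X]^{G}_T + 2\,[X, \varepsilon]^{G}_T + [\varepsilon, \varepsilon]^{G}_T$$

Under the hypotheses on the microstructure noise, the conditional expectation of this estimator is equal to:

$$E\left[[Y, Y]^{G}_T \mid X\right] = [X, X]^{G}_T + 2M\,E[\varepsilon^2]$$

and the variance of the estimator is:

$$\mathrm{var}\left([Y, Y]^{G}_T \mid X\right) = 4M\,E[\varepsilon^4] + \left(8\,[X, X]^{G}_T\,E[\varepsilon^2] - 2\,\mathrm{var}(\varepsilon^2)\right) + O(M^{-1/2})$$

In the two expressions above, the terms are arranged order by order. In the limit M → ∞, we obtain the usual central limit theorem result:

$$M^{-1/2}\left([Y, Y]^{G}_T - 2M\,E[\varepsilon^2]\right) \xrightarrow{\mathcal{L}} 2\left(E[\varepsilon^4]\right)^{1/2} N(0, 1)$$

Hence, as M increases, $[Y, Y]^{G}_T$ becomes a good estimator of the microstructure noise and we denote:

$$\widehat{E[\varepsilon^2]} = \frac{1}{2M}\,[Y, Y]^{G}_T$$

The central limit theorem for this estimator states:

$$M^{1/2}\left(\widehat{E[\varepsilon^2]} - E[\varepsilon^2]\right) \xrightarrow{\mathcal{L}} \left(E[\varepsilon^4]\right)^{1/2} N(0, 1) \quad \text{as } M \to \infty$$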
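The noise-variance estimator can be checked on simulated data. The sketch below assumes a constant-volatility Brownian path with Gaussian iid noise; the numerical values (integrated variance, noise variance, number of increments) are arbitrary illustrative choices, and M is taken as the number of increments of the full grid.

```python
import numpy as np

def noise_variance_estimate(observed_log_prices):
    """Estimate E[eps^2] as [Y, Y]^G_T / (2 M), with M the number of increments."""
    y = np.asarray(observed_log_prices, dtype=float)
    n_increments = y.size - 1
    return float(np.sum(np.diff(y) ** 2) / (2.0 * n_increments))

# Simulated example (illustrative parameters, not market data).
rng = np.random.default_rng(1)
M = 20000            # number of increments on the full grid
sigma2_T = 1e-4      # integrated variance of the hidden process over [0, T]
noise_var = 1e-6     # variance E[eps^2] of the microstructure noise

steps = np.sqrt(sigma2_T / M) * rng.standard_normal(M)
x = np.concatenate(([0.0], np.cumsum(steps)))            # hidden process X
y = x + np.sqrt(noise_var) * rng.standard_normal(M + 1)  # observed process Y

estimate = noise_variance_estimate(y)
rv_full = float(np.sum(np.diff(y) ** 2))  # dominated by the 2 M E[eps^2] term
```

With these parameters, the full-grid realized variance is expected to be close to 2M E[ε²] = 0.04 rather than to the integrated variance 1e-4, which illustrates why the full grid measures the noise rather than the volatility.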
The realized volatility estimator over subgrid
As mentioned in the previous discussion, increasing the frequency spoils the estimation of the volatility because of the microstructure noise. The naive solution is to reduce the number of points in the grid, that is, to consider only a subgrid; one can then take the average over a number of subgrids. Let us consider a subgrid H with |H| = m − 1; the same result as for the full grid is obtained by replacing M with m:
$$E\left[[Y, Y]^{H}_T \mid X\right] = [X, X]^{H}_T + 2m\,E[\varepsilon^2]$$

Let us now consider a sequence of subgrids $H^{(k)}$ with $k = 1, \dots, K$ which satisfies $G = \bigcup_{k=1}^{K} H^{(k)}$ and $H^{(k)} \cap H^{(l)} = \emptyset$ for $k \neq l$. By averaging over these K subgrids, we obtain the estimator:

$$[Y, Y]^{avg}_T = \frac{1}{K}\sum_{k=1}^{K} [Y, Y]^{H^{(k)}}_T$$

We define the average length of the subgrids $\bar m = (1/K)\sum_{k=1}^{K} m_k$; then the final expression is:

$$E\left[[Y, Y]^{avg}_T \mid X\right] = [X, X]^{avg}_T + 2\bar m\,E[\varepsilon^2]$$
This estimator of the volatility is still biased, and its precision depends strongly on the choice of the subgrid length and the number of subgrids. Zhang et al. (2005) demonstrated that there exists an optimal value K⋆ for which the estimator reaches its best performance.
Two time-scale estimator
As the full-grid estimator and the subgrid averaging estimator both contain the same microstructure-noise component up to a factor, we can combine them into a new estimator in which the microstructure noise is completely eliminated. Let us consider the following estimator:

$$\hat\sigma^2_{ts} = \left(1 - \frac{\bar m}{M}\right)^{-1}\left([Y, Y]^{avg}_T - \frac{\bar m}{M}\,[Y, Y]^{G}_T\right)$$

This estimator is now unbiased, with a precision determined by the choice of K and $\bar m$. In the theoretical framework, the optimal value is given as a function of the noise variance and the fourth moment of the volatility. In practice, we scan over the number of subgrids, of size $\bar m \propto M/K$, in order to look for the optimal estimator.
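The two time-scale estimator can be sketched on simulated data as follows. The assumptions here are not taken from the text: the K subgrids are built by regular subsampling (every K-th point, starting from offsets 0 to K − 1), M is taken as the number of increments of the full grid, the hidden process has constant volatility, and all numerical values are illustrative. The function names are hypothetical.

```python
import numpy as np

def realized_variance(y):
    """Sum of squared increments of a series."""
    return float(np.sum(np.diff(y) ** 2))

def two_scale_variance(observed_log_prices, n_subgrids):
    """(1 - m_bar/M)^(-1) * ([Y,Y]^avg - (m_bar/M) * [Y,Y]^G)."""
    y = np.asarray(observed_log_prices, dtype=float)
    M = y.size - 1                      # increments on the full grid
    K = int(n_subgrids)
    if not 2 <= K <= (M + 1) // 2:
        raise ValueError("n_subgrids must lie between 2 and half the grid size")
    subgrids = [y[k::K] for k in range(K)]   # disjoint subgrids covering G
    rv_avg = float(np.mean([realized_variance(s) for s in subgrids]))
    m_bar = float(np.mean([s.size - 1 for s in subgrids]))
    ratio = m_bar / M
    return (rv_avg - ratio * realized_variance(y)) / (1.0 - ratio)

# Simulated example (illustrative parameters, not market data).
rng = np.random.default_rng(2)
M, sigma2_T, noise_var = 20000, 1e-4, 1e-6
steps = np.sqrt(sigma2_T / M) * rng.standard_normal(M)
x = np.concatenate(([0.0], np.cumsum(steps)))
y = x + np.sqrt(noise_var) * rng.standard_normal(M + 1)

rv_full = realized_variance(y)        # biased by roughly 2 M E[eps^2]
tsrv = two_scale_variance(y, n_subgrids=80)
```

Scanning `n_subgrids` over a range of values, as described above, only requires calling `two_scale_variance` in a loop.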
2.4.3 Numerical implementation and backtesting
We now backtest the proposed technique on the S&P 500 index with the following choice of subgrids. The full grid is defined by the set of data sampled every minute from the open to the close of each trading day (9h to 17h30). Data are taken from 1 February 2011 to 6 June 2011. We denote the full grid for each trading day by:
$$G = \{t_0, \dots, t_M\}$$

and the subgrids are chosen as follows:

$$H^{(k)} = \{t_{k-1},\, t_{k-1+K},\, \dots,\, t_{k-1+n_k K}\}$$

where the index $k = 1, \dots, K$ and $n_k$ is the integer making $t_{k-1+n_k K}$ the last element of $H^{(k)}$. As we cannot compute the exact optimal value $K^\star$ for each trading period, we employ an iterative scheme which tends to converge to the optimal value. The analytical expression of $K^\star$ is given by Zhang et al.:

$$K^\star = \left(\frac{12\left(E[\varepsilon^2]\right)^2}{T\,E[\eta^2]}\right)^{1/3} M^{2/3}$$

where η is given by the expression:

$$\eta^2 = \int_0^T \sigma_t^4\,dt$$
As a first approximation, we consider the case where the intraday volatility is constant, so that the expression of η simplifies to η² = Tσ⁴. In Figure 2.29, we present the intraday volatility, which takes into account only the trading day, for the S&P 500 index under the assumption of constant volatility. The two time-scale estimator reduces the effect of microstructure noise on the realized volatility computed over the full grid.
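Under the constant-volatility approximation, the expression of K⋆ reduces to a closed form that is easy to evaluate. The sketch below is only an illustration: the function name, the rounding to an integer and the floor of two subgrids are assumptions, and the inputs (noise variance and variance per unit time) would in practice come from preliminary estimates.

```python
def optimal_subgrid_count(noise_variance, sigma2, n_points, horizon=1.0):
    """K* = (12 E[eps^2]^2 / (T eta^2))^(1/3) * M^(2/3), with eta^2 = T sigma^4.

    noise_variance : estimate of E[eps^2]
    sigma2         : constant variance per unit time (sigma^2)
    n_points       : size M of the full grid
    horizon        : length T of the period
    """
    eta2 = horizon * sigma2 ** 2
    c = (12.0 * noise_variance ** 2 / (horizon * eta2)) ** (1.0 / 3.0)
    k_star = c * n_points ** (2.0 / 3.0)
    # Assumption: round to the nearest integer and keep at least two subgrids.
    return max(2, int(round(k_star)))

k_star = optimal_subgrid_count(noise_variance=1e-6, sigma2=1e-4, n_points=20000)
```

Since both inputs depend on estimates that themselves depend on K, one natural use is to alternate between estimating the volatility and updating K; the details of such an iteration are not specified here.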
Figure 2.29: Two-time scale estimator of intraday volatility
[Plot: σ (%) versus date, 02/11 to 06/11; series: volatility with full grid, volatility with subgrid, volatility with two scales]
2.5 Conclusion
Voltarget strategies are efficient ways to control risk when building trading strategies. Hence, a good estimator of the volatility is essential from this perspective. In this paper, we show that the price range can be used to improve the forecasting of market volatility. The use of high and low prices is less important for the index, as it gives more or less the same result as the traditional close-to-close estimator. However, for individual stocks with a higher volatility level, the high-low estimators improve the prediction of volatility. We consider several backtests on the S&P 500 index and obtain competitive results with respect to the traditional moving-average estimator of volatility.
Indeed, we consider a simple stochastic volatility model which permits us to integrate the dynamics of the volatility into the estimator. An optimization scheme based on maximum likelihood allows us to obtain the optimal averaging window dynamically. We also compare these results for range-based estimators with the well-known IGARCH model. The comparison between the optimal values of the likelihood functions for the various estimators also gives us a ranking of the estimation error.
Finally, we studied the high-frequency volatility estimator, which is a very active topic in financial mathematics. Using the simple model proposed by Zhang et al. (2005), we show that the microstructure noise can be eliminated by the two time-scale estimator.
[1] Bandi F. M. and Russell J. R. (2006), Separating Microstructure Noise from Volatility, Journal of Financial Economics, 79, pp. 655-692.
[2] Drost F. C. and Nijman T. E. (1993), Temporal Aggregation of GARCH Processes, Econometrica, 61, pp. 909-927.
[3] Drost F. C. and Werker J. M. (1999), Closing the GARCH gap: Continuous time GARCH modeling, Journal of Econometrics, 74, pp. 31-57.
[4] Feller W. (1951), The Asymptotic Distribution of the Range of Sums of Independent Random Variables, Annals of Mathematical Statistics, 22, pp. 427-432.
[5] Garman M. B. and Klass M. J. (1980), On the estimation of security price volatilities from historical data, Journal of Business, 53, pp. 67-78.
[6] Kunitomo N. (1992), Improving the Parkinson method of estimating security price volatilities, Journal of Business, 65, pp. 295-302.
[7] Parkinson M. (1980), The extreme value method for estimating the variance of the rate of return, Journal of Business, 53, pp. 61-65.
[8] Rogers L. C. G. and Satchell S. E. (1991), Estimating variance from high, low and closing prices, Annals of Applied Probability, 1, pp. 504-512.
[9] Yang D. and Zhang Q. (2000), Drift-Independent Volatility Estimation Based on High, Low, Open and Close Prices, Journal of Business, 73, pp. 477-491.
[10] Zhang L., Mykland P. A. and Ait-Sahalia Y. (2005), A Tale of Two Time Scales: Determining Integrated Volatility With Noisy High-Frequency Data, Journal of the American Statistical Association, 100(472), pp. 1394-1411.
Support Vector Machine in Finance
In this chapter, we review the well-known machine learning technique called the support vector machine (SVM). This technique can be employed in different contexts such as classification, regression or density estimation according to Vapnik [1998]. Within this paper, we would like first to give an overview of this method and its numerical implementations, then bridge it to financial applications such as stock selection.
Keywords: Machine learning, Statistical learning, Support vector machine, regression, classification, stock selection.
3.1 Introduction
Support vector machine is an important part of Statistical Learning Theory. It was first introduced in the 1990s by Boser et al. (1992) and has found important applications in various domains such as pattern recognition (for example handwriting, digit and image recognition), bioinformatics, etc. This technique can be employed in different contexts such as classification, regression or density estimation according to Vapnik [1998]. Recently, different applications in the financial field have been developed in two main directions. The first one employs SVM as a non-linear estimator in order to forecast the market tendency or volatility. In this context, SVM is used as a regression technique that can be extended to the non-linear case thanks to the kernel approach. The second direction consists of using SVM as a classification technique which aims to perform stock selection in a trading strategy (for example a long/short strategy). In this paper, we review the support vector machine and its application in finance from both points of view. The literature of this recent field is quite diversified and divergent, with many approaches and different techniques. We would like first to give an overview of the SVM from its basic construction to all extensions, including the multi-class classification problem. We next present different numerical implementations, then bridge them to financial applications.
This paper is organized as follows. In Section 2, we recall the framework of support vector machine theory based on the approach proposed in O. Chapelle (2002). We next work out various implementations of this technique from both the primal and dual problems in Section 3. The extension of SVM to multi-class classification is discussed in Section 4. We finish with the introduction of SVM in the financial domain via an example of stock selection in Sections 5 and 6.