This section gives a brief introduction to various volatility estimators included in the models employed in this chapter. First, for day $t$ denote the intra-day high, low and closing prices as $H_t$, $L_t$ and $C_t$. The daily log return is then:
$$ r_t = \log(C_t) - \log(C_{t-1}) \tag{2.1} $$
Assuming the mean return is zero, as standard, a constant daily return variance can be estimated by:
$$ V = \frac{1}{n} \sum_{t=1}^{n} r_t^2 \tag{2.2} $$
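As a minimal sketch of (2.1) and (2.2), assuming the closing prices are available as a plain list of positive numbers (the function names and the sample prices are hypothetical, chosen only for illustration):

```python
import math

def log_returns(closes):
    """Daily log returns r_t = log(C_t) - log(C_{t-1}), as in (2.1)."""
    return [math.log(c1) - math.log(c0) for c0, c1 in zip(closes, closes[1:])]

def zero_mean_variance(returns):
    """Variance estimate (2.2): average squared return, with the mean assumed to be zero."""
    return sum(r * r for r in returns) / len(returns)

# Hypothetical closing prices, for illustration only.
closes = [100.0, 101.0, 99.5, 100.2]
r = log_returns(closes)
v = zero_mean_variance(r)
```

Because the mean is assumed to be zero, the divisor is $n$ rather than $n-1$ and no sample mean is subtracted.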
Based on the distribution of the range derived by Feller (1951), Parkinson (1980) proposed the squared intra-day high-low range, scaled by the factor $4\log 2$, as an approximately unbiased variance estimator:
$$ Ra_t^2 = \frac{(\log H_t - \log L_t)^2}{4 \log 2} \tag{2.3} $$
Through theoretical derivation and a simulation study, Parkinson showed that this is a more efficient estimator than the traditional squared return. Garman and Klass (1980), Rogers and Satchell (1991) and Yang and Zhang (2000) derived other range-based estimators; a full study and comparison of the properties of different volatility estimators is presented in Molnár (2012).
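The Parkinson estimator in (2.3) can be sketched for a single day as follows. The function name is hypothetical, and the inputs are assumed to be the positive intra-day high and low prices of that day:

```python
import math

def parkinson_variance(high, low):
    """Squared log high-low range scaled by 4 log 2, as in (2.3)."""
    log_range = math.log(high) - math.log(low)
    return log_range ** 2 / (4.0 * math.log(2.0))

# Hypothetical high and low prices for one day.
ra2 = parkinson_variance(101.5, 99.0)
```

A day with no price movement (high equal to low) gives an estimate of exactly zero, which illustrates why the estimator is usually applied to liquid assets.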
Gerlach and Chen (2015) incorporated overnight price movements into this measure, defining range plus overnight as:
$$ RaO_t = \left(\log(\max(C_{t-1}, H_t)) - \log(\min(C_{t-1}, L_t))\right) \times 100, \tag{2.4} $$
where $C_{t-1}$ is the closing price on day $t-1$.
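A minimal sketch of (2.4), assuming the previous close and the current day's high and low are given as positive numbers (the function name and the sample values are hypothetical):

```python
import math

def range_plus_overnight(prev_close, high, low):
    """Range plus overnight (2.4), in percent: the previous close widens
    the range whenever it lies outside today's [low, high] interval."""
    upper = max(prev_close, high)
    lower = min(prev_close, low)
    return (math.log(upper) - math.log(lower)) * 100.0

# Gap-down day: yesterday's close (105) lies above today's high (102).
rao = range_plus_overnight(105.0, 102.0, 100.0)
```

When the previous close lies inside today's high-low interval, the measure reduces to the ordinary log range multiplied by 100.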
Extending into the high-frequency intra-day framework, each day $t$ can be divided into $N$ equally sized intervals of length $\Delta$, with the intra-day times indexed by $i = 0, 1, 2, \ldots, N$. The closing price at the $i$-th interval of day $t$ is denoted $P_{t-1+i\Delta}$. The high and low prices during this interval are then $H_{t,i} = \sup_{(i-1)\Delta < j < i\Delta} P_{t-1+j}$ and $L_{t,i} = \inf_{(i-1)\Delta < j < i\Delta} P_{t-1+j}$ respectively. Realized variance (RV) has proven an efficient volatility estimator and has gained popularity in recent years. RV is simply the sum of the $N$ intra-day squared returns, at frequency $\Delta$, for day $t$, i.e.:
$$ RV_t^{\Delta} = \sum_{i=1}^{N} \left[\log(P_{t-1+i\Delta}) - \log(P_{t-1+(i-1)\Delta})\right]^2 \tag{2.5} $$
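As an illustration of (2.5), the following sketch computes the RV for one day. It assumes the $N+1$ intra-day prices sampled at frequency $\Delta$ are supplied as a list; the function name and the sample prices are hypothetical:

```python
import math

def realized_variance(prices):
    """Sum of squared intra-day log returns, as in (2.5).
    `prices` holds the N + 1 prices observed at times i = 0, 1, ..., N."""
    logs = [math.log(p) for p in prices]
    return sum((b - a) ** 2 for a, b in zip(logs, logs[1:]))

# Hypothetical intra-day prices for one day (N = 4 intervals).
rv = realized_variance([100.0, 100.4, 100.1, 100.6, 100.3])
```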
Proposed by Barndorff-Nielsen and Shephard (2002), the realized kernel is a more robust volatility estimator compared to realized variance, especially when the returns are contaminated with micro-structure noise.
The Realized Range (RR), proposed by Martens and van Dijk (2007) and Christensen and Podolskij (2007), has the following specification, which simply replaces the intra-day squared returns with intra-day squared ranges, and scales:
$$ RR_t^{\Delta} = \frac{\sum_{i=1}^{N} (\log H_{t,i} - \log L_{t,i})^2}{4 \log 2} \tag{2.6} $$
Theoretically, the RR may contain more information about volatility, in the same way that the intra-day range contains more information than the squared return: it uses all the price movements within a time period to form the high and low prices, not just the prices at each end of the period. Results in Martens and van Dijk (2007) lend support to this hypothesis. Note, however, that the scaling factor $4 \log 2$ makes the RR an unbiased volatility estimator only in the limit $N \to \infty$.
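The realized range in (2.6) can be sketched in the same style. The function name is hypothetical, and the per-interval high and low prices $H_{t,i}$ and $L_{t,i}$ are assumed to be available as two lists of equal length:

```python
import math

def realized_range(highs, lows):
    """Sum of squared intra-day log ranges scaled by 4 log 2, as in (2.6)."""
    if len(highs) != len(lows):
        raise ValueError("highs and lows must have the same length")
    total = sum((math.log(h) - math.log(l)) ** 2 for h, l in zip(highs, lows))
    return total / (4.0 * math.log(2.0))

# Hypothetical highs and lows for N = 3 intra-day intervals.
rr = realized_range([100.5, 100.8, 100.6], [100.0, 100.2, 100.1])
```

With a single interval covering the whole day ($N = 1$), this reduces to the daily estimator in (2.3).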
Of course, both RV and RR have been criticized as being subject to micro-structure noise bias and inefficiency, more so than daily returns or daily ranges. This issue has been studied extensively; see Rogers and Satchell (1991), Barndorff-Nielsen et al. (2004) and Christensen and Podolskij (2007) for discussion. In response, Martens and van Dijk (2007) presented a scaling process, as in Equations (2.7) and (2.8):
$$ ScRV_t^{\Delta} = \frac{\sum_{l=1}^{q} RV_{t-l}}{\sum_{l=1}^{q} RV_{t-l}^{\Delta}} \, RV_t^{\Delta}, \tag{2.7} $$
$$ ScRR_t^{\Delta} = \frac{\sum_{l=1}^{q} RR_{t-l}}{\sum_{l=1}^{q} RR_{t-l}^{\Delta}} \, RR_t^{\Delta}, \tag{2.8} $$
where $RV_{t-l}$ and $RR_{t-l}$ (without the superscript $\Delta$) represent the daily squared return and the daily squared range on day $t-l$. This scaling process is motivated by the fact that the daily return and range are less affected by micro-structure noise and thus can be used to help reduce bias. Recently, jumps in returns have also attracted attention in the analysis of high-frequency data (Andersen, Bollerslev and Diebold, 2007), but they are not tackled in this thesis.
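The scaling in (2.7) and (2.8) has the same form for both measures, so a single helper suffices for a sketch. The function name, the argument names and the sample values are hypothetical; the two history lists are assumed to hold the daily and the intra-day measures for the same previous $q$ days:

```python
def scaled_measure(daily_history, intraday_history, intraday_today):
    """Scale today's intra-day measure (RV or RR) by the ratio of the summed
    daily measures to the summed intra-day measures over the previous q days."""
    if len(daily_history) != len(intraday_history):
        raise ValueError("both histories must cover the same q days")
    ratio = sum(daily_history) / sum(intraday_history)
    return ratio * intraday_today

# Hypothetical values with q = 2: the intra-day measure has averaged half
# of the daily one, so today's value is doubled.
sc_rv = scaled_measure([2.0, 4.0], [1.0, 2.0], 1.5)
```

Passing squared daily returns and intra-day RVs gives (2.7); passing squared daily ranges and intra-day RRs gives (2.8).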
Further, Zhang, Mykland and Aït-Sahalia (2005) proposed a sub-sampled process to further smooth out micro-structure noise. For day $t$, $N$ equally sized samples are grouped into $M$ non-overlapping subsets $X^{(m)}$ with size $N/M = n_k$, which means:
$$ X = \bigcup_{m=1}^{M} X^{(m)}, \quad \text{where } X^{(k)} \cap X^{(l)} = \emptyset \text{ when } k \neq l. $$
Then sub-sampling will be implemented on the subsets $X^{(i)}$ with $n_k$ interval:
$$ X^{(i)} = \{\, i,\ i+n_k,\ \ldots,\ i+n_k(M-2),\ i+n_k(M-1) \,\}, \quad \text{where } i = 0, 1, 2, \ldots, n_k-1. $$
Representing the log closing price at the $i$-th interval of day $t$ as $C_{t,i} = \log P_{t-1+i\Delta}$, the RV with the subset $X^{(i)}$ is:
$$ RV_{t,i} = \sum_{m=1}^{M} \left(C_{t,i+n_k m} - C_{t,i+n_k(m-1)}\right)^2, \quad \text{where } i = 0, 1, 2, \ldots, n_k-1. $$
We have the $T/M$ RV with $T/N$ sub-sampling for day $t$ as (supposing there are $T$ minutes per trading day):
$$ SSRV_{t,T/M,T/N}^{\Delta} = \frac{\sum_{i=0}^{n_k-1} RV_{t,i}}{n_k}, \tag{2.9} $$
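Then, denoting the high and low prices during the interval between $i+n_k(m-1)$ and $i+n_k m$ as $H_{t,i,m} = \sup_{(i+n_k(m-1))\Delta < j < (i+n_k m)\Delta} P_{t-1+j}$ and $L_{t,i,m} = \inf_{(i+n_k(m-1))\Delta < j < (i+n_k m)\Delta} P_{t-1+j}$ respectively, we propose the $T/M$ RR with $T/N$ sub-sampling as:
$$ RR_{t,i} = \sum_{m=1}^{M} \left(\log H_{t,i,m} - \log L_{t,i,m}\right)^2, \quad \text{where } i = 0, 1, 2, \ldots, n_k-1, \tag{2.10} $$
$$ SSRR_{t,T/M,T/N}^{\Delta} = \frac{\sum_{i=0}^{n_k-1} RR_{t,i}}{4 \log 2 \; n_k}, \tag{2.11} $$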
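For example, the 5-minute RV and RR with 1-minute sub-sampling for any day can be calculated, respectively, as below:
$$
\begin{aligned}
RV_{5,1,0} &= (\log C_{t_5} - \log C_{t_0})^2 + (\log C_{t_{10}} - \log C_{t_5})^2 + \ldots \\
RV_{5,1,1} &= (\log C_{t_6} - \log C_{t_1})^2 + (\log C_{t_{11}} - \log C_{t_6})^2 + \ldots \\
&\;\;\vdots \\
RV_{5,1,4} &= (\log C_{t_9} - \log C_{t_4})^2 + (\log C_{t_{14}} - \log C_{t_9})^2 + \ldots \\
SSRV_{5,1}^{\Delta} &= \frac{\sum_{i=0}^{4} RV_{5,1,i}}{5}
\end{aligned}
$$
$$
\begin{aligned}
RR_{5,1,0} &= (\log H_{t_0<t<t_5} - \log L_{t_0<t<t_5})^2 + (\log H_{t_5<t<t_{10}} - \log L_{t_5<t<t_{10}})^2 + \ldots \\
RR_{5,1,1} &= (\log H_{t_1<t<t_6} - \log L_{t_1<t<t_6})^2 + (\log H_{t_6<t<t_{11}} - \log L_{t_6<t<t_{11}})^2 + \ldots \\
&\;\;\vdots \\
RR_{5,1,4} &= (\log H_{t_4<t<t_9} - \log L_{t_4<t<t_9})^2 + (\log H_{t_9<t<t_{14}} - \log L_{t_9<t<t_{14}})^2 + \ldots \\
SSRR_{5,1}^{\Delta} &= \frac{\sum_{i=0}^{4} RR_{5,1,i}}{4 \log(2) \cdot 5}
\end{aligned}
$$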
Only intra-day returns at the 5-minute frequency, with 1-minute sub-sampling where employed, are considered in this thesis.
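The 5-minute measures with 1-minute sub-sampling can be sketched as below. Several assumptions are made here that the text does not specify: the function names are hypothetical; the input is 1-minute data, given as a list of prices at the minute marks for the RV and as lists of 1-minute bar highs and lows for the RR; the high (low) of a 5-minute window is approximated by the maximum (minimum) over its 1-minute bars; and windows that run past the end of the day are simply dropped.

```python
import math

def subsampled_rv(closes, step=5):
    """Average of `step` realized variances, each computed on a grid with
    spacing `step` and shifted by one observation, as in (2.9)."""
    logs = [math.log(c) for c in closes]
    total = 0.0
    for offset in range(step):
        grid = logs[offset::step]  # offset, offset + step, offset + 2 * step, ...
        total += sum((b - a) ** 2 for a, b in zip(grid, grid[1:]))
    return total / step

def subsampled_rr(bar_highs, bar_lows, step=5):
    """Average of `step` realized ranges on shifted grids, scaled by 4 log 2,
    as in (2.10) and (2.11). Only complete windows of `step` bars are used."""
    if len(bar_highs) != len(bar_lows):
        raise ValueError("bar_highs and bar_lows must have the same length")
    n = len(bar_highs)
    total = 0.0
    for offset in range(step):
        for start in range(offset, n - step + 1, step):
            high = max(bar_highs[start:start + step])
            low = min(bar_lows[start:start + step])
            total += (math.log(high) - math.log(low)) ** 2
    return total / (4.0 * math.log(2.0) * step)
```

With `step=1` the first function reduces to the plain RV of the 1-minute returns, which gives a simple consistency check on any implementation.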