
The volatility for each intraday time bar $t_I$ on day $t$ is dependent on the time of day.

For the first 15 time bars, the volatility is taken to be the forecast of a GARCH(1,1) model fitted to the last 60 returns of the previous day t − 1. This choice is motivated by the high volatility of the market during the opening hour and by the fact that relatively few data points are available at that stage for computing the volatility. The volatility estimates for the rest of the day are computed using the Realized Volatility (RV) method [56]. RV is one of the more popular methods for estimating the volatility of high-frequency returns³² computed from tick data. The measure estimates volatility by summing intraday squared returns over short intervals (e.g. 5 minutes). Andersen et al. [56] propose this estimate for volatility at higher frequencies and derive it by showing that RV is an approximation of the quadratic variation under the assumption that log returns follow a continuous-time stochastic process with zero mean and no jumps. The idea is to show that the RV converges to the continuous-time volatility (quadratic variation) [57], which we now demonstrate.

Assume that the instantaneous returns of the observed log stock prices ($p_t$), with unobserved latent volatility ($\sigma_t$) scaled continuously through time by a standard Wiener process ($dW_t$), can be generated by the continuous-time martingale [57]

$$dp_t = \sigma_t \, dW_t \tag{51}$$

Then it follows that the conditional variance of the single-period returns, $r_{t+1} = p_{t+1} - p_t$, is given by

$$\sigma_t^2 = \int_t^{t+1} \sigma_s^2 \, ds \tag{52}$$

Eq. (52) is also known as the integrated volatility for the period t to t + 1.

Suppose the sampling frequency of the tick data into regularly spaced time intervals is denoted by $f$, such that between periods $t$ and $t + 1$ there are $f$ continuously compounded returns. Then

$$r_{t+1/f} = p_{t+1/f} - p_t \tag{53}$$

Hence, we get the Realised Volatility (RV) based on $f$ intraday returns between periods $t$ and $t + 1$ as

$$RV_{t+1} = \sum_{i=1}^{f} r_{t+i/f}^2 \tag{54}$$

³² Most commonly refers to returns over intervals shorter than one day. This could be minutes, seconds

The argument here is that, provided we sample at frequent enough time steps ($f$), the volatility can be observed theoretically from the sample path of the return process, and hence [57,58]

$$\lim_{f \to \infty} \left( \int_t^{t+1} \sigma_s^2 \, ds - \sum_{i=1}^{f} r_{t+i/f}^2 \right) = 0 \tag{55}$$

which says that the RV of a sequence of returns asymptotically approaches the integrated volatility and hence the RV is a reasonable estimate of current volatility levels.
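As a rough illustration of Eq. (54), the sketch below computes the realised volatility measure from a list of regularly sampled intraday prices. The function names and the sample prices are hypothetical and chosen for this example only; the final square root is an added convenience that expresses the estimate on the scale of returns rather than squared returns.

```python
import math

def log_returns(prices):
    """Continuously compounded returns between consecutive price samples."""
    return [math.log(b) - math.log(a) for a, b in zip(prices, prices[1:])]

def realized_variance(returns):
    """Sum of squared intraday returns, as in Eq. (54)."""
    return sum(r * r for r in returns)

# Hypothetical prices sampled at f = 4 regular intervals within one period.
prices = [100.0, 100.5, 100.2, 100.9, 100.6]
rv = realized_variance(log_returns(prices))
realized_vol = math.sqrt(rv)  # same quantity on the scale of returns
```

Sampling more finely (a larger f) adds more terms to the sum, which is the limit considered in Eq. (55).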

6 Testing for Statistical Arbitrage

To test the overall trading strategy for statistical arbitrage, we implement a novel statistical test originally proposed by [22] and later modified by [31], applying it to the overall strategy’s profits and losses (PL). The idea is to axiomatically define the conditions under which a statistical arbitrage exists and to assume a parametric model for incremental trading profits in order to form a null hypothesis derived from the union of several sub-hypotheses, which are formulated to facilitate empirical tests of statistical arbitrage. The modified test proposed by [31], called the Min-t test, is derived from a set of restrictions imposed on the parameters defined by the statistical arbitrage null hypothesis and is applied to a given trading strategy to test for statistical arbitrage. The Min-t statistic is argued to provide a much more efficient and powerful statistical test than the Bonferroni inequality used in [22]. The statistical power of the Bonferroni approach falls as the number of sub-hypotheses increases; as a result, it is often unable to reject an incorrect null hypothesis, leading to a large Type II error.

To set the scene and introduce the concept of a statistical arbitrage, suppose that in some economy a stock (portfolio)³³ $s_t$ and a money market account $B_t$³⁴ are traded. Let the stochastic process $(x(t), y(t) : t \ge 0)$ represent a zero initial cost trading strategy that trades $x(t)$ units of some portfolio $s_t$ and $y(t)$ units of the money market account at a given time $t$. Denote the cumulative trading profits at time $t$ by $V_t$. Let the time series of discounted cumulative trading profits generated by the trading strategy be denoted by $\nu(t_1), \nu(t_2), \ldots, \nu(t_T)$, where $\nu(t_i) = V_{t_i} / B_{t_i}$ for each $i = 1, \ldots, T$. Denote the increments of the discounted cumulative profits at each time $i$ by $\Delta\nu_i = \nu(t_i) - \nu(t_{i-1})$. Then, a statistical arbitrage is defined as:

Definition 1 (Statistical Arbitrage [22,31]). A statistical arbitrage is a zero-cost, self-financing trading strategy (x(t) : t ≥ 0) with cumulative discounted trading profits ν(t) such that

1. $\nu(0) = 0$

2. $\lim_{t \to \infty} E^{P}[\nu(t)] > 0$

3. $\lim_{t \to \infty} P[\nu(t) < 0] = 0$

4. $\lim_{t \to \infty} \mathrm{Var}[\Delta\nu(t) \mid \Delta\nu(t) < 0] = 0$

³³ In our study, we will be considering a portfolio

In other words, a statistical arbitrage is a trading strategy that 1) has zero initial cost, 2) in the limit has positive expected discounted cumulative profits, 3) in the limit has a probability of loss that converges to zero and 4) has a variance of negative incremental trading profits (losses) that converges to zero in the limit. It is clear that deterministic arbitrage stemming from traditional financial mathematics is in fact a special case of statistical arbitrage [59].

In order to test for statistical arbitrage, assume that the incremental discounted trading profits evolve over time according to the process

$$\Delta\nu_i = \mu i^{\theta} + \sigma i^{\lambda} z_i \tag{56}$$

where $i = 1, \ldots, T$. There are two cases to consider for the innovations: 1) the $z_i$ are i.i.d. $N(0,1)$ uncorrelated random variables satisfying $z_0 = 0$, or 2) $z_i$ follows an MA(1) process given by

$$z_i = \epsilon_i + \phi \epsilon_{i-1} \tag{57}$$

in which case the innovations are non-normal and correlated. Here, the $\epsilon_i$ are i.i.d. $N(0,1)$ uncorrelated random variables. It is also assumed that $\Delta\nu_0 = 0$ and, in the case of our algorithm, $\nu_{t_{\min}} = 0$. We will refer to the first model (normal uncorrelated innovations) as the unconstrained mean (UM) model and to the second model (non-normal and correlated innovations) as the unconstrained mean with correlation (UMC) model. Furthermore, we refer to the corresponding models with $\theta = 0$ as the constrained mean (CM) and constrained mean with correlation (CMC) models respectively, which assume constant expected incremental profits over time and hence have an incremental profit process given by

$$\Delta\nu_i = \mu + \sigma i^{\lambda} z_i \tag{58}$$

The discounted cumulative trading profits for the UM model at terminal time $T$, discounted back to the initial time, which are generated by a trading strategy are given by

$$\nu(T) = \sum_{i=1}^{T} \Delta\nu_i \sim N\left( \mu \sum_{i=1}^{T} i^{\theta},\; \sigma^2 \sum_{i=1}^{T} i^{2\lambda} \right) \tag{59}$$
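The distribution of ν(T) follows in one step from Eq. (56) when the innovations are independent standard normals (the UM case). The short derivation below is a sketch added for clarity; it uses only the symbols already defined in the text.

```latex
\begin{align*}
E[\Delta\nu_i] &= \mu i^{\theta}, \qquad \mathrm{Var}[\Delta\nu_i] = \sigma^2 i^{2\lambda}, \\
E[\nu(T)] &= \sum_{i=1}^{T} E[\Delta\nu_i] = \mu \sum_{i=1}^{T} i^{\theta}, \\
\mathrm{Var}[\nu(T)] &= \sum_{i=1}^{T} \mathrm{Var}[\Delta\nu_i] = \sigma^2 \sum_{i=1}^{T} i^{2\lambda}.
\end{align*}
```

The variance of the sum equals the sum of the variances because the $z_i$ are independent, and a sum of independent normal variables is itself normal.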

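From Eq. (59), it is straightforward to show that the log-likelihood function for the discounted incremental trading profits is given by

$$\ell(\mu, \sigma^2, \lambda, \theta \mid \Delta\nu) = \log L(\mu, \sigma^2, \lambda, \theta \mid \Delta\nu) = -\frac{1}{2} \sum_{i=1}^{T} \log(\sigma^2 i^{2\lambda}) - \frac{1}{2\sigma^2} \sum_{i=1}^{T} \frac{1}{i^{2\lambda}} (\Delta\nu_i - \mu i^{\theta})^2 \tag{60}$$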
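The log-likelihood of Eq. (60) translates almost term by term into code. The following is a minimal sketch, assuming the increments are held in a plain Python list; the function names are hypothetical, and the additive constant that Eq. (60) omits is omitted here as well.

```python
import math

def log_likelihood(mu, sigma2, lam, theta, increments):
    """Eq. (60): Gaussian log-likelihood of the increments, constant term omitted."""
    total = 0.0
    for i, dv in enumerate(increments, start=1):
        var_i = sigma2 * i ** (2.0 * lam)   # variance of the i-th increment
        resid = dv - mu * i ** theta        # deviation from the i-th mean
        total += -0.5 * math.log(var_i) - 0.5 * resid * resid / var_i
    return total

def negative_log_likelihood(params, increments):
    """Objective for a minimiser; params is the tuple (mu, sigma2, lam, theta)."""
    mu, sigma2, lam, theta = params
    return -log_likelihood(mu, sigma2, lam, theta, increments)
```

Setting theta to zero gives the CM version of the likelihood, and passing the negated function to a minimiser is equivalent to maximising Eq. (60).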


The probability of a trading strategy generating a loss after n periods is as follows [31]

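$$\Pr\{\text{Loss after } n \text{ periods}\} = \Phi\left( \frac{-\mu \sum_{i=1}^{n} i^{\theta}}{\sigma (1 + \phi) \sqrt{\sum_{i=1}^{n} i^{2\lambda}}} \right) \tag{61}$$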

where Φ(·) denotes the cumulative standard normal distribution function. For the CM model, Eq. (61) is easily adjusted by setting φ and θ equal to zero. This probability converges to zero at a rate that is faster than exponential.
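Eq. (61) can be evaluated directly once parameter estimates are available. The sketch below is illustrative only: the function names are hypothetical, the standard normal distribution function is built from the standard library's complementary error function, and θ and φ default to zero, which corresponds to the CM case described above.

```python
import math

def std_normal_cdf(x):
    """Cumulative standard normal distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def loss_probability(n, mu, sigma, lam, theta=0.0, phi=0.0):
    """Eq. (61): probability that the strategy shows a loss after n periods."""
    mean_sum = sum(i ** theta for i in range(1, n + 1))
    var_sum = sum(i ** (2.0 * lam) for i in range(1, n + 1))
    z = -mu * mean_sum / (sigma * (1.0 + phi) * math.sqrt(var_sum))
    return std_normal_cdf(z)
```

For example, with mu = 1, sigma = 2 and lam = 0, the argument of Φ after n = 4 periods is −1, and the probability keeps falling as n grows.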

As mentioned previously, to facilitate empirical tests of statistical arbitrage under Definition 1, a set of sub-hypotheses is formulated to impose restrictions on the parameters of the underlying process driving the discounted incremental trading profits. They are as follows:

Proposition 1 (UM Model Hypothesis [31]). Under the four axioms defined in Definition 1, a trading strategy generates a statistical arbitrage under the UM model if the discounted incremental trading profits satisfy the intersection of the following four sub-hypotheses jointly:

1. H1: µ > 0

2. H2: −λ > 0 or θ − λ > 0

3. H3: θ − λ + 1/2 > 0

4. H4: θ + 1 > 0

An intersection of the above sub-hypotheses defines a statistical arbitrage and, by De Morgan’s Laws³⁵, the null hypothesis of no statistical arbitrage is defined by a union of sub-hypotheses. Hence, the no statistical arbitrage null hypothesis is the union of the complements of the sub-hypotheses in Proposition 1:

Proposition 2 (UM Model Alternative Hypothesis [22,31]). Under the four axioms defined in Definition 1, a trading strategy does not generate a statistical arbitrage if the discounted incremental trading profits satisfy any one of the following four sub-hypotheses:

1. H1: µ ≤ 0

2. H2: −λ ≤ 0 and θ − λ ≤ 0

3. H3: θ − λ + 1/2 ≤ 0

4. H4: θ + 1 ≤ 0

The null hypothesis is not rejected provided that a single sub-hypothesis holds. The Min-t test is then used to test the above null hypothesis of no statistical arbitrage by considering each sub-hypothesis separately using the t-statistics $t(\hat{\mu})$, $t(-\hat{\lambda})$, $t(\hat{\theta} - \hat{\lambda})$, $t(\hat{\theta} - \hat{\lambda} + 0.5)$ and $t(\hat{\theta} + 1)$, where the hats denote the Maximum Likelihood Estimates (MLE) of the parameters. The Min-t statistic is defined as [31]

$$\text{Min-}t = \min\left\{ t(\hat{\mu}),\; t(\hat{\theta} - \hat{\lambda}),\; t(\hat{\theta} - \hat{\lambda} + 0.5),\; \max\left[ t(-\hat{\lambda}),\, t(\hat{\theta} + 1) \right] \right\} \tag{62}$$
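Given the five t-statistics, Eq. (62) reduces to a nested minimum and maximum. The sketch below shows this combination together with the rejection rule against a critical value; the function and argument names are hypothetical, and the t-statistics are assumed to have been computed beforehand.

```python
def min_t_statistic(t_mu, t_theta_minus_lam, t_theta_minus_lam_half,
                    t_neg_lam, t_theta_plus_one):
    """Eq. (62): combine the individual t-statistics into the Min-t statistic."""
    return min(
        t_mu,
        t_theta_minus_lam,
        t_theta_minus_lam_half,
        max(t_neg_lam, t_theta_plus_one),
    )

def reject_no_arbitrage(min_t, t_c):
    """The no statistical arbitrage null is rejected when Min-t exceeds t_c."""
    return min_t > t_c
```

Because the statistic is a minimum, a single small t-statistic is enough to keep the null from being rejected.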

³⁵ This states that the complement of the intersection of sets is the same as the union of their complements.

The intuition is that the Min-t statistic returns the smallest of the test statistics, which corresponds to the sub-hypothesis of the null that is closest to being accepted. The no statistical arbitrage null is then rejected if Min-t > $t_c$, where $t_c$ depends on the significance level of the test, which we will refer to as α. Since the probability of rejecting a true null cannot exceed the significance level α, we have the following condition for the probability of rejecting the null at the α significance level

$$\Pr\{\text{Min-}t > t_c \mid \mu, \lambda, \theta, \sigma\} \le \alpha \tag{63}$$

What remains is for us to compute the critical value $t_c$. We will implement a Monte Carlo simulation procedure to compute $t_c$, which we describe in more detail in step 5 of Section 6.1 below.

6.1 Outline of the Statistical Arbitrage Test Procedure

The steps involved in testing for statistical arbitrage are outlined below:

1. Trading increments $\Delta\nu_i$: From the vector of cumulative trading profits and losses, compute the increments $(\Delta\nu_1, \ldots, \Delta\nu_T)$ where $\Delta\nu_i = \nu(t_i) - \nu(t_{i-1})$.

2. Perform MLE: Compute the log-likelihood function given in Eq. (60) and maximise it to find the estimates of the four parameters, namely $\hat{\mu}$, $\hat{\sigma}$, $\hat{\theta}$ and $\hat{\lambda}$. The log-likelihood function is adjusted depending on whether the CM ($\theta = 0$) or UM test is implemented. We will only consider the CM test in this study. Since MATLAB’s built-in constrained optimization algorithm³⁶ only performs minimization, we minimize the negative of the log-likelihood function, which is equivalent to maximising the log-likelihood.

3. Standard errors: From the parameters estimated in the MLE step above, compute the negative Hessian evaluated at the ML estimates, which is the Fisher Information (FI) matrix denoted by I(Θ). In order to compute the Hessian, the analytical partial derivatives are derived from Eq. (60). Standard errors are then taken to be the square roots of the diagonal elements of the inverse of I(Θ), since the inverse of the Fisher information matrix is an asymptotic estimator of the covariance matrix.

4. Min-t statistic: Compute the t-statistics for each of the sub-hypotheses, which are given by $t(\hat{\mu})$, $t(-\hat{\lambda})$, $t(\hat{\theta} - \hat{\lambda})$, $t(\hat{\theta} - \hat{\lambda} + 0.5)$ and $t(\hat{\theta} + 1)$, and hence the resulting Min-t statistic given by Eq. (62). The statistics $t(\hat{\theta} - \hat{\lambda})$, $t(\hat{\theta} - \hat{\lambda} + 0.5)$ and $t(\hat{\theta} + 1)$ do not need to be considered for the CM test.

5. Critical values: Compute the critical value at the α significance level using the Monte Carlo procedure (uncorrelated normal errors) and Bootstrapping (correlated non-normal errors)

(a) CM model

First, simulate 5000 different profit processes using Eq. (58) with $(\mu, \lambda, \sigma^2) = (0, 0, 0.01)$³⁷. For each of the 5000 profit processes, perform MLE to get the estimated parameters, the associated t-statistics and finally the Min-t statistics. $t_c$ is then taken to be the $1 - \alpha$ quantile of the resulting distribution of Min-t values.

³⁶ Here we are referring to MATLAB’s fmincon function

³⁷ $t_c$ is maximised when µ and λ are zero. σ² is set equal to 0.01 to approximate the empirical MLE


6. P-values: Compute the empirical probability of rejecting the null hypothesis at the α significance level using Eq. (63) by utilising the critical value from the previous step and the simulated Min-t statistics.

7. n-Period Probability of Loss: Compute the probability of loss after n periods for each n = 1, . . . , T and observe the number of trading periods it takes for the probability of loss to converge to zero (or below 5% as in the literature). This is done by computing the MLE estimates for the vector $(\Delta\nu_1, \Delta\nu_2, \ldots, \Delta\nu_n)$ for each given n and substituting these estimates into Eq. (61).
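Steps 1 to 5(a) can be sketched for the CM model as follows. This is not the MATLAB implementation described in the text but a simplified stand-in under stated assumptions: for a fixed λ the estimates of µ and σ² have closed forms, so λ is found by a golden-section search over an assumed interval [−2, 2] instead of by fmincon; standard errors come from the expected Fisher information of the CM model rather than from the negative Hessian; and the CM Min-t statistic is taken to be the smaller of t(µ̂) and t(−λ̂), which is our reading of step 4. All function names are hypothetical.

```python
import math
import random

def increments(cumulative):
    """Step 1: increments of the discounted cumulative P&L, taking nu(t_0) = 0."""
    previous = [0.0] + list(cumulative[:-1])
    return [c - p for c, p in zip(cumulative, previous)]

def profile_estimates(dv, lam):
    """For fixed lambda (theta = 0), the mu and sigma^2 that maximise Eq. (60)."""
    weights = [i ** (-2.0 * lam) for i in range(1, len(dv) + 1)]
    mu = sum(w * x for w, x in zip(weights, dv)) / sum(weights)
    sigma2 = sum(w * (x - mu) ** 2 for w, x in zip(weights, dv)) / len(dv)
    return mu, sigma2

def profile_loglik(dv, lam):
    """Eq. (60) with theta = 0, evaluated at the profile estimates."""
    n = len(dv)
    _, sigma2 = profile_estimates(dv, lam)
    sum_log_i = sum(math.log(i) for i in range(1, n + 1))
    return -0.5 * n * math.log(sigma2) - lam * sum_log_i - 0.5 * n

def fit_cm(dv, lo=-2.0, hi=2.0, tol=1e-6):
    """Step 2: golden-section search for lambda on the assumed interval [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = profile_loglik(dv, c), profile_loglik(dv, d)
    while b - a > tol:
        if fc > fd:  # the maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = profile_loglik(dv, c)
        else:  # the maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = profile_loglik(dv, d)
    lam = 0.5 * (a + b)
    mu, sigma2 = profile_estimates(dv, lam)
    return mu, sigma2, lam

def cm_min_t(dv):
    """Steps 3-4: expected-information standard errors, then min{t(mu), t(-lambda)}."""
    n = len(dv)
    mu, sigma2, lam = fit_cm(dv)
    logs = [math.log(i) for i in range(1, n + 1)]
    se_mu = math.sqrt(sigma2 / sum(i ** (-2.0 * lam) for i in range(1, n + 1)))
    se_lam = math.sqrt(1.0 / (2.0 * (sum(x * x for x in logs) - sum(logs) ** 2 / n)))
    return min(mu / se_mu, -lam / se_lam)

def simulate_cm(n, mu, lam, sigma2, rng):
    """Eq. (58) with i.i.d. N(0, 1) innovations."""
    sigma = math.sqrt(sigma2)
    return [mu + sigma * i ** lam * rng.gauss(0.0, 1.0) for i in range(1, n + 1)]

def critical_value(n, alpha=0.05, n_sims=5000, seed=0):
    """Step 5(a): 1 - alpha quantile of Min-t under (mu, lambda, sigma^2) = (0, 0, 0.01)."""
    rng = random.Random(seed)
    stats = sorted(cm_min_t(simulate_cm(n, 0.0, 0.0, 0.01, rng)) for _ in range(n_sims))
    index = min(n_sims - 1, max(0, math.ceil((1.0 - alpha) * n_sims) - 1))
    return stats[index]
```

A call such as `critical_value(len(dv))` yields a value to compare with `cm_min_t(dv)`. The golden-section search assumes the profile likelihood has a single peak on the chosen interval, which should be checked on real data, and the pure-Python loops are slow for long series with the full 5000 simulations.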

There were various issues when implementing the UM statistical arbitrage test on the overall strategy’s profits and losses. The original implementation used R’s optim function with the L-BFGS-B method, which allows box constraints, that is, each variable can be given a lower and/or upper bound. The only reason that constrained optimisation must be used is that the variance must be non-negative; all other parameters are free to vary. It was apparent that the optimisation algorithm was not able to find the maximum of the log-likelihood function (the minimum of its negative), as the score equations were non-zero. Another major issue was that the inverse FI matrix, required to compute the standard errors of the ML estimates, had negative diagonal elements, which led to complex-valued standard errors. The first trial solution to this problem was to replace the numerical Hessian (negative FI matrix) computed by the optimHess function with the analytically derived Hessian. This did not alleviate the problem and we began looking at other methods to estimate the ML parameters. It was decided that a Markov Chain Monte Carlo method may be best suited for this, as there was a strong possibility that the probability distribution of the underlying process was bimodal. Recoding everything in MATLAB and using the built-in constrained optimisation function fmincon solved the aforementioned issues with regard to the CM test but not the UM test. There were also a variety of issues with the optimisation involved in the Monte Carlo simulation used to produce the critical values, and hence it was decided that it was sufficient to remain with the CM implementation for our purposes.

7 Probability of Back-test Overfitting

When designing an automated trading system (algorithm), it is always recommended that the system be simulated on historical data in order to test its performance. This is known as a back-test: a process that computes the series of profits and losses that the strategy would have generated had the algorithm been run over that time period [18].

When measuring the performance of a back-tested strategy, there are two different readings: in-sample (IS) performance and out-of-sample (OOS) performance. IS performance is simulated over the sample of data used in the design of the trading strategy, which can be referred to as the “training set”. OOS performance is simulated over the sample of data used to test the trading strategy, which is also known as the “testing set”. Bailey et al. [18] heavily criticise recent studies which claim to have designed profitable investment or trading strategies, since many of these studies are based only on IS statistics without evaluating OOS performance. This may lead to a phenomenon called overfitting, which occurs when a trading model targets particular observations rather than a general structure [18]. The authors state that it is relatively simple to overfit a trading strategy so that it has good IS performance; however, for a back-test to be realistic, the IS and OOS performance must be consistent with one another.

Given that it is imperative that we assess whether the proposed strategy is able to generalise well on OOS (unseen) data, a nonparametric methodology is implemented to estimate the extent to which the algorithm is overfitting IS data. This is referred to as estimating the probability of back-test overfitting (PBO), and the procedure used to compute such estimates is called combinatorially symmetric cross-validation (CSCV) [19]. Typically, an investor/researcher will run many (N) trial back-tests to select the parameter combinations which optimise the performance of the algorithm (usually based on some performance evaluation criterion such as the Sharpe Ratio). The idea is to perform CSCV on the matrix of performance series of length $T_{BL}$³⁸ for N separate trial simulations of the algorithm.

We must be clear that, from here on, when we refer to IS we do not mean the “training set” per se, during which the moving average look-back parameters were calculated, for example. Rather, we refer to IS as the subset of observations utilised in selecting the optimal strategy from the N back-test trials.

In the case of the algorithm proposed in this study, since the large set of trialled parameters forms the basis of the learning algorithm in the form of the experts, we cannot observe the effect of different parameter settings on the overall strategy, as these are already built into the underlying algorithm. Rather, we will run N trial back-test simulations on independent subsets of historical data to get an idea of how the algorithm performs on unseen data. We can then implement the CSCV procedure on the matrix of profits and losses resulting from the trials to recover a PBO estimate. Essentially, there is no training of parameters taking place in our model, as all parameter combinations are considered and the weights of the performance-weighted average of the experts’ strategies associated with the different parameters are “learnt”.

More specifically, we choose a back-test length $T_{BL}$ for each subset and split the entire history of OHLCV data into subsets of this length. The learning algorithm is then implemented on each subset to produce $N = \lfloor T / T_{BL} \rfloor$ profit and loss time series. Note that the subsets will be completely independent from one another as there is no overlapping of the data that each separate simulation is run on. A matrix M is then constructed by taking the profits and losses over time for each of the back-test simulations. This matrix will form the first step of the CSCV procedure, which is explained in detail in Section 7.2, but first, in the following subsection, we will introduce the theory required to define back-test overfitting and, hence, the probability of back-test overfitting.
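The splitting of the history into N non-overlapping back-tests and the assembly of the matrix M can be sketched as follows. The function names are hypothetical, and the orientation of M (one row per time step, one column per trial) is an assumption made for this example; the profit and loss series themselves would come from running the learning algorithm on each subset.

```python
def split_into_backtests(history, backtest_length):
    """Split T observations into N = floor(T / T_BL) disjoint, consecutive subsets."""
    n_trials = len(history) // backtest_length
    return [
        history[k * backtest_length:(k + 1) * backtest_length]
        for k in range(n_trials)
    ]

def build_performance_matrix(pnl_series):
    """Arrange N equally long P&L series as the columns of a T_BL x N matrix M."""
    return [list(row) for row in zip(*pnl_series)]

# Hypothetical history of T = 10 observations with T_BL = 3, so N = 3 and the
# last observation is left unused.
subsets = split_into_backtests(list(range(10)), 3)
M = build_performance_matrix(subsets)  # placeholder: real columns would be P&L series
```

Any observations left over after the last full subset are discarded, which is what taking the floor in the definition of N implies.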

7.1 Back-test Overfitting Framework

Consider the triple (T , F , P) with T representing a sample space of pairs of IS and
