Statistical machine learning is a vast interdisciplinary field, with many disparate research areas. The remainder of this chapter will consider the techniques most relevant to quantitative finance and algorithmic trading in particular.

9.2.1 Regression

Regression refers to a broad group of supervised machine learning techniques that provide both predictive and inferential capabilities. A significant portion of quantitative finance makes use of regression techniques and thus it is essential to be familiar with the process. Regression tries to model the relationship between a dependent variable (response) and a set of independent variables (predictors). In particular, the goal of regression is to ascertain the change in a response, when one of the independent variables changes, under the assumption that the remaining independent variables are kept fixed.

The most widely known regression technique is Linear Regression, which assumes a linear relationship between the predictors and the response. Such a model leads to parameter estimates (usually denoted by the vector ˆβ) for the linear response to each predictor. These parameters are estimated via a procedure known as ordinary least squares (OLS). Linear regression can be used both for prediction and inference.
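As a compact summary of the above, the linear model and its OLS estimate can be written in matrix form. The notation is an assumption introduced here for illustration: y is the vector of n responses, X is the n-by-(p+1) matrix of predictors with a leading column of ones for the intercept, and ε is an error term. The closed form on the right holds only when XᵀX is invertible.

```latex
y = X\beta + \varepsilon, \qquad
\hat{\beta} = \arg\min_{\beta} \lVert y - X\beta \rVert_2^2 = (X^{\top} X)^{-1} X^{\top} y
```

Under this model each component of ˆβ estimates the change in the expected response for a unit change in the corresponding predictor, with the remaining predictors held fixed.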

In the former case a new value of the predictor can be added (without a corresponding response) in order to predict a new response value. For instance, consider a linear regression model used to predict the value of the S&P500 on the following day, from price data over the last five days. The model can be fitted using OLS across historical data. Then, when new market data arrive for the S&P500 they can be input into the model (as predictors) to generate a predicted response for tomorrow’s daily price. This can form the basis of a simplistic trading strategy.
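The five-day example above can be sketched roughly as follows. This is a minimal illustration rather than a tested strategy: the price series is a synthetic random walk standing in for S&P500 data, and the helper names (make_lagged, fit_ols, predict_next) are hypothetical and chosen only for this sketch.

```python
import numpy as np

def make_lagged(prices, n_lags=5):
    """Return (X, y): each row of X holds n_lags consecutive prices
    (oldest first) and y holds the price on the following day."""
    prices = np.asarray(prices, dtype=float)
    n_rows = len(prices) - n_lags
    X = np.column_stack([prices[i:i + n_rows] for i in range(n_lags)])
    y = prices[n_lags:]
    return X, y

def fit_ols(X, y):
    """Estimate beta_hat (intercept first) by ordinary least squares."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta_hat

def predict_next(beta_hat, last_prices):
    """Predict the next price from the most recent prices (oldest first)."""
    return beta_hat[0] + np.dot(beta_hat[1:], last_prices)

# Synthetic random-walk "prices": an assumption, not real market data.
rng = np.random.default_rng(42)
prices = 100.0 + np.cumsum(rng.normal(0.0, 1.0, size=250))

X, y = make_lagged(prices, n_lags=5)
beta_hat = fit_ols(X, y)
forecast = predict_next(beta_hat, prices[-5:])
print("Predicted next price:", forecast)
```

In practice the fit would be evaluated on data that were not used for estimation, since in-sample accuracy says little about future performance.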

In the latter case (inference) the strength of the relationship between the response and each predictor can be assessed in order to determine the subset of predictors that have an effect on the response. This is more useful when the goal is to understand why the response varies, such as in a marketing study or clinical trial. Inference is often less useful to those carrying out algorithmic trading, as the quality of the prediction is fundamentally more important than the underlying relationship. That being said, one should not rely solely on a "black box" approach, because over-fitting to noise in the data is prevalent.

Other techniques include Logistic Regression, which is designed to predict a categorical response (such as "UP", "DOWN", "FLAT") as opposed to a continuous response (such as a stock market price). This technically makes it a classification tool (see below), but it is usually grouped under the banner of regression. A general statistical procedure known as Maximum Likelihood Estimation (MLE) is used to estimate the parameter values of a logistic regression.
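To make the role of MLE concrete, here is a minimal sketch of a two-class ("UP" = 1, "DOWN" = 0) logistic regression fitted by gradient ascent on the log-likelihood. Several things are assumptions made for illustration: the data are synthetic, the single predictor stands in for something like a lagged return, the response is restricted to two categories, and the function names, learning rate and iteration count are hypothetical. Production libraries generally use more sophisticated optimisers.

```python
import numpy as np

def sigmoid(z):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(beta, X1, y):
    """Bernoulli log-likelihood of labels y given design matrix X1."""
    p = sigmoid(X1 @ beta)
    return float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

def fit_logistic_mle(X, y, lr=0.5, n_iter=2000):
    """Maximise the log-likelihood by plain gradient ascent."""
    X1 = np.column_stack([np.ones(len(X)), X])  # add intercept column
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X1 @ beta)
        gradient = X1.T @ (y - p) / len(y)  # gradient of the mean log-likelihood
        beta += lr * gradient
    return beta, X1

# Synthetic data: the probability of "UP" rises with the predictor.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = (rng.random(500) < sigmoid(0.5 + 2.0 * x)).astype(float)

beta, X1 = fit_logistic_mle(x.reshape(-1, 1), y)
prob_up = sigmoid(beta[0] + beta[1] * 0.3)  # P("UP") for a new predictor value
print("Estimated coefficients:", beta, "P(UP | x = 0.3):", prob_up)
```

The fitted model outputs a probability, which is turned into a category by applying a threshold (commonly 0.5).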

9.2.2 Classification

Classification encompasses supervised machine learning techniques that aim to classify an observation (similar to a predictor) into a set of pre-defined categories, based on features associated with the observation. These categories can be un-ordered, e.g. "red", "yellow", "blue" or ordered, e.g. "low", "medium", "high". In the latter case such categorical groups are known as ordinals. Classification algorithms - classifiers - are widely used in quantitative finance, especially in the realm of market direction prediction. In this book we will be studying classifiers extensively.

Classifiers can be utilised in algorithmic trading to predict whether a particular time series will have positive or negative returns in subsequent (unknown) time periods. This is similar to a regression setting except that the actual value of the time series is not being predicted, rather its direction. Once again we are able to use continuous predictors, such as prior market prices, as observations. We will consider both linear and non-linear classifiers, including Logistic Regression, Linear/Quadratic Discriminant Analysis, Support Vector Machines (SVM) and Artificial Neural Networks (ANN). Note that some of the previous methods can actually be used in a regression setting also.
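The direction-prediction setup can be illustrated with a small sketch that turns a price series into lagged-return features and "UP"/"DOWN" labels. The classifier used here is a deliberately naive persistence rule (tomorrow's direction equals today's), which is an assumption made for illustration and not one of the methods listed above. The prices are made up, all function names are hypothetical, and a zero return is arbitrarily labelled "DOWN".

```python
def simple_returns(prices):
    """Period-on-period simple returns."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]

def direction(ret):
    """Map a return to a category; a zero return counts as "DOWN" (assumption)."""
    return "UP" if ret > 0 else "DOWN"

def make_direction_dataset(prices):
    """Feature: the current return. Label: the direction of the next return."""
    rets = simple_returns(prices)
    features = rets[:-1]
    labels = [direction(r) for r in rets[1:]]
    return features, labels

def persistence_classifier(lagged_return):
    """Naive baseline: predict that the previous direction repeats."""
    return direction(lagged_return)

def hit_rate(features, labels):
    """Fraction of periods in which the predicted direction was correct."""
    hits = sum(persistence_classifier(x) == y for x, y in zip(features, labels))
    return hits / len(labels)

prices = [100.0, 101.0, 102.0, 101.0, 103.0, 104.0]  # illustrative values only
features, labels = make_direction_dataset(prices)
rate = hit_rate(features, labels)
print(labels, rate)  # ['UP', 'DOWN', 'UP', 'UP'] 0.5
```

A baseline of this kind is mainly useful as a yardstick: a fitted classifier that cannot beat it out of sample is unlikely to be worth trading.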

9.2.3 Time Series Models

A key component in algorithmic trading is the treatment and prediction of financial time series. Our goal is generally to predict future values of time series based on prior values or external factors. Thus time series modelling can be seen as drawing on both regression and classification. Time series models differ from non-temporal models because they make deliberate use of the temporal ordering of the series. Thus the predictors are often based on past or current values, while the responses are often future values to be predicted.

There is a large literature on differing time series models. There are two broad families of time series models that interest us in algorithmic trading. The first is the family of linear autoregressive integrated moving average (ARIMA) models, which are used to model the variations in the absolute value of a time series. The other is the family of autoregressive conditional heteroskedasticity (ARCH) models, which are used to model the variance (i.e. the volatility) of time series over time. ARCH models use previous values (volatilities) of the time series to predict future values (volatilities). This is in contrast to stochastic volatility models, which utilise more than one stochastic time series (i.e. multiple stochastic differential equations) to model volatility.
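The difference between the two families can be sketched with their simplest members. The code below shows a one-step AR(1) forecast of the level of a series, a one-step ARCH(1) forecast of its conditional variance, and a short ARCH(1) simulation. The function names and parameter values (c, phi, alpha0, alpha1) are hypothetical, chosen only for illustration. No estimation is performed, and the differencing and moving-average parts of ARIMA are not shown.

```python
import random

def ar1_forecast(c, phi, last_value):
    """One-step-ahead AR(1) forecast of the level: c + phi * x_t."""
    return c + phi * last_value

def arch1_variance_forecast(alpha0, alpha1, last_shock):
    """One-step-ahead ARCH(1) conditional variance: alpha0 + alpha1 * eps_t**2."""
    return alpha0 + alpha1 * last_shock ** 2

def simulate_arch1(alpha0, alpha1, n, seed=0):
    """Simulate n shocks whose variance depends on the previous squared shock."""
    rng = random.Random(seed)
    shocks, variances = [], []
    last_shock = 0.0
    for _ in range(n):
        var_t = arch1_variance_forecast(alpha0, alpha1, last_shock)
        last_shock = (var_t ** 0.5) * rng.gauss(0.0, 1.0)
        variances.append(var_t)
        shocks.append(last_shock)
    return shocks, variances

shocks, variances = simulate_arch1(alpha0=0.2, alpha1=0.5, n=1000)
print("Next-step variance forecast:",
      arch1_variance_forecast(0.2, 0.5, shocks[-1]))
```

A large shock in one period raises the forecast variance for the next, which is how ARCH-type models reproduce the volatility clustering often seen in financial returns.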

All of the raw historical price time series are discrete in that they contain a finite number of values. In the field of quantitative finance it is common to study continuous time series models. In particular, the famous Geometric Brownian Motion, the Heston Stochastic Volatility model and the Ornstein-Uhlenbeck model all represent continuous time series with differing forms of stochastic behaviour. We will utilise these time series models in subsequent chapters to attempt to characterise the behaviour of financial time series in order to exploit their properties to create viable trading strategies.
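Although these models are defined in continuous time, they are usually simulated on a discrete grid. The sketch below steps a Geometric Brownian Motion using its exact log-normal update and an Ornstein-Uhlenbeck process using a simple Euler scheme. All parameter values, the daily step of 1/252 years and the function names are assumptions made for illustration. The Heston model is omitted for brevity.

```python
import math
import random

def simulate_gbm(s0, mu, sigma, dt, n_steps, seed=0):
    """Geometric Brownian Motion path using the exact log-normal step."""
    rng = random.Random(seed)
    path = [s0]
    for _ in range(n_steps):
        z = rng.gauss(0.0, 1.0)
        growth = math.exp((mu - 0.5 * sigma ** 2) * dt
                          + sigma * math.sqrt(dt) * z)
        path.append(path[-1] * growth)
    return path

def simulate_ou(x0, theta, mean, sigma, dt, n_steps, seed=0):
    """Ornstein-Uhlenbeck path via an Euler step: pulled towards `mean`
    at speed `theta`, with Gaussian noise scaled by `sigma`."""
    rng = random.Random(seed)
    path = [x0]
    for _ in range(n_steps):
        z = rng.gauss(0.0, 1.0)
        x = path[-1]
        path.append(x + theta * (mean - x) * dt + sigma * math.sqrt(dt) * z)
    return path

dt = 1.0 / 252.0  # one trading day, in years (assumption)
gbm_path = simulate_gbm(s0=100.0, mu=0.05, sigma=0.2, dt=dt, n_steps=252)
ou_path = simulate_ou(x0=1.5, theta=2.0, mean=1.0, sigma=0.1, dt=dt, n_steps=252)
print("Final GBM value:", gbm_path[-1], "Final OU value:", ou_path[-1])
```

The two processes behave differently: the GBM path stays positive and has no level it returns to, while the OU path is continually pulled back towards its long-run mean.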

Chapter 10

Time Series Analysis

In this chapter we are going to consider statistical tests that will help us identify price series that possess trending or mean-reverting behaviour. If we can identify such series statistically then we can capitalise on this behaviour by forming momentum or mean-reverting trading strategies.

In later chapters we will use these statistical tests to help us identify candidate time series and then create algorithmic strategies around them.
