MARCO CONCEPTUAL - MARCO REFERENCIAL, TEÓRICO, CONCEPTUAL

2. MARCO REFERENCIAL, TEÓRICO, CONCEPTUAL

2.7. MARCO CONCEPTUAL

Publication I relies on Level-II order book data (see Section 2.2, extracted for 5 securities traded at the Helsinki stock exchange. The task of moving from the unstructured raw-data to a workable format suitable for feature extraction and ML applications is however complex: the procedure is here described. The original raw ITCH flow data provided by NASDAQ OMX operating at Helsinki exchange is distributed in day-specific and market-wide files1. The data in each file consists in formatted strings2 where texts and numbers at specific positions from the beginning of each row have a specific meaning. Rows correspond to different events that affect the state of the order book, either for a specific security, either for the whole market. An identifier available for each row defines the event the row is referring to, for instance, limit or market order submissions, partial and total cancellations or market events such as the beginning of a trading halt. This flow of events is often referred to as the message

book. Although the message book is comprehensive ofeveryevent that occurred on

the exchange, the data in this format is of difficult use. First of all, in the raw ITCH flow events for different securities are mixed together within the same file. Second, events can occur at any distance (level) from the current bid or ask prices and can

1_{NASDAQ’s Historical TotalView-ITCH ﬁles.}

2_{For more information consult the ofﬁcial documentation available at:}_{http://www.nasdaqtrader.}

be of little impact (and thus interest) in the short-term mid-price dynamics3_{, third,} simple information about e.g. the number of outstanding limit orders at any time- instance is not directly accessible: the whole ﬂow needs to be processed, submitted orders matched with cancellations and transactions and only the outstanding limit orders counted. Similarly, no immediate information about the current best levels is available: up to a certain time instance all the messages from the beginning of the day needs to be processed, only those corresponding to active orders (orders that have not been fully canceled -by either a single or a sequence of partial cancellations of the originally submitted quantity- or traded) sorted in descending or ascending order depending on the book side. In general, the message book does not provide an immediate outlook at the order book states, i.e. at the best bid and ask and deeper levels of the book in terms of prices and quantities. Although complete the message book is unhandy for analyses, so the ﬁrst step in data analysis is a whole message book preprocessing so that order book states can be easily reconstructed. Figure 2.2 provides a graphical illustration of the message-book dynamics, Figures 2.1 and 2.3 clearly refer to sample order book states at given epochs.

AC++converted has been implemented to achieve this task. In particular, messages

for a specific stock are extracted from NASDAQ’s raw market-wide and day-specific files. For each row information such as the message type, unique ID of the entry that the current event is referring to (e.g. in case of partial cancellation, the ID of the earlier event (order) that is updated, or a new ID in case of a new-order submission), the timestamp, order/cancellation side, price and volume. Timestamps are provided with millisecond precision, below this resolution the ordering of the message book

reﬂects the order at which the events have been submitted to the platform. AC++

converter thus generates day-by-day and stock-speciﬁc tables grouped by message type (order, trades, cancels) where the information for each event is stored in separate columns with numeric formats. Because of the data size and the convenience of tabular structures, the convenient .h5 format is chosen for saving the processed data. Importantly cross-references between the tables are implemented. For each limit-order submission available in the “orders” table the trade or cancel events that removed it from the order book are uniquely referenced and its time-stamp available: for each order the edgest_{i n},t_{ou t} the period in which it has been active in the book are thus readily available. In this way, retrieving the order book state at a timet is 3_{Think for instance about a limit-order submission at a wrong price and its immediate cancellation.}

08:00 09:00 10:00 11:00 12:00 13:00 14:00 15:00 16:00 17:00 18:00 19:00 Time Jun 14, 2010 20 25 30 35 Price

Non-regular trading hours Excluded period Continuous trading hours

Figure 3.1 Illustration of a trading day (June 14, 2010, for KESKOB). Red areas define non-regular trading hours (pre-trading and post-trading periods). The whole blue and green area denotes continuous-time trading hours. However, analyses exclude their the first 30 and last 25 minutes, focusing on the green area only. Other marks are of analogous interpretation of Figure 2.2, here intentionally shrank to display the anomalous flow during non-continuous trading hours.

trivial: take the orders such thatt_{ou t}>t(i) the maximum (minimum) price of all the orders is the bid (ask) price (ii) the number of unique prices at each side corresponds to the number of levels (iii) for a given price level, the sum of the quantities of all the order gives the total quantity at that level. The exact reconstruction of the order

book state at anyt is in this way immediate. A simple Matlab function performs this

evaluation, taking as an input the .h5 ﬁles and returning tables of prices and quantities at an ordinary number of levels at the ask and bid side.

In the auction period aimed at price discovery before the beginning of the so-called continuous trading period and the post-opening period, message submission follows a different mechanism that is structurally different and with different regulations w.r.t. to the normal continuous trading. Moreover “edge-effects” in the proximity of market opening and closing times are likely to alter the ﬂow dynamics observed in the central part of the trading day (Siikanen, Kanniainen and Luoma, 2017; Siikanen, Kanniainen and Valli, 2017), e.g. around the closing time, liquidity providers would adjust their inventories, altering the regular ﬂow. Thus this decision aims at avoiding clear confounding effects (e.g. C. Cao et al., 2009), visible in 3.1. Accordingly, the analysis in Publications I-III ignore the trading activity outside 10:30 am and 6:00 pm

(UTC/GMT+2), whereas the effective continuous-time trading period goes from

10 am to 6:25 pm4_{. The pre-opening session runs from 9:00 am to 10:00 am and the}

post-trading from 6:25 pm. to 6:30 pm. See 3.1 for an illustration.

The analysis in Publication I focuses on the ﬁve Finnish stocks from different sectors reported in Table 3.1. The order book state data for the analysis is limited to the ﬁrst

ten best bid and ask levels. This means 40 order book state entries at each time-stamp, 10 prices 10 quantities for the two sides of the book. These entries, along with time- sensitive and time-insensitive features described in Table 4 of Publication I constitute the set of variables upon which the ML methods are implemented. As pointed out in Publication I full-depth data is not common, and data-oriented order-book analyses have often used Level-II data (limited to the ﬁrst 5 levels), thus the data made available in Publication I doubles this threshold. The period covered by the data corresponds to 10 trading days, between June 1, 2010 and June 14, 2010, and collects approximately 4·106events. This very same dataset has been used for the analyses in Publication II as well.

Ticker ISIN Code Company Sector Industry

KESKOB FI0009000202 Kesko Oyj Consumer Defensive Grocery Stores OUT1V FI0009002422 Outokumpu Oyj Basic Materials Steel

SAMPO FI0009003305 Sampo Oyj Financial Services Insurance RTRKS5 _{FI0009003552 Rautaruukki Oyj Basic Materials} _Steel

WRT1V FI0009003727 Wärtsilä Oyj Industrials Diversiﬁed Industrials

Table 3.1 Stocks used in in Publication I.

On the other hand, order book data for Publication III covers a longer period of 752 trading days (from June 1, 2010 to May 31,2013). The analysis here is limited to the best bid and ask levels only. I.e. the variables involved in the analyses such as order-to-order durations are calculated as durations between consecutive orders submitted at the best level on the same book side. Here the securities involved are not limited to the Helsinki exchange only, but include stocks traded in Stockholm and Copenhagen, see Table 3.2. For a complete list of variables extracted from this data, see Table II in Publication III.

Ticker ISIN Code Exchange Company Sector Industry NRE1V FI0009005318 Helsinki Nokian Renkaat Oyj Consumer Cyclical Rubber & Plastics METSO FI0009007835 Helsinki Metso Oyj Industrials Conglomerates & Steel ATCOA6 _SE0000101032 _Stockholm _{Atlas Copco AB} _Industrials _{Diversiﬁed Industrials}

VOLVB SE0000115446 Stockholm Volvo B Industrials Truck Manufacturing

VWS DK0010268606 Copenhagen Vestas Wind Systems Industrials Diversiﬁed Industrials Table 3.2 Stocks used in Publication II.

5_{Historical ISIN and ticker: ISIN delisted on 20 November, 2014.}

Stationarity and unit-root testing are among the most important analyses in econo- metrics. Not taking these into consideration nor discussing them at any level would represent an egregious omission for a dissertation in this field. This concluding part report results for the unit-root testing applied to a sample from the duration series Publication III uses. The following analyses are completed by the critical discussion in Chapter 6 to which I refer for and understanding of how unit-root testing and stationarity in general related to DFA. For a trading day of a sample stock, Table 3.3 reports rejections for most common stationarity tests, namely the Augmented Dickey-Fuller (ADF) test (Dickey et al., 1979), the Kwiatkowski–Phillips–Schmidt–Shin (KPSS) tests (Kwiatkowski et al., 1992), and the Phillips–Perron (PP) test (Phillips et al., 1988). The last row in 3.3 reports the statistics for the whole daily time series. In virtue of the DFA algorithm described in Section 4.2, windows of different lengths are considered and their detrended fluctuation averaged: is important to address whereas stationarity holds for the disjoint intervals drawn by these windows. The number of intervals in which the day is divided is given in the second column, corresponding to the number of events, while the third column reports the average time-length for the specific window. From moderately low frequencies to half-day windows the joint analysis of the three tests indicates trend stationarity, thus over this domain, the application of DFA can be considered robust and reliable. At moderately low frequencies (10 seconds and less) tests lead to discrepant conclusions7. This is a well-known caveat that does not allow to conclude whether stationarity is to be rejected or not. In this case, heteroscedasticity may play a role (indeed well captured by the Bartlett test, besides the questionable normality of the samples, and by use of boxplots), a variance-ratio test could provide further insights. To which extent these analyses are relevant and to which extent they determine drive and limit the DFA applicability is however not addressed in the literature. Given the DFA procedure outlined in Section 4.2 stationarity appears to be a sensible hypothesis playing a relevant role. However, interestingly, the literature is vague in this regard and econometric analyses aimed at clarifying both the exact DFA hypotheses, defining its applicability, and

7_{Note that the PP test is based on asymptotic theory, i.e., in short time-series its validity is stretched}

e.g.≤30 (Kwiatkowski et al., 1992). Furthermore, note that the three tests are based on testing the significance of an autoregressive coefficient given specific alternative hypotheses about the time-series dynamics, built over specifications involving an AR-like term. Disturbances are generally allowed to be either i.i.d. or of finite-lag correlation. A detailed discussion here is out of scope, however, it would be relevant to clearly frame the role long-range autocorrelation might play in offsetting testing powers and testing distributions.

how unit-root testing relates to it, are missing. This is a critical aspect of the DFA methodology. A broader and complementing discussion is addressed in Chapter 6.

Rejection of H0(blank: no, 1: yes)

Intervals Order-to-order Trade-to-trade Cancel-to-cancel Average length

(sec.)

No. of intervals No. of events ADF KPSS PP ADF KPSS PP ADF KPSS PP

4.71 90 33 5.25 80 37 6.05 70 43 7.05 60 50 8.48 50 60 10.52 40 75 1 1 1 1 1 1 14.02 30 100 1 1 1 1 1 1 20.92 20 151 1 1 1 1 1 1 41.75 10 302 1 1 1 1 1 1 83.82 5 605 1 1 1 1 1 1 140.01 3 1009 1 1 1 1 1 1 216.93 2 1514 1 1 1 1 1 1 419.85 1 3028 1 1 1 1 1 1 1 1 1

Table 3.3 Stationarity testing for different duration time-series (June 06, 2010, VWS). Hypotheses read as follow. ADF & PP. H₀: the process has a unit-root, H₁: the process has no unit root, it’s either stationary or trend stationary. KPSS. H₀: the process is trend-stationary, H₁: the process has a unit root. The multiple-testing nature is taken into account by adjusting the signiﬁcance thresholds (Bonferroni correction).

In document Diseño de un plan de capacitación para la planta docente y administrativa de la sección matutina del Colegio Nacional Benito Juárez para contar con un instrumento técnico para asumir los retos de la nueva educación en el período septiembre 2014-julio 2015 (página 55-63)