
The Time Series Processing Framework (TSPF) has the following user-controlled options:

1. Social media data read-in selection. The user chooses whether to read in raw social media data for a particular financial-instrument/Twitter-Filter combination from the Twitter Collection Framework (TCF) for the first time, or to open data from the TCF that has been read in on a previous occasion. Reading data for the first time is more time-consuming, as the TSPF has to convert the .txt file data into MATLAB’s own .m file data line by line, which takes place at a rate of up to 2,500 rows per second on a standard desktop machine. Opening the pre-read data on any subsequent occasion is near-instantaneous. There is no limitation on the size of the data files which can be read in.

2. Financial data read-in selection. The user selects the underlying file which contains the raw price data for a particular financial-instrument/Twitter-Filter combination. Financial data are sourced either from Dukascopy (in which case the data are in the form of a CSV file) or from Fulcrum Asset Management (in which case the data are in the form of an .m file), as discussed in Chapter 4.3. Whilst there is no restriction on the granularity of the financial data that can be used, all financial data considered in this study were presented at 5-minute tick intervals.

3. Discretisation-window selection. The user selects the size of the window into which the social media and financial data are aggregated. This converts the raw data, which are continuous, into discretised time frames, as discussed in Chapter 5.3. The choice of discretisation frequency in the financial services industry is often ad hoc, typically dictated by the observation intervals of the available data [79]. As discussed in Chapter 4.1, the development of SocialSTORM [57] provided preliminary access to Twitter data for initial exploration of the relationships between social media data and financial data. Whilst the Twitter data provided by SocialSTORM were continuous, as is the case with the TCF, the financial data used during this preliminary investigation were not available at discretised resolutions finer than an hourᵃ [80]. Because of this earlier data limitation, it was decided that relationships between Twitter data and financial data would first be evaluated at the hourly level, with the robustness of those relationships then tested at different discretisation levels (as discussed in Chapter 7.1).

For example, if the user selects a window of 1 hour in size, the system performs the following calculations:

a) A discretised time-series T of time-stamps with elements T_i is created, where T₁ = 00:00:00 on 11th December 2012 and T_n = 23:59:59 on 11th March 2013 (bringing the data-capture period up to 12th March 2013, giving a total of 90 days).

b) The number of periods per 24 hours is determined as a function of the desired window size W, expressed in hours (in this example, W = 1):

$$N_{\text{periods}} = \frac{24}{W} = \frac{24}{1} = 24$$

The number of elements in the discretised time-series T is therefore:

$$T_n = N_{\text{periods}} \times 90 = 24 \times 90 = 2160$$

ᵃ Financial data used for the preliminary investigation was sourced from Thomson Reuters and from Fulcrum Asset Management, and was discretised to hourly windows due to the unavailability of higher-resolution data.

c) Each data-point in the input time-series of price, sentiment and message volume, I_price, I_sentiment, I_{message volume}, is then assigned to a location in the discretised time-series T. An input data-point I is deemed to belong to location T_i if its time-stamp is later than the time-stamp of the chronologically previous location, T_{i−1}, and no later than the time-stamp of the current location, T_i. That is, a data-point with time-stamp t belongs to T_i if T_{i−1} < t ≤ T_i.

d) For each location in the discretised time-series T, the discretised means of the values for each of the corresponding input data series of price, sentiment and message volume, I_price, I_sentiment, I_{message volume}, are determined. Denoted $\bar{D}_{\text{price},T_i}$, $\bar{D}_{\text{sentiment},T_i}$ and $\bar{D}_{\text{message volume},T_i}$ respectively, these are calculated as:

$$\bar{D}_{\text{price},T_i} = \frac{I_{\text{price},1} + I_{\text{price},2} + \cdots + I_{\text{price},n}}{n}$$

$$\bar{D}_{\text{sentiment},T_i} = \frac{I_{\text{sentiment},1} + I_{\text{sentiment},2} + \cdots + I_{\text{sentiment},n}}{n}$$

$$\bar{D}_{\text{message volume},T_i} = \frac{I_{\text{message volume},1} + I_{\text{message volume},2} + \cdots + I_{\text{message volume},n}}{n}$$

where n is the number of data-points of the respective input series that belong to location T_i.

e) Finally, the changes in these discretised mean values of I_price, I_sentiment, I_{message volume} are calculated. Denoted $\Delta\bar{D}_{\text{price},T_i}$, $\Delta\bar{D}_{\text{sentiment},T_i}$ and $\Delta\bar{D}_{\text{message volume},T_i}$ respectively, these are calculated as:

$$\Delta\bar{D}_{\text{price},T_i} = \bar{D}_{\text{price},T_i} - \bar{D}_{\text{price},T_{i-1}}$$

$$\Delta\bar{D}_{\text{sentiment},T_i} = \bar{D}_{\text{sentiment},T_i} - \bar{D}_{\text{sentiment},T_{i-1}}$$

$$\Delta\bar{D}_{\text{message volume},T_i} = \bar{D}_{\text{message volume},T_i} - \bar{D}_{\text{message volume},T_{i-1}}$$

In this manner, this methodology not only discretises the input data, but also normalises the data by the volume of data-points for each element in the time-series T.

f) Note that the values of $\Delta\bar{D}_{\text{price},T_1}$, $\Delta\bar{D}_{\text{sentiment},T_1}$ and $\Delta\bar{D}_{\text{message volume},T_1}$ (i.e., for element T₁) are empty, as these are the first entries in the discretised time-series T and there are therefore no prior elements from which to calculate the changes in the discretised mean values of I_price, I_sentiment, I_{message volume}.
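Steps a) to f) can be illustrated with a short Python sketch. It is not the TSPF code from the Appendix. The function name `discretise`, its parameters and the sample prices are hypothetical. The sketch assumes that the locations are evenly spaced, T_i = T₁ + (i − 1)·W, that the window size divides 24 hours exactly, and that a window containing no data-points is reported as empty (`None`). The original text does not specify how empty windows are handled.

```python
from datetime import datetime, timedelta


def discretise(points, t1, days, window):
    """Aggregate (time-stamp, value) pairs into window means and their changes.

    Assumptions (not taken from the TSPF itself): T_i = t1 + (i - 1) * window,
    a point with time-stamp t belongs to T_i when T_{i-1} < t <= T_i, and
    empty windows yield None.
    """
    periods_per_day = timedelta(hours=24) // window   # N_periods = 24 / W
    n_elements = periods_per_day * days               # e.g. 24 * 90 = 2160

    # Step c): assign every input data-point to its location in T.
    buckets = [[] for _ in range(n_elements)]
    for stamp, value in points:
        # Ceiling division: 0-based index of the first T_i at or after stamp.
        idx = -((t1 - stamp) // window)
        if 0 <= idx < n_elements:
            buckets[idx].append(value)

    # Step d): mean of the data-points in each window.
    means = [sum(b) / len(b) if b else None for b in buckets]

    # Steps e) and f): change versus the previous window; the first is empty.
    deltas = [None] * n_elements
    for i in range(1, n_elements):
        if means[i] is not None and means[i - 1] is not None:
            deltas[i] = means[i] - means[i - 1]
    return means, deltas


# Hypothetical price observations on the first day of the capture period.
t1 = datetime(2012, 12, 11, 0, 0, 0)
prices = [
    (datetime(2012, 12, 11, 0, 30), 10.0),
    (datetime(2012, 12, 11, 1, 0), 12.0),   # on the boundary: belongs to 01:00
    (datetime(2012, 12, 11, 1, 30), 14.0),
]
means, deltas = discretise(prices, t1, days=90, window=timedelta(hours=1))
print(len(means), means[1], means[2], deltas[2])
```

With a 1-hour window over 90 days the sketch produces 2160 elements. The first two observations fall in the window ending at 01:00 and average to 11.0, the third falls in the window ending at 02:00, and the change between those two windows is 3.0.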

The TSPF also calculates the net sentiment for each Tweet, as described in Chapter 5.1. This is calculated by subtracting the negative sentiment from the positive sentiment for each message, and is ranked on a scale of -4 (most negative) through 0 (neutral) to +4 (most positive).
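The net-sentiment calculation can be sketched as follows. The function name `net_sentiment` and the example scores are hypothetical, and the sketch assumes that the positive and negative sentiment of a message are both supplied as non-negative magnitudes whose difference falls within the stated range of -4 to +4. The underlying scoring scheme is described in Chapter 5.1 and is not reproduced here.

```python
def net_sentiment(positive, negative):
    """Return positive minus negative sentiment for a single message.

    Assumption: both inputs are non-negative magnitudes, so the result
    lies on the scale -4 (most negative) to +4 (most positive).
    """
    net = positive - negative
    if not -4 <= net <= 4:
        raise ValueError("net sentiment outside the expected -4..+4 range")
    return net


# Hypothetical scores for three messages.
print(net_sentiment(4, 1))   # mostly positive message
print(net_sentiment(2, 2))   # neutral message
print(net_sentiment(1, 5))   # most negative message
```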

A full copy of the code underpinning the TSPF is available in the Appendix (see Chapter 11.2).
