Creation of an investment strategy using Data Mining techniques in Spanish Stock Exchange
"Came back to show you I could fly" Robin Klein.
ACKNOWLEDGEMENTS
Over these two years of the master's programme I have met wonderful people, both at the desks and at the blackboards. They have been two very hard years for various reasons, with a great deal of uncertainty, and they are finally coming into port now. None of this would have been possible without the support of the people around me, to whom I am enormously grateful.
To my classmates Adrián, Pablo and Benja, for letting me be part of one of the most productive teams I have known, and the one in which I have placed the most trust. To my colleagues and managers at work, for allowing me to combine the working day with classes and making a considerable effort to help me. To my former classmates, Manu, Víctor and Santi, who, although from a distance in some cases, have not doubted me for a single moment, just as in my earlier challenges. To those teachers who know the esteem in which I hold them, and above all to the supervisors of this work, Pepe and Camino, for trusting me to carry it out and guiding me to the end.
And of course to my family, for pushing me to undertake this adventure without knowing how we would face the adversities we have met along the way. To my mother, for continuing to work like no one else, seizing every small opportunity life offers her so that she can always give us the best. To my father, for always helping me to think, without realising it, and for giving me everything that books and other people will never be able to offer me, for which life will one day give him due credit. To my brother Álvaro, for continuing to fight and for recovering his appetite to learn and to start new ventures, and for teaching me, as I always tell him and he never believes, to be a better person.
And to my partner, Elena, because she is always a great support through my tantrums and my doubts, my fears and my joys, and for letting me see what the truly important things in life are. And finally, to those who are no longer here and to the one who still remains by my side, who always welcomes me with a big smile and has not stopped telling everyone that she has a grandson who is an engineer.
Garrido Camino, Carlos
ABSTRACT
This Master's Thesis applies data mining techniques to stock price prediction on the Spanish stock exchange. Working within a passive investment strategy, in which portfolios are designed to be held for at least six months, one objective of this work is to understand which variables matter most for stock price prediction in Spain. The analysis applies CART, Random Forests and Bagging to Spanish listed companies over the last twenty years. The data consist of the financial statements of all those companies. The result is a set of models, each based on a different data mining technique, that allow investors to build different portfolios. Those portfolios are compared to assess the predictive capability of each model and to check whether the model outperforms the Spanish index Ibex35.
Key words: Data mining, Investment, Random Forest, Bagging, CART, Stock Exchange, Stock Price.
SUMMARY
This master's thesis aims to apply data mining techniques to the problem of predicting share prices on the Spanish stock market. It is based on a passive investment strategy, in which portfolios are designed to hold securities for at least 6 years. One of the objectives of the work is to understand which variables have the greatest impact when predicting the share price of companies in Spain. The analysis has been carried out by applying techniques such as Classification and Regression Trees (CART), Bagging and Random Forests to a database of the companies that are listed, or have been listed, in this country over the last 20 years. The financial statements of each of these companies serve as the data. The result is therefore a set of models based on these data mining techniques, which allow investors to design different investment portfolios. These portfolios are compared with one another to assess the predictive capability of each, and with the Spanish index Ibex35, to check whether the models improve on the performance of this stock index.
Keywords: Data mining, Investment, Random Forest, Bagging, CART, Stock Exchange, Stock Price.
TABLE OF CONTENTS
List of figures
List of tables
1. Introduction
  1.1. History of trading
  1.2. Motivation
  1.3. Document Structure
2. Objectives
  2.1. General
  2.2. Specific
3. How does the Stock Market Work?
  3.1. Trading basics
    3.1.1. Supply and demand
    3.1.2. Short Selling
    3.1.3. Trading Time Frame
  3.2. Technical analysis
    3.2.1. The Three Premises on which Technical Analysis is Based
    3.2.2. Behavioral Finance
  3.3. Why invest?
  3.4. Types of investments in stock exchanges
4. Prediction Techniques
  4.1. Game Theory Models
  4.2. Simulation models
  4.3. Time Series
  4.4. Artificial Intelligence
    4.4.1. Neural Networks
    4.4.2. Data Mining
5. Stock Market Prediction with Data Mining Techniques
  5.1. CART
  5.2. Bagging
  5.3. Random Forests
6. Data analysis and financial ratios
  6.1. Obtaining the Data
  6.2. Description of explanatory and response variables
7. Application
  7.1. Investment strategy description
  7.2. Variable Importance analysis
    7.2.1. Selection of the 5 most important variables for each model
    7.2.2. Magic formula application based on variable importance and Ibex35 comparison
  7.3. Classification trees development
    7.3.1. Description of the models obtained
    7.3.2. Validation of the predictions from each model and Ibex35 Comparison
8. Conclusions
  8.1. Research conclusions
    8.1.1. Model Comparison
    8.1.2. Variable Importance Analysis
  8.2. Further Research
9. Bibliography
Annex I. Budget of the thesis
Annex II. Time Management of the thesis
Annex III. R scripts
  CART with raw data
  CART with ratios data
  Bagging with raw data
  Bagging with ratios data
  Random Forests with raw data
  Random Forests with ratios data
Annex IV. 2014 results by model
  Models based on variable importance analysis
  Models based on CART, bagging and random forest
Escuela Técnica Superior de Ingenieros Industriales (UPM).
LIST OF FIGURES
Figure 1. NYSE then and now. Source: (Bulvanoski, 2011).
Figure 2. Demand-Supply curve and equilibrium point. Source: cascadeeducationalconsultants.com (Williams, 2011).
Figure 3. Supply and demand curve with changes on demand, left, and supply, right. Source: Thismatter.com (Spaulding, 2015).
Figure 4. 10-year zero-coupon sovereign bonds yield evolution. Source: The Past and Future of Monetary Policy (Bernanke, 2013).
Figure 5. Evolution of the assets, in thousands of euro, managed by international investment funds. Source: Gran crecimiento del patrimonio de los fondos de inversión en España (Cárdenas, 2015).
Figure 6. Machine Learning and Data Mining general process. Source: Prepared by the author.
Figure 7. Classification Trees examples. Left: Tree with 3 divisions; Right: Tree with 26 divisions. Source: Predicción del precio de la energía eléctrica utilizando modelos de minería de datos: árboles de clasificación y regresión, random forests y bagging (Juárez Barrios, Mira McWilliams, & González Fernández, 2013).
Figure 8. Master Thesis process to obtain the objective results. Source: Prepared by the author.
Figure 9. Variable importance analysis with Random Forests for Non-Financial companies, using raw data. Source: Prepared by the author.
Figure 10. Variable importance analysis with Bagging for Non-Financial companies, using raw data. Source: Prepared by the author.
Figure 11. Variable importance analysis with Classification Trees for Non-Financial companies, using raw data. Source: Prepared by the author.
Figure 12. Variable importance analysis for Non-Financial companies, using ratios. Source: Prepared by the author.
Figure 13. Variable importance analysis with Bagging for Non-Financial companies, using ratios. Source: Prepared by the author.
Figure 14. Variable importance analysis with Classification Trees for Non-Financial companies, using ratios. Source: Prepared by the author.
Figure 15. Indicator for ordering the database for variable importance analysis for V7, V17, V25 and V13. Source: Prepared by the author.
Figure 16. Indicator for ordering the database for variable importance analysis for V6, V29, V51, V18, V16 and V15. Source: Prepared by the author.
Figure 17. Indicator for ordering the database for variable importance analysis for V14, V52 and V49. Source: Prepared by the author.
Figure 18. Variable importance analysis for Financial companies using raw data. Source: Prepared by the author.
Figure 19. Variable importance analysis with Bagging for Financial companies, using raw data. Source: Prepared by the author.
Figure 20. Variable importance analysis with Classification Trees for Financial companies, using raw data. Source: Prepared by the author.
Figure 21. Variable importance analysis for Financial companies using ratios. Source: Prepared by the author.
Figure 22. Variable importance analysis with Bagging for Financial companies, using ratios. Source: Prepared by the author.
Figure 23. Variable importance analysis with Classification Trees for Financial companies, using ratios. Source: Prepared by the author.
Figure 24. Indicator for ordering the database for variable importance analysis for V88, V104, V153, V127, V139 and V117. Source: Prepared by the author.
Figure 25. Indicator for ordering the database for variable importance analysis for V82, V80, V3, V90, V83 and V100. Source: Prepared by the author.
Figure 26. Comparison of profits for the variable importance analysis using Random Forest-Gini criterion not weighting variables and Ibex 35. Source: Prepared by the author.
Figure 27. Comparison of profits for the variable importance analysis using Random Forest-MDA criterion not weighting variables and Ibex 35. Source: Prepared by the author.
Figure 28. Comparison of profits for the variable importance analysis using Bagging-Gini criterion not weighting variables and Ibex 35. Source: Prepared by the author.
Figure 29. Comparison of profits for the variable importance analysis using Classification Trees-Gini criterion not weighting variables and Ibex 35. Source: Prepared by the author.
Figure 30. Comparison of profits for the variable importance analysis using Random Forest-Gini criterion weighting variables and Ibex 35. Source: Prepared by the author.
Figure 31. Comparison of profits for the variable importance analysis using Random Forest-MDA criterion weighting variables and Ibex 35. Source: Prepared by the author.
Figure 32. Comparison of profits for the variable importance analysis using Bagging-Gini criterion weighting variables and Ibex 35. Source: Prepared by the author.
Figure 33. Comparison of profits for the variable importance analysis using Classification Trees-Gini criterion weighting variables and Ibex 35. Source: Prepared by the author.
Figure 34. Comparison of profits for the variable importance analysis using Random Forest-Gini criterion not weighting variables and only ratios data, and Ibex 35. Source: Prepared by the author.
Figure 35. Comparison of profits for the variable importance analysis using Random Forest-MDA criterion not weighting variables and only ratios data, and Ibex 35. Source: Prepared by the author.
Figure 36. Comparison of profits for the variable importance analysis using Bagging-Gini criterion not weighting variables and only ratios data, and Ibex 35. Source: Prepared by the author.
Figure 37. Comparison of profits for the variable importance analysis using Classification Trees-Gini criterion not weighting variables and only ratios data, and Ibex 35. Source: Prepared by the author.
Figure 38. Comparison of profits for the variable importance analysis using Random Forest-Gini criterion weighting variables and only ratios data, and Ibex 35. Source: Prepared by the author.
Figure 39. Comparison of profits for the variable importance analysis using Random Forest-MDA criterion weighting variables and only ratios data, and Ibex 35. Source: Prepared by the author.
Figure 40. Comparison of profits for the variable importance analysis using Bagging-Gini criterion weighting variables and only ratios data, and Ibex 35. Source: Prepared by the author.
Figure 41. Comparison of profits for the variable importance analysis using Classification Trees-Gini criterion weighting variables and only ratios data, and Ibex 35. Source: Prepared by the author.
Figure 42. Classification Tree for financial companies using raw data. Source: Prepared by the author.
Figure 43. Classification Tree for non-financial companies using raw data. Source: Prepared by the author.
Figure 44. Pruned Classification Tree for non-financial companies using raw data. Source: Prepared by the author.
Figure 45. Pruned Classification Tree for non-financial companies using raw data. Source: Prepared by the author.
Figure 46. Classification Tree for financial companies using ratios data. Source: Prepared by the author.
Figure 47. Classification Tree for non-financial companies using ratios data. Source: Prepared by the author.
Figure 48. Pruned Classification Tree for financial companies using raw data. Source: Prepared by the author.
Figure 49. Pruned Classification Tree for non-financial companies using raw data. Source: Prepared by the author.
Figure 50. Comparison of profits for the investment strategy based on CART using raw data and Ibex 35. Source: Prepared by the author.
Figure 51. Comparison of profits for the investment strategy based on pruned CART using raw data and Ibex 35. Source: Prepared by the author.
Figure 52. Comparison of profits for the investment strategy based on CART using ratios data and Ibex 35. Source: Prepared by the author.
Figure 53. Comparison of profits for the investment strategy based on pruned CART using ratios data and Ibex 35. Source: Prepared by the author.
Figure 54. Comparison of profits for the investment strategy based on Bagging using raw data and Ibex 35. Source: Prepared by the author.
Figure 55. Comparison of profits for the investment strategy based on Bagging using ratios data and Ibex 35. Source: Prepared by the author.
Figure 56. Comparison of profits for the investment strategy based on Random Forests using raw data and Ibex 35. Source: Prepared by the author.
Figure 57. Comparison of profits for the investment strategy based on Random Forests using ratios data and Ibex 35. Source: Prepared by the author.
LIST OF TABLES
Table 1. Prediction models comparison. Source: Important variable assessment and electricity price forecasting based on regression tree models: classification and regression trees, Bagging and random forests (Juárez Barrios, Mira McWilliams, & González Fernández, 2013).
Table 2. Database Summary. Source: Prepared by the author.
Table 3. Companies taken into account in the Thesis. Source: Prepared by the author.
Table 4. Variables used for Non-Financial companies. Source: Prepared by the author.
Table 5. Variables used for Financial companies. Source: Prepared by the author.
Table 6. Example of Magic Formula application. Source: Prepared by the author.
Table 8. Random Forests - MDA variable importance analysis Top-5 analysis on Non-Financial companies using raw data. Source: Prepared by the author.
Table 9. Random Forests - Gini variable importance analysis Top-5 analysis on Non-Financial companies using raw data. Source: Prepared by the author.
Table 10. Bagging - Gini variable importance analysis Top-5 analysis on Non-Financial companies using raw data. Source: Prepared by the author.
Table 11. Classification Trees - Gini variable importance analysis Top-5 analysis on Non-Financial companies using raw data. Source: Prepared by the author.
Table 12. Random Forests - MDA variable importance analysis Top-5 analysis on Non-Financial companies using ratios. Source: Prepared by the author.
Table 13. Random Forests - Gini variable importance analysis Top-5 analysis on Non-Financial companies using ratios. Source: Prepared by the author.
Table 14. Bagging - Gini variable importance analysis Top-5 analysis on Non-Financial companies using ratios. Source: Prepared by the author.
Table 15. Classification Trees - Gini variable importance analysis Top-5 analysis on Non-Financial companies using ratios. Source: Prepared by the author.
Table 16. Random Forests - MDA variable importance analysis Top-5 analysis on Financial companies using raw data. Source: Prepared by the author.
Table 17. Random Forests - Gini variable importance analysis Top-5 analysis on Financial companies using raw data. Source: Prepared by the author.
Table 18. Bagging - Gini variable importance analysis Top-5 analysis on Financial companies using raw data. Source: Prepared by the author.
Table 19. Classification Trees - Gini variable importance analysis Top-5 analysis on Financial companies using raw data. Source: Prepared by the author.
Table 20. Random Forests - MDA variable importance analysis Top-5 analysis on Financial companies using ratios. Source: Prepared by the author.
Table 21. Random Forests - Gini variable importance analysis Top-5 analysis on Financial companies using ratios. Source: Prepared by the author.
Table 22. Bagging - Gini variable importance analysis Top-5 analysis on Financial companies using ratios. Source: Prepared by the author.
Table 23. Classification Trees - Gini variable importance analysis Top-5 analysis on Financial companies using ratios. Source: Prepared by the author.
Table 24. Number of companies selected from each sector for the chosen methodology. Source: Prepared by the author.
Table 25. Results for the investment strategy based on variable importance using Random Forest-Gini criterion without weighting variables. Source: Prepared by the author.
Table 26. Results for the investment strategy based on variable importance using Random Forest-MDA criterion without weighting variables. Source: Prepared by the author.
Table 27. Results for the investment strategy based on variable importance using Bagging-Gini criterion without weighting variables. Source: Prepared by the author.
Table 28. Results for the investment strategy based on variable importance using Classification Trees-Gini criterion without weighting variables. Source: Prepared by the author.
Table 29. Results for the investment strategy based on variable importance using Random Forest-Gini criterion weighting variables. Source: Prepared by the author.
Table 30. Results for the investment strategy based on variable importance using Random Forest-MDA criterion weighting variables. Source: Prepared by the author.
Table 31. Results for the investment strategy based on variable importance using Bagging-Gini criterion weighting variables. Source: Prepared by the author.
Table 32. Results for the investment strategy based on variable importance using Classification Trees-Gini criterion weighting variables. Source: Prepared by the author.
Table 33. Results for the investment strategy based on variable importance using Random Forest-Gini criterion not weighting variables and ratios. Source: Prepared by the author.
Table 34. Results for the investment strategy based on variable importance using Random Forest-MDA criterion not weighting variables and ratios. Source: Prepared by the author.
Table 35. Results for the investment strategy based on variable importance using Bagging-Gini criterion not weighting variables and ratios. Source: Prepared by the author.
Table 36. Results for the investment strategy based on variable importance using Classification Trees-Gini criterion not weighting variables and ratios. Source: Prepared by the author.
Table 37. Results for the investment strategy based on variable importance using Random Forest-Gini criterion weighting variables and ratios. Source: Prepared by the author.
Table 38. Results for the investment strategy based on variable importance using Random Forest-MDA criterion weighting variables and ratios. Source: Prepared by the author.
Table 39. Results for the investment strategy based on variable importance using Bagging-Gini criterion weighting variables and ratios. Source: Prepared by the author.
Table 40. Results for the investment strategy based on variable importance using Classification Trees-Gini criterion weighting variables and ratios. Source: Prepared by the author.
Table 41. Complexity parameter, relative error, cross validation error and cross validation standard deviation variation with number of splits for maximum classification tree for Financial companies using raw data.
Table 42. Complexity parameter, relative error, cross validation error and cross validation standard deviation variation with number of splits for maximum classification tree for Non-Financial companies using raw data.
Table 43. Complexity parameter, relative error, cross validation error and cross validation standard deviation variation with number of splits for maximum classification tree for Financial companies using ratios data.
Table 44. Complexity parameter, relative error, cross validation error and cross validation standard deviation variation with number of splits for maximum classification tree for Non-Financial companies using ratios data.
Table 45. Confusion Matrix for Bagging applied on non-financial companies using rough data.
Table 46. Confusion Matrix for Bagging applied on financial companies using rough data.
Table 47. Confusion Matrix for Bagging applied on non-financial companies using ratios data.
Table 48. Confusion Matrix for Bagging applied on financial companies using ratios data.
Table 49. Confusion Matrix for Random Forest applied on non-financial companies using rough data.
Table 50. Confusion Matrix for Random Forest applied on financial companies using rough data.
Table 51. Confusion Matrix for Random Forest applied on non-financial companies using only ratios data.
Table 52. Confusion Matrix for Random Forest applied on financial companies using only ratios data.
Table 53. Results for the investment strategy based on CART with raw data. Source: Prepared by the author.
Table 54. Results for the investment strategy based on pruned CART with raw data. Source: Prepared by the author.
Table 55. Results for the investment strategy based on CART with only ratios data. Source: Prepared by the author.
Table 56. Results for the investment strategy based on pruned CART with only ratios data. Source: Prepared by the author.
Table 57. Results for the investment strategy based on Bagging with raw data. Source: Prepared by the author.
Table 58. Results for the investment strategy based on Bagging with ratios data. Source: Prepared by the author.
Table 59. Results for the investment strategy based on Random Forests with raw data. Source: Prepared by the author.
Table 60. Results for the investment strategy based on Random Forests with ratios data. Source: Prepared by the author.
Table 61. Comparison among every developed model in the Master Thesis. Source: Prepared by the author.
Table 62. Summary of variable importance for financial companies. Source: Prepared by the author.
Table 63. Summary of variable importance for non-financial companies. Source: Prepared by the author.
1. INTRODUCTION

1.1. HISTORY OF TRADING

The stock exchange may look like a trading method born in the last century. However, the way stocks are traded today, and the purpose of trading them, date back to ancient times, when agriculture was at the center of the economy. One of the first documented references to company shares is a sentence attributed to Cicero, who spoke of "shares that had a very high price at the time" (Goetzmann & Rouwenhorst, 2005). This sentence shows that people already valued companies at that time.

Later, when Italy again became a trading center in the Mediterranean Sea during the Renaissance, money lenders filled the gaps that banks did not cover because of the intrinsic risk of the various missions to Asia. These lenders exchanged debts with each other in order to decrease the risk of their debt portfolios. They also started to buy government debt, what we now know as bonds, and had the idea of trading it with other people, who became their customers.

During the 16th century, money lenders met in Antwerp to trade business, government and individual debts. The idea was the same, but it now took into account the concept of profit maximization. The difference from the modern stock market is that company ownership did not change hands.

One of the first moments in history in which modern companies are documented is the 17th century, with the East India Companies of countries like the Netherlands. Governments sent fleets of ships to Asia to bring back different goods. However, pirates and weather conditions became a major risk for these businesses. That is why governments looked for investors to share the risk as well as the profits. These investors were shareholders of the company who did not have any management rights but were rewarded for their investment. Furthermore, these shares were recorded on paper.
These papers could be traded before the profits were received, but there was no stock exchange at that time. That is why investors needed brokers to find other investors who wanted to buy the shares they owned. Most brokers in London therefore started to meet at coffee shops, which became the first stock exchanges in history.

The New York Stock Exchange (NYSE), although it was not the first in the USA, quickly became the most powerful in the world. The first one was in Philadelphia, but the strategic location of Manhattan was key to empowering the NYSE: Manhattan, and more specifically Wall Street, received most of the goods coming from Europe and other parts of America.

The NYSE suffered two of its biggest blows in the 1920s, which weakened the power it had held until then. In 1920, a bomb shook Wall Street, killing 38 people and damaging several buildings. Later, in 1929, the Great Depression shook the American economy, writing some of the poorest lines of American history. Because of this last event, many regulatory decisions were taken over the NYSE's business, increasing the requirements for listing and reporting.

Figure 1. NYSE then and now. Source: (Bulvanoski, 2011).

Meanwhile in Europe, the London Stock Exchange (LSE) became the most powerful exchange on the old continent. There were also other exchanges, like those in Germany, France, the Netherlands or Switzerland. However, none of them had power comparable to that of the NYSE and the LSE. That is why one of the most important objectives for companies was to get listed on the LSE and, ideally, on the NYSE.

With the appearance of computers, trading has become a different game. Being physically present at the exchange is no longer a must, which has reduced transaction costs and opened the business to people from all around the world and with less money to invest. In addition, computers have led to the growth of algorithms that seek the highest return on investments by trying to predict the behavior of stocks.

1.2. MOTIVATION

Following the argument of the previous section, the development of prediction techniques is today one of the most studied fields linked to trading on stock markets. The nature of the business makes these techniques very dependent on volatility, because stock price prediction depends not only on measurable values but also on events directly or indirectly related to the stock under study. Nevertheless, the desire to predict stock prices is remarkably widespread. Prediction can be based on past values of the measured variable, or on how other variables affect the behavior of the response variable. In addition, different techniques have improved thanks to the increasing power of computers, which enables researchers and investors to implement their models with little computing effort.
Methods like the Magic Formula from The Little Book That Still Beats the Market (Greenblatt, 2010) try to develop a passive method for value investing. These techniques seek to maximize returns and to minimize the effort spent on deciding where to invest. In section 7.1 the Magic Formula concept is explained in order to apply it to the database used in this paper.

During the Master's degree on which this thesis is based, the author has learnt how to interpret the financial data of companies, as well as different predictive methods in more statistical courses. The combination of these two fields results in a work that tries to model the behavior of stock prices based on company financial statements and previous values of the studied variable.

1.3. DOCUMENT STRUCTURE

This document is split into several parts to make it easy for the reader to follow the arguments presented throughout the paper. These parts are:

The introduction, the current part, in which the motivation for this work is presented. Here the history of the stock market is explained, as well as the continuous effort of investors to improve prediction techniques in order to get the highest value from their decisions.

The objectives, divided into general and specific objectives, define what the author wants to develop throughout the work. This gives the reader a sense of how the work started and how it ended.

In the third part of the report the stock market is described in order to set out the framework in which the statistical analysis is applied. In addition, readers who are not familiar with the field of finance receive an insight into the topic that helps them understand the application of the methodology.

Different prediction methods are listed and explained in the fourth part of the report. This gives the reader a background on the most widespread techniques for predicting stock prices as well as other types of data.

Following on from the previous part, the selected prediction technique, data mining, is presented to the reader. This part gives a theoretical view of the methodology applied to the problem chosen by the author.
In the sixth part of the work the data used for the problem is described. Each variable used to predict the price of the stock is explained, so that the reader is able to understand which ones have an impact on the model created in the following part.

In part seven the reader is able to see how the model designed in point number six is applied to the data described in point number five. Furthermore, the final results are shown in order to establish the basis for the conclusions obtained in the following part.

Finally, in part eight the conclusions of the work are given in order to assess how well the model and the author's approach fit reality. In addition, further lines of research are highlighted to offer the reader new opportunities to improve on this research.
2. OBJECTIVES

2.1. GENERAL

The primary objective of this Master Thesis is to develop a model for predicting the behavior of a company's value. This leads to an analysis that enables investors to put their money into the best securities in the market in order to maximize their profit. Furthermore, effort minimization is a must, so this study tries to simplify investment decisions. An amateur investor should therefore be able to decide where to invest after a few minutes of reading and practice.

In addition, one of the aims is to develop a method for people who do not want to spend money on expensive computing tools or simply cannot afford them. That is why R is used: an open-source statistical programming language that becomes manageable with a few days of practice and that is widely and increasingly used by groups such as statistics researchers and data scientists.

2.2. SPECIFIC

For this paper, the general objectives already described are applied to the Spanish Stock Exchanges: Bilbao, Barcelona, Madrid and Valencia. The Great Recession that started in 2008 has had a great impact on the Spanish economy, driving many companies into bankruptcy and decreasing the value of many others. That is why one of the main objectives of this paper is to detect the main variables that affect companies during their time on the stock exchange.

Once this issue is covered, the next step is to use that analysis to develop several different models for deciding whether to invest in one company or another. Moreover, the most important variables for predicting the stock price are analyzed to see whether they are logical and representative.

Finally, the models developed are compared with the IBEX35 in order to see whether they can beat the market and improve investor returns.
3. HOW DOES THE STOCK MARKET WORK?

3.1. TRADING BASICS

In this chapter the basics of stock markets, trading, and general price prediction techniques are introduced. The primary focus is on the principles that govern stock markets and on one stock analysis technique, technical analysis. The aim here is not to go into the precise details of the methods of technical analysis, but rather to give an overview of the foundations of the field.

Trading stocks is the process of buying and selling shares of a company on a stock exchange with the purpose of making profitable returns. The stock exchange works like any other economic market; that is to say, there must be a buyer who wants to buy a certain quantity of a particular stock at a certain price and a seller willing to sell the stock at that price. Naturally, buyers want to minimize the price paid for the stock and sellers want to maximize the selling price. Thus, the stock market is governed, like any other market, by the basic economic principle of supply and demand. Transactions in the stock market are processed by brokers, who connect buyers and sellers. Brokers earn either a fixed or a variable commission for each transaction they complete.

3.1.1. SUPPLY AND DEMAND

Supply and demand are among the most basic concepts in economic theory. The following figures show the relationship between the supply curve (provided by the sellers) and the demand curve (provided by the buyers). The intersection of the supply curve and the demand curve is the equilibrium point, as seen in Figure 2; that is, the price at which the seller and the buyer agree to sell and buy a certain quantity, so that a transaction can take place.

Figure 2. Demand-Supply curve and equilibrium point. Source: cascadeeducationalconsultants.com (Williams, 2011).
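As a minimal sketch of the equilibrium point described above, the following Python snippet intersects a linear demand curve with a linear supply curve and then repeats the calculation after demand increases. The linear form of the curves, the names LinearCurve and equilibrium, and all the numbers are illustrative assumptions and are not taken from the data of this thesis.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LinearCurve:
    """Hypothetical curve: quantity = intercept + slope * price."""
    intercept: float
    slope: float

    def quantity(self, price: float) -> float:
        return self.intercept + self.slope * price


def equilibrium(demand: LinearCurve, supply: LinearCurve) -> Tuple[float, float]:
    """Return the (price, quantity) at which demand equals supply."""
    if demand.slope == supply.slope:
        raise ValueError("parallel curves never intersect")
    # Solve demand.intercept + demand.slope * p == supply.intercept + supply.slope * p
    price = (demand.intercept - supply.intercept) / (supply.slope - demand.slope)
    return price, supply.quantity(price)


# Made-up curves: demand falls with price, supply rises with price.
demand = LinearCurve(intercept=100.0, slope=-2.0)
supply = LinearCurve(intercept=10.0, slope=1.0)
p1, q1 = equilibrium(demand, supply)

# Increase in demand: buyers want 30 more units at every price.
shifted_demand = LinearCurve(intercept=130.0, slope=-2.0)
p2, q2 = equilibrium(shifted_demand, supply)
```

With these made-up curves the equilibrium price is 30 at a quantity of 40, and after the increase in demand it rises to 40 at a quantity of 50. Real order books are not linear, so the snippet only illustrates the direction of the effect.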
In the left diagram of Figure 3 there is a rightward shift of the demand curve, which means an increase in demand. Such an increase raises the price from P1 to P2, which becomes the new equilibrium price. In stock markets, traders essentially want to recognize this shift in demand before it happens, so that they can purchase at a price close to P1 and sell at a price close to P2, making a profit of P2−P1. Likewise, the right diagram of the same figure shows an increase in supply that results in a reduction in price, which we would like to identify early so that we can buy the stock at the lower price P2 instead of P1, or short sell the stock at P1.

Figure 3. Supply and demand curves with changes in demand (left) and supply (right). Source: Thismatter.com (Spaulding, 2015).

3.1.2. SHORT SELLING

Short selling is a trading strategy that seeks to make a profit from an expected decline in the price of a stock. An investor who short sells a stock believes its price will fall and hopes to buy it back at a lower price. Basically, a short seller is trying to sell high and buy low (Turner, 2000). Short selling involves traders borrowing shares of a security from a lending broker. The trader sells the shares immediately at the market price. Later, the trader repurchases the shares (hopefully at a lower price) and returns them to the lender. After all this, the trader will have pocketed the difference if the share price fell, but will have lost money if the price rose.

Short selling is controversial because when a large number of investors decide to short a certain stock, their joint actions can have an intense impact on the firm's share price. In fact, many companies blame short sellers for severe depreciation of their stock. Prohibitions on short selling have been imposed on several occasions; for instance, during the recent financial crisis investors were forbidden from short selling particular banks and credit institutions.

3.1.3. TRADING TIME FRAME

Turner describes four basic trading time frames that are commonly used by traders:

• Position trades: stocks may be held from weeks to months.
• Swing trades: stocks may be held for two to five days.
• Day trades: stocks are bought and sold within the same day.
• Momentum trades: stocks are bought and sold within seconds, minutes or hours.

Each of these time frames has its own risk-reward ratio, where shorter time frames are typically associated with greater risk (Turner, 2000). Since the available data sets consist of monthly stock prices, the main focus of this thesis is on position trades.
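Returning to the trade described in section 3.1.1 and the short sale described in section 3.1.2, the profit of both can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: a fixed broker commission per transaction, and no share-lending fees, dividends or taxes. The function names and all prices are hypothetical.

```python
def long_trade_profit(buy_price: float, sell_price: float, shares: int,
                      commission: float = 0.0) -> float:
    """Profit from buying first and selling later (two transactions)."""
    return (sell_price - buy_price) * shares - 2 * commission


def short_trade_profit(sell_price: float, buyback_price: float, shares: int,
                       commission: float = 0.0) -> float:
    """Profit from selling borrowed shares first and repurchasing them later.

    Assumption: the cost of borrowing the shares is ignored.
    """
    return (sell_price - buyback_price) * shares - 2 * commission


# Hypothetical prices P1 = 10.0 and P2 = 12.0, trading 100 shares.
long_gain = long_trade_profit(buy_price=10.0, sell_price=12.0, shares=100)

# Short sale that works out: sell at 12.0, buy back at 10.0.
short_gain = short_trade_profit(sell_price=12.0, buyback_price=10.0, shares=100)

# Short sale that goes wrong: the price rises to 15.0 before the buyback.
short_loss = short_trade_profit(sell_price=12.0, buyback_price=15.0, shares=100)
```

In this example both the long trade and the successful short sale earn 200, while the failed short sale loses 300. Passing a non-zero commission subtracts it twice, once for each of the two transactions the broker processes.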
3.2. TECHNICAL ANALYSIS

Fundamental and technical analysis are two different kinds of tools typically applied by investors when deciding which stocks to purchase or sell. Both are used with the aim of analyzing and predicting shifts in supply and demand (Turner, 2000). As mentioned earlier, shifts in supply and demand are the basis of most economic and fundamental forecasting. If there is an increase in supply, the theory states that the price should fall and, in the same way, if there is an increase in demand, the price should rise. The ability to predict these movements (rises and falls) in supply and demand gives the trader the capacity to establish profitable entry and exit positions, which is the final purpose of stock analysis. Whereas technical analysis is solely concerned with price and volume data, particularly price patterns and volume spikes, fundamental analysis involves the study of basic company information such as revenues and expenses, market position, annual growth rates, and so on (Turner, 2000).

3.2.1. THE THREE PREMISES ON WHICH TECHNICAL ANALYSIS IS BASED

Murphy (1986) describes three premises on which technical analysis is based:

• Market action discounts everything.
• Prices move in trends.
• History repeats itself.

3.2.1.1. MARKET ACTION DISCOUNTS EVERYTHING

Market action is defined as the sources of information available to the trader, such as price and volume data. By assuming that market action discounts everything, we are essentially assuming that everything that could influence the price (that is, fundamentals, politics, psychology, etc.) is integrated and reflected in the price and volume data. Price thus indirectly provides a perspective on the fundamentals, and a study of price action is therefore all that is required to predict shifts in supply and demand.
For example, if prices are rising, the technician assumes that, for whatever specific reason, demand must exceed supply and the fundamentals must be positive. Practitioners of technical analysis thus believe that there is an inherent correlation between market action and company fundamentals that can be used to forecast the direction of future prices.

3.2.1.2. PRICES MOVE IN TRENDS

A price trend is the predominant direction of a stock's price over a period of time. The concept of trend is perhaps the archetypal idea in technical analysis, and most technical indicators are designed to identify and follow existing trends (Turner, 2007). When doing technical analysis, we search for patterns in the price data. We want to identify situations that indicate persistence in a trend so that we can "ride" the trend as long as possible. We also want to look for situations that point to a reversal of the trend, in order to sell the stock before the trend turns or buy the stock at the moment it reverses. When analyzing and selecting stocks it is important to look for stocks that are trending and to analyze the strength of the trend in order to make the proper decision. Accordingly, the methodology of technical analysis must assume that prices do move in trends.

3.2.1.3. HISTORY REPEATS ITSELF

As already mentioned, technical analysis examines stock price data for price patterns, assuming that lagged prices are in some way useful for predicting the direction of future prices. As financial markets are driven by human actions and expectations, Murphy (1999) attributes the creation of regular and predictive price patterns to human psychology and group dynamics, which are the basis of behavioral finance.

3.2.2. BEHAVIORAL FINANCE

Initially, financial theory was mainly based on the efficient markets hypothesis (EMH). This hypothesis was originally stated in (Fama, 1965) and holds that the prices of traded assets such as stocks are informationally efficient; that is, prices always reflect all known information, and all agents in the market seek to maximize their utility and have rational expectations. Under these assumptions, any attempt to analyze past prices and trade stocks would be a waste of time, as it would be impossible to outperform the market: all known information is integrated in the price and all agents value the information similarly (Fama, 1965) (Schleifer, 2000). The theory was supported by successful theoretical and empirical work and was widely considered to be proved. Nevertheless, its supremacy from the 1970s to the 1990s has been challenged and the focus has moved towards behavioral finance (Schiller, 2003). Behavioral finance looks at finance from a broader social science perspective, including theory from psychology and sociology. Human desires, goals, motivations, errors and overconfidence are included as factors that affect finance (Shefrin, 2002).
It thus calls into question the assumption that investors are utility-maximizing agents with rational expectations. Instead, behavioral finance holds that when two investors are confronted with the same price information, their reactions will be different and they will value the information in different ways. As pointed out by Turner (2000), when a trader buys a stock at a certain price p, it is with the expectation that the price will rise. Likewise, the seller at price p expects the price to drop. Only one of them can be right and make a profit. This difference in valuation is what drives market changes, trends, and profitable situations. Thus, Turner (2000) classifies greed and fear as the key emotions that drive the market.

3.3. WHY INVEST?

From a financial point of view, an investment is the transfer of excess liquidity to the financial market in the expectation of a profit after a given period. From an economic point of view, an investment is the acquisition of productive assets in order to obtain a profit. Some financial investments are also economic investments, but not every financial investment is an economic investment, and vice versa. The financial investments on which this thesis focuses are those carried out on the stock exchange. This kind of investment, depending on the risk appetite of the investor, can be very attractive.
First of all, there are almost no entry barriers to the market, because anyone with some savings can enter and buy shares of any listed company. This makes the stock exchange a place to invest for both wealthy people and the middle class.

Secondly, investing in company stocks offers an opportunity to diversify investment risk, because even with small amounts of money an investor may put his/her money into several stocks, or even several markets.

Thirdly, compared with other assets like real estate, stocks are very liquid. Therefore, if the investor needs the money or wants to close his/her position, it will not be difficult to find someone who wants to buy those shares.

A fourth reason to invest is that information on listed companies and other kinds of securities is easily found on the internet. Websites like Yahoo Finance, Google Finance, Invertia and several others offer a lot of information, both current and historical, for free. Furthermore, more specialized companies like Bloomberg offer premium services with more information, more tools to analyze the markets, and so on.

As a fifth reason, stock investments offer the possibility of earning money from dividends without the need to sell the stocks.

Finally, it has to be pointed out that the current financial situation encourages investors to put their money into stocks. As can be seen in Figure 4, the yield of 10-year zero-coupon sovereign bonds has decreased all over the world, except in Japan, where it was already low because of the so-called "Abenomics". Therefore, investors have tried to move their investments to other products, like investment funds. Figure 5 shows that when the financial crisis began in 2007, the assets in investment funds decreased.
This comes from the fact that investors may need their money to hedge other investments because of the bad economic situation. However, once this period of recession has passed, people start to look to investment funds again, seeking a higher return than that of sovereign bonds.
Figure 4. 10-year zero-coupon sovereign bond yield evolution. Source: The Past and Future of Monetary Policy (Bernanke, 2013).

Figure 5. Evolution of the assets, in thousands of euros, managed by international investment funds. Source: Gran crecimiento del patrimonio de los fondos de inversión en España (Cárdenas, 2015).

Furthermore, there are two types of markets. The primary market is the place where shares, bonds and other kinds of securities are first traded. The secondary market is where investors holding securities bought in the primary market can sell them whenever they want. The stock exchange is therefore a secondary market where supply and demand for stocks are centralized in order to make it easy for investors to trade. In Spain there are four stock exchanges: Barcelona, Bilbao, Madrid and Valencia.

A share is a unit of ownership interest in a corporation or financial asset. Shares therefore give the investor the chance to exercise the rights of an owner of the company, provided that his/her investment meets some previously specified requirements. These rights are the right to receive dividends, the right to vote in shareholders' meetings, and the preemptive right.
Apart from that, an important characteristic of the Spanish stock exchange is that short selling has been prohibited during several periods in recent years. Short selling is the act of borrowing a share from someone else in order to sell it in the market. This is done when an investor thinks the stock price is going to decrease. After some time, the investor can buy that share back in the market, return it to the lender, and earn some money if the current price is lower than the price at which the investor sold the share. For this reason, the investment techniques considered throughout this Master Thesis focus on buying shares and selling them later, so that they comply with the rules of the Spanish market in every historical period.

3.4. TYPES OF INVESTMENTS IN STOCK EXCHANGES

There are two approaches to investing in stocks: active and passive management.

Active management is based on the investor's skill in detecting, through a deep study of each stock, those that are not well valued, in this case those that are undervalued. This type of management requires more knowledge, because it is more dynamic, and more time to assess where and how much to invest. Besides, the risk is higher because the profits come essentially from buying cheap and selling expensive.

In contrast, passive management is based on the idea of efficient markets, that is, the stock price reflects the real value of each stock at every moment. Therefore, the portfolios under this type of management are well diversified, and the investment techniques are based on indices like the Ibex 35. In other words, the investor buys stocks that are expected to yield a good profit in the long term, that is, over horizons of six months or more. Thereby, the investor's portfolio selection depends, apart from the stock price, on financial characteristics of the companies, like their financial results and structures.
As a consequence, this type of investor is expected to look for good stocks rather than undervalued stocks. Good stocks are understood as those best suited for selection according to certain characteristics. This Master Thesis provides a method for passive management with which the investor, with little effort, will be able to create his/her own profitable portfolio.
4. PREDICTION TECHNIQUES

Predicting the value of companies' stocks is essential for investors in order to make proper investment decisions. The tools and techniques used to do so are crucial to provide insight into the evolution of share prices. Nevertheless, volatility is one of the main characteristics of the stock exchange market, and there is a trend towards developing new and more sophisticated models to predict market prices more accurately.

Nowadays, there are numerous models for share price prediction. The techniques applied vary depending on the time horizon, the input variables, the output variables, the type of model, the methodology used to analyze results, etc. Four groups of prediction models for stock exchange prices can be identified:

a) Game theory models
b) Simulation models
c) Time series analysis
d) Artificial intelligence models

The data mining techniques applied in this project belong to the last group. However, a brief description of the remaining models follows.

4.1. GAME THEORY MODELS

These techniques build a model from the various strategies used by the different participants in the market in order to achieve their maximum benefit. The approach consists of setting out diverse situations according to the number of agents that take part in the game, the number of periods in which the agents interact, the acting sequence in the system, etc. Taking this into consideration, a game or strategic model is elaborated. Finally, a mathematical solution to these games is given. The advantage of this technique is that it captures how a player who fixes a price or makes an offer seeks to maximize his/her own utility while taking into account that other agents' actions also have an impact on that utility.
4.2. SIMULATION MODELS

These tools model the behavior of stocks in a realistic and detailed way, trying to find a mathematical algorithm that solves the model. There are two different types of simulation techniques: those that seek balance in the market, that is to say, the point where all agents involved in the system coincide and everybody wins; and those that, instead of searching for the equilibrium point, aim to build a "virtual mockup" in which each actor is represented by a computer program that makes decisions and interacts with the rest of the agents.

Both simulation techniques have disadvantages. On the one hand, the search for an equilibrium point in a big market with numerous agents is complicated, since it is possible
that no balance point exists, or that there are multiple equilibrium points. On the other hand, it is not an easy task to program decisions that optimize strategies based on the dynamic behavior of the different agents in the market.

4.3. TIME SERIES

This prediction method is based on the historical behavior of a dependent variable. The goal of this type of study is to model the performance of the variable over a period of time in order to be able to predict future values. Within time series models there are also techniques based on statistical models. These techniques consist of analyzing statistics obtained from the market, seeking patterns and tendencies that help in future projections. ARIMA models and dynamic regression models belong to this group of techniques. In the former, predicted values are a weighted average of past values (Pankratz, 1991). Dynamic regression models are based on the relationship between an output variable and the present and past values of some input variables.

4.4. ARTIFICIAL INTELLIGENCE

Artificial intelligence methods are flexible models that make it easier to work with difficult problems, even when they are non-linear. Data mining techniques and neural networks, among others, are part of this kind of model (Azzalini & Scarpa, 2013) (Awata & Hiroyuki, 2007).

4.4.1. NEURAL NETWORKS

Neural network techniques arise from the study of the human brain and its comparison with the digital computer. The idea is to propose new connection models and learning methods based on modeling the brain, with the objective of achieving a capacity for generalization and a robustness similar to its own. There are diverse types of neural networks according to their use, such as fitting functions, classifying a group of data or identifying patterns.

4.4.2. DATA MINING

The use of data mining techniques is increasing nowadays. They are based on a massive data collection from which hypotheses are extracted and validated.
In contrast, conventional models start from a hypothesis and then look for data in order to validate or refute it. The information that can be extracted by applying these techniques is very varied: they can be used for associations, sequences, classifications, grouping or predictions.
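To make the time series idea of Section 4.3 more concrete, the following minimal sketch shows a purely autoregressive forecast, in which each predicted value is a weighted combination of the most recent observations. It is only an illustration under stated assumptions: the function name `ar_forecast`, the prices and the weights are all hypothetical, and a full ARIMA model would additionally estimate the weights from the data and include differencing and moving-average terms.

```python
def ar_forecast(history, weights, steps=1):
    """Forecast `steps` future values with a purely autoregressive rule.

    Each forecast is sum(weights[i] * value observed i + 1 periods ago).
    The weights are hypothetical inputs here; in practice they are
    estimated from the historical series.
    """
    if len(history) < len(weights):
        raise ValueError("need at least len(weights) past observations")
    series = list(history)
    forecasts = []
    for _ in range(steps):
        # Most recent value first, so weights[0] applies to the latest price.
        lags = series[-1 : -len(weights) - 1 : -1]
        prediction = sum(w * x for w, x in zip(weights, lags))
        forecasts.append(prediction)
        series.append(prediction)  # reuse the forecast for multi-step prediction
    return forecasts


closing_prices = [10.0, 10.4, 10.2, 10.6, 10.8]  # made-up closing prices
weights = [0.5, 0.3, 0.2]                        # made-up weights that sum to 1
print(ar_forecast(closing_prices, weights, steps=2))
```

Because the weights in this example sum to one, every forecast stays within the range of the values it is computed from, which is what "weighted average of past values" means in practice.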
5. STOCK MARKET PREDICTION WITH DATA MINING TECHNIQUES

One of the goals of data mining is to predict the tendency and behavior of a given variable. This goal arises from the need for models that explain, in an easy and precise way, the changes of a variable with respect to a group of other variables to which it is related. In most cases, the database used to apply this type of model is so large that it is not possible to study the variables with conventional models. The advantage of this type of model is that it gives access to information hidden in the database thanks to its abstract representation of reality.

This kind of model works as follows: the database provides the input variables, which are used by the learning algorithm that produces a model. Once the model is obtained, it must be validated by feeding it a group of data whose values are known and checking whether the predictions work. Figure 6 shows an example of such a process.

[Figure 6: diagram with blocks labelled Historical Data, Training Set, Model Builder, Evaluation and Prediction]

Figure 6. Machine Learning and Data Mining general process. Source: Prepared by the author.

5.1. CART

Classification and Regression Trees (CART) is the most popular algorithm for building decision trees in data mining (Breiman, Friedman, Stone, & Olshen, 1984). A tree is grown by splitting a sample into smaller groups according to an independent variable, known as the explanatory variable. Each splitting step depends on only one explanatory variable.
Accordingly, some variables are more discriminant than others, depending on their capability of splitting the sample into two groups that are more homogeneous internally. These groups should be as different from each other as possible. A branch of the tree is not split any further when:

• there is no sufficiently discriminant variable, or
• there are not enough observations to carry out the analysis, or
• in the case of classification trees, all the observations have the same response class.

The development of a tree consists of two different steps: growing the tree and pruning it. As already mentioned, classification trees grow until each terminal node contains observations with only one response class. Therefore, when working with big samples, trees can grow so large that analyzing them is no longer worthwhile, so the analyst may prune the tree, eliminating unnecessary branches. Figure 7 shows two examples of classification trees.

Figure 7. Classification Trees examples. Left: Tree with 3 divisions; Right: Tree with 26 divisions. Source: Predicción del precio de la energía eléctrica utilizando modelos de minería de datos: árboles de clasificación y regresión, random forests y bagging (Juárez Barrios, Mira McWilliams, & González Fernández, 2013).

As stated in the previous paragraphs, the basic idea is to split the sample into more homogeneous groups. The homogeneity criterion, or purity criterion, depends on the nature of the response variable. When it is qualitative, the analyst works with a classification tree; when it is quantitative, with a regression tree. Each type of response variable therefore needs different criteria in order to split the sample correctly.

5.2. BAGGING

Bootstrap aggregating, widely known as bagging, is a technique complementary to CART. It is based on the creation of several trees in order to improve the predictive capability of the resulting model.
Bagging is a more computationally demanding technique, but its results confirm the improvement (Breiman, Bagging predictors, 1996), and it is widely used in the fields of biostatistics, remote sensing and many others. The idea that underlies bagging trees is that part of the output error of a single regression tree comes from the specific choice of the training data set. Therefore, several similar data sets are created by resampling with replacement (Prasad, Iverson, & Liaw, 2006).
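The ideas of Sections 5.1 and 5.2 can be sketched together in a few lines of Python. The code below is an illustrative sketch rather than the implementation used in this thesis: it assumes a single explanatory variable, uses one-split regression trees (stumps) instead of fully grown trees, and takes the sum of squared errors as the homogeneity criterion, which is a common choice for regression trees but is an assumption here. All function names and the toy data are hypothetical.

```python
import random
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean: lower means more homogeneous."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def fit_stump(xs, ys):
    """Fit a one-split regression tree on a single explanatory variable.

    Every midpoint between consecutive distinct x values is tried, and the
    threshold whose two groups have the lowest total SSE is kept.
    Returns (threshold, left_prediction, right_prediction).
    """
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical x values cannot be separated by a threshold
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (cost, threshold, mean(left), mean(right))
    if best is None:  # no split possible: predict the overall mean everywhere
        overall = mean(ys)
        return (float("inf"), overall, overall)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_bagging(xs, ys, n_trees=25, seed=0):
    """Fit one stump per bootstrap sample (resampling with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return stumps


def predict_bagging(stumps, x):
    """Aggregate by averaging the predictions of all stumps."""
    return mean(predict_stump(s, x) for s in stumps)


# Made-up data: the response jumps from about 1 to about 3 at x = 5.
xs = list(range(10))
ys = [1.0, 1.2, 0.8, 1.1, 0.9, 3.0, 3.2, 2.8, 3.1, 2.9]

single = fit_stump(xs, ys)
bagged = fit_bagging(xs, ys)
print("single stump:", single)
print("bagged prediction at x=1:", predict_bagging(bagged, 1))
print("bagged prediction at x=8:", predict_bagging(bagged, 8))
```

Each bootstrap sample leaves out some observations and repeats others, so the individual stumps differ slightly in their thresholds and leaf values; averaging them reduces the dependence of the prediction on any one choice of training set.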