Adapting autonomously classification data mining algorithms for ubiquitous devices


Andrea Zanda
Universidad Politécnica de Madrid
A thesis submitted for the degree of Doctor of Philosophy
Yet to be decided

To my parents Nando and Pina.

Acknowledgements

After all these years, I would like to thank my supervisor, Ernestina, for the richness she has shared with me and for the inspiration and enthusiasm that kept me motivated. I would also like to thank Santiago for the assistance and availability he always provided, with strong perseverance, at all levels of the research project. From the same DaME group, I would like to thank Aysegul Cayci and João Paulo Bártolo Gomes for their friendship, philosophical debates, and exchanges of knowledge and skills, which helped to enrich the experience. In addition, I would like to acknowledge Bettina Berendt, Myra Spiliopoulou, Yucel Saygin and João Gama for their valuable seminars and lectures, which had a significant impact on my background during the PhD. Among them, Prof. Yucel Saygin further inspired me with his experience and knowledge during an exchange research period at Sabanci University (Turkey), which he made possible. In conclusion, I recognize that this research would not have been possible without the financial support of the Master and Back PhD grant from Regione Sardegna (Italy) and of the Spanish Ministry of Science and Innovation (Project TIN2008-05924), which financed the presentation of the results of this thesis at several international conferences.

Abstract

Due to recent scientific and technological advances in information systems, it is now possible to run almost any application on a mobile device. The need to make such devices more intelligent opens an opportunity to design data mining algorithms that can execute autonomously on local devices and provide them with knowledge. The problem behind autonomous mining is the proper configuration of the algorithm so that it produces the most appropriate results. Contextual information, together with information on the resources of the device, has a strong impact both on the feasibility of a particular execution and on the production of the proper patterns. In addition, the performance of the algorithm, expressed in terms of efficacy and efficiency, depends heavily on the features of the dataset to be analyzed and on the parameter values of the particular implementation of the algorithm. However, few existing approaches deal with the autonomous configuration of data mining algorithms, and those that do ignore contextual and resource information. Both issues are particularly significant for social network applications. In fact, the widespread use of social networks, and consequently the amount of information shared, has made modeling context in social applications a priority. Resource consumption also plays a crucial role on such platforms, as users access social networks mainly on their mobile devices.

This PhD thesis addresses the aforementioned open issues, focusing on i) analyzing the behavior of algorithms, ii) mapping contextual and resource information to find the most appropriate configuration, and iii) applying the model to the case of a social recommender. Four main contributions are presented:

• The EE-Model: predicts the behavior of a data mining algorithm in terms of the resources consumed and the accuracy of the mining model it will obtain.
• The SC-Mapper: maps a situation, defined by the context and the resource state, to a data mining configuration.
• SOMAR: a social activity (events and informal ongoings) recommender for mobile devices.
• D-SOMAR: an evolution of SOMAR that incorporates the configurator in order to provide updated recommendations.

Finally, the experimental validation of the proposed contributions using synthetic and real datasets allows us to achieve the objectives and answer the research questions posed for this dissertation.

Contents

Nomenclature

1 Introduction
  1.1 Introduction and motivation
  1.2 Goals and contributions
  1.3 Structure of the thesis

2 Related work
  2.1 Introduction
  2.2 Data Mining and Knowledge Discovery
    2.2.1 Knowledge Discovery Process
    2.2.2 Types of Data Mining Tasks
      2.2.2.1 Classification
      2.2.2.2 Regression
      2.2.2.3 Clustering
  2.3 Ubiquitous systems
  2.4 Ubiquitous Knowledge Discovery
  2.5 Context-aware data mining systems
  2.6 Meta-learning
  2.7 Multi-objective optimization
  2.8 Social recommenders
  2.9 Conclusion

3 Setting the problem
  3.1 Introduction
  3.2 Algorithm behavior model
  3.3 Choosing the appropriate configuration of the algorithm
  3.4 Social recommender
  3.5 Conclusion

4 Autonomous configurator
  4.1 Introduction
  4.2 EE-Model
    4.2.1 Input of the EE-Model
      4.2.1.1 Dataset Information
      4.2.1.2 Configuration information
    4.2.2 Output of the model
    4.2.3 Methodology
      4.2.3.1 Selecting the data mining algorithm
      4.2.3.2 Dataset of historical executions
      4.2.3.3 Model of the behavior of the algorithm
    4.2.4 P-EE-Model
      4.2.4.1 Dataset particularization
      4.2.4.2 Device particularization
    4.2.5 Deployment of the EE-Model
  4.3 SC-Mapper
    4.3.1 Generating the candidate set of configurations
    4.3.2 Finding the appropriate configuration
    4.3.3 Multiobjective optimization algorithm
  4.4 Discussion

5 Automatic recommendations in social networks
  5.1 Introduction
  5.2 SOMAR
  5.3 Social Graph Generator
    5.3.1 Definitions
    5.3.2 Social graph computation
  5.4 Activity recommendation
    5.4.1 Data Integration
    5.4.2 Activity Recognizer
    5.4.3 Social Graph applicator
    5.4.4 Activity Matcher
  5.5 Change Detector and Updater
  5.6 Conclusion

6 Experiments
  6.1 Introduction
  6.2 Experiment setting
  6.3 EE-Model
    6.3.1 Generation of historical Datasets of executions
      6.3.1.1 Methodology to generate the historical execution dataset
      6.3.1.2 Configurations of the algorithm for the historical dataset
      6.3.1.3 Dataset chosen for the executions
      6.3.1.4 Environment setting for the executions
      6.3.1.5 Resulting datasets of executions
      6.3.1.6 Datasets generated
      6.3.1.7 Discussion
    6.3.2 Comparison of the different EE-Models obtained
      6.3.2.1 Discussion
    6.3.3 EE-Model Vs P-EE-Model
  6.4 SC-Mapper
    6.4.1 Candidate set of configurations
      6.4.1.1 Data preparation
      6.4.1.2 Modeling
      6.4.1.3 Evaluation
  6.5 SOMAR
    6.5.1 Dataset
    6.5.2 Social Graph generation performance analysis
  6.6 SOMAR and the EE-Model
    6.6.1 Analysis of the results

7 Conclusions and future work
  7.1 Conclusions
  7.2 Future work
    7.2.1 EE-Model
    7.2.2 SC-Mapper
    7.2.3 SOMAR

References

List of Figures

2.1 Ubiquitous Knowledge Discovery, Meta-Learning, Multi-objective Optimization
2.2 Example of a dendrogram
2.3 Single, complete and average linkage algorithms
3.1 Summary
3.2 Mining algorithm inputs and outputs
3.3 User relationships in social networks
4.1 The configurator module.
4.2 Data mining algorithm inputs and outputs
4.3 EE-Model inputs and outputs
4.4 SC-Mapper
4.5 Clustering the algorithm configurations to obtain a representative for each cluster.
4.6 The steps for finding the appropriate configuration.
5.1 Recommendation input, process and output. The process can be divided into two phases: the generation of the recommender and the application of the recommender.
5.2 A social network of users with a mobile phone and Facebook access. On the right, the zoom shows the SOMAR architecture.
5.3 The steps of social graph generation (left) and partial corresponding results of the final graph (right).
5.4 The application process of the recommender.
5.5 Examples of how people share activities on Facebook.
6.1 Distribution of attributes 1 (left-up), 2 (right-up), 7 (left-down) and 8 (right-down) for the original dataset.
6.2 Distribution of attributes 1 (left-up), 2 (right-up), 7 (left-down) and 8 (right-down) for a dataset sample of 877 records.
6.3 Distribution of attributes 1 (left-up), 2 (right-up), 7 (left-down) and 8 (right-down) for a dataset sample of 152 records.
6.4 Distribution for attribute 4
6.5 Maximum memory consumption when varying the configurations.
6.6 Root mean squared error of the model when varying the configurations.
6.7 Accuracy of the model when varying the configurations
6.8 Accuracy of the check configuration with different datasets
6.9 Mean absolute error of the check configuration with different datasets
6.10 Root mean squared error of the check configuration with different datasets
6.11 CPU cycles of the check configuration with different datasets
6.12 Average memory of the check configuration with different datasets
6.13 The models obtained for the variable maximum memory.
6.14 The models obtained for the variable accuracy.
6.15 The models obtained for the variable root mean squared error.
6.16 Comparison of the data mining techniques: 1 (M5P), 2 (Linear regression), 3 (REPTREE).
6.17 Comparing average memory
6.18 Comparing CPU cycles
6.19 The output of the hierarchical clustering algorithm performed on the configurations
6.20 The result of applying the Calinski and Harabasz measure to the clustering model.
6.21 A zoom of the dendrogram resulting from hierarchical clustering.
6.22 CPU time in relation to the number of Root's friends.
6.23 CPU time of Step 3 while altering the number k.
6.24 The correlation between the number of clusters and CPU cycles.
6.25 The correlation between the number of clusters and memory.
6.26 The correlation between CPU cycles and memory.
6.27 The correlation between the number of clusters and the error of the model.
6.28 The correlation between the number of items and CPU cycles.
6.29 The correlation between the number of items and memory.

Chapter 1
Introduction

1.1 Introduction and motivation

Advances in wireless technologies, electronics and the Internet have made ubiquitous computing (ubicomp) possible. Ubiquitous computing environments are widespread today thanks to the popularity of small devices Arcelus et al. (2007), Bahati & Bauer (2008), Beetz et al. (2007), Bergmann (2007). The computation model has changed: the desktop model is giving way to a form of human-computer interaction in which information processing is thoroughly integrated into everyday objects and activities. More formally, ubiquitous computing is defined as "machines that fit the human environment instead of forcing humans to enter theirs" York & Pendharkar (2004). This paradigm is also described as pervasive computing, ambient intelligence Ducatel et al. (2001), or the Internet of Things Mattern & Floerkemeier (2010), where each term emphasizes slightly different aspects. The dominant features of ubicomp devices are constrained resources, continuous reception of data streams, and usage in different contexts; their advent is also making ubiquitous access to large quantities of distributed data a reality. Knowledge Discovery in Databases (KDD) on such devices, to extract useful knowledge, is the next natural step in the increasingly connected world of ubiquitous computing.

The term KDD refers to the broad process of finding knowledge in data. KDD is of interest to researchers in machine learning, pattern recognition, databases, statistics, artificial intelligence, knowledge acquisition for expert systems, and data visualization, where the unifying goal is to extract knowledge from data in the context of large databases. KDD makes use of data mining techniques to extract knowledge according to the specification of measures, preprocessing and transformations of a given database; beyond that, KDD is the high-level application of such techniques, as it involves the evaluation and possibly the interpretation of the patterns. Efforts to systematize the process have led to the definition of the CRISP-DM process model Chapman et al. (2000). Focusing on the modeling phase of CRISP-DM, data mining methods have been proposed to discover patterns in data for classification, estimation, prediction, affinity grouping, clustering and description. Such methods can be grouped into the following categories: Neural Networks (pattern recognition), Memory Based Reasoning, Cluster Detection, Market Basket Analysis, Link Analysis, Visualization, Decision Tree/Rule Induction, Genetic Algorithms and OLAP. Despite the efforts to systematize the KDD process, it still requires data mining experts to lead it Marbán et al. (2009).

If computers could be as commonplace as the written word, our everyday world would be transformed. That was the belief of the visionary scientist Mark Weiser. Now that computers are everywhere, the next natural step is to make them intelligent by incorporating the capability to generate knowledge Warren (2004), but the integration of data mining techniques in ubicomp devices is still a challenge. This integration can be performed in different ways; here we describe the main approaches.
One possibility is to generate knowledge offline and then integrate it into the ubicomp devices. Another possibility is to provide a cloud service in charge of mining the data sent from the ubicomp device and returning the knowledge models to it. Finally, it is also possible to execute the KDD process locally on the ubicomp devices whenever needed. In this thesis we focus on the last approach, so the data mining algorithms that generate the models are executed on the devices. The main advantage of this approach is that user data stay private, as they are not sent to third parties as in the first two approaches. The main drawback is the resource consumption needed to run the data mining algorithms on the device.

Before facing the challenges of such integration, we outline the main factors motivating the need for it (drivers) and the factors easing it (enablers). Among the drivers we highlight: i) demand for intelligence, ii) privacy, and iii) personalization. By intelligence we refer here to the capability of machines to learn from past behavior; this can be considered the main driver forcing the integration. Another driver is the privacy of ubicomp device users: sensitive user data have to be protected from third parties, and data mining models generated locally on the devices guarantee the privacy of such data. Finally, as ubicomp requires computers to exist in the everyday world of people, users demand increasingly personalized support and results.

The enablers are the factors easing the integration of data mining in ubicomp devices; we highlight here i) data availability and ii) the computational power of devices. Ubicomp devices can collect information about user location, mood, schedules, social activities and context, to name but a few, which enables the use of data mining techniques on such data. The computational resources of ubicomp devices, such as memory, CPU and storage, allow almost any everyday task to be processed, data mining tasks included.

Despite the drivers and enablers, the application of data mining in ubicomp devices still has to face some challenges. In this thesis we focus on the following: i) autonomy, ii) maintenance and evolution of data mining models, and iii) feasibility.

Autonomy is the ability to determine independently the actions to be taken.
The Merriam-Webster dictionary defines it as "the quality or state of being self-governing; especially, the right of self-government". In the case of data mining algorithms, autonomy refers to the challenge of carrying out the data mining process without any human intervention, "self-governing" the decisions to be taken before running the algorithm. Consequently, a method able to configure a data mining algorithm autonomously is needed. In a sense, autonomy is related to adaptability, which is defined as "capable of being or becoming adapted; to make fit (as for a new use) often by modification". The relation lies in the fact that the proposed method that self-governs the process should adapt to new requirements or new context variables influencing the process. In fact, the method has to be able to configure the algorithm according to the expected quality of the results and the resources needed to obtain such quality, taking into account the available resources of the device on which it will execute.

In addition, the changing nature of the world implies that data constantly change. Consequently, the mining models also have to change and be maintained as data evolve. That is, the method to autonomously configure the algorithm will have to be executed each time a change in the underlying data population indicates that the models are outdated.

The challenge of feasibility can be divided into: (1) the capability of being done or carried out, and (2) the capability of being used or dealt with successfully. It is therefore related to two main concepts. On the one hand, it requires checking that the data mining execution with a given configuration can be carried out, which depends both on the computational cost of the process and on the resources available on the machine. On the other hand, feasibility is related to the possibility of the process being successful, which implies achieving results that comply with the mining objective. Obtaining a "proper" configuration for a data mining algorithm by exhaustive search would require checking every possible configuration, which is not feasible because the number of possible configurations grows exponentially with the number of parameters. So the challenge is to find a solution method that can be executed on a ubicomp device and that provides mining results complying with the mining objectives.

1.2 Goals and contributions

The main goal of the thesis can be stated as: the development of an autonomous, adaptable and feasible method able to configure data mining algorithms, and its application to a social recommender.

In order to achieve this goal, the following subgoals have to be fulfilled:

• Modeling the behavior of data mining algorithms.
• Customizing models of the behavior of data mining algorithms for predetermined domains.
• Configuring data mining algorithms under particular context and resource requirements based on their behavior model.
• Finding a fixed set of configurations that represents as accurately as possible the kinds of behavior of a data mining algorithm.
• Developing a social recommender.
• Instantiating the model in the social recommender.

The solution we propose in this thesis is based on the following hypotheses:

• H1: a data mining algorithm is deterministic in nature.
• H2: it is possible to represent the behavior of a data mining algorithm in terms of the resources consumed and the quality of the model it will obtain.
• H3: inputs such as the dataset and the algorithm parameters affect the final result in terms of the quality of the model and the cost of the execution.
• H4: the model of the behavior can provide better results when the domain or the device is fixed.
• H5: a method that calculates the best configuration of an algorithm in a particular situation, based on the algorithm behavior model, can be a feasible solution.
• H6: it is possible to identify a behavior model of a data mining algorithm.
• H7: it is possible to build a recommender based on a social network on a ubiquitous device.
• H8: recommendations in a social network should be based on a dynamic model of the behavior of the user.

Consequently, the main contributions of the thesis are as follows:

• The EE-Model: a behavior model able to estimate the cost of the execution of a given algorithm with a given configuration.
• The P-EE-Model: a customized version of the behavior model, together with the method to obtain the customization.
• SC-Mapper: a configurator of data mining algorithms that can be executed locally on the device.
• SOMAR: a recommender for mobile devices.
• Dynamic SOMAR: a version of SOMAR in which the EE-Model is used to dynamically adapt the recommendations made to the user.

1.3 Structure of the thesis

In Section 2 we present the work related to this thesis and provide the background knowledge of the three main fields involved: ubiquitous knowledge discovery, meta-learning and multi-objective optimization. In Section 3 we analyze the problems related to the execution of data mining algorithms on ubiquitous devices, in particular how to estimate the behavior of a data mining algorithm in terms of the resources consumed and the quality of the mining models obtained, in order to select the most appropriate algorithm configuration. In the same section, we also address the problem of developing a social recommender. Section 4 deals with the solution: we present the EE-Model, which provides the estimations for a data mining algorithm, and the SC-Mapper, which is in charge of selecting the most appropriate configuration based on the EE-Model estimations. In Section 5 we present a social recommender for activities (SOMAR), in which we integrate the configurator in order to provide the user with up-to-date recommendations. Finally, Section 6 presents the experiments on the configurator and SOMAR. In particular, we first provide the reader with some tests of the feasibility and performance of the EE-Model in Section 6.3, then describe some general performance experiments on the SC-Mapper, and finally analyze both the performance of the generation of SOMAR recommendations and the integration of the EE-Model with SOMAR. In Section 7 we present the conclusions and future work resulting from this research.
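
The interplay between a behavior model and a configurator, as outlined for Section 4, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the parameter grid, the names `Estimate`, `estimate`, `count_configurations` and `choose_config`, and the cost and accuracy formulas are hypothetical placeholders, not the EE-Model or the SC-Mapper presented in this thesis. The sketch also makes the feasibility concern of Section 1.1 tangible: the number of configurations is the product of the number of values of each parameter, so it grows quickly as parameters are added.

```python
from dataclasses import dataclass
from itertools import product
from math import prod
from typing import Optional

# Hypothetical parameter grid for a classification algorithm (illustrative values only).
PARAM_GRID = {
    "min_leaf_size": [2, 5, 10, 20],
    "confidence": [0.1, 0.25, 0.5],
    "pruning": [True, False],
}


@dataclass(frozen=True)
class Estimate:
    memory_kb: float  # estimated peak memory of the execution
    accuracy: float   # estimated accuracy of the resulting model, in [0, 1]


def count_configurations(grid: dict) -> int:
    """Number of configurations: product of the sizes of the value lists."""
    return prod(len(values) for values in grid.values())


def estimate(config: dict, n_records: int) -> Estimate:
    """Toy stand-in for a behavior model: maps (dataset size, configuration)
    to an estimated cost and quality. The formulas are invented."""
    detail = 1.0 / config["min_leaf_size"]  # smaller leaves: larger model
    memory = n_records * (0.5 + 4.0 * detail) * (0.8 if config["pruning"] else 1.0)
    accuracy = min(0.95, 0.70 + 0.4 * detail + 0.1 * config["confidence"])
    return Estimate(memory_kb=memory, accuracy=accuracy)


def choose_config(grid: dict, n_records: int, memory_budget_kb: float) -> Optional[dict]:
    """Toy stand-in for a configurator: among the configurations whose
    estimated memory fits the budget, return the one with the highest
    estimated accuracy, or None if no configuration fits."""
    best, best_accuracy = None, -1.0
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        config = dict(zip(names, values))
        est = estimate(config, n_records)
        if est.memory_kb <= memory_budget_kb and est.accuracy > best_accuracy:
            best, best_accuracy = config, est.accuracy
    return best


if __name__ == "__main__":
    print(count_configurations(PARAM_GRID))
    print(choose_config(PARAM_GRID, n_records=1000, memory_budget_kb=1000.0))
```

With this toy grid there are only 24 configurations, so enumerating all of them is affordable. For realistic parameter spaces it is not, which is why working with a smaller candidate set of configurations is of interest.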

Chapter 2
Related work

2.1 Introduction

In this chapter we review the work related to the contributions of this thesis. The main topics involved, depicted in Figure 2.1, are: Ubiquitous Knowledge Discovery (UKD), Meta-Learning and Multi-objective Optimization (MO).

Figure 2.1: Ubiquitous Knowledge Discovery, Meta-Learning, Multi-objective Optimization.

Consequently, in the following we describe the most relevant works on the above topics, and we also present background knowledge for this thesis concerning ubiquitous systems, the data mining process and its standardization, and context-aware data mining systems. Further, we present the works related to social recommenders in order to frame our social recommender.

2.2 Data Mining and Knowledge Discovery

Knowledge Discovery from Databases (KDD) is a field of computer science which Fayyad et al. (1996) describe as "the process of mapping low-level data into other forms that might be more compact, more abstract, or more useful". KDD lies at the intersection of statistics, databases, machine learning and artificial intelligence. The term was introduced at the first KDD workshop in 1989 Piatetsky-Shapiro (1991), and it has since become popular. KDD refers to the overall process of discovering useful patterns from data, while data mining refers to a particular step in this process; in fact, the KDD process is commonly defined, following Fayyad et al. (1996), by the stages:

1. Selection
2. Pre-processing
3. Transformation
4. Data Mining
5. Interpretation/Evaluation

Nevertheless, the terms KDD and data mining are often used interchangeably, and the term data mining has become increasingly popular in the business community and in the press.

KDD is an interdisciplinary field that combines results from different fields, such as statistics, databases, machine learning and visualization, in order to find interesting patterns. While data mining has much in common with these fields, particularly statistics and machine learning, there are differences between them. On the one hand, KDD focuses on the entire process of knowledge discovery from data, including: data cleaning; how the data are stored and accessed; machine learning algorithms; how these algorithms can be scaled to massive datasets and still run efficiently; how the results can be interpreted and visualized; and how the overall man-machine interaction can usefully be modeled and supported. On the other hand, data mining focuses on the process of discovering patterns in data.

2.2.1 Knowledge Discovery Process

KDD cannot be seen as the simple application of a machine learning method to a dataset in a single step; it is a complex, continuous process with many loops and feedbacks. In 1997, with the goal of establishing a standard for the data mining process, an industry consortium proposed CRISP-DM (the Cross-Industry Standard Process for Data Mining) Chapman et al. (2000), a standard KDD process with the following main steps:

• Business (or Problem) Understanding: The initial phase, which focuses on understanding the project objectives and requirements from a business perspective. This knowledge is then converted into a data mining problem, and a preliminary plan is designed to achieve the project objectives.
• Data Understanding: This phase starts with an initial data collection and proceeds with several activities in order to get familiar with the data, identify data quality problems, discover first insights about the data, or detect interesting data subsets to form hypotheses about hidden information.
• Data Preparation (including data cleaning and preprocessing): Covers all activities needed to construct the final dataset (i.e., the data that will be fed into the modeling tools) from the initial raw data. Data preparation tasks are likely to be performed multiple times, and not in any prescribed order. Tasks include the selection and transformation of tables, records and attributes, and the cleaning of the data for the modeling tools.
• Modeling (applying machine learning and data mining algorithms): Various modeling techniques are selected and applied, and their parameters are calibrated to optimal values. Typically, there are several techniques for the same data mining problem type. Some techniques have specific requirements on the form of the data, so stepping back to the data preparation phase is often needed.
• Evaluation (checking the performance of these algorithms): Before proceeding to the final deployment of the model (or models) that has been built, it is important to evaluate it more thoroughly and to review the steps executed to construct it, to be certain that it properly achieves the business objectives.
• Deployment: The creation of the model is generally not the end of the project. Even if the purpose of the model is to increase knowledge of the data, the knowledge gained will need to be organized and presented in a way that the customer can use. In many cases it will be the customer, not the data analyst, who carries out the deployment steps.

Researchers in data mining and knowledge discovery are creating new, more automated methods for discovering knowledge to meet the needs of the 21st century. This need for analysis will keep growing, driven by the business trends of one-to-one marketing, customer-relationship management, enterprise resource planning, risk management, intrusion detection and Web personalization, all of which require customer-information analysis and customer-preference prediction.

Deploying a data mining solution requires collecting the data to be mined and cleaning and transforming its attributes to provide the inputs for data mining models. These models also need to be built, used and integrated with different applications. Moreover, currently deployed data management software must be able to interact with the data mining models using standard APIs.
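
The five KDD stages listed in Section 2.2 can be sketched as a chain of small functions. This is only an illustrative toy under stated assumptions: the records, the attribute names and the midpoint-threshold classifier in `mine` are invented for the example and do not come from the thesis, and accuracy is measured on the training data only to keep the sketch short.

```python
# Toy raw records: (age, income, label). None marks a missing value.
RAW = [
    (25, 30000, "no"), (47, 82000, "yes"), (35, None, "no"),
    (52, 91000, "yes"), (23, 28000, "no"), (44, 76000, "yes"),
]


def select(records):
    """Selection: keep only the attributes relevant to the task."""
    return [(income, label) for _age, income, label in records]


def preprocess(records):
    """Pre-processing: drop records with missing values."""
    return [(x, label) for x, label in records if x is not None]


def transform(records):
    """Transformation: min-max rescaling of the attribute to [0, 1].
    Assumes the attribute takes at least two distinct values."""
    values = [x for x, _ in records]
    lo, hi = min(values), max(values)
    return [((x - lo) / (hi - lo), label) for x, label in records]


def mine(train):
    """Data mining: fit a one-threshold classifier for exactly two classes.
    The threshold is the midpoint between the two class means."""
    by_label = {}
    for x, label in train:
        by_label.setdefault(label, []).append(x)
    means = {label: sum(xs) / len(xs) for label, xs in by_label.items()}
    low_label, high_label = sorted(means, key=means.get)
    threshold = (means[low_label] + means[high_label]) / 2
    return threshold, low_label, high_label


def predict(model, x):
    """Apply the threshold model to a single (rescaled) value."""
    threshold, low_label, high_label = model
    return high_label if x >= threshold else low_label


def evaluate(model, records):
    """Interpretation/Evaluation: fraction of records classified correctly."""
    hits = sum(predict(model, x) == label for x, label in records)
    return hits / len(records)


def run_kdd(raw):
    """Run the stages in order and return the model and its accuracy."""
    data = transform(preprocess(select(raw)))
    model = mine(data)
    return model, evaluate(model, data)


if __name__ == "__main__":
    model, accuracy = run_kdd(RAW)
    print(model, accuracy)
```

In a real process the stages are revisited repeatedly, as the loops and feedbacks of CRISP-DM suggest, and the evaluation would use data not seen during modeling.
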
The scalability aspect calls for collecting data to be mined from distributed and remote locations. Employing common data mining standards greatly simplifies the integration, updating, and maintenance of the applications and systems containing the models, Clifton & Thuraisingham (2001). Over the past several years, various data mining standards have matured and today are used by many of the data mining vendors, as well as by others building data mining applications. With the maturity of data mining standards, a variety of standards-based data mining services and platforms can now be developed and deployed much more easily. Related fields such as data grids, web services, and the semantic web have also developed standards-based infrastructures and services relevant to KDD. These new standards and standards-based services and platforms have the potential to change the way data mining is used. Data mining standards such as CRISP-DM (Chapman et al. (2000)), JDM (Hornick et al. (2007)) and PMML (Guazzelli et al. (2009)) cover one or more of the following issues:

• The overall process by which data mining models are produced, used, and deployed. This includes, for example, a description of the business interpretation of the output of a classification tree.

• A standard representation for data mining and statistical models. This includes, for example, the parameters defining a classification tree.

• A standard representation for cleaning, transforming, and aggregating attributes to provide the inputs for data mining models. This includes, for example, the parameters defining how zip codes are mapped to three-digit codes prior to their use as a categorical variable in a classification tree.

• A standard representation for specifying the settings required to build models and to use the outputs of models in other systems. This includes, for example, specifying the name of the training set used to build a classification tree.
• Interfaces and Application Programming Interfaces (APIs) to other languages and systems: There are standard data mining APIs for Java and SQL. This includes, for example, a description of the API so that a classification tree can be built on data in a SQL database.

• Standards for viewing, analyzing, and mining remote and distributed data: This includes, for example, standards for the format of the data and metadata so that a classification tree can be built on distributed web-based data.

2.2.2. Types of Data Mining Tasks

The modeling phase focuses on the application of data mining algorithms in order to discover patterns in data. The algorithms can be divided into two broad categories:

• Descriptive: oriented to data interpretation, which focuses on understanding the underlying data relations, such as finding correlations, trends, groups, clusters and anomalies. These techniques are exploratory in nature.

• Predictive: based on inductive learning, where a model is constructed by generalizing a set of training records. This model aims to predict the value of a particular attribute (i.e., the target variable) based on the values of other attributes. The underlying assumption of the inductive approach is that the trained model is able to accurately predict the values of future unseen records.

We briefly introduce the main data mining tasks and then describe in depth the techniques used in this thesis: clustering, and predictive techniques such as regression and decision tree algorithms.

• Clustering: the division of data into groups (clusters) that contain similar records (according to some similarity measure), separating dissimilar records into different clusters. Jain et al. (1999a) presents a taxonomy of clustering techniques and an overview of its fundamental concepts and methods. Moreover, it describes several successful applications of clustering such as image segmentation or object and character recognition. A more recent survey on this topic can be found in Kogan et al. (2006).

• Association Rule Mining: explores the relations between attributes that exist in the data, detecting attribute-value conditions that frequently occur together. It was popularized by Agrawal et al. (1993); Apriori is the best-known algorithm to mine association rules. Association rules can be used as the basis for decisions about marketing activities, and their best-known application is market basket analysis. A survey on the algorithmic aspects of association rule mining can be found in Hipp et al. (2000).

• Prediction: aims to predict future trends and behavior patterns. The data of a prediction task is divided into the class attribute, which is the target of the prediction, and the training data, from which the prediction is built. Prediction is further divided into:

– Regression: attempts to predict a numeric attribute value.

– Classification: aims to predict a discrete class label.

2.2.2.1. Classification

Classification is a predictive technique involving discrete class attributes, where the goal is to learn a concise model (hypothesis) from training data (i.e., labeled records) and apply this model to predict the class of new, unlabeled records. Applications of classification include computer vision (e.g., medical diagnostics), drug discovery, document classification, internet search engines, speech recognition, performance prediction and selective marketing, to name a few. More formally, let X be the feature space and its possible values. Let Y be the space of possible discrete class labels of the variable to predict. Classification algorithms try to learn the function f : X → Y that assigns the right class label to any unlabeled record. However, it is not possible to know f directly, so classification algorithms learn an approximate function g : X → Y given a set of correctly labeled records.
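As a minimal sketch of this setting, the following Python fragment learns an approximate g from labeled records and compares it with f on unseen records. The concept `f`, the toy records and the nearest-neighbour rule used to build `g` are all assumptions made for illustration; they are not the algorithms evaluated in this thesis.

```python
import math

def f(x):
    """Hypothetical true concept; in practice it is unknown to the learner."""
    return "pos" if x[0] + x[1] > 1.0 else "neg"

def learn(training):
    """Build g as a 1-nearest-neighbour rule over the labeled records."""
    def g(x):
        nearest = min(training, key=lambda rec: math.dist(rec[0], x))
        return nearest[1]
    return g

# Labeled records: (feature vector, class label given by f).
train_x = [(0.1, 0.2), (0.9, 0.8), (0.4, 0.3), (0.7, 0.9), (0.2, 0.6), (0.8, 0.5)]
training = [(x, f(x)) for x in train_x]
g = learn(training)

# Disagreement between g and f on records not seen during training.
unseen = [(0.0, 0.1), (1.0, 1.0), (0.3, 0.3), (0.9, 0.9)]
error_rate = sum(g(x) != f(x) for x in unseen) / len(unseen)
print(error_rate)
```

On real data f cannot be called directly, so the error is estimated on held-out labeled records instead.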
The classification algorithm aims to minimize the expected error between the learned g and the true concept f. Consequently, classifiers are usually evaluated by assessing their prediction accuracy (i.e., the number of correct predictions divided by the total number of predictions). In recent years many classification algorithms have been proposed; in the following we group and describe the most successful and well-known types.

• Logic-based approaches have been developed based on Artificial Intelligence logical/symbolic techniques. The most important are:

– Decision trees. Each internal node of a decision tree corresponds to an attribute, while each edge between two nodes is a condition on the value of that attribute. Leaf nodes give a classification that applies to all instances that reach the leaf, or a set of classifications, or a probability distribution over all possible classifications. To classify an unknown instance, it is routed down the tree according to the values of the attributes tested in successive nodes, and when a leaf is reached the instance is classified according to the class assigned to the leaf. Constructing a decision tree is a recursive process: first, select the attribute to place at the root of the tree and make one branch for each possible value of the attribute. Then the process is repeated for each branch, which involves a subset of the original data. If all instances at a node have the same classification, that part of the tree is not developed further. The priority in the placement of the attributes, for example the decision of which attribute to set as root, can be decided on the basis of different measures such as information gain, Hunt et al. (1966), and the Gini index, Breiman et al. (1984). The ID3 algorithm, Quinlan (1986), is one of the most popular algorithms. A series of improvements to ID3 culminated in a practical and influential system for decision tree induction called C4.5; these improvements include methods for dealing with numeric attributes, missing values and noisy data, to name the most important. In this thesis we present how to learn the behavior of an implementation of C4.5 in terms of consumed resources and quality of the model it will obtain.
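The choice of the root attribute by information gain can be sketched as follows. This is an illustrative fragment under stated assumptions, not the C4.5 implementation studied in this thesis: the function names, the toy records and the restriction to nominal attributes are all hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(records, attribute, target):
    """Entropy reduction obtained by splitting the records on a nominal attribute."""
    base = entropy([r[target] for r in records])
    remainder = 0.0
    for value in {r[attribute] for r in records}:
        subset = [r[target] for r in records if r[attribute] == value]
        remainder += len(subset) / len(records) * entropy(subset)
    return base - remainder

def best_root(records, attributes, target):
    """Attribute with the highest information gain, i.e. the candidate root."""
    return max(attributes, key=lambda a: information_gain(records, a, target))

# Hypothetical toy training set.
records = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "yes"},
]
root = best_root(records, ["outlook", "windy"], "play")
print(root)
```

Here "outlook" separates the two classes perfectly, so it is selected as root, while "windy" yields no gain. The recursive construction would then repeat the same selection on each branch.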

– Rule induction. The aim is to describe frequent patterns, associations, correlations, or causal structures among sets of items or objects in the training data with a set of rules. Rule induction is common in basket data analysis, cross-marketing and catalog design, among others. Quinlan (1987) proposes creating these rules from decision trees, where each path from the root to a leaf represents a rule. However, direct induction algorithms have also been proposed, such as RIPPER, Cohen (1995). Fürnkranz (1999) provides an extensive overview of existing works in rule induction.

• Artificial Neural Networks (ANN), also called neural networks (NN), are inspired by the structural aspects of biological neural networks. ANN are represented by an interconnected group of artificial neurons, and the information is represented by the connections between the neurons (connectionist approach). The simplest kind of neural network is the single-layer perceptron, Rosenblatt (1962). However, since perceptron-like methods are binary, a multi-class problem must be reduced to multiple binary classification problems. Moreover, single-layer perceptrons are only capable of learning linearly separable functions. Therefore, multi-layer perceptrons have been proposed; a good review of ANN works can be found in Zhang (2000).

• Statistical approaches use a probabilistic model to assign records to classes, rather than simply assigning a class label to each record. Bayesian networks (BN) are amongst the best-known statistical learning algorithms; Stephenson (2000) provides a complete review. The Naive Bayes algorithm is a well-known BN approach that assumes independence amongst the attributes given the class, Cestnik et al. (1987).

• Instance-based algorithms use a distance function to determine which record of the training set is closest to an unknown test instance.
Once the nearest record has been located, its class is predicted for the test instance. For instance-based algorithms no explicit model is created in advance; consequently, they require less computation time during the training phase when compared to the other algorithms that explicitly create a model. However, they require more computation time during the classification process. One of the best-known algorithms is presented in Cover & Hart (1967). It classifies a record using the most frequent class among its k nearest records (according to some distance metric). The distance metric must minimize the distance between two similarly classified records, while maximizing the distance between records of different classes. Simoudis & Aha (1997) provides a review of instance-based classifiers.

• Support Vector Machines (SVM) take a set of input data and predict, for each given input, which of two possible classes it belongs to. SVM are non-probabilistic binary linear classifiers that model concepts by creating hyperplanes in a multidimensional space; records are mapped into this space, where they become linearly separable. SVM were proposed in Vapnik (1995) and a comprehensive tutorial can be found in Burges (1998).

2.2.2.2. Regression

When the target class to predict is a numeric value, linear models are the natural technique to apply. There are several complex algorithms in this field; a review can be found in Roweis & Ghahramani (1999). Here we focus on a basic algorithm we have used for the generation of the EE-Model in Section 4.2: linear regression. The basic idea is to express the numeric value of the class as a linear combination of the attributes with determined weights:

$$x = w_0 + w_1 a_1 + w_2 a_2 + \dots + w_k a_k \qquad (2.1)$$

where $x$ is the value of the class to predict, $a_1, a_2, \dots, a_k$ are the attributes and $w_0, w_1, w_2, \dots, w_k$ are the weights. The weights are calculated from the training data. More formally, the first instance will have a class, say $x^{(1)}$, and attribute values $a_1^{(1)}, a_2^{(1)}, \dots, a_k^{(1)}$. The predicted value for the first instance's class can be written as:

$$w_0 a_0^{(1)} + w_1 a_1^{(1)} + w_2 a_2^{(1)} + \dots + w_k a_k^{(1)} = \sum_{j=0}^{k} w_j a_j^{(1)} \qquad (2.2)$$

where $a_0^{(1)}$ is an extra attribute whose value is always 1, so that (2.2) agrees with (2.1). This is the estimated value for the first instance's class. In order to have a better model, the difference between the estimated value of the class and the actual value has to be minimized: the method consists of choosing the coefficients $w_j$ to minimize the sum of the squares of the differences over all the training instances. Supposing there are n training instances, the sum of the squares of the differences is:

$$\sum_{i=1}^{n} \left( x^{(i)} - \sum_{j=0}^{k} w_j a_j^{(i)} \right)^2 \qquad (2.3)$$

where the expression inside the parentheses is the difference between the i-th instance's actual class and its predicted class. This sum of squares is what the algorithm tries to minimize by choosing the coefficients appropriately. The result of the minimization is a set of numeric weights, based on the training data, which we can use to predict the class of new instances. The drawback of linear regression is the linearity of the model: if the data exhibits a non-linear dependency, the result will still be the "best" straight line.

2.2.2.3. Clustering

The goal of clustering is to group similar objects; a review of the problem and an overview of the main algorithms can be found in Jain et al. (1999b). Clustering algorithms can be divided into two macro categories:

• Agglomerative hierarchical;

• Partitioning.

Partitioning Clustering

Partitioning clustering algorithms organize the n objects of a data set D into k groups, where k is an input parameter. The way they divide the n objects depends on an objective criterion, which minimizes the distance between each object and its cluster center (or cluster distribution in the case of the EM algorithm). Finding the optimal k centers (or distributions) is NP-hard. Here we present two partitioning algorithms, K-means and EM, which differ in their objective criterion but share the iterative relocation scheme shown in Algorithm 1.

Algorithm 1 Iterative relocation
Input: the set of n objects D and the number of clusters k
Output: the set of objects D organized into k clusters, with the optimization of the objective function F
Method:
(1) arbitrarily choose k centers/distributions as the initial solution;
(2) repeat
(3) (re)compute membership of the objects;
(4) update some/all cluster centers/distributions according to new memberships;
(5) until no change to F;

K-means. Given a data set D with n objects, the K-means algorithm (MacQueen) divides the objects into k groups, where k is an input parameter. The algorithm calculates each cluster center as the mean value of its objects and uses the squared-error function as objective criterion:

$$SSE = \sum_{i=1}^{k} \sum_{x \in C_i} dist(x, \bar{c}_i)^2 \qquad (2.4)$$

where $dist(x, \bar{c}_i)$ represents the Euclidean distance between an instance $x$ and its cluster center $\bar{c}_i$. The algorithm follows the steps of Algorithm 1: it starts with an arbitrary selection of centroids and iterates until the SSE no longer changes after an iteration. Step 3 assigns each object to the closest center and Step 4 calculates the centers of the new clusters. K-means does not guarantee a global optimum. It often finds a local optimum and has O(nkt) complexity, where t is the number of iterations. The quality of the final solution depends strongly on the initial set of clusters and on the data; the algorithm is very sensitive to noise and outliers.

EM. The EM algorithm proceeds similarly to K-means but models the membership to a cluster probabilistically. The algorithm views the data as a sample from a mixture of probability distributions and maximizes the likelihood of the mixture model. Given a data set D with n objects and the number of clusters k as input parameter, the probability that the object x belongs to the cluster $k_i$ is calculated with the formula:

$$P(i \mid x) = S_i \, \frac{P(x \mid i)}{P(x)} \qquad (2.5)$$

where $S_i$ is the proportion of D containing the objects of $k_i$. The EM algorithm follows Algorithm 1: in Step 3 the membership probability of the n objects is computed, and Step 4 approximates the new clusters with a mixture of Gaussians. The loop repeats until there is no significant increase in the log-likelihood objective criterion:

$$E = \sum_{x \in D} \log(P(x)) \qquad (2.6)$$

The drawback of partitioning algorithms is that k, the number of clusters, has to be specified as input. However, cross-validation can be used to estimate k by means of the objective criterion. Starting with a single cluster, the number of clusters is incremented as long as the average likelihood over all test folds increases.

Hierarchical Clustering

Hierarchical clustering is the second family of algorithms we have applied in this thesis work. These algorithms can be divided into two groups:

• Agglomerative: they start from clusters representing single instances and then continue merging clusters until one single cluster is obtained;

• Divisive: the reverse process of agglomerative. The algorithms start from a single cluster and keep dividing it until each instance forms its own cluster.

The agglomerative approach is described more formally in Algorithm 2.

Algorithm 2 Agglomerative hierarchical clustering algorithm
(1) Compute the proximity matrix if needed;
(2) repeat
(3) Merge the closest two clusters;
(4) Update the proximity matrix to reflect the proximity between the new cluster and the other clusters;
(5) until one cluster remains;

Hierarchical algorithms have the characteristic that the number of clusters is not specified as input, but can be decided a posteriori. In fact, the output is a tree-like structure called a dendrogram, which displays how the clusters are created. An example of a dendrogram can be seen in Figure 2.2.

Figure 2.2: Example of a dendrogram.

There are many different agglomerative and divisive algorithms, and they differ in the way they calculate the proximity between clusters. We can distinguish two groups: i) the algorithms which calculate the proximity based on centroids and ii) the algorithms which do not. The second group includes single linkage, average linkage and complete linkage. Single linkage calculates the proximity considering the closest instances of the two clusters, Figure 2.3 (a), while complete linkage considers the most distant instances, Figure 2.3 (b). Average linkage computes an average distance, Figure 2.3 (c).

Figure 2.3: Single, complete and average linkage algorithms.

In the first group the clusters are represented by a centroid, so the cluster proximity is calculated between cluster centroids, with the Euclidean distance for example. Ward's algorithm also assumes a centroid method, but it calculates the proximity with the squared-error function that results from merging the two clusters. Hierarchical algorithms do not need the number k of clusters as input. It can be chosen or calculated afterwards; intuitively, this can be interpreted as cutting the dendrogram horizontally. There are different methods to calculate the best k; the criterion we have used in this work is the Calinski and Harabasz stopping rule. The idea of the Calinski and Harabasz stopping rule is to compute the ratio of the inter-cluster similarity to the intra-cluster similarity. The inter-cluster similarity is the sum of squared errors between the k clusters and the intra-cluster similarity is the internal sum of squared errors for the k clusters. The best number of clusters is the k that gives the largest value of:

$$CH(k) = \frac{B(k)/(k-1)}{W(k)/(N-k)} \qquad (2.7)$$

where $B(k)$ is the between-cluster sum of squares, $W(k)$ is the within-cluster sum of squares and $N$ is the number of instances.

2.3. Ubiquitous systems

Pervasive computing, ubiquitous computing, and ambient intelligence are concepts evolving in a plethora of applications in several domains. In the literature, pervasive computing is loosely associated with the further spreading of miniaturized mobile or embedded information and communication technologies (ICT) with some degree of 'intelligence', network connectivity and advanced user interfaces, Saha & Mukherjee (2003). Because of its ubiquitous and unobtrusive analytical, diagnostic, supportive, information and documentary functions, pervasive computing is predicted to improve traditional applications in several domains. In the domain of healthcare, for example, some of its capabilities, such as remote, automated patient monitoring and diagnosis, Mattila et al. (2007), may make pervasive computing a tool advancing the shift towards home care, and may enhance patient self-care and independent living. Automatic documentation of activities, process control, or the supply of the right information in specific work situations are expected to increase the effectiveness as well as the efficiency of health care providers. For example, in hospitals pervasive computing has the potential to support the working conditions of hospital personnel, e.g., highly mobile and cooperative work, use of heterogeneous devices, or frequent alternation between concurrent activities. 'Anywhere and anytime' are becoming keywords, a development often associated with 'pervasive health-care'.

2.4. Ubiquitous Knowledge Discovery

The availability of data and the increased computational power motivated in the past the development of Machine Learning, which is the ability to predict future trends on the basis of training data. Nowadays, Ubiquitous Computing stimulates a similar leap forward. Small devices are spread everywhere and enable the registration of large amounts of information, thus generating a wide range of new types of data for which new learning and discovery methods are needed. Consequently, the challenge for next-generation data mining focuses on the extension of Data Mining to ubiquitous computing. Ubiquitous Knowledge Discovery (UKD) is a new discipline that results from the integration of Data Mining, on the one hand, and Ubiquitous Computing, on the other. The focus of UKD encompasses the whole process of knowledge discovery in mobile, finely distributed, interacting, dynamic environments, in the presence of massive amounts of heterogeneous, spatially and temporally distributed sources of data. The UKD discovery process needs substantial improvements of existing techniques, if not a complete rethinking. In particular, UKD aims at developing algorithms and systems for:

• Locating data sources - The current availability of world-wide distributed databases, Web repositories, and sensor networks adds a novel problem to today's Knowledge Discovery, i.e., the need to locate, in a transparent way, possible sources of data required to satisfy a user's demand.

• Rating data sources - As multiple sources may usually be identified as candidate data suppliers, effective evaluation methods must be designed to rate the sources w.r.t. the search target.

• Retrieving information - The nature and type of data available today are different from the past, as much emphasis is set on semantically rich, multimodal, interacting pieces of information, which need clever and fast data fusion processes.

• Elaborating information - The development of new approaches and algorithms must match the changes in the data distribution and nature.
Algorithms face not only the scalability problem (as in the past), but also resource constraints, communication needs, and, often, real-time processing. Moving the data mining software from a central database to the ubiquitous computing devices demands a design of systems and algorithms which differs considerably from that of classical data mining. This is the price to pay for introducing intelligence in small devices.

• Personalizing the discovery process - Due to the large number and variety of potential users, adaptation to individual users is a must. People, in fact, more than ever play a pivotal role in the knowledge handling process: people create data, data and knowledge are about people, and people are often the ultimate beneficiaries of the discovered knowledge. This requires that the benefit-harm tradeoff for different stakeholder groups is understood and used for system design.

• Presenting results - As people are most often the intended users of the discovered knowledge, it is very important that adequate visualization techniques be designed to ease their task of interpreting the (intermediate and final) results of the process. User models may help in designing more effective human/machine interactions and in guiding the development of new services.

2.5. Context-aware data mining systems

In Singh et al. (2003) the authors lay out a data mining framework where context information (domain specific, location, etc.) is used to determine the behavior of the data mining process. There are also examples in the literature where context is intermingled into the data mining process flow to produce results, rather than used as a determinant factor to change the behavior of knowledge discovery dynamically. Two examples are the restaurant recommendation system explained in Lee et al. (2006) and the Bayesian Network based recommender system designed for context-aware devices, Park et al. (2007). In Preuveneers & Berbers (2008) the use of the mobile phone as a tool for personalized health care assistance for individuals diagnosed with diabetes is explored. The system relies on the user providing information on the glucose level.
In the proposed system the long-term objective is to improve support for online unsupervised learning, so as to remain reflective of the user's changing situations and to prevent blood glucose level swings. Nevertheless, the unsupervised learning is not performed on the device. Conversely, in Siewiorek et al. (2003) SenSay, a context-aware mobile phone that adapts to dynamically changing environmental and physiological states, is presented. Five modules are presented for the architecture: the sensor box, sensor module, decision module, action module, and phone module, with the decision module located outside the device. The authors also recognize that, in order to provide a fully mobile solution, the decision module would have to be migrated to the phone, and they claim to include such functionality in the next release. Nevertheless, today's mobile computing approaches either fail to incorporate context, or do it in a very limited way. Consequently, regardless of the application, an infrastructure is required for developing ubiquitous data mining services that provide systems with the knowledge needed to support intelligent services. In Haghighi et al. (2007), an architecture composed of a situation manager and a strategy manager is proposed to execute a data stream mining process in adaptive context-aware situations. Although parameter tuning is done using predefined correlation functions between parameters and context variables, the mechanism is not configurable. In Haghighi et al. (2009), the authors present a mechanism able to automatically tune a stream algorithm, but solutions for adaptable algorithms for the static case are still lacking. The main difference lies in the fact that, in the static case, it is not possible to adjust the algorithm configuration while it is running; thus the state-of-the-art solutions cannot be applied to batch learning algorithms.

2.6. Meta-learning

The field of meta-learning has become an active area of research and concerns applying data mining techniques to meta-data about machine learning experiments.

Meta-learning has several possible applications, Vilalta & Drissi (2002). In fault management, the prediction of an algorithm's resource consumption is fundamental, and Gujrati et al. (2007) present how to make a dynamic prediction by means of data mining techniques. Meta-learning has two main challenges: 1) to improve the performance of data mining algorithms and 2) to make automatic learning possible for different kinds of learning problems in a flexible way. In this thesis, ideas from automatic learning are borrowed in order to find a model that describes the behavior of a data mining algorithm and predicts the cost of its execution in terms of time and resources, as well as the accuracy of the resulting data mining model. In other words, to build a cost model of the data mining algorithm, the analysis of meta-data about its past executions is fundamental. In Leyton-Brown et al. (2005) the authors present the problem of predicting the runtime of algorithms for an NP-complete problem and provide a solution, with related experiments, based on machine learning techniques. In Bensusan & Kalousis (2001) the authors make use of meta-learning techniques to predict the accuracy of a classifier on the basis of the dataset characterization.

2.7. Multi-objective optimization

Finding the optimum algorithm configuration is a Multi Criteria Decision Analysis (MCDA) problem with more than one objective function. Such a problem is also called multi-objective optimization; a state-of-the-art review can be found in Branke et al. (2008). In our problem the objective is to find a proper data mining configuration. This is an optimization problem in which the variables to take into account, such as the available resources or the context situation, define the objective functions. In fact, the possible configurations are many and only one has to be chosen according to the resource consumption and the quality of the desired data mining model.
There are different approaches in the literature that can be used to find the proper configuration for a data mining algorithm: one is based on constructing a single aggregate objective function (AOF) from many objective functions; other solutions solve the optimization by constructing several AOFs, Das & Dennis (1998), Mueller-Gritschneder et al. (2009); and others are based on genetic algorithms, Coello (1999), Deb et al. (2002), Deb & Kalyanmoy (2001).
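To make the difference between these approaches concrete, the following sketch filters a set of candidate configurations down to its non-dominated (Pareto) set and then picks one of them with a weighted-sum AOF. It is only an illustration under stated assumptions: the configuration names, the two objectives (normalised resource cost and error rate, both to be minimised) and the weights are hypothetical, and the weighted sum stands in for the evolutionary search used later in this thesis.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(candidates):
    """Keep only the configurations that no other configuration dominates."""
    return {name: obj for name, obj in candidates.items()
            if not any(dominates(other, obj) for other in candidates.values())}

def weighted_sum_choice(candidates, weights):
    """Single aggregate objective function: minimise the weighted sum of objectives."""
    return min(candidates,
               key=lambda n: sum(w * v for w, v in zip(weights, candidates[n])))

# Hypothetical predicted objectives: (normalised resource cost, error rate).
candidates = {
    "config_a": (0.2, 0.30),
    "config_b": (0.5, 0.15),
    "config_c": (0.6, 0.20),  # worse than config_b on both objectives
    "config_d": (0.9, 0.10),
}
front = pareto_front(candidates)
choice = weighted_sum_choice(front, weights=(0.5, 0.5))
print(sorted(front), choice)
```

Changing the weights moves the choice along the front: with weights (0.2, 0.8), which favour model quality over resource cost, the same function returns "config_b".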

In this thesis we will show how genetic algorithms, and in particular Evolutionary Multiple Objective (EMO) algorithms, are used to solve the problem of finding the most appropriate configuration of an algorithm given the information available on context and resources. The main challenge behind the application concerns the definition of the objective functions and the definition of the constraints for context and resources. In fact, for this solution we do not modify the techniques; we just integrate the algorithm into the configurator component.

2.8. Social recommenders

Recommender systems are supporting systems that provide a prediction of a user's rating or preference for a piece of information, a product or a service (such as books, movies, music, digital products, web sites, and TV programs). The prediction is built by analyzing the suggestions and the preferences of other users. Recommender tools can be divided into three groups: content-based, Pazzani & Billsus (2007), collaborative, Schafer et al. (2007), and hybrid tools, Burke (2002), Adomavicius & Tuzhilin (2005). Content-based (CB) recommenders base their recommendations on a description of the item and a profile of the user's interests, so the main challenges of the technique concern how to represent items, how to create a user profile describing the user's preferences, and how to compare the user profile to the item characteristics, Lops et al. (2011). The main advantages of CB recommenders are outlined in the following.

• User independence: the preferences are provided by the active user, so there is no need for data on other users;

• New items: the system is capable of recommending new and unknown items, as the interest can be inferred from the user profile.

Among the drawbacks we mention the item representation problem, i.e., the impossibility of recommending items when there is not enough information to represent them.
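The comparison between a user profile and item characteristics can be sketched as a similarity ranking. The fragment below is a minimal illustration, assuming that both the profile and the items are represented as weighted feature dictionaries and that cosine similarity is the comparison measure; the feature names, weights and items are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse feature vectors stored as dicts."""
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def recommend(profile, items, top_n=2):
    """Rank items by similarity to the active user's profile; no other users needed."""
    ranked = sorted(items, key=lambda name: cosine(profile, items[name]), reverse=True)
    return ranked[:top_n]

# Hypothetical user profile and item descriptions.
profile = {"jazz": 0.9, "live": 0.6, "rock": 0.1}
items = {
    "jazz_club_concert": {"jazz": 1.0, "live": 1.0},
    "rock_festival": {"rock": 1.0, "live": 1.0},
    "jazz_album": {"jazz": 1.0},
    "cooking_show": {"food": 1.0},
}
top = recommend(profile, items)
print(top)
```

The sketch reflects both advantages listed above: only the active user's profile is used, and an item that nobody has rated yet can still be scored as long as it has a description. It also shows the drawback, since an item with no usable features cannot be ranked meaningfully.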

Collaborative Filtering (CF) is the process of filtering or evaluating items using the opinions of other people. The underlying assumption of the collaborative filtering approach is that if a person A has the same opinion as a person B on an issue, A is more likely to have B's opinion on a different issue x than to have the opinion on x of a person chosen randomly. CF systems solve the item representation problem described for CB recommenders, and they are also able to recommend unexpected items to the users. Nevertheless, CF uses a user-item matrix that can be extremely large and sparse, which poses challenges for the performance of the recommendation.

Finally, hybrid recommenders combine both solutions in order to suppress the drawbacks of an individual technique in a combined model. A common feature of recommenders is the application of data mining techniques to offer personalized information to the user by analyzing his/her preferences Amatriain et al. (2011).

In this thesis, we propose a hybrid recommender that is generated through in situ data mining that exploits three types of information: (1) sensor data, (2) mobile traffic data and (3) social networks. SocialFusion, which is described in Beach et al. (2010), combines information from these three sources to provide recommendations, whereas WhozThat, which is described in Beach et al. (2008), provides context-aware audio by combining Facebook and mobile phones. However, both of these tools generate recommendations by sending sensitive information to a third party, which raises privacy issues Saygin & Ulusoy (2002). To avoid this problem, we propose that the social recommender calculate the recommendation based only on information available locally. However, this solution is not free of problems, as resource consumption has to be tackled.
Consequently, we propose a resource-aware data recommendation engine to deal with resource consumption. In addition to the recommender itself, the functionality that makes recommendations possible must also be available. Text mining techniques are needed to detect activities in undefined formats; the usefulness of this technique to determine social behavior is demonstrated in Java et al. (2007). In addition, Thelwall et al. (2010) present a method for extracting emotional information from social network communication.

As mentioned above, recommenders have been widely used to make recommendations on the web. Due to the structure of the web and its users, social graphs are often integrated in social recommenders Papadopoulos et al. (2010). In fact, in Harary & Norman (1953) the social graph has been defined as the global mapping of everybody and how they are related. In the solution proposed in this thesis, a social graph is used for social recommendations. The main contribution of our social graph is that a graph is built for each user and each node of the graph can be either another user or a group of users. The motivation behind this novelty is that users do not interact with all friends in the same way. Rather, users tend to have social interactions only with a small group of their social network friends Wilson et al. (2009). The aim of our proposed social graph is to represent the relationships of a user with the user's friends or subgroups of friends and to show how frequently the user interacts with them.

2.9. Conclusion.

From the state of the art just reviewed it is clear that the next stage of data mining will have to deal with the production of components to be executed in u-devices. For this reason, their execution needs to be configured automatically according to context and resources. The intelligence provided by data mining models in multiple domains can be obtained dynamically, without human intervention, when the context changes or when the data mining models lose validity or accuracy. One of the domains that would definitely benefit from such developments is that of recommender systems, especially in social networks.
In what follows we present the solution we propose to deal with the challenges just mentioned.

Chapter 3

Setting the problem

3.1. Introduction.

In the background chapter, the main gaps and challenges regarding the development of a new generation of data mining components to be executed in ubiquitous environments have been analyzed. In this thesis we go deeper into the problem of data mining algorithm execution, and in particular how to find an "appropriate" configuration for the algorithm execution in a particular situation defined by resource state and context. Finding an appropriate algorithm configuration means dealing with the challenges of autonomy, adaptability and feasibility. We will assume in this work that the behavior of the data mining algorithm is deterministic. Consequently, we propose to split the problem of finding the appropriate configuration of an algorithm into the following subproblems:

1. Calculating a model of the behavior of a data mining algorithm. By a behavior model we understand a model that, given the inputs of an algorithm (algorithm parameters, input dataset), is able to estimate the resources (memory and CPU cycles) used by the algorithm and the quality of the data mining model the algorithm will obtain.
2. Calculating, for a given situation (context and resource constraints), the most appropriate configuration of the data mining algorithm, taking into account its behavior model.
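The two subproblems can be read as a pair of interfaces, sketched below under strong simplifying assumptions. All names (`Estimate`, `Situation`, `toy_behavior_model`, `select_configuration`), the `max_depth` parameter and the formulas inside the toy model are hypothetical and do not come from this thesis. Context is left out, so a situation is reduced to resource limits, and the candidate set is small enough to enumerate.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional

Configuration = Dict[str, int]    # hypothetical: parameter name -> value
DatasetFeatures = Dict[str, int]  # hypothetical: feature name -> value


@dataclass(frozen=True)
class Estimate:
    """Output of a behavior model: estimated resources and model quality."""
    memory_kb: float
    cpu_cycles: float
    quality: float


@dataclass(frozen=True)
class Situation:
    """Resource constraints only; context is omitted in this sketch."""
    max_memory_kb: float
    max_cpu_cycles: float


# Subproblem 1: a behavior model maps (configuration, dataset) to an estimate.
BehaviorModel = Callable[[Configuration, DatasetFeatures], Estimate]


def toy_behavior_model(config: Configuration, dataset: DatasetFeatures) -> Estimate:
    """Stand-in model with invented formulas; deterministic by construction."""
    depth, rows = config["max_depth"], dataset["rows"]
    return Estimate(
        memory_kb=rows * depth / 10,
        cpu_cycles=rows * depth * 100,
        quality=1 - 1 / (1 + depth),
    )


# Subproblem 2: pick the feasible configuration with the best estimated quality.
def select_configuration(
    model: BehaviorModel,
    dataset: DatasetFeatures,
    situation: Situation,
    candidates: Iterable[Configuration],
) -> Optional[Configuration]:
    best, best_quality = None, float("-inf")
    for config in candidates:
        est = model(config, dataset)
        feasible = (est.memory_kb <= situation.max_memory_kb
                    and est.cpu_cycles <= situation.max_cpu_cycles)
        if feasible and est.quality > best_quality:
            best, best_quality = config, est.quality
    return best  # None when no candidate fits the situation


DATASET = {"rows": 1000}
CANDIDATES = [{"max_depth": d} for d in (2, 4, 8)]
chosen = select_configuration(
    toy_behavior_model, DATASET, Situation(500, 1_000_000), CANDIDATES)
```

Because the behavior model is assumed to be deterministic, the selection step can rely on its estimates without running the data mining algorithm itself. When no candidate satisfies the constraints, the function returns `None`.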

In this thesis, for the first subproblem, we present a data mining based approach. In particular, we propose to analyze the past behavior of an algorithm in order to extract a model able to predict its future behavior. We will concentrate on the design and implementation of a model that estimates both the quality of the model obtained by a data mining algorithm and the cost of obtaining such a model. We call our method EE-Model, which stands for Efficacy (quality of the model) and Efficiency (resources needed to obtain the model). The EE-Model takes into account information on the execution of a particular data mining algorithm in a device and estimates the efficiency and efficacy of that execution. We assume that the data used to build the model come from executions of the algorithm in different devices and with possibly different datasets. This makes it possible to generate a general-purpose model. Nevertheless, in some cases it may be necessary to adjust the model for a particular device and a particular domain. Consequently, in this thesis we will also present the development of a particularized model for a predetermined device and predetermined datasets, which has been called P-EE-Model. The EE-Model estimates the behavior of a data mining algorithm under changing device and dataset features, and consequently produces more flexible and adaptable, but less accurate, models. On the other hand, the P-EE-Model ends up with more accurate models at the cost of flexibility. In Section 3.1 we will further analyze the requirements of both methods.

The second objective of this thesis is to determine, given a behavior model of a data mining algorithm and a particular situation determined by information regarding context and resources, which is the best configuration of the algorithm.
The solution of this problem is not straightforward, as it involves enumerating each possible configuration and then using the EE-Model to analyze its feasibility and cost. Thus, the two issues to be analyzed in each situation in which the configuration of the algorithm is required are as follows:

• Checking each possible configuration to see whether it is feasible and, if feasible, calculating the cost of the execution and the quality of the model produced.

• Choosing among the feasible configurations the most appropriate configuration according to available context and resource information.

For the first issue, every possible algorithm configuration has to be tried. The number of possible configurations depends on the number of parameters of the algorithm and the number of possible values of each parameter: as the number of parameters grows, the number of configurations grows exponentially. So, a subset of the feasible configurations is needed to restrict the computational cost. Thus, in this thesis we propose a method to calculate a set of representative configurations, so as to decrease the number of configurations to be tested. Once the set of representative feasible configurations is obtained, an algorithm is required to find the most appropriate configuration for a given situation and behavior model. The method that calculates representative configurations, together with the algorithm that finds the most appropriate one in a given situation, has been called in this thesis the SC-Mapper, which stands for situation to configuration mapper.

Figure 3.1 depicts the whole approach we present in this thesis. The left-hand side of the figure shows the method to obtain either the EE-Model or the P-EE-Model, while the right-hand side depicts the SC-Mapper.

The three methods together fulfill the objectives we set when stating the problem:

• Feasibility is provided by means of the EE-Model, which makes it possible to know the cost of each configuration.
• Adaptability is provided by the SC-Mapper, which makes it possible to choose the most appropriate configuration for each particular situation.

• Autonomy is provided by the EE-Model together with the SC-Mapper, as together they make it possible to find a configuration to run in order to obtain a data mining model of a certain quality in a given situation.

Figure 3.1: Summary

In the introduction, it has been mentioned that many domains would benefit from the development of autonomous data mining components. In the related work, recommenders were described as one of the possible scenarios in which this kind of component would be interesting. On the other hand, social networks have been shown to be one domain that would clearly benefit from data mining models. Consequently, in this thesis we propose to validate the proposal by implementing a recommender for the case of a social network and later applying the developed EE-Model to the recommender. In Chapter 5, we will consequently present a recommender for activities called SOMAR, and later we will present how to instantiate the EE-Model so that the model on which recommendations are based in SOMAR is updated as data changes. In what follows we detail the assumptions for the