The research has a twofold purpose. The first is to assess the performance of a range of supervised classification and clustering algorithms in calculating credit scores and/or identifying cross-sell candidates; in this context, the research takes a closer look at how well the selected algorithms deal with these tasks. The literature review in chapter 3 found no evidence that any researcher has yet explored in detail the performance of various supervised machine learning algorithms by comparing their outputs on the given financial dataset. The second purpose is to investigate customers’ digital footprint from a data science perspective: to examine changes in customer behaviour across the categorisation types of existing payment transactions, and to develop a forecast model that can identify relevant patterns and predict the categories of payment transactions within an uncategorised payment ecosystem with a small percentage of error. At present, the systematic literature review suggests that no significant research contributions are available that provide new and in-depth insights into this specific field of customer behavioural change, or that apply a data-driven approach to predicting such change across various payment transaction channels in the digital age.

In this thesis, we also aim to review the theoretical concepts behind the data science and predictive analytics techniques applied. Subsequently, we provide an overview of the tools used in the underlying research and how we can leverage them to answer the research questions and gain more valuable insights. Van der Putten (1999) already noted that a large number of choices must be made when detailing data mining objectives, preparing the data, and evaluating the newly gained insights, the applied models and their results. The research methodology is derived from the data science process illustrated in figure 4-2 below, and the process flowchart also reflects the logical structure of the entire chapter.

Figure 4-2: Visual guide to the data science process flowchart

The flowchart provides an overview of the key data mining steps along the applied data science journey for this research. Figure 4-2 also outlines the theoretical framework of the research study, which is based on a four-phase approach described as follows:

(1) Assess the performance of a set of supervised classification and clustering algorithms in terms of whether they are accurate for credit scoring and/or cross-sell candidate predictions.

(2) Investigate the effectiveness of advanced forecasting and predictive methods and assess whether they are suitable and applicable for changing customer behaviour identification based on the categorised payment history.

(3) Obtain and investigate categorised payments data to advance precursor events in uncategorised payments data, and assess whether supervised or unsupervised learning algorithms are the more appropriate methods.

(4) Explore different forecast models and apply them to the transactional dataset to help predict future credit scores for credit applicants and cross-selling candidates for promotions.
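To make the idea of assessing classifier performance in phase (1) more concrete, the following minimal sketch compares two candidate models on the same hold-out labels. The labels, the model names model_a and model_b, and the choice of the F1 score as the ranking criterion are illustrative assumptions; they are not taken from the dataset or the evaluation protocol of this thesis.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification task."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts["tp"], counts["fp"], counts["fn"], counts["tn"]


def scores(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical hold-out labels (1 = positive class, e.g. a defaulted loan)
y_true = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]

# Hypothetical predictions from two candidate models on the same hold-out set
predictions = {
    "model_a": [1, 0, 0, 0, 0, 1, 0, 1, 1, 0],
    "model_b": [1, 0, 0, 1, 0, 0, 0, 0, 0, 0],
}

results = {name: scores(y_true, y_pred) for name, y_pred in predictions.items()}
best = max(results, key=lambda name: results[name]["f1"])

for name, metrics in results.items():
    print(name, {key: round(value, 3) for key, value in metrics.items()})
print("highest F1:", best)
```

In this toy example both models reach the same accuracy, yet they differ in precision and recall. This is one reason why a comparison of algorithms would typically report several metrics rather than accuracy alone.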

The research approach comprises various elements, which are also included in this investigation. Regarding the data science process displayed in figure 4-2, certain types of transactions are diagnosed during the data pre-processing phase, and the exploratory analysis phase uses different data analysis techniques that can be applied to transactional datasets. A further research element is the more effective use of data in existing data analysis methods.

Further research elements of this thesis include scrutinising the effectiveness of advanced forecasting and predictive methods in the transactional dataset. The research investigates the use of statistical and machine learning techniques for the specific field of payment transaction flows. Applying different forecast models, or a combination of cost-sensitive models, is also a major theme of this thesis. The reason is to evaluate the effectiveness of different forecast models for the first two research projects and to identify a high-performing mining algorithm for each research project. The in-depth insights gained from various machine learning algorithms, such as clustering, decision trees, logistic regression, random forests and neural networks, or other supervised learning algorithms like support vector machines, can also be useful for the last research project. These outcomes can serve as a starting point for pre-processing the data in a different way in order to predict uncategorised transactions more efficiently.
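As an illustration of what cost-sensitive modelling can mean in practice, the sketch below chooses a decision threshold for a scoring model by minimising the total misclassification cost rather than the error count. The scores, the cost values and the candidate thresholds are hypothetical and are used for illustration only; the thesis does not prescribe this particular procedure.

```python
def total_cost(y_true, y_score, threshold, cost_fp, cost_fn):
    """Sum the misclassification costs when scores >= threshold are flagged as 1."""
    cost = 0.0
    for truth, score in zip(y_true, y_score):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and truth == 0:
            cost += cost_fp  # false positive, e.g. a good applicant is rejected
        elif predicted == 0 and truth == 1:
            cost += cost_fn  # false negative, e.g. a later defaulter is accepted
    return cost


def best_threshold(y_true, y_score, cost_fp, cost_fn, candidates):
    """Return the candidate threshold with the lowest total cost."""
    return min(
        candidates,
        key=lambda t: total_cost(y_true, y_score, t, cost_fp, cost_fn),
    )


# Hypothetical labels (1 = default) and model scores (estimated default risk)
y_true = [1, 0, 0, 1, 0, 1]
default_scores = [0.9, 0.4, 0.6, 0.35, 0.1, 0.7]
candidates = [0.3, 0.5, 0.7]

# Assumed costs: a missed default is five times as costly as a wrongly rejected applicant
asymmetric = best_threshold(y_true, default_scores, cost_fp=1, cost_fn=5, candidates=candidates)

# With equal costs the choice reduces to minimising the number of errors
symmetric = best_threshold(y_true, default_scores, cost_fp=1, cost_fn=1, candidates=candidates)

print("threshold with asymmetric costs:", asymmetric)
print("threshold with equal costs:", symmetric)
```

With the assumed asymmetric costs the lowest threshold is preferred, because it avoids the expensive false negative. With equal costs the highest threshold produces the fewest errors and is selected instead.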

Overall, the research considers many important research elements, and the four-phase approach requires a well-managed research map. For instance, the effort involved in building a data-driven model should not be underestimated. Figure 4-3 below highlights that the individual steps within the process take up different shares of the overall time. Modelling is only a minor part of building high-end models in the research area of artificial intelligence (AI). Survey evidence suggests that researchers devote roughly 80% of their time to preparing and managing the data for analysis. Thus, data munging¹ is the most time-consuming part of the research study.

Figure 4-3: Approach for building a data-driven model

1 https://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-time-consuming-least-enjoyable-data-science-task-survey-says/#349631686f63, accessed on 27 December 2018.

The research questions are addressed using a real dataset from a Czech bank. The goal of the data science approach demonstrated above is to show common pitfalls in data munging and how to avoid them, given that the success of the research depends on the ability to collect the dataset and to clean and organise it appropriately for data mining purposes. The remaining mining activities, such as building the training set and refining the machine learning algorithms used, account for only up to 20% of the total time. For instance, Miksovsky, Matousek and Kouba (2003) highlighted that the success of every data mining algorithm depends strongly on the quality of data processing, which can be a very complicated and challenging task. Further details about the research design and the research methodology applied will be discussed in the following sections.
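The kind of data munging pitfalls referred to above can be sketched with a small cleaning routine. The field names (trans_id, date, amount, category), the sample rows and the two date formats are hypothetical and do not describe the actual schema of the bank dataset. The sketch handles three common issues: duplicated records, inconsistent date and number formats, and missing values.

```python
from datetime import datetime

# Hypothetical raw transaction rows as they might arrive from a CSV export
RAW_ROWS = [
    {"trans_id": "1", "date": "1995-03-24", "amount": "1000.0", "category": "household"},
    {"trans_id": "1", "date": "1995-03-24", "amount": "1000.0", "category": "household"},  # duplicate
    {"trans_id": "2", "date": "24/03/1995", "amount": "250,50", "category": ""},  # other formats
    {"trans_id": "3", "date": "1995-04-01", "amount": "", "category": "insurance"},  # missing amount
]

# Assumed date formats; a real export may need others
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def parse_date(text):
    """Try each known format and return a date, or None if none matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


def parse_amount(text):
    """Accept a decimal comma or point; return None for missing or invalid values."""
    try:
        return float(text.replace(",", "."))
    except ValueError:
        return None


def clean(rows):
    """Drop duplicate ids, normalise types, and set aside rows that cannot be repaired."""
    seen, cleaned, rejected = set(), [], []
    for row in rows:
        if row["trans_id"] in seen:
            continue  # keep only the first occurrence of each transaction id
        seen.add(row["trans_id"])
        date = parse_date(row["date"])
        amount = parse_amount(row["amount"])
        if date is None or amount is None:
            rejected.append(row)  # kept for inspection rather than silently dropped
            continue
        cleaned.append({
            "trans_id": int(row["trans_id"]),
            "date": date,
            "amount": amount,
            "category": row["category"] or None,  # empty string marks an uncategorised payment
        })
    return cleaned, rejected


cleaned, rejected = clean(RAW_ROWS)
print(len(cleaned), "rows kept,", len(rejected), "rows rejected")
```

Rows that cannot be repaired are collected separately instead of being discarded, so that the share of unusable records can be reported and reviewed before modelling.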