

Credit card fraud is a deliberate criminal activity carried out for the financial benefit of an individual or a group of individuals. Fraud prevention technologies have long been used in the banking industry to stop fraudulent transactions. However, because fraudsters are adaptive, they usually find ways around credit card fraud prevention technologies.

Credit card fraud detection within the banking industry is an evolving discipline concerned with detecting the fraudulent transactions that prevention technologies fail to stop. Machine learning and statistics are two broad fields that have demonstrated their effectiveness in fraud detection. State-of-the-art statistical and machine learning classification techniques commonly used to detect credit card fraud include logistic regression (LR), artificial neural networks (ANN), support vector machines (SVM), decision trees, k-nearest neighbours (k-NN) and outlier detection (Malini & Pushpa 2017, pp. 1 – 4; Ganji 2012, pp. 1 – 5; Das, et al. 2017, pp. 1 – 4; Manjaramkar & Kokare 2017, pp. 1 – 4; Liu, et al. 2017, pp. 1 – 6; Mao, et al. 2017, pp. 1 - 8).

Given a particular dataset, some classification techniques are more successful than others at detecting future fraudulent credit card transactions. These techniques require historical data in order to detect future fraudulent transactions effectively. However, historical data usually brings the challenge of imbalanced data: the percentage of illegal transactions is far lower than the percentage of legal transactions (Gosain & Sardana 2017, pp. 79 – 85; Matsuda & Murase 2016, pp. 349 – 354; Pengfei, et al. 2014, pp. 217 - 222).

To address the issue of imbalanced data, techniques such as oversampling the minority class and under-sampling the majority class are used. In this research study, the oversampling techniques Synthetic Minority Oversampling Technique (SMOTE) and Safe-Level SMOTE (SL-SMOTE) are used to generate synthetic observations between pairs of minority-class observations. SMOTE oversamples the minority class by taking a minority observation, selecting one of its k nearest minority-class neighbours by Euclidean distance, and creating a synthetic observation at a random point on the line segment joining the two; for nominal features, the SMOTE-NC variant adds the median of the standard deviations of the continuous features to the Euclidean distance whenever nominal values differ (Chawla, et al. 2002, pp. 321 - 357). SL-SMOTE, in contrast, assigns each minority observation a safe level, the number of minority observations among its k nearest neighbours, and places each synthetic observation closer to the safer of the two observations so that it falls in a safe region of the minority class. Both SL-SMOTE and SMOTE have been used successfully (Bunkhumpornpat & Subpaiboonkit 2013, pp. 570 – 575; Gosain & Sardana 2017, pp. 79 – 85; Bunkhumpornpat, et al. 2011, pp. 1 – 4; Meidianingsil, et al. 2017, pp. 1167 – 1171).
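
The interpolation step described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the implementation used in this research study: the function names (`smote`, `safe_level`, `safe_level_smote`) are hypothetical, `smote` picks seed observations at random rather than cycling through every minority observation as Chawla, et al. (2002) do, and `safe_level_smote` assumes the neighbour is drawn from the nearest minority-class observations and draws a single gap per synthetic observation. Its gap ranges are modelled on the published Safe-Level SMOTE rules and should be checked against that algorithm before use.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest_indices(i, rows, k):
    """Indices of the k rows closest to rows[i], excluding row i itself."""
    others = sorted((euclidean(rows[i], rows[j]), j) for j in range(len(rows)) if j != i)
    return [j for _, j in others[:k]]


def interpolate(p, n, gap):
    """Point at fraction `gap` of the way from p to n (gap=0 gives p, gap=1 gives n)."""
    return [xp + gap * (xn - xp) for xp, xn in zip(p, n)]


def smote(minority, n_synthetic, k=5, seed=0):
    """Plain SMOTE: interpolate between a random minority row and one of its
    k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        j = rng.choice(k_nearest_indices(i, minority, k))
        synthetic.append(interpolate(minority[i], minority[j], rng.random()))
    return synthetic


def safe_level(i, X, y, k, minority=1):
    """Safe level of row i: minority rows among its k nearest neighbours in the full dataset."""
    return sum(1 for j in k_nearest_indices(i, X, k) if y[j] == minority)


def safe_level_smote(X, y, k=5, minority=1, seed=0):
    """One Safe-Level-SMOTE-style pass: at most one synthetic row per minority row."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == minority]
    pos_rows = [X[i] for i in pos]
    synthetic = []
    for a, p in enumerate(pos):
        # Assumption: the neighbour n is one of p's k nearest minority-class rows.
        n = pos[rng.choice(k_nearest_indices(a, pos_rows, k))]
        sl_p, sl_n = safe_level(p, X, y, k, minority), safe_level(n, X, y, k, minority)
        if sl_n == 0:
            if sl_p == 0:
                continue  # both rows look like noise: generate nothing
            gap = 0.0  # n is unsafe: duplicate p
        else:
            ratio = sl_p / sl_n
            if ratio == 1:
                gap = rng.random()  # equally safe: anywhere on the segment
            elif ratio > 1:
                gap = rng.uniform(0, 1 / ratio)  # p is safer: stay near p
            else:
                gap = rng.uniform(1 - ratio, 1)  # n is safer: move towards n
        synthetic.append(interpolate(X[p], X[n], gap))
    return synthetic
```

Because every synthetic observation is a convex combination of two minority observations, it always lies on the segment between them; the safe-level variant only changes where on that segment it is placed.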

The stacking ensemble method combines several different machine learning techniques and allows them to vote for an output. Stacking ensembles have previously produced their output using both weighted voting and majority voting (Ali, et al. 2015, pp. 211 – 216; Li & Wang 2017, pp. 73 – 77; Dalvi & Vernekar 2016, pp. 1747 - 1751). However, this research study rejects majority voting for a binary classification problem on the basis that, when votes are combined, a model with 10% predictive accuracy should not be treated the same as a model with 90% predictive accuracy. For this reason, weighted voting is used to decide the final class of each transaction in this research study. Weighted voting is widely used (Ali, et al. 2015, pp. 211 – 216; Mu, et al. 2005, pp. 2661 – 2666; Li & Wang 2017, pp. 73 - 77).
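
The difference between the two voting schemes can be illustrated with three hypothetical base classifiers, where label 1 denotes a fraudulent transaction and label 0 a legitimate one. The per-class weights (`weights[i][c]`) are illustrative values echoing the 10% versus 90% example, not weights from this study, and resolving ties towards the legitimate class is an arbitrary choice of this sketch.

```python
def majority_vote(predictions):
    """Unweighted vote over binary labels (0 = legitimate, 1 = fraud)."""
    return 1 if sum(predictions) > len(predictions) / 2 else 0


def weighted_vote(predictions, weights):
    """Weighted vote: classifier i adds weights[i][c] to the score of the class c it predicts.

    Ties are resolved towards class 0 (legitimate), an arbitrary choice.
    """
    score = [0.0, 0.0]
    for pred, w in zip(predictions, weights):
        score[pred] += w[pred]
    return 1 if score[1] > score[0] else 0


# Two weak classifiers say "legitimate", one strong classifier says "fraud".
predictions = [0, 0, 1]
weights = [[0.1, 0.1], [0.1, 0.1], [0.9, 0.9]]  # hypothetical per-class weights

print(majority_vote(predictions))           # 0
print(weighted_vote(predictions, weights))  # 1
```

Majority voting lets the two weak classifiers outvote the strong one, whereas weighted voting lets the strong classifier's vote dominate.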

The weight given to each classifier's vote for a class should be high when the classifier performs well on that class and low when it performs poorly on it. Selecting appropriate weights for each class per classifier is therefore very important. To select these weights, this research study uses the Differential Evolution (DE) method to search for optimum weights. DE is a stochastic optimization method that is simple in structure but efficient for global numerical optimization (Funaki & Takagi 2011, pp. 287 - 290). DE has been applied successfully in many studies (Bouteldja & Batouche 2017, pp. 1 – 8; Domingo, et al. 2013, pp. 105 – 111; Hui & Suganthan 2013, pp. 135 – 142; Funaki & Takagi 2011, pp. 287 - 290; Goudos, et al. 2016, pp. 1 - 4).
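
A minimal sketch of the classic DE/rand/1/bin scheme follows, assuming the voting weights are encoded as a vector of real numbers within fixed bounds and that lower objective values are better. The function name, the parameter defaults (`pop_size`, `F`, `CR`, `generations`) and the quadratic test objective are illustrative assumptions, not the settings used in this research study.

```python
import random


def differential_evolution(objective, bounds, pop_size=20, F=0.8, CR=0.9,
                           generations=200, seed=0):
    """Minimise `objective` over a box using DE/rand/1/bin (requires pop_size >= 4)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fitness = [objective(ind) for ind in pop]
    for _ in range(generations):
        new_pop, new_fit = [ind[:] for ind in pop], fitness[:]
        for i in range(pop_size):
            # Mutation: three distinct individuals, all different from the target i
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            mutant = [pop[a][d] + F * (pop[b][d] - pop[c][d]) for d in range(dim)]
            # Clip the mutant back into the search box
            mutant = [min(max(v, lo), hi) for v, (lo, hi) in zip(mutant, bounds)]
            # Binomial crossover; j_rand guarantees at least one gene from the mutant
            j_rand = rng.randrange(dim)
            trial = [mutant[d] if (rng.random() < CR or d == j_rand) else pop[i][d]
                     for d in range(dim)]
            # Greedy selection: the trial replaces the target if it is no worse
            f_trial = objective(trial)
            if f_trial <= fitness[i]:
                new_pop[i], new_fit[i] = trial, f_trial
        pop, fitness = new_pop, new_fit
    best = min(range(pop_size), key=fitness.__getitem__)
    return pop[best], fitness[best]


# Illustrative check on a simple quadratic whose minimum is 0 at the origin
best_w, best_f = differential_evolution(lambda w: sum(x * x for x in w), [(-5.0, 5.0)] * 3)
print(best_w, best_f)
```

In the setting described above, `objective` would score a candidate weight vector by, for example, the negative of a class-balanced measure of the weighted vote on a validation set; that choice of objective is hypothetical.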

Once transactions have been classified, the model must be evaluated. Various studies have used the accuracy score to measure the performance of classifiers (Zaza & Al-Emran 2015, pp. 275 – 279; Nizar, et al. 2008, pp. 1 – 8; Mei & Jiang 2016, pp. 301 – 305; Singh, et al. 2015, pp. 388 - 393). The accuracy score is the number of correctly predicted transactions divided by the total number of transactions (Singh, et al. 2015, pp. 388 - 393). Because it is a single aggregate figure, the number of fraudulent transactions classified correctly and the number of non-fraudulent transactions classified correctly cannot be derived from the accuracy score.
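
The limitation is easy to see on a small hypothetical dataset: with 990 legitimate and 10 fraudulent transactions, a classifier that labels every transaction as legitimate still scores 99% accuracy while detecting no fraud at all. The figures are illustrative, not results from this study.

```python
def accuracy(y_true, y_pred):
    """Share of transactions whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical data: 990 legitimate (0) and 10 fraudulent (1) transactions
y_true = [0] * 990 + [1] * 10
# A trivial "classifier" that labels every transaction as legitimate
y_pred = [0] * 1000

print(accuracy(y_true, y_pred))  # 0.99
frauds_caught = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
print(frauds_caught)  # 0
```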

A meaningful measure of the model's performance must therefore reflect how well it predicts each class, fraudulent and non-fraudulent, rather than only the overall share of correct predictions, which on imbalanced data is dominated by the majority class. The aim is to build a classification model that represents both the fraudulent and the non-fraudulent class equally. Accordingly, this research study evaluates the classification model using the confusion matrix computed over the fraudulent and non-fraudulent classes. The confusion matrix shows the number of true positive (TP), true negative (TN), false positive (FP) and false negative (FN) predictions (Rajak & Mathai 2015, pp. 1 - 4).

The number of true positive (TP) transactions is the number of transactions that the base classifier predicted as fraudulent and that were in fact fraudulent. Similarly, the number of true negative (TN) transactions is the number predicted as non-fraudulent that were in fact non-fraudulent; the number of false positive (FP) transactions is the number predicted as fraudulent that were in fact non-fraudulent; and the number of false negative (FN) transactions is the number predicted as non-fraudulent that were in fact fraudulent. The confusion matrix evaluation method has been applied successfully in previous studies (Rajak & Mathai 2015, pp. 1 – 4; West & Bhattacharya 2016, pp. 1796 - 1801).
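
The four counts can be computed directly from the true and predicted labels. This sketch assumes label 1 marks a fraudulent transaction (the positive class) and 0 a non-fraudulent one; the function names and example labels are hypothetical. It also derives the true positive rate and true negative rate, which show how well each class is predicted on its own.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN), treating `positive` (fraud) as the positive class."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1  # predicted fraud, actually fraud
            else:
                fp += 1  # predicted fraud, actually legitimate
        else:
            if t == positive:
                fn += 1  # predicted legitimate, actually fraud
            else:
                tn += 1  # predicted legitimate, actually legitimate
    return tp, tn, fp, fn


def per_class_rates(tp, tn, fp, fn):
    """True positive rate (fraud caught) and true negative rate (legitimate kept)."""
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return tpr, tnr


y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]
counts = confusion_counts(y_true, y_pred)
print(counts)  # (2, 4, 1, 1)
print(per_class_rates(*counts))
```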
