
The above experimental results have shown the strong performance of our OWA fuzzy rough set based classifier from Chapter 3 relative to existing semi-supervised classification methods. However, it is important to note that we have only included semi-supervised classifiers performing self-labelling in the comparison in Section 5.2.3. Alternative approaches (see Section 5.1.2) are not represented. The reason for this is that our ‘Lower’ method is in fact not a true semi-supervised classifier, in the sense that it does not use the elements in U to derive its class predictions. We have shown in Section 5.2.2 that our fuzzy rough set based method does not benefit from self-labelling in the classification of semi-supervised data. We only wished to verify whether its performance consequently sits below that of algorithms that do use self-labelling to their advantage. This is not the case.

Table 5.7: Results of the Friedman test comparing the algorithms from Table 5.6. P-values implying statistically significant differences at the 5% significance level are printed in boldface.

Transductive

                    10%                    20%
Algorithm           Rank        pHolm      Rank        pHolm
Lower               3.0333 (1)  -          2.7833 (1)  -
CoTr(SMO)           5.5000 (7)  0.000577   4.9667 (5)  0.002225
TriTr(C45)          4.8000 (5)  0.020866   5.2000 (7)  0.000797
DemCo               5.2333 (6)  0.002521   5.1500 (6)  0.000913
CoBag(C45)          6.1333 (8)  0.000007   5.8333 (8)  0.000010
SEGSSC-TriTr(C45)   3.6000 (2)  0.740528   3.5000 (2)  0.257151
SEGSSC-DemCo        3.6000 (2)  0.740528   4.0667 (3)  0.084980
SEGSSC-CoBag(C45)   4.1000 (4)  0.275071   4.5000 (4)  0.019925
pFriedman           0.000001               0.00001

                    30%                    40%
Algorithm           Rank        pHolm      Rank        pHolm
Lower               2.7000 (1)  -          2.8667 (1)  -
CoTr(SMO)           4.4333 (3)  0.012264   4.4000 (3)  0.030666
TriTr(C45)          5.1000 (7)  0.000887   5.0833 (6)  0.002284
DemCo               4.9333 (5)  0.001655   4.5833 (4)  0.019925
CoBag(C45)          5.6333 (8)  0.000025   5.2000 (8)  0.001574
SEGSSC-TriTr(C45)   3.5667 (2)  0.170587   3.9000 (2)  0.102292
SEGSSC-DemCo        4.6000 (4)  0.007989   4.8000 (5)  0.008946
SEGSSC-CoBag(C45)   5.0333 (6)  0.001124   5.1667 (7)  0.001657
pFriedman           0.000057               0.002493

Inductive

                    10%                    20%
Algorithm           Rank        pHolm      Rank        pHolm
Lower               3.0333 (1)  -          2.9000 (1)  -
CoTr(SMO)           5.5000 (8)  0.000673   4.6000 (5)  0.028758
TriTr(C45)          5.2333 (5)  0.002017   5.4333 (7)  0.000371
DemCo               5.4667 (7)  0.000716   5.2833 (6)  0.000822
CoBag(C45)          5.4000 (6)  0.000913   5.7333 (8)  0.000052
SEGSSC-TriTr(C45)   3.4667 (2)  0.858391   3.4667 (2)  0.370264
SEGSSC-DemCo        3.5333 (3)  0.858391   4.3167 (4)  0.075283
SEGSSC-CoBag(C45)   4.3667 (4)  0.105045   4.2667 (3)  0.075283
pFriedman           0.00004                0.00002

                    30%                    40%
Algorithm           Rank        pHolm      Rank        pHolm
Lower               3.3167 (1)  -          2.8167 (1)  -
CoTr(SMO)           4.1000 (3)  0.431018   4.6500 (4)  0.011239
TriTr(C45)          5.1500 (7)  0.022479   4.9333 (6)  0.004088
DemCo               4.3667 (4)  0.290625   4.7333 (5)  0.009765
CoBag(C45)          5.7333 (8)  0.000930   5.6000 (8)  0.000075
SEGSSC-TriTr(C45)   3.8000 (2)  0.444738   3.7667 (2)  0.133076
SEGSSC-DemCo        4.5333 (5)  0.290625   4.5333 (3)  0.013284
SEGSSC-CoBag(C45)   5.0000 (6)  0.038887   4.9667 (7)  0.013284
pFriedman           0.003332               0.000664

Table 5.8: Results of the Wilcoxon test comparing our fuzzy rough classifier ‘Lower’ to the other algorithms in Table 5.6 in the format ‘R+/R-/p’. The R+ value always corresponds to the fuzzy rough method. P-values implying statistically significant differences at the 5% significance level are printed in boldface.

Transductive

Algorithm           10%                     20%
CoTr(SMO)           399.0/66.0/0.000593     386.5/78.5/0.001443
TriTr(C45)          369.0/96.0/0.004834     376.0/89.0/0.003058
DemCo               372.0/93.0/0.003982     360.0/75.0/0.001987
CoBag(C45)          386.0/79.0/0.001537     388.0/77.0/0.001334
SEGSSC-TriTr(C45)   316.0/149.0/0.084035    342.0/123.0/0.023665
SEGSSC-DemCo        287.0/178.0/0.257946    345.5/119.5/0.019257
SEGSSC-CoBag(C45)   285.0/180.0/0.275659    339.0/126.0/0.027749

Algorithm           30%                     40%
CoTr(SMO)           373.0/92.0/0.00373      363.0/102.0/0.00705
TriTr(C45)          386.0/79.0/0.001537     363.0/72.0/0.001594
DemCo               372.0/93.0/0.003982     374.0/91.0/0.003492
CoBag(C45)          390.0/75.0/0.001155     387.0/78.0/0.001432
SEGSSC-TriTr(C45)   316.0/149.0/0.084035    338.0/127.0/0.029239
SEGSSC-DemCo        287.0/178.0/0.257946    355.5/79.5/0.002674
SEGSSC-CoBag(C45)   285.0/180.0/0.275659    368.0/97.0/0.005153

Inductive

Algorithm           10%                     20%
CoTr(SMO)           394.0/71.0/0.000862     368.0/97.0/0.005153
TriTr(C45)          380.0/85.0/0.002274     387.0/78.0/0.001432
DemCo               377.0/88.0/0.002789     375.0/90.0/0.003269
CoBag(C45)          389.5/75.5/0.001163     397.0/68.0/0.000689
SEGSSC-TriTr(C45)   298.0/167.0/0.174619    329.0/136.0/0.046029
SEGSSC-DemCo        297.0/168.0/0.181242    340.5/124.5/0.025259
SEGSSC-CoBag(C45)   281.0/184.0/0.312281    325.0/140.0/0.055767

Algorithm           30%                     40%
CoTr(SMO)           347.0/118.0/0.018013    340.0/95.0/0.007822
TriTr(C45)          347.5/87.5/0.004662     376.0/89.0/0.003058
DemCo               324.5/140.5/0.056459    362.5/102.5/0.007122
CoBag(C45)          388.0/77.0/0.001334     387.0/78.0/0.001432
SEGSSC-TriTr(C45)   320.0/145.0/0.070294    349.0/116.0/0.016106
SEGSSC-DemCo        359.0/106.0/0.008821    370.0/95.0/0.004534
SEGSSC-CoBag(C45)   352.0/113.0/0.013549    373.0/92.0/0.00373
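The Rank and pHolm columns of Table 5.7 and the R+/R- entries of Table 5.8 come from standard non-parametric procedures. The Python sketch below is one way to compute them under the usual assumptions, which this section does not spell out: Friedman average ranks with rank 1 for the best algorithm on a dataset, a z-test on the difference between each algorithm's average rank and that of the control method ‘Lower’, Holm's step-down correction, and Wilcoxon rank sums with zero differences discarded. All function names are hypothetical. Fed with the transductive 10% ranks of Table 5.7, it should return adjusted p-values agreeing with the reported ones up to rounding.

```python
import math
from typing import Dict, List, Tuple


def tied_ranks(values: List[float]) -> List[float]:
    """1-based ascending ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def friedman_average_ranks(scores: List[List[float]]) -> List[float]:
    """Rows are datasets, columns are algorithms; the best score on a dataset gets rank 1."""
    totals = [0.0] * len(scores[0])
    for row in scores:
        for j, r in enumerate(tied_ranks([-s for s in row])):
            totals[j] += r
    return [t / len(scores) for t in totals]


def holm_vs_control(avg_ranks: List[float], control: int, n_datasets: int) -> Dict[int, float]:
    """Holm-adjusted p-values of every algorithm compared with the control method."""
    k = len(avg_ranks)
    se = math.sqrt(k * (k + 1) / (6.0 * n_datasets))
    others = [j for j in range(k) if j != control]
    # z = rank difference / se; two-sided p = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    raw = {j: math.erfc(abs(avg_ranks[j] - avg_ranks[control]) / se / math.sqrt(2))
           for j in others}
    adjusted: Dict[int, float] = {}
    running, m = 0.0, len(others)
    for step, j in enumerate(sorted(others, key=lambda i: raw[i])):
        # i-th smallest p-value times (m - i + 1), kept non-decreasing and capped at 1
        running = max(running, min(1.0, (m - step) * raw[j]))
        adjusted[j] = running
    return adjusted


def wilcoxon_rank_sums(a: List[float], b: List[float]) -> Tuple[float, float]:
    """R+ (sum of ranks where a beats b) and R- (where b beats a); ties a == b dropped."""
    diffs = [x - y for x, y in zip(a, b) if x != y]
    ranks = tied_ranks([abs(d) for d in diffs])
    r_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    r_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return r_plus, r_minus


# Average ranks of the transductive 10% block of Table 5.7; 'Lower' is the control.
names = ["Lower", "CoTr(SMO)", "TriTr(C45)", "DemCo", "CoBag(C45)",
         "SEGSSC-TriTr(C45)", "SEGSSC-DemCo", "SEGSSC-CoBag(C45)"]
ranks = [3.0333, 5.5000, 4.8000, 5.2333, 6.1333, 3.6000, 3.6000, 4.1000]
p_holm = holm_vs_control(ranks, control=0, n_datasets=30)
for j in sorted(p_holm):
    print(f"{names[j]:<20}{p_holm[j]:.6f}")
```

With 30 datasets and no zero differences, R+ + R- equals 30 · 31 / 2 = 465, which is the total found in most cells of Table 5.8. The running maximum in the Holm step explains why the two SEGSSC variants tied in second place share the same pHolm value.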

We do not claim that there exists no possibility other than self-labelling to further enhance the performance of our fuzzy rough classifier on semi-supervised data by exploiting the information in U in its prediction mechanism. One possible area of future research lies with the modification of the definitions of the fuzzy rough approximation operators (3.7)-(3.8) according to the presence of unlabelled training instances. In their present form, the lower approximation of a class C relies on labelled training instances not in C and the upper approximation of C relies on labelled training instances in C. In other words, the former relies on elements in L_co(C) and the latter on elements in L_C. On a semi-supervised dataset, these sets could be extended with certain elements in U, for instance by including unlabelled elements near x when computing the lower and upper approximation memberships C̲(x) and C̄(x), and weighing their contribution according to how strongly they relate to co(C) or C, respectively. Inspiration for this weighting could be found in the techniques listed in Section 5.1.2. This approach stays close to self-labelling, which iteratively adds elements to L and thereby naturally extends the sets L_co(C) and L_C, but differs from it by (i) only performing one ‘labelling’ iteration and (ii) using an adaptive set of instances in U for the prediction of each element x (namely, only those sufficiently near x).

Two other components are present in definitions (3.7)-(3.8), namely the instance similarity relation R(x,·) and the OWA weight vectors W_L and W_U. The former could be replaced by an alternative that directly incorporates the information in U, for example in a similar way to how semi-supervised support vector machine classifiers use the unlabelled training elements. As described in Section 5.1.2, these methods avoid letting the decision boundary cross dense regions of unlabelled elements. An option for our fuzzy rough set based methods would be to penalize the similarity between labelled instances separated by a dense unlabelled region. Very informally put, this corresponds to considering two people standing on either bank of a 50 meter wide river as more distant from each other than when they are standing at either end of a 50 meter long meadow: it is more difficult to cross the former than the latter. Another option is the integration of the training set label sparsity in the definitions of W_L and W_U.

The weight definitions discussed in Chapter 3 could be updated to accommodate both labelled and unlabelled elements, for instance by interweaving weight vectors of different schemes with each other (e.g. W_L^add for L and W_L^exp for U).
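The weight vectors W_L^add and W_L^exp are not defined in this section. The sketch below assumes the additive and exponential OWA weights commonly used for the soft minimum in OWA fuzzy rough sets, W^add = (2/(p(p+1)), 4/(p(p+1)), ..., 2p/(p(p+1))) and W^exp = (1/(2^p − 1), 2/(2^p − 1), ..., 2^(p−1)/(2^p − 1)), and only shows how each aggregates the same values. How the two would be interwoven for L and U is left open, as in the text.

```python
from typing import List


def additive_weights(p: int) -> List[float]:
    """Linearly increasing weights 2i / (p(p+1)), i = 1..p; they sum to 1."""
    return [2.0 * i / (p * (p + 1)) for i in range(1, p + 1)]


def exponential_weights(p: int) -> List[float]:
    """Doubling weights 2^(i-1) / (2^p - 1), i = 1..p; they sum to 1."""
    return [2.0 ** (i - 1) / (2.0 ** p - 1) for i in range(1, p + 1)]


def owa(values: List[float], weights: List[float]) -> float:
    """The i-th weight multiplies the i-th largest value, so increasing weights give a soft minimum."""
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))


values = [0.9, 0.6, 0.2]
print(round(owa(values, additive_weights(3)), 4))     # 0.45
print(round(owa(values, exponential_weights(3)), 4))  # 0.4143, closer to min(values)
```

The exponential scheme concentrates almost all weight on the smallest values, which is why it stays closer to the strict minimum. Note that `2.0 ** p` overflows for p above roughly 1000, so a practical version for large training sets would need a numerically safer formulation.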

5.3 Conclusion

In this chapter, we have studied the topic of semi-supervised classification. In a semi-supervised training set, only part of the observations are associated with a class label, while the remainder is unlabelled. The classification task can be split into two aspects, transductive and inductive performance. The former refers to the prediction of class labels for unlabelled training elements, while the latter evaluates the predictions for unseen test elements.
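A minimal sketch of the transductive/inductive distinction, under assumptions of our own: a stratified random split keeps the labels of a given fraction of the training set (L) and hides the rest (U), and a 1-nearest-neighbour rule stands in for any classifier that only uses L. Transductive accuracy is computed on U, inductive accuracy on a separate test set. The function names and the toy data are hypothetical.

```python
import random
from collections import defaultdict
from typing import Callable, List, Tuple

Example = Tuple[Tuple[float, ...], int]  # (feature vector, class label)


def semi_supervised_split(train: List[Example], fraction: float, seed: int = 0):
    """Stratified split into a labelled part L and an unlabelled part U."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for ex in train:
        by_class[ex[1]].append(ex)
    labelled, unlabelled = [], []
    for members in by_class.values():
        rng.shuffle(members)
        n_lab = max(1, round(fraction * len(members)))  # keep at least one label per class
        labelled += members[:n_lab]
        unlabelled += members[n_lab:]  # true labels retained only for scoring
    return labelled, unlabelled


def one_nn(labelled: List[Example], x: Tuple[float, ...]) -> int:
    """Placeholder classifier: label of the closest labelled instance."""
    return min(labelled, key=lambda ex: sum((a - b) ** 2 for a, b in zip(ex[0], x)))[1]


def accuracy(predict: Callable, labelled: List[Example], data: List[Example]) -> float:
    return sum(predict(labelled, x) == y for x, y in data) / len(data)


train = [((0.0,), 0), ((0.1,), 0), ((0.2,), 0), ((0.9,), 1), ((1.0,), 1), ((1.1,), 1)]
test = [((0.05,), 0), ((1.05,), 1)]
L, U = semi_supervised_split(train, fraction=0.5)
print("transductive:", accuracy(one_nn, L, U))   # labels predicted for U
print("inductive:", accuracy(one_nn, L, test))   # labels predicted for unseen test data
```

Both evaluations use the same model built from L; they differ only in which instances are scored.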

We have evaluated our fuzzy rough classifiers proposed in Chapter 3 for this task using a set of 30 semi-supervised datasets with 10%, 20%, 30% and 40% of labelled training instances (that is, 30 datasets for each of these percentages). These are fairly basic and easy-to-understand algorithms: to classify an instance x, they compute its membership degree to the OWA based fuzzy rough lower or upper approximation of all decision classes and assign x to the class for which this value is largest. The weighting schemes used within the lower and upper approximation calculations are chosen according to our strategy proposed in Chapter 3. In a first step of our evaluation, we were able to show that our fuzzy rough classifiers retain a strong performance even when only a small portion of the training set is labelled. Using the relatively limited amount of information available in the labelled part of the training set, our methods clearly outperform other base classifiers used in the experimental study of [406]. Only small performance differences are observed between our methods using the lower approximation, the upper approximation or both, and we decided to focus on the lower approximation based classifier. Secondly, we combined this OWA based fuzzy rough lower approximation classifier with existing strong self-labelling techniques and showed that its performance is not improved by extending the labelled part of the training set. Instead, statistically significant decreases in performance were observed. We concluded that the most prudent course of action is to only allow our fuzzy rough method to use the originally labelled training instances to derive its predictions. In the final part of our experiments, we compared the classification performance of our classifier to existing semi-supervised classifiers that do rely on self-labelling. We selected well-performing methods from recent studies [405,406]. Our method outperforms all included algorithms, often with statistical significance.
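The classification rule described above can be sketched in Python as follows. This is not the thesis implementation and omits the weighting scheme selection strategy of Chapter 3. It assumes crisp classes, the formulation in which the lower approximation of a class C aggregates 1 − R(x, y) over labelled y outside C with a soft minimum and the upper approximation aggregates R(x, y) over labelled y in C with a soft maximum (in line with the earlier remark that these rely on L_co(C) and L_C respectively), additive OWA weights, and a hypothetical similarity R(x, y) equal to 1 minus the mean absolute feature difference on features scaled to [0, 1].

```python
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], str]  # (feature vector scaled to [0, 1], class label)


def similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Hypothetical R(x, y): 1 minus the mean absolute feature difference."""
    return 1.0 - sum(abs(a - b) for a, b in zip(x, y)) / len(x)


def owa(values: List[float], soft_min: bool) -> float:
    """OWA with additive weights 2i / (p(p+1)), i = 1..p.

    Values are sorted so that the largest weight lands on the smallest value
    (soft minimum) or on the largest value (soft maximum).
    """
    p = len(values)
    ordered = sorted(values, reverse=soft_min)
    return sum(2.0 * i / (p * (p + 1)) * v for i, v in enumerate(ordered, start=1))


def lower(x, labelled: List[Example], cls: str) -> float:
    """Lower approximation membership of x to cls, based on labelled y outside cls."""
    return owa([1.0 - similarity(x, y) for y, c in labelled if c != cls], soft_min=True)


def upper(x, labelled: List[Example], cls: str) -> float:
    """Upper approximation membership of x to cls, based on labelled y inside cls."""
    return owa([similarity(x, y) for y, c in labelled if c == cls], soft_min=False)


def classify(x, labelled: List[Example], use: str = "lower") -> str:
    """Assign x to the class with the largest approximation membership."""
    score = lower if use == "lower" else upper
    classes = sorted({c for _, c in labelled})
    return max(classes, key=lambda c: score(x, labelled, c))


L = [((0.1, 0.2), "a"), ((0.2, 0.1), "a"), ((0.8, 0.9), "b"), ((0.9, 0.8), "b")]
print(classify((0.15, 0.15), L))               # a (lower approximation rule)
print(classify((0.85, 0.85), L, use="upper"))  # b (upper approximation rule)
```

Only the labelled set L enters the computation, which matches the conclusion that the method uses the originally labelled instances and no self-labelled ones.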

The reader will recognize that this has been an atypical chapter, in which we showed that our existing algorithm from Chapter 3 already performs very well and is only hindered by modifications tailored to this specific classification setting. As should be clear from our discussion in Chapter 1, the difference between supervised and semi-supervised data is of a different order than the difference between traditional data and multi-instance or multi-label data (Chapters 6-7). Semi-supervised data does not imply a new structure of the observations, only a smaller labelled training set. As we will discuss in our overall conclusion in Chapter 8, fuzzy rough set based methods are particularly suited for small classification problems, that is, problems with small to moderately-sized training sets. Aside from this aspect, our optimized weighting scheme selection strategy from Chapter 3 has allowed our OWA based lower approximation predictor to use the limited information in the training set to its full advantage. As discussed in Section 5.2.2, extending the labelled part of the training set by means of self-labelling can result in a training set that is relatively more challenging to learn from (although it contains more labelled data), and this adverse effect is clearly visible for our fuzzy rough classifier due to its strong dependence on instance similarity values.

Our main conclusion from this study is twofold, namely (i) our OWA based fuzzy rough lower approximation classifier proposed in Chapter 3 performs strongly even on semi-supervised data and (ii) the method does not benefit from any self-labelling. We acknowledge that the second conclusion has been derived using only (variants of) established self-labelling methods reviewed in [406]. However, while studying the challenge of semi-supervised classification, we did test many other ideas, both based on fuzzy rough set theory and otherwise, but no improvements over the baseline performance of our method could be obtained. As shown in the experimental study, the performance of the naive fuzzy rough self-labelling method proposed in [311] is also far below that of the other methods reported in this chapter. The inherent characteristic of self-labelling remains that these methods can intuitively be expected to create several dense areas of same-class instances (in which confident class predictions were made) separated by sparser regions (where prediction confusion is present). The resulting self-labelled training sets do not lend themselves as appropriate training data for our fuzzy rough classification algorithm in its current form. Lessons learnt from the development of so-called safe semi-supervised classifiers (e.g. [283,285]), an active topic of current research that aims to ensure that including the unlabelled instances in the prediction process causes no harm, can further assist us in the study of a true semi-supervised fuzzy rough set based classification method. The research question has been whether our fuzzy rough classifier benefits from self-labelling. We have shown that it does not, but wish to stress that this conclusion should not be regarded as a negative result. Instead, we have once again confirmed the strength of our weighting scheme selection strategy proposed in Chapter 3. Its strong performance compared to the existing semi-supervised classifiers as evaluated in Section 5.2.3 furthermore suggests an efficiency gain for the classification of this type of data. Our classifier does not require a self-labelling step, which is usually an iterative procedure in which many classifiers are constructed, and only uses the small labelled training set in its predictions. As a consequence, the classification procedure is both swift and relatively accurate.
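The remark that self-labelling is usually an iterative procedure can be illustrated with a generic self-training loop. This is a simplified sketch under our own assumptions, not one of the methods compared in this chapter (CoTr, TriTr, DemCo, CoBag or SEGSSC): the hypothetical placeholder learner predicts the class of the nearest labelled instance, its confidence is derived from the distances to the two nearest classes, and the threshold of 0.8 is arbitrary. In every round the predictor is re-derived from the grown labelled set.

```python
import math
from typing import Dict, List, Sequence, Tuple

Example = Tuple[Sequence[float], int]


def predict_with_confidence(labelled: List[Example], x: Sequence[float]) -> Tuple[int, float]:
    """Placeholder base learner: nearest class and a confidence in [0.5, 1]."""
    nearest: Dict[int, float] = {}
    for y, c in labelled:
        nearest[c] = min(math.dist(x, y), nearest.get(c, math.inf))
    ranked = sorted(nearest.items(), key=lambda item: item[1])
    if len(ranked) == 1:
        return ranked[0][0], 1.0
    (best, d1), (_, d2) = ranked[0], ranked[1]
    confidence = 1.0 - d1 / (d1 + d2) if d1 + d2 > 0 else 0.5
    return best, confidence


def self_train(labelled: List[Example], unlabelled: List[Sequence[float]],
               threshold: float = 0.8, max_rounds: int = 10):
    """Iteratively move confidently predicted instances from U into L."""
    labelled, pool = list(labelled), list(unlabelled)
    for _ in range(max_rounds):
        # the predictor is re-derived from the current L in every round
        preds = [(x, *predict_with_confidence(labelled, x)) for x in pool]
        added = [(x, c) for x, c, conf in preds if conf >= threshold]
        if not added:
            break
        labelled += added  # self-labelled instances count as labelled from now on
        pool = [x for x, _, conf in preds if conf < threshold]
    return labelled, pool


L = [((0.0,), 0), ((1.0,), 1)]
U = [(0.05,), (0.5,), (0.95,)]
grown, left = self_train(L, U)
print(grown)  # the two confident points join L with labels 0 and 1
print(left)   # (0.5,) stays unlabelled: equally close to both classes
```

Confidently labelled points cluster around the existing labelled ones while the ambiguous middle stays empty, a small-scale version of the dense same-class areas separated by sparser regions described above.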

6 Multi-instance learning

The domain of multi-instance learning (MIL) deals with datasets consisting of compound data samples. Instead of representing an observation as an instance described by a single feature vector, each observation (called a bag) corresponds to a set of instances and, consequently, a set of feature vectors. The instances within a bag can represent different parts or alternative representations of the same object. Initially proposed in [127], the MIL domain has developed into a mature learning paradigm with many real-world applications. A comprehensive overview can be found in the recent book [217].

In this chapter, we propose multi-instance classifiers based on fuzzy and fuzzy rough set theory. Our methods classify unseen bags using either instance-level or bag-level information. We first provide an introduction to MIL in general in Section 6.1 and to multi-instance classification in Section 6.2. Our fuzzy multi-instance classifiers are described in Section 6.3, while Section 6.5 defines our fuzzy rough multi-instance methods developed for class imbalanced multi-instance data. These two sections provide a complete overview of our algorithms and their proposed internal parameter settings.
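As a minimal illustration of the bag structure, the Python sketch below represents a bag as a list of feature vectors carrying one label. The two views are generic examples of instance-level versus bag-level information and are assumptions of this sketch, not the classifiers of Sections 6.3 and 6.5: `instance_view` lets every instance inherit its bag's label, while `bag_view` summarises a bag by the mean of its instances. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vector = Tuple[float, ...]


@dataclass
class Bag:
    instances: List[Vector]  # one feature vector per instance
    label: int               # a single label for the whole bag


def instance_view(bags: List[Bag]) -> List[Tuple[Vector, int]]:
    """Instance-level information: every instance inherits its bag's label."""
    return [(inst, bag.label) for bag in bags for inst in bag.instances]


def bag_view(bags: List[Bag]) -> List[Tuple[Vector, int]]:
    """Bag-level information: each bag summarised by its mean feature vector."""
    out = []
    for bag in bags:
        n = len(bag.instances)
        mean = tuple(sum(col) / n for col in zip(*bag.instances))
        out.append((mean, bag.label))
    return out


bags = [Bag([(0.0, 1.0), (2.0, 3.0)], 1), Bag([(1.0, 1.0)], 0)]
print(len(instance_view(bags)))  # 3
print(bag_view(bags))            # [((1.0, 2.0), 1), ((1.0, 1.0), 0)]
```

Bags may contain different numbers of instances, which is why a bag-level view needs some summary (here the mean) before a single feature vector per observation is available.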

We conduct a thorough experimental validation of our proposals. A large number of experimental results are included and, for the sake of clarity, we divide the experimental study into two main parts. Sections 6.4 and 6.6 compare the different internal settings of our fuzzy and fuzzy rough set methods respectively and explain why certain choices for the given parameters are more appropriate than others. We report these after the sections introducing our frameworks, such that Sections 6.3-6.4 both consider our fuzzy methods, while Sections 6.5-6.6 concern the fuzzy rough set based classifiers. In Section 6.7, with lessons learnt from Sections 6.4-6.6, we compare our methods to existing multi-instance classifiers on both balanced and imbalanced multi-instance datasets. Finally, Section 6.8 sums up our conclusions. We have aimed to retain a clear structure in the large amount of material included in this chapter, which consists of four parts (followed by the conclusion in Section 6.8), summarized as follows:

1. Sections 6.1-6.2 serve as an introduction to the topic of multi-instance classification.
2. Sections 6.3-6.4 introduce our framework of multi-instance classifiers based on fuzzy set theory. The former describes the classification process of our methods in great detail, including illustrative examples and a discussion on their computational complexity. The latter performs an internal experimental comparison of these methods to gain a better insight into their behaviour.
3. Sections 6.5-6.6 consider our fuzzy rough set based multi-instance classifier framework. As before, the former section provides a detailed description of our methods and the latter presents a thorough experimental comparison of their parameter settings.
4. Section 6.7 presents the concluding part of the experimental study conducted in this chapter. Based on the observations made in Sections 6.4-6.6, we compare our proposed algorithms to existing multi-instance classifiers.

6.1 Introduction to multi-instance learning

We first provide a brief introduction to the domain of MIL. Its origins are presented in Section 6.1.1, where we discuss the original proposal [127] by Dietterich et al. initiating research in this field. Having provided the intuition behind multi-instance data, we give its formal description in Section 6.1.2. Several prominent application areas are listed in Section 6.1.3. This section (as well as Section 6.2) is based on the recent book [217] and we refer the interested reader to that work for a more detailed survey of the field.
