1.2. Tactile sensors: background
1.2.2. Types of sensors according to their operating principle
The following list summarizes the main advantages and disadvantages of learning by pairwise comparison (LPC) in the context of multilabel classification, particularly in comparison to binary relevance (BR). They can be broadly divided into systemic arguments (inherent to the approach) and empirical arguments; it is the objective of this work to provide the experimental evidence for them in the following chapters. Some of these points, although obvious, are stated here for the first time.
• The training complexity of LPC is within a small factor of that of BR, and this factor is determined by the average label cardinality of the training data (cf. Section ). With a super-linear base learner, the comparison can even turn out in favor of LPC (cf. Section ).
• LPC has to train and evaluate a quadratic number of base classifiers (cf. Section ) and keep them in memory. The main objective of this work is to analyze this bottleneck and to develop solutions for overcoming it.
• LPC allows a high degree of parallelization, which becomes increasingly important as the number of cores per processor grows and distributed computing in the cloud advances. Compared to BR, the number of jobs that can run in parallel increases by a factor linear in the number of classes.
• LPC allows classes to be added to or removed from the model even after training, i.e. it supports class-incremental training and testing.
• The LPC decomposition produces smaller problems that are easier to learn, with respect to both accuracy and time (cf. Section ).
• LPC and the chosen model of strict total preferences consider only the case of pairwise exclusion and ignore the pairwise co-occurrence of labels, which may lead to the loss of valuable information (cf. Section ).
• Basic LPC is parameter-free: no configuration or costly parameter fitting is necessary. Moreover, it imposes no prerequisites on the underlying base learner, which also makes the base learner easier to configure.
• The calibration extension integrates naturally into the framework of pairwise preference learning. It provides an elegant solution to bipartitioning and also works completely parameter-free out of the box (cf. Section , ).
• Within the general framework of preference learning, LPC allows encoding and modeling the smallest piece of information available, namely pairwise relations between classes (cf. Section ).
• Unlike holistic approaches, LPC training does not allow for the direct minimization of a particular measure (cf. Section ). However, this step is shifted to the aggregation of the instance-dependent pairwise preferences, for which several specialized solutions exist (cf. Section ). Moreover, compared to holistic approaches that also consider pairwise preferences between labels in a global optimization problem, such as Rank-SVM ( ) or SVMrank ( , ), LPC is much more efficient in training.
• Finally, LPC has been shown to be superior to BR in terms of predictive quality in a wide range of studies on multiclass and multilabel classification (cf. Section ).
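The training-cost argument in the first point of this list can be made concrete with a small count. The sketch below assumes, in line with the pairwise-exclusion model of strict preferences, that an instance is a training example for the classifier of pair (i, j) only when exactly one of the two labels is relevant. The function names and the toy label sets are illustrative assumptions, not data from the experiments in this work.

```python
def br_training_examples(label_sets, n_labels):
    # Binary relevance: n_labels classifiers, each trained on every instance.
    return n_labels * len(label_sets)

def lpc_training_examples(label_sets, n_labels):
    # Pairwise: an instance with relevant set P is used by classifier (i, j)
    # only if exactly one of i, j is in P, i.e. |P| * (n_labels - |P|) times.
    return sum(len(p) * (n_labels - len(p)) for p in label_sets)

def avg_label_cardinality(label_sets):
    return sum(len(p) for p in label_sets) / len(label_sets)

labels = [{0, 1}, {2}, {0, 3, 4}]  # hypothetical toy data, 5 labels
n = 5
br = br_training_examples(labels, n)
lpc = lpc_training_examples(labels, n)
d = avg_label_cardinality(labels)
print(br, lpc, d, n * (n - 1) // 2)  # 15 16 2.0 10
```

Because each instance with |P| relevant labels enters |P|(n - |P|) <= |P| n pairwise problems, the total never exceeds d times the BR count, even though the number of classifiers (10 versus 5 here) grows quadratically.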
The following chapters focus in particular on the first two points of the list, which concern the efficiency of pairwise decomposition, and present appropriate solutions. The remaining points, however, will accompany us throughout this work, and new aspects will be added along the way.
4 Pairwise Learning of Efficient Perceptrons
A main challenge for multilabel classification in general, and for pairwise classification in particular, is the number of instances that have to be processed (cf. Section ). This chapter introduces several possible solutions, all based on the use of an efficient base learner. The examination is not restricted to this particular challenge, however: the study also allows us to experimentally evaluate the most relevant approaches from Chapter , namely binary relevance and pairwise decomposition together with the associated calibration technique, and to empirically confirm some of the advantages and disadvantages claimed in Section .
( ) combined the one-against-all method (cf. Section ) and the label ranking idea (cf. Section ) in their multiclass multilabel perceptron algorithm (MMP). Instead of learning the relevance of each class individually and independently, MMP incrementally trains the entire classifier ensemble as a whole so that it predicts a real-valued relevance score for each class. This is done by always evaluating the performance of the entire ensemble and producing training examples for the individual classifiers only when their corresponding classes are incorrectly ordered in the ranking.
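The ensemble-level update described in the previous paragraph can be sketched as follows. This is a simplified, hypothetical variant in the spirit of MMP, not the exact formulation of the cited work: it uses a uniform weight for every wrongly ordered (relevant, irrelevant) pair and omits any loss-dependent step size. The function name `mmp_update` and the weight layout are assumptions.

```python
import numpy as np

def mmp_update(W, x, relevant):
    """One simplified MMP-style step (uniform update, no loss scaling).

    W: (n_labels, n_features) float array of per-label weights, updated in place.
    x: (n_features,) input vector.
    relevant: set of relevant label indices.
    Returns the number of wrongly ordered (relevant, irrelevant) pairs.
    """
    scores = W @ x  # real-valued relevance score for each class
    n_labels = W.shape[0]
    irrelevant = [s for s in range(n_labels) if s not in relevant]
    # Error set: a relevant label r that is not ranked strictly above an irrelevant s.
    errors = [(r, s) for r in relevant for s in irrelevant if scores[r] <= scores[s]]
    if not errors:
        return 0  # ranking already correct: no individual classifier is trained
    alpha = 1.0 / len(errors)  # uniform weight per error pair
    tau = np.zeros(n_labels)
    for r, s in errors:
        tau[r] += alpha  # move the relevant label's score up
        tau[s] -= alpha  # move the irrelevant label's score down
    W += np.outer(tau, x)
    return len(errors)

W = np.zeros((3, 2))
x = np.array([1.0, 0.5])
first = mmp_update(W, x, {0})   # 2 error pairs: (0, 1) and (0, 2)
second = mmp_update(W, x, {0})  # 0: label 0 is now ranked on top
```

Only the classifiers involved in a misordered pair receive a training signal, which is the key difference from training each class independently as in BR.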
Perceptrons are used as base classifiers. This algorithm has recently received increased attention, especially in text classification, since it is simple, efficient and effective. In addition, perceptrons allow incremental training, which makes them particularly well suited for large-scale classification problems such as the large Reuters Corpus rcv1 benchmark (cf. Section ).
It comprises more than 800,000 documents, which are assigned on average to 3.2 of 101 possible classes. This collection constituted a new challenge in text classification, and in multilabel classification in particular. It is still one of the datasets with the largest number of documents, one of the key dimensions of MLC scalability (cf. Section ). The rcv1 corpus is also an early representative of similar collections from the Web 2.0, which became increasingly common in the years following the publication of rcv1 in .
In this chapter, we propose the use of pairwise decomposition as an alternative training method for an effective and efficient ensemble of perceptrons. Multilabel pairwise perceptrons (MLPP) are trained and used as described in Section , i.e. we train one classifier for each possible class pair, and at test time we produce one overall label ranking by combining the predictions of the individual classifiers through simple voting.
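The training and voting scheme of MLPP can be sketched as below. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the class name `PairwisePerceptrons` and its methods are hypothetical, each pairwise perceptron uses a bias term and the standard mistake-driven update, and ties in the vote count are broken by label index.

```python
import numpy as np
from itertools import combinations

class PairwisePerceptrons:
    """Hypothetical minimal MLPP sketch: one perceptron per label pair (i, j), i < j."""

    def __init__(self, n_labels, n_features):
        self.n_labels = n_labels
        self.pairs = list(combinations(range(n_labels), 2))
        self.W = {p: np.zeros(n_features) for p in self.pairs}
        self.b = {p: 0.0 for p in self.pairs}

    def partial_fit(self, x, relevant):
        # Incremental training on a single example.
        for (i, j) in self.pairs:
            ri, rj = i in relevant, j in relevant
            if ri == rj:
                continue  # both relevant or both irrelevant: no preference, skip
            y = 1.0 if ri else -1.0  # +1 means "label i preferred over label j"
            if y * (self.W[(i, j)] @ x + self.b[(i, j)]) <= 0:  # mistake
                self.W[(i, j)] += y * x
                self.b[(i, j)] += y

    def rank(self, x):
        # Simple voting: each pairwise perceptron gives one vote to its winner.
        votes = np.zeros(self.n_labels)
        for (i, j) in self.pairs:
            if self.W[(i, j)] @ x + self.b[(i, j)] > 0:
                votes[i] += 1
            else:
                votes[j] += 1  # a score of exactly 0 counts as a vote for j
        ranking = [int(k) for k in np.argsort(-votes, kind="stable")]
        return ranking, votes

model = PairwisePerceptrons(n_labels=3, n_features=2)
data = [(np.array([1.0, 0.0]), {0}), (np.array([0.0, 1.0]), {2})]
for _ in range(5):
    for x, rel in data:
        model.partial_fit(x, rel)
print(model.rank(np.array([1.0, 0.0]))[0])  # [0, 2, 1]
```

Each training example only touches the perceptrons of pairs with exactly one relevant label, while prediction still evaluates all n(n - 1)/2 perceptrons.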
This first study demonstrates the multiple advantages of learning and classifying by pairwise comparison, as well as of using the fast perceptrons as base learner: despite the quadratic number of perceptrons and the additional information between labels that is processed, the training cost of MLPP is competitive with that of BR and MMP, since it differs only by a constant factor. In addition, the work of ( ), followed by ( , ), was one of the first to confirm the superiority of the pairwise approach on the multilabel and label ranking tasks. The main reason for this is that, whereas BR and MMP include information about the ranking task in the training signals, the pairwise approach addresses the ranking and bipartition problems by breaking the ranking signal down into elementary binary preferences that induce a final ranking (cf. Section ). As it turns out, perceptrons seem to benefit particularly from the smaller and thus simpler pairwise subproblems.
The basic version of MLPP described here still evaluates a quadratic number of perceptrons for prediction, and we will see in Chapter how this can be substantially improved. Nevertheless, the study in this chapter demonstrates that pairwise perceptrons are well suited to the demands and challenges imposed by large datasets with a vast number of documents, such as Reuters rcv1. At the time of development, it was also the most exhaustive evaluation of pairwise classification in terms of label dimensionality, and probably also in terms of the number of features. To the best of our knowledge, it is also the first and only study dedicated to incremental pairwise learning.
Furthermore, we extended MLPPs with the calibration technique introduced in Section . This chapter reflects the examination on four datasets from different domains carried out in the first major study of this extension of the pairwise preference learning framework ( ). The study demonstrated the effectiveness of the artificial label, and of its combination with the conventional relevance classification approach, for producing accurate bipartitions from the label ranking outputs. Although we did not evaluate competing thresholding techniques (cf. Section ), the chosen experimental framework allows an evaluation that is independent of the bipartitioning. Hence, we can still conclude that the calibrated pairwise approach would outperform the BR approaches even if highly accurate thresholding techniques were used.
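The role of the artificial label can be sketched with two small helpers. This is an illustrative sketch under assumptions, not the code of the cited study: the artificial (virtual) label is given index `n_labels`, the pairwise classifiers are abstracted into a hypothetical callable `prefers(i, j)`, and a label tied with the virtual label in votes is treated as irrelevant. The label-versus-virtual-label signals generated during training coincide with the binary relevance signals, which is the combination with conventional relevance classification mentioned above.

```python
from itertools import combinations

def calibrated_pairwise_examples(relevant, n_labels):
    """Yield (i, j, y) training signals for one instance; y = +1 means i is preferred.
    Index n_labels denotes the artificial label separating relevant from irrelevant."""
    v = n_labels
    for i, j in combinations(range(n_labels), 2):
        if (i in relevant) != (j in relevant):
            yield i, j, (1 if i in relevant else -1)
    for i in range(n_labels):
        # Label vs. artificial label: exactly the binary relevance signal for label i.
        yield i, v, (1 if i in relevant else -1)

def calibrated_vote(prefers, n_labels):
    """prefers(i, j) -> True if the classifier for pair (i, j) prefers i.
    Returns the ranking of the real labels and the predicted relevant set."""
    v = n_labels
    votes = [0] * (n_labels + 1)
    for i, j in combinations(range(n_labels + 1), 2):
        votes[i if prefers(i, j) else j] += 1
    ranking = sorted(range(n_labels), key=lambda k: -votes[k])  # stable: ties by index
    relevant = {k for k in range(n_labels) if votes[k] > votes[v]}
    return ranking, relevant

hidden = [0.9, 0.1, 0.6, 0.5]  # toy scores; last entry is the artificial label (index 3)
ranking, relevant = calibrated_vote(lambda i, j: hidden[i] > hidden[j], 3)
print(ranking, relevant)  # [0, 2, 1] {0, 2}
```

The bipartition is read directly off the ranking: every label that collects more votes than the artificial label is predicted relevant, so no threshold parameter has to be fitted.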
This chapter is organized as follows. First, we introduce the base learning algorithm, the perceptron, in Section . Sections and continue with the description of the competing multilabel algorithms, and the MLPP algorithm itself is discussed in Section . While Section provides an extensive analytical comparison, Section evaluates the approaches experimentally. Section discusses the results, and Section summarizes the study in this chapter.
4.1 Perceptrons
A perceptron is a binary classifier originally developed as a model of the biological neuron ( , ). Internally, it computes a weighted linear combination of the components of a real-valued input vector and predicts the positive class if the result is positive, and the negative class otherwise. Therefore, it belongs to the family of linear classifiers, which is defined in the following.

33 ( ) first introduced the calibration technique, but evaluated it only on a very small subset of rcv1 and on one additional dataset. Moreover, that work focused on the comparison of pairwise and uni-label-focused decomposition, whereas the present chapter also takes into account actual improvements and adaptations to label ranking.
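The perceptron's decision rule and its classic mistake-driven training can be sketched as follows. This is a minimal sketch assuming Rosenblatt's update rule with a learning rate of 1 and an explicit bias term; the function names and the toy data are hypothetical, and the formal definition of linear classifiers in the text remains the reference.

```python
import numpy as np

def perceptron_predict(w, b, x):
    # Linear combination of the input features; positive class iff w.x + b > 0.
    return 1 if np.dot(w, x) + b > 0 else -1

def perceptron_train(X, y, epochs=10):
    """Mistake-driven perceptron training.
    X: (n_samples, n_features) float array, y: labels in {-1, +1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if perceptron_predict(w, b, xi) != yi:
                w += yi * xi  # move the hyperplane towards the misclassified example
                b += yi
                mistakes += 1
        if mistakes == 0:
            break  # all training examples classified correctly
    return w, b

X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w, b = perceptron_train(X, y)
print(w, b)  # [2. 1.] 1.0
```

Since the weights change only on mistakes and each update touches a single example, the same routine can be applied incrementally as new documents arrive.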