HOMER is strongly related to existing approaches for learning hierarchical multilabel data, some of which were already discussed in the respective Section . The main difference is that those approaches assume an explicit hierarchical structure on the labels and aim to exploit it. HOMER's objective, in contrast, is to reduce computational costs and increase effectiveness. Consequently, HOMER prefers to build balanced hierarchies (in order to obtain beneficial balanced subproblems) and to maximally exploit label correlations (in order to reduce expensive branching and improve predictive accuracy). Moreover, using the real hierarchy potentially ignores dependencies between labels in different subtrees, so an artificial but adapted and optimized structure could offer a further advantage.

Nonetheless, hierarchical multilabel classifiers are often structured, trained and used for prediction in the same way as HOMER: test and training examples are passed from the root through the tree according to label tests at the edges (in contrast to, for example, the feature tests of decision trees). ( ), ( ) and ( ), for example, adopt this approach or compare to a baseline working this way. However, the main difference is that these approaches have labels at the inner nodes, which are generally also predicted if the path stops at these nodes, whereas HOMER uses metalabels, and labels from L can only be predicted at the leaves. Note that HOMER can easily be adapted to use a predetermined hierarchy on the labels; however, as explained above, HOMER follows a different objective, which is probably not fully compatible with this approach.
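The top-down prediction described above can be illustrated with a minimal sketch. It is not the original HOMER implementation: the `Node` class, the `keyword_classifier` helper and the toy labels are hypothetical, and the per-node multilabel classifier is replaced by a simple keyword rule. Only the routing logic (an example descends into every child whose metalabel is predicted, and original labels are emitted only at the leaves) follows the text.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set

Example = Dict[str, Set[str]]


@dataclass
class Node:
    """A node of the label tree; a leaf stands for exactly one original label."""
    name: str
    children: List["Node"] = field(default_factory=list)
    # Hypothetical stand-in for the multilabel classifier trained at this node:
    # it maps an example to the names of the children (metalabels) deemed relevant.
    classify: Optional[Callable[[Example], Set[str]]] = None


def predict(node: Node, x: Example) -> Set[str]:
    """Pass x down the tree and collect the labels of all leaves that are reached."""
    if not node.children:
        return {node.name}  # labels are only predicted at the leaves
    relevant = node.classify(x)
    labels: Set[str] = set()
    for child in node.children:
        if child.name in relevant:  # label test at the edge
            labels |= predict(child, x)
    return labels


def keyword_classifier(triggers: Dict[str, Set[str]]) -> Callable[[Example], Set[str]]:
    """Toy classifier: a child is relevant if any of its trigger words occurs in x."""
    def classify(x: Example) -> Set[str]:
        return {child for child, words in triggers.items() if words & x["words"]}
    return classify


meta_a = Node("meta_A", [Node("sports"), Node("politics")],
              keyword_classifier({"sports": {"goal"}, "politics": {"vote"}}))
meta_b = Node("meta_B", [Node("music"), Node("film")],
              keyword_classifier({"music": {"guitar"}, "film": {"actor"}}))
root = Node("root", [meta_a, meta_b],
            keyword_classifier({"meta_A": {"goal", "vote"},
                                "meta_B": {"guitar", "actor"}}))

print(sorted(predict(root, {"words": {"goal", "guitar"}})))  # ['music', 'sports']
```

If the root predicts no metalabel, the example never reaches a leaf and the empty label set is returned, so the classifiers in subtrees that are not visited are never evaluated.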

A further difference to ( ) is that they interpret the classifier outcomes as probabilities. Specifically, they approximate the conditional probability $P(\lambda_u \mid \lambda_v \in P_x)$ with $h(x)$, where $\lambda_v$ precedes $\lambda_u$ in the hierarchy $H$. Hence, the probability for a label $\lambda_u$ becomes $P(\lambda_u \mid \lambda_v) \cdot P(\lambda_v \mid \lambda_w) \cdots$ for a chain of labels $\lambda_u, \lambda_v, \lambda_w, \ldots$ that successively precede each other in $H$. This is an interesting option for extending HOMER in order to additionally predict rankings on the predicted sets of relevant and irrelevant labels whenever soft classifiers are used as base learners. Note that pairwise decomposition naturally predicts such scores in the form of the number of votes, while BR explicitly requires soft base classifiers.
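As a small worked example of this chain of conditional probabilities, the following sketch multiplies the conditional estimates along the path from a label up to the root. The function name, the `parent` map and the numeric values are illustrative assumptions and are not taken from the referenced work.

```python
from typing import Dict


def label_probability(label: str, parent: Dict[str, str],
                      cond_prob: Dict[str, float]) -> float:
    """Multiply P(node | parent(node)) along the path from `label` to the root.

    parent    maps each non-root node to its predecessor in the hierarchy
    cond_prob maps each non-root node to the estimated P(node | parent(node))
    """
    probability = 1.0
    node = label
    while node in parent:  # the root has no entry in `parent`
        probability *= cond_prob[node]
        node = parent[node]
    return probability


# Hypothetical chain: u below v, v below w, w directly below the root.
parent = {"u": "v", "v": "w", "w": "root"}
cond_prob = {"u": 0.5, "v": 0.8, "w": 0.5}

# P(u|v) * P(v|w) * P(w|root) = 0.5 * 0.8 * 0.5
print(round(label_probability("u", parent, cond_prob), 3))  # 0.2
```

Sorting the labels by such path products would give one possible ranking; whether the estimates are well calibrated depends on the soft base classifiers that are used.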

7.5 Summary

This chapter provided an empirical and analytical study of the performance of HOMER. Compared to previous work ( ), HOMER was extended by the integration of pairwise decomposition, particularly QVoting calibrated label ranking.

Interestingly, the results on four large multilabel datasets with a variety of characteristics showed that instantiating the multilabel learner of HOMER with QCLR can lead to better results than instantiating it with BR, at a small expense in training and classification time. HOMER improves the training time of conventional BR, and the difference is even more pronounced for QCLR. In terms of classification time, HOMER substantially improves QCLR, while for BR the analytically proven benefits appear for the two datasets with the largest number of labels. Except for the mediamill dataset (where the differences are rather small), HOMER managed to improve the performance of the base multilabel learner (both BR and QCLR).

The main reason for employing HOMER was the scalability problem of QCLR in terms of memory with respect to the number of labels. The additional hierarchical decomposition, and hence the pre-selection of the pairwise preferences that are considered, substantially reduces the number of binary classifiers needed. In the same manner, it provides a significant reduction in training and testing time for the pairwise CLR methods. It was also shown that HOMER is able to balance recall and precision, especially for QCLR, which is known to underestimate the number of labels per instance for problems with a low label density.
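The reduction in the number of binary classifiers can be made concrete with a rough count. The sketch below rests on simplifying assumptions that are not taken from the experiments: a flat calibrated label ranking over n labels trains n(n-1)/2 pairwise models plus n calibration models, and the hierarchy is a perfectly balanced tree in which every inner node splits its labels into at most k groups and trains such a learner on its metalabels. The function names are hypothetical.

```python
def clr_models(n: int) -> int:
    """Binary models of calibrated label ranking on n labels:
    n*(n-1)/2 pairwise models plus n models against the virtual label."""
    return n * (n - 1) // 2 + n


def homer_clr_models(n: int, k: int) -> int:
    """Binary models when n labels are split recursively into at most k balanced
    groups and each inner node trains CLR on its metalabels (simplified model)."""
    if n <= 1:
        return 0  # a leaf holds a single label and needs no classifier
    c = min(k, n)  # number of children of this node
    base, extra = divmod(n, c)
    sizes = [base + 1] * extra + [base] * (c - extra)  # balanced group sizes
    return clr_models(c) + sum(homer_clr_models(s, k) for s in sizes)


print(clr_models(1000))            # 500500
print(homer_clr_models(1000, 10))  # 6105
```

Under these assumptions, 1000 labels with k = 10 yield 111 inner nodes with 55 models each. The count ignores how many training examples each model sees and how well the real label partition is balanced.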

8 Exploitation of Label Dependencies in Parallel Tasks

In the previous chapters, several improvements were presented in order to extend the application of the pairwise decomposition approach to domains with high dimensionalities. This enables us to apply the pairwise preference learning strategy to application areas that were previously out of reach due to scalability constraints. One of these applications is the joint processing of several interconnected tasks, commonly referred to as multi-task learning. In this chapter, we consider simultaneously solving several multilabel tasks from different domains, which essentially multiplies the label dimensionality. However, this approach allows label dependencies to be exploited across domains.

The starting point of this study is the following exemplary scenario: books in a library are typically cataloged according to different types or domains of associated characteristics, e.g. genre, language, topic, epoch, author, etc. This type of annotation of objects is a very natural and common approach not only in the cataloging of texts (Section ) but also, for example, when indexing music ( ). Each of these mappings could be seen and treated as independent of the others. In reality, however, there may be dependencies between the associated values from different domains. An author may write only in a specific language and focus exclusively on crime fiction. At the same time, crime fiction novels may often have murder as one of their topics, etc. Thus, if we consider learning a model that automatically catalogs books in a library database as a text classification problem, for instance, it may be advantageous to treat all the parallel subproblems as a single large joint problem instead of tackling each subproblem separately.

In principle, this is the same idea as in multi-task learning. In multi-task learning, we have a set of related learning problems (tasks), i.e. problems that share a common representation of their objects. It has been shown that learning these tasks simultaneously and jointly outperforms the common approach of learning them separately (single-task learning, cf. Section ). The library example can be seen as a special multi-task learning scenario in which each categorization domain represents a separate task, and all tasks share the same representation of their objects (books have the same representation, e.g. the same bag of words, in every task).

At the same time, the approach of considering the whole task rather than each sub-task separately is, in principle, also the basic idea behind many multilabel classification algorithms. Instead of considering each label as a separate problem, as in the popular binary relevance (one-against-all) approach, most of the recent approaches try to implicitly or explicitly take existing label correlations into consideration in order to improve the predictive quality (cf. Section ).

The approach that we propose is to consider the set of parallel multilabel tasks in the library as a single joint task, as in multi-task learning, and to solve it with a conventional multilabel classification algorithm. Most of the recent and more sophisticated multilabel approaches may benefit from the parallel processing, as they also benefit from the commonality in a conventional multilabel setting. In this work, to our knowledge the first on the subject, we propose to use pairwise decomposition, which implicitly considers label relationships by learning preferences between pairs of labels. Furthermore, the advances presented in this thesis in handling many, even thousands of, classes despite the quadratic number of models enable us to address the considerably increased complexity when the subtasks are joined.
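One simple way to realize the joint task is to merge the label sets of all domains into a single label space before handing the data to a multilabel learner. The following sketch shows this for one book; the prefixing scheme, the function names and the example annotations are illustrative assumptions rather than the encoding used in the experiments.

```python
from typing import Dict, Set


def join_tasks(annotations: Dict[str, Set[str]]) -> Set[str]:
    """Merge per-domain label sets into one joint label set.

    Each label is prefixed with its domain so that equal label names
    from different domains remain distinct."""
    return {f"{domain}:{label}"
            for domain, labels in annotations.items()
            for label in labels}


def split_tasks(joint_labels: Set[str]) -> Dict[str, Set[str]]:
    """Map a joint label set (e.g. a prediction) back to per-domain label sets."""
    annotations: Dict[str, Set[str]] = {}
    for joint in joint_labels:
        domain, label = joint.split(":", 1)
        annotations.setdefault(domain, set()).add(label)
    return annotations


book = {
    "genre": {"crime fiction"},
    "language": {"English"},
    "topic": {"murder", "police"},
}

joint = join_tasks(book)
print(sorted(joint))
# ['genre:crime fiction', 'language:English', 'topic:murder', 'topic:police']
```

The joint label space contains as many labels as all domains together, which is why the number of pairwise models grows considerably. Because the domain prefix is kept, a joint prediction can be split into per-domain predictions again with `split_tasks`.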

8.1 Related Work

Approaches that try to explicitly exploit label dependencies include the early work of ( ), in which generative models for labelsets are built as a mixture of topic-based word distributions, the conditional random fields parameterized by label co-occurrences by ( ) and the label-correlation-conditioned maximum entropy method of ( ), among others. A middle way is followed by ( ), ( ) and their (probabilistic) classifier chains, which stack the underlying binary relevance classifiers with the predictions of the previous ones, and by ( ), whose k-NN approach stacks the appearances of labels in the neighborhood as new features. Indeed, adding label-dependent features is a very popular way to take dependencies into account. Chapter presents an approach relying on locally exceptional label constellations. Several other approaches are discussed within this context in Section .
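The stacking idea behind classifier chains can be sketched as follows. This is a simplified illustration and not the referenced authors' implementation: the binary classifiers are hypothetical hand-written rules instead of trained models, and the feature names are invented. It only shows how each classifier in the chain receives the predictions of the previous ones as additional features.

```python
from typing import Callable, Dict, List, Tuple

Features = Dict[str, float]
BinaryClassifier = Callable[[Features], bool]


def chain_predict(x: Features,
                  chain: List[Tuple[str, BinaryClassifier]]) -> Dict[str, bool]:
    """Apply the binary classifiers in order; every prediction is appended to the
    feature vector so that later classifiers can condition on earlier labels."""
    features = dict(x)  # copy, so the caller's example is not modified
    predictions: Dict[str, bool] = {}
    for label, classifier in chain:
        relevant = classifier(features)
        predictions[label] = relevant
        features[f"pred_{label}"] = float(relevant)  # stacked label feature
    return predictions


# Hypothetical rules: 'murder' is predicted from its own keyword or from the
# earlier prediction of 'crime_fiction' further up the chain.
chain = [
    ("crime_fiction", lambda f: f.get("word_detective", 0.0) > 0),
    ("murder", lambda f: f.get("word_killer", 0.0) > 0
                         or f["pred_crime_fiction"] > 0),
]

print(chain_predict({"word_detective": 2.0}, chain))
# {'crime_fiction': True, 'murder': True}
```

A plain binary relevance learner with the same rules would miss `murder` for this example, because it could not see the prediction for `crime_fiction`.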

However, the majority of the approaches implicitly consider dependencies by optimizing a loss on the predicted ranking of the labels. MMP perceptrons ( ), Rank-SVM ( ), Structural SVMs ( ) and the BP-MLL neural network algorithm ( ), for example, rely on this. The latter approach is conceptually very similar to the multi-task neural networks of ( ), as both train a common network with several outputs denoting the labels, i.e. the task outcomes. This is a popular approach in multi-task learning, which has also been applied to Bayesian networks ( ). Other techniques try to develop special kernel functions which model inter-task relations ( ), or use statistical Dirichlet processes for the Bayesian modeling ( ).

A recent work in the field of natural language processing considers jointly performing named entity recognition and syntactic parsing ( ). The approach in Chapter is similar, with the difference that we consider the recognition of each type of syntactic entity as a separate task and merge these into one overall multilabel task.

A common problem of the referenced multilabel methods is their scalability in terms of the number of labels, a factor which increases significantly when the subtasks are joined. Existing large-scale approaches rely on the binary relevance decomposition ( , , ) or use one-class classifiers ( ) (cf. also Section ). However, in our proposed setting, solving each label relevance subproblem separately would not change anything compared to solving multiple single-task problems, neither computationally nor predictively.
