

Another popular algorithm for semi-supervised learning is co-training, introduced by Blum and Mitchell [11]. In co-training, the features in the training set are divided into two different sets (views). Co-training starts by training two separate classifiers, each on the labelled data from its respective view. Then each classifier labels the unlabelled data of its own view, and the most confident predictions of each classifier on the unlabelled data are used to expand the training set of the other classifier. Afterwards, both classifiers are retrained on the newly labelled data supplied by the other classifier, and the process repeats.

Algorithm 3 Co-training

1: Inputs: $X_l \leftarrow \{(x^{(i)}, y^{(i)})\}_{i=1}^{l}$, $X_u \leftarrow \{x^{(i)}\}_{i=l+1}^{l+u}$, $f^{(1)}, f^{(2)}$: two classifiers

2: Initialise: split $X_l$ into $X_z^{(1)}$ and $X_z^{(2)}$, with $X_l = (X_z^{(1)}, X_z^{(2)})$; learn hypotheses $f^{(1)}$ and $f^{(2)}$ on $X_z^{(1)}$ and $X_z^{(2)}$

3: repeat

4: Classify $X_u$ with $f^{(1)}$ and $f^{(2)}$ separately

5: Select $f^{(1)}$'s top $k$ most confident predictions into $X_s^{(1)}$, select $f^{(2)}$'s top $k$ most confident predictions into $X_s^{(2)}$

6: $X_u = X_u - X_s^{(1)} - X_s^{(2)}$

7: $X_z^{(1)} = X_z^{(1)} + X_s^{(2)}$

8: $X_z^{(2)} = X_z^{(2)} + X_s^{(1)}$

9: Retrain $f^{(1)}$ on $X_z^{(1)}$ and $f^{(2)}$ on $X_z^{(2)}$

10: until $X_u = \emptyset$

For the co-training algorithm, there are two assumptions on the feature sets:

• Features can be split into two views and each view is sufficient to train a good classifier.

• The two views must satisfy the conditional independence given the class label.

The first assumption, on the quality of the views, is essential to the generalisation of both classifiers. If both views are sufficiently good, then we can trust the labels that each classifier assigns to the unlabelled data. The second assumption, conditional independence between the views, is needed so that the most confident predictions of one classifier are useful training data for the other classifier. If conditional independence between the views holds, then each view can add the most informative unlabelled patterns to the other view. However, this strong conditional independence assumption is unlikely to be satisfied in real-world applications.
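As a rough illustration of Algorithm 3, the following Python sketch runs the co-training loop on toy data. The `NearestCentroid` base classifier, the margin-based confidence score, the `max_rounds` cap and the toy data are all assumptions made for illustration; none of them come from Blum and Mitchell [11], and any classifier that reports a confidence could be substituted.

```python
import math


class NearestCentroid:
    """Tiny stand-in base classifier (an assumption, not part of the original algorithm)."""

    def fit(self, X, y):
        sums, counts = {}, {}
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for j, value in enumerate(x):
                acc[j] += value
            counts[label] = counts.get(label, 0) + 1
        self.centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
        return self

    def predict(self, x):
        """Return (label, confidence); confidence is the margin between the two nearest centroids."""
        dists = sorted((math.dist(x, c), label) for label, c in self.centroids.items())
        margin = dists[1][0] - dists[0][0] if len(dists) > 1 else 0.0
        return dists[0][1], margin


def co_train(labelled, unlabelled, k=1, max_rounds=10):
    """labelled: list of ((view1, view2), y); unlabelled: list of (view1, view2)."""
    # X_z^(1) and X_z^(2): one labelled training set per view
    train = [[(views[i], y) for views, y in labelled] for i in (0, 1)]
    pool = list(unlabelled)  # X_u
    clfs = [NearestCentroid(), NearestCentroid()]

    def fit_all():
        for i in (0, 1):
            xs, ys = zip(*train[i])
            clfs[i].fit(xs, ys)

    fit_all()
    for _ in range(max_rounds):
        if not pool:
            break
        picked = {}  # pool index -> [(receiving view, predicted label), ...]
        for i in (0, 1):
            scored = [(clfs[i].predict(p[i]), idx) for idx, p in enumerate(pool)]
            scored.sort(key=lambda item: item[0][1], reverse=True)
            for (label, _margin), idx in scored[:k]:
                # Classifier i's confident prediction extends the OTHER view's training set.
                picked.setdefault(idx, []).append((1 - i, label))
        for idx, targets in picked.items():
            for view, label in targets:
                train[view].append((pool[idx][view], label))
        pool = [p for idx, p in enumerate(pool) if idx not in picked]
        fit_all()  # retrain both classifiers on the expanded training sets
    return clfs, pool


# Toy data (hypothetical): two one-dimensional views, two well-separated classes.
labelled = [(((0.0,), (0.1,)), 0), (((1.0,), (0.9,)), 1)]
unlabelled = [((0.1,), (0.2,)), ((0.9,), (1.0,)), ((0.2,), (0.0,)), ((0.8,), (0.7,))]
clfs, remaining = co_train(labelled, unlabelled, k=1)
```

Each pattern selected by one classifier is removed from the pool and handed to the other view's training set, which is the exchange that the conditional independence assumption is meant to make informative.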

Dasgupta et al. [22] presented a theoretical study building on the Blum and Mitchell [11] paper. They proved that, when the assumptions of the co-training algorithm hold, a lower generalisation error can be obtained by maximising the agreement on the unlabelled patterns. Blum and Mitchell [11] empirically investigated how well the co-training algorithm can work. For this purpose, the original input features of the data, which consist of a small number of labelled patterns and a large number of unlabelled patterns, were artificially divided into two views in order to achieve better results with the co-training algorithm. The results show that using the unlabelled patterns to improve the base classifier through co-training is difficult when only a few labelled patterns are available. The best explanation for this conclusion is that the co-training assumptions (in particular, finding the best separation of the features into two views) are not valid with a small amount of labelled data.

Because co-training is successful but has relatively limited applicability, many works have proposed improvements to standard co-training that remove its required conditions. Nigam and Ghani [57] proposed the Co-EM algorithm, which combines co-training with expectation maximisation (EM) and can label the unlabelled patterns probabilistically. The EM algorithm serves as the basis of this new version of co-training, whereas the basic co-training algorithm uses naïve Bayes classifiers. The Co-EM algorithm was applied to web page datasets, and its results were better than those of the co-training algorithm. Goldman and Zhou [30] relaxed the split-view assumption of co-training, so that their approach does not require splitting the input features into two views. In addition, their variant of co-training uses two different classifiers instead of a single classifier. Each classifier can be obtained by dividing the input space into a set of equivalence classes.

To relax the conditional independence assumption between views, Zhou and Li [90] proposed the Tri-training algorithm, which uses three classifiers. To train one classifier, the remaining two classifiers must agree on the label of an unlabelled pattern, which is then added to the training set of the given (third) classifier. If splitting the feature set is not straightforward, both tri-training and co-training may over-fit through their use of the most confident instances. In addition, an over-fitted classifier can degrade classification accuracy, because both methods depend on the quality of the feature subsets. More generally, we can define learning paradigms that utilise the agreement among different classifiers. This idea was subsequently expanded into the co-forest algorithm, which uses ensemble techniques to include a large number of base classifiers [49]. Multi-view learning models do not require the particular assumptions of co-training, and they have access to separate classifiers. The classifiers might be of different types (e.g., naïve Bayes, decision tree, neural network, etc.), but they are trained on the same labelled data and are required to make similar predictions on any given unlabelled data [93].
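The agreement rule of tri-training described above can be sketched in a few lines of Python. This is only an illustration of the "two classifiers agree, the third one learns" step: the function name `tri_training_candidates` and the threshold classifiers are hypothetical, and any further acceptance conditions of the method in [90] that this text does not describe are omitted.

```python
def tri_training_candidates(classifiers, unlabelled):
    """For each of three classifiers, collect the patterns on which the other two agree.

    classifiers: three callables mapping a pattern to a label (hypothetical interface).
    Returns {i: [(pattern, agreed_label), ...]} as proposed training data for classifier i.
    """
    if len(classifiers) != 3:
        raise ValueError("tri-training uses exactly three classifiers")
    proposals = {0: [], 1: [], 2: []}
    for x in unlabelled:
        preds = [clf(x) for clf in classifiers]
        for i in range(3):
            j, k = [m for m in range(3) if m != i]
            if preds[j] == preds[k]:
                # The two remaining classifiers agree, so x is proposed to classifier i.
                proposals[i].append((x, preds[j]))
    return proposals


# Hypothetical base classifiers: simple thresholds on a scalar feature.
h0 = lambda x: int(x > 0.5)
h1 = lambda x: int(x > 0.4)
h2 = lambda x: int(x > 0.6)
proposals = tri_training_candidates([h0, h1, h2], [0.1, 0.45, 0.55, 0.9])
```

In this toy run the pattern 0.45 is proposed only to `h1`, because `h0` and `h2` agree on it while `h1` disagrees with both. That is the case in which the agreed label carries new information for the third classifier.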

Similarly, the co-training algorithm was modified to give another semi-supervised method called democratic co-learning [89]. This algorithm does not use multiple views, but it does use multiple classifiers. In addition, democratic co-learning uses a weighted majority voting procedure for labelling the unlabelled patterns. Each classifier in the ensemble is first trained separately using only the labelled patterns, and then each classifier separately predicts labels for the unlabelled patterns. Finally, majority voting is applied between the classifiers to label the unlabelled patterns. In the labelling procedure, cross-validation over the labelled patterns is used to determine the confidence in the labels of the unlabelled patterns and also to evaluate the performance of each classifier. However, cross-validation might give poor estimates when the number of labelled patterns is small (Zhou and Goldman [89]; Zhou and Li [90]).
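A minimal sketch of weighted majority voting, the labelling step of democratic co-learning, is given below. The function name and the example weights are hypothetical; in practice the weights would be confidence estimates such as those obtained by cross-validation on the labelled patterns, and the exact weighting used in [89] is not reproduced here.

```python
from collections import defaultdict


def weighted_majority_vote(predictions, weights):
    """Return the label whose supporting classifiers have the largest total weight.

    predictions: one predicted label per classifier.
    weights: one non-negative confidence weight per classifier (assumed given).
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per prediction")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


# One confident classifier can outvote two weak ones ...
vote_a = weighted_majority_vote(["spam", "ham", "ham"], [0.9, 0.3, 0.4])
# ... but not when their combined weight is larger.
vote_b = weighted_majority_vote(["spam", "ham", "ham"], [0.5, 0.3, 0.4])
```

Because the vote depends directly on the weights, unreliable confidence estimates from cross-validation on very few labelled patterns translate into unreliable labels.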