Labelling data samples usually has to be done by hand (i.e. by a person), which is time-consuming and error-prone. When the cost of associating each data sample with a label is high (e.g. the labelling process requires experts and/or special devices), only a portion (typically a small one) of the data samples is labelled. To handle this type of dataset, a group of algorithms known as semi-supervised learning has been proposed, in which both labelled and unlabelled data samples are used to determine the mapping functions. Some commonly used methods from the direct approach group are listed below, each of which combines an unsupervised learning technique with a supervised one.
Semi-supervised Local Fisher discriminant analysis
SEmi-supervised Local Fisher discriminant analysis (SELF) [69] is a method that combines PCA and LFDA: it applies PCA to both the labelled and unlabelled data samples, and LFDA to the labelled ones alone. An example of embedding learning that compares the results of SELF, LFDA and PCA is shown in Figure 2.8a. Two sets of data samples were generated from two multivariate normal distributions with different means, (0.551, 0.848) and (5.551, 3.848), and a common covariance [1, 1.5; 1.5, 3]. Only 5% of the data samples in each set are labelled. The SELF feature space (i.e. the magenta line in Figure 2.8a) has the most discriminant power compared to the PCA and LFDA feature spaces.
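The synthetic dataset described above can be reproduced approximately with the following minimal sketch. The means, the covariance and the 5% labelling rate are taken from the text; the number of samples per class, the random seed and the function name `make_two_class_data` are assumptions made for illustration, since the original sample size is not stated.

```python
import numpy as np

def make_two_class_data(n_per_class=200, labelled_fraction=0.05, seed=0):
    """Draw two Gaussian classes and mark a small fraction of each as labelled.

    n_per_class and seed are illustrative assumptions, not values from the text.
    """
    rng = np.random.default_rng(seed)
    means = np.array([[0.551, 0.848], [5.551, 3.848]])  # class means from the text
    cov = np.array([[1.0, 1.5], [1.5, 3.0]])             # common covariance from the text

    # Stack the samples of both classes into one (2 * n_per_class, 2) array.
    X = np.vstack([rng.multivariate_normal(m, cov, size=n_per_class) for m in means])
    y = np.repeat([0, 1], n_per_class)

    # Choose the labelled subset independently within each class.
    labelled = np.zeros(2 * n_per_class, dtype=bool)
    n_lab = max(1, int(round(labelled_fraction * n_per_class)))
    for c in (0, 1):
        idx = rng.choice(np.flatnonzero(y == c), size=n_lab, replace=False)
        labelled[idx] = True
    return X, y, labelled

X, y, labelled = make_two_class_data()
```

Here `X[labelled]` and `y[labelled]` form the labelled subset, while the full `X` is available to the unsupervised part of a semi-supervised method.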
SELF uses the same optimisation problem as FDA and LFDA, but with the between-class and within-class scatter matrices $S_b$ and $S_w$ redefined as:
$$S_b = (1-\beta)\,S^{l}_{b} + \beta\,S_t \qquad (2.51)$$
$$S_w = (1-\beta)\,S^{l}_{w} + \beta\,I_{d\times d} \qquad (2.52)$$
where $S^{l}_{b}$ and $S^{l}_{w}$ are the respective between-class and within-class scatter matrices calculated from the labelled data samples according to LFDA. $0 < \beta < 1$ is a trade-off parameter through which SELF inherits the properties and characteristics of both LFDA and PCA (values of $\beta$ near 0 approach LFDA, values near 1 approach PCA), and $S_t$ is the total scatter matrix of all the data samples, as used by PCA, defined as:
$$S_t = \frac{1}{2}\sum_{i,j=1}^{n} \frac{1}{n}\,(x_i - x_j)(x_i - x_j)^{T} \qquad (2.53)$$
The optimal projection matrix $V^*$ is then obtained in the same way as in FDA and LFDA, with its columns $[v_1, v_2, \ldots, v_k]$ being the $k$ generalised eigenvectors of $S_b$ and $S_w$ that correspond to the $k$ largest generalised eigenvalues $[\lambda_1, \lambda_2, \ldots, \lambda_k]$.
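The construction in (2.51) to (2.53) can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions rather than the implementation of [69]: the labelled scatter matrices are computed with plain FDA as a stand-in for the locally weighted LFDA versions, the function names and the default value of `beta` are hypothetical, and the generalised eigenproblem is solved by Cholesky whitening, which is valid because the regularised within-class scatter matrix is positive definite for $\beta > 0$.

```python
import numpy as np

def total_scatter(X):
    """Total scatter sum_i (x_i - mean)(x_i - mean)^T, equal to the pairwise form (2.53)."""
    Xc = X - X.mean(axis=0)
    return Xc.T @ Xc

def labelled_scatter(X_lab, y_lab):
    """Between/within scatter of the labelled samples.

    Assumption: plain FDA scatter; LFDA would additionally weight
    sample pairs by a local affinity.
    """
    d = X_lab.shape[1]
    mean_all = X_lab.mean(axis=0)
    Sb = np.zeros((d, d))
    Sw = np.zeros((d, d))
    for c in np.unique(y_lab):
        Xc = X_lab[y_lab == c]
        mean_c = Xc.mean(axis=0)
        diff = (mean_c - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
        Sw += (Xc - mean_c).T @ (Xc - mean_c)
    return Sb, Sw

def self_projection(X, X_lab, y_lab, beta=0.5, k=1):
    """Return the k leading generalised eigenvectors of the regularised (S_b, S_w)."""
    d = X.shape[1]
    Slb, Slw = labelled_scatter(X_lab, y_lab)
    Sb = (1 - beta) * Slb + beta * total_scatter(X)   # eq. (2.51)
    Sw = (1 - beta) * Slw + beta * np.eye(d)          # eq. (2.52)

    # Solve Sb v = lambda Sw v: with Sw = C C^T, substitute u = C^T v to get
    # the ordinary symmetric problem (C^-1 Sb C^-T) u = lambda u.
    C_inv = np.linalg.inv(np.linalg.cholesky(Sw))
    M = C_inv @ Sb @ C_inv.T
    evals, U = np.linalg.eigh((M + M.T) / 2)

    order = np.argsort(evals)[::-1][:k]               # k largest eigenvalues first
    V = C_inv.T @ U[:, order]                         # map back: v = C^-T u
    return V, evals[order]
```

Projecting the data is then `X @ V`. As `beta` approaches 1 the problem reduces to the eigen-decomposition of the total scatter matrix, i.e. PCA, while as it approaches 0 only the labelled scatter matrices remain.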
Semi-Supervised Fisher Discriminant Analysis
Semi-Supervised Fisher Discriminant Analysis (SSFDA) [68] is another semi-supervised learning method, which combines OLPP with FDA. It employs the same maximisation problem as FDA, but with the within-class scatter matrix $S_w$ redefined as:
$$S_w = S^{l}_{w} + \beta_1\,X^{T} L X + \beta_2\,I_{d\times d} \qquad (2.54)$$
where $S^{l}_{w}$ is the within-class scatter matrix computed from the labelled data samples according to FDA, while $X^{T} L X$ is computed from all the data samples according to OLPP. $\beta_1 > 0$ and $\beta_2 > 0$ are parameters that control the balance of the terms. Here, $X^{T} L X$ is added as a regularisation term [68] to prevent overfitting. It is computed from the Laplacian matrix $L$ of a neighbourhood graph, and thereby makes use of the information provided by both labelled and unlabelled data samples.
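The regularised within-class scatter matrix of (2.54) might be assembled as in the sketch below. Several details are assumptions made for illustration and are not taken from [68]: the rows of `X` are the data samples (so that $X^{T} L X$ is $d \times d$), the neighbourhood graph is a symmetrised k-nearest-neighbour graph with binary weights, $L$ is the unnormalised Laplacian $D - W$, and the function names and default parameter values are hypothetical.

```python
import numpy as np

def knn_graph_laplacian(X, n_neighbours=5):
    """Unnormalised Laplacian L = D - W of a symmetric, binary k-NN graph (assumed form)."""
    n = X.shape[0]
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(sq_dist, np.inf)                 # a sample is not its own neighbour
    nn = np.argsort(sq_dist, axis=1)[:, :n_neighbours]

    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), n_neighbours), nn.ravel()] = 1.0
    W = np.maximum(W, W.T)                            # symmetrise the adjacency matrix
    return np.diag(W.sum(axis=1)) - W

def ssfda_within_scatter(X, X_lab, y_lab, beta1=0.1, beta2=0.01, n_neighbours=5):
    """S_w = S_w^l + beta1 * X^T L X + beta2 * I, following eq. (2.54)."""
    d = X.shape[1]
    Slw = np.zeros((d, d))
    for c in np.unique(y_lab):                        # FDA within-class scatter, labelled data only
        Xc = X_lab[y_lab == c]
        Xc = Xc - Xc.mean(axis=0)
        Slw += Xc.T @ Xc
    L = knn_graph_laplacian(X, n_neighbours)          # built from labelled and unlabelled samples
    return Slw + beta1 * (X.T @ L @ X) + beta2 * np.eye(d)
```

Because the graph Laplacian is positive semi-definite, the resulting matrix is positive definite for any $\beta_2 > 0$, even when the labelled samples alone would give a singular $S^{l}_{w}$.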
[Figure 2.8: An example of embedding learning - SELF versus LFDA versus PCA. (a) The data samples of class 1 and class 2, with the labelled samples marked, plotted as y against x together with the PCA, LFDA and SELF feature spaces; the projections of the original 2D data in the SELF feature space possess the most discriminant ability. (b) The estimated densities of the projected data samples in the PCA feature space. (c) The estimated densities of the projected data samples in the LFDA feature space. (d) The estimated densities of the projected data samples in the SELF feature space.]
Semi-Supervised Maximum Margin Criterion
Semi-Supervised Maximum Margin Criterion (SSMMC) is another semi-supervised method proposed in [68]. It builds on MMC in order to avoid the small sample size problem. It redefines the within-class scatter matrix $S_w$ in the maximisation problem of MMC as:
$$S_w = \beta_1\,S^{l}_{w} + \beta_2\,X^{T} L X \qquad (2.55)$$
where $S^{l}_{w}$ is the within-class scatter matrix computed from the labelled data samples according to MMC, and $\beta_1 > 0$ and $\beta_2 > 0$ are parameters that control the balance of the terms.
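A minimal sketch of how (2.55) could enter the MMC solution is given below. It assumes the commonly used form of MMC, which is not restated in this section: maximise $\mathrm{tr}\big(V^{T}(S_b - S_w)V\big)$ subject to $V^{T}V = I$, so that the solution is given by the leading eigenvectors of $S_b - S_w$. The function name, argument names and default parameter values are hypothetical.

```python
import numpy as np

def ssmmc_projection(Sb, Slw, XtLX, beta1=1.0, beta2=0.1, k=1):
    """Leading eigenvectors of S_b - S_w, with S_w redefined as in eq. (2.55).

    Assumes the standard MMC objective tr(V^T (S_b - S_w) V) with V^T V = I.
    """
    Sw = beta1 * Slw + beta2 * XtLX                   # eq. (2.55)
    M = Sb - Sw
    evals, evecs = np.linalg.eigh((M + M.T) / 2)      # symmetric eigen-decomposition
    order = np.argsort(evals)[::-1][:k]               # k largest eigenvalues first
    return evecs[:, order], evals[order]

# Toy usage with hand-picked 2x2 matrices (illustrative values only).
Sb = np.diag([5.0, 1.0])
Slw = np.diag([1.0, 1.0])
XtLX = np.diag([0.0, 2.0])
V, lam = ssmmc_projection(Sb, Slw, XtLX, beta1=1.0, beta2=0.5, k=1)
```

Under this formulation no inverse of $S_w$ is required, so the computation remains well defined when $S_w$ is singular, which is the situation the small sample size problem refers to.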