
Graph-based semi-supervised learning methods [9] usually construct undirected graphs that connect similar patterns across the labelled and unlabelled data sets. Both labelled and unlabelled patterns are represented as nodes, and the edges encode the distances between patterns. Each edge is assigned a weight corresponding to the pairwise similarity of the two patterns it connects. The graph can therefore be represented by a weight matrix $W$, which is symmetric because the graph is undirected.

If patterns $x_i$ and $x_j$ are connected, then $w_{ij} > 0$; otherwise $w_{ij} = 0$. Moreover, two patterns connected by a strong edge (high similarity) are assumed to belong to the same class. This smoothness assumption is the main assumption underlying graph-based semi-supervised learning methods. Commonly used similarity graphs are:

• $k$-nearest-neighbour graph, where $w_{ij} = 1$ if $x_i$ is among the $k$ nearest neighbours of $x_j$ or vice versa, and $w_{ij} = 0$ otherwise.

• $\varepsilon$-neighbourhood graph, where $x_i$ and $x_j$ are connected by an edge if their distance satisfies $d(x_i, x_j) \le \varepsilon$.

• Gaussian-weighted graph, whose edge weights are given by the popular Gaussian or radial basis function (RBF) kernel,

$$ w_{ij} = \exp\left( -\frac{\lVert x_i - x_j \rVert^2}{2\sigma^2} \right), \qquad (2.56) $$

where $\sigma$ is the kernel bandwidth.
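The three constructions above can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the cited works: the function names `knn_weights`, `eps_weights` and `rbf_weights` are hypothetical, Euclidean distance is assumed for $d(x_i, x_j)$, and self-loops are assumed to be excluded ($w_{ii} = 0$).

```python
import numpy as np


def pairwise_sq_dists(X):
    """Squared Euclidean distances between all rows of X (n x d)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def knn_weights(X, k):
    """0/1 k-nearest-neighbour graph, symmetrised with the 'or vice versa' rule."""
    sq = pairwise_sq_dists(X)
    np.fill_diagonal(sq, np.inf)              # a pattern is not its own neighbour
    nn = np.argsort(sq, axis=1)[:, :k]        # indices of the k nearest neighbours
    W = np.zeros(sq.shape)
    W[np.repeat(np.arange(len(X)), k), nn.ravel()] = 1.0
    return np.maximum(W, W.T)                 # edge if i in kNN(j) or j in kNN(i)


def eps_weights(X, eps):
    """0/1 epsilon-neighbourhood graph: edge iff d(x_i, x_j) <= eps."""
    W = (np.sqrt(pairwise_sq_dists(X)) <= eps).astype(float)
    np.fill_diagonal(W, 0.0)                  # no self-loops
    return W


def rbf_weights(X, sigma):
    """Fully connected graph with Gaussian (RBF) weights, eq. (2.56)."""
    W = np.exp(-pairwise_sq_dists(X) / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)                  # no self-loops
    return W
```

For example, with `X = np.array([[0.0], [1.0], [5.0], [6.0]])`, both `knn_weights(X, 1)` and `eps_weights(X, 1.5)` connect only the pairs (0, 1) and (2, 3), whereas `rbf_weights` gives every pair a positive weight that decays with distance.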

Consider a binary classification problem with labelled training patterns $X_l = \{(x_i, y_i)\}_{i=1}^{l}$, $y_i \in \{-1, +1\}$, and unlabelled training patterns $X_u = \{x_i\}_{i=l+1}^{l+u}$, where $x_i \in \mathcal{X} \subseteq \mathbb{R}^d$ is the feature vector describing the $i$th pattern. Let $G = (V, E)$ be the weighted graph, where $E$ is the set of edges and $V = V_L \cup V_U$ is the set of nodes. The goal of graph-based semi-supervised learning is to propagate the label information from $V_L$ to $V_U$. In this section, several graph-based methods are introduced; parts of this discussion follow Zhu and Goldberg [93].

Min-cut Algorithm

The first graph-based semi-supervised learning method is the min-cut algorithm proposed by Blum and Chawla [9]. To learn from both labelled and unlabelled data, the min-cut algorithm finds a minimum cut of the graph, that is, a partition that minimises the total weight of the edges joining patterns that are given different labels. Mathematically, the min-cut algorithm solves:

$$ \min_{f:\, f(x) \in \{-1, 1\}} \sum_{i,j=1}^{l+u} w_{ij} \big( f(x_i) - f(x_j) \big)^2 \quad \text{s.t.} \quad f(x_i) = y_i, \; i = 1, \dots, l. \qquad (2.57) $$

With $f(x_i) = y_i$ fixed for the labelled patterns, the min-cut algorithm produces hard labels $y_i \in \{-1, +1\}$ for the unlabelled patterns, $i = l+1, \dots, l+u$. The min-cut problem can be solved with a max-flow algorithm for undirected graphs (Blum and Chawla [9]). After the graph is built, two special nodes $(v_+, v_-)$, called classification vertices, are added and connected to the labelled patterns by edges of infinite weight: $w(v, v_+) = \infty$ if $v$ is a positive pattern, and $w(v, v_-) = \infty$ if $v$ is a negative pattern. Finding the minimum cut is the main step of the algorithm: by finding and removing a set of edges with minimum total weight, the graph is split into two parts $V_+ \ni v_+$ and $V_- \ni v_-$. Unlabelled patterns in $V_+$ are assigned the label $+1$, and those in $V_-$ the label $-1$.

The drawback of the min-cut algorithm is that a single pattern may be isolated in one partition after the cut, which can yield a highly unbalanced partitioning. Blum et al. [10] employed bagging to address this issue. Later, Zhu et al. [94] applied an iterative graph-based semi-supervised learning algorithm known as label propagation. The intuition behind label propagation is that each node iteratively passes its label to its neighbouring nodes until convergence.
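The min-cut procedure described above can be sketched with a plain augmenting-path max-flow solver. This is a hypothetical helper (`mincut_labels`), not Blum and Chawla's implementation, and it rests on three assumptions: the labelled patterns occupy the first $l$ rows of $W$; each infinite weight is replaced by a finite constant larger than the total edge weight, so such an edge can never belong to a minimum cut; and Edmonds-Karp is used as one possible max-flow algorithm. Once the flow is maximal, the patterns still reachable from $v_+$ in the residual graph form $V_+$.

```python
from collections import deque

import numpy as np


def _bfs(cap, source):
    """Breadth-first search over edges with positive residual capacity."""
    parent = np.full(len(cap), -1)
    parent[source] = source
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in np.nonzero(cap[u] > 1e-12)[0]:
            if parent[v] == -1:
                parent[v] = u
                queue.append(v)
    return parent


def mincut_labels(W, y_l):
    """Hard labels from a minimum v+ / v- cut; labelled patterns come first."""
    n = len(W)
    s, t = n, n + 1                          # classification vertices v+ and v-
    big = W.sum() + 1.0                      # finite stand-in for infinite weight
    cap = np.zeros((n + 2, n + 2))
    cap[:n, :n] = W                          # undirected edge: capacity both ways
    for i, yi in enumerate(y_l):
        if yi > 0:
            cap[s, i] = big                  # w(v, v+) = "infinity"
        else:
            cap[i, t] = big                  # w(v, v-) = "infinity"

    # Edmonds-Karp: augment along shortest residual paths until none is left.
    while True:
        parent = _bfs(cap, s)
        if parent[t] == -1:
            break
        flow, v = np.inf, t
        while v != s:                        # bottleneck capacity of the path
            flow = min(flow, cap[parent[v], v])
            v = parent[v]
        v = t
        while v != s:                        # push flow, update residual graph
            u = parent[v]
            cap[u, v] -= flow
            cap[v, u] += flow
            v = u

    # Patterns still reachable from v+ form V+ and receive label +1.
    reachable = _bfs(cap, s)[:n] != -1
    return np.where(reachable, 1, -1)
```

On a four-pattern graph in which pattern 2 is strongly linked to the positive pattern 0 and pattern 3 to the negative pattern 1, with weak cross edges, the cut removes the weak edges and returns the labels `[1, -1, 1, -1]`.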

Harmonic Function

To relax the binary constraint $f(x_i) \in \{-1, 1\}$, $i = l+1, \dots, l+u$, that the min-cut algorithm imposes on the unlabelled patterns into continuous values, Zhu et al. [92] introduced a method based on Gaussian random fields and harmonic functions. A harmonic function takes, at each unlabelled pattern, the weighted average of its values at the neighbouring patterns, while the values at the labelled patterns remain fixed, $f(x_i) = y_i \in \{-1, 1\}$, $i = 1, \dots, l$:

$$ f(x_j) = \frac{\sum_{k=1}^{l+u} w_{jk} f(x_k)}{\sum_{k=1}^{l+u} w_{jk}}, \qquad j = l+1, \dots, l+u. $$

The harmonic function is thus a continuous prediction function $f$ on the graph $G = (V, E)$ that assigns to each unlabelled pattern the weighted average of its neighbours' values. It is therefore known as a soft version of the min-cut algorithm: it solves the same optimisation problem as (2.57), with the binary constraint on $f$ relaxed to real values:

$$ \min_{f:\, f(x) \in \mathbb{R}} \sum_{i,j=1}^{l+u} w_{ij} \big( f(x_i) - f(x_j) \big)^2 \quad \text{s.t.} \quad f(x_i) = y_i, \; i = 1, \dots, l. \qquad (2.58) $$

Problem (2.58) is minimised over continuous $f$ by solving a linear system, and the resulting values $f(x)$ lie between $-1$ and $1$. Since a real value $f(x)$ does not itself correspond to a label, it must be converted into one by thresholding, which is a drawback of this method:

$$ y_i = \begin{cases} +1 & \text{if } f(x_i) \ge 0, \\ -1 & \text{if } f(x_i) < 0. \end{cases} \qquad (2.59) $$

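Zhu et al. [92] interpreted the harmonic function in terms of a random walk on the graph. Labels are propagated with a transition probability matrix $P$, whose entry $P_{ij}$ is the probability that the random walk moves from vertex $i$ to vertex $j$ in one step:

$$ P_{ij} = P(j \mid i) = \frac{w_{ij}}{\sum_{k} w_{ik}}. $$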
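The random-walk view suggests a simple iterative scheme in the spirit of label propagation: repeatedly replace each score by the $P$-weighted average of its neighbours' scores, then reset (clamp) the labelled patterns to $y_i$. Provided every node has at least one edge and every connected component contains a labelled pattern, the unlabelled part of this iteration is a contraction whose fixed point satisfies the weighted-average condition above, i.e. it is the harmonic function. The sketch is illustrative only: `label_propagation`, `n_iter` and `tol` are hypothetical names, and the labelled patterns are assumed to come first.

```python
import numpy as np


def label_propagation(W, y_l, n_iter=1000, tol=1e-9):
    """Iterative label propagation with clamping; labelled patterns come first.

    Assumes every pattern has at least one edge (no all-zero rows in W).
    """
    l = len(y_l)
    P = W / W.sum(axis=1, keepdims=True)   # P_ij = w_ij / sum_k w_ik
    f = np.zeros(W.shape[0])
    f[:l] = y_l
    for _ in range(n_iter):
        f_new = P @ f                      # weighted average of neighbours' values
        f_new[:l] = y_l                    # clamp labelled patterns: f(x_i) = y_i
        converged = np.max(np.abs(f_new - f)) < tol
        f = f_new
        if converged:
            break
    return f


# Path graph 0 - 2 - 3 - 1 with unit weights; patterns 0 and 1 are labelled.
W = np.zeros((4, 4))
for i, j in [(0, 2), (2, 3), (3, 1)]:
    W[i, j] = W[j, i] = 1.0
f = label_propagation(W, np.array([1.0, -1.0]))
```

On this path graph the scores of patterns 2 and 3 converge to $1/3$ and $-1/3$: each is the average of its two neighbours, with the labelled ends fixed at $+1$ and $-1$.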

The closed-form solution of the harmonic function is most easily obtained in matrix notation using the graph Laplacian. Let $W$ be the weight matrix over both labelled and unlabelled data, and let $D$ be the diagonal degree matrix

$$ D_{ij} = \begin{cases} \sum_{k=1}^{l+u} w_{ik} & \text{if } i = j, \\ 0 & \text{otherwise.} \end{cases} $$

The (unnormalised) graph Laplacian $L$ is then given by

$$ L = D - W. \qquad (2.60) $$
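Building $D$ and $L$ from $W$ takes two lines of NumPy. The sketch below (hypothetical function names, random symmetric weights chosen only for illustration) also checks numerically the identity behind the quadratic form used next: summing $w_{ij}(f(x_i) - f(x_j))^2$ over all ordered pairs $(i, j)$ counts every edge twice, so the double sum equals $2 f^\top L f$.

```python
import numpy as np


def graph_laplacian(W):
    """Unnormalised graph Laplacian L = D - W, eq. (2.60)."""
    D = np.diag(W.sum(axis=1))          # D_ii = sum_k w_ik, zeros elsewhere
    return D - W


def smoothness(W, f):
    """Double sum over ordered pairs: sum_{i,j} w_ij (f_i - f_j)^2."""
    diff = f[:, None] - f[None, :]
    return np.sum(W * diff ** 2)


rng = np.random.default_rng(0)
A = rng.random((5, 5))
W = (A + A.T) / 2.0                     # random symmetric weights
np.fill_diagonal(W, 0.0)
f = rng.standard_normal(5)
L = graph_laplacian(W)
print(np.isclose(smoothness(W, f), 2.0 * f @ L @ f))  # True
```

The constant factor does not change the minimiser of (2.58), which is why the regulariser can be handled through $f^\top L f$.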

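The regularisation term in (2.58) can now be written as a quadratic form in the Laplacian (the factor 2 arises because each edge is counted twice in the double sum, and it does not affect the minimiser):

$$ \sum_{i,j=1}^{l+u} w_{ij} \big( f(x_i) - f(x_j) \big)^2 = 2\, f^\top L f, \qquad (2.61) $$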

where $f = (f(x_1), \dots, f(x_{l+u}))^\top$. To find the labels of the unlabelled patterns, we partition $f$ into $(f_l, f_u)$ and the Laplacian into the corresponding sub-matrices:

$$ L = \begin{pmatrix} L_{ll} & L_{lu} \\ L_{ul} & L_{uu} \end{pmatrix}. $$

Let $y_l = (y_1, \dots, y_l)^\top$. Solving the optimisation problem with Lagrange multipliers and matrix algebra gives

$$ f_l = y_l, \qquad f_u = -L_{uu}^{-1} L_{ul}\, y_l. \qquad (2.62) $$
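The closed form (2.62) and the threshold rule (2.59) translate directly into NumPy. In this minimal sketch (hypothetical names `harmonic_solution` and `threshold`), the labelled patterns are assumed to come first, and $L_{uu}$ is assumed invertible, which holds when every connected component of the graph contains at least one labelled pattern. Solving the linear system with `np.linalg.solve` avoids forming $L_{uu}^{-1}$ explicitly.

```python
import numpy as np


def harmonic_solution(W, y_l):
    """Closed-form harmonic function, eq. (2.62); labelled patterns come first."""
    l = len(y_l)
    L = np.diag(W.sum(axis=1)) - W                 # L = D - W, eq. (2.60)
    L_uu, L_ul = L[l:, l:], L[l:, :l]              # blocks of the partitioned Laplacian
    f_u = -np.linalg.solve(L_uu, L_ul @ y_l)       # f_u = -L_uu^{-1} L_ul y_l
    return np.concatenate([y_l, f_u])              # f_l = y_l


def threshold(f):
    """Turn real-valued scores into labels with rule (2.59)."""
    return np.where(f >= 0, 1, -1)


# Path graph 0 - 2 - 3 - 1 with unit weights; patterns 0 and 1 are labelled.
W = np.zeros((4, 4))
for i, j in [(0, 2), (2, 3), (3, 1)]:
    W[i, j] = W[j, i] = 1.0
f = harmonic_solution(W, np.array([1.0, -1.0]))
labels = threshold(f)                              # labels [1, -1, 1, -1]
```

Here $f_u = (1/3, -1/3)$, so thresholding assigns pattern 2 to the positive class and pattern 3 to the negative class.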

Zhou et al. [88] used the normalised graph Laplacian to propagate labels:

$$ \tilde{L} = I - D^{-1/2} W D^{-1/2}. \qquad (2.63) $$

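The closed-form solution is then obtained by setting to zero the partial derivative, with respect to $f$, of an objective that combines the regulariser $f^\top \tilde{L} f$ (the analogue of (2.61) with $\tilde{L}$ in place of $L$) with a term fitting $f$ to the initial labels.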
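A sketch of (2.63) and of one propagation scheme built on it. The closed form $F = (I - \alpha S)^{-1} Y$, with $S = D^{-1/2} W D^{-1/2} = I - \tilde{L}$, follows the local and global consistency formulation of Zhou et al.; the trade-off parameter $\alpha \in (0, 1)$, its default of 0.99 and the function names are assumptions not introduced in this section. For two classes, $Y$ is encoded here as $+1$/$-1$ for labelled patterns and $0$ for unlabelled ones, which by linearity gives the same decisions as a two-column one-hot encoding.

```python
import numpy as np


def normalised_laplacian(W):
    """L_tilde = I - D^{-1/2} W D^{-1/2}, eq. (2.63); assumes no isolated nodes."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    S = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]   # D^{-1/2} W D^{-1/2}
    return np.eye(len(W)) - S


def consistency_scores(W, y, alpha=0.99):
    """F = (I - alpha S)^{-1} y with S = I - L_tilde (local and global consistency).

    y holds +1 / -1 for labelled patterns and 0 for unlabelled ones.
    """
    S = np.eye(len(W)) - normalised_laplacian(W)
    return np.linalg.solve(np.eye(len(W)) - alpha * S, y)


# Path graph 0 - 2 - 3 - 1 with unit weights; patterns 0 and 1 are labelled.
W = np.zeros((4, 4))
for i, j in [(0, 2), (2, 3), (3, 1)]:
    W[i, j] = W[j, i] = 1.0
F = consistency_scores(W, np.array([1.0, -1.0, 0.0, 0.0]))
labels = np.where(F >= 0, 1, -1)
```

Unlike the harmonic function, this scheme does not clamp the labelled patterns, so their scores may also change; the normalisation by $D^{-1/2}$ reduces the influence of high-degree nodes.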

These methods are transductive: they use the structure of the graph to propagate labels from the labelled to the unlabelled patterns in the graph, but do not directly predict labels for new patterns. Belkin et al. [8] proposed a manifold regularisation method called the Laplacian Support Vector Machine (LapSVM). This approach is inductive: because $f$ is defined over the whole feature space, $f: \mathcal{X} \to \mathbb{R}$, it can predict labels for unseen patterns that are not in the graph. The manifold regularisation problem can be represented as follows:

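$$ \min_{f:\, \mathcal{X} \to \mathbb{R}} \; \frac{1}{l} \sum_{i=1}^{l} V\big( x_i, y_i, f \big) + \lambda_1 f^\top L f + \lambda_2 \lVert f \rVert^2, \qquad (2.64) $$

where $V$ is a loss on the labelled patterns (the hinge loss $\max(0, 1 - y_i f(x_i))$ in LapSVM), $f^\top L f$ is evaluated on the vector of values of $f$ at all $l+u$ training patterns, and $\lambda_1, \lambda_2 \ge 0$. The parameter $\lambda_1$ controls how smooth $f$ must be with respect to the graph Laplacian, i.e. along the data manifold. The term weighted by $\lambda_2$ penalises the complexity of $f$ in the ambient space (its norm in a reproducing kernel Hilbert space in the original formulation) in order to improve generalisation. The computational complexity of LapSVM was later reduced by Melacci and Belkin [53].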
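LapSVM itself requires solving a quadratic programme over a kernel expansion, so the sketch below swaps in a simpler relative in the spirit of Laplacian regularised least squares: a linear model $f(x) = w^\top x + b$ trained with the squared loss instead of the hinge loss, regularised by $\lambda_1 f^\top L f$ over all patterns and by $\lambda_2$ times the squared norm of the parameters. It is not Belkin et al.'s method; the names `linear_manifold_rls` and `predict` and the defaults for `lam1` and `lam2` are illustrative assumptions. Setting the gradient of the objective to zero gives the linear system solved below, and because $f$ is defined on the whole feature space the fitted model can label patterns that were never in the graph.

```python
import numpy as np


def add_bias(X):
    """Append a constant feature so that f(x) = w^T x + b."""
    return np.hstack([X, np.ones((X.shape[0], 1))])


def linear_manifold_rls(X, y_l, W, lam1=0.1, lam2=1e-3):
    """Linear, squared-loss manifold regularisation (a stand-in, not LapSVM).

    Minimises  sum_{i<=l} (f(x_i) - y_i)^2 + lam1 * f^T L f + lam2 * ||theta||^2,
    where f = Xb @ theta on all l + u patterns (labelled rows of X first).
    The bias is also shrunk by lam2 here, a simplification.
    """
    l = len(y_l)
    Xb = add_bias(X)
    L = np.diag(W.sum(axis=1)) - W                  # graph Laplacian, eq. (2.60)
    A = Xb[:l].T @ Xb[:l] + lam1 * Xb.T @ L @ Xb + lam2 * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb[:l].T @ y_l)       # zero of the gradient


def predict(theta, X_new):
    """Inductive step: f is defined on the whole feature space."""
    return np.where(add_bias(X_new) @ theta >= 0, 1, -1)


X = np.array([[-2.0, 0.0], [2.0, 0.0],              # labelled patterns
              [-1.5, 0.5], [-2.5, -0.5], [1.5, -0.5], [2.5, 0.5]])
y_l = np.array([-1.0, 1.0])
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
W = np.exp(-sq / 2.0)                               # RBF weights with sigma = 1
np.fill_diagonal(W, 0.0)
theta = linear_manifold_rls(X, y_l, W)
labels = predict(theta, np.array([[-3.0, 0.0], [3.0, 0.0]]))   # unseen patterns
```

The two query points lie outside the training graph, yet they receive labels $-1$ and $+1$ from the learned function, which is the inductive behaviour that distinguishes manifold regularisation from the transductive methods above.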