• No se han encontrado resultados

In the following, a regularization framework is proposed to regroup both the sample and optimiza- tion methods. A comparison of the invariant kernel methods is then proposed to show the different spaces in which the distance measures are considered by the different methods. The end of this section briefly discusses the perspective of combining the methods.

Sample and constrained methods in a regularization framework

Though stemming from different approaches, many of the presented methods can be interpreted in a regularization framework from the way they are implemented. Actually both sample and constrained methods amount to minimizing a composite criterion JC with additional constraints

min JC= JSV M+ JP K (2.82) s.t. cSV M ≤ 0

cP K ≤ 0,

where JSV M and cSV M correspond to the usual SVM criterion and constraints in (2.10), whereas JP Kand cP Kare added to the problem to include the prior knowledge. The setting for the different methods is derived in the following and summarized in Table 2.2.

All virtual sample methods (Sect. 2.4.1) basically add new samples to the training set, which simply corresponds to augmenting the size of the training set N by the number of virtual samples NV in the optimization problem (2.10). Separating the training samples xifrom the virtual samples ˜

xi in the writing yields the setting of Table 2.2 for (2.82).

Though originally developed for different purposes and from different points of view, all the methods based on weighting (Sect. 2.4.1) can be written as a standard SVM problem with the parameter C set to a different value Cifor each sample. For asymmetric margins, Cican only take two values C+and Cdepending on the class of the sample x

i: positive (for i inP = {i : yi= +1}) or negative (for i in N = {i : yi = −1}). Considering C as an average weight allows to write the problem as in (2.82) with the settings of Table 2.2 for asymmetric margin methods and the Weighted-SVM. Besides, the method of [294] for incorporating unlabeled samples can be seen as a mixture of virtual samples and weighting by considering the NU unlabeled samples as extra samples ˜xi weighted by Ci.

The KBLP and NLKBLP problems (Sect. 2.4.3 and 2.4.3) are directly formulated as in (2.82), except for JSV M and cSV M, which correspond now to the criterion and constraints used by the linear programming form of SVM (2.13). Because of the required discretization of the domain of knowledge, the NLKBLP method can be seen as adding virtual samples to the training set. In this case, the samples are not generated by transformations of other samples as usual, but chosen in the input space by discretization of a region that might actually contain no sample. An interesting point is that, although basically corresponding to the same problem from an optimization viewpoint, the

Table 2.2: Setting for the regularization framework with respect to the different methods. Method JP K cP K ≤ 0 Virtual samples C NV X i=1 ˜ ξi y˜i(h˜xi, wi + b) ≥ 1 − ˜ξi, i = 1, . . . , NV Asymmetric margin (C+− C)X i∈P ξi +(C−− C) X i∈N ξi WSVM N X i=1 (Ci− C)ξi Unlabeled samples NU X i=1 Ciξ˜i y˜i(h˜xi, wi + b) ≥ 1 − ˜ξi KBLP µ1 N X i=1 zi+ µ2ζ −z ≤ K(XT, BT)u + KDα ≤ z, dTu− b + 1 ≤ ζ NLKBLP µ Np X p zp f (xp)− h(xp) + vTg(xp) + zp ≥ 0, p = 1, . . . , Np

VSVM and NLKBLP methods are radically different. The first one aims at improving the boundary between two regions with training samples that are close but that belong to different classes. The second one becomes of particular interest when enhancing the decision function in a region of the input space that lacks training samples but on which prior knowledge is available.

The methods of section 2.4.3 for the incorporation of invariances conform obviously to this framework since they are initially formulated as the minimization of a composite criterion. However, as they correspond in practice to a modification of the kernel, they do not appear in Table 2.2, which focuses on the practical implementation of the methods.

Kernel methods: invariance via distance measure, but in which space?

Most of the kernel methods presented in section 2.4.2 implement transformation invariance by modifying the computation of the distance between the patterns. These methods are summarized in Table 2.3. It can be noticed that while the jittering kernel minimizes a distance in the feature space F , the sample-to-object distance is considered in the input space. On the other hand, the Haar-integration kernel does not look for a minimal distance, but rather computes the distance in F between averages over transformed samples.

Table 2.3: Invariant kernel functions. Z contains all the jittered forms of z. Method kernel function specificity

Jittering kernels k(x, z) = k(x, ˆz) zˆ= arg min

z∈Z kΦ(x) − Φ(z)kF Sample-to-object k(x, z) = k(x, ˆz) zˆ= arg min

z∈Pz

kx − zk Object-to-object k(x, z) = k(ˆx, ˆz) (ˆx, ˆz) = arg min

(x,z)∈Px×Pz

kx − zk Haar-integration k(x, z) = k0(x, z) k0(x, z) =hΦ(z), Φ(x)i

Combinations of methods

Combining methods from different categories (w.r.t. our categorization) seems possible since they act on different parts of the problem. Thus nothing prevents us from using a jittering kernel on an extended training set with prior knowledge on polyhedral regions of input space. But this does not exclude combinations of methods of the same category. Actually all the methods of Table 2.2 can be combined since they all amount to the addition of a term in the criterion and optionally some constraints. Such combinations become interesting when two methods are used for different purposes.

However, the combination of methods leads to an increase of the algorithm complexity. There- fore, the methods must be chosen with care by looking at their complementarity in order to yield a respectable improvement.