rating Privileged Information
Consider a training data set $(x_i, y_i) \in \mathbb{R}^m \times \{1, \dots, K\}$, where $i = 1, 2, \dots, n$ and $K$ is the number of ordered classes, $K > K-1 > \dots > 1$. Assume that additional (privileged) information $x_i^* \in X^*$ may be given about training examples $x_i \in X$, $i = 1, 2, \dots, p \le n$. As in the case of the nominal version of IT for LUPI (Section 3.4.2), the aim here is to learn a data metric $C$ for the original space $X$ informed by inter-point distances in the privileged space $X^*$. The privileged information in $X^*$ is used to describe sets of similarity ($S^+$) and dis-similarity ($S^-$) constraints, as defined in Section 3.4.2. However, due to the ordinal nature of the underlying training classes, the class order information is explicitly taken into account both in the derivation of the constraints and in the distance metric learning for the original space $X$.
5.3.1 (Dis)similarity Constraints Derivation
Consider a privileged pair $(x_i^*, x_j^*) \in X^*$ with distance $d_{M^*}(x_i^*, x_j^*)$, given in Eq. (3.8), and the corresponding original training pair $(x_i, x_j) \in X$ with distance $d_M(x_i, x_j)$, given in Eq. (3.6).
Whereas in nominal IT for LUPI the constraints are decided based on proximity information and label agreement, in the OIT we replace strict label agreement with the absolute class difference,
$$H(x_i, x_j) = |c(x_i) - c(x_j)|, \tag{5.1}$$
which has been employed before in Section 4.3.1, where $c(x)$ denotes the class label of $x$. Given a “tolerable class difference threshold” $\kappa \ge 0$, defined on the range of the loss function¹, the (dis)similarity sets $S^+$ and $S^-$ are now constructed as follows²:
• If $d_{M^*}(x_i^*, x_j^*) \le l^*$ and $H(x_i, x_j) \le \kappa$ (close in their class order), then $(x_i, x_j) \in S^+$.
• If $d_{M^*}(x_i^*, x_j^*) > u^*$ and $H(x_i, x_j) > \kappa$ (apart in their class order), then $(x_i, x_j) \in S^-$,
where $l^*$ and $u^*$ are ‘small’ and ‘large’ distance thresholds (on $X^*$), respectively.
Thus, privileged points that are relatively close and have a low rank loss are considered ‘similar’, while privileged points that are relatively far apart and have a high rank loss are constrained to be ‘dis-similar’.
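The two rules above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function names (`mahalanobis_sq`, `build_constraints`) are hypothetical, $d_{M^*}$ is assumed to take the squared Mahalanobis form $(a-b)^T M^* (a-b)$ (Eq. (3.8) is not reproduced in this section), and all pairs among the $p$ points with privileged information are enumerated.

```python
import numpy as np

def mahalanobis_sq(a, b, M):
    """Squared distance (a - b)^T M (a - b); assumed form of d_{M*}."""
    d = a - b
    return float(d @ M @ d)

def build_constraints(X_priv, labels, M_priv, l_star, u_star, kappa):
    """Return (S_plus, S_minus) as lists of index pairs (i, j) with i < j.

    X_priv : (p, m*) array of privileged points
    labels : length-p sequence of integer class ranks in {1, ..., K}
    """
    S_plus, S_minus = [], []
    p = len(X_priv)
    for i in range(p):
        for j in range(i + 1, p):
            dist = mahalanobis_sq(X_priv[i], X_priv[j], M_priv)
            H = abs(int(labels[i]) - int(labels[j]))  # Eq. (5.1)
            if dist <= l_star and H <= kappa:
                S_plus.append((i, j))    # close in X* and close in class order
            elif dist > u_star and H > kappa:
                S_minus.append((i, j))   # far in X* and apart in class order
    return S_plus, S_minus
```

Pairs that satisfy neither rule are simply left out of both sets, which matches the remark that not every training point has to take part in a constraint.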
5.3.2 Weighting Scheme for the Metric Learning
Unlike the nominal IT for LUPI, the proposed OIT method aims to learn an optimal metric in space $X$ in which the distances induced among similar/dis-similar data pairs preserve the natural order relation between their classes. Thus, the notion of similar/dis-similar data pairs varies according to the corresponding class differences. Loosely speaking, if the class of point $x_i$ is closer in order to the class of $x_j$ than to the class of $x_q$, i.e. $H(x_i, x_j) < H(x_i, x_q) \le \kappa$, then during metric learning the ‘force’ pulling together $x_i$ and $x_j$ is larger than the force applied to $x_i$ and $x_q$. An analogous principle applies to the ‘repulsive force’ applied to dis-similar pairs.

¹ In our case $[0, K-1]$.
² Note that it is not necessary for all training points in $X$ to be involved in pairs of points in $S^{\pm}$.
In the following, we propose a weighting scheme¹ for the OIT for LUPI that controls the size of the distance updates imposed on data pairs. Separate weighting schemes are used for similar and dis-similar pairs.
1) Weighting two similar points $(x_i, x_j) \in S^+$:
We propose a Gaussian weighting scheme,
$$\vartheta^+_{ij} = \exp\left(-\frac{\left(H(x_i, x_j)\right)^2}{2\sigma_+^2}\right), \tag{5.2}$$
where $\sigma_+$ is the Gaussian kernel width.
2) Weighting two dis-similar points $(x_i, x_j) \in S^-$:
Denote by $\varepsilon_{\max}$ the maximum class rank difference over all dis-similar pairs $(x_l, x_q)$, $(l, q) \in S^-$, i.e.,
$$\varepsilon_{\max} = \max_{(x_l, x_q) \in S^-} H(x_l, x_q).$$
The weight factor $\vartheta^-_{ij}$ for two dis-similar points $(x_i, x_j) \in S^-$ is then calculated as follows:
$$\vartheta^-_{ij} = \exp\left(-\frac{\left(\varepsilon_{\max} - H(x_i, x_j)\right)^2}{2\sigma_-^2}\right), \tag{5.3}$$
where $\sigma_-$ is the Gaussian kernel width².
The calculated weighting factors $\vartheta^{\pm}$ are utilized in the new OIT scheme presented in the next section.

¹ A similar technique was originally introduced in Chapter 4, Section 4.3.2, for ordinal prototype-based models.
² We employed a grid search over the training sets (via a cross-validation procedure) to identify the ‘optimal’ values of $\sigma_+$ and $\sigma_-$.
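As a minimal sketch of Eqs. (5.2) and (5.3), the weights can be computed directly from the class ranks of each constrained pair. The function names and the dictionary-based return type are illustrative assumptions, not part of the original method description.

```python
import numpy as np

def class_diff(labels, i, j):
    """Absolute class difference H of Eq. (5.1) for the pair (i, j)."""
    return abs(int(labels[i]) - int(labels[j]))

def similar_weights(S_plus, labels, sigma_plus):
    """Eq. (5.2): the weight decays as the class difference grows."""
    return {
        (i, j): float(np.exp(-class_diff(labels, i, j) ** 2 / (2.0 * sigma_plus ** 2)))
        for (i, j) in S_plus
    }

def dissimilar_weights(S_minus, labels, sigma_minus):
    """Eq. (5.3): the weight grows as the class difference approaches eps_max."""
    if not S_minus:
        return {}
    eps_max = max(class_diff(labels, i, j) for (i, j) in S_minus)
    return {
        (i, j): float(np.exp(-(eps_max - class_diff(labels, i, j)) ** 2
                             / (2.0 * sigma_minus ** 2)))
        for (i, j) in S_minus
    }
```

With $\sigma_+ = 1$, a similar pair from the same class gets weight 1, while a pair one rank apart gets $\exp(-1/2) \approx 0.61$. Symmetrically, the dis-similar pair with the largest rank difference gets weight 1.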
5.3.3 Ordinal-Based Metric Learning Algorithm
We aim to learn a new positive definite matrix (metric tensor) $C$ on $X$, yielding the squared distance
$$d_C(x_i, x_j) = (x_i - x_j)^T C (x_i - x_j), \quad x_i, x_j \in X,$$
that, while incorporating dominant distance relations in the privileged space $X^*$, also respects the class order.
Distance metric updates for similar/dis-similar pairs in space $X$ are performed using the corresponding weights $\vartheta^{\pm}$. Thus, different degrees of attractive and repulsive force (based on the class order relations of the data pairs) are allocated to similar and dis-similar pairs, respectively.
As in the standard ITML [65], the similarity between the two metrics $C$ and $M$ is measured through the Bregman divergence (Burg) defined over the cone of positive definite matrices. Hence, the learning task is posed as the following constrained minimization problem:
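$$\begin{aligned}
\min_{C \succeq 0}\ & D_{\mathrm{Burg}}(C, M) \\
\text{subject to}\ & d_C(x_i, x_j) \le l \cdot \vartheta^+_{ij}, && \text{if } (x_i, x_j) \in S^+, \text{ and} \\
& d_C(x_i, x_j) \ge u \cdot \vartheta^-_{ij}, && \text{if } (x_i, x_j) \in S^-,
\end{aligned} \tag{5.4}$$
where $0 < l < u$ are the small and large distance thresholds on $X$, respectively.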
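To make the objective and constraints of (5.4) concrete, the sketch below evaluates the Burg divergence and lists the weighted constraints that a candidate $C$ violates. The divergence is written in its standard LogDet form, $\mathrm{tr}(CM^{-1}) - \log\det(CM^{-1}) - m$, which is assumed here because this section does not spell it out. The function names and the dictionaries `w_plus` / `w_minus` (pair to weight) are hypothetical.

```python
import numpy as np

def burg_divergence(C, M):
    """Standard LogDet (Burg) divergence: tr(C M^-1) - log det(C M^-1) - m."""
    m = C.shape[0]
    CM_inv = C @ np.linalg.inv(M)
    sign, logdet = np.linalg.slogdet(CM_inv)
    if sign <= 0:
        raise ValueError("C and M must both be positive definite")
    return float(np.trace(CM_inv) - logdet - m)

def sq_dist(C, a, b):
    """Squared distance d_C(a, b) = (a - b)^T C (a - b)."""
    d = a - b
    return float(d @ C @ d)

def violated_constraints(C, X, S_plus, S_minus, w_plus, w_minus, l, u):
    """Return the pairs whose weighted constraint in (5.4) does not hold."""
    bad = []
    for (i, j) in S_plus:
        if sq_dist(C, X[i], X[j]) > l * w_plus[(i, j)]:
            bad.append((i, j))
    for (i, j) in S_minus:
        if sq_dist(C, X[i], X[j]) < u * w_minus[(i, j)]:
            bad.append((i, j))
    return bad
```

Because the weights lie in $(0, 1]$, a small $\vartheta^+_{ij}$ tightens the upper bound for a similar pair, whereas a small $\vartheta^-_{ij}$ relaxes the lower bound for a dis-similar pair.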
Similarly to the original ITML model [65] and the IT approach in Section 3.4.2, the OIT guarantees a feasible solution for $C$ by introducing a trade-off parameter $\nu > 0$ that governs the influence of the constraints (and hence the influence of the privileged information). Let $s(i, j)$ denote the index of the $(i, j)$-th constraint, and let $\xi$ be a vector of slack variables, initialized to $\xi_0$, with components equal to $l$ for similarity constraints and $u$ for dissimilarity constraints (see Section 3.4.2). The optimization problem can then be reformulated as follows:
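$$\begin{aligned}
\min_{C \succeq 0,\ \xi}\ & D_{\mathrm{Burg}}(C, M) + \nu \cdot D_{\mathrm{Burg}}(\mathrm{diag}(\xi), \mathrm{diag}(\xi_0)) \\
\text{subject to}\ & d_C(x_i, x_j) \le \xi_{s(i,j)} \cdot \vartheta^+_{ij}, && \text{if } (x_i, x_j) \in S^+, \text{ and} \\
& d_C(x_i, x_j) \ge \xi_{s(i,j)} \cdot \vartheta^-_{ij}, && \text{if } (x_i, x_j) \in S^-.
\end{aligned} \tag{5.5}$$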
The algorithm is initialized with $C$ equal to the Mahalanobis matrix of the data distribution in the original space $X$. Similarly to the IT approach (Section 3.4.2), optimizing (5.5) involves repeatedly projecting the current solution onto a single constraint (Bregman projections), via the update given in Eq. (5.6) [65]. The OIT algorithm for LUPI in ordinal classification is summarized in Algorithm 9. The optimization algorithm is described in Section 3.3.1.
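For intuition only, the following sketch shows a single projection step in a deliberately simplified form. It is not Algorithm 9: the slack variables $\xi$, the trade-off parameter $\nu$ and the dual-variable corrections of ITML are all omitted, and a violated constraint is projected onto its boundary $d_C(x_i, x_j) = t$ with the rank-one LogDet update $C \leftarrow C + \beta\, C v v^T C$, where $v = x_i - x_j$ and $\beta = (t - p)/p^2$ for $p = v^T C v$. The function names are hypothetical, and reading "Mahalanobis matrix" as the inverse sample covariance is an assumption.

```python
import numpy as np

def init_metric(X):
    """Inverse sample covariance of X (assumed meaning of 'Mahalanobis matrix')."""
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    return np.linalg.inv(cov)

def sq_dist(C, a, b):
    """Squared distance d_C(a, b) = (a - b)^T C (a - b)."""
    d = a - b
    return float(d @ C @ d)

def project_onto_constraint(C, xi, xj, target):
    """Rank-one update that makes d_C(xi, xj) equal to target (target > 0)."""
    v = xi - xj
    Cv = C @ v
    p = float(v @ Cv)
    if p <= 0.0:              # identical points: nothing to project
        return C
    beta = (target - p) / p ** 2
    # C is symmetric, so C v v^T C equals outer(Cv, Cv).
    return C + beta * np.outer(Cv, Cv)

def enforce_pair(C, xi, xj, bound, weight, similar):
    """Project only if the weighted constraint (bound * weight) is violated."""
    target = bound * weight
    d = sq_dist(C, xi, xj)
    violated = d > target if similar else d < target
    return project_onto_constraint(C, xi, xj, target) if violated else C
```

After the update, $v^T C v = p + \beta p^2 = t$, so the pair sits exactly on the weighted bound. Cycling such steps over all constraints gives the flavour of the iterative scheme, although the full method additionally updates the slack and dual variables at every step.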