
[Nguyen et al., 2011a, Nguyen et al., 2011b, Nguyen et al., 2014], [Xue and Hauskrecht, 2017a, Xue and Hauskrecht, 2017b, Xue and Hauskrecht, 2018] and [Xue and Hauskrecht, 2019] have explored ways of enriching standard instance labeling with auxiliary information. That is, besides asking for instance labels, they also allow annotators to provide soft labels reflecting the annotator's belief that the class label is indeed true. This additional information not only makes the acquired labels more robust but also gives humans more flexibility to express their belief in the label, thereby helping to learn more accurate models. Hence, soft-label feedback extends exact (or hard) instance-label feedback.

There are different ways to express soft-label feedback. One way is to use probabilistic labels [Nguyen et al., 2011a, Nguyen et al., 2011b, Nguyen et al., 2014]. For example, when obtaining feedback from a physician on whether a patient suffers from a particular disease, the binary true/false feedback can be refined by asking for the physician's belief about the chance that the disease is present, say, "The probability that this patient will have heart disease is 70%". Another way, proposed by [Xue and Hauskrecht, 2017a, Xue and Hauskrecht, 2018, Xue and Hauskrecht, 2019], is to use Likert-scale categories [Likert, 1932] as the auxiliary information. Briefly, Likert defined a set of ordinal categories humans can use to express the strength of their agreement with (or belief in) the respective class labels. With such Likert-scale labels, humans do not have to provide an exact probabilistic label or a confidence score. In the disease example above, a physician can state whether they agree, weakly agree, are neutral, weakly disagree, or disagree that the disease is present in that patient.
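As a minimal sketch of how such Likert-scale feedback could be stored next to the hard label, the snippet below encodes five categories as an ordered enumeration. The names `Likert` and `LabeledInstance`, the five-level scale, and the feature values are illustrative assumptions, not part of the cited work.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Sequence


class Likert(IntEnum):
    """Five ordinal agreement levels (assumed scale); larger means stronger agreement."""
    DISAGREE = 1
    WEAKLY_DISAGREE = 2
    NEUTRAL = 3
    WEAKLY_AGREE = 4
    AGREE = 5


@dataclass(frozen=True)
class LabeledInstance:
    x: Sequence[float]  # feature vector (hypothetical values below)
    y: int              # hard class label, +1 or -1
    u: Likert           # auxiliary soft label


a = LabeledInstance(x=[63.0, 1.0], y=+1, u=Likert.AGREE)
b = LabeledInstance(x=[41.0, 0.0], y=+1, u=Likert.WEAKLY_AGREE)

# Only the order of the categories matters, not their numeric spacing.
a_is_stronger = a.u > b.u
```

Using an ordered enumeration keeps the comparison between two soft labels explicit while avoiding any claim that the categories are equally spaced.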

As for model learning, they develop algorithms based on ordinal regression and ranking. Specifically, they aim to learn a ranking function $g(x) = w^T x$ from training data $\{(x_1, y_1, u_1), \ldots, (x_n, y_n, u_n)\}$. Here $x_i$ and $y_i$ are standard (input, output) pairs, and $u_i$ is an additional soft label indicating the confidence that the data instance $x_i$ falls into the class $y_i$. Both $y_i$ and $u_i$ are assigned by human annotators. A ranking function is learned because there is an ordinal relationship among these training triplets. In other words, if two data instances $x_i$ and $x_j$ are assigned ordinal labels with $u_i > u_j$, the ranking function is expected to preserve the same order, i.e. $g(x_i) > g(x_j)$. They therefore express this ordinal relationship as a set of constraints and add them to a standard optimization procedure. A popular choice is ranking-SVM [Joachims, 2002]. Its key idea is to find the best separating hyperplane among the training data while also satisfying the ordinal constraints. Concretely:

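\[
\begin{aligned}
\text{minimize:}\quad & \frac{w^T w}{2} + B \sum_{i=1}^{n} \eta_i + C \sum_{i=1}^{n} \sum_{j \neq i} \xi_{j,i} \\
\text{subject to:}\quad & y_i (w^T x_i + w_0) \geq 1 - \eta_i \quad \forall i \\
& w^T x_i \geq w^T x_j + 1 - \xi_{j,i} \quad \forall (u_i > u_j) \\
& \eta_i,\ \xi_{j,i} \geq 0 \quad \forall i, j
\end{aligned}
\]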

In the objective function above, the first term is a regularizer of $w$; the second term (single sum) is the hinge loss on the binary labels; the third term (double sum) is the loss on each ordered pair implied by the soft labels. Once the minimizer $\hat{w}$ is learned, the ranking function $g(x) = \hat{w}^T x$ is determined. Finally, a classification function $y = f(x)$ can be built on top of $g(x)$ by choosing a threshold on the ranking line.
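The optimization can be illustrated with a small numerical sketch. At the optimum each slack variable equals a hinge term, so the problem can be rewritten without slacks as $\frac{1}{2} w^T w + B \sum_i \max(0, 1 - y_i(w^T x_i + w_0)) + C \sum_{(i,j):\, u_i > u_j} \max(0, 1 - w^T(x_i - x_j))$. The code below minimizes this form by plain subgradient descent, which is a stand-in chosen for brevity rather than the solver used in the cited work. The function name, the step size `lr`, the number of `epochs`, and the toy data are all assumptions.

```python
import numpy as np


def fit_soft_label_ranking_svm(X, y, u, B=1.0, C=1.0, lr=0.01, epochs=500):
    """Subgradient descent on the slack-free (hinge-loss) form of the objective.

    X: (n, d) features; y: (n,) labels in {-1, +1}; u: (n,) ordinal soft labels.
    Returns the weight vector w and the bias w0.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    u = np.asarray(u)
    n, d = X.shape
    # Ordered pairs (i, j) with u_i > u_j: instance i should rank above j.
    I, J = np.nonzero(u[:, None] > u[None, :])
    D = X[I] - X[J]  # pairwise difference vectors x_i - x_j
    w, w0 = np.zeros(d), 0.0
    for _ in range(epochs):
        # Classification hinge is active where y_i (w^T x_i + w0) < 1.
        act_c = y * (X @ w + w0) < 1.0
        # Ranking hinge is active where w^T (x_i - x_j) < 1.
        act_r = D @ w < 1.0
        grad_w = (w
                  - B * (y[act_c, None] * X[act_c]).sum(axis=0)
                  - C * D[act_r].sum(axis=0))
        grad_w0 = -B * y[act_c].sum()
        w -= lr * grad_w
        w0 -= lr * grad_w0
    return w, w0


# Toy one-dimensional data (hypothetical): larger x means stronger agreement.
X = np.array([[-3.0], [-2.0], [-1.0], [1.0], [2.0], [3.0]])
y = np.array([-1, -1, -1, 1, 1, 1])
u = np.array([1, 1, 2, 4, 5, 5])

w, w0 = fit_soft_label_ranking_svm(X, y, u)
g = X @ w                   # ranking scores g(x) = w^T x
y_pred = np.sign(g + w0)    # threshold on the ranking line at -w0
```

Here the threshold is taken from the learned bias $w_0$, which is one simple choice; a threshold tuned on held-out data would fit the description equally well.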

2.3.1.1 Active Learning With Auxiliary Label Information

To further save human annotation effort, [Xue and Hauskrecht, 2017a, Xue and Hauskrecht, 2018] and [Xue and Hauskrecht, 2019] propose to actively query the labels (both the class labels and the auxiliary soft labels) for the training triplets $\{(x_1, y_1, u_1), \ldots, (x_n, y_n, u_n)\}$. For example, [Xue and Hauskrecht, 2017a] use expected model change (EMC) as the active learning strategy. EMC works similarly to the maximum model change introduced earlier: it measures to what extent each unlabeled instance would change the model if it were labeled. As the true label is unknown before it is queried, they approximate it with a label distribution inferred by the classification model they aim to learn.

In more detail, suppose one has already learned a classification model $f_L$ from the current labeled data $L$. Also assume that there are $m$ possible Likert-scale ordinal categories for $x$. Based on the current model $f = f_L$, the probability that $x$ is assigned a Likert-scale label $u = 1, 2, \ldots, m$ is inferred as $p_f(u \mid x)$. With this label distribution, one can estimate the expected model change as follows. For each $u$, label the instance $x$ as $(x, y, u)$, add the triplet to $L$, and learn an add-one model $f_{L \cup (x,y,u)}$. The model change of $f_{L \cup (x,y,u)}$ compared to $f_L$ is computed as $\delta(x, u)$. The $m$ Likert-scale labels give rise to $m$ model changes $\delta(x, u)$, $u = 1, \ldots, m$. Then, the expected model change $\Delta_f(x)$ of $x$ based on the current model $f$ is calculated as:

\[
\Delta_f(x) = \sum_{u=1}^{m} p_f(u \mid x)\, \delta(x, u)
\]

Finally, the instance $x^*$ that leads to the maximum expected model change is selected for labeling. Note that this $\Delta_f(x)$ is one instantiation of the general utility function $U_M(x)$ in line 5 of Algorithm 1 within the framework of pool-based active learning (Section 2.2.1).
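The selection rule can be sketched generically, leaving the model, the label distribution, and the change measure as pluggable functions. Everything concrete below is an assumption made for illustration: the toy ridge-regression model stands in for the ranking model, $p_f(u \mid x)$ is a softmax over squared distances between the score and each category, $\delta$ is the Euclidean norm of the parameter difference, and `hard_label_from` is a hypothetical rule for deriving $y$ from $u$.

```python
import numpy as np


def hard_label_from(u, m):
    """Hypothetical rule: categories above the scale midpoint map to y = +1."""
    return 1 if u > (m + 1) / 2 else -1


def expected_model_change(x, labeled, fit, label_probs, change):
    """Delta_f(x) = sum over u of p_f(u | x) * delta(x, u)."""
    current = fit(labeled)
    probs = label_probs(current, x)
    total = 0.0
    for u, p in enumerate(probs, start=1):
        y = hard_label_from(u, len(probs))
        add_one = fit(labeled + [(x, y, u)])  # the add-one model
        total += p * change(current, add_one)
    return total


def fit(triplets):
    # Toy stand-in model: ridge regression of u on [x, 1].
    X = np.array([[*t[0], 1.0] for t in triplets])
    u = np.array([t[2] for t in triplets], dtype=float)
    return np.linalg.solve(X.T @ X + 1e-3 * np.eye(X.shape[1]), X.T @ u)


def label_probs(model, x, m=5):
    # Assumed p_f(u | x): softmax of negative squared distance to each category.
    score = model @ np.array([*x, 1.0])
    logits = -(score - np.arange(1, m + 1)) ** 2
    p = np.exp(logits - logits.max())
    return p / p.sum()


def change(old, new):
    # Assumed delta: Euclidean norm of the parameter difference.
    return float(np.linalg.norm(new - old))


# Hypothetical labeled triplets (x, y, u) and unlabeled pool.
labeled = [((-2.0,), -1, 1), ((-1.0,), -1, 2), ((1.0,), 1, 4), ((2.0,), 1, 5)]
pool = [(0.0,), (1.5,), (6.0,)]

scores = [expected_model_change(x, labeled, fit, label_probs, change) for x in pool]
x_star = pool[int(np.argmax(scores))]  # instance selected for querying
```

Each candidate requires $m$ retraining runs, so the cost of one selection round grows with both the pool size and the number of Likert categories.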
