POPULATION AND SAMPLE

CHAPTER IV: RESULTS AND DISCUSSION

Table 5.2 summarizes the notation used throughout this chapter. The goal of the bootstrapping framework is to augment the target domain labeled data $D^t_l$ with a subset of instances from the source domain labeled data $D^s_l$, so as to improve classification accuracy on the target domain unlabeled data $D^t_u$. To this end, we first build a classifier using $D^t_l$ and apply it to $D^s_l$ to select a subset of informative instances. If the classifier correctly predicts the label of a source domain instance, that instance is regarded as redundant, i.e., its knowledge is already contained in the target domain instances. If the predicted label is incorrect, we consider the source domain instance a candidate for addition, because it may contain knowledge that the target domain labeled data lack. A scoring function (presented in Section 5.3.2) is used to determine the informativeness of each candidate and to decide whether to select it. Adding the selected informative source instances to $D^t_l$ yields a new classifier. Ideally, this new classifier would correctly classify more target domain instances. However, if a few false informative instances containing inconsistent knowledge were selected, it may misclassify target domain labeled instances that were initially classified correctly. When such misclassification happens, we recover through a “counterbalancing” process: the misclassified target domain labeled instances are added again, with their correct labels, to improve classification accuracy. In other words, those misclassified instances are given extra weight in the training data.

Table 5.2: Table of notation

Symbol                               Description
$c$                                  Classifier
$c_0$                                Initial classifier trained on $D^t_l$
$\hat{c}$                            Final adaptive classifier
$\delta$                             Threshold for selecting informative instances
$\gamma(\cdot,\cdot)$                Scoring function for the consistency between the content of an instance and a label
$k$                                  Number of informative instances to be selected per iteration
$\phi(\cdot,\cdot)$                  Scoring function for informativeness
$\lambda_c(\cdot,\cdot)$             Scoring function for the consistency factor
$\lambda_d(\cdot)$                   Scoring function for the diversity factor
$\lambda_s(\cdot,\cdot)$             Scoring function for the similarity factor
$\pi_c(\cdot,\cdot)$                 Scoring function for the content similarity between two instances
$\pi_l(\cdot,\cdot)$                 Scoring function for the label similarity between two instances
$\pi_u(\cdot)$                       Scoring function for the uncertainty factor
$D^s_l$                              Source domain labeled data
$D^t_l$                              Target domain labeled data
$D^t_u$                              Target domain unlabeled data
$D^t = D^t_l \cup D^t_u$             Overall target domain data
$T$                                  Training data for classifier $c$
$T^t_{correct}$                      Set of instances from $D^t_l$ that can be correctly classified by $c_0$
$T^t_{wrong}$                        Set of instances from $T^t_{correct}$ that are misclassified by $c$
$T^s$                                Remaining source domain labeled data after selecting informative instances in each iteration
$T^s_{wrong}$                        Set of instances from $T^s$ that are misclassified by $c$
$T^s_{info}$                         Set of informative instances selected from $T^s$
$X$                                  The observable feature space
$Y$                                  The label space

Algorithm 1: The bootstrapping framework
Input: $D^s_l$, $D^t_l$, $D^t_u$, $k$, $\delta$
Output: Adaptive classifier $\hat{c}: X \to Y$
1   Train an initial classifier $c_0$ with $D^t_l$;
2   $T^t_{correct} \leftarrow$ set of instances from $D^t_l$ that can be correctly classified by $c_0$;
3   Initialize $T \leftarrow D^t_l$, $T^s_{info} \leftarrow \emptyset$, $T^t_{wrong} \leftarrow \emptyset$, $T^s \leftarrow D^s_l$;
4   repeat
5       $T \leftarrow T \cup T^s_{info} \cup T^t_{wrong}$;
6       Train a classifier $c$ with $T$;
7       $T^s_{wrong} \leftarrow$ set of instances from $T^s$ that are misclassified by $c$;
8       $T^s_{info} \leftarrow$ top $k$ instances from $T^s_{wrong}$ with informativeness $\phi(\cdot,\cdot)$ greater than $\delta$;
9       $T^s \leftarrow T^s - T^s_{info}$;
10      $T^t_{wrong} \leftarrow$ set of instances from $T^t_{correct}$ that are misclassified by $c$;
11  until $|T^s_{info}| < k$;
12  return $c$
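The redundant-versus-candidate test described above reduces to comparing a classifier's prediction with each instance's label. The sketch below is hypothetical and not the author's code: it assumes an instance is a (features, label) pair and a classifier is any Python callable from features to a label. The same partition also yields $T^t_{correct}$ (line 2 of Algorithm 1), the candidates $T^s_{wrong}$ (line 7) and the counterbalancing set $T^t_{wrong}$ (line 10), depending on which classifier and which data are passed in.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical representations: an instance is a (features, label) pair and a
# classifier is any callable that maps features to a predicted label.
Features = Tuple[float, ...]
Instance = Tuple[Features, str]
Classifier = Callable[[Features], str]


def split_by_prediction(
    classifier: Classifier, instances: Sequence[Instance]
) -> Tuple[List[Instance], List[Instance]]:
    """Return (correctly classified, misclassified) instances."""
    correct: List[Instance] = []
    wrong: List[Instance] = []
    for features, label in instances:
        if classifier(features) == label:
            # Prediction agrees: for source data this is a redundant instance.
            correct.append((features, label))
        else:
            # Prediction disagrees: for source data this is a candidate.
            wrong.append((features, label))
    return correct, wrong


def c0(x: Features) -> str:
    # Toy threshold rule standing in for a classifier trained on D^t_l.
    return "pos" if x[0] > 0.5 else "neg"


source = [((0.9,), "pos"), ((0.2,), "neg"), ((0.7,), "neg"), ((0.1,), "pos")]
redundant, candidates = split_by_prediction(c0, source)
print(len(redundant), len(candidates))  # 2 2
```

Here the two source instances whose labels the toy rule gets wrong, `((0.7,), "neg")` and `((0.1,), "pos")`, become candidates; only those would then be scored for informativeness.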

Algorithm 1 illustrates the bootstrapping framework. Specifically, the algorithm takes as input $D^s_l$, $D^t_l$, $D^t_u$, a natural number $k$ indicating the number of source instances to be added per iteration, and a real number $\delta$ indicating the informativeness threshold for selecting source instances. The output is an adaptive classifier $\hat{c}$.

We start by training an initial classifier $c_0$ on $D^t_l$ (line 1). We initialize $T^t_{correct}$ with the instances from $D^t_l$ that $c_0$ classifies correctly (line 2). We then initialize the overall training data $T$ to $D^t_l$, the newly selected informative source domain instances $T^s_{info}$ to $\emptyset$, the counterbalancing target domain instances $T^t_{wrong}$ to $\emptyset$, and the source domain candidate instances $T^s$ to $D^s_l$ (line 3).

In every iteration, we first add the newly selected informative instances $T^s_{info}$ and the counterbalancing target domain instances $T^t_{wrong}$ to the overall training data $T$ (line 5), which is then used to train a new classifier $c$ (line 6). We set $T^s_{wrong}$ to the instances in $T^s$ whose labels differ from those predicted by $c$ (line 7). As discussed earlier, these instances can potentially augment the target domain training data by complementing it with the knowledge it lacks. We then set $T^s_{info}$ to the top $k$ informative instances selected from $T^s_{wrong}$, based on a scoring function explained in Section 5.3.2 (line 8), and remove these newly selected informative source instances from $T^s$ (line 9).

If a few false informative instances containing inconsistent knowledge were selected and added to the training data, classifier $c$ may misclassify instances in $T^t_{correct}$ that $c_0$ initially classified correctly. To counterbalance this effect, we set $T^t_{wrong}$ to the instances in $T^t_{correct}$ that are misclassified by $c$ (line 10). The instances in $T^t_{wrong}$ are added to the training data again (i.e., given extra weight) in the next iteration. Because informative instances are iteratively removed from $T^s$, fewer and fewer informative instances remain in it. The process stops when an iteration cannot select a sufficient number of instances, i.e., fewer than the predefined number $k$ (line 11). The classifier $c$ trained in the last iteration is returned as the adaptive classifier $\hat{c}$ (line 12).
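Algorithm 1 can be sketched in Python as follows. This is a minimal, hypothetical rendering under stated assumptions, not the author's implementation: instances are (features, label) pairs, the nearest-centroid learner `train_centroid` is a placeholder base classifier, and the `phi` argument is a placeholder for the informativeness function $\phi(\cdot,\cdot)$, which is actually defined in Section 5.3.2. $D^t_u$ is omitted because lines 1 to 12 do not use it directly; a `phi` that needs it could capture it in a closure. $T$ is kept as a list, so re-adding $T^t_{wrong}$ duplicates those instances, which is one way to realize "given extra weight".

```python
from typing import Callable, Dict, List, Sequence, Tuple

Features = Tuple[float, ...]
Instance = Tuple[Features, str]            # (features, label)
Classifier = Callable[[Features], str]
Scorer = Callable[[Features, str], float]  # hypothetical stand-in for phi(., .)


def train_centroid(data: Sequence[Instance]) -> Classifier:
    """Hypothetical base learner: nearest class centroid.

    An instance that appears twice in `data` contributes twice to its class
    centroid, so duplicated (counterbalancing) instances carry extra weight.
    """
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for x, y in data:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

    def predict(x: Features) -> str:
        # Squared Euclidean distance; ties go to the label seen first.
        return min(
            centroids,
            key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
        )

    return predict


def bootstrap(
    source_l: Sequence[Instance],
    target_l: Sequence[Instance],
    k: int,
    delta: float,
    phi: Scorer,
    train: Callable[[Sequence[Instance]], Classifier] = train_centroid,
) -> Classifier:
    assert k >= 1, "k must be a positive natural number or the loop may not stop"
    c0 = train(target_l)                                         # line 1
    t_correct = [(x, y) for x, y in target_l if c0(x) == y]      # line 2
    T: List[Instance] = list(target_l)                           # line 3
    s_info: List[Instance] = []
    t_wrong: List[Instance] = []
    s_rest: List[Instance] = list(source_l)                      # T^s
    while True:                                                  # line 4: repeat
        T += s_info + t_wrong                                    # line 5
        c = train(T)                                             # line 6
        s_wrong = [(x, y) for x, y in s_rest if c(x) != y]       # line 7
        ranked = sorted(s_wrong, key=lambda inst: phi(*inst), reverse=True)
        s_info = [inst for inst in ranked if phi(*inst) > delta][:k]  # line 8
        for inst in s_info:                                      # line 9
            s_rest.remove(inst)
        t_wrong = [(x, y) for x, y in t_correct if c(x) != y]    # line 10
        if len(s_info) < k:                                      # line 11: until
            return c                                             # line 12


# Usage with toy 1-D data and a constant (hypothetical) informativeness score.
target = [((0.0,), "neg"), ((10.0,), "pos")]
source = [((1.0,), "neg"), ((9.0,), "pos"), ((4.0,), "pos"), ((3.0,), "pos")]
clf = bootstrap(source, target, k=1, delta=0.5, phi=lambda x, y: 1.0)
print(clf((3.0,)))  # pos (a classifier trained on `target` alone says neg)
```

In this toy run the source instances at 4.0 and 3.0 are misclassified by the target-only classifier, get selected one per iteration, and shift the "pos" centroid until no misclassified source instance remains; the third iteration selects fewer than $k$ instances, so the loop stops. Swapping in a real learner only requires a function with the same signature as `train_centroid`.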

In document Conocimientos, actitudes, prácticas y el impacto económico en el manejo de dengue en las regiones de Loreto, Ucayali y Madre de Dios: análisis de la encuesta de programas estratégicos 2017 (pages 46-53)