In this section we formally define the notion of a local optimal diagnosis and motivate its appropriateness by discussing the examples depicted in Figure 4.1. We refer to the four alignments of subfigures I to IV as alignments A_I, A_II, A_III, and A_IV. Further, we use α : A → [0, 1] to refer to a function that assigns a confidence value to each correspondence in an alignment.
We start our considerations with a simple principle, which will be refined step by step later on. For A_I there are three diagnoses. Suppose now that we have α(a) = 0.9, α(b) = 0.8, α(c) = 0.7, α(d) = 0.6, and α(e) = 0.5. Knowing that we have to remove one of a, d, e and given the knowledge of α, we would choose {e} as diagnosis from the set of possible diagnoses listed above. With this choice, we solve the underlying problem by removing the correspondence with lowest confidence from the alignment. Our choice is based on the general rule of thumb ‘From each MIPS M remove the correspondence with lowest confidence’. However, this principle needs to be refined in order to be applicable to overlapping conflicts. We have to add the rider ‘... unless another correspondence in M has already been removed’.
In [MST06] we described an algorithm that is based on a slightly modified version of this principle. It was our first approach to resolving alignment incoherence based on reasoning in DDL. The algorithm randomly chooses a conflicting set of correspondences and removes the correspondence with lowest confidence. It terminates when no further conflicts can be found. However, this algorithm sometimes leads to unreasonable choices. Moreover, its result depends on the order in which conflicts are processed. An example arises from A_II and the confidence allocation described above. If we process {c, d} first, we have to remove d. With this choice we resolve the second MIPS {a, d, e} at the same time, and our final diagnosis is {d}. If we start with {a, d, e}, we first remove e. We continue with {c, d} and are forced to remove d. As a result, we have removed {d, e}, which is a non-minimal hitting set, i.e., we have not constructed a diagnosis.
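The order dependence can be reproduced with a small Python sketch. It is not the algorithm from [MST06]: the conflict sets are assumed to be given explicitly rather than detected by reasoning, the random choice is replaced by a fixed processing order, and the name greedy_repair is hypothetical.

```python
def greedy_repair(conflicts, alpha):
    """Process conflicts in the given order; from each conflict that is
    still unresolved, remove its lowest-confidence correspondence."""
    removed = set()
    for conflict in conflicts:
        if removed & conflict:
            continue  # already resolved by an earlier removal
        removed.add(min(conflict, key=alpha.get))
    return removed


alpha = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.6, "e": 0.5}
cd, ade = frozenset("cd"), frozenset("ade")  # the two MIPS of A_II

first_cd = greedy_repair([cd, ade], alpha)   # {c, d} processed first
first_ade = greedy_repair([ade, cd], alpha)  # {a, d, e} processed first
```

Under these assumptions, processing {c, d} first yields {d}, while processing {a, d, e} first yields {d, e}, the non-minimal hitting set discussed above.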
The crucial point is to model the interdependencies between candidates for removal in an appropriate way. The following recursive definition introduces the notion of an accused correspondence to solve this problem.
Definition 27 (Accused Correspondence). A correspondence c ∈ A is accused by A with respect to O₁, O₂ and α iff there exists some M ∈ MIPS(A, O₁, O₂) with c ∈ M such that for all c′ ∈ M \ {c} we have (1) α(c′) > α(c) and (2) c′ is not accused by A with respect to O₁, O₂ and α.
We have chosen the term ‘accused correspondence’ because the correspondence with lowest confidence in a MIPS alignment M is ‘accused’ of causing the problem. This charge is rebutted if one of the other correspondences in M is already accused due to the existence of another MIPS alignment.
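Definition 27 translates almost literally into a recursive check. The following Python sketch assumes that the MIPS are available as explicit sets and that α is a dictionary imposing a strict order; the function names are hypothetical. The recursion terminates because is_accused is only ever called on correspondences with strictly higher confidence.

```python
from functools import lru_cache


def accused_set(mips, alpha):
    """Return all correspondences accused in the sense of Definition 27."""

    @lru_cache(maxsize=None)
    def is_accused(c):
        for m in mips:
            if c not in m:
                continue
            # (1) every other member has higher confidence, and
            # (2) none of the other members is itself accused
            if all(alpha[o] > alpha[c] and not is_accused(o) for o in m - {c}):
                return True
        return False

    universe = set().union(*mips)
    return {c for c in universe if is_accused(c)}


alpha = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.6, "e": 0.5}
mips_a_ii = [frozenset("cd"), frozenset("ade")]  # the two MIPS of A_II
result = accused_set(mips_a_ii, alpha)
```

For A_II the sketch accuses only d: the charge against e in {a, d, e} is rebutted because d is already accused due to {c, d}.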
The notion of an accused correspondence is reminiscent of Dung's argumentation framework [Dun95], which is based on an attack relation defined on a set of arguments. Indeed, we show in Section 11.2 how to define the attack relation for analyzing incoherent alignments in such a way that the set of accused correspondences is a preferred extension in Dung's framework.
Problems emerge if we are concerned with an alignment that contains a MIPS M with c, c′ ∈ M, c ≠ c′, and α(c) = α(c′) = min_{x∈M} α(x). According to Definition 27, none of the correspondences in M is accused. For that reason, we demand in the following that α imposes a strict order on A, i.e., α(c) < α(d) ∨ α(c) > α(d) for all c, d ∈ A with c ≠ d. This requirement is not realistic for many matching systems and the confidences α that they generate. In practical applications we have to fall back on an additional criterion α′ (or even a set of criteria α′, α″, …) with α′(c) ≠ α′(d). However, in the following we neglect this aspect for the sake of simplicity and treat α as a confidence function which imposes a strict order on A.
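One possible way to fall back on additional criteria is to compare correspondences lexicographically, first by α and then by α′, α″, and so on. The text does not prescribe how the criteria are combined, so the lexicographic combination, the function name strict_rank, and the example values below are assumptions. Since Definition 27 only compares confidences, the resulting ranks can be used in place of α.

```python
def strict_rank(alpha, *tie_breakers):
    """Map each correspondence to a distinct rank (higher = more trusted),
    comparing by alpha first and by the tie-breaking criteria afterwards."""

    def key(c):
        return (alpha[c],) + tuple(t[c] for t in tie_breakers)

    keys = {c: key(c) for c in alpha}
    if len(set(keys.values())) != len(keys):
        raise ValueError("criteria do not impose a strict order")
    ordered = sorted(alpha, key=keys.get)
    return {c: rank for rank, c in enumerate(ordered)}


# Hypothetical values: c and d tie under alpha and are separated by alpha_prime.
alpha = {"c": 0.7, "d": 0.7, "e": 0.5}
alpha_prime = {"c": 0.2, "d": 0.9, "e": 0.4}
rank = strict_rank(alpha, alpha_prime)
```

Calling strict_rank(alpha) without a tie-breaker raises a ValueError here, which makes a violated strict-order requirement visible instead of silently leaving a MIPS without an accused correspondence.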
The recursive character of Definition 27 allows us to infer the following proposition. We will later propose an algorithm for computing the set of accused correspondences. Proposition 5 is crucial for the correctness of this algorithm.
Proposition 5. Let A = A′ ∪̇ A″ be a disjoint union with min_{c∈A′} α(c) > max_{c∈A″} α(c). Then a correspondence c ∈ A′ is accused by A′ iff c is accused by A.
Proof. Suppose that Proposition 5 is incorrect, and let A* = {c ∈ A′ | c is accused by A′ and not accused by A} and A† = {c ∈ A′ | c is accused by A and not accused by A′}. It follows that A* ∪ A† ≠ ∅, and in particular that there exists a correspondence c̃ = argmax_{c∈A*∪A†} α(c). In the following we show that no such c̃ exists, which proves Proposition 5 by contradiction.
First suppose that c̃ ∈ A* and c̃ ∉ A†. It follows that there exists M ∈ MIPS(A′, O₁, O₂) such that c̃ = argmin_{c∈M} α(c) and no c ∈ M \ {c̃} is accused by A′. We also know that MIPS(A′, O₁, O₂) ⊆ MIPS(A, O₁, O₂) and thus M ∈ MIPS(A, O₁, O₂). Since c̃ is not accused by A, there exists c† ∈ M \ {c̃} with α(c̃) < α(c†) which is accused by A and not accused by A′. Thus, α(c̃) < α(c†) and c† ∈ A† ⊆ A* ∪ A†, contradicting our assumption that c̃ has the highest confidence in A* ∪ A†.
Now suppose that c̃ ∉ A* and c̃ ∈ A†. Again, it follows that there exists M ∈ MIPS(A, O₁, O₂) such that c̃ = argmin_{c∈M} α(c) and no c ∈ M \ {c̃} is accused by A. We also know that M ∈ MIPS(A′, O₁, O₂), since c̃ ∈ A′ and α(c) ≥ α(c̃) for all c ∈ M, so that M ⊆ A′. Since c̃ is not accused by A′, there exists c* ∈ M \ {c̃} which is accused by A′ and not accused by A. Thus, α(c̃) < α(c*) and c* ∈ A* ⊆ A* ∪ A†, again contradicting our assumption that c̃ is the element of A* ∪ A† with highest confidence.
The following proposition states that the set of all accused correspondences forms a diagnosis, i.e., is a minimal hitting set over MIPS(A, O₁, O₂). We give an explicit proof.
Proposition 6. ∆ = {c ∈ A | c is accused by A with respect to O₁ and O₂} is a diagnosis for A with respect to O₁ and O₂.
Proof. Let ∆ be the alignment which consists of those and only those correspondences accused by A with respect to O₁ and O₂. Further, let M ∈ MIPS(A, O₁, O₂) be an arbitrarily chosen MIPS alignment and let c* = argmin_{c∈M} α(c) be the correspondence with lowest confidence in M. Due to Definition 27 we know that either c* is accused by A or there exists some c′ ∈ M with c′ ≠ c* which is accused by A. Thus, for each M ∈ MIPS(A, O₁, O₂) there exists a correspondence c ∈ M such that c ∈ ∆. We conclude that ∆ is a hitting set for MIPS(A, O₁, O₂). Now let c̃ be an arbitrarily chosen element of ∆. Due to Definition 27 there exists a MIPS M ∈ MIPS(A, O₁, O₂) in which no correspondence other than c̃ is accused, i.e., M ∩ ∆ = {c̃}. Thus, ∆ \ {c̃} is not a hitting set for MIPS(A, O₁, O₂) for any c̃ ∈ ∆, which means that ∆ is a minimal hitting set. Based on Proposition 4 we conclude that ∆ is a diagnosis.
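The two properties used in the proof, hitting every MIPS and minimality, can be checked mechanically for small examples. The Python sketch below assumes explicitly given MIPS and uses hypothetical function names. Testing the removal of single elements is sufficient for minimality, because every superset of a hitting set is again a hitting set.

```python
def is_hitting_set(delta, mips):
    """True iff delta shares at least one correspondence with every MIPS."""
    return all(delta & m for m in mips)


def is_diagnosis(delta, mips):
    """True iff delta is a minimal hitting set: it hits every MIPS and
    stops doing so as soon as any single element is dropped."""
    return is_hitting_set(delta, mips) and not any(
        is_hitting_set(delta - {c}, mips) for c in delta
    )


mips_a_ii = [frozenset("cd"), frozenset("ade")]  # the two MIPS of A_II

checks = {
    "d": is_diagnosis({"d"}, mips_a_ii),
    "de": is_diagnosis({"d", "e"}, mips_a_ii),
    "ce": is_diagnosis({"c", "e"}, mips_a_ii),
    "ac": is_diagnosis({"a", "c"}, mips_a_ii),
}
```

For A_II the check accepts {d}, {c, e}, and {a, c} and rejects {d, e}, which is a hitting set but not a minimal one.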
According to the notion of an accused correspondence, the whole collection MIPS(A, O₁, O₂) is not taken into account from a global point of view. Each removal decision is the optimal choice with respect to the concrete MIPS under discussion. Therefore, we refer to the resulting set of correspondences as the local optimal diagnosis.
[Figure 4.2: subfigures I, II, III, and IV, each showing the correspondences a, b, c, d, and e.]
Figure 4.2: Given a confidence distribution α with α(a) = 0.9, α(b) = 0.8, α(c) = 0.7, α(d) = 0.6, and α(e) = 0.5, local optimal diagnoses are marked as filled circles.
Definition 28 (Local Optimal Diagnosis). A diagnosis ∆ such that all c ∈ ∆ are accused by A with respect to O₁, O₂ and α is referred to as a local optimal diagnosis.
Let us take a look at the examples introduced in Figure 4.1. In Figure 4.2 we show the same alignments and have additionally marked the correspondences belonging to the local optimal diagnosis as filled circles. These diagnoses result from a confidence distribution α already specified at the beginning, i.e., α(a) = 0.9, α(b) = 0.8, α(c) = 0.7, α(d) = 0.6, and α(e) = 0.5. In the following we discuss these examples in detail.
Subfigure I: The local optimal diagnosis is ∆ = {e}. Based on the assumption that the confidence values impose a correct order of correspondences, it is the most reasonable choice to remove e from A.
Subfigure II: Although we have e = argmin_{x∈{a,d,e}} α(x), e is nevertheless not accused. This is because d is already accused due to {c, d} ∈ MIPS(A, O₁, O₂), whereas c is not accused. Notice that c cannot be accused, because there exists no MIPS M such that c = argmin_{x∈M} α(x). Thus, ∆ = {d} is the local optimal diagnosis. Again, there is no reason for choosing one of the other diagnoses, {c, e} or {a, c}.
Subfigure III: The local optimal diagnosis is ∆ = {c, e}. We prefer it over {b, d} because α(b) > α(c) and α(d) > α(e). Again, we find the local optimal diagnosis by first determining that b is not accused. This leads to the conclusion that c is accused. Therefore, d is not accused and finally e is accused because of the remaining MIPS {d, e} and {b, e}.
Subfigure IV: We first notice that a = argmax_{c∈A} α(c) cannot be accused. It follows that both c and d are accused, because each of them is the ‘weakest’ correspondence in a MIPS where none of the other correspondences is accused. Since {c, d} forms a diagnosis, we can conclude that ∆ = {c, d} is already a local optimal diagnosis.
Notice that we implicitly used Proposition 5 in our considerations. Look, for example, at Subfigure IV. We started our consideration with the statement that a is not accused. This insight is based on dividing A into A′ = {a} and A″ = {b, c, d, e}. Obviously a is not accused by A′, since A′ is not even incoherent. Based on Proposition 5 we are justified in concluding that a is also not accused by A = A′ ∪ A″. This train of thought is the basis for Algorithm 6, which will be introduced in Section 6.1.
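The same train of thought can be sketched as a single pass over the correspondences in order of descending confidence. This is not Algorithm 6, which is not part of this section, but a minimal illustration under the assumption that all MIPS are known in advance; the function name is hypothetical. The MIPS used for subfigures I and III are inferred from the discussion above, since Figure 4.1 is not reproduced here. Subfigure IV is omitted because its MIPS cannot be recovered from the text.

```python
def local_optimal_diagnosis(mips, alpha):
    """Decide each correspondence once, from highest to lowest confidence.
    By Proposition 5, a decision never has to be revised later."""
    accused = set()
    seen = set()  # plays the role of A': everything decided so far
    for c in sorted(alpha, key=alpha.get, reverse=True):
        seen.add(c)
        for m in mips:
            # m is contained in seen iff c has the lowest confidence in m
            if c in m and m <= seen and not (m - {c}) & accused:
                accused.add(c)
                break
    return accused


alpha = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.6, "e": 0.5}
mips_i = [frozenset("ade")]
mips_ii = [frozenset("cd"), frozenset("ade")]
mips_iii = [frozenset("bc"), frozenset("cd"), frozenset("de"), frozenset("be")]

delta_i = local_optimal_diagnosis(mips_i, alpha)
delta_ii = local_optimal_diagnosis(mips_ii, alpha)
delta_iii = local_optimal_diagnosis(mips_iii, alpha)
```

With these inputs the pass reproduces the diagnoses discussed for subfigures I to III, namely {e}, {d}, and {c, e}.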
So far, we have argued with regard to all four examples that the local optimal diagnosis is the most reasonable choice. However, the quality of the local optimal diagnosis depends on the confidence values. For the same alignments there are confidence distributions resulting in a local optimal diagnosis that is not the best choice. In response, we introduce the notion of a global optimal diagnosis. In doing so, we will revisit some of the examples and discuss different confidence distributions.