The inference task that has received the most attention in the PLP community is computing the marginal probability of a ground query atom q (the MARG task) and its generalization, the COND task, which computes the probability of a query q given evidence E = e. In the PLP literature, the MARG task is also referred to as computing the success probability of the query.

In statistical relational learning (SRL) [20] and probabilistic graphical models (PGM) [37] one of the key tasks is to find the most likely state of the world where a set of observations (the evidence) holds, also called MPE inference. ProbLog aims to bridge the gap between PLP and SRL by supporting tasks common for both fields.

Formally, the Most Probable Explanation (MPE) task is defined as follows.

Definition 5.2 (Most Probable Explanation). Given a probability distribution P(V) over a set of discrete random variables V and a truth value assignment e to a subset of random variables E ⊆ V (the evidence), the task of finding the most probable explanation (MPE) is to determine a truth value assignment u to the remaining random variables U = V \ E with maximal probability, that is,

$$\mathrm{MPE}(E) = \arg\max_{u} P(U = u \mid E = e).$$
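Because the normalization constant P(E = e) does not depend on u, the conditional probability in Definition 5.2 can be replaced by the joint probability when searching for the maximizer. A short derivation, under the assumption that P(E = e) > 0:

```latex
\begin{align*}
\mathrm{MPE}(E) &= \arg\max_{u} P(U = u \mid E = e) \\
                &= \arg\max_{u} \frac{P(U = u,\, E = e)}{P(E = e)} \\
                &= \arg\max_{u} P(U = u,\, E = e).
\end{align*}
```

This is why the MPE state can equivalently be described as the most probable complete assignment in which the evidence holds.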

A solution of the MPE task is also called an MPE state. Solution techniques developed for MPE include methods based on knowledge compilation [10], integer linear programming [82], and weighted MAX-SAT [61].

For a ProbLog program without ADs, V is the set of ground atoms (probabilistic and derived), E = e fixes the truth values for a subset of those, and the task is to find the most likely assignment to all other atoms, that is, the possible world with the highest probability in which the evidence holds. As in [16], we assume ProbLog programs with finite groundings (footnote 2).

For a set of annotated disjunctions A = {a_1, ..., a_k}, the most probable explanation boils down to finding the selection in Tr(A) with the highest probability among those in which the evidence holds. Given that a selection defines an interpretation, the MPE state is the truth value assignment given by this interpretation. Let $S_{E=e} = \{\sigma \mid I(\sigma) \models E = e\}$ be the set of selections in which the evidence holds and $\hat{\sigma} = \arg\max_{\sigma \in S_{E=e}} P(\sigma)$; then $\mathrm{MPE}_A(E) = I(\hat{\sigma})$.

Footnote 2: This is known as the finite support assumption, and can be achieved by considering the

Example 5.6. For the set of ADs in Example 5.3, the evidence blue(b1) = false makes σ3 invalid. The MPE state given the evidence is the selection among σ1, σ2 and σ4 with the highest probability. This is σ4 = {no_pick(b1)}, with probability 0.4. That is, in the MPE state no_pick(b1) is true.

The ProbLog encoding of these ADs (cf. Example 5.5) has 8 possible worlds defined by the 3 probabilistic facts of the encoding. These possible worlds and their probabilities are (p(b1) and np(b1) are short for pick(b1) and no_pick(b1), respectively):

Possible world  pf(1,1)  pf(1,2)  pf(2,1)  red(b1)  green(b1)  blue(b1)  p(b1)  np(b1)  P(ωi)
ω1              T        T        T        T        F          F         T      F       0.27
ω2              T        T        F        F        F          F         F      T       0.18
ω3              T        F        T        T        F          F         T      F       0.09
ω4              T        F        F        F        F          F         F      T       0.06
ω5              F        T        T        F        T          F         T      F       0.18
ω6              F        T        F        F        F          F         F      T       0.12
ω7              F        F        T        F        F          T         T      F       0.06
ω8              F        F        F        F        F          F         F      T       0.04

The evidence blue(b1) = false holds in all worlds except ω7. The MPE task, according to Definition 5.2, is to identify the possible world with the highest probability among those in which the evidence holds. Possible world ω1 has the highest probability (0.27). The MPE state of the ProbLog program is therefore {pf(1,1) = true, pf(1,2) = true, pf(2,1) = true}. In this MPE state no_pick(b1) is false and pick(b1) is true, which differs from the MPE state of the underlying set of ADs.

Example 5.6 illustrates that using the ProbLog encoding of annotated disjunctions can result in incorrect MPE states. The reason is that several possible worlds of the ProbLog encoding may correspond to the same selection in the probability tree: in Example 5.6, possible worlds ω2, ω4, ω6 and ω8 (the worlds in which no_pick(b1) is true) all correspond to one selection, namely σ4, and their probabilities sum to 0.4. This is not a problem when computing marginal probabilities, as the probabilities of all possible worlds where the query is true are summed in this case.
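The mismatch can be reproduced by brute-force enumeration. The following Python sketch is an illustration only: the fact probabilities (0.6, 0.75, 0.6) are read off the table of possible worlds, and the rules in `derive` are an assumed form of the encoding, chosen so that they regenerate that table (Example 5.5 itself is not repeated here). The names `PF`, `derive` and `worlds` are hypothetical.

```python
from itertools import product

# Probabilities of the three probabilistic facts, recovered from the
# world probabilities in the table (e.g. 0.6 * 0.75 * 0.6 = 0.27).
PF = {"pf(1,1)": 0.6, "pf(1,2)": 0.75, "pf(2,1)": 0.6}


def derive(pf11, pf12, pf21):
    """Assumed encoding rules: derived atoms as a function of the facts."""
    pick = pf21
    return {
        "red(b1)": pick and pf11,
        "green(b1)": pick and not pf11 and pf12,
        "blue(b1)": pick and not pf11 and not pf12,
        "pick(b1)": pick,
        "no_pick(b1)": not pick,
    }


def worlds():
    """Yield (fact values, derived atoms, probability) for all 8 worlds."""
    for values in product([True, False], repeat=3):
        prob = 1.0
        for p, value in zip(PF.values(), values):
            prob *= p if value else 1.0 - p
        yield values, derive(*values), prob


# World-level MPE: most probable single world with blue(b1) = false.
consistent = [w for w in worlds() if not w[1]["blue(b1)"]]
best_facts, best_atoms, best_prob = max(consistent, key=lambda w: w[2])

# Marginal of no_pick(b1): sum over every world in which it is true.
p_no_pick = sum(prob for _, atoms, prob in worlds() if atoms["no_pick(b1)"])

print(best_facts, best_atoms["pick(b1)"], round(best_prob, 2))
print(round(p_no_pick, 2))
```

The single most probable world has pick(b1) true with probability 0.27, while the worlds in which no_pick(b1) is true jointly carry probability 0.4, which is the probability of selection σ4.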

5.3.1 MPE as MAP

In [16], MPE is mentioned as a special case of the more general maximum a posteriori (MAP) inference. MAP is the task of finding the most likely values for a set of query atoms given partial evidence. The query atoms are a subset of the atoms of the ProbLog program for which no evidence is given. Solving the MAP task then requires marginalizing over the atoms that are neither queries nor evidence. That is, we maximize over the query atoms given the evidence atoms and sum over the rest.

MAP inference can be used to find the MPE state of a ProbLog program with ADs. For the ProbLog encoding of such a program, we can give as queries all atoms of the program except the probabilistic facts generated by the encoding. Solving the MAP task then yields the truth value assignment to these atoms, which in practice is the MPE state of the initial program. However, MAP inference is computationally expensive, whereas MPE can be computed efficiently [26, 10]. The new encoding we introduce in Section 5.4.1 allows the MPE task to be solved on ProbLog programs with ADs both correctly and efficiently by enforcing a one-to-one correspondence between selections and possible worlds.
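As an illustration of the max-sum structure of MAP, the sketch below applies it to the eight possible worlds of Example 5.6, with the derived atoms as queries and the encoding's probabilistic facts summed out. It is a naive enumeration meant only to show the idea, not an efficient inference procedure; the function name `map_state` and the data layout are hypothetical.

```python
from collections import defaultdict

T, F = True, False
DERIVED = ("red(b1)", "green(b1)", "blue(b1)", "pick(b1)", "no_pick(b1)")

# Rows of the possible-world table: values of the three probabilistic
# facts, values of the five derived atoms, and the world probability.
WORLDS = [
    ((T, T, T), (T, F, F, T, F), 0.27),
    ((T, T, F), (F, F, F, F, T), 0.18),
    ((T, F, T), (T, F, F, T, F), 0.09),
    ((T, F, F), (F, F, F, F, T), 0.06),
    ((F, T, T), (F, T, F, T, F), 0.18),
    ((F, T, F), (F, F, F, F, T), 0.12),
    ((F, F, T), (F, F, T, T, F), 0.06),
    ((F, F, F), (F, F, F, F, T), 0.04),
]


def map_state(worlds, evidence):
    """Maximize over derived atoms, summing out the probabilistic facts.

    Returns the best assignment and its joint (unnormalized) probability.
    """
    totals = defaultdict(float)
    for _facts, derived, prob in worlds:
        state = dict(zip(DERIVED, derived))
        if all(state[atom] == value for atom, value in evidence.items()):
            # Summation step: worlds that agree on the derived atoms
            # are merged, whatever the values of the facts.
            totals[derived] += prob
    # Maximization step over the remaining assignments.
    best = max(totals, key=totals.get)
    return dict(zip(DERIVED, best)), totals[best]


state, prob = map_state(WORLDS, {"blue(b1)": False})
print(state, round(prob, 2))
```

With evidence blue(b1) = false, the assignment in which no_pick(b1) is true collects probability 0.4, ahead of 0.36 for the assignment with red(b1) and pick(b1) true, which agrees with the MPE state of the underlying ADs.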

5.3.2 Relation to Most Probable Proof

In PLP, the term most probable explanation is typically used interchangeably with most probable proof, also called Viterbi proof [58, 67, 32, 3]. A proof (or explanation) ω′ for a query is a partial truth value assignment (or partial possible world) such that the query holds in all full assignments extending the proof. Finding a most probable proof (the VIT task) differs from MPE in that it does not aim at finding the state of all unobserved variables, but an assignment to a small set of variables sufficient to explain a query. More formally, given a query q, we have $\mathrm{VIT}(q) = \arg\max_{\omega' \in E(q)} P(\omega')$, with E(q) the set of all explanations or proofs of q.

For certain types of models, finding the Viterbi proof for query q corresponds to solving MPE with q = true as evidence. This is true for instance in programs modeling Hidden Markov Models, where both VIT and MPE have to make one choice per time point, but does not hold in general:

Example 5.7. In the program below, the query win has two proofs: the first uses facts red and green and has probability 0.4 · 0.9 = 0.36, the second uses facts blue and yellow and has probability 0.5 · 0.6 = 0.3. Thus, the Viterbi proof for win is the first one. The MPE state for evidence win = true, however, is {¬red, green, blue, yellow}, as can be easily verified, and this state does not extend the Viterbi proof.

0.4::red.
0.9::green.
win :- red, green.

0.5::blue.
0.6::yellow.
win :- blue, yellow.
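The claim of Example 5.7 can be checked by enumerating the 16 possible worlds of the program. The Python sketch below is a naive verification, not how ProbLog performs inference; the names `FACTS`, `PROOFS`, `proof_prob`, `world_prob` and `win` are hypothetical helpers introduced for this check.

```python
from itertools import product

# Probabilistic facts of the program in Example 5.7.
FACTS = {"red": 0.4, "green": 0.9, "blue": 0.5, "yellow": 0.6}
# The two proofs of win, each given as the set of facts it uses.
PROOFS = [("red", "green"), ("blue", "yellow")]


def proof_prob(proof):
    """Probability of a proof: product of the facts it requires to be true."""
    prob = 1.0
    for fact in proof:
        prob *= FACTS[fact]
    return prob


def world_prob(world):
    """Probability of a full truth value assignment to all facts."""
    prob = 1.0
    for fact, value in world.items():
        prob *= FACTS[fact] if value else 1.0 - FACTS[fact]
    return prob


def win(world):
    """win holds if all facts of at least one proof are true."""
    return any(all(world[fact] for fact in proof) for proof in PROOFS)


# VIT task: most probable proof of win.
viterbi = max(PROOFS, key=proof_prob)

# MPE task: most probable full world in which win = true.
worlds = [dict(zip(FACTS, values))
          for values in product([True, False], repeat=len(FACTS))]
mpe = max((w for w in worlds if win(w)), key=world_prob)

print(viterbi, round(proof_prob(viterbi), 3))
print(mpe, round(world_prob(mpe), 3))
```

The Viterbi proof is (red, green) with probability 0.36, whereas the MPE state sets red to false and the other three facts to true, with probability 0.6 · 0.9 · 0.5 · 0.6 = 0.162. The best world that extends the Viterbi proof has probability 0.108.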

5.3.3 Relation to Most Probable Explanation for Bayesian Networks

To encode a Bayesian network (BN) using annotated disjunctions, each row of each conditional probability table (CPT) is encoded as a single AD that captures the value assignment and its probability. There is exactly one CPT per variable, and the rows of a CPT correspond to mutually exclusive value assignments to a set of variables (the parents of the node). The parents of a node form the body of the AD.

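Example 5.8. Consider the well-known burglary-earthquake-alarm Bayesian network:

[Figure: Bayesian network with nodes burglary, earthquake, alarm, calls(john) and calls(mary). As the CPTs below indicate, burglary and earthquake are the parents of alarm, and alarm is the parent of calls(john) and calls(mary).]

P(burglary)
  T      F
  0.3    0.7

P(earthquake)
  T      F
  0.2    0.8

P(alarm | burglary, earthquake)
  burglary  earthquake  alarm = T  alarm = F
  T         T           0.9        0.1
  T         F           0.8        0.2
  F         T           0.1        0.9
  F         F           0.001      0.999

P(calls(john) | alarm)
  alarm  calls(john) = T  calls(john) = F
  T      0.8              0.2
  F      0.1              0.9

P(calls(mary) | alarm)
  alarm  calls(mary) = T  calls(mary) = F
  T      0.8              0.2
  F      0.1              0.9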

Encoding the network as a ProbLog program with annotated disjunctions uses probabilistic facts for the burglary and earthquake nodes: probabilistic facts encode random events with a binary outcome and are therefore suitable for representing these two nodes. The other nodes are encoded with ADs. It is sufficient to use ADs with only one head atom (see Section 5.1.2). The body of an AD corresponds to the parent nodes with their value assignments.

person(john).
person(mary).

0.3::burglary.
0.2::earthquake.

0.9::alarm <- burglary, earthquake.
0.8::alarm <- burglary, \+earthquake.
0.1::alarm <- \+burglary, earthquake.
0.001::alarm <- \+burglary, \+earthquake.

0.8::calls(X) <- alarm, person(X).
0.1::calls(X) <- \+alarm, person(X).
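To make the MPE task on this network concrete, the Python sketch below enumerates all 32 joint assignments using the CPT values of Example 5.8 and returns the most probable one consistent with the evidence. The evidence used in the usage lines (both John and Mary call) is a hypothetical choice for illustration, and `joint` and `mpe` are hypothetical helper names; the code evaluates the Bayesian network directly rather than running ProbLog.

```python
from itertools import product

# CPT entries from Example 5.8 (probability that the variable is true).
P_BURGLARY = 0.3
P_EARTHQUAKE = 0.2
P_ALARM = {(True, True): 0.9, (True, False): 0.8,
           (False, True): 0.1, (False, False): 0.001}
P_CALLS = {True: 0.8, False: 0.1}  # keyed by the value of alarm

NAMES = ("burglary", "earthquake", "alarm", "calls(john)", "calls(mary)")


def bern(p_true, value):
    """Probability of a binary outcome given P(value = true)."""
    return p_true if value else 1.0 - p_true


def joint(b, e, a, j, m):
    """Joint probability as the product of the CPT entries."""
    return (bern(P_BURGLARY, b) * bern(P_EARTHQUAKE, e)
            * bern(P_ALARM[(b, e)], a)
            * bern(P_CALLS[a], j) * bern(P_CALLS[a], m))


def mpe(evidence):
    """Most probable full assignment in which the evidence holds."""
    best_state, best_prob = None, -1.0
    for values in product([True, False], repeat=len(NAMES)):
        state = dict(zip(NAMES, values))
        if any(state[name] != value for name, value in evidence.items()):
            continue
        prob = joint(*values)
        if prob > best_prob:
            best_state, best_prob = state, prob
    return best_state, best_prob


# Hypothetical evidence: both John and Mary call.
state, prob = mpe({"calls(john)": True, "calls(mary)": True})
print(state, round(prob, 5))
```

For this evidence the MPE state has burglary and alarm true and earthquake false, with joint probability 0.3 · 0.8 · 0.8 · 0.8 · 0.8 = 0.12288.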

It has been shown that this encoding with ADs expresses the same probability distribution as the original Bayesian network [49]. We can show that, given Definition 5.2, the MPE state of the Bayesian network is equivalent to the MPE state of a set of ADs that encode the Bayesian network.

The probability tree induced by such a set of ADs has the property that each leaf has a unique interpretation. Therefore, the interpretation with the highest probability and the selection with the highest probability are equivalent. An intuitive proof can be constructed as follows. For any given CPT for a variable a, at each point in constructing the probability tree, if all parent variables of that CPT are present in the current partial interpretation, exactly one rule with a as head has a condition that is true. For each value of a, a subtree is instantiated. None of the other rules can have a true condition in any of the subtrees, due to mutual exclusiveness. This guarantees that each subtree assigns a different value to a, so no two interpretations in different subtrees can be identical. When each leaf represents a unique interpretation, each selection maps to a unique interpretation with equal probability.
