

In many cases, a naive implementation of standard LP is slow. Graph problems with several million nodes and thousands of labels call for a very efficient inference approach, so besides efficient parallelized implementations, possibly based on sparse matrices, various other modifications have been introduced to reduce the run time of LP. Lifted Label Propagation (LLP), which will be described in Section 5.3, is akin to lifted probabilistic inference (see Section 2.4). While LP is based on matrix-matrix multiplications, we have already described above that LP can also be implemented via GaBP, and that lifted GaBP can in turn be used to exploit symmetries in LP [166]. However, this does not allow the use of an out-of-the-box GaBP implementation, because changes to the GaBP algorithm are required to account for the lifted model [3]. This is very similar to the modifications that we have seen in the previous chapters for LBP and other lifted message passing algorithms: counts are integrated into the messages to reflect the structure of the lifted graph. Moreover, such a lifted LP approach is based on matrix inversion and requires several re-liftings, which is impractical for graphs at massive scale.
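For reference, the matrix-multiplication view of LP mentioned above can be sketched in a few lines. This is only a minimal illustration of one common formulation (row-normalized transition matrix, labeled nodes clamped after every step) and not the implementation used in this work. The function name `label_propagation`, the dense list-of-lists representation, and the fixed iteration count are assumptions made for the example.

```python
def label_propagation(W, Y0, labeled, iterations=100):
    """Iterate Y <- T Y with T the row-normalized W, clamping labeled rows.

    W:       n x n list of lists with non-negative edge weights
             (assumption: every node has at least one edge).
    Y0:      n x c list of lists with the initial label scores.
    labeled: indices of the nodes whose labels are known and stay fixed.
    """
    n, c = len(W), len(Y0[0])
    # Row-normalize the weights so that each row of T sums to one.
    T = [[w / sum(row) for w in row] for row in W]
    Y = [list(row) for row in Y0]
    for _ in range(iterations):
        # One propagation step: a plain matrix-matrix multiplication.
        Y = [[sum(T[i][j] * Y[j][k] for j in range(n)) for k in range(c)]
             for i in range(n)]
        # Clamp the labeled nodes back to their known labels.
        for i in labeled:
            Y[i] = list(Y0[i])
    return Y


# Path graph 0 - 1 - 2 - 3; node 0 carries label A, node 3 carries label B.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
Y0 = [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 1.0]]
scores = label_propagation(W, Y0, labeled=[0, 3])
```

In this toy example the unlabeled node next to the A-labeled end receives a higher score for A than for B, and vice versa for its neighbor. A dense triple loop like this is exactly the naive variant that becomes too slow at scale.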

However, since the problem is specified as a weighted graph and the inference algorithm in terms of matrix multiplications, we now take a look at lifting from the graph-algebraic point of view. For this purpose, it is useful to define the notion of symmetries in graphs. A graph is symmetric if we can find a (non-trivial) isomorphism from the graph to itself. Such an isomorphism is called an automorphism. An automorphism of a graph induces a partition of the nodes and, intuitively, the nodes in the same class are indistinguishable. An automorphism of a graph with n nodes can be represented by an n × n matrix that commutes with the matrix specifying the graph. Unfortunately, no polynomial-time algorithm is known that decides whether two graphs are isomorphic [71]. A common intermediate step in determining graph isomorphisms is the calculation of the coarsest equitable partition (CEP), and it has been shown by Mladenov et al. [151] that the partition found by the Color Passing algorithm is identical to the CEP. The CEP also groups the nodes of the graph into classes and can equally be represented as a matrix. However, the CEP can be coarser than the partition induced by an automorphism [88].
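To make the notion of an equitable partition concrete, the following sketch refines node classes until every node in a class has the same total edge weight into every other class. It is a plain color-refinement loop in the spirit of Color Passing, written for illustration only. It is not how saucy works internally, and the function name and the dictionary-based graph format are assumptions of this example. Integer weights are used so that class signatures compare exactly.

```python
from collections import defaultdict


def coarsest_equitable_partition(adj):
    """Return {node: class_id} for a graph given as {node: {neighbor: weight}}."""
    nodes = sorted(adj)
    colors = {v: 0 for v in nodes}  # start with all nodes in a single class
    while True:
        new_ids = {}
        new_colors = {}
        for v in nodes:
            # Total edge weight from v into each current class.
            weight_into = defaultdict(int)
            for u, w in adj[v].items():
                weight_into[colors[u]] += w
            # Keeping the old color in the signature means classes only split.
            signature = (colors[v], tuple(sorted(weight_into.items())))
            new_colors[v] = new_ids.setdefault(signature, len(new_ids))
        # No class was split in this round: the partition is equitable.
        if len(new_ids) == len(set(colors.values())):
            return new_colors
        colors = new_colors


# Path graph 0 - 1 - 2 - 3: the two end nodes are indistinguishable,
# and so are the two inner nodes.
path = {0: {1: 1}, 1: {0: 1, 2: 1}, 2: {1: 1, 3: 1}, 3: {2: 1}}
path_classes = coarsest_equitable_partition(path)

# A cycle is regular, so refinement never splits the initial class.
cycle = {i: {(i - 1) % 6: 1, (i + 1) % 6: 1} for i in range(6)}
cycle_classes = coarsest_equitable_partition(cycle)
```

Any regular graph collapses into a single class under this procedure, whether or not all of its nodes are equivalent under an automorphism. This is one way to see why the CEP can be coarser than a partition induced by automorphisms.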

In our lifted LP approach, we will directly lift the LP graph by calculating its CEP, and we prove in the next chapter that we can then run vanilla LP on the compressed graph without any modifications to the LP algorithm itself. More specifically, we will use saucy2 [108] to find the CEP of the graph, which partitions the nodes into equivalence classes. This is similar to the Color Passing approach that was run on the factor graph to lift BP. However, saucy does not follow a color passing approach. Instead, it suggests looking at lifting in terms of algebraic operations and lends itself to representing the partition as a matrix. saucy is not only extremely fast at computing the CEP, but the resulting fractional automorphism also allows the lifted matrix to be constructed very efficiently. The advantages of using saucy in lifted inference algorithms have also been recognized in other approaches such as [170]. For further information on equitable partitions of graphs and matrices, we refer the reader to [152] and the references therein.
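The following sketch illustrates why an equitable partition permits working on a compressed matrix. If the label scores are constant within each class, one multiplication with the small quotient matrix gives the same result as one multiplication with the full matrix. The helper names (`matmul`, `quotient_matrix`, `expand`) and the toy graph are assumptions of this example. Normalization and clamping of labeled nodes are deliberately left out, so this is not the full LLP algorithm described in the next chapter.

```python
def matmul(A, B):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def quotient_matrix(W, classes):
    """Q[p][q] = total weight from any node of class p into class q.

    Assumes `classes` is an equitable partition of W, so the value does not
    depend on which representative of class p is chosen.
    """
    k = max(classes) + 1
    representative = {}
    for i, c in enumerate(classes):
        representative.setdefault(c, i)
    return [[sum(W[representative[p]][j]
                 for j, c in enumerate(classes) if c == q)
             for q in range(k)]
            for p in range(k)]


def expand(Y_lifted, classes):
    """Copy each class row back to all ground nodes of that class."""
    return [list(Y_lifted[c]) for c in classes]


# Path graph 0 - 1 - 2 - 3 with classes {0, 3} and {1, 2}.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
classes = [0, 1, 1, 0]
Q = quotient_matrix(W, classes)       # 2 x 2 instead of 4 x 4
Y_lifted = [[1, 0], [0, 1]]           # one score row per class

lifted_step = expand(matmul(Q, Y_lifted), classes)
ground_step = matmul(W, expand(Y_lifted, classes))
```

Here `lifted_step` and `ground_step` coincide, although the lifted multiplication touches a matrix with only one row and one column per class.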

Besides lifting, which is presented in the next chapter, other approaches to speeding up LP also reduce the computational cost by avoiding computations. A recent approach was presented by Fujiwara and Irie [60], who reduce the run time by updating only the scores of a subset of labels in each iteration. Also similar in spirit to LLP, and to the general idea of lifted inference, are the ideas of Alexandrescu and Kirchhoff [6]. They proposed merging identically labeled nodes to speed up LP, whereas LLP intuitively clusters the entire graph. There are many more efficient LP approaches; see, e.g., the references in [60] for an overview. However, any of these approaches can be used on top of our compressed graph.

The next chapter will introduce our relational lifted LP in greater detail and describe how this LP approach can be used to augment online bibliographies with geo-tags. We will motivate the need for our LLP approach by the existence of web-scale bibliographies where standard LP runs out of memory and any decrease in run time is most welcome.

2

Lifted Label Propagation

The previous chapter has described how Label Propagation (LP) can be used as an alternative to message passing algorithms for labeling problems. We have described how LP can be implemented by means of a simple matrix-matrix multiplication, and we have also motivated looking at lifting from a perspective that is independent of a message passing algorithm such as Color Passing. Accordingly, the papers most related to this chapter are:

• Zhu and Ghahramani [260], Zhu et al. [261]: the Label Propagation algorithm, its matrix-multiplication-based implementation, and entropy minimization for parameter learning.

• Katebi et al. [108]: the graph-theory-based symmetry detection implemented in saucy, which will be a core part of lifted LP.

In this chapter, we will enhance LP with techniques that have already been shown to be beneficial in Chapter 3 but are “non-standard” approaches in the LP literature. The contributions of this chapter can be summarized as follows:

C5.1 We use logical rules to construct a sparse similarity matrix for LP and we show how this approach can be used to augment online bibliographies with high-accuracy geo-tags.

C5.2 Standard LP does not exploit symmetries in the similarity matrix. Hence, we use lifting techniques to compress the similarity matrix, obtaining a faster LP variant. This Lifted Label Propagation (LLP) requires less memory and still achieves the same labeling scores as ground LP.

C5.3 We show that the idea of bootstrapped evidence from the previous Section 3.3 can also be used to speed up convergence of LP.

These contributions have been submitted mainly in [86] and we proceed as follows. We will start by introducing our logical-rule-based LP, which is used to label author-paper-pairs with geo-tags in online bibliographies. Afterwards, we will show that LP’s run time can also suffer from unexploited symmetries in the underlying graph, as we have seen in Chapter 3 for message passing algorithms. We will then describe how bootstrapping and Pseudo Evidence from Section 3.3 can be used for LP and briefly touch upon parameter learning for LP with a similarity function based on logical rules. We summarize this chapter before we continue with an in-depth analysis of our geo-annotated bibliographies.

5.1 Logical Rules for Label Propagation

As described above, the originally proposed similarity function leads to a completely connected graph that is impractical for very large problem instances. We therefore propose an alternative approach to constructing the edges and their weights. Intuitively, the weight of an edge should still be proportional to the similarity between the nodes; however, we will now define the similarity of two nodes based on formulas defined over logical predicates. Two nodes are connected via an edge only if at least one relation holds between them, and hence $W_{i,j} > 0$. Similarly to the definition of MLNs (see Section 2.3.1), we define a set of tuples $(F_a, w_a)$. Here, $F_a$ is a formula in first-order logic and $w_a \in \mathbb{R}$ is the weight of that rule. Along with these rules goes a set of constants $C = \{C_1, \dots, C_n\}$. Each constant $C_i$ corresponds to a variable $V_i \in V$. For now, we only use rules that are defined over at most two nodes, which gives us the following weight for each edge:

$$W_{i,j} = \exp\left(\sum_a w_a \cdot f_a(C_i, C_j)\right) = \exp\left(\sum_a w_a \cdot f_a(i, j)\right). \tag{5.1}$$

Here, the feature function $f_a$ corresponds to a grounding of formula $F_a$, with $f_a$ evaluating to 1 if the ground formula $F_a$ is true, and to 0 otherwise. We now construct the LP-graph based on the tuples $(F_a, w_a)$ and the constants in $C$. This graph construction differs from the Markov network construction in MLNs in various aspects:

• MRFs based on a grounded MLN contain one binary node for each grounding of each predicate. For the relational LP, the nodes can have any finite range and each node corresponds to a domain object.

• The edges in a grounded MLN are based on the variables appearing together in formulas. In the LP-graph, edges correspond to one or several satisfied formulas over the nodes involved.
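The edge weights of Equation (5.1) can be computed with a short sketch such as the one below. Everything specific in it is a hypothetical illustration and is not taken from this work: the two rules (`same_author`, `same_paper`), their weights, the toy author-paper pairs, and the function name `build_lp_graph`. An edge is stored only if at least one rule fires, which keeps the resulting matrix sparse.

```python
import math


def build_lp_graph(constants, rules):
    """Return sparse weights {(i, j): W_ij} following Equation (5.1).

    constants: list of domain objects, one per node.
    rules:     list of (feature_function, weight) pairs, where
               feature_function(c_i, c_j) returns True if the ground
               formula holds for the two constants.
    """
    W = {}
    for i, c_i in enumerate(constants):
        for j, c_j in enumerate(constants):
            if i == j:
                continue
            # Weights of all rules whose grounding is true for (c_i, c_j).
            fired = [w for f, w in rules if f(c_i, c_j)]
            if fired:  # no satisfied formula means no edge
                W[(i, j)] = math.exp(sum(fired))
    return W


# Hypothetical rules over (author, paper) pairs.
def same_author(a, b):
    return a[0] == b[0]


def same_paper(a, b):
    return a[1] == b[1]


rules = [(same_author, 1.0), (same_paper, 0.5)]
pairs = [("alice", "p1"), ("alice", "p2"), ("bob", "p2"), ("carol", "p3")]
weights = build_lp_graph(pairs, rules)
```

In this toy example only the pairs sharing an author or a paper are connected, and the last pair stays isolated. The quadratic loop over all pairs is only acceptable for small inputs; a larger instance would presumably enumerate candidate pairs per rule instead, for example by grouping the constants by author or by paper.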