
This appendix contains the derivation of the dual characterization of the W-step referred to in Section 6.0.2. Problems W and H are almost entirely symmetric in our setup, so we will switch to a neutral notation and later apply the result to both problems. Consider the NMF problem as the following approximation:

$$BC \approx D$$

Symmetry will allow us to focus solely on Problem ‘B’, which will be solved using the objective function

$$\Delta S_{q'}(BC, D) + \lambda\,\Delta S_q(B, B_0).$$

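The theory of Chapter 3 was developed for vectors, not matrices, so we rewrite the preceding line, indicating for convenience how the terms map to the original notation:

$$\Delta S_{q'}\Big(\underbrace{(C \otimes I_N)}_{\text{`}A\text{'}}\ \underbrace{\operatorname{vec} B}_{\text{`}p\text{'}},\ \underbrace{\operatorname{vec} D}_{\text{`}b\text{'}}\Big) + \lambda\,\Delta S_q\Big(\operatorname{vec} B,\ \underbrace{\operatorname{vec} B_0}_{\text{`}p_0\text{'}}\Big)$$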

This identification makes the dual objective function (with argument $\operatorname{vec} D^*$) straightforward to calculate by applying, in succession, Fenchel duality, the conjugation formula for Bregman divergence, and the conjugate calculus rules that handle $\lambda$.

The dual objective is:

$$\lambda S_q^*\big(\lambda^{-1}(C^T \otimes I_N)\operatorname{vec} D^* + B_0^*\big) + S_{q'}^*\big(\operatorname{vec} D^* - D_0^*\big),$$

where $B_0^*$ is the image of the initial guess $B_0$ under the gradient of $S_q$, and where $D_0^*$ is the image of the data matrix $D$ under the gradient of $S_{q'}$. Given an optimal solution for this problem, $\bar{D}^*$, we can use the recovery formula to obtain:

$$\bar{B} = \nabla S_q^*\big((C^T \otimes I_N)\operatorname{vec}\bar{D}^* + B_0^*\big),$$

which is perhaps easier to parse without the vec operations:

$$\bar{B} = {s_q^*}'\big[\bar{D}^* C + B_0^*\big].$$

For Problem W, substitution of W for B, H for C and V for D gives the result in the text.

As an aside, the symmetry of Problem H is nearly complete if we consider the approximation problem $H^T W^T \approx V^T$. By substituting $H^T$ for $B$, etc., the same argument reveals the solution to the H-step as another q-exponential, namely:

$$H = \exp_{q_H}\big[W^T \bar{V}^* + H_0^*\big].$$
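As a rough illustration of the H-step formula above, the following sketch evaluates the q-exponential entry by entry on small nested lists. It rests on assumptions not stated in this appendix: the standard Tsallis convention exp_q(x) = [1 + (1 - q)x]_+^{1/(1-q)}, with ln_q as its inverse, and a gradient of S_q that acts elementwise as ln_q, so that H_0^* = ln_q(H_0). The helper names `exp_q`, `ln_q` and `h_step` and the toy matrices are hypothetical.

```python
import math

def exp_q(x, q):
    """q-exponential, assumed Tsallis convention: [1 + (1-q)x]_+^(1/(1-q))."""
    if abs(q - 1.0) < 1e-12:
        return math.exp(x)
    base = 1.0 + (1.0 - q) * x
    if base <= 0.0:
        # Outside the domain: cut off at 0 for q < 1, diverge for q > 1.
        return 0.0 if q < 1.0 else math.inf
    return base ** (1.0 / (1.0 - q))

def ln_q(x, q):
    """q-logarithm, the inverse of exp_q for x > 0."""
    if abs(q - 1.0) < 1e-12:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def h_step(W, V_star, H0, q):
    """Evaluate exp_q[W^T V* + ln_q(H0)] entry by entry.

    W is N x K, V_star is N x M, H0 is K x M (lists of rows).
    Assumption: H0* = ln_q(H0), applied elementwise.
    """
    n, k, m = len(W), len(W[0]), len(V_star[0])
    H = []
    for a in range(k):
        row = []
        for j in range(m):
            # Entry (a, j) of W^T V*.
            wtv = sum(W[i][a] * V_star[i][j] for i in range(n))
            row.append(exp_q(wtv + ln_q(H0[a][j], q), q))
        H.append(row)
    return H

W = [[1.0, 0.5], [0.2, 1.0], [0.3, 0.4]]            # N = 3, K = 2
H0 = [[1.0, 2.0, 0.5, 1.5], [0.7, 1.2, 2.5, 1.0]]   # K = 2, M = 4
zero_dual = [[0.0] * 4 for _ in range(3)]           # N = 3, M = 4
H_same = h_step(W, zero_dual, H0, q=1.5)
```

Under these assumptions a zero dual variable returns the initial guess H_0, and the result always has the K x M shape of H even though the dual variable has the N x M shape of V.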

Finally, the dual problem has the choice variable V∗, which has the same shape as V and is therefore much larger than W or H. This makes the dual a poor choice of problem to solve, in comparison to, say, the classic maxent problem.

Induced Semantics for Undirected Graphs

7.1 Another Look at the Hammersley-Clifford Theorem

In this chapter the aim is to utilize the entropy functions introduced earlier in the framework of graphical models. The Hammersley-Clifford (H-C) theorem relates the factorization properties of a probability distribution to the clique structure of an undirected graph. If a density factorizes according to the clique structure of an undirected graph, the theorem guarantees that the distribution satisfies the Markov property, and vice versa. In graphical models, connections to the exponential family are strong, mainly because of the factorization property of the exponential function. Do we have to forgo graphical models when we step away from SBG entropy? Not entirely. Here we generalize the H-C theorem to different notions of decomposability and the corresponding generalized-Markov property. Finally, we discuss how our technique might be used to arrive at other generalizations of the H-C theorem, inducing a graph semantics adapted to the modeling problem. This represents a first step in incorporating generalized entropies in the setting of graphical models.

Statistical distributions of the q-exponential form can be motivated by a generalization of the maximum entropy principle based on Tsallis entropy. This is one possible generalization of Shannon-Boltzmann-Gibbs (SBG) entropy. An important property of Tsallis entropy is its capability of generating distributions with power-law behaviour and distributions with finite support. One of these is the q-Gaussian distribution. Its density, assuming it is centered around the origin, is

$$f(x) = \exp_q\big(-\beta x^2 - \alpha_q(\beta)\big),$$

where exp_q is the q-exponential, defined later in the section on q-analogues. For now, we just note that the usual Gaussian is recovered when q → 1. See Naudts (2004b); Vignat and Plastino (2006); Sears and Vishwanathan (2007); Gell-Mann and Tsallis (2004) for more discussion and references along these lines. Here, our main concern is to judge the compatibility of distributions of the form exp_q(·) with the tools and techniques of graphical models. We are motivated by the possibility of using joint distributions of q-exponential form to model situations where a collection of random variables are correlated to a greater (or lesser) degree under dramatic circumstances than they are in an everyday environment.
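The behaviour of the q-Gaussian shape described above can be checked numerically. The sketch below assumes the standard Tsallis convention exp_q(x) = [1 + (1 - q)x]_+^{1/(1-q)}, which may differ in detail from the definition given later in the text. It also drops the normalizer α_q(β), so the function is an unnormalized shape and not the density itself. The names `exp_q` and `q_gaussian_unnormalized` are hypothetical.

```python
import math

def exp_q(x, q):
    """q-exponential, assumed Tsallis convention: [1 + (1-q)x]_+^(1/(1-q))."""
    if abs(q - 1.0) < 1e-12:
        return math.exp(x)
    base = 1.0 + (1.0 - q) * x
    if base <= 0.0:
        # Outside the domain: cut off at 0 for q < 1, diverge for q > 1.
        return 0.0 if q < 1.0 else math.inf
    return base ** (1.0 / (1.0 - q))

def q_gaussian_unnormalized(x, beta, q):
    """exp_q(-beta * x^2): the q-Gaussian shape without alpha_q(beta)."""
    return exp_q(-beta * x * x, q)

# Compare the three regimes on a few sample points.
for q in (0.5, 1.0, 2.0):
    values = [q_gaussian_unnormalized(x, 1.0, q) for x in (0.0, 1.0, 2.0, 3.0)]
    print(q, [round(v, 4) for v in values])
```

Under this convention, q < 1 gives a shape that vanishes outside a bounded interval (finite support), q = 1 gives the usual Gaussian shape, and q > 1 gives a power-law tail; at q = 2 and β = 1 the shape reduces to 1/(1 + x²).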

If we have N independent, zero-mean, Gaussian random variables (therefore q = 1), the joint density is

$$\prod_{i=1}^{N} \exp(-\beta_i x_i^2 - \alpha_i) = \exp\Big(-\sum_{i=1}^{N} \beta_i x_i^2 - \sum_{i=1}^{N} \alpha_i\Big).$$

To have a corresponding formula for the q-exponentials, an operation called the q-product ($\otimes_q$), which satisfies

$$\exp_q(x_1) \otimes_q \exp_q(x_2) \otimes_q \cdots \otimes_q \exp_q(x_N) = \exp_q\Big(\sum_{i=1}^{N} x_i\Big),$$

must be introduced. The q-product has been studied elsewhere (Suyari et al., 2005; Umarov et al., 2006; Borges, 2004). It leads to a definition of q-independence that says that $X_1$ and $X_2$ are q-independent if their joint density, $f$, q-factorizes, i.e. if for some $g$ and $h$

$$f(x_1, x_2) = g(x_1) \otimes_q h(x_2).$$

Note that when q ≠ 1, g and h are not the marginals of f. Even so, the q-product can be used to create joint distributions from univariate distributions. Recently, a central limit theorem for q-independent variables was proved (Umarov et al., 2006). Bayesian updating of such distributions is a different matter.
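The defining property of the q-product can be verified numerically. This minimal sketch assumes the commonly used form a ⊗_q b = [a^{1-q} + b^{1-q} - 1]_+^{1/(1-q)} for positive a and b, together with the standard Tsallis q-exponential; the text does not spell out either definition at this point. The function names and the handling of out-of-domain values are choices made here for illustration only.

```python
import math
from functools import reduce

def exp_q(x, q):
    """q-exponential, assumed Tsallis convention: [1 + (1-q)x]_+^(1/(1-q))."""
    if abs(q - 1.0) < 1e-12:
        return math.exp(x)
    base = 1.0 + (1.0 - q) * x
    if base <= 0.0:
        # Outside the domain: cut off at 0 for q < 1, diverge for q > 1.
        return 0.0 if q < 1.0 else math.inf
    return base ** (1.0 / (1.0 - q))

def q_product(a, b, q):
    """q-product of positive finite numbers (assumed convention).

    [a^(1-q) + b^(1-q) - 1]_+^(1/(1-q)); the ordinary product at q = 1.
    """
    if abs(q - 1.0) < 1e-12:
        return a * b
    s = a ** (1.0 - q) + b ** (1.0 - q) - 1.0
    if s <= 0.0:
        return 0.0 if q < 1.0 else math.inf
    return s ** (1.0 / (1.0 - q))

q = 1.5
xs = [-0.2, -0.3, -0.1]
factors = [exp_q(x, q) for x in xs]

lhs = reduce(lambda a, b: q_product(a, b, q), factors)  # q-product of the factors
rhs = exp_q(sum(xs), q)                                 # q-exponential of the sum
plain = math.prod(factors)                              # ordinary product, for contrast
```

Here `lhs` and `rhs` agree up to rounding, while `plain` differs from both, which is why the ordinary product cannot play the role of factorization when q ≠ 1.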

Introducing the idea of q-independence to the setting of graphical models induces a different semantics for the edges of a graph. It is possible to demonstrate this by formulating a q-Markov condition and proving a version of the Hammersley-Clifford theorem that says that a distribution is q-Markov with respect to the graph in question if and only if it q-factorizes over the maximal cliques of that graph. We will conclude with a brief discussion of how to use this type of graphical model to perform typical tasks involving graphical models, focusing on a corresponding extension of the Viterbi algorithm.
