
To learn structure using classification error, we must adopt a strategy of searching through the space of all structures efficiently while avoiding local maxima. In this section, we propose a method that can effectively search for better structures with an explicit focus on classification. We essentially need a search strategy that can efficiently explore the space of structures. Because there is no simple closed-form expression relating structure to classification error, it would be difficult to design a gradient descent algorithm or a similar iterative method. Even if we could, a gradient search algorithm would be likely to find a local minimum because of the size of the search space.

First, we define a measure over the space of structures that we want to maximize:

Definition 7.1 The inverse error measure for structure $S$ is

$$\mathrm{inv}_e(S) = \frac{1 / p_S(\hat{c}(X) \neq C)}{\sum_{S'} 1 / p_{S'}(\hat{c}(X) \neq C)}, \tag{7.13}$$

where the summation is over the space of possible structures and $p_S(\hat{c}(X) \neq C)$ is the probability of error of the best classifier learned with structure $S$.
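As a minimal sketch, assuming the probability of error of every candidate structure were already known (it never is in practice), Eq. (7.13) could be computed by brute force over a handful of structures. The structure labels and error values below are hypothetical. The example shows that the ratio $\mathrm{inv}_e(S_1)/\mathrm{inv}_e(S_2)$ depends only on the two error probabilities and not on the normalizing sum, which is what allows a Metropolis-Hastings sampler to avoid enumerating all structures.

```python
def inverse_error_measure(errors):
    """Eq. (7.13) over an explicitly enumerated set of structures.

    `errors` maps a (hypothetical) structure label to its probability of error p_S.
    """
    inverse = {s: 1.0 / p for s, p in errors.items()}
    total = sum(inverse.values())  # the normalizer: intractable over all DAGs
    return {s: v / total for s, v in inverse.items()}


# Hypothetical error probabilities for three candidate structures.
measure = inverse_error_measure({"naive_bayes": 0.20, "tan": 0.10, "full": 0.25})

# The ratio of two measures needs only their own error probabilities: 0.20 / 0.10.
ratio = measure["tan"] / measure["naive_bayes"]
```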

We use Metropolis-Hastings sampling [Metropolis et al., 1953] to generate samples from the inverse error measure without ever having to compute it for all possible structures. To construct the Metropolis-Hastings sampler, we define the neighborhood of a structure as the set of directed acyclic graphs to which we can move in the next step. Transitions use a predefined set of possible changes to the structure: at each transition, a change consists of a single edge addition, removal, or reversal. We define the acceptance probability of a candidate structure, $S^{new}$, to replace a previous structure, $S^t$, as follows:

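$$\min\left\{1, \left(\frac{\mathrm{inv}_e(S^{new})}{\mathrm{inv}_e(S^t)}\right)^{1/T} \frac{q(S^t \mid S^{new})}{q(S^{new} \mid S^t)}\right\} = \min\left\{1, \left(\frac{p^t_{error}}{p^{new}_{error}}\right)^{1/T} \frac{N^t}{N^{new}}\right\}, \tag{7.14}$$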

where $q(S' \mid S)$ is the transition probability from $S$ to $S'$, and $N^t$ and $N^{new}$ are the sizes of the neighborhoods of $S^t$ and $S^{new}$, respectively. This choice, $q(S^{new} \mid S^t) = 1/N^t$ and $q(S^t \mid S^{new}) = 1/N^{new}$, corresponds to an equal probability of transition to each member of the neighborhood of a structure. This choice of neighborhood and transition probability creates a Markov chain that is aperiodic and irreducible, thus satisfying the Markov chain Monte Carlo (MCMC) conditions [Madigan and York, 1995]. The algorithm, which we name stochastic structure search (SSS), is presented in Box 7.6.
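Once the two error probabilities and the two neighborhood sizes are known, the acceptance rule in Eq. (7.14) reduces to a one-line computation. The sketch below is illustrative only, and the function and parameter names are our own.

```python
def acceptance_probability(p_err_t, p_err_new, n_t, n_new, temperature=1.0):
    """Eq. (7.14): min{1, (p_t / p_new)^(1/T) * N^t / N^new}.

    p_err_t, p_err_new: probabilities of error of the current and candidate structures.
    n_t, n_new: neighborhood sizes N^t and N^new (the uniform-proposal correction).
    """
    tempered = (p_err_t / p_err_new) ** (1.0 / temperature)
    return min(1.0, tempered * n_t / n_new)
```

For example, a candidate whose probability of error is twice that of the current structure, with equal neighborhood sizes, is accepted with probability 0.5 at $T = 1$ and 0.25 at $T = 0.5$.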

Box 7.6 (Stochastic Structure Search Algorithm)

Fix the network structure to some initial structure, $S^0$. Estimate the parameters of the structure $S^0$ and compute the probability of error $p^0_{error}$.

Set $t = 0$.

Repeat until a maximum number of iterations ($\mathit{MaxIter}$) is reached:

– Sample a new structure, $S^{new}$, from the neighborhood of $S^t$ uniformly, with probability $1/N^t$.

– Learn the parameters of the new structure using maximum likelihood estimation. Compute the probability of error of the new classifier, $p^{new}_{error}$.

– Accept $S^{new}$ with the probability given in Eq. (7.14).

– If $S^{new}$ is accepted, set $S^{t+1} = S^{new}$ and $p^{t+1}_{error} = p^{new}_{error}$, and change $T$ according to the temperature decrease schedule. Otherwise, set $S^{t+1} = S^t$.

– $t = t + 1$.

Return the structure $S^j$ such that $j = \arg\min_{0 \le j \le \mathit{MaxIter}} p^j_{error}$.
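Box 7.6 can be sketched in Python as follows. This is a toy illustration under stated assumptions, not the authors' implementation. Structures are represented as frozensets of directed (parent, child) edges. The hypothetical `error_fn` stands in for the two expensive steps, learning maximum likelihood parameters and estimating the probability of error, and is assumed to return strictly positive values. The temperature is lowered after each accepted move, as in the box, through a hypothetical `temperature_fn` argument. The acceptance test evaluates Eq. (7.14) in log space to avoid overflow at low temperatures.

```python
import math
import random
from itertools import permutations


def is_acyclic(nodes, edges):
    """Kahn-style check: a graph is a DAG iff repeatedly peeling off sources empties it."""
    active, remaining = set(nodes), set(edges)
    while active:
        sources = {v for v in active if all(b != v for _, b in remaining)}
        if not sources:
            return False  # every remaining node has a parent, so there is a cycle
        active -= sources
        remaining = {(a, b) for a, b in remaining if a in active}
    return True


def neighborhood(nodes, edges):
    """All DAGs reachable from `edges` (a frozenset) by one edge addition, removal, or reversal."""
    result = set()
    for u, v in permutations(nodes, 2):
        if (u, v) in edges:
            result.add(edges - {(u, v)})  # removal
            flipped = (edges - {(u, v)}) | {(v, u)}
            if is_acyclic(nodes, flipped):
                result.add(flipped)  # reversal
        elif (v, u) not in edges:
            added = edges | {(u, v)}
            if is_acyclic(nodes, added):
                result.add(added)  # addition
    return sorted(result, key=sorted)  # deterministic order for rng.choice


def stochastic_structure_search(nodes, s0, error_fn, temperature_fn, max_iter, seed=0):
    """Sketch of Box 7.6; returns the visited structure with the lowest error."""
    rng = random.Random(seed)
    s_t = frozenset(s0)
    p_t = error_fn(s_t)
    nbrs_t = neighborhood(nodes, s_t)
    best, best_p = s_t, p_t
    n_accepted = 0
    temperature = temperature_fn(n_accepted)
    for _ in range(max_iter):
        s_new = rng.choice(nbrs_t)  # uniform proposal, probability 1/N^t
        p_new = error_fn(s_new)
        nbrs_new = neighborhood(nodes, s_new)
        # log of the second form of Eq. (7.14), before taking min{1, .}
        log_ratio = ((math.log(p_t) - math.log(p_new)) / temperature
                     + math.log(len(nbrs_t)) - math.log(len(nbrs_new)))
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            s_t, p_t, nbrs_t = s_new, p_new, nbrs_new
            n_accepted += 1
            temperature = temperature_fn(n_accepted)  # cool after each acceptance
        if p_t < best_p:
            best, best_p = s_t, p_t
    return best, best_p


TARGET = frozenset({("C", "X1"), ("C", "X2")})  # hypothetical "true" structure


def toy_error(structure):
    """Hypothetical error: 10% plus 5% per edge that differs from TARGET."""
    return 0.10 + 0.05 * len(structure ^ TARGET)


best, best_p = stochastic_structure_search(
    nodes=["C", "X1", "X2"], s0=frozenset(), error_fn=toy_error,
    temperature_fn=lambda k: 1.0 / math.log(k + 2), max_iter=200)
```

The toy error function is minimized by the naive Bayes structure $C \to X_1$, $C \to X_2$, so a short run on this three-node space will typically return it, although the sketch does not guarantee this for any particular seed.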

We add $T$ as a temperature factor in the acceptance probability. Roughly speaking, $T$ close to 1 allows structures with a higher probability of error than the previous structure to be accepted more often, while $T$ close to 0 mostly allows acceptance of structures that improve the probability of error. A fixed $T$ amounts to changing the distribution being sampled by the MCMC, while a decreasing $T$ is a simulated annealing run, aimed at finding the maximum of the inverse error measure. The rate of decrease of the temperature determines the rate of convergence. Asymptotically in the number of iterations, a logarithmic decrease of $T$ guarantees convergence to a global maximum with probability that tends to one [Hajek, 1988].
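To make the difference in speed concrete, the sketch below compares a logarithmic schedule of the form $T_k = c/\log(k+2)$ with a geometric schedule $T_k = T_0 \alpha^k$. The geometric schedule is a common, much faster heuristic that carries no such convergence guarantee. The offset of 2 avoids division by zero at $k = 0$. The constants $c = 1$, $T_0 = 1$ and $\alpha = 0.99$ are arbitrary placeholders. Roughly, Hajek's result requires $c$ to be at least as large as the depth of the deepest non-global local optimum, a quantity that is unknown in practice. With these values, the logarithmic schedule reaches $T = 0.1$ only after about 22,000 steps (when $\log(k+2) = 10$), whereas the geometric one gets there in about 230.

```python
import math


def logarithmic_temperature(k, c=1.0):
    """T_k = c / log(k + 2): a slow schedule of the kind analyzed by Hajek (1988)."""
    return c / math.log(k + 2)


def geometric_temperature(k, t0=1.0, alpha=0.99):
    """T_k = t0 * alpha**k: a faster heuristic schedule without that guarantee."""
    return t0 * alpha ** k
```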

The SSS algorithm, with a logarithmic cooling schedule for $T$, can find a structure that is close to the minimum probability of error. There are two caveats, though. First, the logarithmic cooling schedule is very slow. Second, we never have access to the true probability of error for each structure; we estimate it from a limited pool of training data. To avoid overfitting, we can take several approaches. Cross-validation can be performed by splitting the labeled training set into smaller sets. However, this approach can significantly slow down the search and is suitable only if the labeled training set is moderately large. A different approach is to change the error measure using known bounds on the empirical classification error that account for model complexity. We describe such an approach in the next section.
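Here is a minimal sketch of the cross-validation option. It assumes hypothetical `fit` and `predict` callables that learn parameters for a given structure and classify a single sample. The sketch makes the cost visible: every candidate structure now requires $k$ parameter fits instead of one, which is why this approach slows down the search.

```python
def cross_validated_error(samples, labels, structure, fit, predict, k=5):
    """Estimate the probability of error of `structure` by k-fold cross-validation.

    `fit(structure, xs, ys)` and `predict(model, x)` are hypothetical stand-ins
    for parameter learning and classification with a given structure.
    """
    n = len(samples)
    mistakes = 0
    for fold in range(k):
        held_out = set(range(fold, n, k))  # every k-th sample, offset by `fold`
        train = [i for i in range(n) if i not in held_out]
        model = fit(structure, [samples[i] for i in train], [labels[i] for i in train])
        mistakes += sum(predict(model, samples[i]) != labels[i] for i in held_out)
    return mistakes / n
```

With a majority-class `fit` that ignores the structure, ten samples of which three are labeled 1 yield an estimated error of 0.3 at $k = 5$.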

5.2 Adding VC Bound Factor to the Empirical Error
