
The gradient descent algorithm is one of the common methods for optimizing continuously differentiable functions. The basic idea is to move along the direction of the negative gradient of the function at each step until the gradient becomes zero. In the case of constrained optimization where the feasible region is a convex set, the gradient projection method can be used [10]. Both the gradient descent and the gradient projection algorithms require the gradient of the function to be defined on the feasible set. Our objective function is nonsmooth because it picks the maximum element among the columns of S. When the objective function is not continuously differentiable, we can use generalized gradient descent for locally Lipschitz functions. The following result shows that the objective functions we are dealing with are locally Lipschitz.

Lemma 3.4.1. Let $f_h : \mathbb{R}^d \to \mathbb{R}$ be locally Lipschitz functions at $x \in \mathbb{R}^d$ for $h \in \{1, \dots, m\}$. Then $f_{\max}(x) = \max_h \{f_h(x)\}$ is locally Lipschitz at $x$ [18].

Proposition 3.4.2. The objective functions $f : C \subset \mathbb{R}^{n \times n} \to \mathbb{R}$ for intruder models 1, 2, 3 and 4 are locally Lipschitz at each $P \in C$.

Proof. Each element $F_k(i, j)$ is a polynomial in the variables $p_{xy}$ for $x, y \in \{1, \dots, n\}$. So, $F_k(i, j)$ is continuously differentiable with respect to the transition probabilities of the Markov chain. Therefore, the functions $s_{ij} : C \to \mathbb{R}$ are continuously differentiable and hence locally Lipschitz at any $P \in C$. Applying Lemma 3.4.1 to the functions $s_{ij}$ proves the proposition.

So, the objective function for any given intruder model is locally Lipschitz, and hence we can define the generalized gradient of $f(P)$ at each point $P$.

Definition 3.4.3 (Generalized Gradient) [19]. The generalized gradient $\partial f : \mathbb{R}^d \to B(\mathbb{R}^d)$ of a locally Lipschitz function $f : \mathbb{R}^d \to \mathbb{R}$ is defined as

$$\partial f(x) = \mathrm{co}\Big\{ \lim_{i \to \infty} \nabla f(x_i) : x_i \to x,\ x_i \notin T \cup \Omega_f \Big\}, \tag{3.5}$$

where $\mathrm{co}$ denotes the convex hull, $B(\mathbb{R}^d)$ denotes the collection of subsets of $\mathbb{R}^d$, $\Omega_f \subset \mathbb{R}^d$ denotes the set of points where $f$ is not differentiable, and $T \subset \mathbb{R}^d$ is a set of measure zero that can be arbitrarily chosen to simplify the computation.

Informally, the generalized gradient of $f$ at $x$ is the convex hull of all the possible limits of the gradient at neighboring points of $x$ where $f$ is differentiable. Using this definition, we can define the gradient for our nonsmooth objective function and use it in the gradient projection method. The gradient descent method uses the gradient $\nabla f$ where it is defined, since $-\nabla f$ gives the direction of steepest descent of the function. At the nonsmooth points, where we have to consider the generalized gradient, the direction of steepest descent is given by the negative of the least-norm element of the generalized gradient, denoted by $-\mathrm{Ln}(\partial f)$ [19].
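As a small illustration of the definition (an example chosen here for clarity, not one of the intruder-model objectives), consider the absolute value function, which is a maximum of two smooth functions and is therefore locally Lipschitz by Lemma 3.4.1:

```latex
f(x) = |x| = \max\{x, -x\}, \qquad \Omega_f = \{0\}, \qquad
\nabla f(x) = \operatorname{sign}(x) \quad \text{for } x \neq 0 .

% Sequences x_i \to 0 with x_i \neq 0 give the limits +1 (from the right)
% and -1 (from the left), so
\partial f(0) = \mathrm{co}\{-1, +1\} = [-1, 1], \qquad
\partial f(x) = \{\operatorname{sign}(x)\} \quad \text{for } x \neq 0 .

% The least-norm element at the kink is
\mathrm{Ln}(\partial f(0)) = 0 ,
```

so the descent direction $-\mathrm{Ln}(\partial f(0))$ vanishes at $x = 0$, which is consistent with $x = 0$ being the minimizer of $|x|$.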

In the subsequent discussion of the gradient projection method, we will use the term gradient to refer to the actual gradient of the function where it is differentiable and to the generalized gradient at the points where it is not. Let us now define the projection and then discuss the gradient projection method for our objective function.

Definition 3.4.4 (Projection on a Convex Set). Given a point $x$ in $\mathbb{R}^d$, the projection of $x$ on a closed convex set $X \subset \mathbb{R}^d$ is the point $[x]_X \in X$ which is at the minimum distance from $x$, i.e. the optimal point of the following problem:

$$\begin{aligned} \underset{z}{\text{minimize}} \quad & \|z - x\| \\ \text{subject to} \quad & z \in X. \end{aligned}$$

We will use the notation $[x]_X$ for the projection of point $x$ on the set $X$.

An iteration of the gradient projection method is of the form [10]

$$P_{k+1} = P_k + \alpha_k (\bar{P}_k - P_k), \tag{3.6}$$

where

$$\bar{P}_k = [P_k - s_k \nabla f(P_k)]_C.$$

The variables $\alpha_k \in (0, 1]$ and $s_k$ are step sizes. So, the method moves the current point $P_k$ in the direction of the negative gradient by the amount $s_k \nabla f(P_k)$ and hence moves in the direction of steepest descent. The resultant point may not be a feasible point, so it is projected back onto the set of transition matrices. The vector $\bar{P}_k - P_k$ is a feasible direction, since moving along this vector with $\alpha_k \in (0, 1]$ keeps the point in the feasible set.

Proposition 3.4.5. The direction $(\bar{P}_k - P_k)$ given in (3.6) is a direction of descent for the function $f(P)$.

Proof. By the projection theorem [10], a point $x^* \in X$ is equal to the projection $[z]_X$ of a point $z$ if and only if

$$(z - x^*)^\top (x - x^*) \le 0, \quad \forall x \in X.$$

Since the domain of our function is a set of matrices, let us denote the inner product of two matrices $A$ and $B$ of the same dimensions as $\langle A, B \rangle = \sum_{i,j} a_{ij} b_{ij}$. Then, since the point $\bar{P}_k$ is the projection of $P_k - s_k \nabla f(P_k)$, using the projection theorem, we get

$$\langle P_k - s_k \nabla f(P_k) - \bar{P}_k,\ P - \bar{P}_k \rangle \le 0, \quad \forall P \in C.$$

Plugging in $P = P_k$ and rearranging, we obtain

$$\langle \nabla f(P_k),\ \bar{P}_k - P_k \rangle \le -\frac{1}{s_k} \langle P_k - \bar{P}_k,\ P_k - \bar{P}_k \rangle.$$

Since $\langle P_k - \bar{P}_k, P_k - \bar{P}_k \rangle$ is the squared norm of $P_k - \bar{P}_k$, it is strictly positive whenever $\bar{P}_k \ne P_k$. Hence the inner product of the direction $(\bar{P}_k - P_k)$ with the gradient is negative, implying that it is a direction of descent of the function.

So, taking the projection after moving in the direction of steepest descent and then moving towards the projected point decreases the function for a sufficiently small step size $\alpha_k$. Hence the method terminates when the projected point is the same as the current point, which means that the gradient, when projected onto the set, is 0.
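The iteration (3.6) and its stopping rule can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the feasible set is replaced by the box $[0, 1]^2$ (whose projection is a simple clamp), the objective is a smooth quadratic rather than one of the intruder-model objectives, and the names `project_box`, `gradient_projection` and `grad_f` as well as the fixed step sizes are hypothetical choices, not taken from the source.

```python
def project_box(x, lo=0.0, hi=1.0):
    """Euclidean projection onto the box [lo, hi]^d (a closed convex set)."""
    return [min(max(xi, lo), hi) for xi in x]


def gradient_projection(grad, project, x0, s=0.5, alpha=1.0,
                        tol=1e-10, max_iter=1000):
    """Iterate x_{k+1} = x_k + alpha * (x_bar_k - x_k),
    where x_bar_k = project(x_k - s * grad(x_k)).
    Constant step sizes s and alpha are an assumption of this sketch."""
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        x_bar = project([xi - s * gi for xi, gi in zip(x, g)])
        d = [bi - xi for bi, xi in zip(x_bar, x)]  # feasible descent direction
        if sum(di * di for di in d) <= tol:
            break  # projected point coincides with the current point
        x = [xi + alpha * di for xi, di in zip(x, d)]
    return x


def grad_f(x):
    """Gradient of the toy objective f(x) = (x0 - 2)^2 + (x1 + 1)^2."""
    return [2.0 * (x[0] - 2.0), 2.0 * (x[1] + 1.0)]


x_star = gradient_projection(grad_f, project_box, [0.5, 0.5])
print(x_star)  # the constrained minimizer over the box is (1, 0)
```

The unconstrained minimizer (2, -1) lies outside the box, so the iterates stop at the boundary point where the projected step no longer moves the current point, matching the termination condition described above.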

At each iteration, the resultant point after moving in the direction of the negative gradient may not be a feasible point, and hence it needs to be projected onto the set of allowed transition matrices. The following section gives a method to project any point $\tilde{P} \in \mathbb{R}^{n \times n}$ onto the set of allowed transition matrices.

Projection Onto the Set of Allowed Transition Matrices

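The projection onto the convex set can be written as the following optimization problem:

$$\begin{aligned} \underset{\bar{P}}{\text{minimize}} \quad & \|\mathrm{vec}(\bar{P}) - \mathrm{vec}(\tilde{P})\| \\ \text{subject to} \quad & \sum_{j=1}^{n} \bar{p}_{ij} = 1 \quad \text{for all } i, \\ & \bar{p}_{ij} \ge 0 \quad \text{for all } i, j, \\ & \bar{p}_{ij} = 0 \quad \text{for } (i, j) \notin E, \end{aligned}$$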

where $\mathrm{vec}(P)$ vectorizes the matrix $P \in \mathbb{R}^{n \times n}$ to a vector in $\mathbb{R}^{n^2}$. Note that

$$\|\mathrm{vec}(\bar{P}) - \mathrm{vec}(\tilde{P})\|^2 = \sum_{i=1}^{n} \|\bar{P}_i - \tilde{P}_i\|^2,$$

where $P_i$ is the $i$th row of matrix $P$. Moreover, the constraints of a row are independent of the constraints of the other rows. Hence, we can project each row $i$ of the matrix $\tilde{P}$ separately.

Each row $i$ of the transition matrix $P$ should lie in the $q_i$-dimensional simplex $\sum_j p_{ij} = 1$, $p_{ij} \ge 0$, where $q_i$ is the degree of vertex $i$. So, we can use the following simple algorithm for the projection of each row of the point onto the simplex [22].

Let $v_j$ be the elements of the vector to be projected onto the simplex. In our case they will be the elements of a row $i$ of the point $\tilde{P}$ such that $(i, j) \in E$. Then the corresponding projected vector $w_j$ is given by

$$w_j = \max\{v_j - \theta, 0\},$$

where $\theta$ is found as follows. Sort $v$ into $\mu$, i.e. $\mu_1 \ge \mu_2 \ge \dots \ge \mu_{q_i}$, and evaluate

$$\rho = \max\Big\{ j \in \{1, 2, \dots, q_i\} : \mu_j - \frac{1}{j}\Big(\sum_{r=1}^{j} \mu_r - 1\Big) > 0 \Big\}, \qquad \theta = \frac{1}{\rho}\Big(\sum_{r=1}^{\rho} \mu_r - 1\Big).$$

So, we can project each row $i$ of the matrix onto the $q_i$-dimensional simplex using this algorithm. This method is efficient compared to a general convex optimization solver since it employs simple vector operations, and its runtime is dominated by sorting the elements of $v$, which takes $O(q_i \log q_i)$ time.
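The sort-and-threshold rule above can be written out as a short Python sketch. The function names `project_simplex` and `project_rows`, the representation of $E$ as a set of `(i, j)` index pairs, and the example matrix are assumptions made for illustration. The sketch also assumes every vertex has at least one allowed transition, since otherwise its row cannot sum to 1.

```python
def project_simplex(v):
    """Euclidean projection of v onto {w : sum(w) = 1, w >= 0}
    using the sort-and-threshold rule described in the text."""
    mu = sorted(v, reverse=True)           # mu_1 >= mu_2 >= ... >= mu_q
    cumsum, theta = 0.0, 0.0
    for j, mu_j in enumerate(mu, start=1):
        cumsum += mu_j
        t = (cumsum - 1.0) / j
        if mu_j - t > 0:                   # condition defining rho
            theta = t                      # keep theta for the largest such j
    return [max(vj - theta, 0.0) for vj in v]


def project_rows(P_tilde, edges):
    """Project each row of P_tilde onto the simplex over the entries
    (i, j) in `edges`; all other entries are set to 0."""
    n = len(P_tilde)
    P_bar = [[0.0] * n for _ in range(n)]
    for i in range(n):
        cols = [j for j in range(n) if (i, j) in edges]
        w = project_simplex([P_tilde[i][j] for j in cols])
        for j, wj in zip(cols, w):
            P_bar[i][j] = wj
    return P_bar


w = project_simplex([0.5, 0.8, -0.2])      # approximately [0.35, 0.65, 0.0]

edges = {(0, 0), (0, 1), (1, 0)}           # hypothetical allowed transitions
P_bar = project_rows([[0.9, 0.3], [2.0, 5.0]], edges)
print(w, P_bar)
```

In the matrix example, the entry at position (1, 1) is forced to zero because (1, 1) is not in `edges`, so the second row projects to (1, 0) regardless of how large the discarded entry was.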

Complexity of Gradient Projection Method: Calculating the gradient of the function at each iteration of the gradient descent algorithm is expensive. If we assume that the multiplication of two $n \times n$ matrices takes $O(n^3)$ time, then the gradient calculation for one element of S with respect to P takes $O(n^5 l)$ time if the graph has $O(n^2)$ edges. That is because we have to take the derivative of $F_k(i, j)$ for all $i, j \in \{1, \dots, n\}$ and $k \in \{1, \dots, l\}$. So, we have to calculate the derivative of $O(n^2 l)$ elements. Each of these derivatives involves matrix multiplication, and hence the total runtime becomes $O(n^5 l)$. For planar graphs, with $O(n)$ edges, the runtime for the gradient calculation is $O(n^4 l)$. This is quite inefficient, and therefore we look at some gradient-free methods to optimize our objective function.
