Functional study of the hnRNPK protein - Functional study of proteins that interact with

3. Functional study of proteins that interact with RaV elements

3.3 Functional study of the hnRNPK protein

In this section, we build on the work by Golub and van Loan (1996) and propose an iterative approach for solving the optimization problem in Eq. (2.12) that has a quadratic runtime cost per iteration (with respect to the number of instances). The approach is based on the conjugate gradient descent method (Golub and van Loan, 1996) for solving linear systems of equations defined by symmetric and positive definite matrices. First, we describe (in Section 2.5.1.1) a procedure for approximately computing the smallest value of the Lagrange multiplier satisfying the stationary constraints from Eq. (2.16). The procedure is based on the conjugate gradient descent method and has a quadratic runtime cost in the number of instances. For the optimal value of the Lagrange multiplier, the optimal solution to the problem in Eq. (2.12) is the solution of the following linear system

$$(S - \mu_{\min} I)\, z = b . \tag{2.24}$$

As discussed in Section 2.4, the matrix $S$ is symmetric and $\mu_{\min} < \sigma_n \le \sigma_{n-1} \le \cdots \le \sigma_1$, where $\sigma_1, \ldots, \sigma_n$ denote the eigenvalues of $S$. From here it then follows that the matrix $P = (S - \mu_{\min} I)$ is symmetric and positive definite. Hence, we can apply the conjugate gradient descent method (Section 10.2, Golub and van Loan, 1996) to iteratively solve this system with a quadratic cost per iteration. In Section 2.5.1.2, we provide a brief review of this method and present a theoretical guarantee on the quality of the solution obtained in this way. In our review of the approach, we closely follow the exposition by Golub and van Loan (Chapter 10, 1996).

2.5.1.1 Iterative Computation of the Lagrange Multiplier

In this section, we propose a means to approximate the optimal Lagrange multiplier (defining the linear system in Eq. 2.24) in large-scale problems. In order to compute the multiplier, we first need to derive the open interval containing this root of the secular equation. As shown in Section 2.4.2, the optimal multiplier lies in the open interval determined by the smallest eigenvalue of the matrix $S$.

2.5 Large Scale Approximations 31

To obtain the smallest eigenvalue of the matrix $S$, we propose to use the power iteration algorithm (Golub and van Loan, 1996), which has a quadratic runtime cost per iteration. However, as we need the smallest eigenvalue and the power iteration algorithm computes the largest one, we apply the algorithm to the matrix $-S$.
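The eigenvalue computation described above can be sketched in a few lines of NumPy. This is a minimal illustration and not the implementation used in the text: the function names `power_iteration` and `smallest_eigenvalue` are hypothetical, and the sketch adds a spectral shift as an assumption of this example. It iterates on $cI - S$, where $c$ is an estimate of the largest eigenvalue magnitude of $S$, so that the smallest eigenvalue of $S$ becomes the dominant one regardless of the signs of the eigenvalues. Each iteration costs one matrix-vector product, i.e., $O(n^2)$ operations.

```python
import numpy as np

def power_iteration(matvec, n, iters=1000, tol=1e-12, seed=0):
    """Dominant eigenpair of a symmetric operator given only as a matvec."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)
    lam = np.inf
    for _ in range(iters):
        w = matvec(v)                      # one O(n^2) matrix-vector product
        lam_new = v @ w                    # Rayleigh quotient for the unit vector v
        norm_w = np.linalg.norm(w)
        if norm_w == 0.0:                  # v lies in the null space
            return 0.0, v
        v = w / norm_w
        converged = abs(lam_new - lam) <= tol * max(1.0, abs(lam_new))
        lam = lam_new
        if converged:
            break
    return lam, v

def smallest_eigenvalue(S, **kwargs):
    """Smallest eigenvalue of a symmetric S via two shifted power iterations."""
    n = S.shape[0]
    # Step 1: magnitude of the dominant eigenvalue of S, used as the shift c.
    lam_dom, _ = power_iteration(lambda v: S @ v, n, **kwargs)
    c = abs(lam_dom)
    # Step 2: the dominant eigenvalue of c*I - S is c - sigma_n.
    lam_shift, v = power_iteration(lambda v: c * v - S @ v, n, **kwargs)
    return c - lam_shift, v
```

The convergence speed of this sketch depends on the gap between the two smallest eigenvalues of $S$, so the default iteration budget is only a placeholder.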

Having computed the smallest eigenvalue of the matrix S, we have determined the interval of the secular root corresponding to the optimal Lagrange multiplier. In order to compute this multiplier we form a slightly different version of the secular equation,

$$g(\mu) = b^\top (S - \mu I)^{-2}\, b - R^2 .$$

In our empirical evaluations (Section 2.9), the iterative algorithm described in Section 2.4.3 proved to be very fast and always converged to machine precision in a few iterations. To apply this algorithm with the conjugate gradient descent method and without an eigendecomposition of $S$, we need to be able to derive the coefficients, $p_t$ and $q_t$ ($t > 0$), of the surrogate quadratic function (see Section 2.4.3). For this, we need to be able to evaluate the secular equation and its derivative at any iteration. The first is simple to achieve using the conjugate gradient descent algorithm from the previous section. In particular, for the derivative of the secular equation at an estimate $\mu_t$ of $\mu_{\min}$ we have

$$g'(\mu_t) = 2\, z_{\mu_t}^\top (S - \mu_t I)^{-1} z_{\mu_t} ,$$

where $z_{\mu_t}$ is the solution of the linear system $P_{\mu_t} z = b$ with $P_{\mu_t} = S - \mu_t I$, obtained using the conjugate gradient descent method. Thus, by applying the conjugate gradient descent method one more time to solve the linear system $P_{\mu_t} \hat{z} = z_{\mu_t}$, one obtains the derivative of the secular equation at $\mu_t$ as $2\, z_{\mu_t}^\top \hat{z}$. The described procedure has quadratic runtime complexity stemming from the cost per iteration of the conjugate gradient descent method. Hence, for low-rank kernel matrices (or matrices with a fast decaying spectrum) we can use this approach to compute an approximation of the optimal multiplier for problem (2.12) in $O(n^2)$ time.
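The two linear solves described above can be illustrated as follows. This is a sketch under stated assumptions: it reads the secular function as $g(\mu) = \|z_\mu\|^2 - R^2$ with $z_\mu = (S - \mu I)^{-1} b$, which is the form consistent with the derivative formula above; the helper names `cg_solve` and `secular_value_and_derivative` are hypothetical; and the shifted matrix is formed explicitly for brevity, whereas a large-scale implementation would more likely apply the shift inside the matrix-vector product.

```python
import numpy as np

def cg_solve(P, b, tol=1e-12, max_iter=None):
    """Textbook conjugate gradient for a symmetric positive definite P."""
    n = b.shape[0]
    max_iter = 10 * n if max_iter is None else max_iter
    z = np.zeros(n)
    r = b.astype(float).copy()       # residual b - P z for z = 0
    g = r.copy()                     # first search direction
    rr = r @ r
    b_norm = np.linalg.norm(b)
    for _ in range(max_iter):
        if np.sqrt(rr) <= tol * b_norm:
            break
        Pg = P @ g                   # one matrix-vector product per iteration
        tau = rr / (g @ Pg)
        z += tau * g
        r -= tau * Pg
        rr_new = r @ r
        g = r + (rr_new / rr) * g
        rr = rr_new
    return z

def secular_value_and_derivative(S, b, mu, R):
    """Evaluate g(mu) and g'(mu) with two CG solves (assumes mu < sigma_n)."""
    P = S - mu * np.eye(S.shape[0])  # positive definite when mu < sigma_n
    z_mu = cg_solve(P, b)            # first solve:  P z_mu  = b
    z_hat = cg_solve(P, z_mu)        # second solve: P z_hat = z_mu
    g = z_mu @ z_mu - R**2           # assumed form of the secular function
    dg = 2.0 * (z_mu @ z_hat)        # 2 z_mu^T (S - mu I)^{-1} z_mu
    return g, dg
```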

2.5.1.2 Conjugate Gradient Descent

This section reviews the conjugate gradient descent approach (Chapter 10, Golub and van Loan, 1996) in the context of Section 2.4 and the optimization problem in Eq. (2.24). The approach is based on the observation that solving the linear system, P z = b, is equivalent to minimizing the quadratic form

$$\Phi(z) = \frac{1}{2}\, z^\top P z - b^\top z .$$

The fact that $P$ is a symmetric and positive definite matrix implies that the minimal value of $\Phi(z)$ is attained by setting $z = P^{-1} b$. Thus, the simplest iterative method for solving the linear system in Eq. (2.24) is the gradient descent approach. The negative gradient of the quadratic form at the step $t$ is given by the residual at that step, i.e.,

$$r_t = b - P z_t = -\nabla \Phi(z_t) .$$

If the residual vector is non-zero then there exists a positive constant $\tau \in \mathbb{R}_+$ such that $z_{t+1} = z_t + \tau r_t$ and $\Phi(z_{t+1}) < \Phi(z_t)$. While simple and easy to implement, the gradient descent method can be inefficient when the condition number $\kappa(P) = (\sigma_1 - \mu_{\min}) / (\sigma_n - \mu_{\min})$ is large.
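The gradient descent update can be sketched as below. The text only states that a suitable $\tau > 0$ exists; the sketch assumes the exact line-search choice $\tau = r_t^\top r_t / (r_t^\top P r_t)$, which minimizes $\Phi$ along $r_t$, and the function name `steepest_descent` is hypothetical.

```python
import numpy as np

def steepest_descent(P, b, z0, iters):
    """Gradient descent on Phi(z) = 0.5 z^T P z - b^T z with exact line search."""
    z = np.asarray(z0, dtype=float).copy()
    history = [0.5 * z @ P @ z - b @ z]      # values of Phi along the iterates
    for _ in range(iters):
        r = b - P @ z                        # residual = negative gradient
        rr = r @ r
        if rr == 0.0:                        # already at the solution
            break
        tau = rr / (r @ P @ r)               # assumed step: exact minimizer along r
        z = z + tau * r
        history.append(0.5 * z @ P @ z - b @ z)
    return z, history
```

For example, with `P = np.diag([1.0, 100.0])`, `b = np.ones(2)` and a zero starting point, the residual shrinks only by the factor $99/101$ per step, which illustrates the sensitivity to the condition number.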

To avoid this issue, the conjugate gradient descent method minimizes the quadratic form $\Phi(z)$ along a set of linearly independent directions $\{g_i\}_{i=1}^{t}$ that do not necessarily correspond to residuals $\{r_i\}_{i=1}^{t}$, with $t = 1, 2, \ldots, n$. The convergence is guaranteed in at most $n$ steps because that is the dimension of the problem and a solution can be written as a linear combination of at most $n$ linearly independent vectors. Similar to Golub and van Loan (1996), let us first consider the choice of a direction $g_t$. For this purpose, let us now take (we subsequently show that this can always be done)

$$z_t = z_0 + G_{t-1} \xi + \tau g_t ,$$

where $G_{t-1}$ is a matrix with columns $\{g_i\}_{i=1}^{t-1}$, $\xi \in \mathbb{R}^{t-1}$, and $\tau \in \mathbb{R}$. Then, we have that

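$$\begin{aligned}
\Phi(z_t) &= \Phi(z_0 + G_{t-1}\xi + \tau g_t) \\
&= \Phi(z_0 + G_{t-1}\xi) + \tau\, \xi^\top G_{t-1}^\top P g_t + \frac{\tau^2}{2}\, g_t^\top P g_t + \tau\, g_t^\top (P z_0 - b) \\
&= \Phi(z_0 + G_{t-1}\xi) + \tau\, \xi^\top G_{t-1}^\top P g_t + \frac{\tau^2}{2}\, g_t^\top P g_t - \tau\, g_t^\top r_0 .
\end{aligned}$$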

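If $g_t \perp \operatorname{span}(\{P g_1, \ldots, P g_{t-1}\})$ then $\xi^\top G_{t-1}^\top P g_t = 0$ and the search for $z_t$ splits into two independent optimization problems,

$$\begin{aligned}
\min_{z \in z_0 + \operatorname{span}(\{g_1, \ldots, g_t\})} \Phi(z)
&= \min_{\xi \in \mathbb{R}^{t-1},\, \tau \in \mathbb{R}} \Phi(z_0 + G_{t-1}\xi + \tau g_t) \\
&= \min_{\xi \in \mathbb{R}^{t-1},\, \tau \in \mathbb{R}} \left( \Phi(z_0 + G_{t-1}\xi) + \frac{\tau^2}{2}\, g_t^\top P g_t - \tau\, g_t^\top r_0 \right) \\
&= \min_{\xi \in \mathbb{R}^{t-1}} \Phi(z_0 + G_{t-1}\xi) + \min_{\tau \in \mathbb{R}} \left( \frac{\tau^2}{2}\, g_t^\top P g_t - \tau\, g_t^\top r_0 \right) .
\end{aligned}$$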

From here it then follows that the solution to the first optimization problem minimizes the quadratic form over $z_0 + \operatorname{span}(\{g_1, \ldots, g_{t-1}\})$. On the other hand, the optimal solution to the second problem is

$$\tau_t = \frac{g_t^\top r_0}{g_t^\top P g_t} .$$

Moreover, the fact that $g_t \perp \operatorname{span}(\{P g_1, \ldots, P g_{t-1}\})$ implies

$$g_t^\top r_{t-1} = -g_t^\top (P z_{t-1} - b) = -g_t^\top (P z_0 + P G_{t-1}\xi - b) = g_t^\top r_0 .$$
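As a worked step, the optimal step size follows from setting the derivative of the one-dimensional quadratic to zero. The name $h$ for this quadratic is introduced here only for illustration and does not appear in the original exposition.

```latex
\begin{align*}
h(\tau) &= \frac{\tau^2}{2}\, g_t^\top P g_t - \tau\, g_t^\top r_0 , \\
h'(\tau) &= \tau\, g_t^\top P g_t - g_t^\top r_0 = 0
  \;\Longrightarrow\; \tau_t = \frac{g_t^\top r_0}{g_t^\top P g_t} , \\
h(\tau_t) &= -\frac{\left(g_t^\top r_0\right)^2}{2\, g_t^\top P g_t} \le 0 .
\end{align*}
```

Since $P$ is positive definite, $h''(\tau) = g_t^\top P g_t > 0$ for $g_t \neq 0$, so $\tau_t$ is the unique minimizer, and the quadratic form strictly decreases at step $t$ exactly when $g_t^\top r_0 = g_t^\top r_{t-1} \neq 0$.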

Thus, the direction $g_t$ should be chosen so that $g_t \perp \operatorname{span}(\{P g_1, \ldots, P g_{t-1}\})$ and $g_t^\top r_{t-1} \neq 0$. In Golub and van Loan (Section 10.2, 1996), the authors show that such conjugate directions can always be selected by setting

$$g_t = r_{t-1} + \pi_t\, g_{t-1} .$$

Multiplying the latter equation with $g_{t-1}^\top P$ from the left and using the fact that the vectors $P g_{t-1}$ and $g_t$ are mutually orthogonal we obtain that

$$\pi_t = -\frac{g_{t-1}^\top P\, r_{t-1}}{g_{t-1}^\top P\, g_{t-1}} .$$

Hence, the conjugate gradient descent can be performed by setting

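$$\begin{aligned}
z_t &= z_{t-1} + \tau_t\, g_t
= z_{t-1} + \frac{g_t^\top r_0}{g_t^\top P g_t} \left( r_{t-1} + \pi_t\, g_{t-1} \right) \\
&= z_{t-1} + \frac{g_t^\top r_{t-1}}{g_t^\top P g_t} \left( r_{t-1} - \frac{g_{t-1}^\top P\, r_{t-1}}{g_{t-1}^\top P\, g_{t-1}}\, g_{t-1} \right) .
\end{aligned}$$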

The conjugate gradient descent iteration in this form requires three matrix-vector multiplications. This is computationally inefficient and it can be improved by observing that

$$r_{t-1} = b - P z_{t-1} = b - P \left( z_{t-2} + \tau_{t-1}\, g_{t-1} \right) = r_{t-2} - \tau_{t-1}\, P g_{t-1} .$$

From here it then follows that

$$\| r_{t-1} \|^2 = r_{t-1}^\top r_{t-1} = r_{t-1}^\top r_{t-2} - \tau_{t-1}\, r_{t-1}^\top P g_{t-1} .$$

Noting that $r_{t-1}^\top r_{t-2} = 0$ (e.g., see Theorem 10.2.3 in Golub and van Loan, 1996) we get

$$\| r_{t-1} \|^2 = -\tau_{t-1}\, r_{t-1}^\top P g_{t-1} .$$

On the other hand, from the definition of $\tau_{t-1}$ it follows that

$$g_{t-1}^\top r_{t-2} = g_{t-1}^\top r_0 = \tau_{t-1}\, g_{t-1}^\top P g_{t-1} .$$

The latter expression implies that we can express $\pi_t$ as

$$\pi_t = \frac{\| r_{t-1} \|^2}{g_{t-1}^\top r_{t-2}} .$$

Hence, we can now give a conjugate gradient descent iteration that requires only one matrix-vector multiplication,

$$z_t = z_{t-1} + \frac{g_t^\top r_{t-1}}{g_t^\top P g_t} \left( r_{t-1} + \frac{\| r_{t-1} \|^2}{g_{t-1}^\top r_{t-2}}\, g_{t-1} \right) .$$
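The iteration derived above can be written as a short NumPy routine. This is a sketch rather than a reference implementation: the function name `conjugate_gradient`, the absolute residual tolerance and the iteration cap are assumptions of this example. The residual is updated through $r_t = r_{t-1} - \tau_t P g_t$, so the product $P g_t$ is the only matrix-vector multiplication inside the loop.

```python
import numpy as np

def conjugate_gradient(P, b, z0=None, tol=1e-10, max_iter=None):
    """Solve P z = b for symmetric positive definite P; returns (z, steps)."""
    n = b.shape[0]
    z = np.zeros(n) if z0 is None else np.asarray(z0, dtype=float).copy()
    max_iter = 10 * n if max_iter is None else max_iter
    r = b - P @ z                  # r_0, computed once before the loop
    g = r.copy()                   # g_1 = r_0
    g_prev_r_prev = None           # will hold g_{t-1}^T r_{t-2}
    steps = 0
    for t in range(1, max_iter + 1):
        rr = r @ r                 # ||r_{t-1}||^2
        if np.sqrt(rr) <= tol:
            break
        if t > 1:
            pi_t = rr / g_prev_r_prev
            g = r + pi_t * g       # g_t = r_{t-1} + pi_t g_{t-1}
        Pg = P @ g                 # the single matrix-vector product of step t
        g_r = g @ r                # g_t^T r_{t-1}
        tau = g_r / (g @ Pg)       # tau_t
        z = z + tau * g            # z_t = z_{t-1} + tau_t g_t
        r = r - tau * Pg           # r_t = r_{t-1} - tau_t P g_t
        g_prev_r_prev = g_r
        steps = t
    return z, steps
```

Only the vectors `z`, `r`, `g` and `Pg` are stored, so the memory footprint beyond the matrix itself is linear in $n$.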

Having given an iterative solution that requires a single matrix-vector multiplication and, thus, has a quadratic runtime cost per iteration, we now review the theoretical properties of the method. First, we present a worst-case bound on the approximation error of the approach expressed in terms of the number of iterations and the condition number of the matrix defining the linear system in Eq. (2.24).

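Theorem 2.2. (Luenberger, 1973) Assume $P \in \mathbb{R}^{n \times n}$ is a symmetric and positive definite matrix and $b \in \mathbb{R}^n$. If the conjugate gradient descent method produces iterates $\{z_i\}$ and $\kappa = \kappa(P)$ then

$$\| z^* - z_t \|_P \le 2 \left( \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1} \right)^{t} \| z^* - z_0 \|_P ,$$

where $z^* = P^{-1} b$ and $\| z \|_P^2 = z^\top P z$.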

Corollary 2.3. The approximation error of the conjugate gradient descent method satisfies

$$\| z_t - P^{-1} b \| \le 2 \sqrt{\kappa} \left( \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1} \right)^{t} \| z_0 - P^{-1} b \| .$$

Proof. This corollary is formulated as a self-study problem in Golub and van Loan (Problem 10.2.8, 1996). In order to show this claim, let us first observe that

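$$\| z_t - P^{-1} b \|_P^2 = \left( z_t - P^{-1} b \right)^\top P \left( z_t - P^{-1} b \right) = \| P^{1/2} \left( z_t - P^{-1} b \right) \|^2 .$$

For the resulting expression, using the properties of the operator norm, we obtain

$$\sqrt{\sigma_n - \mu_{\min}}\; \| z_t - P^{-1} b \| \le \| P^{1/2} \left( z_t - P^{-1} b \right) \| \le \sqrt{\sigma_1 - \mu_{\min}}\; \| z_t - P^{-1} b \| .$$

Hence, from Theorem 2.2 and the latter inequality it follows that

$$\sqrt{\sigma_n - \mu_{\min}}\; \| z_t - P^{-1} b \| \le \| z^* - z_t \|_P \le 2 \left( \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1} \right)^{t} \| z^* - z_0 \|_P \le 2 \sqrt{\sigma_1 - \mu_{\min}} \left( \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1} \right)^{t} \| z_0 - P^{-1} b \| .$$

Dividing both sides by $\sqrt{\sigma_n - \mu_{\min}}$ and using $\kappa = (\sigma_1 - \mu_{\min}) / (\sigma_n - \mu_{\min})$ gives the claim.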
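The two bounds can be checked numerically on a small example. The following sketch is only a sanity check under stated assumptions: the helper names `cg_iterates` and `bounds_hold` are hypothetical, the iterates come from a textbook conjugate gradient recursion started at $z_0 = 0$, and the condition number is obtained from a dense eigenvalue computation, which would not be done in the large-scale setting.

```python
import numpy as np

def cg_iterates(P, b, steps):
    """Return [z_0, ..., z_steps] of textbook CG started from z_0 = 0."""
    z = np.zeros(b.shape[0])
    r = b.astype(float).copy()
    g = r.copy()
    iterates = [z.copy()]
    for _ in range(steps):
        rr = r @ r
        if rr > 0.0:                       # skip updates once the residual is zero
            Pg = P @ g
            tau = rr / (g @ Pg)
            z = z + tau * g
            r = r - tau * Pg
            g = r + ((r @ r) / rr) * g
        iterates.append(z.copy())
    return iterates

def bounds_hold(P, b, steps, slack=1e-9):
    """Check Theorem 2.2 and Corollary 2.3 along the CG iterates."""
    sig = np.linalg.eigvalsh(P)            # ascending eigenvalues, used only for kappa
    kappa = sig[-1] / sig[0]
    rate = (np.sqrt(kappa) - 1.0) / (np.sqrt(kappa) + 1.0)
    z_star = np.linalg.solve(P, b)
    p_norm = lambda v: np.sqrt(max(v @ P @ v, 0.0))
    zs = cg_iterates(P, b, steps)
    e0_P = p_norm(z_star - zs[0])
    e0_2 = np.linalg.norm(z_star - zs[0])
    ok = True
    for t, z in enumerate(zs):
        # Theorem 2.2: error in the P-norm.
        ok = ok and p_norm(z_star - z) <= 2.0 * rate**t * e0_P + slack
        # Corollary 2.3: error in the Euclidean norm.
        ok = ok and np.linalg.norm(z_star - z) <= 2.0 * np.sqrt(kappa) * rate**t * e0_2 + slack
    return bool(ok)
```

The small `slack` term only absorbs floating-point rounding and is not part of either statement.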

From these two bounds, we conclude that the conjugate gradient descent method converges fast, i.e., in a small number of iterations, for well-conditioned matrices. Thus, for knowledge-based kernel principal component analysis with a well-conditioned matrix $P$ the approach can provide an efficient approximation of the optimal solution for the optimization of a quadratic form over a hypersphere of constant radius (described in Section 2.4). Besides these two results, Golub and van Loan (1996) give an upper bound on the number of required iterations for matrices that can be written as a sum of the identity and a low-rank matrix. The following theorem states that result more formally.

Theorem 2.4. (Golub and van Loan, 1996) Assume that $P = I + \tilde{P} \in \mathbb{R}^{n \times n}$ is a symmetric and positive definite matrix and $\operatorname{rank} \tilde{P} = r$. Then, the conjugate gradient descent method converges in at most $r + 1$ steps.
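The statement of Theorem 2.4 can be illustrated on a synthetic matrix of the form identity plus low rank. The sketch below is an assumption-laden example: the name `cg_with_step_count`, the relative residual tolerance and the random test matrix are chosen here for illustration, and in floating-point arithmetic the step count matches the theorem only up to rounding.

```python
import numpy as np

def cg_with_step_count(P, b, tol=1e-8, max_iter=None):
    """Textbook CG from z_0 = 0; stops when ||r|| <= tol * ||b||."""
    n = b.shape[0]
    max_iter = n if max_iter is None else max_iter
    z = np.zeros(n)
    r = b.astype(float).copy()
    g = r.copy()
    rr = r @ r
    threshold = (tol * np.linalg.norm(b)) ** 2
    steps = 0
    while steps < max_iter and rr > threshold:
        Pg = P @ g                      # one matrix-vector product per step
        tau = rr / (g @ Pg)
        z += tau * g
        r -= tau * Pg
        rr_new = r @ r
        g = r + (rr_new / rr) * g
        rr = rr_new
        steps += 1
    return z, steps

rng = np.random.default_rng(0)
n, rank = 200, 5
U = rng.standard_normal((n, rank))
P = np.eye(n) + U @ U.T                 # identity plus a rank-5 matrix
b = rng.standard_normal(n)
z, steps = cg_with_step_count(P, b)
print(steps)                            # Theorem 2.4 predicts at most rank + 1 steps
```

The matrix has at most `rank + 1` distinct eigenvalues, which is why so few steps suffice even though $n = 200$.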

Thus, for low-rank kernel matrices the conjugate gradient descent method can provide an effective approximation of the optimal solution defining the knowledge-based kernel principal components. Having reviewed this approach and theoretical results giving insights into its effectiveness, we proceed to the next section where we derive knowledge-based kernel principal components using an approximate low-rank factorization of a kernel matrix.

In document Identificación y estudio de las interacciones virus-célula del vesivirus de conejo (RaV) (pages 112-178)