
[Figure: design space exploration tree. SLAM on FPGA: Volumetric, Graph-Based SLAM. Combination of sparse and non-sparse. Linear solver: Cholesky decomposition, LU triangular decomposition, conjugate gradient.]

The choice of a (partially) sparse matrix strongly affects how a linear system containing that matrix is solved. For every method, the size of the matrix influences the time needed to solve the system. Several algorithms can solve systems that contain a sparse matrix. This section analyses reasonable techniques to find an appropriate method for solving the linear system.

9.3.2.1 Cholesky decomposition

Cholesky decomposition is a matrix-decomposition method that can potentially be used to solve a linear problem with a sparse matrix. Decomposition finds the solution of a linear system without inverting the matrix or applying row reduction. Cholesky decomposition is only possible when the matrix is symmetric and positive definite, which means that $\vec{z}^T M \vec{z}$ is positive for every non-zero vector $\vec{z}$. The $H$-matrix has this property and is therefore suitable for Cholesky decomposition. The first step of solving a system with a decomposition method is finding two matrices whose product is the decomposed matrix. With the resulting matrices, new equations can be formed that are easy to solve because the matrix in each of them is triangular. In Cholesky decomposition the matrix is decomposed as:

$$A = L L^T$$

In this equation $L$ is a lower triangular matrix, which means that all items above the diagonal are zero. Its entries can be found by working from left to right and top to bottom, using square roots, subtractions, multiplications and divisions. The standard equation $A\vec{x} = \vec{b}$ can be rewritten as:

$$L L^T \vec{x} = \vec{b}$$

A temporary vector $\vec{z}$ can be introduced to replace $L^T \vec{x}$. It can be found by forward substitution:

$$L \vec{z} = \vec{b}$$

Forward substitution means that the equations are solved from top to bottom, which is a simple operation because $L$ is a lower triangular matrix: once the rows above it have been solved, every row contains only one unknown. With $\vec{z}$ known, $\vec{x}$ can be solved by backward substitution, which is substitution from the bottom to the top:

$$L^T \vec{x} = \vec{z}$$
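The three steps above (decompose, forward substitution, backward substitution) can be sketched in plain Python. This is a minimal dense illustration, not the hardware design: the function names and the small 3x3 example system are assumptions made for this sketch, and the matrix is assumed to be symmetric positive definite.

```python
import math

def cholesky(a):
    """Return lower-triangular L with a = L L^T (a: symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for j in range(n):                          # columns, left to right
        s = a[j][j] - sum(low[j][k] ** 2 for k in range(j))
        if s <= 0.0:
            raise ValueError("matrix is not positive definite")
        low[j][j] = math.sqrt(s)                # diagonal entry: a square root
        for i in range(j + 1, n):               # rows below the diagonal, top to bottom
            acc = sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = (a[i][j] - acc) / low[j][j]
    return low

def forward_substitution(low, b):
    """Solve L z = b from top to bottom."""
    z = []
    for i, row in enumerate(low):
        z.append((b[i] - sum(row[k] * z[k] for k in range(i))) / row[i])
    return z

def backward_substitution_transposed(low, z):
    """Solve L^T x = z from bottom to top; entry (i, k) of L^T is low[k][i]."""
    n = len(low)
    x = [0.0] * n
    for i in reversed(range(n)):
        acc = sum(low[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (z[i] - acc) / low[i][i]
    return x

def cholesky_solve(a, b):
    low = cholesky(a)
    return backward_substitution_transposed(low, forward_substitution(low, b))

# Hypothetical example system (not taken from the SLAM problem).
a = [[4.0, 2.0, 0.0],
     [2.0, 5.0, 1.0],
     [0.0, 1.0, 3.0]]
b = [2.0, 4.0, 5.0]
x = cholesky_solve(a, b)
```

Multiplying `a` by the returned `x` should reproduce `b` up to rounding, which is a quick way to check the substitution order.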

Cholesky decomposition is much faster than normal row reduction, because only the non-zero items are taken into account. However, the resulting triangular matrix will end up having one or more non-sparse vectors. The matrix representation therefore needs to be changed to support the full version of Cholesky decomposition, which makes implementation in hardware a challenge.
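The loss of sparsity can be made visible with a small experiment. The sketch below assumes a hypothetical "arrow" matrix (dense first row and column, otherwise diagonal), chosen only because it shows the effect clearly. It counts the non-zero items per row in the lower triangle of the matrix and in its Cholesky factor.

```python
import math

def cholesky(a):
    """Dense Cholesky factor of a symmetric positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for j in range(n):
        low[j][j] = math.sqrt(a[j][j] - sum(low[j][k] ** 2 for k in range(j)))
        for i in range(j + 1, n):
            acc = sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = (a[i][j] - acc) / low[j][j]
    return low

def lower_row_nonzeros(m, tol=1e-12):
    """Number of non-zero items on or below the diagonal, per row."""
    return [sum(1 for j in range(i + 1) if abs(m[i][j]) > tol)
            for i in range(len(m))]

# Hypothetical arrow matrix: diagonally dominant, hence positive definite.
n = 6
arrow = [[0.0] * n for _ in range(n)]
for i in range(n):
    arrow[i][i] = float(n)
for i in range(1, n):
    arrow[i][0] = arrow[0][i] = 1.0

a_counts = lower_row_nonzeros(arrow)            # at most 2 items per row
l_counts = lower_row_nonzeros(cholesky(arrow))  # rows grow up to n items
```

In this example every row of the lower triangle of the input holds at most two items, while the last row of the factor is completely filled, so a row format with a fixed number of slots could not store it.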

9.3.2.2 LU decomposition

Another decomposition method is LU decomposition, in which the matrix is decomposed into a lower and an upper triangular matrix. In Cholesky decomposition the upper and lower matrices are each other's transpose. In LU decomposition the lower and upper triangular matrices are distinct, so the decomposition is different. The upper and lower triangular matrices, also called the factors, are harder to find because linear equations still need to be solved. However, these linear equations are simple because the matrix that is decomposed is sparse and contains few items.

Once the decomposition is found, the steps to find the values of the vector $\vec{x}$ are the same as for Cholesky decomposition. First the matrix is decomposed using linear equations that have a fixed maximum number of unknown variables:

$$A = LU$$

The matrix in the system can then be replaced by the product of the lower and upper triangular matrices:

$$LU\vec{x} = \vec{b}$$

The product $U\vec{x}$ can be substituted with a temporary vector $\vec{z}$, which can be found by forward substitution. The vector $\vec{x}$ can then be found from $\vec{z}$ by backward substitution:

$$L\vec{z} = \vec{b}$$

$$U\vec{x} = \vec{z}$$
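A minimal sketch of this procedure is given below. It assumes the Doolittle variant, in which $L$ has a unit diagonal, and it does no pivoting, so it only works when no zero pivot appears. The function names and the example system are hypothetical.

```python
def lu_decompose(a):
    """Doolittle LU without pivoting: a = L U, with ones on the diagonal of L."""
    n = len(a)
    low = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    up = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):                   # row i of U
            up[i][j] = a[i][j] - sum(low[i][k] * up[k][j] for k in range(i))
        if up[i][i] == 0.0:
            raise ZeroDivisionError("zero pivot; this sketch does not pivot")
        for r in range(i + 1, n):               # column i of L
            acc = sum(low[r][k] * up[k][i] for k in range(i))
            low[r][i] = (a[r][i] - acc) / up[i][i]
    return low, up

def lu_solve(a, b):
    low, up = lu_decompose(a)
    n = len(b)
    z = [0.0] * n
    for i in range(n):                          # forward substitution: L z = b
        z[i] = b[i] - sum(low[i][k] * z[k] for k in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):                # backward substitution: U x = z
        acc = sum(up[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (z[i] - acc) / up[i][i]
    return x

# Hypothetical example; the right-hand side is chosen so that x = (1, 1, 1).
a = [[4.0, 3.0, 0.0],
     [6.0, 3.0, 1.0],
     [0.0, 2.0, 5.0]]
b = [7.0, 10.0, 7.0]
x = lu_solve(a, b)
```

Unlike the Cholesky case, the example matrix does not need to be symmetric, which is why two different factors have to be stored.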

Although LU decomposition gives different resulting matrices than Cholesky decomposition, these matrices also contain vectors with more non-zero items than the static-size sparse matrices support.

9.3.2.3 Conjugate gradient algorithm

Instead of using decomposition, it is also possible to solve sparse systems with iterative algorithms. The property required from the solver is that, unlike the decomposition methods, it involves no matrices that are dense or have dense vectors. The conjugate gradient algorithm determines an error from the matrix, the given vector and the wanted vector. This error is a vector and is used to adjust the wanted vector along a direction derived from the error vector.

The conjugate gradient algorithm is an iterative algorithm that approaches the wanted vector by decreasing the total error. The algorithm is shown in Algorithm 3. An initial $\vec{x}_0$ is chosen to calculate an error that needs to converge. This error is called the residual vector $\vec{r}$. The initial search direction is set to the initial residual vector, which is defined by $\vec{p}_0 = \vec{r}_0 = \vec{b} - A\vec{x}_0$. A scalar $\alpha$ is calculated as the fraction of the squared sum of the errors, $\vec{r}_k^T \vec{r}_k$, over the $A$ matrix scaled with the search direction vector, $\vec{p}_k^T A \vec{p}_k$. The result of $A\vec{p}_k$ is used twice, in steps 6 and 8, and is the only multiplication of a sparse matrix with a non-sparse vector inside the loop. The rest of the vector calculations are done with non-sparse vectors because the resulting vectors are not sparse anyway. The state vector is altered by the scalar $\alpha$ times the direction vector. The scalar $\beta$ is calculated as the fraction of the new and the current squared errors and is used to alter the search direction. Finally, the search direction is altered in step 10.

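Algorithm 3 Conjugate gradient algorithm

1: initialize:

2: $\vec{r}_0 = \vec{b} - A\vec{x}_0$

3: $\vec{p}_0 = \vec{r}_0$

4: $k = 0$

5: repeat until convergence of $\vec{r}$:

6: $\alpha_k = \dfrac{\vec{r}_k^T \vec{r}_k}{\vec{p}_k^T A \vec{p}_k}$

7: $\vec{x}_{k+1} = \vec{x}_k + \alpha_k \vec{p}_k$

8: $\vec{r}_{k+1} = \vec{r}_k - \alpha_k A \vec{p}_k$

9: $\beta_k = \dfrac{\vec{r}_{k+1}^T \vec{r}_{k+1}}{\vec{r}_k^T \vec{r}_k}$

10: $\vec{p}_{k+1} = \vec{r}_{k+1} - \beta_k \vec{p}_k$

11: $k = k + 1$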
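The loop of Algorithm 3 can be sketched in software as follows. This is an illustration under stated assumptions, not the hardware implementation: the sparse matrix is stored as a list of rows holding `(column, value)` pairs, the function names, tolerance and iteration limit are hypothetical, and the example matrix is a small symmetric positive definite tridiagonal matrix. The direction update follows the textbook conjugate gradient form $\vec{p}_{k+1} = \vec{r}_{k+1} + \beta_k \vec{p}_k$, with a plus sign, which differs from the sign printed in step 10.

```python
def sparse_matvec(rows, v):
    """Sparse matrix times non-sparse vector; each row is a list of (column, value)."""
    return [sum(val * v[col] for col, val in row) for row in rows]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def conjugate_gradient(rows, b, x0=None, tol=1e-10, max_iter=None):
    n = len(b)
    x = list(x0) if x0 is not None else [0.0] * n
    limit = max_iter if max_iter is not None else 10 * n
    ax = sparse_matvec(rows, x)
    r = [bi - axi for bi, axi in zip(b, ax)]               # step 2
    p = list(r)                                            # step 3
    rr = dot(r, r)
    for _ in range(limit):                                 # step 5
        if rr ** 0.5 < tol:                                # residual norm small enough
            break
        ap = sparse_matvec(rows, p)                        # computed once, used twice
        alpha = rr / dot(p, ap)                            # step 6
        x = [xi + alpha * pi for xi, pi in zip(x, p)]      # step 7
        r = [ri - alpha * api for ri, api in zip(r, ap)]   # step 8
        rr_new = dot(r, r)
        beta = rr_new / rr                                 # step 9
        p = [ri + beta * pi for ri, pi in zip(r, p)]       # step 10, textbook sign
        rr = rr_new
    return x

# Hypothetical 4x4 system: 4 on the diagonal, -1 next to it.
rows = [
    [(0, 4.0), (1, -1.0)],
    [(0, -1.0), (1, 4.0), (2, -1.0)],
    [(1, -1.0), (2, 4.0), (3, -1.0)],
    [(2, -1.0), (3, 4.0)],
]
expected = [1.0, 2.0, 3.0, 4.0]
b = sparse_matvec(rows, expected)
x = conjugate_gradient(rows, b)
```

Note that the matrix `rows` is only read, never modified, and that `ap` is the single sparse product per iteration. All other operations act on non-sparse vectors of length $n$.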

The computational load of the algorithm can be analysed by looking at the number of operations needed for each iteration. The operation $A\vec{x}$ in step 2 is a matrix-vector multiplication, which is a heavy computation for large matrices. The complexity of this operation is lower because $A$ is a sparse matrix. However, it still requires $n$ dot products of a sparse vector with a non-sparse vector, where $n$ is the system size. The division in step 6 operates on two numbers instead of on a vector or matrix, which is a relatively light operation compared to vector operations. Table 9.1 shows the resources used to execute the conjugate gradient algorithm in terms of operations. The number of cycles determines the total number of operations that the algorithm needs to perform. On an FPGA the number of resources is limited. The resources used to perform operations like additions and multiplications are dedicated digital signal processors (DSPs) and look-up tables (LUTs), and performing large vector operations with operators created in LUTs will make the FPGA run out of resources very quickly. A possible solution to this problem will be discussed in the next level of the design space exploration.

| | division | scaleNSV | dotNSV | addNSV | subNSV | SMNSV |
|---|---|---|---|---|---|---|
| step 2 | | | | | 1 | 1 |
| step 6 | 1 | | 2 (1 shared, id = 0) | | | 1 (1 shared, id = 1) |
| step 7 | | 1 | | 1 | | |
| step 8 | | 1 | | | 1 | 1 (1 shared, id = 1) |
| step 9 | 1 | | 2 (1 shared, id = 0) | | | |
| step 10 | | 1 | | | 1 | |
| total 1 iteration | 2 | 3 | 3 | 1 | 3 | 2 |
| total i iterations | 2*i | 3*i | 3*i | 2*i | 1+2*i | 1+i |

Table 9.1: A resource overview for the parallel conjugate gradient algorithm

| | scaleNSV | dotNSV | addNSV | subNSV | SMNSV |
|---|---|---|---|---|---|
| adders | | n-1 | n | | $\sum_{i=0}^{n} (m_i - 1)$ |
| subtracters | | | | n | |
| multipliers | n | n | | | $\sum_{i=0}^{n} m_i$ |

Table 9.2: Low-level cost of the high-level functions

Table 9.2 shows the hardware cost of the vector operations in terms of low-level components, where $n$ is the system size. Only the sparse-matrix by non-sparse-vector multiplication has the variable $m$, which is the number of active items in each vector. When $m_i = m_{i+1}$ for each value of $i$, which means the number of non-zero values in each vector is the same, the total number of adders becomes $n(m-1)$ and the number of multipliers becomes $nm$ for the complete matrix-vector multiplication. In practice this is the case because the number of active elements in the sparse vectors has been fixed at 5 items. The items are not checked before they are used for calculations, so they are all fed into the multipliers and adders.
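As a small worked example of these counts, the sketch below sums the adder and multiplier cost of one sparse-matrix by non-sparse-vector multiplication over the rows of the matrix. The function name and the system size $n = 100$ are hypothetical; the fixed row size $m = 5$ is the value given in the text.

```python
def smnsv_cost(active_items_per_row):
    """Adders and multipliers for one sparse-matrix by non-sparse-vector product.

    A row with m active items needs m multipliers and m - 1 adders
    to form its dot product with the non-sparse vector.
    """
    adders = sum(m - 1 for m in active_items_per_row)
    multipliers = sum(active_items_per_row)
    return adders, multipliers

n = 100   # hypothetical system size
m = 5     # fixed number of active items per sparse vector
adders, multipliers = smnsv_cost([m] * n)   # equals n * (m - 1) and n * m
```

With these values one multiplication needs 400 adders and 500 multipliers, which illustrates why fully parallel vector operators exhaust the FPGA resources quickly as $n$ grows.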

9.3.2.4 Conclusions on the linear solver

The conjugate gradient algorithm will be implemented to solve the linear systems in hardware for graph-based SLAM. It does not alter the sparse matrix, so no additional memory is required to store multiple matrices, as is the case for the proposed decomposition methods. Because the algorithm does not change the sparse matrix by making it larger or creating other matrices, and consists only of vector operations, it seems the most promising method for solving linear systems with sparse matrices on hardware with static memory.

9.3.3 Joining sparse and non sparse vectors into single vector operations
