R.A Las Villuercas - DEVELOPED PROPOSALS



While three QR decomposition algorithms are used in software-based solutions, according to the open literature only two of them have been implemented in FPGA- or ASIC-based MIMO communication systems. The first prominent QR decomposition algorithm is based on Givens rotations. It is shown in Alg. 2 and used by [LBH+07, Lü10, PSG09]. The second QR decomposition algorithm used as a preprocessing circuit in MIMO communication systems is the modified Gram-Schmidt (MGS) algorithm, shown in Alg. 3 and used by [SPB07, SBST08, CLL+10].

For both the Givens-rotation-based and the MGS-based QR decomposition algorithms, two fundamentally different architectures have been presented. A first class of implementations performs extensive time-sharing of all computational components, resulting in processor-like

MIMO-OFDM systems deploying low-latency pipelined FFTs, which deliver the OFDM tones serially, such as the PHY-layer ASIC described in Section 2.2.

Selection of a QR Decomposition Algorithm for Soft-Output MMSE Detection

Although only two algorithms are used for VLSI implementations of the QR decomposition in the literature, we evaluate all three QR decomposition algorithms for the implementation of the preprocessing circuit of a MIMO detector in an IEEE 802.11n-compliant receiver. To use the QR decomposition of the augmented channel matrix H as the preprocessing circuit of a soft-output MMSE detector, the matrices Q(a) and R must be computed for the MMSE estimation. Additionally, the matrix Q(b) should be computed, since it allows the per-stream post-equalization SINR to be computed easily. The matrices Q(c) and Q(d), on the other hand, are not required by the linear MMSE detector, and their computation should be avoided whenever possible. Hence, a QR decomposition algorithm that calculates only the required matrix components is preferred.
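To see why Q(a), Q(b), and R suffice, consider the following short derivation. It assumes unit-energy transmit symbols, noise variance σ², and writes \bar{H} for the augmented channel matrix, stacked as [H; σI] and partitioned as shown; this notation is our assumption and may differ in detail from the one used elsewhere in the text. The MMSE estimate then requires only Q(a) and R (solved by back-substitution), while the error covariance Φ, whose diagonal yields the per-stream post-equalization SINR η_k of the unbiased MMSE detector, depends only on Q(b).

```latex
\begin{aligned}
\bar{H} &= \begin{bmatrix} H \\ \sigma I \end{bmatrix}
         = \begin{bmatrix} Q^{(a)} \\ Q^{(b)} \end{bmatrix} R
  \;\Rightarrow\; H = Q^{(a)} R,\quad Q^{(b)} = \sigma R^{-1},
  \quad H^H H + \sigma^2 I = \bar{H}^H \bar{H} = R^H R, \\
\hat{x} &= \left(H^H H + \sigma^2 I\right)^{-1} H^H y
         = \left(R^H R\right)^{-1} R^H Q^{(a)H} y
         = R^{-1} Q^{(a)H} y, \\
\Phi &= \sigma^2 \left(H^H H + \sigma^2 I\right)^{-1}
      = \sigma^2 R^{-1} R^{-H}
      = Q^{(b)} Q^{(b)H}, \qquad
\eta_k = \frac{1}{[\Phi]_{k,k}} - 1 .
\end{aligned}
```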

Of the three main strategies for computing the QR decomposition, the Householder-reflection-based QR decomposition appears least suitable for a VLSI implementation computing the QR decomposition of the augmented channel matrix H, due to the large number of floating-point operations (flops) it requires for non-square matrices [GV96]. The other two algorithms, based on Givens rotations and on MGS orthogonalization steps, respectively, are better suited to non-square matrices such as the augmented channel matrix H. Therefore, these two algorithms are analyzed with respect to their ability to compute exclusively the required sub-matrices.

The Givens-rotation-based QR decomposition algorithm, given in Alg. 2, applies a sequence of plane rotations to the matrix H until the upper-triangular structure of R is reached. The same rotations are also applied to an identity matrix to compute the matrix Q^H. Because its atomic operations are unitary transformations, which preserve vector norms, the Givens-rotation-based QR decomposition algorithm has a limited dynamic range. This enables the use of fixed-point values with strictly limited word lengths for computing the QR decomposition in VLSI implementations. Furthermore, Givens rotations can be implemented efficiently in VLSI using the CORDIC algorithm, which computes Givens rotations iteratively using only adders and shifters. Unfortunately, no economy-sized version of the Givens-rotation-based QR decomposition of an augmented channel matrix is available that delivers only the required sub-matrices Q(a) and Q(b) of the full matrix Q. An economy-sized Givens-rotation-based QR decomposition computes only the matrices R, Q(a), and Q(c). While the inherently computed matrix Q(c) is of no use for soft-output MMSE detection, the matrix Q(b), which is required for the efficient computation of the per-stream post-equalization SINR, is not output by the economy-sized Givens-rotation-based QR decomposition. Therefore, a full QR decomposition computing the entire matrix Q would be required if the Givens-rotation-based algorithm of Alg. 2 were used as the preprocessing circuit of the soft-output MMSE MIMO detector.

Algorithm 4: VLSI-Implementation-Friendly MGS-Based QR Decomposition for Back-Substitution-Based Soft-Output MMSE Detection

1: Q ← [H; σI], R ← 0, η ← 0, N ← 0
2: for i = 1 to N_ss do                          ▷ MGS QR decomposition
3:     s² = q̄_i^H q̄_i
4:     r_{i,i}^{-1} = 1/√(s²)
5:     q̄_i = q̄_i r_{i,i}^{-1}
6:     for k = i + 1 to N_ss do
7:         r_{i,k} = q̄_i^H q̄_k
8:     end for
9:     for k = i + 1 to N_ss do
10:        q̄_k = q̄_k − r_{i,k} q̄_i
11:    end for
12: end for
13: η = 1 ⊘ diag(Q(b) Q(b)^H) − 1              ▷ SINR computation
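The rotation sequence described for Alg. 2 can be sketched in plain Python as a floating-point model. It does not model the fixed-point CORDIC datapath, and the function name `givens_qr` as well as the bottom-up rotation order within each column are our own choices, not necessarily those of Alg. 2. Because the rotations are accumulated into an m×m identity matrix, the sketch also shows that the full matrix Q^H is obtained, not only selected sub-matrices.

```python
import math


def givens_qr(H):
    """Full QR decomposition of a complex m x n matrix via Givens rotations.

    Floating-point sketch (no CORDIC). Returns (QH, R) with QH @ H == R,
    where QH is m x m unitary and R is m x n upper triangular.
    Matrices are plain lists of rows.
    """
    m, n = len(H), len(H[0])
    R = [[complex(v) for v in row] for row in H]
    # The same rotations are applied to an identity matrix to accumulate Q^H.
    QH = [[complex(i == j) for j in range(m)] for i in range(m)]
    for j in range(n):
        # Annihilate the entries below the diagonal of column j, bottom-up.
        for i in range(m - 1, j, -1):
            a, b = R[i - 1][j], R[i][j]
            if b == 0:
                continue  # already zero, no rotation needed
            r = math.hypot(abs(a), abs(b))
            c, s = a / r, b / r
            # Unitary 2x2 rotation [[conj(c), conj(s)], [-s, c]] maps (a, b) to (r, 0).
            for M in (R, QH):
                for col in range(len(M[0])):
                    x, y = M[i - 1][col], M[i][col]
                    M[i - 1][col] = c.conjugate() * x + s.conjugate() * y
                    M[i][col] = -s * x + c * y
    return QH, R
```

For an augmented 3×2 channel, for example `givens_qr(H + [[sigma, 0], [0, sigma]])` in this list-of-rows layout, the sketch returns a 5×5 matrix QH, illustrating why a full Q must be computed when Q(b) is needed.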

The other algorithm commonly used for computing the QR decomposition of a non-square matrix in VLSI is the QR decomposition based on Gram-Schmidt orthogonalization steps, given in Alg. 3. While the classical Gram-Schmidt QR decomposition suffers from a loss of orthogonality in finite-precision arithmetic, the MGS QR decomposition is numerically considerably more stable. Alg. 4 shows an extension of the MGS-based QR decomposition that also computes the per-stream post-equalization SINR; further modifications to Alg. 3 enable an efficient, division-free VLSI implementation of the soft-output MMSE detector. The MGS algorithm given in Alg. 4 performs one Gram-Schmidt orthogonalization step in each iteration of the main loop from line 2 to line 12. Thereby, the augmented matrix H is orthogonalized column by column, resulting in the stacked matrix [Q(a); Q(b)]. Due to the moderate to large dynamic range of the column norms and the need for their inverse square root, the MGS algorithm is known to require larger internal bit-widths than Givens-rotation-based algorithms. However, in contrast to the Givens-rotation-based algorithms, the economy-sized implementation given in Alg. 4 computes only the required matrices R, Q(a), and Q(b). Hence, the MGS algorithm is well suited for our specific
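The loop structure of Alg. 4 can be modeled in floating-point Python as below; it is not a bit-true description of the circuit. The function name `mgs_alg4`, the list-of-rows layout, and the reading of line 4 as storing r_{i,i}^{-1} in R (so that later stages multiply instead of divide) are our assumptions. The SINR step uses the squared row norms of Q(b), based on the relation σ²(H^H H + σ²I)^{-1} = Q(b) Q(b)^H, which holds for the augmented matrix [H; σI] = [Q(a); Q(b)] R with unit-energy symbols.

```python
import math


def mgs_alg4(H, sigma):
    """Floating-point sketch of Alg. 4: MGS QR of the augmented matrix [H; sigma*I].

    H is a list of Nr rows, each holding Nss complex entries.
    Returns (Qa, Qb, R, eta). By assumption, R stores 1/r_ii on its diagonal
    (line 4) and r_ik above it.
    """
    nr, nss = len(H), len(H[0])
    # Line 1: column i of the augmented matrix is H[:, i] stacked on sigma * e_i.
    q = [[complex(H[r][i]) for r in range(nr)]
         + [complex(sigma if r == i else 0) for r in range(nss)]
         for i in range(nss)]
    R = [[0j] * nss for _ in range(nss)]
    for i in range(nss):                                  # lines 2-12
        s2 = sum(abs(v) ** 2 for v in q[i])               # line 3: squared norm
        inv_r = 1.0 / math.sqrt(s2)                       # line 4: inverse square root
        R[i][i] = inv_r
        q[i] = [v * inv_r for v in q[i]]                  # line 5: normalize by multiplication
        for k in range(i + 1, nss):                       # lines 6-8: projections
            R[i][k] = sum(a.conjugate() * b for a, b in zip(q[i], q[k]))
        for k in range(i + 1, nss):                       # lines 9-11: orthogonalize
            q[k] = [b - R[i][k] * a for a, b in zip(q[i], q[k])]
    # Split the stacked columns into Q(a) (Nr x Nss) and Q(b) (Nss x Nss).
    Qa = [[q[i][r] for i in range(nss)] for r in range(nr)]
    Qb = [[q[i][nr + r] for i in range(nss)] for r in range(nss)]
    # Line 13: eta_k = 1 / [Q(b) Q(b)^H]_kk - 1, i.e. 1 / (squared row norm) - 1.
    eta = [1.0 / sum(abs(v) ** 2 for v in Qb[k]) - 1.0 for k in range(nss)]
    return Qa, Qb, R, eta
```

As a sanity check, for H = I and σ = 0.5 every stream's SINR evaluates to 1/σ² = 4. Unlike the Givens sketch, only the N_ss columns of [Q(a); Q(b)] are ever formed, which mirrors the economy-size property discussed above.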

In document Actas de las II Jornadas sobre Bibliotecas Escolares de Extremadura (pages 136-140)