
As in Chapter 3 and Chapter 4, the implementation of LRALD based on SA is divided into a preprocessing part, which performs all operations that do not depend on the received data, and a data detection part, which computes an estimate of the transmitted bits from the output of the preprocessing part and the received data vector. The architecture presented in this thesis builds on the architectural concepts presented in Section 2.3 and is implemented as an extension of the circuit presented in Section 3.3. Furthermore, the architecture is designed to fit into the PHY layer ASIC implementation discussed in Section 2.2.

In the remainder of this section, the Cholesky-decomposition-based Moore-Penrose pseudo-inverse from Section 3.3 is briefly reviewed together with its interconnection to the SA-based preprocessing. Then, the architecture of the module performing SA is presented. Finally, the architecture of the data detection circuit is discussed.

3 Note that hard-output LRALD still outperforms soft-output LD, as shown in Fig. 5.1.

4 Most MIMO-OFDM communication standards specify multiple code rates for channel encoding. The code rate is usually adapted by puncturing (i.e., dropping some encoded bits). Hence, even hard-input channel decoders can generally handle such punctured bits without additional overhead.

Figure 5.8 – Architecture and OFDM-symbol based clock-gating strategy of the implementation.

5.2.1 Preprocessing Architecture

The matrix preprocessing circuit for LRALD based on SA is composed of an MMSE filter matrix computation module and a subsequent LR module. To this end, the outputs of the architecture presented in Section 3.3 are buffered and aligned before they are fed into the LR module. The matrix output of the LR module is then realigned with the MMSE filter matrices to perform the basis transformation, as illustrated in Fig. 5.8.

MMSE Filter Matrix

To compute the MMSE filter matrix in the original basis, the architecture proposed in Section 3.3 is reused with minor modifications. In addition to the MMSE filter matrix in the original basis, the intermediate results of the filter matrix computation, the matrices G and G#, are also used for LRALD. For this purpose, the Gram matrix and the inverted Gram matrix are forwarded to realignment buffers after their computation. These buffers ensure that the Gram matrix and its corresponding inverted Gram matrix are both available to the LR module at its input. Furthermore, the varying number of iterations of SA requires a buffer at each input of the LR module to absorb the irregular input interval of the LR module. The buffers also ensure that the LR module is always supplied with OFDM tones that are ready to be processed. The second modification to the architecture proposed in Section 3.3 is an additional block-floating-point normalization module after the computation of the MMSE filter matrix in the original basis. In addition to the Gram matrix and its inverse, the normalized MMSE filter matrix is also forwarded to a realignment and delay buffer.

Figure 5.9 – Detailed schematic of a single division-free λ calculation module.

Lattice Reduction

The LR module performs SA based on the Gram matrix G and its inverse G#, as described in Section 5.1.2. In contrast to the Gram matrix computation and inversion modules, whose processing rates are independent of the matrices themselves, the number of SA iterations depends on G and G#. Therefore, SA has an irregular channel matrix acceptance interval. Nevertheless, the average processing rates of the Gram matrix computation and inversion modules and of the LR module are matched. To mitigate the effect of the varying processing rate of SA, the LR module has FIFOs at its input and output. The FIFOs increase the utilization of the LR module and also align the Gram matrix G with the corresponding inverted Gram matrix G# of the same OFDM tone.

Figure 5.10 – Detailed schematic of a single ∆ calculation module, showing the reduced-complexity multiplier.

The architecture for one LR iteration is composed of three sub-blocks. The first module computes the λ values. Then, the ∆i,j calculation and the index selection step described in Section 5.1.2 are processed in the second module. Finally, the third module completes an LR iteration by updating the matrices.

As described in Section 2.3, each module computes its tasks on a channel matrix corresponding to one OFDM tone. Hence, up to three channel matrices (i.e., OFDM tones) are processed in parallel with a pipeline-interleaved processing scheme in the SA module. After the three sub-blocks, the processed channel matrices are either fed forward to the equalizer cache or fed backward to the first sub-block of the LR-pipeline for another LR iteration.

λ Calculation: The first LR-iteration sub-module computes all possible candidate unit-update values λi,j according to (5.13) and (5.14). A detailed block diagram of one instance of the λ calculation processing element is shown in Fig. 5.9. Exploiting the unit lambda updates allows a division-free implementation (highlighted for the real part). Depending on the HR (i.e., the number of clock cycles available for the module), multiple instances of the λ calculation PEs are instantiated in parallel to compute multiple candidates in one cycle.
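As a minimal software sketch of the division-free idea, assume that the unconstrained update has the common Seysen form λ = ½(p/b − q/d) with positive real denominators b and d, and that a unit update clips the rounded real and imaginary parts to {−1, 0, 1}. Equations (5.13) and (5.14) are not reproduced in this section, so this form, the function names, and the argument names are assumptions made for illustration only. Under these assumptions, the rounding decision reduces to comparing a cross-multiplied numerator against a threshold:

```python
def unit_lambda_component(p, b, q, d):
    """Division-free clip(round(0.5 * (p/b - q/d)), -1, 1), assuming b > 0 and d > 0.

    Hypothetical helper: p and q stand for the real (or imaginary) parts of the
    off-diagonal entries, b and d for the positive diagonal entries.
    """
    num = p * d - q * b   # numerator of (p/b - q/d) after cross-multiplication
    thr = b * d           # 0.5 * (p/b - q/d) >= 0.5  <=>  num >= b * d
    if num >= thr:
        return 1
    if num <= -thr:
        return -1
    return 0


def unit_lambda(p, b, q, d):
    """Apply the component rule to the real and imaginary parts of complex p and q."""
    re = unit_lambda_component(p.real, b, q.real, d)
    im = unit_lambda_component(p.imag, b, q.imag, d)
    return complex(re, im)
```

For example, `unit_lambda_component(0, 2, 5, 1)` evaluates ½(0/2 − 5/1) = −2.5 and therefore returns −1 without ever forming a quotient. The threshold product b·d is shared between the real and the imaginary part.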

∆ Calculation and Index Selection: The computed candidate update values are fed into the subsequent ∆ calculation and index selection sub-module, where all possible improvements ∆i,j are computed. The ∆ calculation module is shown in Fig. 5.10. The two complex-valued multipliers on the left side can each be implemented as two conditional adders or subtractors, owing to the unit lambda constraint. After the computation of the ∆i,j values, a greedy selection over all ∆i,j is performed. Thereby, the locally optimal index pair {s, t} resulting in the largest reduction of Seysen’s metric is chosen. The indices and the selected candidate update value λs,t are then forwarded to the matrix update module.

Figure 5.11 – Detailed schematic of a detection module with final puncturing feature.
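The following sketch illustrates why a multiplication by a unit λ needs no multiplier, and how a greedy index selection can be modelled. It assumes that the real and imaginary parts of λ are restricted to {−1, 0, 1} and that the ∆ values are available as a square table; the function names and the table layout are hypothetical and not taken from the circuit.

```python
def mul_by_unit_lambda(z_re, z_im, l_re, l_im):
    """Compute (z_re + j*z_im) * (l_re + j*l_im) for l_re, l_im in {-1, 0, 1}.

    Each output component is the sum of two conditionally passed, negated,
    or zeroed operands, so no multiplier is required.
    """
    def cond(x, sel):
        # Pass x, negate x, or output zero, depending on the selector.
        if sel == 1:
            return x
        if sel == -1:
            return -x
        return 0

    out_re = cond(z_re, l_re) - cond(z_im, l_im)
    out_im = cond(z_im, l_re) + cond(z_re, l_im)
    return out_re, out_im


def greedy_select(delta):
    """Return ((s, t), value) of the largest positive off-diagonal entry.

    Returns (None, 0) if no entry improves the metric.
    """
    best, arg = 0, None
    for s, row in enumerate(delta):
        for t, value in enumerate(row):
            if s != t and value > best:
                best, arg = value, (s, t)
    return arg, best
```

A result of `(None, 0)` from `greedy_select` corresponds to the case in which no candidate update reduces the metric any further.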

Matrix Updates: The actual LR step is performed by adding an integer multiple (i.e., with the update value λs,t) of column gt to the column gs according to (5.9) and (5.10). Furthermore, the transformation matrices T and T# are updated according to (5.11). This completes one iteration of SA-based LR.
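The three sub-blocks can be summarized in a floating-point reference model. The sketch below is not the circuit's arithmetic: it uses divisions for clarity, assumes the common Seysen update λ = ½(G#ts/G#ss − Gts/Gtt) clipped to unit values, assumes that G# is the inverse Gram matrix and that T# holds the inverse of T, and introduces a hypothetical iteration limit `max_iter`. Equations (5.9) to (5.14) are not reproduced in this section, so the update rules are written as products with an elementary matrix, which is equivalent to the column and row additions described above.

```python
import numpy as np


def seysen_metric(B):
    """Seysen's metric: sum over k of G[k, k] * Ginv[k, k] with G = B^H B."""
    G = B.conj().T @ B
    return float(np.sum(np.diag(G).real * np.diag(np.linalg.inv(G)).real))


def seysen_unit(H, max_iter=20):
    """Reference model of SA with unit updates. Returns (T, T_inv, iterations)."""
    n = H.shape[1]
    G = H.conj().T @ H
    Gd = np.linalg.inv(G)                    # assumed meaning of G# in the text
    T = np.eye(n, dtype=complex)
    Ti = np.eye(n, dtype=complex)            # assumed meaning of T#: inverse of T
    for it in range(max_iter):
        best, pick = 1e-12, None
        for s in range(n):
            for t in range(n):
                if s == t:
                    continue
                # Candidate unit update (lambda calculation).
                v = 0.5 * (Gd[t, s] / Gd[s, s].real - G[t, s] / G[t, t].real)
                lam = complex(np.clip(np.round(v.real), -1, 1),
                              np.clip(np.round(v.imag), -1, 1))
                if lam == 0:
                    continue
                # Reduction of Seysen's metric for this candidate (delta calculation).
                a = 2.0 * G[t, t].real * Gd[s, s].real
                c = Gd[s, s].real * G[s, t] - G[t, t].real * Gd[s, t]
                delta = -(a * abs(lam) ** 2 + 2.0 * (lam * c).real)
                if delta > best:             # greedy index selection
                    best, pick = delta, (s, t, lam)
        if pick is None:                     # no improvement left: terminate
            return T, Ti, it
        s, t, lam = pick
        # Matrix updates: column s += lam * column t, expressed via Ts.
        Ts = np.eye(n, dtype=complex)
        Ts[t, s] = lam
        Tsi = np.eye(n, dtype=complex)
        Tsi[t, s] = -lam                     # exact inverse of Ts for s != t
        G = Ts.conj().T @ G @ Ts
        Gd = Tsi @ Gd @ Tsi.conj().T
        T = T @ Ts
        Ti = Tsi @ Ti
    return T, Ti, max_iter
```

Because every accepted update strictly reduces the metric, `seysen_metric(H @ T)` never exceeds `seysen_metric(H)`, and the product `T @ Ti` stays equal to the identity.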

LR Termination: Once the maximum improvement ∆s,t of Seysen’s metric is zero, or once the iteration limit defined in Section 5.1.3 is reached, the LR for the specific channel matrix is terminated and the transformation matrices T and T# are forwarded to the output of the LR module. Otherwise, another iteration is computed by feeding back the transformation matrices and the updated matrices G and G# to the first sub-module of the LR module. At the output of the LR module, the matrix T is directly forwarded to the correct location in the equalizer cache, while the matrix T# is forwarded to a reordering buffer. This reordering buffer is required to realign the matrices for the remaining operations of calculating the MMSE filter matrix within the transformed basis. Since the number of iterations within SA is not constant, and hence OFDM tones can overtake each other in the LR stage, the computed matrices also have to be sorted. Note that the fixed iteration limit, discussed in Section 5.1.3, also defines the maximum necessary length of the reordering buffer.
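The behaviour of such a reordering buffer can be illustrated with a small software model. The class name, the `depth` parameter, and the use of a consecutive tone index are assumptions made for illustration; the actual buffer length follows from the iteration limit in Section 5.1.3 and is not derived here.

```python
class ReorderBuffer:
    """Releases matrices in tone order even if they arrive out of order."""

    def __init__(self, depth):
        self.depth = depth      # hypothetical bound on how far a tone may run ahead
        self.pending = {}       # tone index -> matrix waiting for earlier tones
        self.next_tone = 0      # next tone index to be released

    def push(self, tone, matrix):
        """Store one result and return all results that are now in order."""
        if tone - self.next_tone >= self.depth:
            raise OverflowError("tone ran further ahead than the buffer depth")
        self.pending[tone] = matrix
        released = []
        while self.next_tone in self.pending:
            released.append((self.next_tone, self.pending.pop(self.next_tone)))
            self.next_tone += 1
        return released
```

If tone 1 finishes before tone 0, `push(1, ...)` returns an empty list, and the subsequent `push(0, ...)` releases both entries in the original tone order.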

Basis Transformation: After realignment, the matrices are fed into another matrix-matrix multiplication block, where the MMSE estimation matrix is multiplied with the dual transformation matrix T# to transform the estimation matrix into the new basis B. The elements of the resulting matrix W are then scaled and quantized according to the block-floating-point format. Finally, the matrices are fed into the equalization cache. This cache between the preprocessing circuit and the MIMO detector has to operate in both modes, namely the preprocessing mode and the detection mode.
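A block-floating-point format stores one shared exponent per matrix together with fixed-point mantissas for all entries. The sketch below shows one possible realization; the mantissa width `mant_bits`, the rounding mode, and the function names are assumptions, since the word lengths of the circuit are not stated in this section.

```python
import numpy as np


def bfp_quantize(W, mant_bits=12):
    """Quantize a complex matrix to integer mantissas with one shared exponent."""
    peak = max(np.abs(W.real).max(), np.abs(W.imag).max())
    if peak == 0:
        return np.zeros_like(W), 0
    exp = int(np.ceil(np.log2(peak)))        # shared exponent: peak / 2**exp <= 1
    scale = 2.0 ** (mant_bits - 1 - exp)
    lim = 2 ** (mant_bits - 1) - 1           # largest signed mantissa value
    re = np.clip(np.round(W.real * scale), -lim - 1, lim)
    im = np.clip(np.round(W.imag * scale), -lim - 1, lim)
    return re + 1j * im, exp


def bfp_dequantize(mant, exp, mant_bits=12):
    """Undo the shared scaling, as the detector does when compensating the exponent."""
    return mant * 2.0 ** (exp - (mant_bits - 1))
```

With this convention, the reconstruction error of every entry is bounded by one mantissa LSB, that is, by 2^(exp − mant_bits + 1).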

5.2.2 Detection Architecture

The detector is much less complex than the preprocessing part and operates at the symbol rate, which is given by the number of OFDM data tones and the OFDM symbol duration. As shown in Fig. 5.11, the first processing step within the detector is to equalize the receive symbol vector with the equalization matrix W = T# G# H^H. This equalization is performed in the reduced lattice basis B. Then, the elements of the resulting vector are quantized to the nearest integer value. In this step, the block-floating-point exponent computed per matrix is compensated as well. The next processing step in the detector is to remap the detected symbols from the transformed basis to the original lattice basis. Thereby, an estimate ŝ in the original lattice basis H is obtained by applying a matrix-vector multiplication with the matrix T according to

ŝ = T x̂. (5.15)

The final processing step in the detector is to demap the estimated vector ŝ. To this end, the demapper checks whether the estimated lattice points are valid constellation points for the given QAM constellation map and, if required, applies puncturing (i.e., delivers LLRs of zero) for symbols whose detected lattice points are not part of the modulation alphabet, as described in Section 5.1.3.
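The detection steps can be sketched as follows. The sketch assumes that the constellation has already been shifted and scaled to a square subset of the Gaussian integers with real and imaginary parts between `lo` and `hi`, which the circuit handles in the mapping described in Section 5.1.3. The function names, the `lo` and `hi` parameters, and the LLR array layout are hypothetical.

```python
import numpy as np


def lr_detect(y, W, T, lo, hi):
    """Equalize in the reduced basis, round, remap with T, and check validity."""
    x_hat = np.round(W @ y)        # nearest integer point in the reduced basis
    s_hat = T @ x_hat              # remap to the original basis, as in (5.15)
    valid = ((s_hat.real >= lo) & (s_hat.real <= hi) &
             (s_hat.imag >= lo) & (s_hat.imag <= hi))
    return s_hat, valid


def puncture(llrs, valid):
    """Deliver LLRs of zero for symbols that are not valid constellation points.

    llrs has one row per symbol and one column per bit (assumed layout).
    """
    return np.where(valid[:, None], llrs, 0.0)
```

In a noise-free example with W = T# G# H^H, G# = (H^H H)^-1 and T# = T^-1, the call `lr_detect(H @ s, W, T, lo, hi)` returns the transmitted vector `s` with all symbols marked as valid.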
