
Although trigonometric calculations were already defined as operations, the processor could not perform them because no EU had been included to handle them. The new SinCos execution unit was built to fill this gap. As the floating-point IP core used for all other operations [Xil17c] does not include any trigonometric ones, the coordinate rotation digital computer (CORDIC) IP core [Xil17b] was chosen to fulfill this role. It calculates the sine and cosine using an iterative algorithm with a configurable precision of up to 48 bits.
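The iterative principle can be illustrated with a minimal software sketch of the textbook CORDIC rotation mode. It is not the Xilinx implementation: it uses Python floats instead of fixed-point arithmetic, and the function name `cordic_sin_cos` and its `iterations` parameter are chosen here for illustration only.

```python
import math

def cordic_sin_cos(angle, iterations=48):
    """Return (sin, cos) of an angle in [-pi/2, pi/2] using CORDIC rotation mode.

    Illustrative float model of the textbook algorithm; a hardware core
    would use fixed-point shifts and adds instead.
    """
    # Elementary rotation angles atan(2^-i), one per iteration.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Each micro-rotation stretches the vector; pre-scale by the inverse gain.
    gain = 1.0
    for i in range(iterations):
        gain /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = gain, 0.0, angle
    for i in range(iterations):
        # Rotate towards the target: the sign of the residual angle z
        # decides the direction of this micro-rotation.
        d = 1.0 if z >= 0 else -1.0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return y, x  # y converges to sin(angle), x to cos(angle)
```

Each iteration contributes roughly one additional bit of accuracy, which is why the precision setting of the core translates directly into the number of iterations.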

The CORDIC uses fixed-point numbers for input and output, so conversions from floating to fixed point on the input side and from fixed to floating point on the output side were added to the EU (shown in Figure 4.1 for the maximum precision). These conversions were implemented using the standard floating-point IP core [Xil17c]. Fixed-point numbers are divided into an integer part and a fractional part whose widths can be tuned to the range of numbers they should be able to hold. For the CORDIC, the integer part is always 3 bits wide on the input and 2 bits wide on the output regardless of the precision of the calculation. Both sides include a sign bit, since for the sine and cosine operations both the input and the output can be negative.
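The two conversions can be sketched in software as follows. This is a simplified model, assuming two's-complement fixed-point values stored as Python integers and round-to-nearest quantization; the helper names `float_to_fixed` and `fixed_to_float` are hypothetical and do not correspond to any interface of the IP core.

```python
def float_to_fixed(value, int_bits, frac_bits):
    """Quantize a float to a two's-complement fixed-point integer.

    int_bits includes the sign bit, e.g. int_bits=3, frac_bits=45 models
    the 3/45-bit input format and covers the range [-4, 4).
    """
    total_bits = int_bits + frac_bits
    raw = round(value * (1 << frac_bits))  # scale by 2^frac_bits and round
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 1
    if not lowest <= raw <= highest:
        raise OverflowError(f"{value} does not fit into {int_bits}/{frac_bits} fixed point")
    return raw

def fixed_to_float(raw, frac_bits):
    """Interpret a fixed-point integer as a real number again."""
    return raw / (1 << frac_bits)
```

For example, `float_to_fixed(0.5, 3, 45)` yields the integer `1 << 44`, and `fixed_to_float(1 << 46, 46)` recovers `1.0` from the 2/46-bit output format.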

[Figure 4.1, block diagram: in1 (64-bit float) → float-to-fixed → 3/45-bit fixed → CORDIC (48 bits precision) → sine and cosine (2/46-bit fixed) → multiplexer (sine/cosine select from in2) → fixed-to-float → 64-bit float → out]

Figure 4.1: Data flow inside the SinCos execution unit highlighting the data type conversions. Fixed number precision is indicated by the width of the integer part (including a sign bit) and the width of the fractional part separated by a slash. This illustration assumes the maximum of 48 bits of precision for the CORDIC.

This structure is the characteristic difference from floating-point numbers, which do not store an integer part at all but instead include an exponent field specifying the power of two by which the fractional part is scaled. Their precision is fixed and equal to the width of the fractional part plus 1 bit (i.e. 53 bits for double precision). Although this allows them to represent a much larger range of values without sacrificing precision, calculations with floating-point numbers are much more time- and resource-intensive², which is why the CORDIC algorithm is usually implemented with fixed-point calculations [Ort+03].

Both sine and cosine are calculated at the same time by the algorithm. However, the ViSARD can currently only write back one result per clock cycle. The first bit of the second input operand, which would otherwise be unused, selects which of the two values is the output of the EU. As this information is relevant in the clock cycle in which the CORDIC produces the output for a given input value, and not when the calculation is started, the select input of the multiplexer is delayed by the combined latency of the float-to-fixed conversion and the CORDIC IP core. Prior practical tests (Section 3.2.5) have indicated that the IP core cannot always be operated in a fully pipelined parallel computation mode like all other current EUs, but has to be switched to serial mode for highest precision due to timing issues. This is unproblematic as long as only one operation is started for testing purposes or optimization is disabled in the assembler, but for real-world usage the assembler would have to be modified to honor the restriction that only one trigonometric calculation can be in progress at any time.
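The need for the delayed select signal can be illustrated with a small cycle-based model of the fully pipelined case. It is only a sketch: the `latency` value, the encoding of the select bit (0 for sine, 1 for cosine), and the function name `simulate` are assumptions made for this example and are not taken from the ViSARD sources.

```python
import math
from collections import deque

def simulate(requests, latency):
    """Model a pipelined SinCos unit with a matching select delay line.

    requests: one entry per clock cycle, either (angle, select) or None
              for an idle cycle. select is assumed to be 0 for sine and
              1 for cosine (hypothetical encoding).
    latency:  combined latency of conversion and CORDIC in cycles.
    Returns the results in the order they leave the multiplexer.
    """
    data_pipe = deque([None] * latency)    # (sin, cos) pairs in flight
    select_pipe = deque([None] * latency)  # select bits, delayed equally
    outputs = []
    # Append idle cycles so that the pipeline drains completely.
    for request in list(requests) + [None] * latency:
        pair = data_pipe.popleft()
        select = select_pipe.popleft()
        if pair is not None:
            # The select bit arrives in the same cycle as its result.
            outputs.append(pair[1] if select else pair[0])
        if request is None:
            data_pipe.append(None)
            select_pipe.append(None)
        else:
            angle, select_in = request
            data_pipe.append((math.sin(angle), math.cos(angle)))
            select_pipe.append(select_in)
    return outputs
```

Issuing a sine request followed directly by a cosine request for the same angle, e.g. `simulate([(0.0, 0), (0.0, 1)], latency=3)`, returns `[0.0, 1.0]`. Without the delay line, the multiplexer would see the select bit of a later instruction by the time the first result is ready.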

The last consideration concerning the SinCos EU is whether the achieved precision of 48 bits in calculation is appropriate for the purposes of the ViSARD, which is conceived as a co-processor for highly precise calculations. As the ViSARD works with 64-bit numbers when double precision is selected, it appears as if 16 more bits of information would be needed. The two cannot be compared easily, however, since the IP core operates on fixed-point numbers, which are represented differently as explained above. The smallest difference expressible by the 46-bit fraction of the CORDIC output is fixed at $2^{-46} \approx 1.4 \cdot 10^{-14}$, which is also the smallest non-zero number representable. The smallest non-zero normalized double-precision floating-point number is $2^{-1022} \approx 2.2 \cdot 10^{-308}$. As the CORDIC algorithm requires one more iteration per additional bit of calculatory precision, hundreds of iterations would be necessary to reach a comparable level for all possible input values. Table 3.2 already showed that resource usage of the IP core in parallel mode of operation grows rapidly. Having even more iterations is not realistic, at least with the implementation offered by Xilinx. It is worth mentioning that the lack of range is most problematic for the sine operation when numbers are extremely small³. When looking at input numbers between 0.1 and 1, the fixed-point representation offers between 42 and 45 bits of precision. This is a loss of only 8 to 11 bits compared to the 53 bits of a double-precision floating-point number. Alternative approaches use e.g. precomputed LUTs or Taylor series approximations, but they similarly cannot reach the full range of precision of floating-point numbers. It is therefore the opinion of the author that the CORDIC module represents an appropriate compromise between precision, speed, and resource utilization fit for the ViSARD. [NDB05; Ort+03; DdD07]

² Many operations allow fixed-point numbers to be treated the same as, or very similarly to, integers.

³ This applies likewise to both input and output numbers, since the sine operation on very small numbers is approximately equal to the identity function. With the cosine operation, small inputs produce values around 1, for which floating-point numbers do not offer a significant range advantage.
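The figures in this comparison can be reproduced with a few lines of arithmetic. The sketch below assumes that the quoted 42 to 45 bits refer to the 45-bit fraction of the CORDIC input format shown in Figure 4.1; the helper `significant_bits` is a name introduced here for illustration.

```python
import math
import sys

def significant_bits(value, frac_bits):
    """Count the bits from the leading one of |value| down to the LSB
    of a fixed-point format with frac_bits fractional bits."""
    # frexp returns (m, e) with |value| = m * 2^e and 0.5 <= m < 1,
    # so the leading one sits at bit position e - 1.
    _, exponent = math.frexp(abs(value))
    return frac_bits + exponent

# Resolution of the 46-bit output fraction, about 1.4e-14.
output_resolution = 2.0 ** -46

# Smallest normalized double, 2^-1022, about 2.2e-308.
smallest_normal_double = sys.float_info.min

# Precision available for inputs between 0.1 and 1 (45-bit fraction).
bits_at_lower_end = significant_bits(0.1, 45)  # 42
bits_at_upper_end = significant_bits(0.5, 45)  # 45, same for all of [0.5, 1)

# Loss relative to the 53-bit significand of a double.
loss_range = (53 - bits_at_upper_end, 53 - bits_at_lower_end)  # (8, 11)
```

The same helper also shows how quickly precision degrades for small arguments: every halving of the input costs one significant bit, whereas a floating-point number keeps all 53.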
