2.9
Conclusion
This chapter introduces the background and mathematical tools that are of prime importance in the design of an elliptic curve scalar multiplier. Basic concepts of different cryptographic schemes, with their recommended key sizes, are introduced first. Then, finite fields and elliptic curve arithmetic over prime fields are presented. Next, different implementation strategies for EC scalar multiplication at different levels of its implementation hierarchy are discussed. Finally, the FPGA structure is briefly introduced. Hardware acceleration of finite field arithmetic operations is presented in the next chapter.
Chapter 3
Hardware Architectures for Finite Field Arithmetic
Crypto-systems based on public-key cryptography (PKC) [1], [2], [3], [8] are structured using finite field arithmetic primitives such as modular addition, subtraction, multiplication, and inversion [45]. Among these primitives, modular multiplication and inversion/division are the most computationally intensive operations. In fact, modular inversion/division is considerably more expensive than modular multiplication. Therefore, alternative ways have been investigated and designed to perform inversion/division-free EC group operations. These methods are known as projective coordinates, as described in Chapter 2. Consequently, when projective coordinates are used, the most critical field operation is modular multiplication [16], [15].
This chapter first describes algorithms and design strategies to perform modular addition, modular subtraction, and modular inversion/division operations. Then, it presents two novel modular multiplier architectures based on radix-4, radix-8, Booth encoding and interleaved multiplication techniques. Radix-4, radix-8 and Booth encoding techniques are used to optimize the interleaved modular multiplication algorithm. The optimized radix-4 and radix-8 versions of the interleaved modular multiplication algorithm result in 50% and 66% reductions, respectively, in the total number of clock cycles
required to perform a modular multiplication operation. The proposed multipliers do not require any operand or result conversion, as is required in the Montgomery method discussed in the next section. The performance of the presented multiplier architectures is discussed and analysed for different field sizes.
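The quoted 50% and 66% figures can be checked with a short calculation. The sketch below assumes that one multiplier digit is consumed per clock cycle (2 bits for radix-4, 3 bits for radix-8) and ignores any set-up or final-correction cycles; the function names are hypothetical and are not taken from the thesis.

```python
def digit_iterations(n_bits: int, bits_per_digit: int) -> int:
    """Iterations needed when the multiplier is scanned bits_per_digit bits at a time."""
    return -(-n_bits // bits_per_digit)  # ceiling division


def cycle_reduction(n_bits: int, bits_per_digit: int) -> float:
    """Fractional saving relative to a bit-serial (radix-2) scan of n_bits iterations."""
    return 1.0 - digit_iterations(n_bits, bits_per_digit) / n_bits


if __name__ == "__main__":
    n = 256  # example field size in bits (assumption for illustration)
    for name, k in (("radix-4", 2), ("radix-8", 3)):
        print(name, digit_iterations(n, k), f"{cycle_reduction(n, k):.1%}")
```

Under these assumptions a 256-bit multiplication needs 128 iterations in radix-4 and 86 in radix-8, against 256 in radix-2.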
3.1
Background and Related Work
Finite field arithmetic operations are the fundamental components used to construct any EC crypto-system. Among these components, field multiplication, field inversion and field division are more critical than field addition and subtraction due to their inherent computational difficulty. In fact, field inversion/division operations are more expensive in terms of computation time and resource requirements than field multiplication, on both hardware and software platforms. Projective coordinate systems enable inversion/division-free EC group operations. Thus, the most critical operation in EC group operations in projective coordinates is finite field multiplication. Several techniques have been proposed to speed up finite field multiplication; they are discussed below.
The classical method to perform a finite field multiplication of operands a and b over a prime modulus p is defined in equations (3.1) and (3.2).
R = a × b        (3.1)
c = R mod p      (3.2)
It is a two-step process: integer multiplication followed by reduction modulo p. The reduction step typically requires a trial division, which is a very computationally intensive operation; therefore, many strategies have been proposed to lower the cost of the reduction step. Generally, these strategies can be divided into three main categories [35,84]: designs over standard primes [85], designs based on the Montgomery multiplication method [54] and designs based on the interleaved multiplication method [86,87].
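As a point of reference for the three categories, the classical two-step method of equations (3.1) and (3.2) can be written directly in software. This is only a minimal sketch: the function name and the small example values are assumptions for illustration, and Python's `%` operator stands in for the trial division that a hardware design would have to implement.

```python
def classical_modmul(a: int, b: int, p: int) -> int:
    """Two-step modular multiplication: full product, then reduction."""
    r = a * b      # (3.1): integer product, up to twice the operand width
    return r % p   # (3.2): reduction modulo p (a division in general)


if __name__ == "__main__":
    # Small illustrative values: 45 * 76 = 3420 and 3420 mod 97 = 25
    print(classical_modmul(45, 76, 97))
```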
In order to lower the computational intensity of the reduction step, NIST recommended five specialized primes p (p192, p224, p256, p384, p521), as given in Table 2.4. These primes have a special structure that is very close to a power of 2, i.e.,
2^a ± 2^b ± 2^c ± 2^d ± 1, and are called pseudo-Mersenne primes. Modular multiplication
operation over this type of prime can achieve higher performance at lower computational cost. However, a design optimized for a particular modulus results in a very dedicated architecture that cannot be used for any other prime; hence the architecture lacks flexibility. A pipelined modular multiplier design reported in [88] can support the five NIST recommended primes. Its datapath comprises 8
pipeline stages with a latency of 80 ns for primes of size 192, 224, 256-bits and 200 ns for 384, 256-bits. It consumes 8340 slices and 259 dedicated DSP blocks on a Virtex-6 FPGA platform, which may not fit into smaller FPGAs but is suitable for high-speed applications. Designs reported in [48], [52] also exploit the special structure of the NIST
primes, p224 and p256. These implementations are dedicated to p224 and p256 and
cannot accommodate other primes; providing that flexibility is one of the main focuses of this thesis.
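To illustrate why the special structure helps, the sketch below reduces a product modulo the NIST prime p192 = 2^192 − 2^64 − 1 using only word selection, additions and a few subtractions. It follows the well-known word-folding approach for this prime and is not the architecture of [88]; the names `P192`, `MASK64` and `reduce_p192` are hypothetical.

```python
P192 = 2**192 - 2**64 - 1
MASK64 = 2**64 - 1


def reduce_p192(c: int) -> int:
    """Reduce 0 <= c < P192**2 modulo P192 without a general division."""
    # Split the (up to) 384-bit input into six 64-bit words c0..c5.
    c0, c1, c2, c3, c4, c5 = [(c >> (64 * i)) & MASK64 for i in range(6)]
    # Fold the upper words using 2^192 = 2^64 + 1 (mod P192):
    #   2^256 = 2^128 + 2^64 and 2^320 = 2^128 + 2^64 + 1 (mod P192).
    s1 = (c2 << 128) | (c1 << 64) | c0
    s2 = (c3 << 64) | c3
    s3 = (c4 << 128) | (c4 << 64)
    s4 = (c5 << 128) | (c5 << 64) | c5
    r = s1 + s2 + s3 + s4
    # The sum is below 4 * 2^192, so only a few subtractions are needed.
    while r >= P192:
        r -= P192
    return r


if __name__ == "__main__":
    a, b = P192 - 1, P192 - 2  # i.e. -1 and -2 modulo P192
    print(reduce_p192(a * b))  # (-1) * (-2) = 2
```

The folding constants are specific to this one modulus, which is the lack of flexibility noted above.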
The Montgomery multiplication method converts the required division operation into cheaper shift and addition operations. However, to make use of the Montgomery method, operands must be transformed from normal to Montgomery representation so that the operation is performed in the Montgomery domain, and the result must be transformed back to the normal domain to yield the final result of a modular multiplication operation. The method is suitable where the conversion overhead is negligible compared to the cost of the main operation, for example in exponentiation algorithms. Montgomery multiplication based designs are reported in [89] and [90], of which [90] is
based on radix-4 and [89] incorporates radix-2^16 techniques. The designs reported in [57], [60], [63], [69], [71] are based on radix-2 implementations. In [91], several
possible implementation strategies of Montgomery multiplication are discussed on the basis of performance and implementation cost. Amnor et al. in [92] report that the radix-2
implementation of interleaved modular multiplication has a better area-delay product. Other interesting hardware implementations of Montgomery modular multiplication are [89], [93], [94] and [95]. Among these, [93] presents an interleaved modular multiplier based on Montgomery and Barrett reduction techniques, and [94] presents a time- and area-efficient modular multiplier. The design in [95] is based on redundant radix-2^16, while [89] is based on radix-256.
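The conversion overhead mentioned for the Montgomery method can be made concrete with a small software sketch. It assumes an odd modulus p < 2^n and R = 2^n, and requires Python 3.8 or later for the modular inverse via `pow(p, -1, R)`. The function names are hypothetical, and the code illustrates the textbook reduction step rather than any of the cited hardware designs.

```python
def mont_reduce(t: int, p: int, n: int, p_inv_neg: int) -> int:
    """Return t * R^-1 mod p for 0 <= t < p * R, with R = 2**n."""
    mask = (1 << n) - 1
    m = ((t & mask) * p_inv_neg) & mask  # m = t * (-p^-1) mod R
    u = (t + m * p) >> n                 # exact division by R is a shift
    return u - p if u >= p else u


def mont_modmul(a: int, b: int, p: int, n: int) -> int:
    """Compute a * b mod p via the Montgomery domain (p odd, p < 2**n)."""
    R = 1 << n
    p_inv_neg = (-pow(p, -1, R)) % R     # precomputed constant -p^-1 mod R
    a_m = (a * R) % p                    # conversion into the Montgomery domain
    b_m = (b * R) % p
    c_m = mont_reduce(a_m * b_m, p, n, p_inv_neg)  # equals a * b * R mod p
    return mont_reduce(c_m, p, n, p_inv_neg)       # conversion back


if __name__ == "__main__":
    print(mont_modmul(45, 76, 97, 7))  # same result as (45 * 76) % 97
```

For a single multiplication, the two conversions into the domain and the one conversion back dominate the cost; this is why the method pays off mainly in long chains of multiplications such as exponentiation.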
The designs reported in [96], [97] used built-in FPGA Digital Signal Processing
method.
The interleaved multiplication method was proposed by Blakley [86,87] in 1983. The method is based on iterative addition and reduction of partial products. Partial product accumulation and intermediate result reduction are integrated in a way that eliminates the final division. The idea is to reduce intermediate results below the modulus in each iteration so that the final division can be avoided. The algorithm traverses the multiplier from the most significant bit (MSB) to the least significant bit (LSB). Several modifications and hardware architectures have been reported [55], [56], [92], [93], [98], [99], [100], [101], [102]. In [93], a faster interleaved modular multiplier based on Montgomery and Barrett reduction techniques is reported. Its 130-nm ASIC implementation runs at a maximum frequency of 320 MHz and computes one 256-bit modular multiplication in 0.05 µs.
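The radix-2 form of the interleaved method described above can be sketched in a few lines. The sketch assumes 0 <= a, b < p < 2^n and scans the multiplier a from MSB to LSB; the function name is hypothetical, and the code shows the basic bit-serial algorithm, not any of the optimized architectures cited.

```python
def interleaved_modmul(a: int, b: int, p: int, n: int) -> int:
    """Bit-serial interleaved modular multiplication, MSB first."""
    c = 0
    for i in range(n - 1, -1, -1):
        c = 2 * c                # shift the running result left by one bit
        if (a >> i) & 1:
            c += b               # add the partial product for this bit
        # Here c < 3p, so at most two subtractions bring it back below p.
        if c >= p:
            c -= p
        if c >= p:
            c -= p
    return c


if __name__ == "__main__":
    print(interleaved_modmul(45, 76, 97, 7))  # same result as (45 * 76) % 97
```

Because the running result stays below p after every iteration, no final division is needed and no domain conversion is involved; the cost is one iteration per multiplier bit.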
Ghosh et al. in [100] report a radix-2 parallel interleaved modular multiplier. Its Virtex-II Pro FPGA implementation consumes 3475 slices with a latency of 3.2 µs and takes n clock cycles to perform an n-bit modular multiplication. The same multiplier is utilized in [99] in the construction of a dual-core pairing processor. A robust GF(p) parallel arithmetic unit for public key cryptography is reported in [103]. The parallel arithmetic unit can perform modular addition, subtraction, multiplication and inversion/division operations. It adopts interleaved modular multiplication to perform modular multiplication and the extended Euclidean algorithm (EEA) to perform inversion/division operations.
Similarly, in [55] a compact programmable arithmetic unit is based on the same algorithms (interleaved multiplication and EEA). The required number of adders is reduced by exploiting hardware sharing techniques; however, the unit is not able to execute field operations in parallel and is not suitable for high-performance applications. The design in [104] is based on pre-computation, carry-save addition and sign estimation techniques. However, it requires a carry propagation adder at the final stage.
Montgomery and interleaved multiplication methods are widely used in the design of finite field multipliers. A performance comparison of these methods is discussed and analysed in [105]. The higher-radix modular multipliers proposed in this thesis are based on the interleaved multiplication method, work directly on numbers in two’s complement format and thus do not require any conversion. Performance of these