Cryptographic algorithms often dominate the computational cost of security protocols, which creates a need for acceleration; for example, over one half of the processing time in Secure Sockets Layer (SSL) is consumed by cryptographic algorithms [231]. Acceleration is typically performed by delegating the most expensive operations to an accelerator while the rest of the application is implemented in the main system, which is usually a general-purpose processor [281]. Consider DSA, presented in Sec. 2.2.4, as an example. The most demanding task in DSA is modular exponentiation and, hence, DSA can be accelerated by delegating exponentiations to an accelerator while the main processor performs other operations concurrently. Because this thesis predominantly considers implementing cryptographic algorithms using FPGAs, the following assumes that the accelerators are implemented with reconfigurable logic. Many of the principles are, however, valid also for other implementation platforms, such as ASICs.

[Figure 3.4: diagram with panels (a)-(e); labeled blocks: Processor, caches, Memory, I/O interface]
Figure 3.4: Classification of reconfigurable systems [69, 280]; (a) external stand-alone processing unit, (b) attached processing unit, (c) coprocessor, (d) reconfigurable functional unit, and (e) processor embedded in reconfigurable logic.

Fig. 3.4 shows the classification of reconfigurable systems. The first four classes were presented by Compton and Hauck in [69], and their classification was complemented by Todman et al. in [280] with the class where the processor is embedded in reconfigurable logic. The closer the reconfigurable unit is to the control processor, the faster the communication between the unit and the processor is but, as a downside, the amount of reconfigurability is usually smaller and more control is required [280]. The external stand-alone processing unit favors applications which are computationally intensive but require little communication, whereas the reconfigurable functional unit, e.g., reconfigurable instructions inside a processor, is ideal for small but frequently used tasks [69]. The attached processing unit and coprocessor classes can be seen as tradeoffs between these two extremes.

The last class, where the processor is embedded in reconfigurable logic, is currently emerging in many applications because the growth in FPGA resources has enabled implementing entire systems within one FPGA. This approach combines many of the advantages of the other four classes but requires a large (and expensive) device. The embedded processor can be either a hard core, such as PowerPC in Xilinx Virtex-4 FX [301], or a soft core, such as Nios II in Altera FPGAs [5]. Notice that all classes except (d) can be implemented with a stand-alone FPGA. Thus, the implementations of this thesis are suitable for all classes except (d).
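The delegation pattern described above for DSA can be illustrated with a minimal software sketch. It is not the implementation of this thesis: a worker thread merely stands in for the hardware accelerator, and the function names, the parameter names g, k, and p, and the choice of SHA-256 for the host-side work are assumptions made for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor
import hashlib


def accelerator_modexp(base: int, exponent: int, modulus: int) -> int:
    """Stand-in for the accelerator: left-to-right square-and-multiply."""
    result = 1
    base %= modulus
    for bit in bin(exponent)[2:]:            # scan exponent bits, MSB first
        result = (result * result) % modulus
        if bit == "1":
            result = (result * base) % modulus
    return result


def host_hash(message: bytes) -> int:
    """Work that stays on the main processor (illustrative choice: SHA-256)."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big")


def offload_example(g: int, k: int, p: int, message: bytes) -> tuple:
    """Delegate g^k mod p and hash the message on the host in the meantime."""
    with ThreadPoolExecutor(max_workers=1) as accelerator:
        pending = accelerator.submit(accelerator_modexp, g, k, p)  # delegate
        digest = host_hash(message)          # host continues its own work
        return pending.result(), digest      # synchronize with the accelerator
```

In a real system the submit and result steps would correspond to writing operands to the accelerator and reading the result back over whichever interface the chosen class of Fig. 3.4 provides, so the communication cost depends on that class.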
FPGAs have several advantages in cryptographic applications [91, 293]:
Algorithm agility It is possible to easily switch from one operation to another. This is important because modern cryptosystems must often support many encryption algorithms. It is also possible to delete broken algorithms and introduce new algorithms into the system.
Algorithm upload It is possible to easily update devices which are already in use. This is, again, important for cryptographic applications because parameters of the current cryptosystem may need to be changed; for example, a key length may need to be increased, or an algorithm may need to be replaced as standards expire, as when DES is replaced by AES.
Architecture efficiency Significantly more efficient implementations can be designed with fixed parameters in certain cases. Reprogrammability of FPGAs allows designing optimized implementations for all parameters separately because the device can be reprogrammed when parameters are changed whereas an ASIC design must support all parameters and such optimizations are out of reach. One example is the finite field multiplication in polynomial basis; see Ch. 4. If an irreducible polynomial is fixed, reductions required in multiplication can be hardwired resulting in faster and smaller multipliers. If the design must support arbitrary irreducible polynomials, such optimizations cannot be made.
Resource efficiency Many protocols are hybrid in the sense that a public-key algorithm is used in the beginning for key exchange after which the actual communication is encrypted using a secret-key algorithm. Because these algorithms are not used simultaneously, they can be switched on-the-fly by reprogramming the device which leads to a considerable reduction in required resources.
Algorithm modification Parts of standardized algorithms, e.g., S-boxes, may need to be modified, and such modifications are not a problem because of reprogrammability.
Throughput A considerable increase in throughput is achievable with FPGAs compared to general-purpose processors, as shown in an extensive number of publications; see, e.g., [59, 106, 107, 280].
Cost efficiency FPGA-based projects are typically significantly cheaper than ASIC-based projects when the products are targeted at low-to-medium sized markets. The main reasons are shorter time-to-market and lower initial costs. ASICs become cheaper when production quantities rise because the cost of producing a single chip is lower once the often very large startup cost has been paid.
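The volume argument in the cost efficiency item can be made concrete with a simple linear cost model. The model and its symbols are assumptions introduced here for illustration and do not come from the cited sources: F denotes the fixed startup cost of a project, u the production cost per chip, and n the number of chips produced.

```latex
\begin{align*}
C_{\mathrm{FPGA}}(n) &= F_{\mathrm{FPGA}} + u_{\mathrm{FPGA}}\, n, &
C_{\mathrm{ASIC}}(n) &= F_{\mathrm{ASIC}} + u_{\mathrm{ASIC}}\, n, \\
C_{\mathrm{ASIC}}(n) &< C_{\mathrm{FPGA}}(n)
  \iff n > n^{*} = \frac{F_{\mathrm{ASIC}} - F_{\mathrm{FPGA}}}{u_{\mathrm{FPGA}} - u_{\mathrm{ASIC}}}.
\end{align*}
```

The break-even volume n* is positive under the assumptions F_ASIC > F_FPGA and u_FPGA > u_ASIC, which match the qualitative description above; below n* the FPGA-based project is the cheaper one in this model.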
It has been stated, based on the above reasoning, that cryptographic algorithms are prime candidates for FPGA-based implementations [281]. The suitability of FPGAs for cryptography has also been pointed out in other publications, e.g., [34, 280].
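The architecture efficiency argument in the list above, namely that fixing the irreducible polynomial allows the reduction to be hardwired, can be sketched in software as follows. This is an illustrative bit-serial sketch and not one of the multipliers of Ch. 4; the function names are hypothetical, and the fixed polynomial x^8 + x^4 + x^3 + x + 1 (the one used in AES) is chosen here only as a familiar example. Field elements are integers whose bits are the polynomial coefficients.

```python
def gf2m_mul_generic(a: int, b: int, poly: int) -> int:
    """Multiply a and b in GF(2^m) for an arbitrary irreducible polynomial.

    poly includes the leading term x^m; a and b must have degree below m.
    """
    m = poly.bit_length() - 1
    result = 0
    for i in reversed(range(m)):     # scan b from its top coefficient down
        result <<= 1                 # multiply the accumulator by x
        if (result >> m) & 1:        # degree reached m: reduce by poly
            result ^= poly
        if (b >> i) & 1:             # add a when the coefficient of b is 1
            result ^= a
    return result


def gf256_mul_fixed(a: int, b: int) -> int:
    """Same method with the polynomial x^8 + x^4 + x^3 + x + 1 fixed."""
    result = 0
    for i in reversed(range(8)):
        carry = result & 0x80        # coefficient that would overflow to x^8
        result = (result << 1) & 0xFF
        if carry:
            result ^= 0x1B           # hardwired: x^8 = x^4 + x^3 + x + 1
        if (b >> i) & 1:
            result ^= a
    return result
```

In the generic version the polynomial is data that must be stored and applied at every step, whereas in the fixed version the reduction collapses to a constant XOR pattern, which in hardware corresponds to a few XOR gates on fixed bit positions.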
Although embedded function units, such as DSP blocks, certainly improve performance and power consumption in many applications, they have little use in most cryptographic applications, with the exception of embedded memories. The reason is that cryptographic algorithms commonly require “exotic” operations, such as arithmetic in finite fields, which are not supported by the embedded blocks of modern FPGAs. Introducing blocks specialized for cryptographic operations would naturally benefit the performance and power consumption of cryptographic algorithms on FPGAs, but currently none of the FPGAs available on the market supports cryptography, if hardwired cryptomodules for decrypting the programming bitstream are not taken into account.