
As mentioned earlier, Type 1 pairings can be evaluated on supersingular curves. These curves fall into three sub-classes: curves over binary fields ($q = 2^m$ with $k = 4$), curves over fields of characteristic 3 ($q = 3^m$ with $k = 6$) and curves over fields of large prime characteristic ($q = p$, $p > 3$, with $k = 2$). Representing field elements in $\mathbb{F}_{3^m}$ is not straightforward on a binary system. The case $k = 2$ requires $p$ to have at least 512 bits for security reasons, and arithmetic operations in such a large finite field would be very slow on a constrained 8-bit processor. Therefore, the most suitable curve for implementing Type 1 pairings on sensor nodes is the binary curve with embedding degree $k = 4$.

In order to achieve an adequate security level, the binary field can be chosen as $\mathbb{F}_{2^{271}}$. The equation of the supersingular curve over $\mathbb{F}_{2^{271}}$ is as follows:

$y^2 + y = x^3 + x$ (5.8)

In this case the number of points on the curve is known to be $2^{271} + 2^{136} + 1 = 487805 \cdot r$, where $r$ is a large prime, large enough to make any Pohlig-Hellman-like attack [58] on the Elliptic Curve Discrete Logarithm Problem infeasible. The next consideration is the security of the discrete logarithm over $\mathbb{F}_{(2^{271})^4}$, where it is vulnerable to index-calculus attacks. The Discrete Logarithm Problem over extension fields $\mathbb{F}_{2^{m \cdot k}}$ has not been well studied. However, it is believed to be roughly as hard as the Discrete Logarithm Problem over $\mathbb{F}_{2^m}$ for prime $m$, where $m \approx 4 \cdot 271 = 1084$. In this setting the current record is for $m = 613$ [68], so there is (for now) a relatively wide margin of safety.
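The order and embedding-degree figures quoted above can be checked numerically. The following Python sketch uses only the numbers given in the text; the function and variable names are illustrative, and the primality of $r$ is not tested.

```python
# Sanity check of the group order quoted in the text for
# y^2 + y = x^3 + x over F_{2^271}. All names are illustrative only.
m = 271
order = 2**m + 2**((m + 1) // 2) + 1   # 2^271 + 2^136 + 1
cofactor = 487805

assert order % cofactor == 0           # the cofactor divides the order exactly
r = order // cofactor                  # large subgroup order (primality not tested)


def embedding_degree(r, q, k_max=12):
    """Return the smallest k with r | q^k - 1, or None if none up to k_max."""
    for k in range(1, k_max + 1):
        if pow(q, k, r) == 1:
            return k
    return None


k = embedding_degree(r, 2**m)
print("bits in r:", r.bit_length())
print("embedding degree:", k)
```

The embedding degree comes out as 4 because $(2^{271} + 2^{136} + 1)$ divides $2^{542} + 1$, which in turn divides $2^{1084} - 1$.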

From the previous sections, it is known that the fastest pairing algorithm on supersingular curves is the $\eta_T$ pairing [7]. Supersingular curves lead to more efficient implementations in terms of bandwidth, memory usage and processing speed, hence they are more suitable for wireless sensor networks. The $\eta_T$ pairing was implemented according to Algorithm 9. Elements of the extension field $\mathbb{F}_{2^{4m}}$ were represented as polynomials with four coefficients in $\mathbb{F}_{2^m}$. For example, the element $b(z) = B_3 z^3 + B_2 z^2 + B_1 z + B_0$ in $\mathbb{F}_{2^{4m}}$ was represented as the vector $[B_3, B_2, B_1, B_0]$. The elements $s, t \in \mathbb{F}_{2^{4m}}$ were equal to $[0, 1, 1, 0]$ and $[0, 0, 1, 0]$ respectively. Both values were derived from the distortion map $\Psi$.

Looking at Algorithm 9, one can see that the most expensive part is the for loop, which is executed $(m + 1)/2$ times. All the operations inside this loop have to be optimized for execution speed in order to achieve good performance of the algorithm. The most time-consuming operation inside this loop is the polynomial multiplication, and its implementation has a major impact on the overall execution time of the $\eta_T$ pairing. In particular, multiplication of $\mathbb{F}_{2^{4m}}$ values (line 7) is very time-critical, but it consists of multiplications in $\mathbb{F}_{2^m}$, which again emphasizes the importance of the base field multiplication. Using the Karatsuba method [74] for $\mathbb{F}_{2^{4m}}$ multiplication decreases the number of necessary operations to nine modular multiplications and some additions, which are very cheap in binary fields (just a bitwise exclusive-or of the coefficients of the polynomials). Inside the main loop there are also other important operations, such as squaring and calculation of the square root of a field element. However, these functions are not as complex as binary polynomial multiplication and, when efficiently implemented, can be performed much faster.

Table 5.3: Cost of the $\eta_T$ pairing on the curve $y^2 + y = x^3 + x$

              ModMuls   ModSquares   Square roots
Main loop     1904      544          544
Final exp.    114       139          0
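The nine-multiplication count for an $\mathbb{F}_{2^{4m}}$ product can be illustrated with a two-level Karatsuba split. The Python sketch below is not the Micro-pairings code: it stores binary polynomials as integers, lists coefficients lowest-first (the reverse of the vector notation used in the text), and assumes that the extension is defined by $z^4 + z + 1$, which follows from the usual relations $s^2 = s + 1$ and $t^2 = t + s$ with $t = z$ and $s = z^2 + z$. All function names are hypothetical.

```python
# Sketch only: ints are used as bit vectors (bit i = coefficient of z^i).
M = 271
F_POLY = (1 << 271) | (1 << 201) | 1   # f(z) = z^271 + z^201 + 1 (from the text)
MUL_COUNT = 0                          # counts base-field multiplications


def fmul(a, b):
    """Multiply in F_{2^271}: shift-and-XOR product, then reduction mod f."""
    global MUL_COUNT
    MUL_COUNT += 1
    c = 0
    while b:
        if b & 1:
            c ^= a
        a <<= 1
        b >>= 1
    for i in range(c.bit_length() - 1, M - 1, -1):
        if (c >> i) & 1:
            c ^= F_POLY << (i - M)
    return c


def kar2(a, b):
    """(a1*y + a0)(b1*y + b0) with 3 multiplications; returns [c0, c1, c2]."""
    lo = fmul(a[0], b[0])
    hi = fmul(a[1], b[1])
    mid = fmul(a[0] ^ a[1], b[0] ^ b[1]) ^ lo ^ hi
    return [lo, mid, hi]


def ext_mul(a, b):
    """Multiply two elements of F_{2^4m}; a[i] is the coefficient of z^i.

    ASSUMPTION: the extension is defined by z^4 + z + 1 (see lead-in).
    """
    a_sum = [a[0] ^ a[2], a[1] ^ a[3]]
    b_sum = [b[0] ^ b[2], b[1] ^ b[3]]
    low = kar2(a[:2], b[:2])       # 3 multiplications
    high = kar2(a[2:], b[2:])      # 3 multiplications
    cross = kar2(a_sum, b_sum)     # 3 multiplications
    c = [0] * 7
    for i in range(3):
        c[i] ^= low[i]
        c[i + 2] ^= cross[i] ^ low[i] ^ high[i]
        c[i + 4] ^= high[i]
    for k in (6, 5, 4):            # z^k = z^(k-3) + z^(k-4) since z^4 = z + 1
        c[k - 3] ^= c[k]
        c[k - 4] ^= c[k]
    return c[:4]


s = [0, 1, 1, 0]   # z^2 + z (reads the same in either coefficient order)
t = [0, 1, 0, 0]   # z; the text writes this highest-first as [0, 0, 1, 0]
MUL_COUNT = 0
s_squared = ext_mul(s, s)
print(s_squared, MUL_COUNT)   # [1, 1, 1, 0] 9, i.e. s^2 = s + 1 using 9 fmul calls
```

A schoolbook product of two four-coefficient polynomials would need 16 base-field multiplications, so the split saves seven of them at the cost of a few extra XORs.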

The last stage of the $\eta_T$ algorithm is the final exponentiation. This step is not very time-consuming, as it can be performed for the relatively low cost of $(m + 1)/2$ extension field squarings, four extension field multiplications, one extension field division and some other cheaper arithmetic operations.
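The operation count above can be related to the structure of the exponent. As a sketch, assume the final exponentiation raises the loop output to the usual power $(2^{4m} - 1)/N$, where $N = 2^m + 2^{(m+1)/2} + 1$ is the curve order given earlier. Algorithm 9 is not reproduced here, so this form of the exponent is an assumption. The exponent then factors as follows:

```latex
\begin{align*}
2^{4m} - 1 &= (2^{2m} - 1)(2^{2m} + 1), \\
2^{2m} + 1 &= \bigl(2^m + 2^{(m+1)/2} + 1\bigr)\bigl(2^m - 2^{(m+1)/2} + 1\bigr), \\
\frac{2^{4m} - 1}{N} &= (2^{2m} - 1)\bigl(2^m - 2^{(m+1)/2} + 1\bigr).
\end{align*}
```

Under this assumption, the factor $2^{2m} - 1$ would account for the single division, and the term $2^{(m+1)/2}$ for the $(m + 1)/2$ extension field squarings.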

5.3.2.1 Binary field arithmetic

The total cost of the $\eta_T$ pairing ($m = 271$) in terms of basic arithmetic operations is given in Table 5.3. Additions are not taken into consideration because they are fast in $\mathbb{F}_{2^m}$ (being just XOR operations). The key to an efficient $\eta_T$ implementation lies in the performance of binary polynomial multiplication. Therefore, Micro-pairings used assembly language routines for all the basic arithmetic operations on binary polynomials.

The binary polynomial multiplication in $\mathbb{F}_{2^{271}}$ was implemented using the optimized hierarchical method described in Section 3.4.1.1. The optimizations for particular hardware platforms were performed according to Section 3.4.1.1 as well. Other polynomial arithmetic operations, such as squaring, reduction and calculation of the square root, were also implemented in assembly language on all three target platforms. Modular reduction and calculation of the square root were optimized specifically for the irreducible polynomial $f(z) = z^{271} + z^{201} + 1$. Reduction modulo $f(z)$ was implemented based on Algorithm 3, and squaring of binary polynomials was performed as described in Section 3.4.2. Calculation of the square root in $\mathbb{F}_{2^{271}}$ was implemented using the techniques from Section 3.4.3. Table 5.4 summarizes the performance of the main arithmetic routines in $\mathbb{F}_{2^{271}}$ on all three target processors. All values are in clock cycles of the given CPU, and the multiplication and squaring routines also include the reduction operation.
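To show why the trinomial $f(z) = z^{271} + z^{201} + 1$ makes reduction, squaring and square roots cheap, here is a minimal Python sketch of the standard techniques. Algorithm 3 and Sections 3.4.2 and 3.4.3 are not reproduced here, so it is an assumption that they follow the same approach; the actual routines are word-oriented assembly, and all names below are hypothetical.

```python
# Sketch only: binary polynomials are Python ints (bit i = coefficient of z^i).
M, K = 271, 201                     # f(z) = z^271 + z^201 + 1 (from the text)
MASK = (1 << M) - 1


def reduce_trinomial(c):
    """Reduce c modulo f(z) by folding the high part: z^271 = z^201 + 1."""
    while c >> M:
        high = c >> M
        c = (c & MASK) ^ high ^ (high << K)
    return c


def square(a):
    """Square in F_{2^271}: insert a zero between adjacent bits, then reduce."""
    spread = 0
    i = 0
    while a >> i:
        if (a >> i) & 1:
            spread |= 1 << (2 * i)
        i += 1
    return reduce_trinomial(spread)


def pack(a, parity):
    """Collect bits parity, parity + 2, ... of a into a half-length polynomial."""
    out = 0
    for i in range(parity, M, 2):
        if (a >> i) & 1:
            out |= 1 << (i // 2)
    return out


def sqrt(a):
    """Square root in F_{2^271} via the even/odd split (a must be reduced).

    a = E(z)^2 + z * O(z)^2, so sqrt(a) = E + sqrt(z) * O. For this trinomial
    sqrt(z) = z^136 + z^101, since (z^136 + z^101)^2 = z * (z^271 + z^201) = z.
    The result has degree at most 270, so no further reduction is needed.
    """
    even, odd = pack(a, 0), pack(a, 1)
    return even ^ (odd << 136) ^ (odd << 101)


a = (1 << 270) | (0xC0FFEE << 100) | 0x1D
print(sqrt(square(a)) == a)
```

Squaring needs no multiplications at all, and the square root reduces to two shifts and two XORs once the even and odd bits are separated, which is consistent with both routines being much cheaper than multiplication in Table 5.4.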

Table 5.4: Timings in clock cycles for modular arithmetic routines in $\mathbb{F}_{2^{271}}$

             ATmega128                MSP430                   PXA271
             Mul     Sqr    Sqrt      Mul     Sqr    Sqrt      Mul     Sqr    Sqrt
Assembly     13557   1581   1730      10147   1363   1644      4926    499    546
C code       66271   4711   12021     40666   3667   11212     13183   2375   2496
Decrease     80%     66%    86%       75%     63%    85%       62%     79%    78%

All the figures in Table 5.4 were obtained in simulation environments: AVR Studio (ATmega128) and IAR Embedded Workbench (MSP430 and PXA271). Different compilers (for example gcc and the IAR compiler) gave significantly different results at the same optimization level. That is why the same compiler settings were used in all cases. The optimization flag -O0 was set during all simulations, so the results in Table 5.4 can be directly compared with other implementations regardless of the compiler used. Results achieved with the assembly language routines are compared with a C-only implementation to show the savings in execution time.

As can be seen in Table 5.4, the difference between the standard C functions and the specially optimized assembly routines is significant. Handcrafted code improved execution time on all tested hardware platforms. All operation timings decreased by between 62% and 86%. Square root computation was around four to seven times faster and, most significantly for the $\eta_T$ algorithm, polynomial multiplication was up to five times faster. Field-specific assembly code gives the maximum speed-up for the $\eta_T$ pairing algorithm. The timings in clock cycles for the $\eta_T$ pairing, together with memory occupation on all three processors, are presented in Tables 5.5 and 5.6.
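The speed-up factors quoted above follow directly from the cycle counts in Table 5.4. A short Python sketch of the arithmetic, with the numbers copied from the table and hypothetical helper names:

```python
# Cycle counts copied from Table 5.4: (assembly, C) per platform and operation.
cycles = {
    "ATmega128": {"Mul": (13557, 66271), "Sqr": (1581, 4711), "Sqrt": (1730, 12021)},
    "MSP430":    {"Mul": (10147, 40666), "Sqr": (1363, 3667), "Sqrt": (1644, 11212)},
    "PXA271":    {"Mul": (4926, 13183),  "Sqr": (499, 2375),  "Sqrt": (546, 2496)},
}


def speedup(asm, c):
    """How many times faster the assembly routine is than the C routine."""
    return c / asm


def decrease(asm, c):
    """Fraction of the C cycle count saved by the assembly routine."""
    return 1 - asm / c


for cpu, ops in cycles.items():
    for op, (asm, c) in ops.items():
        print(f"{cpu:10s} {op:4s} x{speedup(asm, c):.2f}  -{100 * decrease(asm, c):.1f}%")
```

The two measures are related by decrease = 1 - 1/speedup, so a fivefold speed-up corresponds to an 80% decrease in cycle count.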

Table 5.5: Performance of the $\eta_T$ pairing on ATmega128 and MSP430

             ATmega128                          MSP430
             Cycles       ROM       Stack       Cycles       ROM       Stack
Assembly     19,660,993   47.41KB   3.17KB      14,097,304   23.66KB   4.17KB
C code       80,608,843   41.23KB   3.17KB      50,684,686   23.01KB   4.17KB
Decrease     76%          -15%      0%          72%          -3%       0%

With the introduction of specially optimized arithmetic routines, Micro-pairings calculates the $\eta_T$ pairing 65-76% faster. In the best case (the ATmega128 CPU) the execution time was reduced by 76% (Table 5.5). Optimization of critical routines in assembly language leads to a large performance increase on embedded microcontrollers. On standard desktop computers, savings of around 20-30% are usually possible when using assembly language.

Table 5.6: Performance of the $\eta_T$ pairing on the PXA271

             PXA271
             Cycles       ROM       Stack
Assembly     6,002,134    29.55KB   4.12KB
C code       16,974,044   37.24KB   4.12KB
Decrease     65%          20%       0%

The results presented in Tables 5.5 and 5.6 are especially significant because in almost all cases the same level of memory usage was achieved. The memory requirements for the $\eta_T$ pairing on the three platforms tested are reasonable considering the complexity of the operations. Stack usage remained the same in all implementations, as the assembly routines did not use any additional variables. RAM utilization may seem high, but the memory is reserved only for the duration of the pairing calculation. After that, all of that RAM is released and can be reused for other purposes. The stack sizes presented in Tables 5.5 and 5.6 are also peak values during program execution; average stack utilization was usually 60% of those values. The increase in ROM usage is considerable only on the ATmega128 platform (15%), which also shows the best performance gain. For the MSP430 processor, the 3% increase in ROM utilization is negligible, as it comes with a 72% improvement in execution time. On the PXA271 microcontroller the assembly routines resulted in a 20% decrease in program code size.
