Leakage resilience can be seen as a complementary approach to masking. Indeed, masking (and shuffling) reduces the amount of information leaked per execution. But, given sufficient measurements (and limited computational power), a successful attack can always be performed. The aim of leakage resilience is to limit the total amount of information that an implementation will leak. Hence, a successful attack can only be mounted if enough computational power is available.
In secret-key cryptography, most leakage-resilient constructions use re-keying strategies, i.e. the key is changed after a limited number of computations, as first suggested by Kocher [112]. Examples of such primitives include Pseudo-Random Generators (PRGs) [63, 66, 148, 179, 180, 195, 196] and Pseudo-Random Functions (PRFs) [2, 59, 66, 133, 180, 195].
In this thesis, we will consider the block-cipher-based PRG and PRF illustrated in Figures 2.11 and 2.12, respectively. For every master key $k^{(i)}$, the PRG produces a new master key $k^{(i+1)}$ and $n-1$ strings $c^{(i)}_0, c^{(i)}_1, \dots, c^{(i)}_{n-2}$, all obtained by encrypting $n$ public plaintexts $p_0, \dots, p_{n-1}$ with the block cipher $E$ under the master key $k^{(i)}$.
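As an illustration of the re-keying PRG described above, here is a minimal Python sketch. It is not the implementation evaluated in this thesis, and several parts of it are assumptions:

- The block cipher E is replaced by a stand-in (HMAC-SHA256 truncated to 128 bits, from the standard library), so that the example runs without an AES package.
- The public plaintexts are the hypothetical encodings p_j = j.
- Reserving p_0 for the key update is also an assumption, since the text does not fix which plaintext produces k(i+1).

```python
import hashlib
import hmac

BLOCK_BYTES = 16  # 128-bit blocks, as for AES


def E(key: bytes, block: bytes) -> bytes:
    """Stand-in for the block cipher E (AES-128 in the thesis).

    Assumption: HMAC-SHA256 truncated to 128 bits, used only so that this
    sketch runs with the Python standard library.
    """
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK_BYTES]


def public_plaintexts(n: int) -> list:
    """Hypothetical choice of the n public plaintexts p_0, ..., p_{n-1}."""
    return [j.to_bytes(BLOCK_BYTES, "big") for j in range(n)]


def lr_prg(k0: bytes, n: int, rounds: int) -> tuple:
    """Run `rounds` steps of the re-keying PRG starting from master key k0.

    Each master key k(i) is used for exactly n encryptions: one derives the
    next master key k(i+1), the other n-1 give the outputs c(i),0..c(i),n-2.
    Returns the output blocks and the current master key (the PRG state).
    """
    p = public_plaintexts(n)
    k = k0
    outputs = []
    for _ in range(rounds):
        # Assumption: p_0 is the plaintext reserved for the key update.
        k_next = E(k, p[0])
        outputs.extend(E(k, p[j]) for j in range(1, n))
        k = k_next  # k(i) is never used again after its n encryptions
    return outputs, k
```

For instance, `lr_prg(bytes(16), 4, 3)` returns 9 output blocks together with the master key k(3). Each of k(0), k(1) and k(2) is used in exactly n = 4 encryptions before being discarded. This statefulness is what bounds the number of measurements per key.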
For the PRF, we use the tree-based construction of Goldreich, Goldwasser and Micali [78]. Each stage incorporates $\log_2(n)$ input bits and computes $k^{(i+1)} = E_{k^{(i)}}(p_j)$, where the public plaintext $p_j$ is selected by these input bits. Following [133], the last stage is optionally completed by a whitening step. This limits the data complexity of attacks targeting the PRF output to one, which matters typically when using large values of $n$.
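A similarly hedged sketch of the tree-based PRF follows. The input x is split into 128/log2(n) chunks of log2(n) bits, and each chunk selects the public plaintext encrypted under the current key. The stand-in E, the encoding of p_j, and the whitening plaintext `WHITENING_P` are all assumptions made for illustration.

```python
import hashlib
import hmac

BLOCK_BYTES = 16  # 128-bit blocks, as for AES


def E(key: bytes, block: bytes) -> bytes:
    """Stand-in for AES-128 (assumption: truncated HMAC-SHA256)."""
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK_BYTES]


# Hypothetical public plaintext used for the optional output whitening.
WHITENING_P = b"\xff" * BLOCK_BYTES


def lr_prf(k: bytes, x: bytes, n: int = 256, whitening: bool = True) -> bytes:
    """GGM-style tree PRF: each stage absorbs log2(n) bits of the input x."""
    bits = n.bit_length() - 1  # log2(n) when n is a power of two
    if n < 2 or n != 1 << bits or (8 * len(x)) % bits != 0:
        raise ValueError("n must be a power of two whose log2 divides the input size")
    xi = int.from_bytes(x, "big")
    stages = 8 * len(x) // bits  # 128 / log2(n) for a 128-bit input
    for s in range(stages):
        # Next log2(n)-bit chunk of x, most significant chunk first.
        j = (xi >> ((stages - 1 - s) * bits)) & (n - 1)
        p_j = j.to_bytes(BLOCK_BYTES, "big")  # hypothetical encoding of p_j
        k = E(k, p_j)  # k(s+1) = E_{k(s)}(p_j)
    if whitening:
        k = E(k, WHITENING_P)
    return k
```

With the default n = 256, a 128-bit input goes through 16 stages plus the whitening call, as in Figure 2.12. The function keeps no state, so calling it again with the same k and x repeats exactly the same computation.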
Quite naturally, there is a simple security versus efficiency tradeoff for both types of constructions.

- In the first (PRG) case, we produce a 128-bit output every $\frac{n}{n-1}$ AES encryptions.
- In the second (PRF) case, we produce a 128-bit output every $\frac{128}{\log_2(n)}$ AES encryptions (+1 if output whitening is used).

The details of these primitives are not necessary for understanding this work. The only important feature in our discussions is that the PRG construction is stateful while the PRF one is stateless. As a result, the PRG limits the number of measurements that a side-channel adversary can perform with the same key, while the PRF limits his data complexity (i.e. the number of plaintexts that can be observed). In practice, this means that in the latter case, the same measurement can be repeated multiple times, e.g. in order to get rid of the physical noise through averaging. As discussed by Medwed et al. in [133], Section 3, this may lead to significant differences in terms of security against DPA.
Figure 2.11: Leakage-resilient PRG.
Figure 2.12: Leakage-resilient PRF.
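The cost side of this tradeoff is simple arithmetic. The small sketch below computes the number of block-cipher calls per 128-bit output for both constructions, directly from the two formulas above. The function names are hypothetical.

```python
import math


def prg_cost(n: int) -> float:
    """Block-cipher calls per 128-bit PRG output: n calls yield n-1 outputs."""
    return n / (n - 1)


def prf_cost(n: int, whitening: bool = True) -> float:
    """Block-cipher calls per 128-bit PRF output: 128/log2(n) stages (+1)."""
    return 128 / math.log2(n) + (1 if whitening else 0)
```

For example:

- With n = 256, `prg_cost(256)` is 256/255 (about 1.004) while `prf_cost(256)` is 17.0.
- With n = 2, the costs are 2.0 and 129.0.

A smaller n means each key is exposed to fewer traces (PRG) or fewer distinct plaintexts (PRF), at a higher cost per output.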
Part I
Chapter 3
Evaluation of State-of-the-Art Masking Schemes
In this chapter, we investigate the performance gap between masking and MultiParty Computation (MPC) in the relevant case of AES implementations on an 8-bit microcontroller. We consider two different directions.
First, we compared a number of existing schemes.¹ Our selection was motivated by the following two criteria:

(i) exclude “broken” proposals (i.e. with low-order weaknesses), such as the multiplicative masking in [79], the higher-order masking in [175] (broken in [44]), or Goubin and Martinelli’s proposal in [82] (broken in [46]);

(ii) exclude schemes that do not systematically generalize to higher orders, such as the affine masking in [71, 190], the threshold implementations in [140],² and several ideas from the “early” DPA literature (see [123] for a survey).³
This essentially leaves us with four schemes:

- the scheme introduced in the background chapter, i.e. the Boolean masking over extensions of $\mathbb{F}_2$ (RivP) [169];
- its optimization by Kim et al. using extension fields for the AES S-box implementation (KHL) [105];
- the switching between additive and multiplicative masking (GPQ) [73];
- the MPC-inspired proposal (RocP) [170].

We implemented these different schemes up to the 10th security order. The results illustrate a large gap between the MPC-inspired RocP (for which we additionally propose a slight optimization) and the other masking schemes.

¹Since I started this thesis, several new higher-order schemes have been proposed. However, they do not change the conclusions of this work; the most interesting directions are [41, 45].
²Recent research has nevertheless been done in that direction [24, 166].
³We also excluded the recently proposed “inner product” masking scheme from [10], although it is certainly an interesting direction for further investigation. This scheme was shown to have a flaw [155], which was fixed in [9].
Motivated by this large performance gap, we investigated a standard solution used in the MPC literature to improve performance, namely “packed secret sharing” [70]. In particular, we evaluated the extent to which the techniques proposed by Damgård et al. in [51] could be used to enhance the performance of shared AES implementations, and how this performance gain depends on the order d. Intuitively, the idea of packed secret sharing is to “hide” several secrets (e.g. key bytes) in a single high-degree polynomial. This leads to more efficient computations when operations on these secrets can be performed in parallel. We show that such a technique is indeed useful for protecting the AES S-boxes, and exhibit the linear amortized complexity that it allows. Yet, we also show that this amortized complexity only becomes beneficial for quite large orders. Such large orders are not used today but may become relevant in the future.
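To make the “several secrets in one polynomial” idea concrete, here is a minimal sketch of packed Shamir sharing over GF(2^8) with the AES reduction polynomial. It is a textbook illustration, not the protocol of Damgård et al. [51] nor the implementation evaluated in this chapter.

Assumptions of the sketch:

- k secret bytes are placed at the hypothetical evaluation points 1..k.
- t uniformly random values are placed at points k+1..k+t, with t playing the role of the privacy threshold.
- Randomness comes from Python's `secrets` module.

```python
import secrets


def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def gf_inv(a: int) -> int:
    """Multiplicative inverse, computed as a^254 (requires a != 0)."""
    r, e = 1, 254
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r


def interpolate(points, x):
    """Evaluate at x the polynomial of degree < len(points) through `points`."""
    acc = 0
    for i, (xi, yi) in enumerate(points):
        num = den = 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                num = gf_mul(num, x ^ xj)  # subtraction is XOR in GF(2^8)
                den = gf_mul(den, xi ^ xj)
        acc ^= gf_mul(yi, gf_mul(num, gf_inv(den)))
    return acc


def packed_share(values, t, share_points):
    """Hide the k bytes of `values` in one polynomial of degree k + t - 1.

    Shares are evaluations at `share_points`, which must avoid 1..k so that
    no share directly equals a secret.
    """
    k = len(values)
    anchors = list(zip(range(1, k + 1), values))
    anchors += [(x, secrets.randbelow(256)) for x in range(k + 1, k + t + 1)]
    return [(x, interpolate(anchors, x)) for x in share_points]


def packed_reconstruct(shares, k):
    """Recover the k secrets from at least k + t shares."""
    return [interpolate(shares, x) for x in range(1, k + 1)]
```

Adding two sharings share by share yields a sharing of the XOR of the secrets, so a single operation on the shares processes all k bytes at once. Multiplication is different. It doubles the polynomial degree to 2(k + t - 1), so it needs at least 2(k + t - 1) + 1 shares and a degree-reduction step, which this sketch omits.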
Finally, we tackled a problem that is usually neglected in the literature on masking, namely the randomness requirements. First, we briefly discuss the impact of slight defects in the Random Number Generator (RNG) used to produce fresh shares. In particular, we provide an information-theoretic evaluation of two cases: (i) the RNG has a small bias, and (ii) a counter is used to generate equally likely but predictable outputs. This evaluation naturally suggests that uniform randomness is a strong requirement for the security of masking (and MPC). Then, we evaluate the performance of our different masking schemes again, this time including the cost of (strong-enough) randomness generation.
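The effect of a biased RNG (case (i)) can be illustrated with a deliberately simplified model. This is not the evaluation performed in this chapter. The model assumes:

- a single secret bit S is Boolean-masked at order d;
- each of the d mask bits equals 1 with probability 1/2 + eps;
- the adversary observes the share X0 = S xor M1 xor ... xor Md without any noise.

The sketch computes the mutual information I(S; X0) by exhaustive enumeration.

```python
import itertools
import math


def mi_biased_masking(d: int, eps: float) -> float:
    """I(S; X0) in bits for one share of a d-th order Boolean masking.

    S is a uniform secret bit and X0 = S xor M1 xor ... xor Md, where each
    mask bit Mi equals 1 with probability 1/2 + eps (a biased RNG).
    The adversary is assumed to observe X0 exactly (no noise).
    """
    joint = {(s, y): 0.0 for s in (0, 1) for y in (0, 1)}
    for s in (0, 1):
        for masks in itertools.product((0, 1), repeat=d):
            prob = 0.5  # P(S = s)
            y = s
            for m in masks:
                prob *= (0.5 + eps) if m else (0.5 - eps)
                y ^= m
            joint[(s, y)] += prob
    p_s = {s: joint[(s, 0)] + joint[(s, 1)] for s in (0, 1)}
    p_y = {y: joint[(0, y)] + joint[(1, y)] for y in (0, 1)}
    return sum(p * math.log2(p / (p_s[s] * p_y[y]))
               for (s, y), p in joint.items() if p > 0)
```

By the piling-up lemma, the XOR of the d biased masks has bias 2^(d-1) * eps^d. The result therefore equals 1 - h(1/2 + 2^(d-1) * eps^d), where h is the binary entropy. It shrinks quickly with d but is nonzero whenever eps > 0. In the same noiseless model, a counter-generated mask that the adversary can predict leaks the full bit.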
Overall, these results allow an implementer to decide which state-of-the-art masking scheme to use and why, depending on his security goals (in terms of order of the scheme and glitch-freeness) and performance constraints.
Methodology. As is clear from the previous introduction, our goal is to compare the performance of a large number of masked implementations, up to high security orders. Relying exclusively on optimized assembly language was out of reach in this context. As a result, we systematically relied on C language descriptions. We paid particular attention to optimizing them so that their compilation on an 8-bit device was close to that of published implementations. In particular, we used the AVR-GCC compiler (with option -O2) to obtain code for an Atmel ATmega644P 8-bit microcontroller. For each implementation published by independent authors (e.g. in [73, 105, 169]), we made sure that our performance figures were comparable up to a factor of two