
Secure multiparty computation (SMC) refers to a computation performed by two or more mutually untrusted parties [10, 11]. Each party owns some private data which it is not willing to share with other parties. However, all the parties are interested in performing a computation on the union of data belonging to individual parties. An example of such a situation may be a taxation office and a social security department which are both interested in mining their joint data but are legally precluded from sharing confidential individual information without the explicit consent of their clients.

The SMC problem was originally introduced in 1982 by Yao [11] and has been extensively studied since then. In addition to privacy-preserving data mining, the SMC problem has applications in other areas, including the security of statistical databases and private information retrieval (PIR).

To illustrate SMC in privacy-preserving data mining, we describe the computation of a secure sum. This example is often used both to illustrate the concepts of SMC and to show how the system can be subverted if the parties collude [12].

11 Privacy-Preserving Data Mining 157

In our example there are $s$ parties (sites), and site $i$ owns a value $x_i$ that it is not willing to share with the other parties. Suppose that the sum $x = \sum_{i=1}^{s} x_i$ lies in the range $[0, n-1]$. Site 1 generates a random number $R_1$ uniformly from $[0, n-1]$, computes $R_2 = (R_1 + x_1) \bmod n$ and sends $R_2$ to site 2. Like $R_1$, $R_2$ is uniformly distributed over $[0, n-1]$, so site 2 learns nothing about $x_1$. Site 2 then computes $R_3 = (R_2 + x_2) \bmod n$ and sends it to site 3, and so on. Finally, site $s$ receives $R_s$, computes $R_{s+1} = (R_s + x_s) \bmod n$ and sends it back to site 1. Site 1 then calculates $x = (R_{s+1} - R_1) \bmod n$ and sends $x$ to all the parties. Note that each party is assumed to use its correct value $x_i$.
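The ring protocol can be simulated in a few lines of Python. This minimal sketch runs every site in one process. The function name `secure_sum` and the use of `secrets.randbelow` for the mask are our own choices. We also assume the true total lies in $[0, n-1]$, so the final reduction modulo $n$ recovers it exactly.

```python
import secrets


def secure_sum(values: list[int], n: int) -> int:
    """Simulate the ring secure sum over sites 1..s (all in one process).

    Assumes the true total lies in [0, n - 1]; otherwise the result wraps modulo n.
    """
    r1 = secrets.randbelow(n)            # site 1's private mask R_1, uniform on [0, n)
    message = r1
    for x_i in values:                   # site i receives R_i ...
        message = (message + x_i) % n    # ... and forwards R_{i+1} = (R_i + x_i) mod n
    # Site s sends R_{s+1} back to site 1, which removes its mask.
    return (message - r1) % n


print(secure_sum([5, 12, 7], 100))  # 24, whatever mask was drawn
```

Each intermediate message is uniformly random when viewed alone. The mask still cancels at the end, so the printed result is always 24.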

If there is no collusion, party $i$ learns only the total sum $x$. From it, the party can also calculate $(x - x_i) \bmod n$, the sum of the values of all the other parties. However, if two or more parties collude, they can disclose more information. For example, if the two neighbors of party $i$ (parties $i-1$ and $i+1$) collude, they can learn $x_i = (R_{i+1} - R_i) \bmod n$. The protocol can be extended so that each party divides its value into $m$ shares and the sum of each share is computed separately. The ordering of the parties is different for each share, so that no party has the same neighbors twice. The larger the number of shares, the more colluding parties are required to subvert the protocol, but the slower the computation. In general, collusion is considered a serious threat to SMC.
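The sketch below makes the same assumptions as the ring protocol and uses hypothetical helper names of our own. It does three things:

- records the messages $R_1, \dots, R_{s+1}$ of one run;
- shows how the neighbors of site $i$ (for $2 \le i \le s$) recover $x_i$ by pooling what they saw;
- runs the share-based variant.

For simplicity, the share-based variant draws a fresh random ordering for each share. That alone does not guarantee that no party has the same neighbors twice.

```python
import random
import secrets


def ring_messages(values: list[int], n: int) -> list[int]:
    """Return [R_1, R_2, ..., R_{s+1}] for one run of the ring secure sum."""
    msgs = [secrets.randbelow(n)]              # R_1, known only to the initiating site
    for x_i in values:
        msgs.append((msgs[-1] + x_i) % n)      # R_{i+1} = (R_i + x_i) mod n
    return msgs


def neighbors_recover(msgs: list[int], i: int, n: int) -> int:
    """Collusion of sites i-1 and i+1 (1-based, 2 <= i <= s).

    Site i-1 sent R_i = msgs[i-1]; site i+1 received R_{i+1} = msgs[i].
    """
    return (msgs[i] - msgs[i - 1]) % n


def split_into_shares(x: int, m: int, n: int) -> list[int]:
    """Split x into m random shares whose sum is x mod n."""
    shares = [secrets.randbelow(n) for _ in range(m - 1)]
    shares.append((x - sum(shares)) % n)
    return shares


def secure_sum_with_shares(values: list[int], m: int, n: int) -> int:
    """Run one ring per share index, with a fresh random site ordering each time."""
    shares = [split_into_shares(x, m, n) for x in values]   # shares[i][j]: share j of site i
    total = 0
    for j in range(m):
        order = list(range(len(values)))
        random.shuffle(order)        # new ordering; distinct neighbors are NOT enforced here
        msgs = ring_messages([shares[i][j] for i in order], n)
        total += (msgs[-1] - msgs[0]) % n    # the ring's initiator removes its mask
    return total % n


values, n = [5, 12, 7], 100
msgs = ring_messages(values, n)
print(neighbors_recover(msgs, 2, n))          # 12: site 2's private value leaks
print(secure_sum_with_shares(values, 4, n))   # 24
```

Against the share-based version, the same two colluders learn only one share of $x_i$ from any single ring. For $m \ge 2$, that share is uniformly distributed when viewed alone.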

In the last few years, a number of SMC algorithms for various data mining tasks have been proposed [13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]. Most of these algorithms rely on the same small set of primitive computations:

- secure sum;
- secure set union;
- secure size of set intersection;
- secure scalar product.

Clifton et al. have begun building a toolkit of such basic computation techniques, in order to facilitate the development of more complex, application-specific privacy-preserving techniques [12]. We next describe some of these application-specific techniques and, where applicable, specify which primitive computation each one uses.
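One common way to realize a secure set union is commutative encryption. Encrypting with key $a$ and then key $b$ gives the same result as the reverse order, so identical items from different parties end up with identical ciphertexts. The sketch uses modular exponentiation modulo a prime as the commutative cipher. Several things about it are our own assumptions:

- This cipher is not necessarily the construction used in [12].
- The modulus is a toy choice.
- Items are assumed to be integers in $[1, P-1]$ (for example, hashed item identifiers).
- All parties are simulated in one process.

It therefore shows only why duplicates can be merged without decrypting them.

```python
import math
import secrets

P = 2**61 - 1  # a Mersenne prime; toy modulus, not a vetted cryptographic parameter


def keygen() -> tuple[int, int]:
    """Return (e, d) with e*d = 1 (mod P-1), so x -> x^e mod P is invertible on [1, P-1]."""
    while True:
        e = secrets.randbelow(P - 3) + 2
        if math.gcd(e, P - 1) == 1:
            return e, pow(e, -1, P - 1)


def secure_set_union(local_sets: list[set[int]]) -> set[int]:
    """Simulated union: in the real protocol each party applies only its own key."""
    keys = [keygen() for _ in local_sets]
    encrypted = set()
    for items in local_sets:
        for x in items:
            c = x
            for e, _ in keys:            # every party adds its layer of encryption
                c = pow(c, e, P)
            # (x^e1)^e2 == (x^e2)^e1, so equal items collide here regardless of owner.
            encrypted.add(c)
    result = set()
    for c in encrypted:
        for _, d in keys:                # every party strips its own layer
            c = pow(c, d, P)
        result.add(c)
    return result


print(sorted(secure_set_union([{1, 2, 3}, {3, 4}, {4, 5}])))  # [1, 2, 3, 4, 5]
```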

Secure multiparty computation for association rule mining has been studied in [13, 14, 15, 16]. The task here is to develop an SMC protocol for finding the global support count of an item set. For data that is vertically partitioned among parties and has boolean attribute values, finding the frequency of an item set is equivalent to computing a secure scalar product [14]. For horizontally partitioned data, finding the frequency of an item set reduces to a secure set union [15].
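To see why the vertically partitioned case reduces to a scalar product, suppose (hypothetically) that one party holds the boolean columns for the items bread and milk, and another holds the column for eggs. The rows are aligned by transaction. Each party ANDs its own columns into a 0/1 indicator vector, and the support of the combined item set is the scalar product of the two vectors. This sketch computes that product in the clear. A secure scalar product protocol would produce the same number without either party revealing its vector.

```python
from math import prod


def local_indicator(columns: list[list[int]]) -> list[int]:
    """AND together one party's 0/1 columns: 1 where a transaction has all its items."""
    return [prod(row) for row in zip(*columns)]


def itemset_support(x: list[int], y: list[int]) -> int:
    """Support of the joint item set = scalar product of the parties' indicator vectors."""
    return sum(a * b for a, b in zip(x, y))


# Hypothetical data: five transactions, rows aligned across the two parties.
bread = [1, 1, 0, 1, 1]
milk = [1, 0, 0, 1, 1]   # party 1 holds bread and milk
eggs = [1, 1, 1, 0, 1]   # party 2 holds eggs

print(itemset_support(local_indicator([bread, milk]), local_indicator([eggs])))  # 2
```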

An algorithm for SMC of association rules that prevents a $k$-compromise is presented in [13]. Here, a $k$-compromise refers to the disclosure of a statistic based on fewer than $k$ participants (for more details see Chap. 12, Sect. 12.2). However, this algorithm is not resistant to colluding participants.

158 L. Brankovic, Md.Z. Islam, H. Giggins

Another technique for horizontally partitioned datasets [16] relies on the fact that a global frequent item set (GFI) must be a frequent item set in at least one of the partitions [25]. GFIs are itemsets whose global support count exceeds a user-defined threshold. The technique proceeds as follows:

1. The parties locally compute the maximal frequent itemsets (MFIs) of their own partitions.
2. A trusted third party computes the union of all these local MFIs.
3. The parties locally compute the support counts of all possible subsets of each MFI in the union.
4. The global support count of each itemset is obtained using the secure sum computation.

GFIs can be used for various purposes, including the discovery of association rules and correlations.
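The GFI pipeline can be sketched end to end on toy data. The function names are hypothetical, and the sketch rests on these assumptions:

- Support thresholds are relative: an itemset is frequent when its count exceeds a fraction `theta` of the transactions. This keeps the "frequent in at least one partition" property valid.
- Local MFIs are found by brute force, which is only workable for tiny partitions.
- A plain set union stands in for the trusted third party.
- `secure_sum` is an in-process simulation of the ring protocol.

```python
import secrets
from itertools import combinations


def subsets(itemset):
    """All non-empty subsets of an itemset, as frozensets."""
    items = sorted(itemset)
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


def count(db, itemset):
    return sum(1 for t in db if itemset <= t)


def local_mfis(db, theta):
    """Brute-force maximal frequent itemsets of one partition."""
    items = frozenset().union(*db)
    frequent = [s for s in subsets(items) if count(db, s) > theta * len(db)]
    return {s for s in frequent if not any(s < t for t in frequent)}


def secure_sum(values, n):
    """Ring secure sum simulated in one process (total assumed to be < n)."""
    mask = secrets.randbelow(n)
    running = mask
    for v in values:
        running = (running + v) % n
    return (running - mask) % n


def global_frequent_itemsets(partitions, theta):
    # Step 2: union of local MFIs (a trusted third party in the original technique).
    union_of_mfis = set().union(*(local_mfis(db, theta) for db in partitions))
    # Step 3: every subset of every MFI is a candidate, counted locally by each party.
    candidates = {s for m in union_of_mfis for s in subsets(m)}
    total = sum(len(db) for db in partitions)
    gfis = {}
    for c in candidates:
        # Step 4: secure sum of local counts; modulus exceeds any possible count.
        g = secure_sum([count(db, c) for db in partitions], total + 1)
        if g > theta * total:
            gfis[c] = g
    return gfis


p1 = [frozenset({"a", "b"}), frozenset({"a", "b", "c"}), frozenset({"a"})]
p2 = [frozenset({"b", "c"}), frozenset({"c"}), frozenset({"b", "c"})]
result = global_frequent_itemsets([p1, p2], 0.5)
print(sorted((sorted(s), c) for s, c in result.items()))  # [(['b'], 4), (['c'], 4)]
```

Here the local MFIs are {a, b} and {b, c}. Neither pair is globally frequent, but their subsets {b} and {c} are.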

Building a decision tree on horizontally partitioned data using oblivious transfer was proposed in [21]. The protocol uses the well-known ID3 algorithm for building decision trees. Each party performs most of the computations independently on its own database, which increases the efficiency of the protocol. The results of these independent computations are then combined using an efficient cryptographic protocol that is based on oblivious transfer and specifically tailored to ID3.
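Oblivious transfer (OT) is the building block here. In 1-out-of-2 OT, a sender holds two messages and the receiver obtains exactly one of its choice, while the sender does not learn which. The sketch below is the RSA-based 1-out-of-2 OT commonly attributed to Even, Goldreich and Lempel. It uses toy primes far too small for real use and simulates both parties in one function. It illustrates only the primitive. The ID3 protocol of [21] composes OT into a much larger computation that we do not reproduce.

```python
import secrets

# Toy RSA parameters: far too small for real use, chosen only so the sketch runs instantly.
P, Q = 999_983, 1_000_003
N = P * Q
E = 65_537
D = pow(E, -1, (P - 1) * (Q - 1))


def oblivious_transfer(m0: int, m1: int, choice: int) -> int:
    """Receiver obtains m_choice (messages must be < N); sender never sees `choice`."""
    # Sender: publishes (N, E) and two random values.
    x0, x1 = secrets.randbelow(N), secrets.randbelow(N)
    # Receiver: blinds its chosen x with k^E and sends v.
    k = secrets.randbelow(N)
    v = ((x1 if choice else x0) + pow(k, E, N)) % N
    # Sender: "decrypts" v against both x's; only the chosen one yields k.
    k0 = pow((v - x0) % N, D, N)
    k1 = pow((v - x1) % N, D, N)
    c0, c1 = (m0 + k0) % N, (m1 + k1) % N
    # Receiver: knows k, so it can unmask only the ciphertext it chose.
    return ((c1 if choice else c0) - k) % N


print(oblivious_transfer(11, 22, 1))  # 22
```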

A secure multiparty computation protocol for a naive Bayesian classifier on horizontally partitioned data, relying on the secure sum, was proposed in [15]. The same paper also provides an extension, based on a secure algorithm for evaluating a logarithm [21], to enhance security.
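Naive Bayes needs only counts, and on horizontally partitioned data each global count is a sum of local counts. The secure sum therefore fits naturally. The sketch below shows that idea under our own assumptions, and it is not the exact protocol of [15]:

- Which count keys exist is treated as public.
- The aggregated counts are revealed to all parties.
- Laplace smoothing assumes binary-valued attributes.

The data and function names are hypothetical.

```python
import math
import secrets
from collections import Counter


def secure_sum(values: list[int], n: int) -> int:
    """In-process simulation of the ring secure sum (total assumed to be < n)."""
    mask = secrets.randbelow(n)
    running = mask
    for v in values:
        running = (running + v) % n
    return (running - mask) % n


def local_counts(records):
    """Class counts and (attribute, value, class) counts for one party's records."""
    counts = Counter()
    for features, label in records:
        counts[("class", label)] += 1
        for attr, value in features.items():
            counts[(attr, value, label)] += 1
    return counts


def train(partitions):
    """Combine the parties' counts with one secure sum per count."""
    local = [local_counts(db) for db in partitions]
    n = sum(len(db) for db in partitions) + 1   # no count can exceed the record total
    keys = set().union(*local)                  # assumption: the key set is not secret
    return {key: secure_sum([c[key] for c in local], n) for key in keys}


def classify(model, features, labels):
    """Return the label with the highest naive Bayes log score."""
    total = sum(model.get(("class", y), 0) for y in labels)

    def log_score(y):
        class_count = model.get(("class", y), 0)
        score = math.log((class_count + 1) / (total + len(labels)))
        for attr, value in features.items():
            # Laplace smoothing; the "+ 2" assumes every attribute is binary-valued.
            score += math.log((model.get((attr, value, y), 0) + 1) / (class_count + 2))
        return score

    return max(labels, key=log_score)


party1 = [({"outlook": "sunny", "windy": "no"}, "play"),
          ({"outlook": "rain", "windy": "yes"}, "stay")]
party2 = [({"outlook": "sunny", "windy": "yes"}, "play"),
          ({"outlook": "rain", "windy": "no"}, "stay"),
          ({"outlook": "sunny", "windy": "no"}, "play")]
model = train([party1, party2])
print(classify(model, {"outlook": "sunny", "windy": "no"}, ["play", "stay"]))  # play
```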

Secure protocols for classification on vertically partitioned data that rely on the secure scalar product were proposed in [17, 18]. The protocol proposed in [17] builds a classifier but, for legal and/or commercial reasons, does not disclose it to any of the parties. Instead, the parties collaborate to classify each instance. However, the classifier can be reverse-engineered from a sufficient number of classified instances.

A solution for building a decision tree on vertically partitioned data was proposed in [19]. This method is based on a secure scalar product, and uses a semi-trusted third-party commodity server in order to increase performance. A secure multiparty computation of clusters on vertically partitioned data was studied in [22]. Regression on vertically partitioned data was considered in [23, 20], while secure computing of outliers for horizontally and vertically partitioned data was studied in [24].
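A semi-trusted commodity server can help two parties compute a scalar product without ever seeing their data. Before the protocol starts, it hands each party random material whose correlation cancels at the end. The sketch below follows one such commodity-server protocol for two parties, here called Alice and Bob (names of our choosing). We do not claim it matches the exact protocol of [19]. It also uses unbounded Python integers, where a real protocol would work modulo a fixed number. Each party ends with a random-looking share, and the two shares sum to the scalar product.

```python
import secrets


def dot(u: list[int], v: list[int]) -> int:
    return sum(a * b for a, b in zip(u, v))


def commodity_server(dim: int, bound: int):
    """Semi-trusted server: random vectors Ra, Rb and numbers with ra + rb = Ra . Rb.

    It never sees the parties' data; Alice gets (Ra, ra) and Bob gets (Rb, rb).
    """
    Ra = [secrets.randbelow(bound) for _ in range(dim)]
    Rb = [secrets.randbelow(bound) for _ in range(dim)]
    ra = secrets.randbelow(bound)
    rb = dot(Ra, Rb) - ra
    return (Ra, ra), (Rb, rb)


def secure_scalar_product(a: list[int], b: list[int], bound: int = 2**32):
    """Alice holds a, Bob holds b; they end with shares v1 + v2 = a . b."""
    (Ra, ra), (Rb, rb) = commodity_server(len(a), bound)
    a_hat = [x + r for x, r in zip(a, Ra)]   # Alice -> Bob: a masked by Ra
    b_hat = [x + r for x, r in zip(b, Rb)]   # Bob -> Alice: b masked by Rb
    v2 = secrets.randbelow(bound)            # Bob's output share
    u = dot(a_hat, b) + rb - v2              # Bob -> Alice
    v1 = u - dot(Ra, b_hat) + ra             # Alice's output share
    return v1, v2


v1, v2 = secure_scalar_product([1, 0, 1, 1], [1, 1, 0, 1])
print(v1 + v2)  # 2
```

Expanding $v_1$ shows why the shares add up: $v_1 = a \cdot b + R_a \cdot b + r_b - v_2 - R_a \cdot b - R_a \cdot R_b + r_a = a \cdot b - v_2$, because $r_a + r_b = R_a \cdot R_b$.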
