Several of the threats within the digital security and political security domains (e.g. automated spear phishing, personalised propaganda) rely on attackers gaining access to private information about individuals. In addition to procedural and legal measures to ensure individuals’ privacy, there is increasing research on technological tools for guaranteeing user data privacy, which may also be applicable in the context of AI systems. We highlight two technologies as potentially relevant here: differential privacy-guaranteeing algorithms and secure multi-party computation. There remain open questions regarding both technologies:

Can algorithmic privacy be combined with AI technologies, either in general or in specific domains?

What are the trade-offs, if any, for implementing algorithmic privacy, e.g. in terms of performance or in terms of financial viability of services?

What mechanisms (financial, educational, legal or other) could encourage the adoption of algorithmic privacy in AI systems?

What lessons can be learned from efforts at technologically guaranteed privacy (such as Apple’s use of differential privacy)?
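To make the trade-off question above more concrete, the following is a minimal sketch of one standard differentially private building block, the Laplace mechanism applied to a counting query. It is an illustration under stated assumptions, not a method taken from this report: the synthetic age data, the function names, and the chosen values of the privacy parameter `epsilon` are all hypothetical. It only shows that a smaller `epsilon` (stronger privacy) tends to mean a noisier answer.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the scaled difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def private_count(values, predicate, epsilon, rng):
    """Laplace mechanism for a counting query.

    Adding or removing one record changes a count by at most 1
    (sensitivity 1), so the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)


def mean_abs_error(epsilon, trials=500, seed=0):
    """Average absolute error of the noisy count over repeated releases."""
    rng = random.Random(seed)
    # Hypothetical synthetic dataset: 200 ages, used only for illustration.
    ages = [rng.randint(18, 90) for _ in range(200)]
    is_senior = lambda age: age >= 65
    true_count = sum(1 for a in ages if is_senior(a))
    total = 0.0
    for _ in range(trials):
        total += abs(private_count(ages, is_senior, epsilon, rng) - true_count)
    return total / trials


if __name__ == "__main__":
    # Smaller epsilon means stronger privacy and, on average, larger error.
    for eps in (0.1, 1.0, 10.0):
        print(f"epsilon={eps}: mean absolute error ~ {mean_abs_error(eps):.2f}")
```

The sketch covers a single aggregate query only. Differentially private training of machine learning models involves additional machinery that is not shown here.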

Differential privacy

Many machine learning models are currently being developed by companies for commercial use in APIs (see central access licensing above). Without precautions, it is possible for individuals to break anonymity in the underlying dataset of a machine learning model that has been deployed for public use, via a model inversion attack or a membership inference attack. That is, even without access to the training data, an attacker can in some cases query a model in such a way that information from the underlying dataset is revealed.
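The following toy sketch illustrates how query access alone can leak information about training data. It is a deliberately simplified confidence-threshold form of membership inference, not the attacks described in the literature cited in this section. The "model" is a hypothetical stand-in that overfits by design (its confidence depends only on distance to the nearest training point), and the data, function names, and threshold are assumptions made for illustration.

```python
import math
import random


def train_memorizing_model(train_points):
    """Return a stand-in 'model' that overfits its training data.

    Its confidence is exp(-distance to the nearest training point),
    so it is exactly 1.0 on any point it was trained on.
    """
    def confidence(x):
        nearest = min(abs(x - p) for p in train_points)
        return math.exp(-nearest)
    return confidence


def infer_membership(model, candidate, threshold=0.99):
    """Attacker's guess: a very confident output suggests a training member."""
    return model(candidate) >= threshold


def run_demo(seed=0):
    rng = random.Random(seed)
    population = [rng.uniform(0, 100) for _ in range(200)]
    members, non_members = population[:100], population[100:]
    model = train_memorizing_model(members)  # attacker only gets query access
    hits = sum(infer_membership(model, x) for x in members)
    false_alarms = sum(infer_membership(model, x) for x in non_members)
    return hits, false_alarms


if __name__ == "__main__":
    hits, false_alarms = run_demo()
    print(f"members flagged: {hits}/100, non-members flagged: {false_alarms}/100")
```

Real models leak far less cleanly than this stand-in, but the same principle applies: outputs that differ systematically between training and non-training inputs can be exploited by an attacker.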

For model inversion attacks, see Fredrikson et al. (2015); for membership inference attacks, see Shokri et al. (2016).

Ji et al. (2014) surveyed methods for providing differential privacy (a concept first developed by Dwork (2006), referring to strong guarantees on the probability of information leakage) in machine learning systems, though they do not address differential privacy in neural networks. Such methods have been reported by, for example, Abadi et al. (2016). In general, differentially private machine learning algorithms combine their training data with noise to maintain privacy while minimizing effects on performance. These algorithms generally lose some performance compared to their non-private equivalents, and so privacy may become a concern if the teams developing models are not incentivized to keep their datasets private.

Appendix B: Questions for Further Research

Secure Multi-Party Computation

Secure multi-party computation (MPC) refers to protocols that allow multiple parties to jointly compute functions while keeping each party’s input to the function private (Yao, 1982). For instance, one simple MPC protocol allows users to jointly compute the outcome of a vote without sharing their individual votes with one another. As an important practical application, MPC protocols make it possible to train machine learning systems on sensitive data without significantly compromising its privacy (Lindell and Pinkas, 2009). For example, medical researchers could train a system on confidential patient records by engaging in an MPC protocol with the hospital that possesses them. A technology company could similarly learn from users’ data, in some cases, without needing to access this data. An active open source development effort (OpenMined) is currently aiming to develop a platform to allow users to sell others the right to train machine learning systems on their data using MPC (OpenMined, 2017). A number of other frameworks for privacy-preserving machine learning have also been proposed (e.g. Rouhani et al., 2017).

In addition, MPC opens up new opportunities for privacy-preserving web applications and cloud computation. For example, one company may develop machine learning models that can make predictions based on health data. If individuals do not want to send this company copies of their personal medical data, they may instead opt to engage in an MPC protocol with the company, and in particular an MPC protocol where only the individual receives the output. At no point in this process does the company gain any knowledge about the individual’s medical data; nevertheless, it is still able to provide its service.

MPC could also help to enable privacy-preserving surveillance (Dowlin et al., 2016; Trask, 2017; Garfinkel, forthcoming). To the extent that AI systems play active roles in surveillance, for instance by recognizing faces in videos or flagging suspicious individuals on the basis of their web activity, MPC can be used to increase individual privacy. In particular, MPC makes it possible to operate such systems without needing to collect or access the (often sensitive) data that is being used to make the relevant classifications.

At the same time, the use of MPC protocols remains limited by the fact that, in many cases, they can increase the overhead associated with a computation by multiple orders of magnitude. This means that MPC is best suited for relatively simple computations or for use cases where increased privacy would be especially valuable.
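The voting example mentioned earlier in this section can be sketched with additive secret sharing, one of the simplest MPC building blocks. This is a minimal sketch under stated assumptions rather than a production protocol: it assumes all parties follow the protocol honestly, it does not check that each vote is actually 0 or 1, and it simulates every party inside one process. The modulus `PRIME` and the function names are hypothetical choices made for illustration.

```python
import secrets

# Assumed modulus: any prime comfortably larger than the number of voters works.
PRIME = 2**61 - 1


def share(value, n_parties):
    """Split `value` into n random-looking shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def tally(votes):
    """Compute the sum of the votes without any party seeing another's vote.

    Voter i sends its j-th share to party j. Each party publishes only the
    sum of the shares it received; those partial sums add up to the total.
    """
    n = len(votes)
    all_shares = [share(v, n) for v in votes]  # row i: shares made by voter i
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % PRIME  # what party j publishes
        for j in range(n)
    ]
    return sum(partial_sums) % PRIME


if __name__ == "__main__":
    votes = [1, 0, 1, 1, 0]  # 1 = yes, 0 = no
    print("yes votes:", tally(votes))
```

Any single share is a uniformly random number and reveals nothing about the vote behind it; only the combination of all partial sums yields the tally. Even this toy protocol exchanges on the order of n² messages for n voters, a small-scale illustration of the overhead noted above.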

Monitoring Resources

One type of measure that might help to predict and/or prevent misuse of AI technology would be to monitor inputs to AI systems. Such monitoring regimes are well-established in the context of other potentially dangerous technologies, most notably the monitoring of fissile materials and chemical production facilities for the purpose of implementing nuclear and chemical weapon agreements. An obvious example of an input that it might be possible to monitor is computing hardware. While efforts have been made in the past to survey computing resources, there is no major ongoing public effort to do so, with the best available information likely withheld due to commercial or state secrecy. One possible benefit of having a public, or semi-public, database of the global distribution of computing resources could be to better understand the likely distribution of offensive and defensive AI/cybersecurity capabilities. Additionally, having such monitoring in place would be valuable if stronger measures were to be employed, e.g. enforceable limitations on how hardware could be used. Questions for further consideration include:

How feasible would it be to monitor global computing resources?

Are different domains more or less tractable to monitor, or more or less important for AI capabilities, than others (e.g. should video game consoles be considered, in light of their large share in total computing but limited current role in AI)?

What could be done with such information?

Are there drawbacks to such an effort (e.g. in encouraging wasteful “racing” to have the most computing power)?

Would other AI inputs be better suited to monitoring than computing resources?