Due to a sharp decrease in hardware cost and form factor in recent years, sensors have become ubiquitous. A variety of sensors are being embedded into user devices such as smartphones, tablets, and wearable devices. Our homes, buildings, and automobiles are also being increasingly outfitted with sensors. On one hand, data collected from sensors is used to fuel a variety of applications including those for energy efficiency, safety, and healthcare. On the other hand, sensor data can be processed to reveal unwarranted private information about the user. As a result, to protect their privacy, many users avoid applications that require access to private sensor data. It is therefore imperative to preserve data privacy while enabling a rich ecosystem of applications that process users’ private sensor data.
The focus of this dissertation has been on building a system that enables data-driven applications while preserving data privacy, without burdening users with the tasks of provisioning durable data storage and computational resources for the applications. Our proposed approach leverages users’ personal virtual execution environments (VEEs) hosted in the cloud to create an ecosystem of privacy-preserving applications that process personal data. We examine the feasibility of the proposed architecture by providing a proof-of-concept instantiation targeting home energy as an example use case. We observe that applications have specific data storage and retrieval requirements for sensor data. Consequently, we build a data management system for time-series sensor data to meet these requirements while leveraging commodity storage service providers. We also observe that, because our approach provides each user with a personal VEE, supporting a large number of users requires provisioning a large number of personal VEEs. Therefore, we develop a new methodology to provision a large number of VEEs with low duty-cycle workloads on a single machine. We formulate the theoretical foundations of the methodology and demonstrate its application to an example virtualization solution.
The key insight underlying our proposed architecture is that allocating appropriate levels of commodity services offered by modern clouds directly to users (rather than application developers) enables users to host both their data and their desired data-driven applications within their confines, thus enabling a large ecosystem of data-driven applications while preserving users’ data privacy. Our work has focused on designing and implementing the mechanisms, techniques, and methodologies to realize this vision. The key contributions of each chapter are now summarized.
As justified in Chapter 3, the initial focus of our proposed architecture is home energy data. Our main contributions in that chapter are as follows.
• We present the architecture of a system that allows users to own and control access to their home energy consumption data and freely use data-driven applications of their choice.
• We demonstrate the feasibility of the architecture by building a prototype implementation using commodity cloud computing platforms.
• We present a qualitative evaluation of the prototype implementation with respect to data privacy and data use and describe how it meets our design goals.
The main architectural components that we design and implement include (i) an in-home gateway for relaying sensor data and control commands to the personal VEE, (ii) a personal VEE framework which warehouses home energy data, exposes it to third-party applications via APIs, and provides data access control, and (iii) sample home energy applications that run within an instance of VHome. Our personal VEE framework also enables applications to securely and privately control some home appliances.
As described in Chapter 4, existing sensor data storage systems (including our prototype implementation in Chapter 3) do not meet all the data management requirements of applications. Our work in Chapter 4 addresses this problem by designing a storage system for sensor data and makes the following contributions.
• We survey a wide range of data-driven applications and present a formulation of their data management requirements.
• We describe the design and implementation of Bolt, a system for storage, retrieval, and sharing of in-home sensor data that meets the applications’ requirements.
• We present a performance evaluation of Bolt using three sample applications. We demonstrate that, when compared with OpenTSDB [27], Bolt’s use of chunking, segmentation, and index-DataLog separation techniques leads to a decrease in data retrieval time of up to a factor of 40, and a 3 to 5 times reduction in storage space.
Moreover, Bolt provides applications with programming abstractions for storage, range-query based retrieval, and sharing of sensor data, thus simplifying application development.
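To make the idea of chunking and index-DataLog separation concrete, the following is a minimal sketch of an append-only time-series stream with range-query retrieval. It is not Bolt's actual API or implementation: the class name `ChunkedStream`, its methods, and the chunk size are hypothetical, and the data log is held in memory rather than at a storage provider. The sketch only illustrates why a compact index lets a range query touch just the chunks that overlap the requested interval.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass, field


@dataclass
class ChunkedStream:
    """Toy append-only stream: a compact index kept apart from a chunked data log."""

    chunk_size: int = 4                          # records per chunk (hypothetical value)
    index: list = field(default_factory=list)    # timestamps, in append order
    chunks: list = field(default_factory=list)   # data log: a list of value chunks
    chunks_fetched: int = 0                      # number of chunk reads so far

    def append(self, timestamp, value):
        """Append one record; timestamps must be non-decreasing."""
        if self.index and timestamp < self.index[-1]:
            raise ValueError("timestamps must be non-decreasing")
        # Start a new chunk when there is none yet or the last one is full.
        if not self.chunks or len(self.chunks[-1]) == self.chunk_size:
            self.chunks.append([])
        self.chunks[-1].append(value)
        self.index.append(timestamp)

    def get_range(self, start, end):
        """Return (timestamp, value) pairs with start <= timestamp <= end."""
        # The index alone determines which record positions match.
        lo = bisect_left(self.index, start)
        hi = bisect_right(self.index, end)
        if lo >= hi:
            return []
        first_chunk = lo // self.chunk_size
        last_chunk = (hi - 1) // self.chunk_size
        result = []
        # Only chunks overlapping [lo, hi) are read from the data log.
        for c in range(first_chunk, last_chunk + 1):
            self.chunks_fetched += 1
            base = c * self.chunk_size
            for offset, value in enumerate(self.chunks[c]):
                pos = base + offset
                if lo <= pos < hi:
                    result.append((self.index[pos], value))
        return result


stream = ChunkedStream(chunk_size=4)
for t in range(20):
    stream.append(t, t * 10)      # e.g. one reading per time unit
readings = stream.get_range(5, 9)
```

In this sketch, the query over timestamps 5 to 9 reads two of the five chunks; in a setting where chunks reside at a remote storage provider, fetching fewer chunks is what would reduce retrieval time.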
As described in Chapter 5, we argue that there is, or will soon be, a need for hosting large numbers of personal VEEs on a single machine. We hypothesize that this may be accomplished in an efficient way by multiplexing VEEs across multiple inactive states.
Our insight is that many workloads executed on personal VEEs would exhibit frequent, often long, and uncorrelated idle periods. Therefore, existing work on inactive states for reducing the resource footprint of idle VMs can be leveraged to maximize the number of VMs hosted on a machine at the cost of a small latency penalty for client requests (called the miss penalty). Our work in Chapter 5 explores this design alternative and makes the following key contributions:
• We present a formal model for policies for managing idle VMs across inactive states and derive a lower bound on the miss penalty of reactive policies.
• We present a measurement of model parameters using microbenchmarks with LXC [22] as our example virtualization solution.
• We present a study of a few representative idle VM management policies, quantify their miss penalties, and draw insights into their behaviour.
We design and implement a simulator for studying the behaviour of the idle VM management policies. Our simulator is extensible to other policies, to other types of workloads, and to inactive state hierarchies of other virtualization solutions.
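As an illustration of the kind of computation such a simulator performs, the sketch below estimates the total miss penalty of one simple timeout-based policy over a trace of request arrival times. It is not the simulator or the formal model of Chapter 5: the state names, idle thresholds, and resume latencies are hypothetical placeholders rather than measured LXC parameters, and the sketch assumes that service and resume times do not shift later arrivals.

```python
# Hypothetical inactive-state hierarchy:
# (state name, idle seconds before the VM enters it, resume latency in seconds).
STATES = [
    ("active", 0.0, 0.0),
    ("paused", 10.0, 0.05),
    ("checkpointed", 60.0, 2.0),
]


def state_after_idle(idle_seconds, states=STATES):
    """Return the deepest state whose idle threshold has been reached."""
    current = states[0]
    for state in states:
        if idle_seconds >= state[1]:
            current = state
    return current


def total_miss_penalty(request_times, states=STATES):
    """Sum the resume latencies incurred over a sorted trace of arrival times.

    A request that finds the VM in an inactive state pays that state's
    resume latency; a request that finds it active pays nothing.
    """
    penalty = 0.0
    for previous, current in zip(request_times, request_times[1:]):
        _, _, resume_latency = state_after_idle(current - previous, states)
        penalty += resume_latency
    return penalty


trace = [0, 5, 20, 100, 101]      # request arrival times in seconds
print(total_miss_penalty(trace))
```

For this trace, the idle gaps of 15 and 80 seconds find the VM paused and checkpointed respectively, so the total is the sum of those two resume latencies. Lowering the thresholds would free resources sooner but make such misses more frequent, which is the trade-off the policies navigate.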
As described in Chapter 6, we argue that application developers and users are in a constant tussle over access to sensor data. Therefore, we posit that device operating systems need to formally recognize tussles and need to implement mechanisms and policies to resolve them. Our work in Chapter 6 outlines the design of a tussle-based framework, describes the abstractions, mechanisms, and policies it requires, and identifies various open problems in instantiating it.
In summary, the architecture, techniques, mechanisms, and methodologies proposed in this dissertation present a viable solution to achieve the seemingly conflicting requirements of enabling data privacy and data use.