

I further implemented a prototype ARIS application for a smart building ML system, namely the predictive building energy optimization system described in section 2.2.2, at a large commercial building in midtown Manhattan, New York City [244,209]. The building's regular hours are 7:00 AM to 7:00 PM Monday through Friday and 8:00 AM to 1:00 PM on Saturdays. In 2011, running the building's HVAC system cost an estimated $2,000 to $2,500 per hour. The building uses electricity, steam and natural gas supplied by Con Edison, the main utility company in New York City, for heating and cooling. Management has installed a state-of-the-art energy monitoring system that keeps an archived log of energy demand data, which can be used for predictive building energy optimization.

CHAPTER 3. AUTOMATED ONLINE EVALUATION FOR CPS 85

[Figure: smart building ML system workflow with the Automated Online Evaluator. Components: External Data, Data Aggregator, Data Preprocessor, Machine Learning Predictor, Business Rule Engine, Operation Actions by Operator, BMS, Actuator, and Building Systems.]

The automated online evaluator conducts evaluation at multiple stages in the ML system workflow. It employs intelligent real-time data quality analysis components to quickly detect data anomalies (e.g., malfunctioning digital thermostats that interfere with temperature readings or introduce deviations from the normal expected HVAC set-points) and gives feedback to building management, who can then respond appropriately.
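To make the thermostat example concrete, the following minimal Python sketch flags readings that drift outside a tolerance band around the expected HVAC set-point. It is an illustration under stated assumptions, not the ARIS implementation: the `Reading` record, the function name `flag_setpoint_deviations`, the fixed 2.0 °F tolerance and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Reading:
    # Hypothetical record layout for one thermostat sample.
    sensor_id: str
    timestamp: str       # ISO-8601 string (assumed format)
    temperature: float   # measured zone temperature in °F
    setpoint: float      # expected HVAC set-point in °F


def flag_setpoint_deviations(readings: List[Reading], tolerance: float = 2.0) -> List[Reading]:
    """Return readings whose temperature differs from the set-point by more than `tolerance`.

    The fixed tolerance is an illustrative assumption; a deployed evaluator would tune it.
    """
    return [r for r in readings if abs(r.temperature - r.setpoint) > tolerance]


if __name__ == "__main__":
    sample = [
        Reading("T-101", "2011-12-05T08:00", 70.5, 70.0),
        Reading("T-102", "2011-12-05T08:00", 55.0, 70.0),  # possibly a faulty thermostat
        Reading("T-103", "2011-12-05T08:00", 71.8, 70.0),
    ]
    for r in flag_setpoint_deviations(sample):
        print(f"anomaly: {r.sensor_id} read {r.temperature} vs set-point {r.setpoint}")
```

Flagged readings would then be passed to building management as feedback; a fixed band is the simplest possible rule and is shown only to illustrate the idea of checking readings against expected set-points.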

Over a two-month period (December 2011 to January 2012), more than 10 suspicious data anomalies were identified, and the related data sources were investigated.

The following describes the self-tuning of the ARIS, which adapts the data quality analysis to changing data patterns, including seasonal changes.

To identify the ML model parameters (specifically, the SVR's C and γ values) and the number of time delays that yield the most accurate and efficient model, a step-wise search method was used. The step-wise method runs regressions with values spanning several orders of magnitude for one parameter, calculates the R² value of each to assess accuracy, and then repeats the evaluation on progressively finer scales around the best value until the appropriate value is established. The same method was used to select the C, γ and time-delay values, with a test file containing the actual observed values serving as the reference against which the model's predictions were compared.
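The coarse-to-fine idea behind the step-wise method can be sketched as follows. This is a minimal sketch, assuming one parameter is tuned at a time and that a caller-supplied `score` function returns the R² obtained with a candidate value; the function name `stepwise_search`, the search range and the grid sizes are hypothetical choices, not the settings used in the experiments.

```python
import math
from typing import Callable, Tuple


def stepwise_search(score: Callable[[float], float],
                    low_exp: int = -3, high_exp: int = 3,
                    refinements: int = 3, points: int = 9) -> Tuple[float, float]:
    """Coarse-to-fine search for the parameter value that maximizes `score` (e.g. R²).

    Stage 1 tries one value per order of magnitude: 10**low_exp ... 10**high_exp.
    Each later stage evaluates a finer grid (evenly spaced in log10) around the current best.
    All defaults are illustrative assumptions.
    """
    candidates = [10.0 ** e for e in range(low_exp, high_exp + 1)]
    best = max(candidates, key=score)
    step = 1.0  # current half-width of the search window, in log10 units
    for _ in range(refinements):
        centre = math.log10(best)
        # Odd `points` keeps the current best (the centre) inside the new grid.
        grid = [centre - step + 2 * step * i / (points - 1) for i in range(points)]
        best = max((10.0 ** g for g in grid), key=score)
        step /= (points - 1) / 2  # next window half-width equals this grid's spacing
    return best, score(best)
```

In practice `score` would wrap model training, for example fitting scikit-learn's `SVR(C=..., gamma=...)` on the training data and returning `r2_score` on the test file. For an integer parameter such as the number of time delays, the same coarse-then-fine pattern applies with integer grids.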

Based on the R² results, the best combination of variables for a February regression was one year of energy data alone, and for May it was two years of energy and temperature data. Although these combinations gave the most accurate models by this measure, two years of energy, temperature and humidity data were used for all regressions.
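The way energy, temperature and humidity histories can be combined with a chosen number of time delays is sketched below. This is a minimal sketch, assuming one-step-ahead prediction from past values only; whether the original model also used weather values for the target hour is not stated, and the function name `build_lagged_features` is hypothetical.

```python
from typing import List, Sequence, Tuple


def build_lagged_features(energy: Sequence[float],
                          temperature: Sequence[float],
                          humidity: Sequence[float],
                          delays: int) -> Tuple[List[List[float]], List[float]]:
    """Build (X, y) pairs for one-step-ahead energy demand prediction.

    Each row of X holds the previous `delays` values of energy, temperature and
    humidity (oldest to newest, series by series); y holds the energy value that
    follows them. Using only past values is an assumption of this sketch.
    """
    n = len(energy)
    if not (len(temperature) == len(humidity) == n):
        raise ValueError("all series must have the same length")
    X: List[List[float]] = []
    y: List[float] = []
    for t in range(delays, n):
        row: List[float] = []
        for series in (energy, temperature, humidity):
            row.extend(series[t - delays:t])
        X.append(row)
        y.append(energy[t])
    return X, y
```

Each extra variable multiplies the row width by the number of delays, which is one reason the number of time delays was tuned together with C and γ.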

Including the weather variables (temperature and humidity) in the model was nonetheless important. A model using fewer variables produces smooth, highly cyclical curves, while adding more variables produces noisier curves with statistically poorer fits. However, the additional variables allow the model to adapt more dynamically to weather changes within a single day or week, and they help the model predict minimum and maximum energy demand values.

Figure 3.41: Predicted versus actual energy demand in May 2011 [209].

Figures 3.41 and 3.42 show SVR predictions versus actual energy demand for two different five-month datasets from different times of the year. The spring graph tracks the building's actual energy consumption more closely, with an R² of about 0.95, while the winter graph is less accurate, with an R² of about 0.71. The likely reason for the less accurate winter regression is that the SVR model may need additional features to better handle low winter temperatures, which increase energy demand for heating.
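For reference, the R² values quoted above are coefficients of determination, R² = 1 − SS_res / SS_tot. An R² of 0.95 therefore means that the squared prediction errors amount to 5% of the variance of actual demand around its mean. A minimal sketch of the computation follows; the function name `r_squared` is illustrative, not taken from the ARIS code.

```python
from typing import Sequence


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: R² = 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two equal-length series with at least two points")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)           # total sum of squares
    if ss_tot == 0:
        raise ValueError("actual values are constant; R² is undefined")
    return 1.0 - ss_res / ss_tot
```

A model that always predicts the mean scores 0, and a perfect model scores 1. R² can also be negative when predictions are worse than the mean.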

In summary, the experiments showed that the ARIS is effective in keeping the smart building ML system running reliably, and that the self-tuning component can adapt the ARIS to changing data patterns such as seasonal changes. The ARIS runs independently of the smart building ML system, runs efficiently, and adds little overhead to it.

3.6 Summary

This chapter presents automated online evaluation (AOE), which performs data quality analysis using computational intelligence and self-tuning techniques to improve the reliability of cyber-physical systems that process large amounts of data, employ software as a system component, run online continuously and maintain an operator in the loop. My experiments with the ARIS, a prototype architecture and implementation of AOE, in smart power grid and smart building CPS have demonstrated that this approach is effective and efficient. Because the approach is data-driven, it is readily applicable to different types of cyber-physical systems, and its open, expandable architecture allows new data quality analysis and self-tuning techniques to be incorporated.

Chapter 4

Cloud-Based Reliability Assurance Framework for CPS

One limitation of the ARIS architecture and implementation described in sections 3.3 and 3.4 is that it is neither scalable nor economical for certain types of CPS in which large volumes of data must be processed in parallel within a short period of time. To conduct efficient and cost-effective automated online evaluation for data-intensive CPS, I describe a cloud-based reliability assurance framework called COBRA in this chapter. Adapting the definition of quality assurance [78], reliability assurance is defined as the planned and systematic activities implemented in a system so that the reliability requirements for a product or service are fulfilled.

In the following section, I give an overview with background information on data-intensive computing and cloud computing. In section 4.2, I describe the COBRA framework, followed by its architecture in section 4.3. In section 4.4, I describe the implementation and some applicable cloud computing environments. Then, in section 4.5, I describe the empirical studies, consisting of controlled experiments and an application of the framework to a smart building CPS, before concluding in section 4.6.

4.1 Overview

For the purposes of this thesis, data-intensive computing is a class of computing applications that often use a data-parallel approach to process large volumes of data, typically terabytes or petabytes in size and commonly referred to as Big Data [107]. Computing applications that devote most of their execution time to computational requirements are deemed compute-intensive and typically require small volumes of data, whereas applications that require large volumes of data and devote most of their processing time to I/O and data manipulation are deemed data-intensive [155]. The challenge of data-intensive computing is to provide hardware architectures and related software systems and techniques capable of transforming ultra-large data into valuable knowledge. Data-intensive applications are well suited to large-scale parallelism over the data and also require an extremely high degree of fault tolerance, reliability and availability [82].
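The data-parallel pattern can be illustrated with a small map/reduce over partitions of sensor readings. This is a minimal sketch, assuming a simple (count, sum, max) summary; the function names are hypothetical. Threads from `concurrent.futures` keep the example portable. A real data-intensive deployment would instead distribute partitions across processes or machines, for example with a cluster framework.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

Partial = Tuple[int, float, float]  # (count, sum, max) of one partition


def summarize_partition(values: Sequence[float]) -> Partial:
    """Map step: compute a partial aggregate over one partition of the data."""
    return len(values), float(sum(values)), max(values, default=float("-inf"))


def merge(a: Partial, b: Partial) -> Partial:
    """Reduce step: combine two partial aggregates (associative, so order does not matter)."""
    return a[0] + b[0], a[1] + b[1], max(a[2], b[2])


def parallel_summary(data: Sequence[float], partitions: int = 4) -> Tuple[int, float, float]:
    """Return (count, mean, max) of `data` using a data-parallel map/reduce."""
    size = max(1, -(-len(data) // partitions))  # ceiling division
    chunks: List[Sequence[float]] = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        partials = list(pool.map(summarize_partition, chunks))
    total: Partial = (0, 0.0, float("-inf"))
    for p in partials:
        total = merge(total, p)
    count, s, peak = total
    return count, (s / count if count else 0.0), peak
```

Because `merge` is associative, partitions can be processed independently and combined in any order. This property is what lets data-intensive workloads scale by adding workers.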

Many CPS are data-intensive, requiring large volumes of data and devoting much of their processing time to I/O and data manipulation [155]. BMS and smart grid control systems are typical examples: a BMS processes many data streams captured by sensors installed throughout the building, while a smart grid processes many continuous data sources from its electrical and computational components.

Because of their data-intensive characteristics, their complexity and their unpredictable operating environments, it is often hard to estimate and improve the reliability of data-intensive CPS prior to deployment. Once a system is in use, runtime evaluation can be applied instead: it runs in parallel with the CPS, continuously conducting automated online evaluation at multiple stages along the system workflow and providing operator-in-the-loop feedback for reliability improvement [240]. However, reliability assurance using only local computing resources is often infeasible, unscalable or too expensive for data-intensive CPS.

Thus, a cloud-based approach might be a good solution. According to the NIST definition [151], cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.

CHAPTER 4. CLOUD-BASED RELIABILITY ASSURANCE FRAMEWORK FOR CPS 91

This chapter presents a cloud-based reliability assurance framework called COBRA, which stands for ClOud-Based Reliability Assurance. COBRA employs cloud-based runtime evaluation to conduct reliability assurance for data-intensive CPS. It uses data serialization and messaging systems to mitigate the limitations imposed by the large amounts of data that must be transferred for cloud-based processing. Its data quality analysis processes use a scheduler and on-demand elastic load handling to achieve a degree of scalability, responsiveness and cost-effectiveness that is not possible with traditional approaches such as local server clustering. COBRA uses self-tuning to manage and configure the evaluation system so that it adapts to changes in the system and in exogenous conditions while hiding intrinsic complexity from operators and users. Furthermore, a set of performance metrics is used to evaluate COBRA's performance.
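To illustrate what on-demand elastic load handling can mean in practice, here is a minimal sketch of a queue-based scaling rule. It assumes the evaluation work arrives as messages on a queue and that the throughput of one worker has been measured. The `ScalingPolicy` fields, the function `desired_workers` and the default bounds are hypothetical, and COBRA's actual scheduler may work differently.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    # All fields are illustrative assumptions, not COBRA settings.
    per_worker_rate: float  # messages one worker can analyse per second (measured)
    target_latency: float   # seconds within which the current backlog should be drained
    min_workers: int = 1
    max_workers: int = 20   # upper bound that caps cloud cost


def desired_workers(backlog: int, arrival_rate: float, policy: ScalingPolicy) -> int:
    """Return how many workers are needed to keep up with arrivals and drain the backlog in time.

    Required throughput = arrival_rate + backlog / target_latency. The result is clamped
    to [min_workers, max_workers] so that cost stays bounded during load spikes.
    """
    required = arrival_rate + backlog / policy.target_latency
    workers = math.ceil(required / policy.per_worker_rate)
    return max(policy.min_workers, min(policy.max_workers, workers))
```

For example, with workers that each handle 50 messages per second and a 60-second drain target, a backlog of 12,000 messages plus 100 new messages per second calls for 6 workers. Clamping trades responsiveness during extreme spikes for a predictable upper bound on cost.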

I developed a prototype COBRA system and used it in real-world experiments with a BMS in a New York City building. The evaluation results showed that it is effective, efficient, scalable and easy to implement. COBRA can also be tested in a simulated environment using historical data or induced reliability issues, including failures caused by fault injection, human error or abnormal environmental variables.
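Testing with historical data and fault injection can be sketched as replaying a recorded sensor series with a synthetic fault inserted, then checking whether the evaluator flags the affected interval. This is a minimal sketch under stated assumptions: the three fault models (stuck-at, offset, random spikes) and the function name `inject_faults` are hypothetical examples, not the fault models used in the COBRA experiments.

```python
import random
from typing import List, Sequence

FAULT_KINDS = ("stuck", "offset", "spike")  # hypothetical fault models


def inject_faults(series: Sequence[float], kind: str, start: int, length: int,
                  magnitude: float = 0.0, seed: int = 0) -> List[float]:
    """Return a copy of `series` with a synthetic fault in the interval [start, start + length).

    kind:
      "stuck"  - the sensor freezes at the value observed at `start` (stuck-at fault)
      "offset" - a constant bias of `magnitude` is added (e.g. calibration drift)
      "spike"  - random noise of up to +/- `magnitude` is added
    """
    if kind not in FAULT_KINDS:
        raise ValueError(f"unknown fault kind: {kind}")
    faulty = list(series)  # never modify the original historical data
    end = min(len(faulty), start + length)
    rng = random.Random(seed)  # seeded so experiments are repeatable
    for i in range(start, end):
        if kind == "stuck":
            faulty[i] = series[start]
        elif kind == "offset":
            faulty[i] = series[i] + magnitude
        else:  # "spike"
            faulty[i] = series[i] + rng.uniform(-magnitude, magnitude)
    return faulty
```

Because the injected interval is known, detection rate and delay can be measured directly by comparing the evaluator's alerts with `[start, start + length)`.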