The author’s research evolved from trialling the SWE specifications to capturing, filtering and processing potentially hundreds or thousands of sensor readings. The author therefore identified CEP as a means of capturing and filtering the sensor data obtained.
CEP is a technology for processing events and discovering complex patterns among multiple streams of event data. Its goal is to identify the meaningful events within those streams and to derive meaningful information from them. It appears to be a natural fit with the large-scale, high-volume data feeds promised by the M2M vision. As Service-Oriented Architecture (SOA) methodologies evolve, the ability to react in a timely manner in a push-pull business environment becomes crucial [236].
Event Driven Architecture (EDA) is becoming increasingly relevant to businesses because recent advancements in software protocols (particularly IP-based) and hardware have brought about this kind of large-scale, real-time, event-driven capability. Enterprises want to understand their critical processes and other aspects of their complex environment, and to react in a timely and appropriate way. This capability is often termed ‘situational awareness’ or ‘sense-and-respond’. The two terms are often used interchangeably, although the latter emphasises the ability to react to a situation.
One form of EDA is CEP, which analyses events and data from many disparate sources in order to determine patterns of behaviour or to derive a more complex event from its constituent parts. Appropriate application of CEP can reduce information overload and provide a level of automation and intelligence-gathering for a more productive and reactive enterprise.
CEP is inherently large-scale, applying rules that aggregate and process large volumes of events and messages. Within the M2M context, CEP technology appears to lend itself well to the automatic aggregation and analysis needed for the large volumes of observations anticipated from the disparate sensors and other data streams in the real and virtual worlds.
What is a CEP Engine?
A CEP engine is defined by one of the major vendors, SAP’s Sybase CEP (formerly Coral8), as “A software product used to create and deploy applications that process large volumes of incoming messages or events, analyze those messages or events in various ways, and respond to conditions of interest in real-time” [237]. CEP offers the ability to aggregate and correlate large volumes of events (such as sensor events) through the real-time processing of continuous queries, and to apply event pattern matching using logical and temporal event correlation and window views. Alternative CEP engines are available from EsperTech and StreamBase Systems.
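The idea of a window view can be illustrated with a minimal Python sketch. This is not Sybase, Coral8 or CCL code: the class name `SlidingWindowAverage`, its parameters and the sample readings are all hypothetical, chosen only to show how an aggregate is recomputed over a moving time window as each event arrives.

```python
from collections import deque


class SlidingWindowAverage:
    """Hypothetical helper: mean of the values seen in the last `window_seconds`."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._events = deque()  # (timestamp, value) pairs, oldest first
        self._total = 0.0

    def push(self, timestamp, value):
        """Add one event, expire old ones, and return the current window mean."""
        self._events.append((timestamp, value))
        self._total += value
        # Drop events that have fallen out of the time window.
        while self._events[0][0] <= timestamp - self.window_seconds:
            _, old_value = self._events.popleft()
            self._total -= old_value
        return self._total / len(self._events)


# Illustrative readings only: (timestamp in seconds, sensor value).
window = SlidingWindowAverage(window_seconds=10)
averages = [window.push(t, v) for t, v in [(0, 2.0), (5, 4.0), (12, 6.0)]]
```

By the third reading the first one is more than ten seconds old, so it no longer contributes to the average. A production engine would also handle out-of-order events and row-count windows, which this sketch ignores.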
Complex Event Processing (CEP)
Under the IN theme, the author examined how SWE could be extended with CEP;
CEP was investigated to ascertain its potential for harvesting the sensor data obtained by the project and for enabling data fusion for WSNs deployed within diverse domains;
CEP has proved to be a feasible technology (as well as preferable to a database) for interfacing with sensor systems in order to handle (capture, store, archive, query, filter and process) sensor data in real-time.
Advantages of CEP:
Ideal for aggregating and correlating thousands of sensor readings;
Dynamic and real-time rather than static like a database (“turns database concept on its head”), with complex continuous queries, pattern matching and inference, sliding window views, aggregation, causality, correlation, etc.;
Supports complex computation, processes large volumes with low latency, is scalable and reliable, and is easy to build, modify and maintain;
Event Processing Language (EPL);
Event Stream Processing (ESP);
Continuous Computation Language (CCL);
CEP data aggregation is a form of data fusion;
Ideal for IN which may incorporate feeds from sensors and sensor systems.
Sybase CEP
SAP’s Sybase products include a server configuration for running their CEP engine and a development studio for the creation and analysis of CEP behaviours. Key elements pertinent to CEP technology include the way Sybase has implemented features addressing processing latency, data persistence, connectivity, and the expressiveness and inherent extensibility of its CEP engine.
An initial analysis of the Sybase product found the following features of particular merit:
Data input and output adaptors for connectivity;
Extensible rules engine through Remote Procedure Calls (RPC) or User Defined Functions (UDF);
SDK that allows the engine to be controlled programmatically and for dynamic queries to be compiled.
The practical evaluation included extending the rules engine so that geospatial functions could be called and form part of the data/event aggregation. This was achieved by using a SOAP proxy already provided by the Sybase engine. Whilst using this proxy may introduce additional latency in the system, an alternative, native route is possible.
CEP Architecture
Figure 58 gives an overview of Sybase’s CEP architecture (formerly Coral8). It comprises the Event Processing Server, which runs the rules and analytics formulated within the developer interface, Coral8 Studio. Input adaptors provide a mechanism by which data feeds can be virtualized, and output adaptors allow aggregate/complex events and notifications to interact with dashboards, other applications and middleware.
Figure 58 - CEP Architecture [238]
The Sybase CEP engine comprises a number of developer-configured streams, for the flow of data, and queries that are performed on that data. Of particular interest are the following product features:
Native XML processing;
Latency measurement;
Data persistence;
Connectivity;
Rules engine and extensibility;
Dashboarding technologies.
Rules Engine and Extensibility
CEP engines typically rely on an internal rules language that performs the event aggregation, filtering and analytics necessary. The rules language used by Coral8 is the Continuous Computation Language (CCL). This is an SQL-like language that has additional features that relate to the continuous way in which queries are run. In a typical relational database, an application will run a query on the entire database, receiving appropriate results. The application must then resubmit that same query on a regular basis to obtain up-to-date results. In a CEP engine, the queries are run as the data is received, ensuring a near real-time response. Once a query is received by the engine and is active, it is run every time an event is received.
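The contrast between a polled database query and a continuous query can be sketched in a few lines of Python. The sketch below is an illustration under stated assumptions, not Coral8 or CCL code: `ContinuousQueryEngine`, `register` and `publish` are hypothetical names, and the temperature readings are invented. The point is only that the query is registered once and then evaluated by the engine on every incoming event, so the application never has to resubmit it.

```python
class ContinuousQueryEngine:
    """Hypothetical engine: evaluates every registered query on each event."""

    def __init__(self):
        self._queries = []  # (predicate, callback) pairs

    def register(self, predicate, callback):
        """Register a standing query; it stays active until the engine is discarded."""
        self._queries.append((predicate, callback))

    def publish(self, event):
        """Push one event through all active queries as soon as it arrives."""
        for predicate, callback in self._queries:
            if predicate(event):
                callback(event)


alerts = []
engine = ContinuousQueryEngine()
# Roughly analogous to "select events where temperature > 30" in an SQL-like language.
engine.register(lambda e: e["temperature"] > 30.0, alerts.append)

# Illustrative feed only.
for reading in [
    {"sensor": "s1", "temperature": 21.5},
    {"sensor": "s2", "temperature": 34.0},
    {"sensor": "s1", "temperature": 31.2},
]:
    engine.publish(reading)
```

After the loop, `alerts` holds the two readings above the threshold, in arrival order, without the application having polled for them.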
Geospatial Extension
Part of the evaluation of the Coral8 CEP engine by the author and Dr. Churcher was to extend its core CCL functionality so that geospatial functions could be called directly from within the rules. Geospatial Reasoning is the capability to access and analyze all kinds of data that have a geographical or spatial context. Application domains relevant to BT and its customers are diverse and include Supply Chain Management, Asset Lifecycle Management, Field Force Scheduling, infrastructure and environmental monitoring, and transport sector applications such as freight management, traffic management and congestion charging.
Geospatial Reasoning offers the potential to ask richer questions about our environment, to correlate previously unrelated sensors and their observations, and to fuse data with features from richer geographical data sets. Geospatial data, location-based services and ‘whereness’ all play an essential part in many different lines of business in BT, from the products it offers third parties to the day-to-day operation of the business.
Geospatial reasoning functionality can be provided via spatially-enabled databases such as Postgres/PostGIS and Oracle Spatial, or directly through a geospatial library available for a variety of platforms. A popular open-source library is the Java Topology Suite (JTS). There is a .NET port called NetTopologySuite (NTS) which exposes the same core functionality. JTS and NTS allow a programmer to define geometric types, such as Point or Polygon, and to apply geometric functions that transform or modify them, or that ask questions about their spatial relationship, for example whether they intersect.
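The kind of spatial predicate such a library provides can be illustrated with a small, self-contained Python sketch. It deliberately does not use the JTS or NTS API: the function `point_in_polygon` is a hypothetical stand-in, implemented with the standard ray-casting technique, and the coordinates are invented. Points lying exactly on the boundary are not treated specially here, which a real library would handle rigorously.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: True if `point` lies strictly inside `polygon`.

    `point` is an (x, y) tuple; `polygon` is a list of (x, y) vertices in order.
    Hypothetical helper for illustration, not part of JTS or NTS.
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # one more crossing to the right of the point
    return inside


# Illustrative square region and two candidate sensor locations.
region = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
sensor_inside = point_in_polygon((2.0, 2.0), region)
sensor_outside = point_in_polygon((5.0, 2.0), region)
```

A point is inside the polygon when a ray cast to its right crosses the boundary an odd number of times, which is what the toggling of `inside` tracks.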
Applying Geospatial Reasoning to CEP allows the developer to introduce a spatial constraint on the data being received and to derive complex events that have some geometric relationship. A simple example might centre on monitoring weather stations and deriving the risk of fire in a localised area when nearby stations report matching weather conditions.
Figure 59 - Extending Coral8 with Geospatial Functions [239]
The exercise undertaken was to determine how a geospatial library such as JTS or NTS can be accessed from within the CCL rules engine of Coral8. As discussed above, Coral8 provides two main ways of extending its CCL rules engine: a User Defined Function (UDF) or a Remote Procedure Call (RPC). The main advantage of a UDF is that it is managed and run as an in-process call, and consequently it typically runs with less latency than a call made with RPC. The disadvantage of a UDF is that it must be written in C/C++ with particular interface entry points. The mapping between the CCL engine and the library containing the function is defined as static XML in the server configuration file.
RPCs are also defined in the server configuration file, but can be written in effectively any language. A proxy written in C/C++ provides the ability to interface with the program or service providing the functionality. Whilst this appears to have the same disadvantage as a UDF, Coral8 has provided a number of ready-made proxies for interfacing with other technologies. The SOAP proxy, commonly used to interface with web services, was used to map geospatial function calls in CCL to the NTS (.NET) geospatial library. Figure 59 illustrates the overall architecture. The server configuration file specifies the mapping from a function name in CCL to the SOAP proxy and the appropriate arguments that identify the web service.
For the exercise carried out by the author and Dr. Churcher, a number of functions in the NTS geospatial library were wrapped in a web service, along with access to a Postgres/PostGIS database containing a subset of Ordnance Survey MasterMap data. A data feed from a virtual set of water gauge sensors was created and fed into a complex query that determined whether any water gauge sensors reading above a threshold level were within a specified distance of one another. Within Coral8 Studio, the formulation and testing of the exercise was straightforward, with additional benefits such as being able to identify the latency incurred by using the SOAP proxy and web service approach. Whilst the latency was greater than that anticipated for an approach using only a UDF, the exercise was felt to be appropriate, since creating the appropriate proxies is a straightforward engineering exercise.
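The logic of the water gauge query can be sketched in Python as follows. This is a hedged reconstruction of the idea only, not the CCL query or the NTS web service used in the exercise: the function `high_gauges_in_proximity`, the field names, the threshold, the distance and the sample readings are all hypothetical, and coordinates are assumed to be planar (for example, metres on a projected grid) so that straight-line distance is meaningful.

```python
from itertools import combinations
from math import hypot


def high_gauges_in_proximity(readings, level_threshold, max_distance):
    """Return pairs of gauge ids that both exceed the level and lie close together.

    Hypothetical helper; `readings` is a list of dicts with keys
    "id", "x", "y" (planar coordinates) and "level".
    """
    # Filter step: keep only gauges reading above the threshold.
    high = [r for r in readings if r["level"] > level_threshold]
    # Spatial step: compare every pair of high gauges by straight-line distance.
    return [
        (a["id"], b["id"])
        for a, b in combinations(high, 2)
        if hypot(a["x"] - b["x"], a["y"] - b["y"]) <= max_distance
    ]


# Invented sample data for a virtual set of gauges.
readings = [
    {"id": "g1", "x": 0.0, "y": 0.0, "level": 2.4},
    {"id": "g2", "x": 30.0, "y": 40.0, "level": 2.9},
    {"id": "g3", "x": 35.0, "y": 40.0, "level": 1.1},
    {"id": "g4", "x": 500.0, "y": 500.0, "level": 3.0},
]
pairs = high_gauges_in_proximity(readings, level_threshold=2.0, max_distance=100.0)
```

Here only `g1` and `g2` are both above the threshold and within range of each other (50 units apart). In a CEP engine the same filter and distance test would run continuously over a window of recent readings rather than over a fixed list.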