Parallelism involves partitioning, distributing and processing the incoming events across multiple computing nodes. Partitioning has its origin in centralized systems, where a single file or tablespace was too large to handle within a single piece of hardware. Distributed databases used data partitioning to place relational segments at multiple sites, exploiting benefits such as spreading the I/O bandwidth by reading and writing in parallel.

The most common strategies distribute tuples through round-robin, range or hash partitioning. Round-robin partitioning splits tuples sequentially across the computing nodes: successive tuples are sent to one node after another, and after the last node the assignment returns to the first. Range partitioning uses associative scans to distribute the tuples based on a particular range of the attribute value. Hash partitioning distributes tuples by applying a hashing function to an attribute value.
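The three strategies can be sketched as node-selection functions. The following is a minimal illustration only, not the implementation used in this research; the class name `PartitionSketch`, its method names and the boundary values are assumptions introduced for the example.

```java
import java.util.Objects;

/** Hypothetical helper illustrating the three strategies; all names are assumptions. */
final class PartitionSketch {
    private int next = 0; // round-robin cursor: index of the node that receives the next tuple

    /** Round-robin: successive tuples go to nodes 0, 1, ..., nodeCount-1, then back to 0. */
    int roundRobin(int nodeCount) {
        int node = next;
        next = (next + 1) % nodeCount;
        return node;
    }

    /**
     * Range: upperBounds[i] is the exclusive upper bound of node i.
     * Values at or above the last bound go to one extra, final node,
     * so the node count is upperBounds.length + 1.
     */
    static int range(double value, double[] upperBounds) {
        for (int i = 0; i < upperBounds.length; i++) {
            if (value < upperBounds[i]) {
                return i;
            }
        }
        return upperBounds.length;
    }

    /** Hash: apply a hash function to the attribute value and reduce it modulo the node count. */
    static int hash(Object attributeValue, int nodeCount) {
        // floorMod keeps the result non-negative even for negative hash codes
        return Math.floorMod(Objects.hashCode(attributeValue), nodeCount);
    }

    public static void main(String[] args) {
        PartitionSketch rr = new PartitionSketch();
        double[] bounds = {10.0, 20.0}; // three nodes: < 10, [10, 20), >= 20
        for (double v : new double[] {3.0, 12.5, 27.0, 8.0}) {
            System.out.println(v + " -> roundRobin=" + rr.roundRobin(3)
                    + " range=" + range(v, bounds)
                    + " hash=" + hash(v, 3));
        }
    }
}
```

Round-robin ignores the tuple content entirely, whereas range and hash partitioning both derive the node from an attribute value, which is what allows a later lookup to visit only the relevant node.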

Both hash and range partitioning confine the distribution of a tuple to a few computing nodes or a disk, avoiding the overhead of a complete search across all the computing nodes registered with the system. Range partitioning clusters tuples with similar attributes in the same partition, whereas hashing tends to randomize the incoming data rather than cluster it. The problem with range partitioning is skew: data skew, where all the data is placed in one partition or computing node, and execution skew, where all the execution occurs in one partition. Hashing and round-robin partitioning are less susceptible to skew. Some partitioning techniques use the frequency of event or tuple access as one of the partitioning factors, spreading the data accesses across the partitions rather than the actual number of tuples or the attribute values.

SELECT person_name, avg(accel-x)   /* the output attribute(s) */
FROM accelerometer_logs            /* the input relation */
WHERE person_id = '201';           /* the predicate */

Figure 8.1: Example of the scan of acceleration logs to compute average
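Data skew can be made concrete with a small load measurement. The sketch below is illustrative only: the class `SkewSketch`, the skew ratio (largest node load divided by the mean load) and the sample identifiers 200 to 209 are assumptions chosen for the example, not measurements from this research.

```java
/** Hypothetical illustration of data skew; names, bounds and identifiers are assumptions. */
final class SkewSketch {
    /** Range partitioning: upperBounds[i] is the exclusive upper bound of node i. */
    static int rangeNode(int value, int[] upperBounds) {
        int node = 0;
        while (node < upperBounds.length && value >= upperBounds[node]) {
            node++;
        }
        return node;
    }

    /** Hash partitioning on an integer attribute. */
    static int hashNode(int value, int nodeCount) {
        return Math.floorMod(Integer.hashCode(value), nodeCount);
    }

    /** Skew ratio: largest load divided by mean load (1.0 means perfectly balanced). */
    static double skew(int[] loads) {
        int max = 0;
        long total = 0;
        for (int load : loads) {
            max = Math.max(max, load);
            total += load;
        }
        if (total == 0) {
            throw new IllegalArgumentException("no tuples were assigned");
        }
        return (double) max * loads.length / total;
    }

    public static void main(String[] args) {
        int[] bounds = {100, 300};      // three nodes: < 100, [100, 300), >= 300
        int[] rangeLoads = new int[3];
        int[] hashLoads = new int[3];
        for (int id = 200; id < 210; id++) { // clustered identifiers
            rangeLoads[rangeNode(id, bounds)]++;
            hashLoads[hashNode(id, 3)]++;
        }
        System.out.println("range skew = " + skew(rangeLoads));
        System.out.println("hash skew  = " + skew(hashLoads));
    }
}
```

With these clustered identifiers every tuple falls into the middle range, so one node carries the whole load, while the hash assignment spreads the same tuples over all three nodes.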

For example, suppose the system receives an event stream from accelerometers attached to the individuals residing in a care home. A basic arithmetic function such as average could be applied to the events, as illustrated in Figure 8.1. In Figure 8.2, the events are redirected between three nodes, and the label within each node is the identifier of the computing node. Figure 8.2 demonstrates the three partitioning techniques described above for the simple use case of computing an average. The events are split in three different ways and the results are obtained through aggregation in the final node. Each node sums the x-axis values of its events, called accel-x, and maintains a count of the events. The overall output is computed at the aggregate node through the average operation described below:

$$\text{Average} = \frac{\sum_{i=0}^{3} \text{accel-x}_i}{\sum_{i=0}^{3} \text{eventcount}_i}$$
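The aggregation step can be sketched as follows: each node keeps a partial sum and a partial count, and the aggregate node divides the total sum by the total count. This is a minimal illustration under assumed names (`PartialAverage`, `Partial`) and sample values, not the code used in this research.

```java
/** Hypothetical sketch of the per-node partial state and the aggregate node. */
final class PartialAverage {
    /** State held by one computing node: running sum of accel-x and event count. */
    static final class Partial {
        double sum;
        long count;

        void add(double accelX) {
            sum += accelX;
            count++;
        }
    }

    /** Aggregate node: total of the partial sums divided by total of the partial counts. */
    static double average(Partial[] partials) {
        double totalSum = 0.0;
        long totalCount = 0;
        for (Partial p : partials) {
            totalSum += p.sum;
            totalCount += p.count;
        }
        if (totalCount == 0) {
            throw new IllegalStateException("no events received");
        }
        return totalSum / totalCount;
    }

    public static void main(String[] args) {
        Partial[] nodes = {new Partial(), new Partial(), new Partial()};
        double[] accelX = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0}; // sample values (assumed)
        for (int i = 0; i < accelX.length; i++) {
            nodes[i % nodes.length].add(accelX[i]); // round-robin distribution
        }
        System.out.println("average = " + average(nodes));
    }
}
```

Carrying the sum and the count separately matters: averaging the per-node averages would give a different result whenever the nodes receive unequal numbers of events.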

The position of the events in the multidimensional space is determined by the list of attribute values defined by the multi-dimensional partitioning schemes. The partition dimension, or attribute, can be chosen based on performance requirements such as throughput or response time. The number of nodes can be increased according to the event arrival rate. Grouping computing nodes by adjacent occurrences of event types through hashing, clustering events through range partitioning, or distributing events sequentially through round-robin partitioning might improve the performance of a few specific use cases. Computing an average through partitioning of events is a simple example, in which the computational dependency between events is minimal. Whether partitioning distributes events to a few nodes or to all nodes is an orthogonal issue. More sophisticated partitioning schemes may improve the response time and are reserved for future work.

Figure 8.2: Illustration of the types of stream partitioning process

The main objective behind parallelization of the EPN is to distribute the incoming events to multiple EPNs residing in multiple computing nodes, with automated redirection of the events. Each EPN deployed in a computing node performs identical tasks on the arriving events. Each attribute of the event can be viewed as a coordinate along the (attribute) dimension. Each event tuple consists of a list of coordinate values and is viewed as a data point in the multidimensional space. Given the nature of the use cases, range partitioning is used as the default strategy to test the Inter-EPN parallelization. The semantic integrity of the results is one of the main areas of this qualitative research on Inter-EPN parallelization: the results emerging from the Inter-EPN parallelization should be equivalent to the results obtained by processing the events in a single computing node. From the performance point of view, one seeks to balance the incoming events across the computing nodes registered in the system.
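The equivalence requirement can be expressed as a simple check: range-partition the event tuples on one dimension, run the same task on every partition, merge the partial results and compare them with the single-node result. The sketch below is an assumption-laden illustration; the class `InterEpnEquivalence`, the representation of an event as a `long[]` of coordinate values, and the sum-and-count task are chosen for the example and are not taken from the implementation described here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical equivalence check; an event is a long[] of coordinate (attribute) values. */
final class InterEpnEquivalence {
    /** The identical task run by every node: sum of one attribute plus the event count. */
    static long[] sumAndCount(List<long[]> events, int dim) {
        long sum = 0;
        for (long[] event : events) {
            sum += event[dim];
        }
        return new long[] {sum, events.size()};
    }

    /** Range-partitions events on partitionDim; upperBounds[i] is node i's exclusive upper bound. */
    static List<List<long[]>> rangePartition(List<long[]> events, int partitionDim, long[] upperBounds) {
        List<List<long[]>> nodes = new ArrayList<>();
        for (int i = 0; i <= upperBounds.length; i++) {
            nodes.add(new ArrayList<>());
        }
        for (long[] event : events) {
            int node = 0;
            while (node < upperBounds.length && event[partitionDim] >= upperBounds[node]) {
                node++;
            }
            nodes.get(node).add(event);
        }
        return nodes;
    }

    /** True when the merged per-node results equal the single-node result. */
    static boolean equivalent(List<long[]> events, int partitionDim, long[] upperBounds, int aggDim) {
        long[] single = sumAndCount(events, aggDim);
        long sum = 0;
        long count = 0;
        for (List<long[]> node : rangePartition(events, partitionDim, upperBounds)) {
            long[] partial = sumAndCount(node, aggDim);
            sum += partial[0];
            count += partial[1];
        }
        return sum == single[0] && count == single[1];
    }

    public static void main(String[] args) {
        List<long[]> events = new ArrayList<>();
        events.add(new long[] {201, 5});  // {person id, accel-x reading}, sample values
        events.add(new long[] {202, 7});
        events.add(new long[] {305, -3});
        events.add(new long[] {410, 9});
        System.out.println(equivalent(events, 0, new long[] {300, 400}, 1));
    }
}
```

The sizes of the lists returned by `rangePartition` also give a direct view of how evenly the events are balanced across the nodes.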

In order to implement the Inter-EPN parallelization, application-independent algorithms were implemented in Java to direct the input event streams and to manage the output event aggregation. The event processing engine (ESPER) maintains local memory holding the previously computed disjoint intervals of the arriving events up to a certain point in the past. In this research, it is assumed that incoming events arrive in order. EPNs use continuous queries to filter the event streams and/or push the results further down to other event streams. The EPNs make use of views, which are similar to SQL tables, to hold multiple events for pattern recognition. This research uses sequence-based views with expiry policies (a pre-determined count of incoming events) through sliding windows. The features of the event processing engine are used to create the sequence-based views of the incoming events. Complex processing such as aggregation and grouping is performed on a range of events within a particular sequence of events. For these reasons, range partitioning is used exclusively in this research, focusing mainly on the ambient kitchen use case. Other types of partitioning could be utilised to improve efficiency; however, this is beyond the scope of this research.
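The behaviour of a sequence-based view with a count-based expiry policy can be sketched in plain Java. This is not the ESPER API; the class `LengthWindow` and its methods are hypothetical and only mimic the idea of a sliding window that expires the oldest event once a pre-determined count is exceeded.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical count-based sliding window; it does not use or reproduce the ESPER API. */
final class LengthWindow {
    private final int capacity;                       // pre-determined count of events to retain
    private final Deque<Double> events = new ArrayDeque<>();
    private double sum;                               // running sum of the retained events

    LengthWindow(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Adds an event; once the window is over capacity, the oldest event expires. */
    void add(double value) {
        events.addLast(value);
        sum += value;
        if (events.size() > capacity) {
            sum -= events.removeFirst();
        }
    }

    int size() {
        return events.size();
    }

    /** Aggregation over the events currently held in the window. */
    double average() {
        if (events.isEmpty()) {
            throw new IllegalStateException("window is empty");
        }
        return sum / events.size();
    }

    public static void main(String[] args) {
        LengthWindow window = new LengthWindow(3);
        for (double v : new double[] {1.0, 2.0, 3.0, 4.0}) {
            window.add(v);
            System.out.println("after " + v + ": average = " + window.average());
        }
    }
}
```

Because the window depends on the arrival order of events, the assumption stated above that events arrive in order is what keeps such a view deterministic.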