• No se han encontrado resultados

2.1 REVISION BIBLIOGRAFICA

2.2. BASE TEÓRICA.

2.2.1. LA CAPACITACION CONTINUA DEL DOCENTE.

2.2.1.1. LA CAPACITACION CONTINUA O PERMANENTE.

The distributed algorithm of Figure 3.2 is implemented directly in ABS, which is sub- sequently translated to Haskell code [10], by utilizing the ABS-Haskell [20] transcom- piler (source-to-source compiler). The translated Haskell code is then linked against a Haskell-written parallel and distributed runtime API. Finally, the linked code is compiled by a Haskell compiler (normally, GHC) down to native code and executed directly.

The performance results of an experimental validation of the proposed approach in ABS-Haskell transcompiler is presented in [10]. The parallel runtime treats ABS active objects as Haskell’s lightweight threads (also known as green threads), each listening to its own concurrently-modifiable process queue: a method activation pushes a new continuation to the end of the callee’s process queue. Processes await- ing on futures are lightweight threads that will push back their continuation when the future is resolved; processes awaiting on boolean conditions are continuations which will be put back to the queue when their condition is met. The parallel run- time strives to avoid busy-wait polling both for futures by employing the underlying OS asynchronous event notification system (e.g. epoll, kqueue), and for booleans by retrying the continuations that have part of its condition modified (by mutating fields) since the last release point.

For the distributed runtime we rely on Cloud Haskell [32], a library framework that tries to port Erlang’s distribution model to the Haskell language while adding type-safety to messages. Cloud Haskell code is employed for remote method acti- vation and future resolution: the library provides us means to serialize a remote method call to its arguments plus a static (known at compile time) pointer to the method code. No actual code is ever transferred; the active objects are serialized to unique among the whole network identifiers and futures to unique identifiers to the caller object (simply a counter). The serialized data, together with their types, are then transferred through a network transport layer (TCP,CCI,ZeroMQ); we opted for TCP/IP, since it is well-established and easier to debug. The data are de-serialized on the other end: a de-serialized method call corresponds to a contin- uation which will be pushed to the end of the process queue of the callee object, whereas a de-serialized future value will wake up all processes of the object awaiting on that particular future.

The creation of Deployment Components is done under the hood by contacting the corresponding (cloud) platform provider to allocate a new machine, usually done through a REST API. The executable is compiled once and placed on each created machine which is automatically started as the 1st user process after kernel initialization of the VM has completed.

The choice of Haskell was made mainly for two reasons: the ABS-Haskell back- end seems to be currently the fastest in terms of speed and memory use, attributed perhaps to the close match of the two languages in terms of language features:

Implementation 31 Haskell is also a high-level, statically-typed, purely functional language. Secondly, compared to the distributed implementation sketched in Java [68], the ABS-Haskell runtime utilizes the support of Haskell’s lightweight threads and first-class continu- ations to efficiently implement multicore-enabled cooperative scheduling; Java does not have built-in language support for algebraic datatypes, continuations and its system OS threads (heavyweight) makes it a less ideal candidate to implement co- operative scheduling in a straightforward manner. On the distributed side, layering our solution on top of Java RMI (Remote Method Invocation) framework was de- cided against for lack of built-in support for asynchronous remote method calls and superfluous features to our needs, such as code-transfer and fully-distributed garbage collection.

3.3.1

Implementing Delegation

The distributed algorithm described in Section 3 uses the concept of a delegate for asynchronicity: when the worker actor demands a particular slot of the graph array, it will spawn asynchronously an extra delegate process (line 9) that will only execute when the requested slot becomes available. This execution scheme may be sufficient for preemptive scheduling concurrency (with some safe locking on the active object’s fields), since every delegate process gets a fair time slice to execute; however, in cooperative scheduling concurrency, the described scheme yields sub-optimal results for sufficient large graph arrays. Specifically, the worker actor traverses its partition from left to right (line 3), spawning continuously a new delegate in every step; all these delegates cannot execute until the worker actor has released control, which happens upon reaching the end of its run method (finished traversing the partition). Although at first it may seem that the worker actors do operate in parallel to each other, the accumulating delegates are a space leak that puts pressure on the Garbage Collector and, most importantly, delays execution by traversing the partitioned arrays “twice”, one for the creation of delegates and one for “consuming them”.

A naive solution to this space leak is to change lines 8,9 to a synchronous in- stead method call (i.e. this.delegate(f,current)). However, a new problem arises where each worker actor (and thus its CPU) continually blocks waiting on the network result of the request. This intensely sequentializes the code and defeats the purpose of distributing the workload, since most processors are idling on network communication. The intuition is that modern CPUs operate in much larger speeds than commodity network technologies. To put it differently, the worker’s main cal- culation is much faster than the round-trip time of a request method call to a remote worker. Theoretically, a synchronous approach could only work in a parallel setting where the workers are homogeneous processors and requests are exchanged through shared memory with memory speed near that of the CPU processor. This hypothesis requires further investigation.

1: Unit run(...)

2: for each node i in the partition do

3: for j = 1 to m do step

4: target← random[1..(i−1)2m]

5: current= (i−1)m+j

6: x=whichActor(target)

7: Fut<Int> f =actor[x]!request(target)

8: aliveDelegates=aliveDelegates+ 1

9: this! delegate(f, current)

10: if aliveDelegates=maxBoundWindow then

11: awaitaliveDelegates<=minBoundWindow

Figure 3.3: The modified run method with window of delegates.

We opted instead for a middle-ground, where we allow a window size of dele- gate processes: the worker process continues to create delegate processes until their number reaches the upper bound of the window size; thereafter the worker process releases control so the delegates have a chance to execute. When only the number of alive delegate processes falls under the window’s lower bound, the worker process is allowed to resume execution. This algorithmic description can be straightforwardly implemented in ABS with boolean awaiting and a integer counter field (named

this.aliveDelegates). The modification of therun is shown in Figure 3.3; Similarly the delegate method must be modified to decrease the aliveDelegates counter when the method exits.

Interestingly, the size of the window is dependent on the CPU/Network speed ratio, and the Preferential Attachment model parameters: nodes (n) and degree (d). In [10], the performance results of the PA model presented in this chapter in the Haskell backend are given. We empirically tested and used a fixed window size of [500,2000]. Finding the optimal window size that keeps the CPUs busy while not leaking memory by keeping too much delegates alive for a specific setup (cpu,network,n,d) is planned for future work.

3.4

Conclusion and Future Work

In this chapter, we have presented a high-level distributed-memory algorithm that implements synthesizing artificial graphs based on Preferential Attachment mecha- nism. The algorithm avoids low-level synchronization complexities thanks to ABS, an actor-based modeling framework, and its programming abstractions which sup- port cooperative scheduling. The experimental results for the proposed algorithm presented in [10] suggest that the implementation scales with the size of the dis- tributed system, both in time but more profoundly in memory, a fact that permits the generation of PA graphs that cannot fit in memory of a single system.

Conclusion and Future Work 33 For future work, we are considering combining multiple request messages in a single TCP segment; this change would increase the overall execution speed by having a smaller overhead of the TCP headers and thus less network communication between VMs, and better network bandwidth. In another (orthogonal) direction, we could utilize the many cores of each VM to have a parallel-distributed hybrid implementation in ABS-Haskell for faster PA graph generation.

Part III

Enhancing Parallelism

This part consists of the following chapters:

Chapter 4 Asynchronous Actor-based software programming has gained increas- ing attention as a model of concurrency and distribution. Many modern distributed software applications require a form of continuous interaction between their compo- nents which consists of streaming data from a server to its clients. In this chapter, we extend the basic model of asynchronous method invocation and return in order to support the streaming of data [13]. We introduce the notion of “future-based data streams” by augmenting the syntax, type system, and operational semantics of ABS. The application involving future-based data streams is illustrated by a case study on social network simulation.

Chapter 5 In this chapter we introduce a new programming model of multi- threaded actors which feature the parallel processing of their messages [15]. In this model an actor consists of a group of active objects which share a message queue. We provide a formal operational semantics, and a description of a Java-based implementation for the basic programming abstractions describing multi-threaded actors. Finally, we evaluate our proposal by means of an example application.

Chapter 4

Futures for Streaming Data

4.1

Introduction

Since the rapid growth in big data, data streaming is widely used in many distributed applications, e.g., telecommunications, event-monitoring and detection, and sensor networks. Data streaming is a client/server pattern which, in essence, consists of a continuous generation of data by the server and a sequential and incremental processing of the data by the client. Data streams are naturally processed differently from batch data. Functions cannot operate on data streams as a whole, as the produced data can be unlimited. Hence, new programming abstractions are required for the continuous generation and consumption of data in the streams.

Data streaming is highly relevant in modern distributed systems. Actor-based languages are specifically designed for describing such systems [1]. They provide an event-driven model of concurrency where messages are communicated asyn- chronously and processed by pattern matching mechanism [7]. Concurrent objects generalize this model to programming to interface discipline by modeling messages as asynchronous method invocations. The main contribution of this chapter is to integrate data streaming mechanism with concurrent object systems.

In this chapter, we extend the ABS language in order to support the streaming of data between a server and its clients. We introduce “future-based data streams” which integrates futures and data streams, and which specifies the return type of so-called streaming methods. Upon invocation of such a method a new future is created which holds a reference to the generated stream of data. Data items are added to the stream by the execution of a yield statement. Such a statement takes as parameter an expression the value of which is added to the stream, without terminating the execution of the method. The return statement terminates the execution of a streaming method, and is used to signal the end of data streaming. Even though no new data is produced, the existing data values in the stream buffer can be retrieved by the consumers.

The values generated by the server (the streaming method) can be obtained 37

incrementally and sequentially by a client by querying the future corresponding to this method invocation. By the nature of data streaming, it is natural to restrict the streaming to the asynchronous method calls. Therefore there is no support for synchronous invocation of streaming methods.

In this chapter, we introduce two different implementations of streams tailored to different forms of parallel processing of data streams. Obtaining data from a destructive stream involves the removal of the data, whereas in a non-destructive stream the data persists. Which of the implementation is used is determined by the caller of the streaming method (the creator of the stream) which is not necessarily the consumer of the data stream. The creator can then provide the consumers with a reference to the stream. Both the streaming method (producer) and the consumers which hold a reference to the data stream are not exposed to the underlying imple- mentation of the stream, i.e., these different implementations are not represented by different types of data streams. This allows for a separation of concerns between the generation and processing of data streams, on the one hand, and their orchestration, on the other hand. This also enables reusability of the implementation of producers and consumers for both consumption approaches.

A preliminary discussion of the overall idea underlying this chapter is given in [11]. As an extension, in this chapter we introduce the different implementations of data streams, an operational semantics for both implementations of streams, a new type system which formalizes the integration of futures and data streams, and a proof of type-safety. Further, we show how the basic mechanism in ABS of cooperative scheduling of asynchronously generated method invocations itself can be used to implement data streams and the cooperative scheduling of streaming methods.

As a proof of concept, exploiting a prototype implementation for supporting future-based data streams on top of ABS, we present the usage of the above- mentioned feature in the implementation of a distributed application for the gener- ation of distributed PA (chapter 2 and 3). The notion of data streaming abstracts from the specific implementation of ABS. In our case, we make use of the distributed Haskell backend of ABS [20] for the case study on future-based data streams reported in this chapter.

The overall contribution of this chapter is a formal model of streaming data in the ABS language, which fully complies and generalizes the asynchronous model of computation underlying the ABS language. Since ABS is defined in terms of a formal operational semantics which supports a variety of formal analysis techniques (e.g., deadlock detection [39] and [14]), we thus obtain a general formal framework for the modeling and analysis of different programming techniques for processing data streams, e.g., map-reduce and publish-subscribe [34]. To the best of our knowledge, our work provides a first formal type system and operational semantics for a general notion of streaming data in a high-level actor-based programming language.

Future-Based Data Streams 39 Plan of the chapter This chapter is organized as follows: the notion of a future- based data stream is specified as an extension of ABS in Section 4.2. In section 4.3, it is shown that the well-typedness of a program in the extended ABS is preserved. Section 4.4 discusses the usage of streams in a distributed setting. In section 4.5, an implementation of data streams is given as an API written in ABS. In Section 4.6, a case study on social network simulation is discussed, which uses the proposed notion of streams. Related works are discussed in section 4.7. Finally we conclude in section 4.8.

Documento similar